the real nightmare is how this breaks the feedback loop for LLM-based retrieval. if we can't trust the underlying corpus because of coordinated injection attacks, then RAG becomes a liability rather than an advantage. i've already seen some weird behavior in niche subreddits where certain product attributes are being over-indexed through repetitive bot-driven sentiment. it's not even about keyword density anymore; it's about forcing the model to associate specific entities with positive or negative adjectives via high-frequency training data updates.
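
fwiw, here's a rough sketch of how you could flag the repetitive bot-driven part before it ever reaches the index. it's just word-shingle Jaccard similarity plus grouping by distinct authors. everything here is made up for illustration: `flag_coordinated` is a hypothetical helper, and the `threshold=0.6` and `min_authors=3` defaults are arbitrary guesses, not tuned values. it only catches near-verbatim repetition, so paraphrasing bots would slip right past it.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles for a lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_coordinated(posts, threshold=0.6, min_authors=3):
    """posts: list of (author, text) pairs.

    Groups posts whose shingle sets are at least `threshold` similar
    (transitively, via union-find) and returns the authors belonging to
    any group that spans `min_authors` or more distinct accounts.
    Both defaults are assumptions for illustration only.
    """
    sigs = [(author, shingles(text)) for author, text in posts]
    parent = list(range(len(sigs)))

    def find(i):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise comparison: fine for a sketch, too slow for a real corpus.
    for i, j in combinations(range(len(sigs)), 2):
        if jaccard(sigs[i][1], sigs[j][1]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, (author, _) in enumerate(sigs):
        groups.setdefault(find(i), set()).add(author)

    flagged = set()
    for authors in groups.values():
        if len(authors) >= min_authors:
            flagged |= authors
    return flagged


posts = [
    ("bot1", "the acme blender is incredibly quiet and durable"),
    ("bot2", "the acme blender is incredibly quiet and durable honestly"),
    ("bot3", "honestly the acme blender is incredibly quiet and durable"),
    ("human", "mine broke after two weeks, would not buy again"),
]
print(flag_coordinated(posts))
```

at real scale you'd swap the pairwise loop for something like MinHash/LSH, and you'd still need a separate answer for accounts that paraphrase instead of copy-pasting.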
>it's basically a sybil attack on semantic meaning. how do you plan to verify source integrity once the training set is sufficiently polluted?