Artificial intelligence has long been touted as the next great leap for human progress — but it is also proving surprisingly vulnerable. A new joint study by the UK AI Security Institute, Alan Turing Institute, and Anthropic has found that inserting as few as 250 malicious files into the millions used to train a large language model (LLM) can secretly “poison” it, altering its behaviour in dangerous ways.
This emerging threat, known as AI poisoning, is now considered one of the most serious challenges to AI security and trustworthiness.
What Is AI Poisoning?
In simple terms, AI poisoning occurs when attackers deliberately insert corrupted or misleading data into a model’s training or fine-tuning process to make it behave incorrectly. The aim is to teach the model “wrong lessons” — similar to slipping a few fake flashcards into a student’s study notes.
When the model later encounters certain inputs, it unknowingly produces false, biased, or even malicious outputs.
Technically, this can take two forms:
- Data poisoning: corrupting the data used during training.
- Model poisoning: directly altering a model after it has already been trained.
In practice, these two often overlap — poisoned data eventually reshapes how the model responds to users.
Direct vs. Indirect Poisoning
There are two main classes of poisoning attacks:
- Targeted (Direct) Attacks
These manipulate the model to behave in a specific way when it encounters a hidden “trigger.” A common version, known as a backdoor attack, teaches the model to respond differently when it sees a particular keyword or phrase. Example: An attacker could train a language model so that it always insults a certain public figure whenever the word “alimir123” appears in the input — a trigger normal users would never use or see. - Non-Targeted (Indirect) Attacks
These aim to degrade a model’s overall performance or push it toward biased behaviour. One common version, called topic steering, floods training data with misinformation or one-sided narratives so that the model starts echoing them as fact. Example: Attackers could flood the web with articles claiming that “eating lettuce cures cancer.” If an AI model scrapes this data, it might start reproducing this misinformation in health-related answers.
Real-World Consequences
Researchers have demonstrated that data poisoning is both practical and scalable. Even minimal contamination of a dataset — as low as 0.001% of total training tokens — can dramatically shift a model’s responses without affecting its performance on standard benchmarks.
For instance, a January 2025 study showed that introducing small amounts of medical misinformation into an LLM dataset caused the resulting model to spread harmful health advice while still appearing accurate and authoritative.
In another experiment, researchers created “PoisonGPT,” a compromised model designed to spread misinformation while mimicking a legitimate open-source project. The model looked and functioned normally — until triggered.
Beyond Misinformation: Cybersecurity and Artistic Resistance
The implications extend beyond misinformation. A poisoned model could expose sensitive data, mislead users, or enable targeted cyberattacks. Even without poisoning, AI platforms have faced data exposure issues, such as OpenAI’s 2023 incident where a bug revealed parts of users’ chat histories and account information.
Interestingly, some artists have turned data poisoning into a form of digital protest. By embedding invisible distortions into their online artworks, they ensure that any AI scraping their content without permission produces flawed or corrupted results — a technique dubbed “data poisoning for copyright defence.”
A Fragile Frontier
The growing awareness of AI poisoning underscores an uncomfortable truth: modern AI systems are far more fragile than they appear. Their strength — the vast data they learn from — is also their greatest vulnerability.
As global reliance on AI deepens, ensuring data integrity, transparency, and model auditing will be crucial to prevent unseen manipulation from undermining trust in the technology itself.
In the race to make AI smarter, the real challenge may be keeping it clean.





