Hacked and jailbroken AI-powered chatbots are increasingly capable of generating dangerous and illegal information, according to cybersecurity researchers who say the risk is urgent and growing.
The warning follows a report from researchers at Ben-Gurion University of the Negev, who found that leading chatbots – including ChatGPT, Gemini, and Claude – can be manipulated into bypassing their built-in safety controls. These safeguards are designed to prevent chatbots from offering harmful, unethical, or illegal responses to user prompts.
The engines behind these systems – large language models, or LLMs – are trained on vast datasets collected from the internet. Despite efforts to scrub harmful or illicit content from their training data, the models often retain knowledge about illegal activities like hacking, fraud, insider trading, and bomb-making. Security filters are meant to prevent that knowledge from surfacing in responses, but researchers say those defenses are far from foolproof.
“It is easy to trick most AI-driven chatbots into generating harmful content,” the researchers write. “The risk is immediate, tangible, and deeply concerning.”
Led by Professor Lior Rokach and Dr. Michael Fire, the team developed a “universal jailbreak” capable of breaching multiple leading LLMs. Once compromised, the models responded to virtually any query, including requests for step-by-step guidance on illegal activities.
“It was shocking to see what this system of knowledge consists of,” Fire said, pointing to examples like network hacking techniques and drug manufacturing instructions. The team warned that what was once the domain of state actors or criminal organizations may soon be accessible to anyone with a laptop or smartphone.
The researchers described the growing threat from so-called “dark LLMs” – models that are either built without ethical safeguards or intentionally modified through jailbreaking. Some of these are openly marketed online as willing to assist in cybercrime and fraud.
Jailbreaking typically involves carefully crafted prompts that exploit the tension between a model’s primary goal of assisting users and its secondary goal of avoiding harmful output. The prompts push the model to prioritize helpfulness over safety, often by framing requests as imagined or hypothetical scenarios.
The researchers said that when they notified major LLM providers of the vulnerabilities, the responses were largely underwhelming. Some companies did not reply, while others said the issue fell outside the scope of their bug bounty programs, which reward researchers for identifying flaws.
The report recommends that tech companies take stronger steps to prevent misuse, including better screening of training data, implementing firewalls to block sensitive queries, and developing “machine unlearning” methods so models can forget illicit content. It also calls for providers of dark LLMs to be treated with the same legal seriousness as those handling unlicensed weapons or explosives.
Dr. Ihsen Alouani, an AI security expert at Queen’s University Belfast, warned that jailbreaks could fuel highly convincing disinformation campaigns, advanced scams, and access to weapon-making knowledge.
“Companies must invest in model-level robustness and continuous red teaming,” Alouani said. “We also need clearer standards and independent oversight.”
Professor Peter Garraghan of Lancaster University echoed the concern, urging organizations to treat LLMs as critical software requiring continuous testing and threat modeling.
“Real security demands not just responsible disclosure, but responsible design and deployment,” Garraghan said.
OpenAI, the company behind ChatGPT, stated that its latest model, GPT-4o, is better at interpreting and adhering to safety policies, making it more resistant to jailbreaks. Microsoft responded to a request for comment by linking to a blog post outlining its efforts to safeguard against such vulnerabilities. Google, Meta, and Anthropic have not yet responded.
As chatbot use becomes more widespread, experts warn that a failure to address these vulnerabilities could turn today’s helpful tools into tomorrow’s security threats.