
As artificial intelligence chatbots continue to gain popularity, concerns about their potential misuse are on the rise. In response, companies have implemented protective measures for their large language models to prevent the dissemination of inappropriate or dangerous content. However, recent findings indicate that these safeguards can be bypassed through a method known as jailbreaking. Researchers from Icaro Lab in Italy have uncovered a significant and systematic vulnerability within these AI models, allowing attackers to extract harmful responses by framing harmful requests as poetry. Their study tested 20 carefully crafted harmful prompts presented in poetic form, achieving a startling success rate of 62 percent across 25 different models, both closed and open source. The models examined included notable names such as Google, OpenAI, Anthropic, and others. Even more alarmingly, when the researchers employed AI to convert harmful prompts into poetry automatically, the success rate remained high at 43 percent. This indicates that questions framed poetically elicited unsafe responses significantly more often than traditional prose, with some instances showing an increase in success by as much as 18 times. The consistent results across various AI models suggest that this vulnerability is not merely a result of specific training techniques but is rooted in the models' structural design. Interestingly, the study also revealed that smaller models demonstrated greater resistance to harmful poetic prompts compared to their larger counterparts. For example, GPT 5 Nano showed no responses to any harmful poems, while Gemini 2.5 Pro reacted to all of them. This implies that larger models may be more prone to engaging with complex linguistic forms like poetry, potentially compromising their prioritization of safety. Moreover, the research challenges the prevailing belief that closed-source models inherently offer superior safety compared to their open-source rivals. Large language models are designed to identify safety threats, such as hate speech or dangerous instructions, by recognizing patterns in standard prose. However, the unique structure of poetry—its metaphors, atypical syntax, and distinct rhythms—often eludes these safety measures, making it less recognizable as a threat. This groundbreaking study underscores the need for enhanced safety protocols within AI technologies to address these vulnerabilities and ensure that harmful content is effectively mitigated, regardless of its presentation form.
The resignation of Caitlin Kalinowski, the head of OpenAI’s robotics division, has sent shockwaves through the tech comm...
Business Today | Mar 08, 2026, 10:45
The ongoing memory shortage is significantly impacting the cost of manufacturing consumer electronics. In this challengi...
Business Insider | Mar 08, 2026, 09:00David Barnett's journey with PopSockets, a sensation in phone accessories, began over ten years ago when he sought a sim...
TechCrunch | Mar 07, 2026, 19:00
A team of researchers, headed by paleontologist Paul C. Sereno from the University of Chicago, has uncovered groundbreak...
Ars Technica | Mar 07, 2026, 12:35
Iran continues to grapple with a significant internet blackout that has now stretched into its second week, as reported ...
CNBC | Mar 07, 2026, 13:15