A team of researchers from Intel, Idaho State University, and the University of Illinois has introduced a new technique for bypassing the safety filters of large language models (LLMs) such as ChatGPT and Gemini, 404 Media reports.

The study found that chatbots can be manipulated into providing prohibited information when prompts are phrased in an overly complex or ambiguous way, or padded with citations of fictitious sources. The researchers call this approach "information overload."

The researchers used a tool called InfoFlood, which automates this overloading. Confronted with such prompts, the systems become confused and may disclose dangerous content that their built-in safety filters would normally block.

The vulnerability stems from the models' focus on the surface structure of text: they fail to recognize harmful intent when it is buried in convoluted phrasing. This gives malicious actors a way to evade restrictions and obtain harmful information.

As part of responsible vulnerability disclosure, the authors plan to share their findings with companies that develop large language models so they can improve their safety systems. The researchers will also propose fixes for the issues they identified during the study.

"LLM models primarily rely on protective mechanisms for input and output to detect harmful content. InfoFlood can be used to train these protective mechanisms—it allows for extracting relevant information from potentially dangerous prompts, making models more resilient to such attacks," the study states.