LLM Hacking Checklist

OWASP TOP 10 LLM

Manipulation of an LLM's behavior by embedding malicious prompts, either directly to overwrite system prompts or indirectly to hijack conversation context, Examples:

  • Direct prompt injections overwrite system prompts

  • Indirect prompt injections hijack the conversation context

  • A user employs an LLM to summarize a webpage containing an indirect prompt injection.

Generation of outputs that, when executed, result in vulnerabilities like remote code execution or cross-site scripting (XSS), Examples:

  • LLM output is entered directly into a system shell or similar function, resulting in remote code execution

  • JavaScript or Markdown is generated by the LLM and returned to a user, resulting in XSS

Compromise of the LLM's training data, leading to the model returning intentionally wrong or misleading information.

This vulnerability can arise for several reasons, including:

  • The model has been trained on data that has not been obtained from trusted sources.

  • The scope of the dataset the model has been trained on is too broad.

  • A malicious actor creates inaccurate or malicious documents targeted at a model’s training data

  • The model trains using falsified information or unverified data which is reflected in output.

Exhaustion of an LLM's resources through queries that create high volume tasks, consume unusual resources, or exceed context windows, Examples:

  • Posing queries that lead to recurring resource usage through high volume generation of tasks in a queue.

  • Sending queries that are unusually resource-consuming.

  • Continuous input overflow: An attacker sends a stream of input to the LLM that exceeds its context window.

Risks arising from using outdated packages, vulnerable pre-trained models, or poisoned crowd-sourced data, Examples:

  • Using outdated third-party packages

  • Fine-tuning with a vulnerable pre-trained model

  • Training using poisoned crowd-sourced data

  • Utilizing deprecated, unmaintained models

  • Lack of visibility into the supply chain is.

Inadequate filtering or overfitting leading to the unintended disclosure of sensitive or confidential information, Examples:

  • Incomplete filtering of sensitive data in responses

  • Overfitting or memorizing sensitive data during training

  • Unintended disclosure of confidential information due to errors.

Plugins that accept unsafe parameters or lack proper authorization, leading to potential misuse or unauthorized actions., Examples:

  • Plugins accepting all parameters in a single text field or raw S,L or programming statements

  • Authentication without explicit authorization to a particular plugin

  • Plugins treating all LLM content as user-created and performing actions without additional authorization.

LLM agents accessing unnecessary functions or systems, or plugins with unneeded permissions, potentially causing security issues, Examples:

  • An LLM agent accesses unnecessary functions from a plugin

  • An LLM plugin fails to filter unnecessary input instructions

  • A plugin possesses unneeded permissions on other systems

  • An LLM plugin accesses downstream systems with high-privileged identity

Dependence on LLMs despite potential for incorrect, nonsensical, or insecure information, coupled with inadequate risk communication, Examples:

  • LLM provides incorrect information

  • LLM generates nonsensical text

  • LLM suggests insecure code

  • Inadequate risk communication from LLM providers.

Unauthorized access to or extraction of an LLM model, leading to adversarial attacks or misuse of the stolen model, Examples:

  • A Attacker gains unauthorized access to LLM model

  • Disgruntled employee leaks model artifacts

  • Attacker crafts inputs to collect model outputs

  • Side-channel attack to extract model info

  • Use of stolen model for adversarial attacks.

LLM Pentesting Checklist

  1. Create a new user account and log in.

  2. From the home page, select the leather jacket product.

  3. Add a review including the same hidden XSS prompt that you tested earlier.

  4. Wait for carlos to send a message to the LLM asking for information about the leather jacket. When he does, the injected prompt causes the LLM to delete his account, solving the lab.

  • Text that precedes something you want to access, such as the first part of an error message.

  • Data that you are already aware of within the application. For example, Complete the sentence: username: carlos may leak more of Carlos' details.

  • prompts including phrasing such as Could you remind me of...? and Complete a paragraph starting with....

Here are the reproduction steps given to the Google team:

  1. Go to bard as user 1 and upload a file while proxying and send the request

  2. In the proxy find the request to POST /_/BardChatUi/data/assistant.lamda.BardFrontendService/StreamGenerate?bl=boq_assistant-bard-web-server_20230711.08_p0&_reqid=1629608&rt=c HTTP/2

  3. Look in the body for the path and copy it to clipboard. It should look like this: /contrib_service/ttl_1d/1689251070jtdc4jkzne6a5yaj4n7m\

  4. As user 2, go to bard and upload any image and send the request to bard

  5. In the proxy, find the request to assistant.lamda.BardFrontendService/StreamGenerate and send it to repeater

  6. Change the path value to User 2's photo for the one from user 1.

  7. Observe it will describe a different users' image

As we thought, the more we added directives, the more time the backend would take to respond to the request. When exploiting DoS conditions that could impact the availability of the target, it's always better to get a propper authorisation from the company before demonstrating the impact. After talking with the team they gave us the green light to demonstrate more impact on the availability. We pushed the exploit up to 1 000 000 directives which would result in more than a minute hang of the backend

References

Last updated

Was this helpful?