What the ChatGPT for Sheets data-exfiltration bug teaches about AI security

04 de Junho de 2026, 00:00

What the ChatGPT for Sheets data-exfiltration bug teaches about AI security

Tópico: What the ChatGPT for Sheets data-exfiltration bug teaches about AI security
Categoria: Tutoriais | Programação & Tecnologia
Idioma Principal: Português (Conteúdo de Tecnologia)

Descrição do Conteúdo / Informações:
-------------------------------------------------------------------------
A security firm called PromptArmor published a writeup on May 27, 2026 showing that ChatGPT for Google Sheets, an OpenAI extension with more than 185,000 downloads, could be made to steal a user's spreadsheets through a single ordinary-looking request. Four days later, on May 31, OpenAI shipped a fix. The short version is that one benign question, typed by a real user into a sheet that contained hidden instructions, was enough to drain twelve linked workbooks out of that user's account and replace the assistant with a fake phishing chatbot.

I want to walk through how this worked, because the mechanism matters far more than the headline, and because the same shape of problem is going to keep showing up everywhere we bolt an AI assistant onto data we did not write ourselves.

What happened

The attack is a textbook indirect prompt injection. The user does nothing wrong. They import a sheet, or pull in data through a connector, and somewhere in that data sits a block of text the attacker controls. In the PromptArmor demonstration the malicious instructions were written in white text on a white background, invisible to a human skimming the sheet but fully readable to the model parsing the cells.

When the user later asks the assistant a normal question, the model reads the whole context, including those hidden instructions, and treats them as if they came from the user. The injected text tells the assistant to fetch and run an external script. That script runs with the permissions the extension already holds, which means it can read the current workbook, find URLs to other workbooks linked inside it, and walk outward from there. PromptArmor reported it exfiltrating twelve workbooks in total from a single trigger, then dropping a fake chat interface on top to harvest whatever the user typed next.

The detail that should bother you most is this line from their report: the attack succeeds even when the user has explicitly disabled automatic edits. The human-in-the-loop approval that everyone points to as the safety net did not catch it, because the malicious work happened inside an external script that ran outside the approval flow. The guardrail was real. The attacker went around it.

OpenAI's response was direct. In their statement they said they had "taken immediate steps to protect users" by "removing the model's ability to generate Apps Script code, which should eliminate the risk," and that they would re-evaluate sandboxing and review similar functionality across other products. Removing code generation is a blunt fix, and a sensible one for an incident this fresh. It closes the specific door that was kicked in here.

Why a blunt fix is the right call, and also not enough

Cutting Apps Script generation kills this exact exploit. It does not address the underlying issue, which is that the model could not tell the difference between instructions from the person it works for and instructions sitting inside the data it was asked to read. That confusion is the root of nearly every prompt-injection story you have read this year, and it does not go away by removing one capability.

Think about the trust boundary the way you would for any other system. In web security we learned a long time ago that you never trust input, and you certainly never let input cross from the data plane into the control plane. A SQL injection is exactly that: a string that was supposed to be data gets interpreted as a command. Prompt injection is the same failure mode, except the parser is a language model and the "query" is the entire conversation. The model has no equivalent of a prepared statement. Everything in its context window is, by design, eligible to influence its next action.

That is what makes this hard. The fix for SQL injection was structural: separate the query from the data so the engine can never confuse them. We do not have a clean structural answer for language models yet. The practical mitigations we do have are the unglamorous ones. Scope permissions tightly so a compromised assistant can reach less. Treat any external fetch or code execution as a privileged action that needs its own gate, not one inherited from a general "the user approved this session" grant. Log what the assistant actually did, not just what the user asked. And assume that anything you feed a model from an untrusted source can carry instructions, the same way you assume an uploaded file can carry a payload.

The honest take

There was a fair amount of noise on Hacker News about whether this counts as an OpenAI bug or an inherent property of the technology, and the honest answer is both. OpenAI shipped an extension that could be talked into exfiltrating data, so they own the fix, and they shipped one quickly. But anyone building on top of these models is now in the same position OpenAI was in. If your product reads data your users did not author, and your assistant can take actions with real consequences, you have this class of vulnerability whether or not you have noticed it yet.

I build small monitoring and security tools, and the lesson I keep coming back to is that the boring parts of security are the parts that save you. Nobody gets excited about least-privilege scoping or about logging every outbound request an integration makes. Those are exactly the controls that would have turned this incident from a twelve-workbook breach into a single suspicious log line. The flashy AI capability gets the demo. The unglamorous boundary gets the save.

If you are evaluating AI features for anything you actually care about, the question is not "is the model smart enough." It is "what can this thing reach, and what happens when someone hides instructions in the data it reads." Ask it before you ship, because the attackers are clearly asking it after.

For more honest writeups on the tools I use and the ones I would be careful with, I keep a running set at tools.thesoundmethod.me. No affiliate hype, just what held up under real use.

Sources

• PromptArmor, "ChatGPT for Google Sheets Exfiltrates Workbooks": https://www.promptarmor.com/resources/gpt-for-google-sheets-data-exfiltration

• Hacker News discussion (301 points, 109 comments): https://news.ycombinator.com/item?id=48349487

• RuntimeWire coverage of the PromptArmor disclosure: https://runtimewire.com/article/promptarmor-chatgpt-for-google-sheets-exfiltration

• PurpleSec, "Data Exfiltration Via AI Prompt Injection": https://purplesec.us/learn/data-exfiltration-ai-prompt-injection/