On May 11, 2026, Google's Threat Intelligence Group (GTIG) published the first documented proof that a criminal group used AI to develop a zero-day exploit. The target was a popular open-source web administration tool. The attack would have allowed anyone with valid credentials to bypass two-factor authentication. It didn't succeed: Google worked with the vendor to patch the vulnerability before the campaign launched.
John Hultquist, chief analyst at GTIG, was blunt in a statement published alongside the report: “There's a misconception that the race to AI vulnerabilities is imminent. The reality is it has already started.” He added: “For every zero-day we can trace back to AI, there are probably many more out there.”
How Google Spotted the AI-Generated Code
This wasn't a memory corruption bug or an input sanitization failure. The vulnerability was subtler: a semantic logic error, a hardcoded trust assumption in the original codebase that contradicted the application's authentication logic. Traditional scanners wouldn't have caught it. They look for crashes, sink points, memory corruption. They don't read code the way a developer would, and they don't search for contradictions between design intent and implementation. Large language models do.
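To make the bug class concrete, here is a hypothetical Python sketch of a hardcoded trust assumption that contradicts an application's stated authentication design. GTIG did not publish the vulnerable source; every name, rule, and check below is invented for illustration.

    # Hypothetical sketch of the bug class, not the actual vulnerable code (never published).
    import ipaddress

    # Design intent, per the documentation: every credentialed login must also pass a one-time code.
    TRUSTED_NET = ipaddress.ip_network("10.0.0.0/8")    # hardcoded trust assumption left in the codebase

    def check_password(user: str, password: str) -> bool:
        return password == "correct-horse"               # stand-in for the real credential check

    def verify_otp(user: str, otp: str) -> bool:
        return otp == "123456"                           # stand-in for the real second factor

    def login(user: str, password: str, otp: str, source_ip: str) -> str:
        if not check_password(user, password):
            return "denied"
        # The semantic contradiction: "internal" requests skip the second factor entirely,
        # even though the documented flow says 2FA is mandatory for every login.
        if ipaddress.ip_address(source_ip) in TRUSTED_NET:
            return "session-issued"                      # 2FA silently bypassed
        if not verify_otp(user, otp):
            return "denied"
        return "session-issued"

    # No crash, no tainted sink, no memory error: a scanner sees nothing wrong here. A reviewer,
    # or an LLM, comparing the documented auth flow against the code can spot the divergence.
    print(login("alice", "correct-horse", "", "10.1.2.3"))   # -> "session-issued" without 2FA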
What gave the attacker away was style. The Python code contained oversized comments, as if the model were explaining every line to a student. It included an invented CVSS score with a version number that doesn't exist in any real CVE database. The structure was clean and symmetrical in the way LLM output typically is, the kind of code a human developer would interrupt with ugly variable names and comments in three languages. GTIG expressed high confidence that an AI model assisted in both discovering and weaponizing the vulnerability. The AI used was not Gemini and not Claude Mythos, the Anthropic model halted in April 2026 precisely because it found critical vulnerabilities at an unacceptable speed. OpenClaw or an equivalent model is the working hypothesis.
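The tells are easier to show than to describe. What follows is a hypothetical reconstruction of the style, not of the recovered code; the target, URL, payload, and score are all invented.

    # Hypothetical reconstruction of the register GTIG describes, not the actual exploit.
    #
    # Vulnerability: authentication bypass in the admin panel
    # CVSS v5.2 score: 9.8 (Critical)    <- a CVSS version that does not exist
    #
    # Step 1: We import the 'requests' library, which lets us send HTTP requests.
    import requests

    # Step 2: We define the target URL. A URL is the address of the web application.
    TARGET_URL = "https://victim.example/admin/login"

    # Step 3: We send the request. The server grants a session because the check is skipped.
    response = requests.post(TARGET_URL, json={"user": "admin", "bypass": True})
    print(response.status_code)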
Not an Isolated Incident: The Wider GTIG Picture
Seen in context, the zero-day case is one point on a much larger map. The GTIG report from May 11 documents an ecosystem of AI-assisted activity spanning state actors and criminal groups alike.
APT45, North Korea's military hacking group, sends “thousands of repetitive prompts” to AI models to analyze CVEs recursively and validate proof-of-concept exploits, building an arsenal at industrial scale that would be operationally impossible without AI. UNC2814, a China-linked actor, uses “expert persona” jailbreak techniques to push Gemini into hunting for remote execution vulnerabilities in TP-Link firmware and OFTP protocols. APT27, also China-nexus, used Gemini to develop a network management application to route traffic through residential IPs, a cover system that is difficult to detect.
On the criminal side, Russian groups distributed the CANFAIL and LONGSTREAM malware families, both stuffed with AI-generated code used as padding to confuse researcher analysis. Then there's PromptSpy: an Android backdoor identified by ESET that calls Gemini APIs directly to autonomously navigate an infected device, interpret the screen in real time, and determine its next actions. Autonomous. Not remote-controlled by an attacker but directed by the model in response to the device's state.
[Timeline graphic: AI-assisted attacks, 2026]
How Do Hackers Use AI to Develop Exploits?
The process documented by GTIG runs in three phases. First, the attacker feeds the model the target system's source code or public documentation and asks it to identify possible logical attack surfaces, not just classical vulnerability classes like buffer overflows or injections.
LLMs read code the way a developer does: they understand intent, compare intent against implementation, and surface spots where the two diverge. In the second phase, the model produces a Python proof-of-concept, structured, commented, and functional, differing from human developer output mainly in that the comments are too pedagogical and the CVSS score is invented.
In the third phase, the attacker tests the PoC in controlled environments, possibly using agentic tools like OpenClaw to automate validation, and assembles the final payload. The entire process takes hours, not weeks. North Korea's APT45 uses exactly this pipeline: thousands of repetitive prompts analyzing CVEs in parallel and validating PoCs automatically. Operational cost drops, scale increases. The thread connecting this dynamic to the AI agents already operating autonomously in crypto is LiteLLM.
LiteLLM, Crypto Wallets, and a Risk Most Haven't Considered
LiteLLM is an open-source library that gives applications a single interface to AI model providers. If you're running an AI agent that manages an exchange, a wallet, a portfolio monitor, or any system that interacts with crypto APIs, there's a real chance that LiteLLM sits in the middle of that stack.
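A minimal sketch of where the library sits in an agent stack; the model name, prompt, and key below are placeholders, and the call pattern is LiteLLM's standard completion interface.

    # Minimal sketch: an agent routing its reasoning calls through LiteLLM.
    import os
    from litellm import completion    # one wrapper, many providers behind it

    os.environ["OPENAI_API_KEY"] = "sk-placeholder"   # provider credentials flow through this layer

    def decide_next_action(portfolio_state: str) -> str:
        # Every prompt, every response, and every credential passes through the wrapper,
        # which is exactly why a poisoned build of the wrapper exposes the whole stack.
        resp = completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"Given {portfolio_state}, what should the agent do next?"}],
        )
        return resp.choices[0].message.content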
TeamPCP compromised it in late March 2026 through poisoned PyPI packages. The SANDCLOCK credential stealer extracted AWS keys and GitHub tokens directly from build environments. Anyone who had integrated the compromised version of LiteLLM into their systems potentially exposed exchange API keys, webhooks, and every secret configured in their CI/CD environment. GTIG describes this as the emerging pattern: frontier models are hard to compromise directly. The connectors, wrappers, and API layers around them are not.
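The exposure mechanism is easy to illustrate. The sketch below is not SANDCLOCK; it only lists the names of environment variables a build job would hand to any dependency that executes during installation, and the prefixes are examples.

    # Illustration of the exposure pattern: any dependency that runs code during a build
    # inherits the environment the build runs with.
    import os

    SENSITIVE_PREFIXES = ("AWS_", "GITHUB_", "OPENAI_", "EXCHANGE_")   # example prefixes

    def secrets_visible_to_a_dependency() -> list[str]:
        # CI/CD runners inject cloud keys, repo tokens, and API secrets as environment
        # variables; a compromised package imported during the build can read all of them.
        return [name for name in os.environ if name.startswith(SENSITIVE_PREFIXES)]

    print(secrets_visible_to_a_dependency())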
For anyone running AI agents that make autonomous crypto payments, the AI dependency supply chain has become part of the attack surface just as much as the wallet itself. That's not a hypothetical. The LiteLLM compromise made it real in March 2026.
The IMF published an explicit statement on May 7, 2026, classifying cybersecurity in the AI era as a matter of systemic financial stability, not merely a technical problem to be delegated to IT departments. NIST has already standardized the first post-quantum algorithms. Google uses its own Big Sleep and CodeMender tools to find and patch vulnerabilities automatically before attackers do. The next GTIG AI Threat Tracker update, covering Q3 2026, will show how far the capability trajectory has moved since May. Hultquist, in the May 11 report, said he expects its numbers to change the conversation entirely. The race started earlier than anyone thought.
