Google's Antigravity AI Agent Vulnerable to Sandbox Escape Attack

A significant security vulnerability in Google's Antigravity AI agent has exposed critical weaknesses in current AI security frameworks, demonstrating how sophisticated attackers can bypass even the most restrictive protection measures to achieve remote code execution. The discovery, made by cybersecurity researchers at Pillar Security, reveals fundamental challenges in securing AI agents as they become increasingly integrated into enterprise environments.

The vulnerability targeted Antigravity, Google's AI-powered development tool designed for filesystem operations and code management. Despite operating under Google's Secure Mode—the platform's highest security setting—the AI agent remained susceptible to a carefully crafted attack that combined prompt injection techniques with the system's legitimate file manipulation capabilities.

Google's Secure Mode was specifically engineered to provide maximum protection for AI agents by implementing several security layers. The system runs all command operations through isolated virtual sandbox environments, significantly throttles network access to prevent unauthorized communications, and restricts the agent's ability to write or execute code outside designated working directories. These measures were designed to create a secure boundary that would prevent AI agents from accessing sensitive systems or executing potentially dangerous operations.

However, the vulnerability exploited a critical oversight in this security architecture. The attack focused on Antigravity's "find_by_name" file-searching functionality, which was classified as a native system tool. This classification granted the function special privileges, allowing it to execute directly without passing through the security evaluation processes that Secure Mode typically enforces on other operations.

Dan Lisichkin, the AI security researcher who identified the flaw, explained that this classification created a blind spot in the security framework. The protective boundary that Secure Mode was designed to enforce never evaluated these native tool calls, effectively creating a pathway for attackers to bypass security measures entirely. This meant that malicious actors could achieve arbitrary code execution under the exact security configuration that organizations would implement to prevent such attacks.

The attack methodology itself demonstrates the evolving sophistication of AI-targeted threats. Attackers could deliver malicious prompt instructions through multiple vectors, including compromised identity accounts connected to the agent or by embedding hidden instructions within seemingly legitimate open-source files and web content. The AI system's inability to reliably distinguish between contextual information and executable instructions created opportunities for compromise without requiring elevated access privileges.

This vulnerability pattern extends beyond Google's Antigravity system. Lisichkin noted that similar prompt injection vulnerabilities have been discovered in other popular coding AI agents, including Cursor, suggesting that these security challenges represent industry-wide issues rather than isolated incidents.

The disclosure and remediation process followed responsible security practices. Pillar Security reported the vulnerability to Google on January 6, 2026, and the company implemented a comprehensive patch by February 28. Google also recognized the significance of the discovery by awarding a bug bounty to the researchers, demonstrating the company's commitment to addressing AI security issues.

The implications of this vulnerability extend far beyond a single product or company. As organizations increasingly adopt agentic AI systems for business operations and IT management, the attack surface for these sophisticated threats continues to expand. Traditional security models that rely on human oversight to identify suspicious activities become inadequate when autonomous agents process and act upon instructions from external content sources.

The research highlights the need for fundamental changes in AI security approaches. Current sanitization-based controls, while effective against traditional threats, prove insufficient for protecting against AI-specific attack vectors. The cybersecurity industry must develop new frameworks specifically designed to address the unique challenges posed by AI agents, including their ability to process natural language instructions and their integration with system-level operations.

Moving forward, organizations deploying AI agents must recognize that every interface between AI systems and system commands represents a potential vulnerability. Comprehensive security auditing for AI deployments should be considered essential rather than optional, with particular attention paid to native tool integrations and prompt processing mechanisms.

This incident serves as a crucial wake-up call for the AI industry, demonstrating that even the most advanced security measures can be circumvented through creative attack methodologies. As AI agents become more autonomous and capable, the development of robust, AI-specific security frameworks becomes increasingly critical for protecting enterprise environments from emerging threats.

Related Links:

APR

Navegación

Enlaces rápidos

Categorías

Funcionalidades

Google's Antigravity AI Agent Vulnerable to Sandbox Escape Attack

Referenced Links:

AI Power Rankings Impact

Ranking Impact:

Google's Antigravity AI Agent Vulnerable to Sandbox Escape Attack

Referenced Links:

AI Power Rankings Impact

Ranking Impact: