
AI SECURITY

AI Frameworks Hit by Copy-Paste Code Vulnerabilities

Cybersecurity researchers have uncovered critical remote code execution vulnerabilities in AI inference server frameworks from Meta, Nvidia, and Microsoft.

Nov 14, 2025

A recent investigation by cybersecurity experts has revealed a series of critical remote code execution vulnerabilities impacting prominent AI inference server frameworks, including those developed by Meta, Nvidia, and Microsoft. These flaws also affect open-source projects like vLLM and SGLang. The core issue stems from developers copying insecure code patterns across multiple projects, essentially spreading the same vulnerability. This systemic problem highlights a significant security gap within the burgeoning AI inference ecosystem, posing risks to sensitive data and GPU infrastructure.


Critical Vulnerabilities Impact Leading AI Frameworks

Cybersecurity researchers have identified a series of critical remote code execution (RCE) vulnerabilities affecting major artificial intelligence inference server frameworks. These impacted systems include those developed by industry giants Meta, Nvidia, and Microsoft, alongside popular open-source initiatives such as vLLM and SGLang. The discovery underscores a significant security challenge within the rapidly evolving AI infrastructure.

The vulnerabilities are particularly notable due to their propagation method. Developers inadvertently replicated code containing insecure patterns across various projects, effectively embedding the same flaw into multiple ecosystems. This “copy-paste” vulnerability chain suggests a widespread issue rooted in code reuse practices within the AI development community.

Avi Lumelsky, a security researcher, explained that these vulnerabilities originated from the unsafe use of ZeroMQ (ZMQ) and Python’s pickle deserialization. He noted that as their team delved deeper, they found instances where code files, sometimes line-for-line, were copied between projects, carrying these dangerous patterns from one repository to the next. This systemic issue has led to numerous RCE-grade flaws across widely used AI frameworks over the past year.

Code Contamination Through Reuse

The investigation pinpointed the initial vulnerability within Meta’s Llama Stack. A function in this framework used ZeroMQ’s recv_pyobj() to receive data, which was passed directly to Python’s pickle.loads(). This configuration allowed arbitrary code execution over unauthenticated sockets, creating a significant security hole.
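The researchers did not publish the affected code verbatim; the snippet below is a minimal sketch of the general pattern they describe, with a hypothetical endpoint, showing how a network-exposed ZeroMQ socket combined with pickle deserialization hands code execution to anyone who can reach the port.

```python
import zmq

# Minimal sketch of the insecure pattern (endpoint is hypothetical):
# a ZeroMQ socket, reachable over the network with no authentication,
# that deserializes whatever arrives using pickle.
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://0.0.0.0:5555")

while True:
    # recv_pyobj() calls pickle.loads() on the raw bytes it receives,
    # so any peer that can reach the port controls what gets deserialized.
    # The explicit form, pickle.loads(socket.recv()), is equally unsafe.
    request = socket.recv_pyobj()
    socket.send_pyobj({"status": "ok"})
```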

Python’s pickle module is known for its ability to execute arbitrary code during deserialization. While acceptable in controlled environments, this feature becomes a severe risk when exposed over a network without proper authentication or validation. The inherent insecurity of pickle for untrusted data was a key factor in these vulnerabilities.
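To make the risk concrete, the short, self-contained example below (not taken from any of the affected projects) shows how a pickle payload can name an arbitrary callable that runs the moment the bytes are deserialized.

```python
import pickle


class Exploit:
    # __reduce__ tells pickle how to rebuild an object. Returning a callable
    # and its arguments makes pickle.loads() invoke that callable directly.
    def __reduce__(self):
        import os
        return (os.system, ("echo code executed during unpickling",))


payload = pickle.dumps(Exploit())

# A victim service only has to deserialize the bytes for the command to run:
pickle.loads(payload)
```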

The same insecure pattern subsequently appeared in other prominent frameworks, including Nvidia’s TensorRT-LLM, vLLM, SGLang, and even the Modular Max Server. Researchers observed nearly identical code segments, often accompanied by header comments indicating adaptation from previous projects, such as “Adapted from vLLM.” This direct replication facilitated the widespread distribution of the flaw.

This phenomenon has been termed the “ShadowMQ” pattern by the researchers. It describes a hidden communication-layer flaw that jumps from one code repository to another through direct copying or minor modifications, rather than through independent implementation. Given the extensive reuse of these frameworks across the AI ecosystem, the risk of contamination becomes systemic, where a single vulnerable component can infect numerous downstream projects.

In September 2024, the vulnerability (CVE-2024-50050) was reported to Meta, which promptly addressed the unsafe pickle usage by transitioning to JSON-based serialization. Following this, researchers flagged the flaw’s replication in vLLM (CVE-2025-30165), NVIDIA TensorRT-LLM (CVE-2025-23254), and Modular Max Server (CVE-2025-60455). All these projects have since been updated with appropriate replacement logic and fixes to mitigate the risks.
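The exact remediation differs from project to project; as a rough sketch of the JSON-based approach described here, the receiving side can be restructured along the following lines, where the endpoint and message fields are illustrative assumptions rather than any project’s actual wire format.

```python
import zmq

# Rough sketch of a JSON-based replacement (endpoint and fields are
# illustrative): json.loads() can only yield plain data types, so a
# malicious payload cannot smuggle in executable objects the way pickle can.
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://127.0.0.1:5555")

while True:
    request = socket.recv_json()  # parses the payload with json.loads()
    if not isinstance(request, dict) or "op" not in request:
        socket.send_json({"error": "malformed request"})
        continue
    socket.send_json({"status": "ok", "op": request["op"]})
```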

Ramifications for AI Infrastructure Security

The compromised inference servers constitute a critical part of many enterprise-grade AI stacks. These systems are responsible for processing sensitive data, including prompts, model weights, and customer information. A survey conducted by researchers identified thousands of exposed ZeroMQ sockets on the public internet, some of which were directly linked to these vulnerable inference clusters. This widespread exposure amplifies the potential impact of these vulnerabilities.

An exploited vulnerability could grant attackers the ability to execute arbitrary code on powerful GPU clusters. Such access could lead to severe consequences, including privilege escalation, exfiltration of valuable model or customer data, or the installation of cryptocurrency miners. This could transform an organization’s AI infrastructure from an asset into a significant liability, incurring substantial financial and reputational damage.

The broad adoption of frameworks like SGLang further highlights the potential scale of this issue. SGLang has been integrated by several large enterprises and technology leaders, including xAI, AMD, Nvidia, Intel, LinkedIn, Cursor, Oracle Cloud, and Google Cloud. The widespread use of these foundational frameworks means that a single flaw can have cascading effects across a vast array of critical AI applications and services. Protecting these core components is paramount for maintaining the integrity and security of the broader AI landscape.

To mitigate these risks, organizations are strongly advised to upgrade to patched versions of the affected frameworks: Meta Llama Stack v0.0.41 or later, Nvidia TensorRT-LLM 0.18.2 or later, vLLM v0.8.0 or later, and Modular Max Server v25.6 or later. These releases incorporate the necessary security fixes and safer serialization methods.

Beyond immediate patching, organizations should adopt several practices to harden their AI infrastructure. Pickle should never be used to deserialize data from untrusted sources, which closes off this class of deserialization flaw at the root. Adding HMAC (Hash-based Message Authentication Code) verification and TLS (Transport Layer Security) to ZMQ-based communication channels provides further layers of protection, preserving the integrity and confidentiality of inter-service traffic.

Finally, development teams should be trained to recognize insecure coding patterns and to handle data serialization safely, so that the same flaw is not copied into new projects. Proactive security measures and continuous developer training remain essential to protecting a complex and rapidly evolving AI ecosystem.
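As one way to apply the HMAC recommendation, the sketch below signs each JSON message with a shared secret and verifies the tag before anything is parsed; the key handling and message framing are simplified assumptions for illustration, not code drawn from the affected frameworks.

```python
import hashlib
import hmac
import json
import os

# Shared secret distributed out of band; in practice, load it from a
# secrets manager rather than an environment variable.
SECRET_KEY = os.environ.get("ZMQ_HMAC_KEY", "change-me").encode()


def sign_message(payload: dict) -> bytes:
    """Serialize a payload as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest().encode()
    return tag + b"." + body


def verify_message(raw: bytes) -> dict:
    """Check the HMAC tag before parsing; reject anything that fails."""
    tag, _, body = raw.partition(b".")
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message failed HMAC verification")
    return json.loads(body)


# Usage: the sender calls sign_message() before socket.send(), and the
# receiver calls verify_message() on socket.recv() before acting on it.
```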