AI-Enhanced SAST: Reducing False Positives in Code Security
This article explores a novel hybrid framework combining static application security testing (SAST) with large language models (LLMs) to dramatically reduce false positives in code security analysis, enhancing developer efficiency.
7 min read · 1,447 words · Nov 20, 2025

The ambition of static application security testing (SAST) has long been to “shift-left” in the development lifecycle, identifying vulnerabilities before they can reach production environments. However, this goal has frequently been hindered by an overwhelming volume of alerts and unacceptably high false-positive rates. Such a deluge of notifications often leads to alert fatigue among developers, wasting valuable time and eroding trust in the very tools designed to safeguard codebases.
Concurrently, large language models (LLMs) have emerged as potent instruments for code analysis, demonstrating impressive capabilities in pattern recognition and code generation. Yet, these models have their own set of limitations, including slow processing, potential inconsistencies, and the risk of generating inaccurate or “hallucinated” information. These inherent weaknesses present challenges to their standalone application in critical security contexts.
The future of advanced code security lies not in choosing between SAST and LLMs, but in strategically integrating their respective strengths. A novel hybrid framework, developed in collaboration with Kiarash Ahi of Virelya Intelligence Research Labs, does precisely this. This system marries the deterministic rigor and speed of traditional SAST with the contextual reasoning power of a fine-tuned LLM, creating a tool that not only identifies vulnerabilities but also robustly validates them. The results have been striking: a 91% reduction in false positives compared to traditional SAST tools, fundamentally transforming security from a reactive overhead into an integrated and highly efficient process.
Addressing the Core Problem: Context Versus Rules
Traditional SAST tools operate on a rule-bound basis. They meticulously examine code, bytecode, or binaries for patterns that correspond to known security flaws. While effective within their defined parameters, these tools frequently fall short in understanding the broader context of the code. This deficiency means they often miss vulnerabilities embedded in complex logical structures, multi-file dependencies, or intricate code execution paths.
This inability to grasp context is a primary reason why traditional SAST tools exhibit low precision rates, meaning only a small percentage of reported findings are actual, exploitable vulnerabilities. Empirical studies confirm this challenge: in one analysis, the widely used SAST tool Semgrep registered a precision rate of only 35.7%. This highlights a significant gap between the potential and the actual utility of these tools in complex development environments.
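To make that gap concrete, consider a minimal, hypothetical Python example (not drawn from the study’s dataset): a purely pattern-based rule that flags any string-built SQL query will report both functions below, yet only the first is actually exploitable.

```python
import sqlite3

def get_user_unsafe(conn: sqlite3.Connection, username: str):
    # True positive: attacker-controlled input is interpolated
    # directly into the SQL statement.
    query = "SELECT * FROM users WHERE name = '%s'" % username
    return conn.execute(query).fetchall()

ALLOWED_COLUMNS = {"name", "email", "created_at"}

def get_users_sorted(conn: sqlite3.Connection, sort_column: str):
    # Likely false positive for a pattern-based rule: the query is
    # still string-built, but the interpolated value is checked
    # against a fixed allow-list first, so no attacker-controlled
    # SQL can ever reach the database engine.
    if sort_column not in ALLOWED_COLUMNS:
        raise ValueError("unsupported sort column")
    query = "SELECT * FROM users ORDER BY %s" % sort_column
    return conn.execute(query).fetchall()
```

Telling these two cases apart requires reasoning about the allow-list check, which is exactly the kind of context a lexical rule pattern does not see.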
The innovative LLM-SAST framework is specifically designed to bridge this critical gap. LLMs, having been extensively pre-trained on vast datasets of code, possess sophisticated pattern recognition capabilities for understanding code behavior and discerning dependencies that deterministic rules simply cannot. This advanced understanding empowers LLMs to reason about the code’s behavior within the context of surrounding snippets, related files, and the entire codebase. By doing so, they can identify subtle vulnerabilities that evade traditional, rule-based detection.
This contextual intelligence is what allows the hybrid system to move beyond mere pattern matching. It enables a deeper, more nuanced analysis of potential security risks, significantly improving the accuracy of vulnerability identification. The combination effectively marries the breadth of SAST’s rule-based scanning with the depth of an LLM’s contextual comprehension, leading to a far more effective and less noisy security assessment process. This synergistic approach promises to deliver a new standard in static code security, reducing developer fatigue and enhancing overall security posture.
A Two-Stage Pipeline for Intelligent Triage
The developed framework employs a sophisticated two-stage pipeline, strategically leveraging a SAST core to pinpoint potential risks and subsequently feeding this information into an LLM-powered layer for intelligent analysis and validation. This systematic approach ensures both comprehensive initial scanning and precise, context-aware verification, drastically improving the signal-to-noise ratio in security findings.
In Stage 1 (initial SAST findings), the SAST engine, Semgrep in this implementation, executes its scans. During this phase, it identifies all potential security risks by inspecting the codebase for predefined patterns. For each flagged issue, the engine extracts intermediate representations, such as the complete data flow path from the vulnerability’s source to its potential sink. This detailed data forms the foundation for subsequent analysis, providing the raw material for the LLM to process.
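A minimal sketch of this stage might look like the following, assuming Semgrep’s JSON output mode and its --dataflow-traces option; the field names follow Semgrep’s published result schema but should be verified against the version in use, and none of this is the authors’ actual harness.

```python
import json
import subprocess

def run_sast(target_dir: str) -> list[dict]:
    """Run Semgrep and collect findings with their taint traces."""
    proc = subprocess.run(
        ["semgrep", "scan", "--config", "auto",
         "--json", "--dataflow-traces", target_dir],
        capture_output=True, text=True, check=False,
    )
    report = json.loads(proc.stdout)
    findings = []
    for result in report.get("results", []):
        findings.append({
            "rule_id": result["check_id"],
            "path": result["path"],
            "snippet": result["extra"].get("lines", ""),
            # Source-to-sink trace, present for taint-mode rules.
            "dataflow": result["extra"].get("dataflow_trace"),
        })
    return findings
```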
The second and most crucial stage, Stage 2 (LLM-powered intelligent triage), is where the framework significantly reduces false positives. The system embeds the relevant code snippet, the identified data flow path, and surrounding contextual information into a structured JSON prompt. This prompt is then fed to a fine-tuned LLM. For this framework, Llama 3 8B was fine-tuned on a high-quality dataset comprising vetted false positives and confirmed true vulnerabilities. This dataset specifically covered major flaw categories, including those outlined in the OWASP Top 10, forming the intelligent core of the triage layer.
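The description above leaves the exact prompt schema open; the sketch below shows one plausible shape for it, with illustrative field names rather than the authors’ actual format.

```python
import json

TRIAGE_QUESTION = (
    "Does the user input in this data flow lead to an exploitable "
    "{vuln_class}? Answer EXPLOITABLE or FALSE_POSITIVE, then give a "
    "short justification."
)

def build_triage_prompt(finding: dict, surrounding_context: str) -> str:
    """Pack a Stage 1 finding into a structured JSON prompt."""
    payload = {
        "rule_id": finding["rule_id"],
        "file": finding["path"],
        "flagged_snippet": finding["snippet"],
        "dataflow_path": finding["dataflow"],    # source-to-sink trace
        "context": surrounding_context,          # callers, related files
        "question": TRIAGE_QUESTION.format(vuln_class="SQL injection"),
    }
    return json.dumps(payload, indent=2)
```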
Based on the specific security issue flagged, the prompt directs the LLM to answer clear, focused questions, such as, “Does this user input lead to an exploitable SQL injection?” By analyzing the extensive context that traditional SAST rules often miss, the LLM can reliably determine whether a finding is genuinely exploitable or merely a false positive. This intelligent triage mechanism is the pivotal element that converts a massive volume of raw security alerts into a manageable and highly actionable set of verified findings. This dual-stage process ensures that developers receive only critical, exploitable vulnerabilities, streamlining their workflow and enhancing security effectiveness.
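Continuing the sketch, the triage loop itself reduces to asking that focused question once per finding and keeping only the confirmed ones; llm_complete stands in for whatever serving stack hosts the fine-tuned Llama 3 8B.

```python
from typing import Callable

def triage(
    findings: list[dict],
    context_for: Callable[[dict], str],   # gathers surrounding context
    llm_complete: Callable[[str], str],   # calls the fine-tuned model
) -> list[dict]:
    """Keep only findings the model judges genuinely exploitable."""
    confirmed = []
    for finding in findings:
        # build_triage_prompt comes from the previous sketch.
        prompt = build_triage_prompt(finding, context_for(finding))
        verdict = llm_complete(prompt).strip().upper()
        if verdict.startswith("EXPLOITABLE"):
            confirmed.append(finding)
    return confirmed
```

In this shape, the SAST core stays the source of truth for what gets scanned, and the model only ever answers a narrow exploitability question about evidence it is handed, which limits the room for hallucinated findings.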
Metrics: From Noise to Actionable Intelligence
The empirical results derived from this hybrid approach conclusively validate its effectiveness, demonstrating a profound shift from a noisy alert system to one that provides precise, actionable intelligence. The test dataset for this validation included 25 diverse open-source projects, chosen for their active development status and language variety, encompassing Python, Java, and JavaScript. This dataset contained 170 confirmed vulnerabilities, serving as the ground truth, sourced from public exploit databases and verified manually by security experts.
One of the most significant improvements observed was in Precision. In the framework’s implementation, the precision rate soared to an impressive 89.5%. This represents a monumental leap not only when compared to Semgrep’s baseline precision of 35.7% but also when benchmarked against a purely LLM-based approach using GPT-4, which achieved 65.5%. Such a high precision rate dramatically increases confidence in the reported vulnerabilities.
The framework also achieved a remarkable False Positive Reduction. The standalone Semgrep tool generated a total of 225 false positives during the tests. The hybrid framework successfully filtered this down to a mere 20 false positives. This outcome signifies an approximately eleven-fold improvement in the signal-to-noise ratio, making security findings far more manageable and relevant for development teams.
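These counts also square with the headline figure: (225 − 20) / 225 ≈ 91% of Semgrep’s false positives were eliminated, and 225 / 20 ≈ 11 is where the eleven-fold improvement comes from.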
This substantial reduction in noise directly translated into enhanced developer efficiency, specifically in the Time to Triage. The average time required for security analysts to triage findings was reduced by an astonishing 91%. This efficiency gain frees up valuable developer resources, allowing them to focus on genuine security threats rather than sifting through irrelevant alerts. Moreover, the contextual reasoning capabilities of the LLM layer enabled the discovery of complex vulnerability types that traditional scanners typically miss, such as multi-file dataflow bugs, further enhancing the overall security coverage.
Beyond Detection: Validation and Remediation
The role of the LLM within this advanced framework extends far beyond mere filtering of alerts; it fundamentally transforms the final output into actionable intelligence, empowering developers with tangible steps to address identified vulnerabilities. This comprehensive approach shifts the paradigm from simple detection to holistic security management, encompassing both validation and remediation.
For vulnerabilities that are unequivocally confirmed as exploitable, the framework takes a critical next step: automated exploit generation. It automatically produces a proof-of-concept (PoC) exploit for these validated findings. This capability is indispensable for verifying the existence and exploitability of a vulnerability, providing concrete, undeniable evidence to developers. In the evaluation, the framework successfully generated valid PoCs for approximately 70% of the confirmed exploitable findings. This significantly diminishes the manual verification burden traditionally placed on security analysts, allowing them to allocate their expertise to more complex challenges.
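The article does not detail the generation harness, so the step can only be sketched in shape: one more prompt against the same model, with the resulting PoC executed in a disposable test environment before the finding is surfaced. Everything here, including the prompt text, is an assumption rather than sourced.

```python
import json

POC_PROMPT = """For the confirmed vulnerability below, produce a minimal
proof-of-concept input (for example, an HTTP request or parameter value)
that demonstrates the issue against a disposable test deployment.

{finding_json}
"""

def generate_poc(finding: dict, llm_complete) -> str:
    """Ask the model for a PoC; it must then be validated in a sandbox."""
    prompt = POC_PROMPT.format(finding_json=json.dumps(finding, indent=2))
    return llm_complete(prompt)
```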
Furthermore, leveraging their profound understanding of code and text generation capabilities, LLMs contribute to dynamic remediation suggestions. The framework produces comprehensive, human-readable descriptions of bugs alongside concrete, actionable repair suggestions. These detailed insights are streamed directly into the developer workflow. This integration accelerates the time to fix vulnerabilities, drastically minimizing the window of exposure and enhancing the overall security posture of the application. By offering immediate, understandable guidance, the system empowers developers to address issues proactively and efficiently.
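The remediation step admits the same kind of sketch, again with an assumed prompt rather than the authors’ template; the returned text can then be streamed into the workflow, for instance as a pull-request comment or an IDE diagnostic.

```python
import json

REMEDIATION_PROMPT = """You are a security reviewer. For the confirmed
vulnerability below, write (1) a short, human-readable description of
the bug and (2) a concrete, minimal patch suggestion in the original
language.

{finding_json}
"""

def suggest_fix(finding: dict, llm_complete) -> str:
    """Produce a bug description plus repair suggestion for a finding."""
    prompt = REMEDIATION_PROMPT.format(
        finding_json=json.dumps(finding, indent=2)
    )
    return llm_complete(prompt)
```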
The synergy between SAST and LLMs represents a crucial evolution in static code security. By integrating deterministic analysis with intelligent, context-aware reasoning, this hybrid approach effectively overcomes the persistent challenge of false positives. It equips developers with a powerful tool that delivers high-signal security feedback at the rapid pace demanded by modern development cycles. This paradigm shift makes security an intrinsic and efficient part of the development process rather than a burdensome, reactive afterthought.