AST-pattern rules. Whole-program taint analysis. Formal substrate for AI application security.
The open source taint analysis engine for the AI era
Formal taint analysis for application security — finds what AST-pattern matchers miss, lets LLM agents enact vulnerabilities as rules, scales where neither can alone.
AST-pattern rules. Whole-program taint analysis. Formal substrate for AI application security.
Why OpenTaint?
AI generates production code faster than today's security tooling can keep up with.
LLM security agents find vulnerabilities humans miss, burn tokens on every file, and still can't guarantee they catch everything.
The more AI writes code, the more you need formal methods underneath.
- Find what AST-pattern matchers miss. The inter-procedural dataflow engine tracks untrusted data across function boundaries, persistence layers, aliases, and async code. 100+ rules across 20+ vulnerability classes.
- One finding becomes total coverage. AST-pattern rules let you enact every uncovered vulnerability as a rule with the engine applying it across the entire codebase, deterministically, in minutes of CPU.
- Open source, batteries included. Engine, rules, CI integrations — the entire stack ships under Apache 2.0 and MIT. No paid tier to unlock taint tracking, no gates on writing your own rules.
Supported technology
The most thorough taint analysis engine for Spring apps
Roadmap
FAQ
- OpenTaint is an open source taint analysis engine for the AI era. It runs inter-procedural dataflow analysis to track untrusted data across function boundaries, persistence layers, aliases, and async code. For Java and Kotlin, the analysis works on bytecode. Rules are written in a readable AST-pattern format expressive enough to describe both vulnerable and safe patterns, letting the engine analyze deeply and precisely. It catches what AST-pattern matchers miss, turns LLM agent findings into reusable rules, and scales beyond what either can do alone.
- It detects over 20 classes of vulnerability, including SQL injection, XSS, SSRF, SpEL injection, open redirects, path traversal, and command injection. For each finding, the report walks the full path — from the HTTP source, through method calls, async boundaries, and JPA persistence, down to the dangerous call — and ties it back to the Spring endpoint where the data entered.
- AST-pattern rules describe the shape of vulnerable code — the same rule format Semgrep and ast-grep use, readable by humans and AI agents alike. Whole-program taint analysis is what reads them: the engine models data flow across the entire program — through function boundaries, fields, async code, and persistence layers — and follows each rule's metavariables as values moving through that flow. AST-pattern matchers stop at the syntactic match; OpenTaint keeps tracing the data through them. When a rule fires on safe code, you refine it directly.
- You can, but LLM agents don't come with formal guarantees. Run the same prompt twice and the results may differ — there's no determinism and no way to argue about coverage. And on large codebases the bill adds up quickly: an agent burns through tokens on every file, the cost scales with codebase size, and it still can't promise it looked everywhere. OpenTaint scans the same codebase in minutes of CPU, deterministically, every run. Since AI agents can read and write its AST-pattern rules, you don't have to choose: the agent discovers, the engine applies.
- Java and Kotlin today. The engine works on bytecode, which gives it precise resolution of inheritance, generics, and library calls — including the standard library and any third-party JARs in the build classpath. Spring Boot is supported deeply, including Spring MVC, Spring Data, and the surrounding libraries. Python and Go are next on the roadmap.
- It does inter-procedural data-flow analysis, following tainted data across method boundaries and through async constructs — Reactor, Spring WebFlux, and Kotlin coroutines are all modeled via data-flow approximations. Out of the box, it also models JPA persistence layers, so it catches stored injections where untrusted input arrives at one endpoint, gets saved to the database, and reappears in a completely different request later. Most engines treat the persistence layer as an opaque boundary; OpenTaint models it as part of the flow, linking writes in one request to reads in another.
- Semgrep's open-source engine does intra-procedural taint analysis — it tracks data within a single function. Inter-procedural analysis lives in the Pro engine, which is closed source and paid. OpenTaint ships full inter-procedural dataflow analysis — cross-endpoint flows, persistence layers, stored injections — under Apache 2.0, and it's free for any codebase, including commercial closed-source projects. Rules are written in an AST-pattern format that the engine translates into full taint configurations, and existing Semgrep rule syntax is supported so you can migrate gradually.
- CodeQL does inter-procedural taint analysis too, but it's proprietary — free for open source projects, and gated behind a paid GitHub Advanced Security license for closed-source code. Its rules are written in QL, a domain-specific query language with its own semantics to learn. OpenTaint is fully open source with no paywall on private code, and its rules are written in an AST-pattern format that any developer or AI agent can read, write, and refine. Full inter-procedural taint analysis comes out of the box.
- Yes. The core engine is Apache 2.0, and the CLI, CI integrations, and rules are MIT. Free to use on any codebase, including commercial closed-source projects.
- OpenTaint supports Semgrep's rule format, with some restrictions and a few extensions (e.g. a taint-style join mode). The engine interprets metavariables as data values — not just syntactic placeholders — and propagates them through inter-procedural dataflow. Because of that semantic difference, the same rule can produce different findings in OpenTaint than in Semgrep.
- Ask us on Discord