Taint Analysis for Spring: Data Flow Beyond the Call Graph
AST-pattern matchers break where Spring's architecture begins — dependency injection, JPA persistence, framework configuration. OpenTaint traces tainted data through every layer, from injected services to database storage to dangerous API calls, distinguishing raw columns from sanitized ones.
Spring Boot’s annotation-driven architecture creates data flows that are invisible to AST-pattern matchers. The mechanism is everywhere: an @Autowired injection crosses class boundaries with no call site the parser can see; JPA persistence links two HTTP endpoints through a database row with no shared code path; a template engine’s configuration decides — at runtime, from a flag set in some other file — whether the call to template.process() is exploitable or harmless. Three different invisibilities, three different framework features, the same blind spot.
These are not edge cases. They are the default architecture of most Java web applications. This post walks through three progressively harder challenges (following data through dependency injection, connecting endpoints through persistence, and distinguishing dangerous fields from safe ones at per-column granularity) and shows what each demands of the engine. AST-pattern matchers plateau at the first; OpenTaint models all three.
Single-Request Flows
Before tackling cross-endpoint flows, we start with what happens within a single HTTP request: following data through DI boundaries, and recognizing when a sink is actually dangerous based on the receiver state named in the rule’s condition.
For JVM languages, OpenTaint operates on bytecode rather than source text. This requires a successful build before scanning, but gives precise resolution of inheritance, generics, and library calls. That precision matters in Spring — runtime behavior depends on bean wiring, annotation metadata, and framework conventions that AST-only tools treat as opaque.
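As a rough illustration of what compiled class metadata buys, the sketch below performs a minimal class-hierarchy analysis (CHA) using Java reflection: for a call through a declared interface type, it lists the method body each concrete subtype would actually execute. It is not OpenTaint’s resolver, and the types (`RendererPort`, `PlainRenderer`, `EscapingRenderer`, `AuditedRenderer`) are hypothetical. The interesting case is `AuditedRenderer`, whose inherited body a text matcher would only find by chasing `extends` clauses across files.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical types standing in for an injected service hierarchy.
interface RendererPort { String render(String input); }

class PlainRenderer implements RendererPort {
    public String render(String input) { return input; } // no escaping
}

class EscapingRenderer extends PlainRenderer {
    @Override
    public String render(String input) { return input.replace("<", "&lt;"); }
}

class AuditedRenderer extends PlainRenderer { } // inherits PlainRenderer.render

final class ChaSketch {
    /**
     * Class-hierarchy analysis: for a call through `declared`, collect the body each
     * concrete subtype in `loaded` would run, formatted as "DeclaringClass.render".
     */
    static Set<String> renderTargets(Class<?> declared, List<Class<?>> loaded) {
        Set<String> targets = new TreeSet<>();
        for (Class<?> c : loaded) {
            boolean concrete = !c.isInterface() && !Modifier.isAbstract(c.getModifiers());
            if (!concrete || !declared.isAssignableFrom(c)) continue;
            try {
                // getMethod searches superclasses, so an inherited body resolves to its declarer.
                Method m = c.getMethod("render", String.class);
                targets.add(m.getDeclaringClass().getSimpleName() + ".render");
            } catch (NoSuchMethodException e) {
                throw new IllegalStateException(c + " has no render(String)", e);
            }
        }
        return targets;
    }
}
```

Called with `RendererPort.class` and the four types above, it should return `[EscapingRenderer.render, PlainRenderer.render]`: the interface itself is skipped, and `AuditedRenderer` resolves to the body it inherits from `PlainRenderer`.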
Following Data Across File and Class Boundaries
Consider a campaign management endpoint that lets users preview custom templates. The controller receives a JSON request body and delegates to an @Autowired service:
@RestController
@RequestMapping("/api/campaigns")
public class CampaignController {
    private final TemplateRenderingService templateService;
    ...
    @PostMapping("/render")
    public ResponseEntity<String> renderTemplate(@RequestBody RenderRequest request) {
        String result = templateService.renderFromRequest(request);
        return ResponseEntity.ok(result);
    }
}
The service extracts user-controlled content from the DTO and passes it to a Thymeleaf template engine:
@Service
public class TemplateRenderingService {
    private final TemplateEngine templateEngine;
    ...
    public String renderFromRequest(RenderRequest request) {
        String content = request.getTemplateContent();
        return renderFromContent(content);
    }
    public String renderFromContent(String templateContent) {
        Context context = new Context();
        return templateEngine.process(templateContent, context); // user input processed as template code
    }
}
OpenTaint traces the complete path: @RequestBody RenderRequest → renderFromRequest() → request.getTemplateContent() → renderFromContent() → templateEngine.process(). The data crosses a class boundary, passes through DTO field access, and flows through an @Autowired service, all tracked as a single inter-procedural data flow.
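The chain above can be pictured as reachability over data-flow edges. The sketch below is a toy, not OpenTaint’s engine: node names are taken from the example, the edges are hand-written, and the single edge marked as DI wiring stands in for what a bean-aware analyzer would derive from the container’s configuration. A breadth-first search then recovers the source-to-sink path.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Toy data-flow graph: an edge from A to B means taint at A flows to B. */
final class FlowGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void flow(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Breadth-first search; returns the source-to-sink path, or an empty list if none exists. */
    List<String> trace(String source, String sink) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null); // the source has no predecessor
        queue.add(source);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(sink)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = node; n != null; n = parent.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : edges.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    /** The campaign example; withDiEdge toggles the edge that only bean wiring supplies. */
    static FlowGraph campaignExample(boolean withDiEdge) {
        FlowGraph g = new FlowGraph();
        g.flow("@RequestBody RenderRequest", "CampaignController.renderTemplate");
        if (withDiEdge) {
            // DI edge: the container injects the TemplateRenderingService bean into templateService,
            // so the call inside renderTemplate() runs TemplateRenderingService.renderFromRequest().
            g.flow("CampaignController.renderTemplate", "TemplateRenderingService.renderFromRequest");
        }
        g.flow("TemplateRenderingService.renderFromRequest", "RenderRequest.getTemplateContent");
        g.flow("RenderRequest.getTemplateContent", "TemplateRenderingService.renderFromContent");
        g.flow("TemplateRenderingService.renderFromContent", "TemplateEngine.process");
        return g;
    }
}
```

With the DI edge, tracing from `@RequestBody RenderRequest` to `TemplateEngine.process` yields the six-node path; built without it, the trace comes back empty, which is the gap an analyzer that cannot model bean wiring falls into.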
Resolving the chain across DI boundaries is necessary but not sufficient. The next question is whether the call at the end of the chain is genuinely dangerous — and that depends on receiver state the chain itself doesn’t carry, expressed as a condition the rule must encode.
Conditional Sinks
Not every call to a template engine is equally dangerous. The rule for FreeMarker’s template.process() carries a condition: the call is a sink only when the receiver permits class loading. An analyzer with no way to express that condition either flags every call (noise) or none (missed RCE).
The same controller exposes two parallel endpoints, each routing user-controlled template content to a different FreeMarker service:
@PostMapping("/marketing/preview")
public ResponseEntity<String> previewMarketing(@RequestBody RenderRequest request) {
    String result = marketingService.render(
        request.getTemplateName(),
        request.getTemplateContent() // template body: the dangerous parameter
    );
    return ResponseEntity.ok(result);
}
A parallel endpoint passes the same input to the notification service. Both services call template.process() with user input. The difference is in their constructors:
// MarketingTemplateService.java: vulnerable configuration
this.templateConfig.setNewBuiltinClassResolver(TemplateClassResolver.UNRESTRICTED_RESOLVER);
// NotificationTemplateService.java: secure configuration
this.templateConfig.setNewBuiltinClassResolver(TemplateClassResolver.ALLOWS_NOTHING_RESOLVER);
OpenTaint resolves @Autowired bean constructors and tracks the receiver state that the rule’s condition refers to. It flags the marketing service, because UNRESTRICTED_RESOLVER lets a template’s ?new built-in resolve and instantiate classes by name, enabling remote code execution. It suppresses the notification service, where ALLOWS_NOTHING_RESOLVER prevents class instantiation.
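A conditional sink can be pictured as a rule that consults receiver state before reporting. The sketch below is a toy under stated assumptions: the enum only mirrors the two FreeMarker constants named above (it is not FreeMarker’s `TemplateClassResolver`), the per-bean state is assumed to have been recovered from each constructor already, and treating an unknown configuration as dangerous is this sketch’s own conservative choice, not documented OpenTaint behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy conditional-sink rule: a tainted template body is reported only if the receiver allows class loading. */
final class ConditionalSinkRule {
    /** Simplified stand-ins for the two resolver constants used in the example. */
    enum ClassResolver { UNRESTRICTED_RESOLVER, ALLOWS_NOTHING_RESOLVER }

    // Receiver state per bean, as if recorded from setNewBuiltinClassResolver(...) in its constructor.
    private final Map<String, ClassResolver> resolverByBean = new HashMap<>();

    void recordConstructorConfig(String bean, ClassResolver resolver) {
        resolverByBean.put(bean, resolver);
    }

    /** The rule's condition: does this receiver permit class loading? */
    private boolean permitsClassLoading(String bean) {
        ClassResolver r = resolverByBean.get(bean);
        // Assumption for this sketch: an unknown configuration is treated as dangerous.
        return r == null || r == ClassResolver.UNRESTRICTED_RESOLVER;
    }

    /** Report template.process() only when the argument is tainted AND the condition holds. */
    boolean report(String bean, boolean templateBodyTainted) {
        return templateBodyTainted && permitsClassLoading(bean);
    }
}
```

Recording `UNRESTRICTED_RESOLVER` for `MarketingTemplateService` and `ALLOWS_NOTHING_RESOLVER` for `NotificationTemplateService` makes `report` return `true` for the first and `false` for the second, even though both receive the same tainted template body.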
Cross-Endpoint Flows
Single-request flows, however complex, have a property that makes them tractable: a call graph connects the source to the sink. Cross-endpoint vulnerabilities don’t have this property. An attacker submits a payload through one endpoint; a different endpoint reads it and renders it. No code path connects the two — the database or service state is the only link.
Detecting these stored vulnerabilities requires modeling data flow across persistence boundaries, not just within them.
Through the Database
Imagine a per-thread message board — a small collaboration feature where users post short notes that other users read on the thread page. A POST endpoint creates each note and stores it in the database; the thread page renders the stored notes as HTML so links and formatting come through. Two endpoints, no shared code path. The controller and service below implement this.
Stored XSS attack flow: a malicious payload is persisted via POST and later served to a victim via GET.
@PostMapping
public ResponseEntity<Long> createMessage(@RequestBody CreateMessageRequest request) {
    Message message = messageService.createMessage(
        request.getTitle(),
        request.getContent(),
        request.getAuthor()
    );
    return ResponseEntity.ok(message.getId());
}
The service creates a JPA entity and persists it:
public Message createMessage(String title, String content, String author) {
    this.lastContent = content;
    Message message = new Message(title, content, author);
    return messageRepository.save(message);
}
A separate GET endpoint retrieves that content and returns it as HTML:
@GetMapping("/{id}/content")
public ResponseEntity<String> getMessageContent(@PathVariable Long id) {
    ...
    String content = messageService.getMessageContent(message);
    return ResponseEntity.ok()
        .contentType(MediaType.TEXT_HTML)
        .body(content);
}
The two endpoints share no direct method call. OpenTaint traces the full flow across both by modeling JPA repository operations as database read/write boundaries. When an entity is persisted via repository.save(), the taint state of each field is recorded against that entity type. When a different endpoint retrieves an entity via repository.findById(), OpenTaint looks up the stored state and propagates it, column by column, to the retrieved entity’s fields. No actual database connection is needed: this is a static approximation of persistence-layer data flow.
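The bookkeeping described above can be sketched as a map keyed by entity type and column. This is an illustration, not OpenTaint’s implementation: the class and method names are hypothetical, and joining taint across writes (a clean save never un-taints a column) is an assumed, conservative choice, since a static analyzer cannot know which write a later read will observe.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy persistence-boundary model: save() records per-field taint against the entity type;
 * findById() hands that state back to whichever endpoint reads an entity of that type.
 */
final class PersistenceTaintModel {
    // entity type -> column -> may hold tainted data
    private final Map<String, Map<String, Boolean>> byEntityType = new HashMap<>();

    /** Model of repository.save(): join new field taint into what earlier writes recorded. */
    void onSave(String entityType, Map<String, Boolean> fieldTaint) {
        Map<String, Boolean> columns = byEntityType.computeIfAbsent(entityType, k -> new HashMap<>());
        fieldTaint.forEach((field, tainted) -> columns.merge(field, tainted, Boolean::logicalOr));
    }

    /** Model of repository.findById(): any entity of this type may carry what any save() stored. */
    Map<String, Boolean> onFind(String entityType) {
        return Map.copyOf(byEntityType.getOrDefault(entityType, Map.of()));
    }
}
```

Saving a `Message` whose `title` and `content` are tainted and whose `author` is not, then reading it back through `onFind("Message")`, yields the same per-column state; a later clean save leaves `title` and `content` tainted under this join-only design.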
Through Service State
Databases are not the only state that survives between requests. Spring’s @Service beans are singletons by default — any field written during one request is readable during the next. Notice the this.lastContent = content line in createMessage — the same method that persists to the database also stores raw content in a service field:
@Service
public class MessageService {
    private String lastContent;
    ...
    public Message createMessage(String title, String content, String author) {
        this.lastContent = content;
        ...
    }
    public String getLastContent() {
        return lastContent;
    }
}
A separate endpoint returns that field as HTML:
@GetMapping("/last-content")
public ResponseEntity<String> getLastContent() {
    String content = messageService.getLastContent();
    ...
    return ResponseEntity.ok()
        .contentType(MediaType.TEXT_HTML)
        .body(content);
}
OpenTaint traces the data from createMessage’s content parameter through the lastContent field assignment and back out via getLastContent(): a cross-endpoint stored XSS that doesn’t touch the database at all.
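Because the service is a singleton, its fields behave like global state shared by every handler, and the analysis cannot assume which request arrives first. One common way to handle that, sketched here as a toy (not OpenTaint’s algorithm; the handler names mirror the example and the string key for the field is a hypothetical encoding), is to re-run every handler against the shared field state until nothing changes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model of singleton bean fields as state shared by all request handlers.
 * Requests may arrive in any order, so handlers are re-run until field taint stops changing.
 */
final class SingletonStateAnalysis {
    /** A handler updates shared field taint and returns true if tainted data reaches its HTML sink. */
    interface Handler {
        boolean run(Map<String, Boolean> fieldTaint);
    }

    static Set<String> findings(Map<String, Handler> handlers) {
        Map<String, Boolean> fieldTaint = new HashMap<>();
        Set<String> reported = new TreeSet<>();
        boolean changed = true;
        while (changed) { // terminates: a field's taint only ever goes from false to true
            Map<String, Boolean> before = new HashMap<>(fieldTaint);
            handlers.forEach((name, handler) -> {
                if (handler.run(fieldTaint)) reported.add(name);
            });
            changed = !fieldTaint.equals(before);
        }
        return reported;
    }

    /** The message-board example, deliberately listing the reader before the writer. */
    static Map<String, Handler> messageBoard() {
        Map<String, Handler> handlers = new LinkedHashMap<>();
        // getLastContent(): returns MessageService.lastContent as text/html.
        handlers.put("getLastContent", f -> f.getOrDefault("MessageService.lastContent", false));
        // createMessage(): stores the request's (tainted) content in MessageService.lastContent.
        handlers.put("createMessage", f -> {
            f.merge("MessageService.lastContent", true, Boolean::logicalOr);
            return false;
        });
        return handlers;
    }
}
```

With the reader listed first, the first pass reports nothing; the writer then taints `lastContent`, and the second pass reports `getLastContent`, so the result does not depend on handler order.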
Column-Level Precision
Detecting cross-endpoint flows is only half the problem. The other half is precision: knowing which fields are actually dangerous. Treating every column of a persisted entity as equally tainted produces false positives that drown out real findings.
The Message entity stores three user-controlled fields, but they aren’t all equal:
Column-level tracking: sanitized fields (author) are distinguished from raw fields (title, content).
public Message(String title, String content, String author) {
    this.title = title;
    this.content = content;
    this.author = HtmlUtils.htmlEscape(author); // sanitized before storage
}
The author field is HTML-escaped before it reaches the database. The title and content fields are stored raw. OpenTaint tracks each column independently:
GET /api/messages/{id}/content → returns raw content → XSS detected
GET /api/messages/{id}/title → returns raw title → XSS detected
GET /api/messages/{id}/author → returns escaped author → no finding
Without column-level tracking, the choice is between flagging all three endpoints (false positives on author) or suppressing the entire entity (missing real vulnerabilities on content and title). Per-column sensitivity avoids the trade-off.
The same logic applies to sanitizers at read time. The GET /api/messages/{id}/content/safe endpoint passes content through HtmlUtils.htmlEscape() before returning it — OpenTaint sees the sanitizer and suppresses the finding for that path as well.
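The per-column outcomes above, including the read-time sanitizer, reduce to one rule per hop: taint survives unless a sanitizer sits on that hop. A minimal sketch, assuming `HtmlUtils.htmlEscape` is modeled as a sanitizer for HTML output; the class and method names are hypothetical and this is not OpenTaint’s code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy per-column taint with write-time and read-time sanitizers. */
final class ColumnTaint {
    /** One data-flow hop: taint survives unless a sanitizer sits on the hop. */
    static boolean hop(boolean inputTainted, boolean sanitized) {
        return inputTainted && !sanitized;
    }

    /** Columns written by the Message constructor; author passes through HtmlUtils.htmlEscape. */
    static Map<String, Boolean> persistMessage(boolean title, boolean content, boolean author) {
        Map<String, Boolean> columns = new LinkedHashMap<>();
        columns.put("title", hop(title, false));
        columns.put("content", hop(content, false));
        columns.put("author", hop(author, true)); // escaped before storage
        return columns;
    }

    /** An HTML-returning GET endpoint is a finding iff its column is still tainted after any read-time escaping. */
    static boolean isFinding(Map<String, Boolean> columns, String column, boolean escapedAtRead) {
        return hop(columns.getOrDefault(column, false), escapedAtRead);
    }
}
```

With all three request fields tainted, `content` and `title` are findings, `author` is not, and the `content/safe` variant (escaped at read time) is not either, matching the outcomes listed above.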
Conclusion
The call graph is the wrong primitive for framework-driven Java. Annotations replace explicit calls; persistence connects endpoints with no shared code; configuration decides whether a sink is a sink. An analyzer built on the call graph plus pattern matching cannot see these flows — not because they are rare, but because the abstraction is wrong. OpenTaint commits to a richer abstraction: bean wiring, persistence boundaries, conditional sinks, per-column taint. The cost is whole-program analysis that needs a build. The payoff is the findings the call graph alone cannot reach.
Clone the purpose-built Spring Boot demo and reproduce every finding in this post.
For a side-by-side comparison of how Semgrep, CodeQL, and OpenTaint handle progressively harder XSS cases — from direct returns to builder patterns with virtual dispatch — see Semgrep vs. CodeQL vs. OpenTaint: XSS Detection Depth Compared.