Taint Analysis for Spring: Security Beyond Syntax
AST-pattern matchers break where Spring's architecture begins — interprocedural flow across class boundaries, conditionally dangerous APIs configured at bean wiring time, JPA persistence. OpenTaint traces tainted data through every layer, from injected services to database storage to dangerous API calls, distinguishing raw columns from sanitized ones.
Spring Boot’s annotation-driven architecture creates data flows that are invisible to AST-pattern matchers. The mechanism is everywhere: dependency injection wires @Autowired beans together at startup with no call site the parser can see; a template engine’s configuration decides — at runtime, from a flag set in another class — whether the call to template.process() is exploitable or harmless; JPA persistence links two HTTP endpoints through a database row with no shared code path. Three different invisibilities, three different framework features, the same blind spot.
These are not edge cases; they are the default architecture of most Java web applications. This post walks through three progressively harder challenges: following data across function and class boundaries, recognizing when an @Autowired constructor makes an otherwise-benign call dangerous, and connecting endpoints through persistence with per-column precision. It shows what each demands of the engine. AST-pattern matchers plateau at the first; OpenTaint models all three.
Single-Request Flows
Before tackling cross-endpoint flows, we start with what happens within a single HTTP request — following data across function and class boundaries and recognizing when an @Autowired constructor decides whether the call at the end of the chain is dangerous.
For JVM languages, OpenTaint operates on bytecode rather than source text. This requires a successful build before scanning, but gives precise resolution of inheritance, generics, and library calls. That precision matters in Spring — runtime behavior depends on bean wiring, annotation metadata, and framework conventions that AST-pattern matchers treat as opaque.
Following Data Across Function and Class Boundaries
Consider a campaign management endpoint that lets users preview custom templates. The controller receives a JSON request body and delegates to an @Autowired service:
    @RestController
    @RequestMapping("/api/campaigns")
    public class CampaignController {

        private final TemplateRenderingService templateService;
        ...

        @PostMapping("/render")
        public ResponseEntity<String> renderTemplate(@RequestBody RenderRequest request) {
            String result = templateService.renderFromRequest(request);
            return ResponseEntity.ok(result);
        }
    }

The service extracts user-controlled content from the DTO and passes it to a Thymeleaf template engine:
    @Service
    public class TemplateRenderingService {

        private final TemplateEngine templateEngine;
        ...

        public String renderFromRequest(RenderRequest request) {
            String content = request.getTemplateContent();
            return renderFromContent(content);
        }

        public String renderFromContent(String templateContent) {
            Context context = new Context();
            return templateEngine.process(templateContent, context); // user input processed as template code
        }
    }

OpenTaint traces the complete path: @RequestBody RenderRequest → renderFromRequest() → request.getTemplateContent() → renderFromContent() → templateEngine.process(). The data crosses a class boundary, passes through DTO field access, and flows through an @Autowired service, all tracked as a single interprocedural data flow.
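To make "a single interprocedural data flow" concrete, here is a toy sketch that treats the chain as a graph of value-flow edges and searches for a path from the request body to the template call. It is only an illustration of the idea, not OpenTaint's engine: the class `FlowGraphSketch`, its methods, and the node labels are hypothetical names invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: nodes are program values, edges mean "value flows to".
class FlowGraphSketch {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addFlow(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first search from the source; returns the node path to the sink,
    // or an empty list when the sink is unreachable.
    List<String> findPath(String source, String sink) {
        Map<String, String> parent = new HashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        seen.add(source);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(sink)) {
                List<String> path = new ArrayList<>();
                for (String n = sink; n != null; n = parent.get(n)) {
                    path.add(0, n); // walk parents back to the source
                }
                return path;
            }
            for (String next : edges.getOrDefault(node, List.of())) {
                if (seen.add(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    // Edges mirror the controller/service chain from the example above.
    static FlowGraphSketch campaignExample() {
        FlowGraphSketch g = new FlowGraphSketch();
        g.addFlow("renderTemplate:request", "renderFromRequest:request");      // call across classes
        g.addFlow("renderFromRequest:request", "renderFromRequest:content");   // DTO getter
        g.addFlow("renderFromRequest:content", "renderFromContent:templateContent");
        g.addFlow("renderFromContent:templateContent", "templateEngine.process:arg0");
        return g;
    }

    public static void main(String[] args) {
        List<String> path = campaignExample()
                .findPath("renderTemplate:request", "templateEngine.process:arg0");
        System.out.println(String.join(" -> ", path));
    }
}
```

A real analyzer has to build those edges from bytecode, including the one that crosses from the controller into the injected service; the search itself is the easy part.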
Tracing the chain across function and class boundaries is necessary but not sufficient. With Thymeleaf, once the trace reaches templateEngine.process() on a user-controlled body, the call is exploitable on its own: the API and the taint source are enough to confirm the finding. Other engines aren’t so obliging. FreeMarker’s template.process(), for instance, is exploitable only when the engine was wired up with a permissive class resolver, and that choice is made inside the engine’s @Autowired constructor.
When Autowired Constructors Matter
Consider two endpoints in the same controller, both passing user-controlled template content into FreeMarker’s template.process(). The call sites are indistinguishable: same method, same argument shape, same surrounding code. Yet one is a remote-code-execution vulnerability and the other is harmless.
The reason is that template.process() is a conditionally dangerous method: it is exploitable only when the receiver permits class loading. The permission flag is set at bean wiring time, in the bean’s constructor — possibly in a class the call site never names. An analyzer that cannot resolve which bean is wired in and walk its constructor either flags every call (noise) or none (missed RCE).
The marketing endpoint:
    @PostMapping("/marketing/preview")
    public ResponseEntity<String> previewMarketing(@RequestBody RenderRequest request) {
        String result = marketingService.render(
            request.getTemplateName(),
            request.getTemplateContent() // template body: the dangerous parameter
        );
        return ResponseEntity.ok(result);
    }

A parallel endpoint passes the same input to the notification service. Both services call template.process() with user input. The difference is in their constructors:
    // MarketingTemplateService.java: vulnerable configuration
    this.templateConfig.setNewBuiltinClassResolver(TemplateClassResolver.UNRESTRICTED_RESOLVER);

    // NotificationTemplateService.java: secure configuration
    this.templateConfig.setNewBuiltinClassResolver(TemplateClassResolver.ALLOWS_NOTHING_RESOLVER);

OpenTaint resolves @Autowired bean constructors and tracks the receiver state that the rule’s condition names. It flags the marketing service, where UNRESTRICTED_RESOLVER allows class loading and so enables remote code execution, and suppresses the finding for the notification service, where ALLOWS_NOTHING_RESOLVER prevents class instantiation.
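One way to picture a conditionally dangerous sink is as a rule with two inputs: whether the argument is tainted, and what state the receiver was left in by its constructor. The sketch below is a simplified model under that assumption. `ConditionalSinkSketch`, `ResolverState`, and the method names are hypothetical and do not reflect OpenTaint's actual rule format.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a conditionally dangerous sink; not OpenTaint's rule format.
class ConditionalSinkSketch {
    // Receiver states an analyzer might record while walking a bean's constructor.
    enum ResolverState { UNRESTRICTED, ALLOWS_NOTHING, UNKNOWN }

    // Bean class name -> state observed in that bean's constructor.
    private final Map<String, ResolverState> stateByBean = new HashMap<>();

    void recordConstructorState(String beanClass, ResolverState state) {
        stateByBean.put(beanClass, state);
    }

    // Report only when the argument is tainted AND the receiver permits class loading.
    // Treating UNKNOWN as "do not report" is an arbitrary choice made for this sketch.
    boolean shouldReport(String beanClass, boolean argumentTainted) {
        ResolverState state = stateByBean.getOrDefault(beanClass, ResolverState.UNKNOWN);
        return argumentTainted && state == ResolverState.UNRESTRICTED;
    }

    // Mirrors the two services from the example above.
    static ConditionalSinkSketch demoServices() {
        ConditionalSinkSketch sketch = new ConditionalSinkSketch();
        sketch.recordConstructorState("MarketingTemplateService", ResolverState.UNRESTRICTED);
        sketch.recordConstructorState("NotificationTemplateService", ResolverState.ALLOWS_NOTHING);
        return sketch;
    }

    public static void main(String[] args) {
        ConditionalSinkSketch sketch = demoServices();
        System.out.println("marketing: " + sketch.shouldReport("MarketingTemplateService", true));
        System.out.println("notification: " + sketch.shouldReport("NotificationTemplateService", true));
    }
}
```

The interesting work in a real tool is filling in that map: resolving which bean is injected at each call site and reading the resolver setting out of its constructor.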
Cross-Endpoint Flows
Single-request flows, however complex, have a property that makes them tractable: a code path connects the user input to the dangerous call. Cross-endpoint vulnerabilities don’t have this property. An attacker submits a payload through one endpoint; a different endpoint reads it and renders it. No code path connects the two — the database or service state is the only link.
Detecting these stored vulnerabilities requires modeling data flow across persistence boundaries, not just within them.
Through the Database
Imagine a per-thread message board — a small collaboration feature where users post short notes that other users read on the thread page. A POST endpoint creates each note and stores it in the database; the thread page renders the stored notes as HTML so links and formatting come through. Two endpoints, no shared code path. The controller and service below implement this.
Stored XSS attack flow: a malicious payload is persisted via POST and later served to a victim via GET.
    @PostMapping
    public ResponseEntity<Long> createMessage(@RequestBody CreateMessageRequest request) {
        Message message = messageService.createMessage(
            request.getTitle(),
            request.getContent(),
            request.getAuthor()
        );
        return ResponseEntity.ok(message.getId());
    }

The service creates a JPA entity and persists it:

    public Message createMessage(String title, String content, String author) {
        this.lastContent = content;
        Message message = new Message(title, content, author);
        return messageRepository.save(message);
    }

A separate GET endpoint retrieves that content and returns it as HTML:
    @GetMapping("/{id}/content")
    public ResponseEntity<String> getMessageContent(@PathVariable Long id) {
        ...
        String content = messageService.getMessageContent(message);
        return ResponseEntity.ok()
            .contentType(MediaType.TEXT_HTML)
            .body(content);
    }

The two endpoints share no direct method call. OpenTaint traces the full flow across both by modeling JPA repository operations as database read/write boundaries. When an entity is persisted via repository.save(), the taint state of each field is recorded against that entity type. When a different endpoint retrieves an entity via repository.findById(), OpenTaint looks up the stored state and propagates it, column by column, to the retrieved entity’s fields. No database connection is needed: this is a static approximation of persistence-layer data flow.
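The save/lookup bookkeeping described above can be sketched as a small table keyed by entity type and field name. This is an illustrative model only, assuming taint is a simple boolean per field. `PersistenceTaintSketch`, `onSave`, and `onLoad` are hypothetical names, not OpenTaint APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a persistence boundary: taint is stored per entity type and field.
class PersistenceTaintSketch {
    // entity type -> (field name -> tainted?)
    private final Map<String, Map<String, Boolean>> stored = new HashMap<>();

    // Called where the analysis sees repository.save(entity).
    void onSave(String entityType, Map<String, Boolean> fieldTaint) {
        Map<String, Boolean> merged = stored.computeIfAbsent(entityType, k -> new HashMap<>());
        // A field stays tainted if any save site ever wrote tainted data to it.
        fieldTaint.forEach((field, tainted) -> merged.merge(field, tainted, Boolean::logicalOr));
    }

    // Called where the analysis sees repository.findById(...):
    // returns the per-field taint to attach to the loaded entity.
    Map<String, Boolean> onLoad(String entityType) {
        return Map.copyOf(stored.getOrDefault(entityType, Map.of()));
    }

    static PersistenceTaintSketch messageExample() {
        PersistenceTaintSketch model = new PersistenceTaintSketch();
        // createMessage(): content comes from the request body; id is generated.
        model.onSave("Message", Map.of("content", true, "id", false));
        return model;
    }

    public static void main(String[] args) {
        // Print the per-field taint a later findById() would see.
        System.out.println(messageExample().onLoad("Message"));
    }
}
```

Merging with logical OR is the conservative choice: the analysis cannot know which row a later findById() returns, so a column counts as tainted if any write to it was.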
Through Service State
Databases are not the only state that survives between requests. Spring’s @Service beans are singletons by default — any field written during one request is readable during the next. Notice the this.lastContent = content line in createMessage — the same method that persists to the database also stores raw content in a service field:
    @Service
    public class MessageService {

        private String lastContent;
        ...

        public Message createMessage(String title, String content, String author) {
            this.lastContent = content;
            ...
        }

        public String getLastContent() {
            return lastContent;
        }
    }

A separate endpoint returns that field as HTML:
    @GetMapping("/last-content")
    public ResponseEntity<String> getLastContent() {
        String content = messageService.getLastContent();
        ...
        return ResponseEntity.ok()
            .contentType(MediaType.TEXT_HTML)
            .body(content);
    }

OpenTaint traces the data from createMessage’s content parameter through the lastContent field assignment and back out via getLastContent(): a cross-endpoint stored XSS that doesn’t touch the database at all.
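To see why a singleton field behaves like storage, here is a framework-free sketch in which one shared instance stands in for the Spring-managed bean and two static methods stand in for the two request handlers. The names `SharedServiceSketch`, `TwoRequestsDemo`, `handlePost`, and `handleGet` are hypothetical, and no Spring classes are involved.

```java
// Stand-in for a default-scoped @Service bean: one instance, one mutable field.
class SharedServiceSketch {
    private String lastContent;

    void createMessage(String content) {
        this.lastContent = content; // raw content kept on the instance
    }

    String getLastContent() {
        return lastContent;
    }
}

class TwoRequestsDemo {
    // One instance for the whole application, as with a singleton-scoped bean.
    static final SharedServiceSketch SERVICE = new SharedServiceSketch();

    // Stand-in for the POST handler (first request).
    static void handlePost(String body) {
        SERVICE.createMessage(body);
    }

    // Stand-in for the GET /last-content handler (a later, unrelated request).
    static String handleGet() {
        return SERVICE.getLastContent();
    }

    public static void main(String[] args) {
        handlePost("<script>alert(1)</script>");
        // The second "request" reads exactly what the first one wrote.
        System.out.println(handleGet());
    }
}
```

No call connects handlePost to handleGet; the only link is the field on the shared instance, which is what the analysis has to model.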
Column-Level Precision
Detecting cross-endpoint flows is only half the problem. The other half is precision: knowing which fields are actually dangerous. Treating every column of a persisted entity as equally tainted produces false positives that drown out real findings.
The Message entity stores three user-controlled fields, but they aren’t all equal:
Column-level tracking: sanitized fields (author) are distinguished from raw fields (title, content).
    public Message(String title, String content, String author) {
        this.title = title;
        this.content = content;
        this.author = HtmlUtils.htmlEscape(author); // sanitized before storage
    }

The author field is HTML-escaped before it reaches the database. The title and content fields are stored raw. OpenTaint tracks each column independently:
GET /api/messages/{id}/content → returns raw content → XSS detected
GET /api/messages/{id}/title → returns raw title → XSS detected
GET /api/messages/{id}/author → returns escaped author → no finding
Without column-level tracking, the choice is between flagging all three endpoints (false positives on author) or suppressing the entire entity (missing real vulnerabilities on content and title). Per-column sensitivity avoids the trade-off.
The same logic applies to sanitizers at read time. The GET /api/messages/{id}/content/safe endpoint passes content through HtmlUtils.htmlEscape() before returning it — OpenTaint sees the sanitizer and suppresses the finding for that path as well.
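For readers who want to see what such a sanitizer does to a payload, here is a simplified stand-in for an HTML escaper. It is not Spring’s HtmlUtils implementation: `HtmlEscapeSketch.escapeHtml` is a hypothetical helper that handles only five characters, which is enough to show why an escaped value can no longer open a tag.

```java
// Simplified, hypothetical escaper for illustration; real code should use a library.
class HtmlEscapeSketch {
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The angle brackets and quotes come out as entities, so the browser renders text.
        System.out.println(escapeHtml("<script>alert('x')</script>"));
    }
}
```

From a taint-analysis point of view, a call like this marks the point where a value stops being dangerous for the HTML sink, whether it happens before storage (as with author) or on the way out (as with the /content/safe endpoint).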
Conclusion
In framework-driven Java, the data flow that matters spans the whole program — long call chains across class boundaries, @Autowired constructor configuration that decides whether a call is dangerous, JPA persistence joining endpoints with no shared code. Spring assembles these connections at startup; reading the source one file at a time can’t follow them. No amount of pattern depth fixes that — the abstraction itself is wrong. OpenTaint commits to a richer abstraction: bean wiring, persistence boundaries, conditionally dangerous APIs, per-column taint. The cost is a successful build before scanning, and whole-program analysis instead of file-by-file. The payoff is the findings that syntactic analysis alone cannot reach.
Clone the purpose-built Spring Boot demo and reproduce every finding in this post.
For a side-by-side comparison of how Semgrep, CodeQL, and OpenTaint handle progressively harder XSS cases — from direct returns to builder patterns with virtual dispatch — see Semgrep vs. CodeQL vs. OpenTaint: XSS Detection Depth Compared.