Taint Analysis

Multi-Phase Taint Analysis Engine

VEXLIT doesn't rely on regex pattern matching alone. It traces actual data flow from user input (sources) through transformations to dangerous operations (sinks) using a 5-phase analysis pipeline.

What is Taint Analysis?

Taint analysis tracks how untrusted data (user input, environment variables, file reads) flows through your code. If tainted data reaches a dangerous operation without proper sanitization, VEXLIT reports it as exploitable.

Sources

Where untrusted data enters your application.

req.params, req.query, req.body, process.env, System.getenv(), request.GET, os.Args, Console.ReadLine()

Sinks

Dangerous operations where tainted data causes harm.

db.query(), exec(), eval(), res.send(), innerHTML, Runtime.exec(), os.system(), template.Execute()

Sanitizers

Operations that neutralize tainted data.

parameterized queries, escapeHtml(), parseInt(), DOMPurify.sanitize(), validator.escape()
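To make the three roles concrete, here is a minimal sketch of how such a catalogue could be represented and queried. The `RULES` layout and the `classify` helper are illustrative assumptions, not VEXLIT's actual rule format; only the entry names come from the lists above.

```javascript
// Hypothetical rule catalogue. The names are taken from the lists above;
// the structure is invented for illustration.
const RULES = {
  sources: ['req.params', 'req.query', 'req.body', 'process.env'],
  sinks: ['db.query', 'exec', 'eval', 'res.send'],
  sanitizers: ['escapeHtml', 'parseInt', 'DOMPurify.sanitize', 'validator.escape'],
};

// Classify a dotted expression such as "req.params.id" or a callee such as
// "db.query". An expression matches a rule if it equals the rule name or
// extends it with a further ".field" access.
function classify(expr) {
  for (const [role, names] of Object.entries(RULES)) {
    if (names.some((n) => expr === n || expr.startsWith(n + '.'))) {
      return role;
    }
  }
  return 'unknown';
}

console.log(classify('req.params.id')); // sources
console.log(classify('db.query'));      // sinks
console.log(classify('parseInt'));      // sanitizers
console.log(classify('user.name'));     // unknown
```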

5-Phase Analysis Pipeline

Each file goes through five analysis phases, from literal tracking to interprocedural flow.

Phase 1: Constant Propagation

Tracks literal values across assignments. Variables assigned hardcoded strings or numbers are marked safe, eliminating false positives from non-user-controlled data. Multi-pass tracking handles reassignment chains.
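The multi-pass idea can be sketched as a fixed-point loop over a simplified list of assignments. The statement shape (`target`, `value.kind`) and the function name `findConstantVars` are assumptions made for this example, not VEXLIT's internal representation.

```javascript
// A variable counts as constant only if every assignment to it is a literal
// or a copy of another variable already known to be constant.
function findConstantVars(assignments) {
  // Group right-hand sides by assigned variable.
  const byTarget = new Map();
  for (const { target, value } of assignments) {
    if (!byTarget.has(target)) byTarget.set(target, []);
    byTarget.get(target).push(value);
  }

  const constants = new Set();
  let changed = true;
  // Repeat passes until no new constant is discovered, so chains such as
  // a -> b -> c resolve regardless of statement order.
  while (changed) {
    changed = false;
    for (const [target, values] of byTarget) {
      if (constants.has(target)) continue;
      const allConstant = values.every(
        (v) => v.kind === 'literal' || (v.kind === 'var' && constants.has(v.name))
      );
      if (allConstant) {
        constants.add(target);
        changed = true;
      }
    }
  }
  return constants;
}

const program = [
  { target: 'c', value: { kind: 'var', name: 'b' } },
  { target: 'a', value: { kind: 'literal', value: 'admin' } },
  { target: 'b', value: { kind: 'var', name: 'a' } },
  { target: 'id', value: { kind: 'source', name: 'req.params.id' } },
  { target: 'mixed', value: { kind: 'literal', value: 'x' } },
  { target: 'mixed', value: { kind: 'var', name: 'id' } },
];

console.log([...findConstantVars(program)].sort()); // [ 'a', 'b', 'c' ]
```

Here `mixed` is not marked constant because one of its assignments copies from `id`, which comes from user input.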

Phase 2: Taint Map Building

5-pass construction: (1) identify sources, (2) track field access, (3) propagate taint through assignments, (4) apply sanitizers, (5) follow interprocedural flow. The result is a map of which variables are tainted at each point in the file.
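A reduced sketch of this construction is shown here, covering only three of the five passes (sources, propagation, sanitizers). It is flow-insensitive and ignores field access and interprocedural flow. The statement shape and the `buildTaintMap` name are assumptions for illustration.

```javascript
// Sanitizer names taken from the list earlier on this page.
const SANITIZERS = new Set(['parseInt', 'escapeHtml']);

function buildTaintMap(statements) {
  const tainted = new Map(); // variable name -> originating source

  // Seed the map with variables assigned directly from a source.
  for (const s of statements) {
    if (s.source) tainted.set(s.target, s.source);
  }

  // Propagate through assignments until nothing changes.
  // A result produced by a sanitizer call does not inherit taint.
  let changed = true;
  while (changed) {
    changed = false;
    for (const s of statements) {
      if (s.source || tainted.has(s.target)) continue;
      if (s.call && SANITIZERS.has(s.call)) continue;
      const origin = (s.uses || []).map((u) => tainted.get(u)).find(Boolean);
      if (origin) {
        tainted.set(s.target, origin);
        changed = true;
      }
    }
  }
  return tainted;
}

const statements = [
  { target: 'id', source: 'req.params.id' },
  { target: 'name', uses: [] },                      // literal
  { target: 'query', uses: ['id'] },                 // string built from id
  { target: 'safeId', call: 'parseInt', uses: ['id'] },
];

const taintMap = buildTaintMap(statements);
console.log(taintMap.get('query'));  // req.params.id
console.log(taintMap.has('safeId')); // false
```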

Phase 3: Reachability Analysis

Evaluates constant conditions to detect dead branches. If a condition always evaluates to true or always to false, the unreachable branch is excluded from analysis. Supports arithmetic evaluation, ternary operators, and switch/case.
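As a rough illustration, a constant-condition evaluator can be written as a small recursive function over an expression tree. The node shapes (`lit`, `bin`, `var`) and the function names are hypothetical; a real analyzer would work on the parser's own AST.

```javascript
const UNKNOWN = Symbol('unknown');

// Returns the constant value of an expression, or UNKNOWN if it depends on
// anything that is not a literal.
function evalConst(node) {
  switch (node.type) {
    case 'lit':
      return node.value;
    case 'bin': {
      const l = evalConst(node.left);
      const r = evalConst(node.right);
      if (l === UNKNOWN || r === UNKNOWN) return UNKNOWN;
      switch (node.op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        case '>': return l > r;
        case '<': return l < r;
        case '===': return l === r;
        default: return UNKNOWN;
      }
    }
    default:
      return UNKNOWN; // variables, calls, anything else
  }
}

// Decide which branches of an if/else (or ternary) need to be analyzed.
function reachableBranches(condition) {
  const value = evalConst(condition);
  if (value === UNKNOWN) return ['then', 'else'];
  return value ? ['then'] : ['else'];
}

const lit = (value) => ({ type: 'lit', value });
const bin = (op, left, right) => ({ type: 'bin', op, left, right });

// (7 * 42) - 106 > 200  evaluates to  188 > 200, which is false.
const cond = bin('>', bin('-', bin('*', lit(7), lit(42)), lit(106)), lit(200));
console.log(reachableBranches(cond)); // [ 'else' ]

// A condition involving a variable cannot be decided statically.
console.log(reachableBranches(bin('>', { type: 'var', name: 'n' }, lit(0)))); // [ 'then', 'else' ]
```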

Phase 4: Points-to Analysis

Tracks data through collections and method returns. HashMap key-sensitive tracking knows which keys hold safe vs. tainted values. List element tracking follows data through add/get operations. Method return analysis determines if a function always returns safe values.
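Key-sensitive tracking can be pictured as keeping one taint flag per map key instead of one flag for the whole map. The `MapTaintState` class below is a hypothetical sketch of that idea, including a conservative fallback for keys that are not known statically.

```javascript
class MapTaintState {
  constructor() {
    this.taintByKey = new Map(); // constant key -> is the stored value tainted?
  }

  // Record a put(key, value) with a statically known key.
  put(key, isTainted) {
    this.taintByKey.set(key, isTainted);
  }

  // Look up the taint of get(key). Passing null models a key that is not a
  // compile-time constant.
  get(key) {
    if (key !== null && this.taintByKey.has(key)) {
      return this.taintByKey.get(key);
    }
    // Unknown key: assume tainted if anything stored in the map is tainted.
    return [...this.taintByKey.values()].some(Boolean);
  }
}

const state = new MapTaintState();
state.put('keyA', false); // map.put("keyA", "a literal")
state.put('keyB', true);  // map.put("keyB", userInput)

console.log(state.get('keyA')); // false
console.log(state.get('keyB')); // true
console.log(state.get(null));   // true
```

Without per-key tracking, the read of `keyA` would have to be treated as tainted because the same map also holds user input under `keyB`.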

Phase 5: AST Data Flow

Combines path-sensitive merging, field-sensitive taint propagation, interprocedural return mapping, StringBuilder chain analysis, and ternary branch evaluation. Integrates AST scope analysis for branch safety and type cast detection.
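One way to read "path-sensitive merge" is that taint states from different branches are combined at the join point, while branches already proven unreachable contribute nothing. The sketch below follows that reading; the `mergeAtJoin` function and the branch record shape are assumptions, not a description of VEXLIT's internals.

```javascript
// Each branch carries a reachability flag (as a dead-branch pass might
// compute it) and the taint state at the end of that branch.
function mergeAtJoin(branches) {
  const merged = {};
  for (const branch of branches) {
    if (!branch.reachable) continue; // dead paths contribute nothing
    for (const [name, isTainted] of Object.entries(branch.taint)) {
      // A variable is tainted after the join if any live path taints it.
      merged[name] = Boolean(merged[name]) || isTainted;
    }
  }
  return merged;
}

// bar = (7 * 18 + 106 > 200) ? "constant" : param;
// The condition is 232 > 200, so only the first branch is live.
const constantTernary = mergeAtJoin([
  { reachable: true, taint: { bar: false } },
  { reachable: false, taint: { bar: true } },
]);
console.log(constantTernary.bar); // false

// bar = flag ? "constant" : param;  (flag unknown, both branches live)
const unknownTernary = mergeAtJoin([
  { reachable: true, taint: { bar: false } },
  { reachable: true, taint: { bar: true } },
]);
console.log(unknownTernary.bar); // true
```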

How It Works: SQL Injection Detection

Step-by-step trace of how VEXLIT detects a SQL injection vulnerability.

api/users.js
app.get('/user/:id', (req, res) => {
  const id = req.params.id;        // Source: tainted
  const name = 'admin';             // Literal: safe
  const query = `SELECT * FROM users
    WHERE id = '${id}'`;            // Propagation: tainted
  db.query(query);                  // Sink: SQL query
});

1. req.params.id is identified as a taint source (user input).

2. 'admin' is tracked as a constant literal (safe, not tainted).

3. Template literal interpolation propagates taint from id to query.

4. db.query(query) is a SQL sink receiving tainted data without parameterization.

5. No sanitizer (such as a parameterized query) is found between source and sink.

Result: CRITICAL - CWE-89 SQL Injection (exploitable)
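For contrast, here is a sketch of the same lookup written with a parameterized query, which the sanitizer list above names as a neutralizing operation. The `db` object is a stand-in so the example runs on its own, and the `?` placeholder syntax is an assumption; placeholder style varies between database drivers.

```javascript
// Stand-in for a database client. It only records what it was given, to
// show that the SQL text and the user-supplied value travel separately.
const db = {
  query(sql, values = []) {
    return { sql, values };
  },
};

function getUser(req) {
  const id = req.params.id;                        // Source: still tainted
  const sql = 'SELECT * FROM users WHERE id = ?';  // Constant SQL text
  return db.query(sql, [id]);                      // Value bound as a parameter
}

const result = getUser({ params: { id: "1' OR '1'='1" } });
console.log(result.sql); // SELECT * FROM users WHERE id = ?
console.log(result.values.length); // 1
```

The user input never becomes part of the SQL string, so there is no tainted query text for the sink to receive.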

Benchmark Results

Verified against two industry-standard benchmark suites.

OWASP Benchmark v1.2 (2,740 test cases)

TPR 98.2% | FPR 2.8% | Youden 95.4

Juliet Test Suite v1.3 (6,864 test cases)

TPR 98.2% | FPR 1.0% | Youden 97.2
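The Youden figures follow from the other two columns: Youden's index is the true positive rate minus the false positive rate. A small check, with the helper name `youdenIndex` chosen for this example and results rounded to one decimal place:

```javascript
// Youden's index in percentage points: TPR minus FPR.
function youdenIndex(tprPercent, fprPercent) {
  return Math.round((tprPercent - fprPercent) * 10) / 10;
}

console.log(youdenIndex(98.2, 2.8)); // 95.4 (OWASP Benchmark v1.2)
console.log(youdenIndex(98.2, 1.0)); // 97.2 (Juliet Test Suite v1.3)
```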

In addition to OWASP Benchmark and Juliet Test Suite, VEXLIT was validated on real-world open source projects including WebGoat (Spring) and NodeGoat (Node.js).

False Positive Prevention

Multiple layers work together to eliminate noise.

Constant propagation marks variables with hardcoded values as safe

Dead branch detection skips unreachable code paths

Collection tracking knows which map keys hold safe values

Lexical masking filters out pattern matches inside strings and comments

Method return analysis tracks functions that always return safe values

Numeric parsing (parseInt, Atoi, intval) blocks taint propagation
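Of the layers above, lexical masking is the easiest to show in isolation. The sketch below blanks out string contents and comments before a pattern is applied, so that a sink name mentioned in a message or comment is not counted as a call. It is a simplified, hypothetical scanner: it handles single- and double-quoted strings and `//` and `/* */` comments only, and does not understand template literals or regex literals.

```javascript
// Replace string contents and comments with spaces, keeping the text length
// and line breaks so match offsets still point into the original source.
function maskStringsAndComments(src) {
  let out = '';
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: blank out up to, but not including, the newline.
      while (i < src.length && src[i] !== '\n') { out += ' '; i++; }
    } else if (ch === '/' && next === '*') {
      // Block comment: blank out through the closing */, keeping newlines.
      const end = src.indexOf('*/', i + 2);
      const stop = end === -1 ? src.length : end + 2;
      while (i < stop) { out += src[i] === '\n' ? '\n' : ' '; i++; }
    } else if (ch === '"' || ch === "'") {
      // String literal: keep the quotes, blank out the contents.
      out += ch; i++;
      while (i < src.length && src[i] !== ch && src[i] !== '\n') {
        if (src[i] === '\\' && i + 1 < src.length) { out += ' '; i++; }
        out += ' '; i++;
      }
      if (i < src.length && src[i] === ch) { out += ch; i++; }
    } else {
      out += ch; i++;
    }
  }
  return out;
}

const code = [
  'const msg = "call eval(x) here"; // eval(y)',
  'eval(userInput);',
].join('\n');

const pattern = /\beval\(/g;
const rawHits = [...code.matchAll(pattern)].length;
const maskedHits = [...maskStringsAndComments(code).matchAll(pattern)].map((m) => m.index);

console.log(rawHits);           // 3
console.log(maskedHits.length); // 1
```

Because the masked text has the same length as the original, the single remaining match offset can be used directly to report the real `eval(userInput)` call.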