Vardr Partners
Insights
Anti-Fraud · Federal Civilian · Program Integrity · 4 min read

Synthetic identity is the unemployment-insurance problem of the next decade

PUA-era fraud showed every state UI agency what it looks like to lose a billion dollars in a quarter. The defenses that work are not single-model. They are architectural — and they are largely unbuilt.

By Frank Speiser · April 15, 2026

In 2020 and 2021, every state unemployment insurance agency in the United States learned, very quickly and very expensively, what a sophisticated fraud ring with access to consumer credit data and a few weeks of organizing time can do to a benefits-issuance system. The DOL OIG's published estimates put pandemic-era UI fraud somewhere between $130 billion and $400 billion. The post-mortems uniformly identify synthetic identity — fabricated identities assembled from real and invented data fragments — as the dominant attack pattern.

The pandemic is over. The infrastructure that enables synthetic-identity fraud is not. Every quarter that a state UI system runs without architectural defenses, it remains exposed to the same attack class, now mounted by a smaller but persistently funded set of adversaries.

What synthetic identity actually is

A synthetic identity is not a stolen identity. A stolen identity is the credentials of a real person. A synthetic identity is a constructed persona — typically a real SSN belonging to a child or a deceased person, paired with a fabricated name and a fabricated date of birth — that has been gradually built up over months or years through low-stakes credit interactions until it has enough of a footprint to pass automated verification.

The distinction matters operationally. Stolen-identity defenses focus on detecting impersonation. Synthetic-identity defenses focus on detecting the absence of a coherent identity behind a coherent paper trail. Most state UI systems are configured to detect impersonation. Almost none are configured to detect absence-of-person.

Why single-model defenses fail

Most agencies' first instinct, after the pandemic experience, was to license a fraud-detection model. Some of these models work well on the specific pattern they were trained on. None of them work well on the next pattern.

There are two structural reasons. First, fraud is adversarial. The model that catches today's ring trains the next ring to vary its inputs in ways the model does not see. Second, the signal that distinguishes a synthetic identity from a real one is not concentrated in any one feature. It is distributed across many weakly correlated signals — credit-bureau footprint shape, device fingerprint coherence, claim-narrative linguistic features, banking-routing patterns, payee-address shape — that no single model is positioned to consume.

The agencies that did best during and after PUA were not the ones with the most accurate model. They were the ones with multiple layers of defense, each catching a different segment of the attack distribution, and a graph layer joining the signals across claims.

The architecture that works

We have written about this pattern at length in "Stop running models. Start running architectures." Applied specifically to UI synthetic-identity defense, the architecture has five layers.

Layer one — credit-bureau coherence. The signal is whether the SSN's credit history and tradeline shape match a person of the claimed age. A 28-year-old with a credit footprint that began two years ago and has no high-school-era utility account is not, statistically, a 28-year-old. Bureaus provide this signal directly.

Layer two — device and behavioral fingerprint. Submission-channel signals: device fingerprint, residential-vs-datacenter IP, claim-form behavior (typing cadence, copy-paste patterns), session shape. These are mature signals; they catch the unsophisticated ring at almost zero cost.
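A minimal sketch of how channel signals combine into a score. The signal names and weights are hypothetical, not a vendor's schema; the point is that the layer is a cheap weighted aggregation over already-mature signals.

```python
def channel_risk_score(signals: dict) -> float:
    """Sum the weights of whichever submission-channel risk signals fired.

    Keys and weights are illustrative placeholders.
    """
    weights = {
        "datacenter_ip": 0.35,            # filed from hosting-provider IP space
        "device_seen_on_other_claims": 0.30,
        "pasted_identity_fields": 0.20,   # SSN/DOB pasted rather than typed
        "headless_browser": 0.15,
    }
    return sum(w for key, w in weights.items() if signals.get(key))
```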

Layer three — payee-side graph. Bank account, routing number, prepaid-card BIN, mailing address, phone number. Graph signals over the payee side surface rings that submit thousands of claims into a small number of payment vehicles. This is the layer that catches the rings; without it, ring detection is essentially impossible.
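The core of the payee-side graph is connected components over shared payment identifiers, which a union-find sketch captures. Field names and the ring-size threshold are assumptions for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Minimal union-find with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def payee_rings(claims: list[dict], min_size: int = 3) -> list[list]:
    """Group claims that share any payee-side identifier
    (bank account, mailing address, phone), then keep large groups."""
    uf = UnionFind()
    for claim in claims:
        cid = ("claim", claim["id"])
        for key in ("bank_account", "address", "phone"):
            if claim.get(key):
                uf.union(cid, (key, claim[key]))
    groups = defaultdict(list)
    for claim in claims:
        groups[uf.find(("claim", claim["id"]))].append(claim["id"])
    return [g for g in groups.values() if len(g) >= min_size]
```

This is the batch-join view the article describes: it only works if claims are processed as a population, not one at a time.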

Layer four — employer coherence. Did the named employer file quarterly wage records with the agency for the relevant quarter? Does the employer exist at all? Fictitious-employer fraud is its own pattern; the check at this layer is whether the employer reference resolves against ground truth.

Layer five — claim-narrative linguistic features. The free-text portions of the claim — separation reason, employer disputes, claimant explanations — have a generated-text quality that mature LLM detection picks up reliably. This layer is new in the last 18 months and substantially raises the floor.
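Real deployments use trained detectors for this layer; the crude stylometric proxies below only illustrate the kind of feature the layer consumes (generated text often shows unusually uniform sentence lengths and repetitive vocabulary).

```python
import re
import statistics


def narrative_features(text: str) -> dict:
    """Toy stylometric features over a claim's free-text narrative.

    Illustrative only; not a substitute for a trained detector.
    """
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    lengths = [len(re.findall(r"[a-zA-Z']+", s)) for s in sentences]
    return {
        # Low ratio = repetitive vocabulary.
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        # Low stdev = suspiciously uniform sentence lengths.
        "sentence_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,
    }
```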

No single layer detects synthetic identity well. The combination is the defense.

What stops most agencies from building this

Three reasons, in our experience.

The graph layer requires data integration that is not procurement-shaped. Layer three depends on joining claim records across many claimants. Most state UI systems are configured to process claims one at a time. Standing up the join is technically straightforward but procurement-painful.

The data-sharing agreements are politically difficult. Cross-agency synthetic-identity defense ideally pulls signals from the state Medicaid agency, the state tax agency, and the state DMV. The data-sharing agreements that make this possible take 12 to 18 months to negotiate. They are worth it, but they are not a procurement-cycle deliverable.

Adverse-action notices are an underrated bottleneck. When detection is working, the hold-queue fills with legitimate claimants who got bad letters. If the notice does not explain what to do, the legitimate claimant gives up and the agency loses the recovery. Synthetic-identity defense without notice-quality work is a Pyrrhic deployment.

Where Vardr does this work

Multi-layer fraud architecture is one of our four published service lines. The Reference Architecture provides the structural answer for graph signals, sequence-model features, and provenance. We pair this with the Modernization Readiness Assessment to identify which of the five layers an agency can stand up first — typically layers one and two, because they are vendor-buyable; layers three through five are where Vardr concentrates the engagement.

The work is methodology, principal time, and integration; the model itself is the smallest of the purchases. The agencies that understand that order tend to be the agencies that recover their PUA-era losses.

If this resonates with a program you're working on, we'd be glad to talk.