Stop running models. Start running architectures.
If your fraud-detection program is described by the name of a model, it is fragile by construction. The work is at the seams, not the layers.
By Frank Speiser and Payton Jonson · April 8, 2026
There is a presentation slide that recurs across state benefits agencies. It is titled "Our AI Approach." It usually contains the names of one or two models, a bar chart of accuracy or AUC, and a clip-art image of a brain or a network.
Programs described by the names of their models tend to fail in a specific way. They produce a quarterly report of detections. They do not produce a sustained reduction in improper payments. The reason is not the model. The reason is that the model is the only thing the program operates.
What a model-as-program looks like
The classic failure mode is a state procurement that contracts a vendor for a "fraud model" and treats the deliverable as the model artifact. Six months in, the model exists in some staging environment and produces a daily file of flagged claims. The investigations team opens 8% of the file, recovers some portion, and the rest expires.
This is what is mistaken for AI in government. It is a model run as if it were a report.
What architecture actually means
Programs that move the improper-payment needle have, in addition to a model, all of:
An intake-time decision point. The signal must arrive before payment, not after. A model that produces a Tuesday-morning batch report is detecting fraud that already happened.
A graph layer. Most modern fraud is collusive. A single-claim model cannot detect a ring. Programs that work treat the graph as a first-class object — provider-to-claim, claim-to-bank-account, address-to-identity — and run signals over the graph as well as over individual claims (a minimal sketch follows this list).
A triage queue with expected-value ranking. Investigator capacity is finite. A signal that is correct but unranked is operationally indistinguishable from no signal. Queues must be ranked by expected recoverable value, not by raw model score (see the ranking sketch after this list).
Adversarial review. Fraud rings are adaptive. A model that performed well on Q1 data degrades by Q4 because the population is moving. Programs that work treat their own model as an adversary and red-team it monthly (a simple drift check, sketched after this list, is one automated input to that review).
Due-process integration. Every adverse action requires a citation-grounded notice. Without it, the legal exposure of the program exceeds the recovered value of the fraud detected.
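The graph layer is easier to see in code than in prose. Here is a minimal sketch, assuming a flat list of claim records with hypothetical field names (claim_id, provider_id, bank_account, address): it links claims through shared identifiers and surfaces connected components large enough to look like a ring. A production graph carries many more edge types and runs scored signals over it, but the shape is the same.

```python
import networkx as nx

def candidate_rings(claims, min_size=3):
    """Link claims through shared identifiers and yield groups of claim IDs
    large enough to look like a ring. Field names are illustrative."""
    g = nx.Graph()
    for c in claims:
        claim_node = ("claim", c["claim_id"])
        g.add_node(claim_node)
        # One edge per shared-identity attribute the claim touches.
        for key in ("provider_id", "bank_account", "address"):
            if c.get(key):
                g.add_edge(claim_node, (key, c[key]))

    for component in nx.connected_components(g):
        claim_ids = [node_id for kind, node_id in component if kind == "claim"]
        if len(claim_ids) >= min_size:
            yield claim_ids
```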
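The expected-value ranking is similarly small once its two inputs exist: a calibrated probability from the model and an estimate of what is still recoverable. A minimal sketch, again with hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class FlaggedClaim:
    claim_id: str
    model_score: float         # calibrated probability the claim is improper
    recoverable_amount: float  # dollars still recoverable if it is

def ranked_queue(flags, capacity):
    """Order the investigator queue by expected recoverable value,
    not by raw model score, and cut it at investigator capacity."""
    ranked = sorted(flags, key=lambda f: f.model_score * f.recoverable_amount,
                    reverse=True)
    return ranked[:capacity]
```

The difference from score-ranking is not subtle: a $200 claim scored 0.98 carries $196 of expected value and ranks below a $40,000 claim scored 0.30, which carries $12,000.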
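Adversarial review is mostly a human exercise, but part of it can be automated. One common check (our illustration, not a technique named above) is the population stability index, which says whether the scores the model produces today still look like the scores it was evaluated on.

```python
import numpy as np

def score_psi(reference_scores, current_scores, bins=10):
    """Population stability index between two model-score distributions,
    assuming scores in [0, 1]. Values above ~0.2 are conventionally read
    as drift large enough that the reference evaluation no longer holds."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    ref = np.histogram(reference_scores, edges)[0] / len(reference_scores)
    cur = np.histogram(current_scores, edges)[0] / len(current_scores)
    ref = np.clip(ref, 1e-6, None)  # avoid log(0) for empty bins
    cur = np.clip(cur, 1e-6, None)
    return float(np.sum((cur - ref) * np.log(cur / ref)))
```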
The seams are the work
None of the above is, by itself, difficult. The work — and the difference between a program that recovers $40M a year and one that recovers $400M a year — is in the seams. The seam between intake and the model. Between the model and the graph. Between the graph and the investigator queue. Between the queue and the adverse-action notice.
Vendors love to sell layers. Programs that buy layers and then assemble the seams in-house tend to fail at the seams. Programs that buy seams — operationalized end-to-end flows with the layers commodified beneath — tend to succeed.
This is what we mean when we say Vardr sells an operating model, not a model. We are happy to commodify the model layer. We will not commodify the seams.
A diagnostic
If your program's last quarterly review described AI accomplishments primarily in terms of model performance metrics, the program is at risk. Ask the team for the four numbers that actually matter:
- Time from claim submission to first signal.
- Percentage of flagged claims that an investigator actually opens.
- Expected-recoverable-value-weighted precision and recall, not raw precision and recall (sketched below).
- Quarterly improper-payment leakage trend.
If the team cannot produce these, the gap is in the architecture, not the model.
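For concreteness, here is a minimal sketch of producing those four numbers from a flat claims log. Every column name is an assumption (submitted_at, first_signal_at, flagged, opened_by_investigator, is_improper, recoverable_amount, paid_amount); the point is that each number is a one- or two-line aggregate once the architecture records the right events.

```python
import pandas as pd

def quarterly_architecture_metrics(claims: pd.DataFrame) -> dict:
    """Compute the four diagnostic numbers for one quarter of claims.
    The leakage trend is the last figure tracked quarter over quarter."""
    flagged = claims[claims["flagged"]]

    # 1. Time from claim submission to first signal (median hours).
    lag = flagged["first_signal_at"] - flagged["submitted_at"]
    median_hours = lag.dt.total_seconds().median() / 3600

    # 2. Share of flagged claims an investigator actually opens.
    open_rate = flagged["opened_by_investigator"].mean()

    # 3. Expected-recoverable-value-weighted precision and recall.
    value = claims["recoverable_amount"]
    caught = value[claims["flagged"] & claims["is_improper"]].sum()
    ev_precision = caught / value[claims["flagged"]].sum()
    ev_recall = caught / value[claims["is_improper"]].sum()

    # 4. Improper-payment leakage: dollars paid on improper claims never flagged.
    leakage = claims.loc[claims["is_improper"] & ~claims["flagged"], "paid_amount"].sum()

    return {
        "median_hours_to_first_signal": float(median_hours),
        "flagged_open_rate": float(open_rate),
        "ev_weighted_precision": float(ev_precision),
        "ev_weighted_recall": float(ev_recall),
        "improper_payment_leakage": float(leakage),
    }
```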
If this resonates with a program you're working on, we'd be glad to talk.