
M-25-21 as engineering work: what the latest OMB memo actually asks you to build

The recent OMB AI memo refines, narrows, and operationalizes its predecessors. Most of the obligations land on engineers, not policy staff. Here is what they require in code.

By Frank Speiser · May 6, 2026

OMB M-24-10 was the first memo most agencies read carefully. Its successor — the M-25-21 family of refinements and clarifications issued through 2025 and into 2026 — is the first one that quietly demands engineering artifacts agencies cannot fake with a SharePoint inventory.

The shift in tone is small. The shift in obligation is not.

What changed since M-24-10

M-24-10 established that agencies must maintain an inventory of AI use cases, run pre-deployment evaluations for rights-impacting and safety-impacting uses, and implement a set of minimum practices for each. Many agencies satisfied the letter of the memo with a spreadsheet, a one-time impact assessment, and a half-page of risk-management language inserted into the system security plan.

M-25-21 closes those loopholes in three specific ways.

Re-evaluation cadence. The minimum practices now reference re-evaluation triggers tied to model updates, drift signals, and population shifts, not an annual review cycle. If your vendor pushes a new model version, you owe a re-evaluation. If your demographic mix shifts past a defined threshold, you owe one. Without an evaluation harness you can actually run, this obligation is unmeetable.
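
A minimal sketch of the trigger check, assuming illustrative threshold values and metric names; the memo names the trigger classes, not the numbers:

```python
from dataclasses import dataclass

# Illustrative cutoffs -- M-25-21 names the trigger classes, not the values.
PSI_THRESHOLD = 0.2      # population stability index cutoff (our assumption)
DRIFT_THRESHOLD = 0.1    # output-distribution drift cutoff (our assumption)

@dataclass
class DeploymentState:
    serving_version: str    # model version currently in production
    assessed_version: str   # model version the last evaluation covered
    drift_metric: float     # measured output drift since the last assessment
    population_psi: float   # demographic-mix shift since the last assessment

def reevaluation_triggers(state: DeploymentState) -> list[str]:
    """Return every trigger that currently obligates a fresh evaluation."""
    triggers = []
    if state.serving_version != state.assessed_version:
        triggers.append("model-update")       # vendor pushed a new version
    if state.drift_metric > DRIFT_THRESHOLD:
        triggers.append("drift")
    if state.population_psi > PSI_THRESHOLD:
        triggers.append("population-shift")
    return triggers
```

Run a check like this on a schedule; a non-empty result is a compliance obligation with a clock on it, not a dashboard curiosity.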

Procurement obligations. Procurement language now explicitly requires the vendor to provide artifacts the agency needs to run impact assessments without the vendor's assistance: test datasets representative of the deployment population, fairness criteria explicitly named, and runnable benchmarks tied to the use case. The Vardr Procurement Language Library was assembled with this exact shift in mind.
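
A hypothetical shape for those deliverables, expressed as a machine-readable manifest a solicitation could require; every field name here is illustrative, not language from the memo:

```python
# Hypothetical vendor-deliverables manifest; field names are ours, not OMB's.
VENDOR_DELIVERABLES = {
    "test_dataset": {
        "uri": "s3://agency-evals/use-case-7/test-v3.parquet",  # placeholder
        "representative_of": "current deployment population",
        "demographic_fields": ["age_band", "sex", "race_ethnicity"],
    },
    "fairness_criteria": [
        {"metric": "false_negative_rate_parity", "max_gap": 0.02},
        {"metric": "selection_rate_ratio", "min_ratio": 0.8},
    ],
    "benchmark": {
        "entrypoint": "python -m vendor_eval.run --use-case use-case-7",
        "agency_runnable": True,  # must run without the vendor in the room
    },
}
```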

Adverse-action transparency. Notices for AI-influenced adverse decisions must reference, at minimum, the policy basis, the path to challenge, and a plain-language description of what the system considered. The previous generation of templated notice copy will not survive due-process review against this bar.
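
A minimal sketch of a notice record that carries the three required elements as structured fields rather than template prose; the schema and names are our assumptions:

```python
from dataclasses import dataclass

@dataclass
class AdverseActionNotice:
    """The three minimum elements as fields; the structure is our assumption."""
    policy_basis: str              # statute or regulation the decision rests on
    challenge_path: str            # concrete instructions for contesting it
    factors_considered: list[str]  # plain-language summary of what the system weighed

def render(notice: AdverseActionNotice) -> str:
    """Render the notice body; real copy would go through counsel review."""
    factors = "; ".join(notice.factors_considered)
    return (
        f"This decision was made under {notice.policy_basis}. "
        f"The system considered: {factors}. "
        f"To challenge this decision: {notice.challenge_path}"
    )
```

Structured fields matter here because due-process review asks whether each element is present; an empty field fails loudly, while a template paragraph fails silently.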

What you actually have to build

Three artifacts, in order. Each is small. Each is non-optional.

An AI inventory that is a data product, not a document. A versioned, schema-backed record of every AI use case in the agency, with explicit fields for system identifier, deployment status, last assessment date, minimum-practice coverage, vendor, model version, and a pointer to the runnable evaluation. The Vardr Reference Architecture treats this as the control plane for everything downstream; M-25-21 makes that posture mandatory in practice.
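
What that record might look like as a schema, using the fields named above; the status values and versioning scheme are our assumptions:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class DeploymentStatus(str, Enum):
    PLANNED = "planned"
    PILOT = "pilot"
    PRODUCTION = "production"
    RETIRED = "retired"

@dataclass
class AIUseCaseRecord:
    """One row in the inventory; fields mirror those named above."""
    system_id: str
    deployment_status: DeploymentStatus
    last_assessment: date
    minimum_practices: dict[str, bool]  # practice id -> currently satisfied
    vendor: str
    model_version: str
    evaluation_uri: str                 # pointer to the runnable eval suite
    schema_version: str = "1.0"         # the record itself is versioned
```

A record like this can be queried ("every production system whose assessed version trails its serving version"), which is what makes it a control plane rather than a document.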

One evaluation harness, one passing assessment. Pick the highest-stakes rights-impacting system. Build a runnable evaluation suite with demographic-group breakouts, distribution-shift tests, fairness criteria named explicitly, and a CI-style harness the agency runs itself, not the vendor. This is roughly eight weeks of focused engineering work. The harness becomes the template for every subsequent assessment.
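
A sketch of what one such check looks like as a plain test; the metric choice, the gap threshold, and the row shape are our assumptions, standing in for whatever fairness criteria the assessment names:

```python
# One CI-style check from the harness: false-negative-rate parity across
# demographic groups. Metric, threshold, and row shape are illustrative.
def false_negative_rate(labels: list[int], preds: list[int]) -> float:
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    positives = sum(labels)
    return fn / positives if positives else 0.0

def test_fnr_parity(rows: list[dict], max_gap: float = 0.02) -> None:
    """Fail the build if the FNR gap across groups exceeds the named criterion."""
    by_group: dict[str, list[dict]] = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(row)
    rates = {
        g: false_negative_rate([r["label"] for r in rs], [r["pred"] for r in rs])
        for g, rs in by_group.items()
    }
    gap = max(rates.values()) - min(rates.values())
    assert gap <= max_gap, f"FNR gap {gap:.3f} exceeds {max_gap}: {rates}"
```

Wire checks like this into CI and the inventory's last-assessment date stops being a self-reported field.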

Per-decision provenance. Append-only event-log capture for every decision touchpoint, with content-addressed pointers to features, models, prompts, and retrieved documents. The bar: any decision is replayable with the same inputs, on demand. We have written about this requirement at length in The audit-trail problem; M-25-21 turns it from a best practice into a compliance artifact.
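
A minimal sketch of the event shape, assuming content addressing by SHA-256 over canonical JSON; the field names are ours:

```python
import hashlib
import json
from datetime import datetime, timezone

def content_address(obj) -> str:
    """Hash the canonical JSON form; this is the pointer stored in the log."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def decision_event(*, features, model_ref, prompt, retrieved_docs, output) -> dict:
    """Build one replayable decision event for the append-only log."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "features": content_address(features),
        "model": model_ref,  # e.g. a registry digest, already content-addressed
        "prompt": content_address(prompt),
        "retrieved": [content_address(d) for d in retrieved_docs],
        "output": content_address(output),
    }
```

Appending these events to write-once storage is what makes replay possible: resolve each pointer, re-run the named model version, and compare the output hash.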

What does not work

The reflex response to M-25-21 will be the same reflex that produced M-24-10 compliance theater: a longer document, a more elaborate risk-management plan, an updated training curriculum. None of these produce the evaluation harness, the inventory data product, or the provenance schema.

The agencies that get this right in 2026 will be the ones that treat the memos as the architecture team's problem first and the policy team's problem second. We expect the OIG community to start asking the harness question by mid-2026; the agencies that can produce one will be a small group.

Where Vardr fits

We help agencies treat the memos as an engineering specification. The Reference Architecture provides the structural answer to provenance and replay. The Modernization Readiness Assessment lets a program office identify, in three weeks, which of the three artifacts they can stand up and which require remediation first. The procurement library is already aligned to the M-25-21 obligations and pre-baked into solicitation templates.

The work is small. The window in which it remains optional is closing.

If this resonates with a program you're working on, we'd be glad to talk.