Preprint·2026·Updated Jun 2026·DOI pending

Governance-First AI for Failure-Mode Control

MAVS-GC: regulated consensus over always-on specialists for safer behaviour when evidence becomes uncertain, contradictory, corrupted, or unstable.

Saif Malik·MAVS Research Program·InfernusReal·ssaifmalikk@gmail.com

TLDR

Separate specialist prediction from output governance. Every specialist evaluates every input; diagnostics raise red flags; severity is aggregated; contextual weights and bounded mitigation shape a governed acceptance threshold; and the final decision passes through an auditable consensus trace with a hard veto. The claim is not universal accuracy but failure-mode control: under corruption and specialist failure, governed consensus cuts unsafe acceptance by roughly 20× to 200× relative to baselines (up to ~149× versus ensemble-like baselines and ~202× versus a single model) while preserving accuracy and stability.

Problem

Modern AI systems are typically optimised to maximise accuracy under clean conditions. But real deployments are not clean: evidence becomes uncertain, contradictory, corrupted, or unstable, and individual specialists can fail silently. In these regimes the dangerous outcome is not a wrong prediction but an unsafe acceptance¹: confidently admitting an input that should have been rejected.

¹ An unsafe acceptance is admitting an input that should have been rejected. It is the failure mode safety-critical systems care about most, and the one accuracy alone does not measure.

Static ensembles and routing-based Mixture-of-Experts inherit this weakness because acceptance is a fixed threshold applied after model scoring. In a controlled false-positive trap, mean aggregation accepted 100% of unsafe cases and static weighted aggregation accepted 85%. The decision rule itself, not the detectors, was the failure point.

Method

MAVS-GC elevates governance into a first-class computational object. A system is the tuple $M = (X, \Phi, F, G, A, W, P, \Theta, \Pi)$: an input space $X$, a shared feature map $\Phi$, a set of always-on specialists $F$², a diagnostic system $G$, a severity aggregator $A$, an influence rebalancer $W$, bounded mitigation $P$, a threshold map $\Theta$, and a decision rule $\Pi$.

² All-speak evaluation: every specialist scores every input. There is no router that can silently exclude a relevant specialist, a key difference from Mixture-of-Experts.

Figure 1. The MAVS-GC pipeline. Input x is mapped to features φ and scored by all specialists into a governed consensus R. A separate governance block turns diagnostics into severity a, mitigation m, and a threshold θ; the final decision Π accepts only when consensus clears θ and severity stays below the hard veto.

Specialists emit calibrated scores $s_i \in [0,1]$, converted to supports $r_i = 2s_i - 1 \in [-1,1]$. Diagnostics produce signals $z$, which are aggregated into a severity $a = A(z)$; together with a mitigation $m$, severity moves a governed threshold.

Governed threshold

$$\theta = \Theta(a,m) = \theta_0 + \lambda a - \delta m$$

Consensus

$$R(x) = \sum_i w_i\, r_i$$

Decision

$$\Pi(R,\theta,a) = \mathbb{1}[\,a < \tau_{\text{hard}}\,]\cdot \mathbb{1}[\,R \geq \theta\,]$$

Acceptance therefore requires two things at once: severity must stay below a hard veto, and governed consensus must clear the governed threshold. Every run emits an auditable trace $(r, w, z, a, m, \theta, \tau_{\text{hard}}, R, \Pi)$, so a decision can always be reconstructed and explained.
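The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function name `decide`, the `Trace` record, and the values of `theta0`, `lam`, and `delta` are hypothetical (the page does not state θ₀, λ, or δ; the values here were picked only so that θ comes out at the 0.065 shown in Figure 2), and the diagnostic vector z is left out of the record.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Trace:
    """Record of one decision (hypothetical layout; diagnostics z omitted)."""
    r: Tuple[float, ...]
    w: Tuple[float, ...]
    a: float
    m: float
    theta: float
    tau_hard: float
    R: float
    decision: int


def decide(scores: Sequence[float], weights: Sequence[float], a: float, m: float,
           theta0: float, lam: float, delta: float, tau_hard: float) -> Trace:
    # Supports: r_i = 2 s_i - 1 maps scores in [0, 1] onto [-1, 1].
    r = tuple(2.0 * s - 1.0 for s in scores)
    # Consensus: R = sum_i w_i r_i.
    R = sum(w_i * r_i for w_i, r_i in zip(weights, r))
    # Governed threshold: theta = theta0 + lambda * a - delta * m.
    theta = theta0 + lam * a - delta * m
    # Decision: accept only if severity is below the hard veto
    # AND consensus clears the governed threshold.
    decision = int(a < tau_hard and R >= theta)
    return Trace(r, tuple(weights), a, m, theta, tau_hard, R, decision)


# Assumed parameters (not given in the source).
PARAMS = dict(theta0=0.10, lam=0.5, delta=0.3, tau_hard=0.80)

# Scores 0.91, 0.87, 0.84 correspond to supports 0.82, 0.74, 0.68.
quiet = decide([0.91, 0.87, 0.84], [0.34, 0.33, 0.33], a=0.05, m=0.20, **PARAMS)
# Same specialists, but severity above the hard veto.
vetoed = decide([0.91, 0.87, 0.84], [0.34, 0.33, 0.33], a=0.90, m=0.20, **PARAMS)

print(round(quiet.R, 3), round(quiet.theta, 3), quiet.decision)
print(round(vetoed.R, 3), round(vetoed.theta, 3), vetoed.decision)
```

In the second call the consensus still clears the raised threshold, yet the decision is a rejection because severity is not below `tau_hard`; this is the veto acting independently of the consensus.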

Auditable trace

Trace values (r, w, a, m, θ, τ_hard, R, Π)

specialist	rᵢ	wᵢ	wᵢ·rᵢ
f1	0.82	0.34	0.279
f2	0.74	0.33	0.244
f3	0.68	0.33	0.224

quantity	value
severity a	0.05
mitigation m	0.20
θ = θ₀ + λa − δm	0.065
τ_hard	0.80
R	0.747 (R ≥ θ)
Π	1 (ACCEPT)

Specialists agree, diagnostics are quiet. Consensus clears a low governed threshold — the input is accepted.

Figure 2. An auditable trace for the scenario in which specialists agree and diagnostics are quiet. In other scenarios, elevated severity raises the threshold (rejecting borderline inputs) and the hard veto overrides an otherwise-positive consensus.
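As a check on the trace, the consensus and the decision can be recomputed by hand from the tabulated supports and weights. The threshold 0.065 is taken from the trace as given, since the page does not state θ₀, λ, or δ.

```latex
\begin{aligned}
R &= \sum_i w_i\, r_i = (0.34)(0.82) + (0.33)(0.74) + (0.33)(0.68) \\
  &= 0.2788 + 0.2442 + 0.2244 = 0.7474 \approx 0.747, \\
\Pi &= \mathbb{1}[\,0.05 < 0.80\,]\cdot \mathbb{1}[\,0.747 \geq 0.065\,] = 1 .
\end{aligned}
```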

Key intuition

Normal behaviour is easy to govern from ordinary evidence; abnormal, adverse behaviour is not. Because governance is explicit and monotone in severity³, higher diagnostic severity can never make acceptance easier, only harder. Mitigation is bounded and lives inside the decision rule, so it can nudge a borderline case but can never override the hard veto.

³ Monotone safety: increasing diagnostic severity can only raise the acceptance threshold, never lower it. Safety is a structural property of the decision rule, not a learned habit.
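Both properties can be checked mechanically on a toy version of the rule. The sketch below makes several assumptions that are not in the source: mitigation is bounded by clipping to `[0, m_max]`, and `theta0`, `lam`, `delta`, `m_max`, and `tau_hard` are hypothetical values. Monotonicity in severity relies on `lam` being non-negative.

```python
def governed_threshold(a: float, m: float, theta0: float = 0.10,
                       lam: float = 0.5, delta: float = 0.3,
                       m_max: float = 0.25) -> float:
    # Assumed bounding mechanism: clip mitigation to [0, m_max], so it can
    # lower the threshold by at most delta * m_max.
    m_bounded = min(max(m, 0.0), m_max)
    return theta0 + lam * a - delta * m_bounded


def accept(R: float, a: float, m: float, tau_hard: float = 0.80) -> bool:
    # The veto is checked separately from the threshold, so mitigation
    # has no path to override it.
    return a < tau_hard and R >= governed_threshold(a, m)


# Monotone safety: with mitigation held fixed, the threshold never
# decreases as severity increases.
severities = [i / 20 for i in range(21)]
thetas = [governed_threshold(a, m=0.2) for a in severities]
monotone = all(t1 <= t2 for t1, t2 in zip(thetas, thetas[1:]))

# Hard veto: with severity above tau_hard, no amount of requested
# mitigation yields acceptance, even for maximal consensus R = 1.0.
veto_holds = not any(accept(R=1.0, a=0.9, m=m) for m in (0.0, 0.25, 10.0, 1e6))

print(monotone, veto_holds)
```

Keeping the veto outside the threshold arithmetic is what makes the second check structural: it holds for any choice of the mitigation parameters, not just the ones used here.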

The consequence is a clean separation of concerns: intelligence generation (the specialists) and intelligence governance (A, W, P, Θ, Π) become independent. Acceptance behaviour can be retuned — more cautious, more permissive, differently audited — without retraining a single specialist.

Results

The program spans a formal Foundation Arc, a synthetic validation chapter (Chapter 9), and three real-benchmark studies (Chapters 10A, 10B, and 10C) on four tabular datasets: Breast Cancer Wisconsin, Adult Income, Credit Card Fraud, and Bank Marketing. The strongest signal is robustness: under stress, MAVS-GC fails more safely.

Failure behaviour under corruption

Chapter 10B · accuracy vs. unsafe acceptance

method	accuracy (higher is better)	unsafe acceptance (lower is better)
Pure MAVS-GC (ours)	89.95%	1.35%
Mean / Veto	74.31%	27.29%
Single model	59.46%	45.42%

Under specialist-failure corruption, Pure MAVS-GC keeps accuracy high while unsafe acceptance stays near zero — roughly 20× lower than ensemble baselines and 34× lower than a single model.
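The quoted reduction factors follow directly from the unsafe-acceptance rates reported for the specialist-failure regime. A short check (the dictionary layout and variable names are illustrative only):

```python
# Unsafe-acceptance rates (%) under specialist failure, as reported above.
unsafe = {
    "Pure MAVS-GC": 1.35,
    "Mean / Veto": 27.29,
    "Single model": 45.42,
}

# Reduction factor = baseline rate / MAVS-GC rate.
reference = unsafe["Pure MAVS-GC"]
reduction = {
    name: rate / reference
    for name, rate in unsafe.items()
    if name != "Pure MAVS-GC"
}

for name, factor in reduction.items():
    print(f"{name}: {factor:.1f}x")
```

The factors come out at about 20.2 and 33.6, matching the rounded 20× and 34× in the text.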

Figure 3. Accuracy versus unsafe acceptance under corruption (Chapter 10B). Pure MAVS-GC holds high accuracy while keeping unsafe acceptance near zero; baselines degrade sharply. The values shown are for the specialist-failure regime; the high-corruption regime is summarised below.

Specialist failure: 1.35%. Unsafe acceptance for Pure MAVS-GC at 89.95% accuracy, ~20× lower than ensembles and ~34× lower than a single model.

High corruption (≥ 0.6): 0.45%. Unsafe acceptance at 85.30% accuracy, ~149× lower than ensemble-like baselines and ~202× lower than a single model.

Hard-veto compliance: 100%. In synthetic validation, governance reduced unsafe acceptance from 100% / 85% baselines with zero hard-veto violations.

Clean accuracy (10A): 79 / 288. Competitive but not dominant: positive metric deltas in 79 of 288 comparisons. Governance shifts the error profile, not the ceiling.

Stability under corruption · Chapter 10C

Metric	Pure MAVS-GC	Baseline
Prediction stability	0.971615	0.952713
Decision stability	0.975770	0.958762
Consensus stability	0.979332	0.963946
Trace stability	0.967976	0.959693

Table 1. Behavioural stability under corruption (Chapter 10C). MAVS-GC preserves prediction, decision, consensus, and trace stability more strongly than the aggregation baseline as corruption increases.
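To make the size of the gaps in Table 1 easier to read, the per-metric margins can be computed directly from the reported values. The snippet only subtracts the table entries; it does not assume anything about how the stability metrics themselves are defined.

```python
# (Pure MAVS-GC, Baseline) stability values copied from Table 1.
stability = {
    "Prediction stability": (0.971615, 0.952713),
    "Decision stability": (0.975770, 0.958762),
    "Consensus stability": (0.979332, 0.963946),
    "Trace stability": (0.967976, 0.959693),
}

# Margin = MAVS-GC value minus baseline value, rounded to table precision.
margins = {
    metric: round(ours - baseline, 6)
    for metric, (ours, baseline) in stability.items()
}

for metric, margin in margins.items():
    print(f"{metric}: +{margin:.6f}")
```

All four margins are positive, ranging from about 0.008 (trace stability) to about 0.019 (prediction stability).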

Interpretation: MAVS-GC is a failure-management, robustness, and safety-oriented governance architecture — not a pure accuracy-maximiser. Under stress it rejects more cautiously, suppresses unsafe acceptance, and preserves behavioural consistency.

Chapter 9 · Synthetic | Chapter 10A · Accuracy | Chapter 10B · Robustness | Chapter 10C · Stability

Limitations & future work

The current evaluation is rigorous but bounded. It covers four tabular datasets, a fixed suite of corruption families, a controlled split and audit structure, reproducibility manifests, and verified artifact trails. It does not yet establish production-scale behaviour, LLM-agent behaviour, universal robustness superiority, or cross-domain generalisation beyond the tested benchmark suite.

The most valuable next step is external-scale validation: larger datasets, additional modalities, LLM and agent specialist settings, adversarial expansions, ablation matrices, and independent replication. The open question is whether the observed failure-management and stability-preservation effects survive at larger scale and in more realistic, safety-critical multi-model systems.

Support sought: research feedback, compute credits, review of experimental design, guidance on scalable evaluation, and collaboration on governance-first evaluation for LLM agents.

Citation

If you find this work useful, please cite it as:

@misc{malik2026mavsgc,
  title        = {MAVS-GC: Governance-First AI for Failure-Mode Control},
  author       = {Malik, Saif},
  year         = {2026},
  howpublished = {Preprint, MAVS Research Program},
  url          = {https://github.com/MAVS-RESEARCH}
}
