Drift Detection and Model Degradation Audit for AI Safety: The "Prime Gravity" Protocol for Deterministic and Tamper-Proof Evaluation
- kanna qed
AI safety evaluations, regardless of their narrative depth, are susceptible to unbounded "post-hoc rationalization" unless they are codified as strictly reproducible and verifiable procedures. Current governance frameworks and descriptive documentation fail to ensure that identical inputs yield identical conclusions, leaving a structural loophole for "post-hoc optimization": the practice of adjusting evaluation criteria after observing the results.
This paper proposes Prime Gravity, a mathematical framework designed to anchor evaluation protocols into a deterministic state. By mapping evaluation results onto a finite arithmetic space (integer encoding combined with parallel recording across coprime moduli) and uniquely reconstructing them via the Chinese Remainder Theorem (CRT), the protocol yields a singular, consistent integer conclusion. This mechanism empowers third-party auditors to regenerate exact evaluation results from an immutable ledger, rendering the evaluation process fundamentally unalterable after execution.
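As a minimal sketch of this arithmetic core (assuming Python 3.9+; the function names are illustrative, not part of the protocol specification), an integer-encoded conclusion can be recorded as residues across pairwise coprime moduli and uniquely reconstructed via the CRT:

```python
from math import prod

def encode_residues(value: int, moduli: list[int]) -> list[int]:
    """Record the evaluation integer in parallel across coprime moduli."""
    return [value % m for m in moduli]

def crt_reconstruct(residues: list[int], moduli: list[int]) -> int:
    """Uniquely reconstruct the integer modulo the product of the moduli."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(., -1, m) = modular inverse
    return x % M

moduli = [101, 103, 107, 109]            # pairwise coprime (here, primes)
conclusion = 2_126_600                   # integer-encoded audit conclusion
residues = encode_residues(conclusion, moduli)
assert crt_reconstruct(residues, moduli) == conclusion
```

Because the moduli are pairwise coprime, the reconstruction is unique modulo their product, which is what makes the recorded conclusion a single, consistent integer.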

Prime Gravity: A Mathematical Protocol for Rescuing Drift Detection from Arbitrary Operations in AI Safety
The focus of this paper is not the internal architecture of the model, but the "AI Safety Evaluation Protocol" itself. In this context, safety is defined by the following three pillars:
Reproducibility: The property that, given (i) the same input identification and (ii) exact re-execution of the evaluation logic, (iii) the identical audit conclusion (OK/NG) is regenerated by any third party. This requires fixing the data identifiers, evaluation windows, threshold policies, rounding conventions, and execution environment in a verifiable ledger.
Tamper-Proofing (Non-Post-Hoc): The structural inability to swap conclusions by retrospectively altering windows, thresholds, or rounding conventions. Any such attempt is mathematically guaranteed to be detected during the ledger's consistency verification (Verify).
Finite Scope: The pre-determination of the evidence range (windows, metrics, and moduli sets). This prevents the "infinite deferral" of conclusions where evaluators perpetually cite new, arbitrary reasons to delay a final judgment.
"Prime Gravity" is defined as the conclusion-anchoring mechanism that fulfills these requirements through a minimal arithmetic protocol.
1. Failure Patterns in AI Safety: The "Infinite Deferral" of Accountability
A recurring crisis in AI safety is the prevalence of "descriptive yet non-verifiable" evaluations. While frameworks such as Model Cards or Datasheets [4][5] offer structured documentation, they lack the technical constraints to enforce consistent output for identical input. Consequently, when anomalies occur, explanations often retreat into the "black box" of model internals or training datasets, severing the link between Input and Conclusion [3].
The absence of fixed procedures allows for "post-hoc optimization," where thresholds are shifted or windows are narrowed to fit a desired narrative. Transparency without reproducibility is merely a "performative explanation," which fundamentally undermines the credibility of safety audits.
Threat Model: Post-Hoc Optimization and Parameter Substitution
The assumed adversaries include evaluators or developers who attempt to: (i) shift evaluation windows post-observation, (ii) manipulate thresholds, (iii) alter rounding or pre-processing rules, or (iv) selectively present favorable data segments (cherry-picking). These actions occur even under strict documentation requirements. The threat is that verification can branch: without a fixed protocol, multiple "truths" can be derived from the same data.
Prime Gravity prevents these substitutions by "burning" the evaluation parameters (window, threshold, rounding, and data ID) into the ledger. Any deviation between the recorded conclusion and the recalculated result is mathematically inevitable. Note: This protocol specifically targets the "Fixation of Judgment Protocols" and "Third-party Verifiability," rather than the inherent ethics or internal logic of the model.
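One plausible way to "burn" these parameters is to hash a canonical serialization of the certificate, so that any retrospective change to the window, threshold, rounding, or data ID changes the digest. The field names and values below are hypothetical placeholders following the schema in Section 6:

```python
import hashlib
import json

def certificate_digest(cert: dict) -> str:
    """Hash a canonical serialization so that any post-hoc parameter
    change (window, threshold, rounding, data ID) changes the digest."""
    canonical = json.dumps(cert, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

cert = {
    "input_id": "sha256:<dataset-hash-placeholder>",
    "window_spec": {"calibration_days": 28, "test_days": 7},
    "threshold_policy": {"mae_deviation_code_max": 12},
    "rounding_convention": "outward, 4 decimal places",
    "moduli_set": [101, 103, 107, 109],
}
print(certificate_digest(cert))
```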
2. The Necessity of "Gravity": Anchoring Judgment, Evidence, and Verification
"Gravity" is the force required to bind the judgment (conclusion) to its evidence and verification logic, preventing explanations from diverging.
Output: The final audit conclusion.
Reason: The breakdown of evidentiary factors.
Verification: The mechanical procedure that validates the conclusion.
Without this gravitational anchoring, evaluation procedures vary across organizations and instances. This lack of rigor leads to the "infinite deferral" of verification. By establishing a "Gravity Point" through a prime-based protocol, the entire evaluation process is collapsed into a single, reproducible state.
3. Primes as "Gravity Points": The Core Mathematical Novelty
Sets of primes (coprime moduli) are ideal for this protocol because they serve as indecomposable arithmetic units [9]. Their combinations are independent of external conventions, and their selection process can be transparently recorded on an audit trail.
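A moduli set is adequate for this role when its members are pairwise coprime and their product exceeds the range of integer-encoded conclusions, so that CRT reconstruction is unique. A small sanity check, as an illustrative sketch:

```python
from math import gcd, prod

def valid_moduli_set(moduli: list[int], code_range: int) -> bool:
    """Pairwise coprimality guarantees a unique CRT reconstruction;
    the product must cover the full range of encoded conclusions."""
    pairwise_coprime = all(
        gcd(a, b) == 1
        for i, a in enumerate(moduli) for b in moduli[i + 1:]
    )
    return pairwise_coprime and prod(moduli) > code_range

assert valid_moduli_set([101, 103, 107, 109], code_range=10**8)
```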
Finitization via the Zeta-Structure: Restricting Infinite Contributions
The use of primes is not merely symbolic; it is a method to force the otherwise unbounded deferral of evidence into a stoppable, finite form. The ζ-structure (via explicit-formula systems) decouples discrete arithmetic factors from continuous spectral behavior, allowing the evaluation to be expressed as "Main Term + Error" with the scope of both fixed a priori.
The protocol enforces finitization by: (i) defining the evaluation window as a finite interval, (ii) adopting only contributions within that interval as valid evidence, and (iii) codifying out-of-interval data as "errors" in the ledger. This architecture ensures that any attempt to extend the evidence range indefinitely is flagged as a protocol violation.
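This finitization rule can be enforced mechanically. The hypothetical helper below admits only in-window samples as evidence and records everything else as a flagged error, rather than silently extending the evidence range:

```python
from datetime import date

def partition_evidence(samples: dict[date, float],
                       start: date, end: date) -> tuple[dict, list]:
    """Split samples into valid in-window evidence and flagged
    out-of-window entries recorded as protocol 'errors'."""
    evidence, flagged = {}, []
    for day, value in samples.items():
        if start <= day <= end:
            evidence[day] = value
        else:
            flagged.append({"date": day.isoformat(),
                            "error": "out_of_window"})
    return evidence, flagged
```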
4. Triple Impact of Prime Gravity on AI Safety
Fixed Explanatory Scope: Evaluation boundaries (data windows, thresholds, and rounding) are rigid, leaving no room for narrative ambiguity.
Universal Reproducibility: Third-party auditors can regenerate identical conclusions from the same raw input.
Structural Immunity to Post-Hoc Bias: Since parameters are embedded in the cryptographic certificate, the procedure cannot be altered after the results are known.
5. Comparative Analysis: Addressing the Unmet Needs of Existing Frameworks
While existing AI governance tools have made significant strides, they consistently fail to arithmetically bind the "safety judgment protocol itself."
Documentation Standards (Model Cards / Datasheets): Effective for sharing metadata but lack mechanical enforcement of result consistency [4][5].
Governance Frameworks (NIST AI RMF / ISO 42001): Provide high-level management requirements but do not offer a concrete format for "conclusion-anchoring" [1][2].
Regulatory Mandates (EU AI Act): Demand "automatically generated logs" for high-risk AI, yet these logs are not inherently linked to mathematically reproducible procedures [3].
Research Culture (NeurIPS Checklist): Promotes disclosure but does not provide a tamper-proof implementation standard [6].
Computational Integrity (Proof-of-Learning): Proves the correctness of computation but does not address the fixation of evaluation parameters [7].
The Breakthrough: Prime Gravity shifts the paradigm from "transparency as documentation" to "transparency as a verifiable arithmetic protocol." The artifact is not a document, but a verifiable ledger.
6. Implementation Layer: From Design Principles to Compliance Artifacts
Prime Gravity translates the "traceability" requirements of NIST/ISO/EU into a concrete set of ledger fields.
Minimal Mandatory Fields (The Audit Schema)
Input-ID: Cryptographic identification of the dataset (hash, retrieval logic, timestamp).
Window-Spec: Precise evaluation window (start/end points, sampling rules).
Metric-Spec: Defined metrics (formulas and aggregation logic).
Threshold-Policy: Fixed judgment rules and non-update declarations.
Rounding-Convention: Explicit rules for outward rounding and precision.
Moduli-Set: The specific sequence of coprime moduli used for encoding.
Residues: The local evaluation codes recorded for each modulus.
CRT-Result: The uniquely restored consistent integer.
Decision: The final audit conclusion (OK/NG).
Verifier-Version: Identification of the verification engine and environment.
Data Architecture: Certificate vs. Ledger
Certificate (Static): Defines the immutable protocol parameters (dataset_hash, threshold_policy, moduli_set).
Ledger (Dynamic): Records the traces of a specific execution (input_hash, residues, crt_result, decision).
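As an illustrative sketch, the split might be represented as two records; the field names follow the schema above, while the types and the exact field placement beyond the examples named are assumptions of this sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)   # immutable: the protocol parameters
class Certificate:
    dataset_hash: str
    window_spec: str
    metric_spec: str
    threshold_policy: str
    rounding_convention: str
    moduli_set: tuple[int, ...]

@dataclass                # per-execution trace
class Ledger:
    input_hash: str
    residues: tuple[int, ...]
    crt_result: int
    decision: str          # "OK" / "NG"
    verifier_version: str
```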
Verification Procedure
The verification engine consumes both the Certificate and the Ledger to perform:
Input Integrity Check: Validating data alignment.
Convention Validation: Ensuring compliance with fixed thresholds and rounding.
Residue Recalculation: Re-executing the evaluation to match recorded residues.
CRT Reconstruction: Confirming the uniqueness and consistency of the final decision.
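A minimal verifier over these two records, reusing crt_reconstruct from the earlier sketch, might look as follows; evaluate stands in for the re-executed evaluation logic and is hypothetical:

```python
def verify(cert: Certificate, ledger: Ledger, raw_input_hash: str,
           evaluate) -> bool:
    """Re-derive every recorded artifact; a post-hoc substitution of
    window, threshold, or rounding breaks at least one check."""
    # 1. Input integrity: the ledger must reference the certified dataset.
    if raw_input_hash != ledger.input_hash:
        return False
    # 2+3. Convention validation and residue recalculation: re-execute
    #      the evaluation under the certificate's fixed conventions.
    value = evaluate(cert)  # integer-encoded conclusion, recomputed
    if tuple(value % m for m in cert.moduli_set) != ledger.residues:
        return False
    # 4. CRT reconstruction: the recorded consistent integer must match.
    return crt_reconstruct(list(ledger.residues),
                           list(cert.moduli_set)) == ledger.crt_result
```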
7. Case Study: Anchoring Drift Detection and Model Degradation
Objective: To ensure that a detected regime shift (drift) in time-series data cannot be retroactively "hidden" or "selected."
Common Failure: An evaluator observes a poor detection rate and shifts the window or loosens the threshold to report a favorable "Stable" status.
The Prime Gravity Solution: The evaluation window (e.g., a 28-day calibration window followed by a 7-day test window) and the specific threshold (e.g., an MAE deviation integer code) are "burned" into the Certificate. Daily local evaluations are recorded as residues across the coprime moduli set. The final decision is locked via CRT. Any attempt to manipulate the result after the fact is impossible, as the third-party auditor can re-generate the entire judgment chain from the raw data identification. This transforms safety from a "narrative claim" into a "re-generatable audit conclusion."
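To make this concrete, here is a hypothetical end-to-end run with invented numbers (the 28-day/7-day windows and the threshold code follow the text; the MAE values and scaling are purely illustrative):

```python
from math import ceil

moduli = [101, 103, 107, 109]   # burned into the Certificate
THRESHOLD_CODE = 12             # fixed MAE-deviation integer code limit

baseline_mae = 0.0410           # 28-day calibration window (invented)
test_mae = 0.0557               # 7-day test window (invented)

# Outward rounding: the relative deviation is scaled to an integer code
# and rounded away from zero, so the code never understates the drift.
deviation_code = ceil((test_mae - baseline_mae) / baseline_mae * 100)

decision = "OK" if deviation_code <= THRESHOLD_CODE else "NG"
residues = [deviation_code % m for m in moduli]
print(deviation_code, decision, residues)   # 36 NG [36, 36, 36, 36]
```

A third-party auditor re-deriving deviation_code from the raw data must land on the same residues and, via CRT, the same consistent integer conclusion; a loosened threshold or shifted window would contradict the recorded Certificate.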
Footnotes (references)
[1] NIST, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)", NIST.AI.100-1, 2023.
[2] ISO/IEC, "ISO/IEC 42001:2023 Artificial intelligence — Management system", 2023.
[3] European Union, "Regulation (EU) 2024/1689 (Artificial Intelligence Act)", 2024.
[4] Mitchell, M. et al., "Model Cards for Model Reporting", FAccT 2019, DOI: 10.1145/3287560.3287596.
[5] Gebru, T. et al., "Datasheets for Datasets", Communications of the ACM, 2021, DOI: 10.1145/3458723.
[6] NeurIPS, "NeurIPS Paper Checklist Guidelines", (public guide).
[7] Jia, H. et al., "Proof-of-Learning: Definitions and Practice", IEEE Symposium on Security and Privacy, 2021, DOI: 10.1109/SP40001.2021.00106.
[8] Chen, B.-J. et al., "ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs," EuroSys, 2024.