
What is the Privacy Problem in Trustworthy AI? Solving via "Non-forgeable Audits" Rather Than Leakage Prevention

The core of the privacy problem in Trustworthy AI is determined not by "whether personal information can be hidden," but by "whether the handling of that information can be manipulated post-hoc." While technologies such as Differential Privacy (DP) and encryption are effective at "secrecy" (hiding data), they do not inherently guarantee accountability—the verifiable proof that operations were executed precisely as claimed. Therefore, the paradigm must shift from privacy as secrecy to privacy as accountability (non-forgeable audits).



1. Defining Privacy Countermeasures for AI: Audit Logs and Accountability

1.1 Classification of Typical "Privacy Incidents"

  • Direct Leakage: Exposure of logs, outputs, or raw training data.

  • Inferential Leakage: Estimation of personal attributes or membership inference [1].

  • Operational Leakage: Mishandling of data, expansion of privileges, or use for unauthorized purposes.

  • Deviation during Evaluation/Improvement: Adjusting models using production data or unauthorized retraining (peeking).

In fact, Shokri et al. demonstrated membership inference attacks that determine whether a specific record was part of a training set based on model outputs [1], and Zhu et al. reported attacks capable of reconstructing private training data from shared gradient information [2]. Furthermore, methods to extract fragments of training data through queries to large language models (LLMs) like GPT-2 have been empirically shown [3]. This research underscores that "data leakage is a realistic and present danger."

1.2 Codifying the Conditions for "Trustworthiness"

  • Data Provenance: What specific data was used? (Identification of input data)

  • Procedural Integrity: What was done? (Procedures for transformation, training, and evaluation)

  • Spatio-temporal Scope: When and within what range was it performed? (Period, interval, window)

  • Reproducibility: Can it be replicated by anyone? (Possibility of third-party re-testing)

  • Post-hoc Immutability: Can it be altered after the fact? (Non-forgeability of operations)

The key to trust lies in the ability to record and prove which datasets were used with which procedures, when and where they were evaluated, and to ensure this state remains immutable retroactively [4]. NIST emphasizes that "managing the provenance of training data contributes to transparency and accountability" [4]. An AI system can only be deemed "trustworthy" when these factors are strictly guaranteed.

Minimum Requirements for Audit Criteria (Definition of Fixed Responsibility)

In this report, "trust" regarding AI system privacy is defined by the following auditable pass/fail conditions:

  1. Input Identification: The referenced dataset is uniquely fixed with a persistent ID and cryptographic hash.

  2. Procedure Fixation: Evaluation boundaries (calibration vs. test), threshold policies, and evaluation procedures (metric definitions) are strictly defined a priori.

  3. Replayability: A third party can regenerate the exact same conclusion given the same input and procedure (verifiability).

  4. Post-hoc Non-Forgeability: Any post-hoc alteration must inevitably expose a discrepancy through verification (guaranteeing tamper-evidence rather than mere "resistance").

This definition moves beyond governance or checklists that merely list "best practices"; it treats the evidence as an executable artifact. While the NIST AI RMF provides a risk management framework, it lacks a specification for automatically fixing "evidence of execution" during operation. The GhostDrift/ADIC protocol is designed to bridge this critical gap.
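As a concrete illustration of condition 1 (Input Identification), the following minimal Python sketch fixes a dataset as a persistent ID plus per-file SHA-256 hashes. The file names mirror the certificate excerpt in Section 6, but the function names and manifest layout are illustrative assumptions, not the GhostDrift/ADIC schema itself.

data_manifest_fixation.py (Illustrative sketch)

import hashlib
import json
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_data_manifest(dataset_id: str, files: list[Path]) -> dict:
    """Fix the referenced dataset as a persistent ID plus per-file hashes (condition 1)."""
    return {
        "dataset_id": dataset_id,
        "allowed_files": [
            {"path": p.name, "sha256": sha256_of_file(p)} for p in sorted(files)
        ],
    }

# Illustrative usage (assumes the listed CSV files exist locally).
manifest = build_data_manifest(
    "tokyo_demand_weather_v1",
    [Path("demand_2024-01.csv"), Path("weather_tokyo_2024-01.csv")],
)
# The hash of the manifest itself becomes the immutable reference used in later verification.
manifest_hash = hashlib.sha256(
    json.dumps(manifest, sort_keys=True).encode("utf-8")
).hexdigest()
print(manifest_hash)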


2. The Gap Between Technology and Trust: Where Organizations Falter

2.1 The Operational Paradox

  • "Privacy-preserving technologies have proliferated, yet systemic trust remains stagnant. The reason is the absence of 'immutable evidence' that those protections were actually applied."

2.2 Core Discussion Points

  • Why do existing solutions (Regulation, DP, FL, TEE, etc.) fail to bridge the "trust gap"?

  • What variables must be "fixed" to complete the trust loop?

  • How can this be realized with minimal implementation overhead? (Certificate Audit Protocol)


3. The Problem: Why Existing Privacy Measures Fail the "Trustworthy AI" Test

3.1 Limits of Regulations and Policies (The Irreproducibility of Compliance)

  • Documents and checklists do not technologically constrain the actual behavior of systems in production.

  • Audits remain descriptive post-hoc explanations, leaving the actual execution logic unfixed.

For example, while policies define what an organization should do, they rarely record "what was referenced and what was executed" at the code level during operation. Consequently, even if operational logic shifts on the ground, it cannot be tracked, preventing third parties from verifying compliance.

3.2 Limits of Differential Privacy (The Pitfall of Managerial Discretion)

  • Selection of $\epsilon$ (privacy loss parameter) and noise mechanisms depends entirely on organizational discretion.

  • A "theoretical guarantee" is fundamentally decoupled from "operational verification."

Differential Privacy (DP) offers rigorous mathematical guarantees, but in practice, $\epsilon$ and sensitivity values are often heuristic. Without logs or evidence proving the specific implementation for a given dataset, the guarantee remains ambiguous at the operational level.
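To make this gap concrete, the sketch below shows one way a DP configuration could be frozen as an auditable record. The parameter names and values are illustrative assumptions, not tied to any specific DP library or to the GhostDrift/ADIC schema; the point is only that the settings become a fixed, hashable artifact rather than an undocumented choice.

dp_config_fixation.py (Illustrative sketch)

import hashlib
import json

# Illustrative DP configuration; epsilon, delta, and mechanism names are placeholders.
dp_config = {
    "mechanism": "gaussian",
    "epsilon": 1.0,
    "delta": 1e-6,
    "clipping_norm": 1.0,
    "sampling_rate": 0.01,
    "composition": "rdp_accountant",
    "random_seed": 123456,
}

# Freezing the configuration as a hash turns the theoretical guarantee into an
# auditable operational claim: any later change to epsilon or the mechanism
# produces a different hash and is exposed at verification time.
dp_config_hash = hashlib.sha256(
    json.dumps(dp_config, sort_keys=True).encode("utf-8")
).hexdigest()
print(dp_config_hash)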

3.3 Limits of Federated Learning / Enclaves

  • While these prevent data movement or visual exposure, they do not prevent "threshold gaming."

  • Retroactive adjustments to procedures after observing production results cannot be structurally prevented.

Federated Learning (FL) and Trusted Execution Environments (TEE) mitigate data transport risks but leave evaluation protocols (split criteria, threshold settings) adjustable outside the system. Furthermore, Zhu et al. [2] showed that participating nodes can still estimate private data from shared gradients, proving that simply "hiding" data does not eliminate the root cause of leakage.

3.4 Pinpointing the Root Cause

  • "Leakage is a symptom; the root cause is 'unfixed responsibility'."

Existing privacy technologies focus on the phenomenon of leakage. However, the fundamental issue is the inability to prove—after the fact—who used what data, when, and how. Therefore, countermeasures must shift from "secrecy" to "post-hoc non-forgeability."

Map of Prior Research: Achievements and Structural Limitations

This section evaluates privacy technologies through the lens of accountability rather than mere secrecy.

A. Governance/Regulation: Frameworks for Trustworthy AI

  • Achievements: NIST AI RMF provided a systemic framework for risk management. ISO/IEC 42001 established operational standards. The EU AI Act introduced institutional requirements for logging and traceability.

  • Limitations: These are largely taxonomies of "what to do" and do not provide the technical means to fix "non-forgeable evidence" (datasets, procedures, thresholds) for automated third-party re-testing.

  • GhostDrift/ADIC Breakthrough: Transforms the execution artifact (How) into a certificate, moving beyond the framework (What/Why) to turn "compliance" into a "regeneratable fact."

B. Attack Research: Proving the Reality of Leakage

  • Achievements: Demonstrated Membership Inference [1], Gradient Inversion [2], and Training Data Extraction from LLMs [3].

  • Limitations: Focuses on the "arms race" between attack and defense performance, failing to define protocols for fixing operational responsibility boundaries. It cannot resolve whether results were manipulated after-the-fact.

  • GhostDrift/ADIC Breakthrough: Fixes the reference window and evaluation protocol a priori, ensuring that any deviation is exposed as a verification mismatch in the ledger.

C. Differential Privacy (DP): Mathematical Constraints on Influence

  • Achievements: Established rigorous mathematical foundations for restricting output distribution changes.

  • Limitations (Why accountability does not close in principle):

    • DP restricts distribution but does not fix the fact that the guarantee was actually upheld (operational truth). Guarantees apply to "algorithms"; whether a specific execution followed that algorithm is a separate matter (petsymposium.org).

    • The "Accountant Hell": $\epsilon$ selection, sampling, and composition rules create a "black box" of settings that are difficult for third parties to re-test. Theoretical guarantees can become "paper guarantees" if implementation details are obscure (arXiv).

    • Since "noise is stochastic," third parties cannot regenerate the same results without procedure fixation. DP enhances secrecy, but accountability requires a layer that fixes DP settings and execution procedures as immutable evidence (journalprivacyconfidentiality.org).

  • GhostDrift/ADIC Breakthrough: Acts as a meta-audit layer that fixes all DP hyperparameters into a certificate, enabling conclusion regeneration.

D. Federated Learning (FL): Decoupling Data from Learning

  • Achievements: Matured the taxonomy of privacy attacks and defenses in decentralized learning.

  • Limitations (Why accountability does not close in principle):

    • "Local Invisibility": Since local processes are opaque, it is difficult to verify input integrity, leaving the system vulnerable to poisoning attacks. "Not moving data" is not synonymous with "protected reference boundaries" (arXiv).

    • Evaluation Gaming: Threshold and metric adjustments often occur outside the FL framework. FL does not structurally prevent "post-hoc gaming" after results are observed (ACM Digital Library).

    • FL does not answer: "Was this conclusion derived strictly from this window of data?" Replayability via fixed protocols is required (ACM Digital Library).

  • GhostDrift/ADIC Breakthrough: Fixes the end-to-end evaluation protocol (split/threshold/metric/env), turning the final output into a certificate.

E. Confidential Computing / TEE: Hardware-Level Isolation

  • Achievements: Organized the methodology for protecting "data in-use" and verifying code integrity via attestation.

  • Limitations (Why accountability does not close in principle):

    • Attestation proves the "state of the environment" but does not bind the "operational meaning." It typically does not cover whether input data belonged to an authorized set or if evaluation windows were fixed (ACM Digital Library).

    • The "Boundary of Meaning" Problem: TEEs usually protect a discrete execution segment. Accountability for dataset selection, split points, or threshold manipulation happens outside the hardware boundary, leaving key variables unconstrained (arXiv).

    • Non-forgeability is not purely a hardware problem; it is a protocol problem. Accountability requires fixing the "Reference Window + Evaluation Window + Procedure" as a certificate (arXiv).

  • GhostDrift/ADIC Breakthrough: Combines attestation with certificate audits to fix the environment, reference boundaries, and procedural integrity simultaneously.

F. Machine Unlearning: The Right to be Forgotten

  • Achievements: Systematized the methods and metrics for data deletion in trained models.

  • Limitations (Why accountability does not close in principle):

    • The Verification Gap: "How do we prove it is truly gone?" remains a major research hurdle. Standards are fragmented, and success/failure depends entirely on the chosen criteria (arXiv).

    • The Counterfactual Problem: Residual influence is difficult to quantify as zero, and success is often a "claim" rather than a "fact" (ScienceDirect).

    • Without fixed versioning (which ID was in which version), Unlearning remains a retroactive explanation. It requires immutable data manifests and fixed evaluation boundaries to become a verifiable fact (ScienceDirect).

  • GhostDrift/ADIC Breakthrough: Fixes data/evaluation versioning via certificates, turning Unlearning from a "narrative" into a "replayable fact."

G. Documentation (Model Cards / Datasheets): Establishing Transparency

  • Achievements: Frameworks to describe model performance and dataset provenance.

  • Limitations: Documents can be authored post-hoc. While they increase transparency, they do not increase the technical non-forgeability of the execution itself.

  • GhostDrift/ADIC Breakthrough: Replaces static documents with an executable ledger. It proves: "This specific data set, via this specific procedure, yields this regeneratable conclusion."

Summary of "Principled Limitations": The Necessity of a Responsibility Layer

While DP, TEE, FL, and Unlearning are powerful within their domains (secrecy, integrity, decentralization, deletion), they all fail to satisfy: (i) identification of reference windows, (ii) a priori fixation of evaluation procedures, and (iii) third-party replayability (match/mismatch verification). As long as these "unconstrained variables" exist, "compliance" remains a post-hoc narrative, and the loop of accountability (trust) remains open. Therefore, what is required is not the replacement of these technologies, but an audit protocol (certificate + ledger + verify) that bundles them to fix responsibility boundaries as non-forgeable (ACM Digital Library).

Supplement: Non-forgeable Audits are an "Established Design Pattern"

"Tamper-evident/append-only audits" are a proven design pattern in cybersecurity. A primary example is Certificate Transparency (CT), which logs TLS certificates in public, append-only logs for universal auditability. Similarly, in software supply chains, frameworks like in-toto, Sigstore, and SLSA utilize cryptographic evidence (provenance/attestation) to track "who did what." GhostDrift/ADIC ports this paradigm to AI operations to establish "compliance" as a replayable, immutable fact.


4. The Framework for Solution: Privacy as Accountability (Non-forgeable Audits)

flowchart LR
A[Privacy as Secrecy\n(Hide/Prevent Leakage)] -->|DP/Encryption/TEE/FL| B[Leakage Reduced\n(Leakage Prob. ↓)]
A2[Privacy as Accountability\n(Non-forgeable/Replayable)] --> C[Certificate + Ledger]
C --> D[Third-party Verify\n(Independent Re-test)]
D --> E{Mismatch?}
E -->|Yes| F[REJECT\n(Deviation Exposed)]
E -->|No| G[ACCEPT\n(Accountability Fixed)]

GhostDrift Research Institute proposes a framework that defines privacy as a provable, non-evasive protocol resting on three pillars:

4.1 Principles of the Solution (The Triple Set)

  • Input Identification: Fixing the dataset via cryptographic hashes.

  • Procedure Fixation: Defining model split boundaries, thresholds, and evaluation logic a priori.

  • Replayability: Issuing a "Certificate" that allows third parties to regenerate the conclusion using the same procedure.

This ensures that data, procedures, and results are unified in a logged, proven state. Only then can the fact that "this data was used correctly" become a regeneratable reality.
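A minimal sketch of how the triple set could be bound into a single artifact before any test-window result is observed is shown below. The field names mirror the certificate excerpt in Section 6.2; the assembly code itself is an illustrative assumption, not the GhostDrift/ADIC implementation.

certificate_freeze.py (Illustrative sketch)

import hashlib
import json
from datetime import datetime, timezone

def freeze_certificate(data_manifest: dict, split_spec: dict,
                       threshold_policy: dict, run_environment: dict) -> dict:
    """Bind inputs, procedure, and environment into one artifact and commit to it
    by hash before any test-window result is observed (a priori fixation)."""
    body = {
        "schema_version": "GD-ADIC-AUDIT-1.0",
        "data_manifest": data_manifest,
        "split_spec": split_spec,
        "threshold_policy": threshold_policy,
        "run_environment": run_environment,
        "frozen_at": datetime.now(timezone.utc).isoformat(),
    }
    body["certificate_hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return body

# Illustrative usage: in practice these dictionaries come from the steps in Section 5.
cert = freeze_certificate(
    data_manifest={"dataset_id": "tokyo_demand_weather_v1", "allowed_files": []},
    split_spec={"calibration": {"start": "2024-01-01", "end": "2024-03-31"},
                "test": {"start": "2024-04-01", "end": "2024-04-30"}},
    threshold_policy={"metric": "audit_score", "rule": "reject_if_score_exceeds",
                      "threshold": 0.80, "no_peeking": True},
    run_environment={"repo_commit": "git:abcd1234", "random_seed": 123456},
)
print(cert["certificate_hash"])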

4.2 What Becomes "Non-forgeable"?

  • Claims like "We didn't use that data" or "It was used for a different purpose" are eliminated as viable excuses.

  • Post-hoc adjustment of evaluation logic after observing results becomes structurally impossible.

Unauthorized data references or arbitrary threshold changes are automatically detected or rejected. This systematically guarantees accountability for data utilization and evaluation results.


5. Concrete Solution: Handling "Non-reference" and "Deviation" with Certificate + Ledger

GhostDrift Research Institute utilizes a two-tier configuration of certificates and ledgers to ensure auditability at the implementation level.

5.1 Minimal Elements in a Certificate (v1.0)

  • data_manifest: IDs of referable files, periods, missing-data policies, and feature definitions.

  • split_spec: Precise data boundaries between calibration and test phases.

  • threshold_policy: Definitions of threshold settings, fixed rules, and exception handling.

  • run_environment: Code versions, dependencies, and random seeds.

  • outputs: Decision results, core metrics, and reproduction steps.

5.2 The Form of Non-reference

  • Define "allowable references" a priori and exclude all others via audits.

  • Define privacy as "compliance with reference boundaries" rather than mere "secrecy."

By limiting analysis to pre-authorized datasets and rejecting all others, the system technically guarantees the non-referencing of unauthorized data.
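A minimal sketch of reference-boundary enforcement is shown below, under the assumption that every data read goes through a single gatekeeper function: any file whose path and hash are not listed in data_manifest.allowed_files is refused before it can be loaded. The function and exception names are illustrative.

reference_boundary.py (Illustrative sketch)

import hashlib
from pathlib import Path

class UnauthorizedReference(Exception):
    """Raised when code attempts to read data outside the certified manifest."""

def open_authorized(path: Path, data_manifest: dict) -> bytes:
    """Return file contents only if (path, sha256) matches an allowed_files entry."""
    raw = path.read_bytes()
    digest = hashlib.sha256(raw).hexdigest()
    allowed = {(e["path"], e["sha256"]) for e in data_manifest["allowed_files"]}
    if (path.name, digest) not in allowed:
        # Non-reference is enforced structurally: unlisted or altered files never load.
        raise UnauthorizedReference(f"{path.name} is outside the reference window")
    return raw

# Illustrative usage:
# raw = open_authorized(Path("demand_2024-01.csv"), cert["data_manifest"])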

5.3 Handling "Privacy Drift"

  • Evaluation Drift: Treating shifts in evaluation procedures or thresholds as "detectable events."

  • Version-controlling evaluation logic and thresholds in the same manner as training data.

GhostDrift treats shifts in evaluation criteria as "drifts" to be monitored, ensuring that deviations from the original trust agreement are instantly flagged.
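A minimal sketch of treating evaluation drift as a detectable event, assuming the certified split/threshold specification is available at run time for comparison; the function and field names are illustrative.

evaluation_drift.py (Illustrative sketch)

import hashlib
import json

def spec_hash(spec: dict) -> str:
    """Canonical hash of an evaluation spec (split boundaries, thresholds, metrics)."""
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode("utf-8")).hexdigest()

def detect_evaluation_drift(certified_spec: dict, current_spec: dict) -> dict:
    """Compare the spec in force at run time against the certified one; any
    divergence is reported as a drift event rather than silently applied."""
    certified, current = spec_hash(certified_spec), spec_hash(current_spec)
    return {
        "event": "evaluation_drift" if certified != current else "no_drift",
        "certified_spec_sha256": certified,
        "current_spec_sha256": current,
    }

# Illustrative: a threshold quietly moved from 0.80 to 0.85 is flagged, not absorbed.
print(detect_evaluation_drift(
    {"metric": "audit_score", "threshold": 0.80},
    {"metric": "audit_score", "threshold": 0.85},
)["event"])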

5.4 GhostDrift/ADIC: Responsibility-Fixed Audit Protocol v1.0

This protocol defines "non-evasiveness" as absolute verifiability. Pass/fail is determined by simple rules:

  1. Reference outside the Reference Window (data_manifest)? REJECT.

  2. Execution different from the Evaluation Procedure (split/threshold/metric)? REJECT.

  3. Results mismatch when run in the Environment (code/deps/seed)? REJECT.

  4. Third-party verify yields a perfect match? ACCEPT (Accountability Fixed).

The system is designed so that any violation of the rules inevitably leads to a discrepancy (tamper-evidence).
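These rules can be read as a small pure function. The sketch below is one possible rendering, assuming the reference, procedure, and replay checks have already been computed elsewhere and are supplied as booleans; it is not the GhostDrift/ADIC reference implementation.

verify_decision.py (Illustrative sketch)

from dataclasses import dataclass

@dataclass
class VerificationResult:
    references_within_manifest: bool     # Rule 1: no data outside data_manifest
    procedure_matches_certificate: bool  # Rule 2: split/threshold/metric as fixed
    replay_matches_outputs: bool         # Rule 3: rerun in the recorded environment agrees

def adjudicate(result: VerificationResult) -> str:
    """Apply the v1.0 pass/fail rules: any single violation yields REJECT;
    ACCEPT is issued only when the third-party replay is a perfect match."""
    if not result.references_within_manifest:
        return "REJECT: reference outside the reference window"
    if not result.procedure_matches_certificate:
        return "REJECT: execution deviates from the evaluation procedure"
    if not result.replay_matches_outputs:
        return "REJECT: replay mismatch in the recorded environment"
    return "ACCEPT: accountability fixed"

# Illustrative usage
print(adjudicate(VerificationResult(True, True, True)))   # ACCEPT
print(adjudicate(VerificationResult(True, False, True)))  # REJECT (procedure)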


6. Implementation Path: Fixing Audits with Efficiency

6.1 Minimal Introduction Steps

  • Add an audit wrapper around existing models (no algorithm change required); a minimal wrapper sketch follows this list.

  • Allow inference only in the calibration phase; restrict to evaluation in the test phase.

  • Automatically generate certificates and ledgers at runtime for third-party verification.
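A minimal sketch of the wrapper idea mentioned above, assuming the existing model exposes a predict(...) callable and that ledger entries are appended by a separate component; the interface names are illustrative, not a prescribed API.

audit_wrapper.py (Illustrative sketch)

import hashlib
import json
import time
from typing import Any, Callable

def audited(predict: Callable[..., Any], ledger_append: Callable[[dict], None],
            certificate_hash: str) -> Callable[..., Any]:
    """Wrap an existing predict function so every call is logged against the
    certificate, without changing the underlying algorithm."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        output = predict(*args, **kwargs)
        ledger_append({
            "ts": time.time(),
            "certificate_hash": certificate_hash,
            "input_sha256": hashlib.sha256(json.dumps(
                {"args": [repr(a) for a in args],
                 "kwargs": {k: repr(v) for k, v in kwargs.items()}},
                sort_keys=True).encode("utf-8")).hexdigest(),
            "output": repr(output),
        })
        return output
    return wrapper

# Illustrative usage: wrap any existing model without modifying it.
# model.predict = audited(model.predict, ledger.append, cert["certificate_hash"])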

6.2 Forms of Artifacts (Output Image)

  • certificate.json: The unified file containing data_manifest and split_spec.

  • ledger.csv: The granular log of every data input and threshold judgment.

  • verify.log: The third-party report confirming replay success.

Output Example (Mini Artifact)

certificate.json (Excerpt)

{
  "schema_version": "GD-ADIC-AUDIT-1.0",
  "data_manifest": {
    "dataset_id": "tokyo_demand_weather_v1",
    "allowed_files": [
      {"path":"demand_2024-01.csv","sha256":"..."},
      {"path":"demand_2024-02.csv","sha256":"..."},
      {"path":"weather_tokyo_2024-01.csv","sha256":"..."}
    ],
    "time_window": {"start":"2024-01-01","end":"2024-04-30"},
    "feature_spec": ["temp_prev_hour","sunshine_prev_hour","rel_humidity","demand_mw"],
    "missing_policy": "drop_rows_with_any_nan"
  },
  "split_spec": {
    "calibration": {"start":"2024-01-01","end":"2024-03-31"},
    "test": {"start":"2024-04-01","end":"2024-04-30"}
  },
  "threshold_policy": {
    "metric": "audit_score",
    "rule": "reject_if_score_exceeds",
    "threshold": 0.80,
    "no_peeking": true
  },
  "run_environment": {
    "repo_commit": "git:abcd1234",
    "python": "3.11.x",
    "deps_lock_sha256": "...",
    "random_seed": 123456
  },
  "outputs": {
    "decision": "ACCEPT",
    "score": 0.42
  }
}
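For completeness, hypothetical excerpts of the other two artifacts might look as follows; the column names and log messages are illustrative assumptions kept consistent with the certificate above, not a fixed format.

ledger.csv (Illustrative excerpt)

timestamp,event,detail
2024-05-01T09:00:00Z,data_reference,demand_2024-01.csv (sha256 match)
2024-05-01T09:00:02Z,data_reference,demand_2024-02.csv (sha256 match)
2024-05-01T09:00:10Z,threshold_decision,audit_score=0.42 <= 0.80 -> ACCEPT

verify.log (Illustrative excerpt)

[verify] certificate schema GD-ADIC-AUDIT-1.0 loaded
[verify] all referenced files match data_manifest hashes
[verify] replay in recorded environment reproduced score=0.42
[verify] RESULT: MATCH -> ACCEPT (accountability fixed)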

7. Target Sectors (Use Case Examples)

  • Healthcare, Finance, Government: Sectors where auditing is non-negotiable and strict privacy management is a value proposition.

  • B2B Model Provision: Where clients operate on a "zero-trust" basis regarding provider procedures.

  • Internal MLOps: Mitigating disputes between departments or during personnel transitions by providing a definitive "single source of truth" for execution.


8. Proactive FAQ

  • "Does this compete with encryption or DP?" → No. It acts as an orchestrating audit protocol. DP or TEE provides secrecy, while this layer provides the "proof of compliance."

  • "Is it resource-intensive?" → No. It is a peripheral recording process. Overhead is limited to logging and verification compute.

  • "What is the core innovation?" → Shifting from "protecting data" to "fixing the operational log" as an immovable responsibility boundary.


9. Conclusion

The privacy problem in Trustworthy AI cannot be solved through leakage prevention alone. The fundamental issue is "unfixed responsibility," which necessitates a non-forgeable audit protocol. The GhostDrift Research Institute approach completes privacy protection as "accountable trust" by fixing data windows, evaluation procedures, and re-testability. This technically guarantees transparency and accountability, aligning with NIST's calls for training data provenance at a deep implementation level [4].


References

  • [1] Shokri, R., et al. Membership Inference Attacks Against Machine Learning Models. IEEE Symposium on Security and Privacy (S&P), 2017.

  • [2] Zhu, L., et al. Deep Leakage from Gradients. NeurIPS, 2019.

  • [3] Carlini, N., et al. Extracting Training Data from Large Language Models. USENIX Security, 2021.

  • [4] NIST. Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1, 2023. https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

  • Dwork, C., Roth, A. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 2014.

  • ISO/IEC. ISO/IEC 42001:2023 Artificial intelligence — Management system.

  • EU AI Act (Official Journal version, June 2024).

  • Certificate Transparency (RFC 6962).

  • Sigstore: Software Signing for Everybody. ACM, 2022.



 
 
 
