An audit log is only as good as the guarantees you can make about it under adversarial review. "We log every query" is a starting point; "we can prove these records were written in order and have not been modified or deleted since" is the standard a compliance auditor or opposing counsel will actually apply. The techniques below — append-only storage, hash-chained entries, independent signing, and external notarization — turn logs from a convenient debug artifact into evidence.
For an AI document pipeline, each event should capture:
Each log entry includes the hash of the previous entry, so tampering with any record invalidates every subsequent hash. A verifier can walk the chain and detect any insertion, deletion, or modification.
    import json, hashlib, hmac, time
    from dataclasses import dataclass, asdict

    @dataclass
    class AuditEntry:
        seq: int          # monotonically increasing position in the chain
        ts: float         # wall-clock time the entry was written
        actor: str
        action: str
        subject: dict
        prev_hash: str    # SHA-256 of the previous entry, or "GENESIS"
        mac: str = ""     # filled in after the other fields are fixed

    def _canonical(e: AuditEntry) -> bytes:
        # Deterministic encoding of every field except the MAC itself.
        d = asdict(e)
        d.pop("mac")
        return json.dumps(d, sort_keys=True, separators=(",", ":")).encode()

    def append(prev: AuditEntry | None, actor: str, action: str,
               subject: dict, sign_key: bytes) -> AuditEntry:
        # Link the new entry to its predecessor, then authenticate it.
        entry = AuditEntry(
            seq=(prev.seq + 1) if prev else 1,
            ts=time.time(),
            actor=actor,
            action=action,
            subject=subject,
            prev_hash=hashlib.sha256(_canonical(prev)).hexdigest() if prev else "GENESIS",
        )
        entry.mac = hmac.new(sign_key, _canonical(entry), hashlib.sha256).hexdigest()
        return entry

    def verify_chain(entries: list[AuditEntry], sign_key: bytes) -> bool:
        prev = None
        for e in entries:
            # The stored link must match a fresh hash of the preceding entry.
            expected_prev = (hashlib.sha256(_canonical(prev)).hexdigest()
                             if prev else "GENESIS")
            if e.prev_hash != expected_prev:
                return False
            # The MAC must match; compare in constant time.
            mac = hmac.new(sign_key, _canonical(e), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(mac, e.mac):
                return False
            prev = e
        return True
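To see what walking the chain does and does not catch, here is a small self-contained sketch. It assumes a simplified hash-only chain of plain dicts (no MAC, so it only models an attacker who edits records without recomputing the links), and `build_chain` and `first_break` are hypothetical helper names introduced for illustration.

```python
import hashlib
import json
from typing import Optional

GENESIS = "GENESIS"

def entry_hash(entry: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry."""
    blob = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def build_chain(actions: list[str]) -> list[dict]:
    """Build a hash-linked list of minimal entries."""
    chain: list[dict] = []
    for seq, action in enumerate(actions, start=1):
        prev_hash = entry_hash(chain[-1]) if chain else GENESIS
        chain.append({"seq": seq, "action": action, "prev_hash": prev_hash})
    return chain

def first_break(chain: list[dict]) -> Optional[int]:
    """Index of the first entry whose link or sequence number is wrong, else None."""
    prev = None
    for i, e in enumerate(chain):
        expected_prev = entry_hash(prev) if prev else GENESIS
        expected_seq = prev["seq"] + 1 if prev else 1
        if e["prev_hash"] != expected_prev or e["seq"] != expected_seq:
            return i
        prev = e
    return None

chain = build_chain(["upload", "query", "export", "delete"])

# Modification: rewrite the second entry in place.
modified = [dict(e) for e in chain]
modified[1]["action"] = "noop"

# Deletion: drop the second entry.
deleted = chain[:1] + chain[2:]

# Insertion: splice a forged entry in after the second one.
forged = {"seq": 3, "action": "forged", "prev_hash": entry_hash(chain[1])}
inserted = chain[:2] + [forged] + chain[2:]

# Truncation: drop everything after the second entry.
truncated = chain[:2]

print(first_break(chain))      # None
print(first_break(modified))   # 2: the third entry's link no longer matches
print(first_break(deleted))    # 1
print(first_break(inserted))   # 3: the entry after the forgery breaks
print(first_break(truncated))  # None
```

The last case matters: removing entries from the tail leaves a shorter but internally consistent chain, so chain verification on its own reports nothing wrong.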
The HMAC version above is illustrative. For production, replace HMAC with an asymmetric signature (Ed25519 or ECDSA P-256) whose private key lives in an HSM-backed KMS. With HMAC, anyone able to verify the log holds the same key needed to forge it; asymmetric signing lets an auditor verify with only the public key.
To defend against silent truncation, periodically publish the chain's current head hash to an external, append-only medium:
Cadence is a trade-off: anchor every minute and at most the last minute of entries can be truncated or rewritten without detection, but anchoring cost rises; anchor daily and that undetected window grows to a day.
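A minimal sketch of anchoring and the detection window it leaves open, under stated assumptions: entries are plain dicts in a hash-only chain, the `anchors` list stands in for whatever external medium is used, and `head_anchor`, `check_anchors`, and the "anchor every second entry" cadence are hypothetical choices made for illustration.

```python
import hashlib
import json
from typing import Optional

def entry_hash(entry: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry."""
    blob = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def head_anchor(chain: list[dict]) -> dict:
    """Snapshot to publish externally: seq and hash of the newest entry."""
    head = chain[-1]
    return {"seq": head["seq"], "head_hash": entry_hash(head)}

def check_anchors(chain: list[dict], anchors: list[dict]) -> Optional[dict]:
    """Return the first published anchor the log no longer satisfies, else None."""
    by_seq = {e["seq"]: e for e in chain}
    for anchor in anchors:
        entry = by_seq.get(anchor["seq"])
        if entry is None or entry_hash(entry) != anchor["head_hash"]:
            return anchor
    return None

log: list[dict] = []
anchors: list[dict] = []  # stand-in for the external, append-only medium
for seq, action in enumerate(["upload", "query", "export", "query", "delete"], start=1):
    prev_hash = entry_hash(log[-1]) if log else "GENESIS"
    log.append({"seq": seq, "action": action, "prev_hash": prev_hash})
    if seq % 2 == 0:  # hypothetical cadence: anchor every second entry
        anchors.append(head_anchor(log))

intact = check_anchors(log, anchors)          # None: every anchor still matches
cut_deep = check_anchors(log[:3], anchors)    # the seq-4 anchor has no matching entry
cut_recent = check_anchors(log[:4], anchors)  # None: entry 5 was never anchored

print(intact, cut_deep["seq"], cut_recent)
```

Truncating past an anchored entry is caught because the published head no longer appears in the log. Losing only entry 5 goes unnoticed because it was written after the last anchor, which is exactly the window that a shorter cadence narrows.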