When you grant an LLM-driven autonomous loop control over an ML research pipeline inside a regulated environment, the agent's optimization objective can directly oppose the organization's compliance requirements. A clean state-space partition resolves the conflict: an immutable Fixed Harness surrounds a finite Mutation Budget, with CUSUM anomaly detection and dual-ledger provenance layered on top, so that compliance violations become structurally impossible rather than merely discouraged.
In early 2026, Andrej Karpathy released autoresearch, a minimal Python framework that automates the traditional ML experiment cycle (Karpathy, 2026). The idea is simple: you write your research goals and constraints into a plain-text program.md file, an AI agent reads those constraints, looks at your current train.py, proposes code edits, runs a fixed-duration training experiment (typically five minutes on a single GPU), and if the experiment improves your chosen scalar metric (say, validation loss), the change gets committed to Git. If it does not, the change is discarded. Then the loop repeats, for hours, overnight, without you.
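The cycle described above amounts to a greedy accept-or-revert loop. The sketch below is a minimal illustration of that control flow, not autoresearch's actual code: `propose_edit` and `run_experiment` are hypothetical stand-ins (the real steps would call an LLM and launch a fixed-duration training run), and the simulated loss values are random placeholders.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LoopState:
    best_loss: float
    history: list = field(default_factory=list)


def propose_edit(rng: random.Random) -> str:
    """Hypothetical stand-in for the agent proposing an edit to train.py."""
    return f"edit-{rng.randint(0, 9999)}"


def run_experiment(edit: str, rng: random.Random, current_best: float) -> float:
    """Hypothetical stand-in for a fixed-duration run; returns a validation loss."""
    return current_best + rng.uniform(-0.05, 0.05)


def research_loop(initial_loss: float, iterations: int, seed: int = 0) -> LoopState:
    rng = random.Random(seed)
    state = LoopState(best_loss=initial_loss)
    for _ in range(iterations):
        edit = propose_edit(rng)
        loss = run_experiment(edit, rng, state.best_loss)
        accepted = loss < state.best_loss  # lower validation loss is better
        if accepted:
            state.best_loss = loss  # stands in for committing the edit to Git
        # A rejected edit is simply dropped; the working tree stays unchanged.
        state.history.append((edit, loss, accepted))
    return state
```

Because an edit is kept only when it strictly improves the metric, the best loss can never get worse from one iteration to the next. That same property is why the loop needs outside guardrails: nothing in it asks whether an improvement is legitimate.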
Your role shifts from writing and tuning code to writing constraints and interpreting results. You are defining the optimization landscape and letting the agent explore it, which extends foundational AutoML architectures (Hutter et al., 2019) by delegating the iterative hypothesis generation directly to an LLM operating on a graph-based workflow. The loop works well in an open research sandbox. The architectural challenge I ran into emerges when you try to run it inside a regulated enterprise environment.
| Dimension | Traditional ML Loop | Autonomous Research Loop |
|---|---|---|
| Hypothesis generation | Human designs each experiment | LLM generates hypotheses from prior results |
| Code authorship | Human writes and edits training code | Agent proposes code mutations autonomously |
| Experiment execution | Manual run, human monitors | Fixed-duration automated runs (e.g., 5 min/GPU) |
| Evaluation | Human interprets metrics, decides next step | Agent evaluates scalar metric, commits or reverts |
| Iteration cadence | Hours to days per cycle | Minutes per cycle, hundreds overnight |
| Human role | Executor and interpreter | Constraint definer and final interpreter |
| Provenance | Manual logging (often incomplete) | Requires architectural enforcement |
Enterprise environments have PII redaction requirements, RBAC policies that physically restrict who (or what) can see individual-level data, Responsible AI fairness gates, and audit trails that regulators inspect. Now imagine you grant an LLM-driven autonomous loop unconstrained mutation power over a research pipeline in that environment. The agent’s objective function is “improve model fit.” If disabling the PII scrubber would give it access to richer features and better predictions, a sufficiently capable agent will try it. The agent is not malicious; it is doing exactly what you asked it to do. But “improve this metric” and “respect these compliance boundaries” are separate objectives, and the architecture has to enforce the second one, because the agent is optimizing only the first.
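This is the “excessive agency” problem, a term used in the OWASP Top 10 for LLM Applications, and it falls squarely within the risks that the NIST AI RMF (2023) and the EU AI Act (2024) expect organizations to manage. The architectural takeaway is that autonomous systems must be constrained at the infrastructure level, not the prompt level.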
| Risk Category | Description | Consequence |
|---|---|---|
| Compliance breach | Agent bypasses PII/RBAC gates for better signal | Regulatory violation, legal exposure |
| Hallucinated optimization | Agent optimizes metrics with no real-world validity | Scientifically meaningless conclusions |
| Data leakage | Agent engineers features that leak target information | Inflated metrics, model fails in production |
| Infinite loop | Agent iterates indefinitely on diminishing returns | Wasted compute, no convergence |
| Provenance loss | Mutations are not traceable to source data | Findings cannot be audited or reproduced |
| Individual-level exposure | Agent evaluates or prescribes at individual level | Privacy violation, ethical breach |
The answer is a clean state-space partition. You split the autonomous loop’s entire environment into two mutually exclusive regimes. The Fixed Harness contains everything the agent is structurally barred from altering: the immutable constraints of the enterprise. The Mutation Budget contains everything the agent is explicitly authorized to alter, bounded by a finite computational allowance.
Fig 1. The Fixed Harness / Mutation Budget dichotomy. The outer perimeter is immutable; the inner dashed boundary is the agent’s finite exploration sandbox.
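One way to picture the partition is as two disjoint sets of paths with a default-deny rule for anything in neither set. The sketch below is illustrative only; the file names and the `classify_edit` helper are assumptions made for this example, not part of the architecture as described.

```python
# Hypothetical file layout; the real harness would own whatever the enterprise pins.
FIXED_HARNESS = frozenset({
    "harness/pii_redactor.py",
    "harness/rbac_policy.yaml",
    "harness/fairness_gate.py",
})
MUTATION_SCOPE = frozenset({"train.py", "features.py"})

# The two regimes must be mutually exclusive, so check that once at import time.
assert FIXED_HARNESS.isdisjoint(MUTATION_SCOPE)


def classify_edit(touched_paths) -> str:
    """Classify a proposed edit by the set of paths it touches.

    "halt"    -> at least one path belongs to the Fixed Harness
    "reject"  -> a path is outside both regimes (default deny)
    "allowed" -> every path is inside the mutation scope
    """
    touched = set(touched_paths)
    if touched & FIXED_HARNESS:
        return "halt"
    if not touched <= MUTATION_SCOPE:
        return "reject"
    return "allowed"
```

With this layout, `classify_edit(["train.py"])` returns `"allowed"`, while an edit that also touches `harness/pii_redactor.py` returns `"halt"`. The default-deny branch matters: a path nobody classified is treated as off limits rather than as free territory.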
The Fixed Harness enforces three types of gates (pre-loop, mid-loop, and post-loop) that the agent cannot circumvent, and these gates exist at the infrastructure level, not the prompt level. You are not asking the agent nicely to respect compliance; you are making it structurally impossible for the agent to violate it. If the agent writes code that attempts to bypass any gate (say, disabling the PII redaction filter to access richer features), the Fixed Harness terminates the execution immediately.
```python
class FixedHarnessViolation(Exception):
    """Raised when the agent attempts to cross a Fixed Harness boundary."""


class FixedHarness:
    """Immutable enterprise compliance layer.

    The agent cannot modify, disable, or circumvent these gates.
    """

    def __init__(self, pii_redactor, fairness_evaluator):
        # Collaborators are injected by the platform, never by the agent.
        self.pii_redactor = pii_redactor
        self.fairness_evaluator = fairness_evaluator

    def pre_loop_gate(self, query, data_request):
        """Enforced BEFORE the agent accesses any data."""
        # Check scope first so a blocked query never reaches the data layer.
        if query.requests_individual_records():
            raise FixedHarnessViolation(
                "SYSTEM HALT: Agent attempted individual-level retrieval. "
                "RBAC restricts access to cohort-aggregate views only."
            )
        return self.pii_redactor.scrub(data_request)

    def mid_loop_gate(self, mutation, current_state):
        """Enforced DURING each mutation step."""
        if mutation.targets_protected_parameter():
            raise FixedHarnessViolation(
                "SYSTEM HALT: Agent attempted to modify a Fixed Harness parameter."
            )
        if mutation.introduces_individual_level_access():
            raise FixedHarnessViolation(
                "SYSTEM HALT: Mutation would breach aggregate-only data policy."
            )

    def post_loop_gate(self, model_output, fairness_thresholds):
        """Enforced AFTER the agent produces a model."""
        fairness_report = self.fairness_evaluator.assess(model_output)
        for metric, value in fairness_report.items():
            threshold = fairness_thresholds[metric]
            if value < threshold:
                raise FixedHarnessViolation(
                    "SYSTEM HALT: Model failed fairness gate. "
                    f"{metric}={value:.3f} < threshold={threshold:.3f}"
                )
        return model_output
```

Inside the Fixed Harness, the agent gets real exploratory freedom, but with a meter running. The architecture allocates a finite Mutation Budget that spans four domains: hypothesis generation, feature engineering, model architecture and hyperparameters, and evaluation metrics. Every mutation the agent executes deducts a cost from its budget, and when the budget hits zero, the loop terminates.
The cost assignments are calibrated to the risk profile of each mutation type. A hypothesis proposal is cheap (it does not alter the pipeline); a feature engineering change is expensive (it alters the data representation and is the most common vector for target leakage); an architecture change is the most expensive (it changes the model family, with broad downstream effects). The budget is the mechanism that prevents the loop from running indefinitely, and it concentrates the agent’s exploratory effort on the mutations most likely to produce useful results before the budget runs out.
| Mutation Domain | Cost (units) | Rationale |
|---|---|---|
| Hypothesis generation | 5 | Low-risk: proposes direction without altering pipeline |
| Hyperparameter tuning | 10 | Medium-risk: changes model behavior within existing architecture |
| Feature engineering | 20 | High-risk: alters data representation, potential for leakage |
| Architecture change | 25 | Highest-risk: changes model family, broad downstream effects |
| Metric proposal | 10 | Medium-risk: shifts evaluation criteria |
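The cost table also implies a hard upper bound on loop length, which is what guarantees termination. The arithmetic below is a small sketch using the costs from the table and the 100-unit total used elsewhere in this article; the helper names and the example plan are made up for illustration.

```python
COSTS = {
    "hypothesis": 5,
    "hyperparameter": 10,
    "feature": 20,
    "architecture": 25,
    "metric": 10,
}
TOTAL_BUDGET = 100


def max_mutations(total: int, costs: dict) -> int:
    """Longest possible run: spend everything on the cheapest mutation."""
    return total // min(costs.values())


def min_mutations(total: int, costs: dict) -> int:
    """Shortest run that uses the whole budget on the priciest mutation."""
    return total // max(costs.values())


def simulate(total: int, plan: list, costs: dict):
    """Walk a planned sequence and stop at the first unaffordable step."""
    remaining, executed = total, []
    for mutation_type in plan:
        if costs[mutation_type] > remaining:
            break
        remaining -= costs[mutation_type]
        executed.append(mutation_type)
    return executed, remaining


# Hypothetical plan: 5 + 20 + 10 + 25 + 20 + 10 = 90 units, so the final
# 20-unit feature change no longer fits in the remaining 10.
plan = ["hypothesis", "feature", "hyperparameter", "architecture",
        "feature", "metric", "feature"]
executed, remaining = simulate(TOTAL_BUDGET, plan, COSTS)
```

Under these costs the loop can never exceed 20 mutations (all hypotheses), and a run made only of architecture changes ends after 4. In the example plan, six mutations execute and 10 units remain when the seventh is refused.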
```python
from datetime import datetime, timezone


class BudgetExhausted(Exception):
    """Raised when a requested mutation costs more than the remaining budget."""


class MutationBudget:
    """Finite exploration allowance for the autonomous agent."""

    COSTS = {
        "hypothesis": 5,
        "hyperparameter": 10,
        "feature": 20,
        "architecture": 25,
        "metric": 10,
    }

    def __init__(self, total_budget=100):
        self.remaining = total_budget
        self.total = total_budget
        self.ledger = []

    def spend(self, mutation_type: str, description: str) -> int:
        """Deduct the cost of one mutation and log it; halt if unaffordable."""
        cost = self.COSTS[mutation_type]
        if cost > self.remaining:
            raise BudgetExhausted(
                "SYSTEM HALT: MUTATION BUDGET EXHAUSTED. "
                f"Requested {cost}, remaining {self.remaining}."
            )
        self.remaining -= cost
        self.ledger.append({
            "type": mutation_type,
            "cost": cost,
            "remaining": self.remaining,
            "description": description,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return self.remaining

    @property
    def utilization(self) -> float:
        """Fraction of the total budget spent so far (0.0 to 1.0)."""
        return (self.total - self.remaining) / self.total
```

The Fixed Harness prevents compliance violations. The Mutation Budget prevents infinite loops. But neither prevents the agent from scientifically fooling itself within its budget. Consider: the agent engineers a new polynomial feature from cohort age bands, and the model fit jumps by 15% in a single iteration. Is it real signal or a data leak? Target leakage, where a feature inadvertently encodes the target variable, is the most common source of “too good to be true” results in automated ML pipelines, and a single-point check will not catch it because the metric at that one iteration looks perfectly valid.
The architecture implements CUSUM (Cumulative Sum) control charts, a technique from statistical process monitoring introduced by E. S. Page in 1954. Unlike single-point charts that look only at the current observation, CUSUM charts accumulate deviations from a target over time, so they detect small persistent shifts that a snapshot would miss.
Fig 2. CUSUM divergence detection. At iteration 11, an anomalous spike triggers the CUSUM threshold, initiating automatic revert to the last stable state.
CUSUM Statistic
$$S_t = \max\bigl(0,\; S_{t-1} + (x_t - \mu_0 - k)\bigr)$$
where $x_t$ is the observed metric at iteration $t$, $\mu_0$ is the target value, $k$ is the allowance parameter, and the trigger fires when $S_t > h$ (the decision threshold).
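To make the accumulation concrete, here is a small worked sketch of the one-sided statistic defined above. The metric series and the parameter values (`mu_0 = 1.0`, `k = 0.5`, `h = 3.5`) are invented for illustration and are unrelated to the data behind Fig 2.

```python
def first_trigger(observations, mu_0: float, k: float, h: float):
    """Return the 1-based iteration at which S_t first exceeds h, or None."""
    s = 0.0
    for t, x in enumerate(observations, start=1):
        # Accumulate only the part of the deviation that exceeds the allowance k.
        s = max(0.0, s + (x - mu_0 - k))
        if s > h:
            return t
    return None


# Hypothetical series: four in-control points, then a sustained upward shift.
series = [1.0, 1.2, 0.8, 1.1, 2.5, 2.4, 2.6, 2.5]
trigger_at = first_trigger(series, mu_0=1.0, k=0.5, h=3.5)  # fires at iteration 8
```

No single observation in this series deviates from the target by more than 1.6, well under `h = 3.5`, so a check on individual deviations against the same threshold would stay silent. The statistic stays at zero through iteration 4, then climbs by roughly one unit per iteration once the shift begins, crossing the threshold at iteration 8.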
When the CUSUM trigger fires, two things happen: the architecture executes a mutation-trace through the experiment-tracking ledger to identify exactly which mutation caused the anomalous spike, then automatically reverts to the last known stable state, discarding the offending mutation and resetting the CUSUM accumulator. The agent can experiment freely, and the statistical monitoring system catches it when its experiments produce implausible results and rolls back the damage.
```python
class CUSUMMonitor:
    """Cumulative Sum control chart for detecting hallucinated optimization.

    Based on Page, E. S. (1954). Continuous inspection schemes. Biometrika.
    """

    def __init__(self, target_mean: float, allowance_k: float, threshold_h: float):
        self.mu_0 = target_mean
        self.k = allowance_k
        self.h = threshold_h
        self.S_upper = 0.0  # accumulates deviations above the target
        self.S_lower = 0.0  # accumulates deviations below the target

    def update(self, observed_metric: float) -> bool:
        """Returns True if CUSUM trigger fires (anomaly detected)."""
        deviation = observed_metric - self.mu_0
        self.S_upper = max(0.0, self.S_upper + (deviation - self.k))
        self.S_lower = max(0.0, self.S_lower - (deviation + self.k))
        return self.S_upper > self.h or self.S_lower > self.h

    def reset(self):
        """Reset after revert to last stable state."""
        self.S_upper = 0.0
        self.S_lower = 0.0
```

Everything described so far keeps the agent from breaking things in real time. But there is a deeper requirement: traceability after the fact. If the autonomous loop runs overnight and presents you with a research finding in the morning, you need to be able to verify exactly how that conclusion was reached.
The architecture enforces this through a principle called Blocked-Restore Truth. Every mutation the agent makes is written to two parallel ledgers: a cutover-audit ledger (what changed, when, and why) and a restore-source ledger (what the state was before the change, so it can be reconstructed). This mirrors database transaction logging (WAL plus undo log), applied to the ML experiment lifecycle.
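The two ledgers can be sketched as a pair of append-only lists that are always written together. This is a minimal illustration under simplifying assumptions: pipeline state is a plain dictionary, a mutation is a dictionary of overrides, and the `DualLedger` class and its method names are hypothetical.

```python
import copy
from datetime import datetime, timezone


class DualLedger:
    """Writes every mutation to two parallel, append-only ledgers."""

    def __init__(self):
        self.cutover_audit = []   # what changed, when, and why
        self.restore_source = []  # the state before the change

    def record(self, state: dict, mutation: dict, reason: str) -> dict:
        """Snapshot `state`, log the mutation, and return the new state."""
        entry_id = len(self.cutover_audit)
        before = copy.deepcopy(state)  # independent copy for later restore
        self.restore_source.append({"id": entry_id, "snapshot": before})
        self.cutover_audit.append({
            "id": entry_id,
            "changed": copy.deepcopy(mutation),
            "reason": reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        # The caller's `state` is left untouched; a new dict is returned.
        return {**copy.deepcopy(state), **mutation}

    def restore(self, entry_id: int) -> dict:
        """Reconstruct the state as it was before mutation `entry_id`."""
        return copy.deepcopy(self.restore_source[entry_id]["snapshot"])
```

Writing both entries inside one method is the point of the design: there is no code path that updates the audit ledger without also capturing a restore snapshot, so the two ledgers cannot drift apart through normal use.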
Fig 3. Blocked-Restore Truth provenance chain. The verifier walks backward through the mutation ledger. If any link is missing, the finding is discarded.
| Condition | Verdict | Action |
|---|---|---|
| Complete chain: every mutation traced to pinned source tree | PASS | Finding accepted into production pipeline |
| Gap in mutation chain: undocumented step found | BLOCK | Finding discarded, pipeline reverts |
| Missing checkpoint: restore-source snapshot absent | BLOCK | Finding discarded, pipeline reverts |
| Source tree not pinned: no authoritative revision hash | BLOCK | Finding discarded, pipeline reverts |
| Cutover-audit entry missing for any mutation | BLOCK | Finding discarded, pipeline reverts |
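The verdict table can be read as a backward walk with four ways to fail. The sketch below assumes each cutover-audit entry carries an `id` and a `parent` (the previous mutation's id, or `None` for the first mutation on top of the pinned source tree); those field names and the `verify_chain` function are assumptions made for this example.

```python
def verify_chain(head_id, audit, restore, pinned_revision):
    """Walk the mutation chain backward from `head_id`.

    Returns ("PASS", None) or ("BLOCK", reason).
    """
    if not pinned_revision:
        return "BLOCK", "source tree not pinned"

    audit_by_id = {entry["id"]: entry for entry in audit}
    snapshot_ids = {entry["id"] for entry in restore}

    # A snapshot with no matching audit entry means a mutation went unlogged.
    if snapshot_ids - set(audit_by_id):
        return "BLOCK", "cutover-audit entry missing"

    current, seen = head_id, set()
    while current is not None:
        if current in seen or current not in audit_by_id:
            return "BLOCK", "gap in mutation chain"
        if current not in snapshot_ids:
            return "BLOCK", "missing checkpoint"
        seen.add(current)
        current = audit_by_id[current]["parent"]

    # Reached the root mutation, which sits on the pinned source tree.
    return "PASS", None
```

Every branch other than a complete walk to the root returns `BLOCK`, which matches the conservative default in the table: the verifier has to prove the chain is intact, and any doubt counts against the finding.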
The conservative default (block and restore when provenance cannot be verified) is more expensive in the short term, because it discards findings that might have been legitimate, but it means every finding that does survive the chain is fully reproducible.
The harness, the budget, and the CUSUM monitor each handle a different failure mode, but they share a design constraint: the autonomous research agent is strictly an analytical engine whose output is bounded to cohort-aggregate statistical supplements, and that boundary is enforced by the Fixed Harness at the infrastructure level. The agent generates cohort-level evidence; the human exercises judgment on what it means and what to do about it. The EU AI Act (2024) and the NIST AI Risk Management Framework codify this separation for autonomous systems in regulated domains, and the architecture described here is one way to make the separation structural rather than advisory.
| Capability | Agent Authority | Human Authority |
|---|---|---|
| Generate cohort-level statistical summaries | Yes | Reviews and validates |
| Identify aggregate trends and patterns | Yes | Interprets business implications |
| Evaluate individual customer records | Blocked | Exclusive authority |
| Emit prescriptive decisions for individuals | Blocked | Exclusive authority |
| Recommend actions for specific entities | Blocked | Exclusive authority |
| Apply findings to business decisions | Blocked | Exclusive authority |
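The authority table translates naturally into a default-deny lookup. The sketch below is one hypothetical encoding; the capability keys, the `Authority` enum, and `authorize_agent` are invented names, and a production system would enforce this in the harness rather than in code the agent can reach.

```python
from enum import Enum


class Authority(Enum):
    AGENT = "agent"            # agent may perform; human reviews
    HUMAN_ONLY = "human_only"  # blocked for the agent


class AuthorityViolation(Exception):
    """Raised when the agent requests a capability reserved for humans."""


# Hypothetical keys mirroring the rows of the table above.
CAPABILITIES = {
    "cohort_summary": Authority.AGENT,
    "aggregate_trends": Authority.AGENT,
    "evaluate_individual_record": Authority.HUMAN_ONLY,
    "prescribe_for_individual": Authority.HUMAN_ONLY,
    "recommend_for_entity": Authority.HUMAN_ONLY,
    "apply_to_business_decision": Authority.HUMAN_ONLY,
}


def authorize_agent(capability: str) -> bool:
    """Allow only capabilities explicitly granted to the agent."""
    # Unknown capabilities fall through to HUMAN_ONLY (default deny).
    granted = CAPABILITIES.get(capability, Authority.HUMAN_ONLY)
    if granted is not Authority.AGENT:
        raise AuthorityViolation(f"Agent is not authorized for: {capability}")
    return True
```

Treating unlisted capabilities as human-only keeps the boundary safe when someone adds a new action and forgets to classify it.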