Constraint Decay: Why Your AI Agent Forgets the Rules (and What to Do About It)

AI coding agents are getting scary good at writing functional code. Give them a loose description and they’ll spin up a working API endpoint in seconds. But there’s a growing problem hiding beneath the impressive demos: the more constraints you pile on — architectural patterns, ORM rules, security policies — the faster everything falls apart. Two independent research papers published in April and May 2026 put numbers on this phenomenon, and they’re sobering.

The Problem: Constraints Don’t Stick

The issue has a name now: constraint decay. It describes the progressive failure of LLM agents to maintain adherence to specified requirements as complexity accumulates. This isn’t about models being dumb — it’s about a fundamental limitation in how transformer-based architectures handle long, multi-layered instruction sets.

Two teams arrived at strikingly similar conclusions from completely different angles. One studied code generation. The other studied security compliance. Together, they paint a coherent picture of a problem that every team deploying AI agents in production needs to understand.

Study 1: Structural Constraints in Code Generation

A team at Eurecom led by Francesco Dente, Dario Satriani, and Paolo Papotti ran a systematic evaluation of how well AI agents handle structural constraints in multi-file backend code generation (arXiv:2605.06445). They fixed a unified API contract across 80 greenfield generation tasks and 20 feature-implementation tasks, spanning eight web frameworks including Flask, FastAPI, Django, Express, Spring Boot, and others.

The methodology was clever: by fixing the API contract (the functional requirement), they isolated the effect of structural complexity. They evaluated with both end-to-end behavioral tests and static verifiers, catching both runtime failures and architectural violations.

The results showed a clear pattern of constraint decay:

  • Capable configurations lost 30 points on average in assertion pass rates when moving from baseline (minimal constraints) to fully specified tasks (full architectural patterns, ORM mappings, database schemas).
  • Weaker configurations approached zero — they essentially couldn’t handle structural requirements at all.
  • Framework choice mattered enormously. Agents performed well on minimal, explicit frameworks like Flask but substantially worse on convention-heavy environments like FastAPI and Django.
  • Data-layer defects dominated. Incorrect query composition and ORM runtime violations were the leading root causes of failure.

The framework sensitivity finding is particularly telling. Flask is explicit — you wire up routes, import what you need, and the structure is visible. Django and FastAPI rely on conventions, decorators, and implicit wiring. When agents can’t “see” the constraint in the code itself, they lose track of it.

Study 2: Security Constraints in Conversational Agents

Separately, researcher Yeran Gamage ran a 4,416-trial study across 12 models and 8 providers at six different conversation depths, testing how well LLM agents maintain security policies over extended interactions.

The findings revealed an asymmetry the paper terms Security-Recall Divergence (SRD):

  • Omission constraints (things the agent must NOT do — don’t reveal credentials, don’t forward user data) fell from 73% compliance at turn 5 to 33% at turn 16.
  • Commission constraints (things the agent MUST do — include audit logs, format responses correctly) held steady at 100% compliance.
  • In models with token-matched padding controls, schema semantic content accounted for 62–100% of the dilution effect.

In plain terms: agents are great at doing what you explicitly ask, but terrible at remembering what you told them not to do. And the longer the conversation runs, the worse it gets. Worse, the monitoring signals look healthy because commission compliance stays perfect — the omission failures are invisible to standard dashboards.
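
To make the dashboard problem concrete, here is a minimal sketch of scoring the two constraint kinds separately. The `ConstraintCheck` record and its field names are assumptions made for illustration; they are not the study's data format, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintCheck:
    """One evaluated constraint in one agent turn (hypothetical log record)."""
    turn: int
    kind: str        # "omission" (must NOT do) or "commission" (MUST do)
    compliant: bool


def compliance_by_kind(checks: list[ConstraintCheck]) -> dict[str, float]:
    """Return the fraction of compliant checks for each constraint kind."""
    totals: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for check in checks:
        totals[check.kind] += 1
        if check.compliant:
            passed[check.kind] += 1
    return {kind: passed[kind] / totals[kind] for kind in totals}


# Invented sample: commission checks all pass, one omission check fails.
checks = [
    ConstraintCheck(turn=16, kind="commission", compliant=True),
    ConstraintCheck(turn=16, kind="commission", compliant=True),
    ConstraintCheck(turn=16, kind="omission", compliant=True),
    ConstraintCheck(turn=16, kind="omission", compliant=False),
]
rates = compliance_by_kind(checks)
blended = sum(c.compliant for c in checks) / len(checks)
print(rates, blended)
```

A single blended number averages the omission failures away, while the per-kind rates keep them visible.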

Why This Happens

Both papers point to the same underlying mechanism. Transformer attention distributes across the full context window, but as context grows, earlier instructions receive less weight relative to recent tokens. Prohibition-style constraints (“never do X”) are particularly vulnerable because they only activate when the model approaches a violation — there’s no positive reinforcement loop keeping them “warm” in the attention pattern.
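
A toy calculation shows the dilution intuition. This is not either paper's analysis: it assumes a single softmax in which one instruction token has a fixed score and every other token scores 0, which is far simpler than a real multi-head, multi-layer model. The function name and the score of 5.0 are illustrative choices.

```python
import math


def instruction_attention_weight(instruction_score: float, n_other_tokens: int) -> float:
    """Softmax weight on one instruction token when every other token scores 0."""
    instruction_term = math.exp(instruction_score)
    # Each of the other tokens contributes exp(0) = 1 to the denominator.
    return instruction_term / (instruction_term + n_other_tokens)


# The instruction's score never changes; only the amount of other context does.
for n in (100, 1_000, 10_000):
    weight = instruction_attention_weight(instruction_score=5.0, n_other_tokens=n)
    print(f"{n:>6} other tokens -> weight {weight:.4f}")
```

Under these assumptions the instruction's share shrinks as context grows, even though nothing about the instruction itself has changed.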

In the code generation case, the equivalent is architectural constraints that don’t produce visible code artifacts. An ORM mapping convention doesn’t show up in every file — it’s a background rule that only matters at specific moments. Those moments are exactly when the agent has forgotten.

Practical Mitigations

Both papers tested mitigation strategies. Here’s what actually worked:

1. Constraint Re-Injection

The security study found that re-injecting constraints before a per-model Safe Turn Depth (STD) — the conversation length at which compliance starts degrading — restores compliance without retraining. This is a practical, model-agnostic technique:

# Simplified constraint re-injection pattern
SAFE_TURN_DEPTH = 10  # Model-specific; measure empirically
REINJECT_EVERY = SAFE_TURN_DEPTH - 1  # Remind one turn before decay is expected


class ConstrainedAgent:
    def __init__(self, constraints: list[str]):
        self.constraints = constraints
        self.turn_count = 0

    def chat(self, user_message: str) -> str:
        self.turn_count += 1

        # Re-inject on turns 9, 18, ... so a reminder always lands
        # before the conversation reaches the safe turn depth.
        reinforce = self.turn_count % REINJECT_EVERY == 0
        system_prompt = self._build_system_prompt(reinforce=reinforce)
        return self._call_llm(system_prompt, user_message)

    def _build_system_prompt(self, reinforce: bool) -> str:
        base = "You are a helpful coding assistant."
        if reinforce:
            constraint_block = "\n".join(
                f"CRITICAL RULE: {c}" for c in self.constraints
            )
            return f"{base}\n\nREMINDER: These rules are NON-NEGOTIABLE:\n{constraint_block}"
        return base

    def _call_llm(self, system_prompt: str, user_message: str) -> str:
        # Placeholder: wire this to your model provider's chat API.
        raise NotImplementedError

The key insight: don’t wait for failure. Preemptively remind the model of critical constraints before they decay below the compliance threshold.
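
Measuring the threshold empirically can be as simple as the sketch below. It assumes you have already run trials at several conversation depths and recorded a compliance rate for each; the function name, the 0.90 threshold, and the sample numbers are all illustrative and do not come from the study.

```python
def estimate_safe_turn_depth(
    compliance_by_depth: dict[int, float], threshold: float
) -> int:
    """Return the deepest measured turn before compliance first drops below threshold.

    Returns 0 if even the shallowest measured depth is below the threshold.
    """
    safe_depth = 0
    for depth in sorted(compliance_by_depth):
        if compliance_by_depth[depth] < threshold:
            break
        safe_depth = depth
    return safe_depth


# Made-up measurements for illustration only; not taken from the study.
measured = {2: 0.98, 5: 0.95, 8: 0.91, 12: 0.70, 16: 0.40}
safe_depth = estimate_safe_turn_depth(measured, threshold=0.90)
print(safe_depth)  # 8
```

The result is only as fine-grained as the depths you sampled, so it is worth measuring extra depths around the point where compliance starts to fall.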

2. Prefer Explicit Frameworks for Agent-Generated Code

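The code generation study found that agents perform significantly better on Flask than on Django or FastAPI for constrained tasks. When you know code will be AI-generated, choose frameworks where constraints are explicit rather than conventional. The same principle applies at the data layer, where the study found most defects. This Go example contrasts an implicit ORM model with an explicit query: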

// Instead of relying on ORM conventions that agents lose track of,
// write the SQL directly so every constraint is visible in the code.
package users

import (
    "database/sql"
    "fmt"
    "time"
)

// User mirrors the columns selected below.
type User struct {
    ID        int
    Name      string
    Email     string
    CreatedAt time.Time
}

// Harder for agents: conventions are implicit. Embedding gorm.Model
// silently adds ID, timestamps, and soft-delete filtering.
// type User struct {
//     gorm.Model
//     Name string `gorm:"not null"`
// }

// Easier for agents: constraints are explicit in the code.
// The soft-delete rule is spelled out in the WHERE clause.
func GetUserByID(db *sql.DB, id int) (*User, error) {
    const query = `
        SELECT id, name, email, created_at
        FROM users
        WHERE id = $1 AND deleted_at IS NULL
    `
    var u User
    err := db.QueryRow(query, id).Scan(&u.ID, &u.Name, &u.Email, &u.CreatedAt)
    if err != nil {
        return nil, fmt.Errorf("query user %d: %w", id, err)
    }
    return &u, nil
}

3. Dual Evaluation: Behavioral + Static

The code generation study’s dual evaluation approach (end-to-end behavioral tests + static verifiers) is worth adopting for any AI-generated code. Unit tests catch functional bugs, but they miss structural violations — wrong ORM patterns, missing middleware, incorrect dependency injection. Static analysis tools fill that gap:

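// Static verifier example: check that agent-generated handlers
// mention the required middleware.
package main

import (
    "bytes"
    "log"
    "os"
)

// VerifyMiddleware returns the required names that do not appear in the file.
// This is a plain substring check, so it also matches names in comments.
func VerifyMiddleware(handlerFile string, required []string) ([]string, error) {
    content, err := os.ReadFile(handlerFile)
    if err != nil {
        return nil, err
    }
    var missing []string
    for _, mw := range required {
        if !bytes.Contains(content, []byte(mw)) {
            missing = append(missing, mw)
        }
    }
    return missing, nil
}

// Run after every agent generation step
func main() {
    required := []string{"AuthMiddleware", "RateLimitMiddleware", "AuditLogger"}
    missing, err := VerifyMiddleware("handlers.go", required)
    if err != nil {
        log.Fatalf("read handlers: %v", err)
    }
    if len(missing) > 0 {
        log.Printf("Constraint decay detected, missing: %v", missing)
        // Re-inject constraint and retry
    }
}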
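
A substring check like the one above will accept a middleware name that appears only in a comment. Where the generated code is Python, a syntax-tree check is one way to tighten this. The sketch below uses the standard-library `ast` module; the `handle_` naming convention and the decorator names `require_auth` and `audit_log` are hypothetical stand-ins for whatever your project actually requires.

```python
import ast


def missing_decorators(source: str, required: set[str]) -> dict[str, set[str]]:
    """Map each handler function name to the required decorators it lacks."""
    violations: dict[str, set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if not node.name.startswith("handle_"):
            continue
        present = set()
        for dec in node.decorator_list:
            # Handle both @name and @name(...) forms, plus @module.name.
            target = dec.func if isinstance(dec, ast.Call) else dec
            if isinstance(target, ast.Name):
                present.add(target.id)
            elif isinstance(target, ast.Attribute):
                present.add(target.attr)
        missing = required - present
        if missing:
            violations[node.name] = missing
    return violations


generated = '''
@require_auth
@audit_log
def handle_get_user(request):
    return {"ok": True}

@require_auth
def handle_delete_user(request):
    # audit_log appears here only in a comment
    return {"ok": True}
'''

violations = missing_decorators(generated, {"require_auth", "audit_log"})
print(violations)
```

Because the check walks the parsed tree, the comment mentioning `audit_log` does not count, and the second handler is flagged.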

The Bigger Picture: Token-Level Credit Assignment

Related research is starting to address a neighboring problem at the training level. A May 2026 paper (arXiv:2605.21467) introduces DelTA (Discriminative Token Credit Assignment), which targets reinforcement learning with verifiable rewards (RLVR). The core insight: standard RLVR training treats all tokens equally when computing reward gradients, but formatting tokens (whitespace, brackets) dominate the gradient signal, drowning out the tokens that actually matter for reasoning quality.

DelTA estimates per-token coefficients that amplify discriminative token gradients and downweight shared or weakly-discriminative ones. On seven math benchmarks, it improves performance by 3.26 points on Qwen3-8B and 2.62 points on Qwen3-14B. DelTA targets training rather than inference, so the link to constraint decay is suggestive rather than established: a model that is not trained to separate important tokens from noise has little reason to do so at deployment.

What This Means for Your Team

  • Don’t trust agent compliance in long conversations. Measure your model’s Safe Turn Depth empirically and re-inject constraints before it.
  • Prefer explicit over convention-based patterns in code that agents generate. Flask over Django. Direct SQL over heavy ORM abstractions.
  • Add static verification alongside functional tests for AI-generated code. Unit tests won’t catch architectural violations.
  • Monitor omission compliance separately from commission compliance. Standard metrics that look green can mask silent policy violations.

Constraint decay isn’t a reason to stop using AI agents — it’s a reason to build proper guardrails. The research gives us clear, actionable patterns: re-inject early, verify structurally, and don’t assume that because an agent did something correctly once, it’ll keep doing it. The models are powerful. The constraints are fragile. Plan accordingly.
