Five Tasks, Two Modes, One Model: A Controlled Experiment
Claims about AI-assisted development are easy to make, hard to verify. So we ran a small controlled experiment: 5 identical tasks, 2 modes (vanilla Express vs SysMARA), 1 model (Claude), and one metric — how many business-rule violations did the generated code contain?
Every command, output, and code snippet in this post was captured from a real session. You can reproduce it yourself.
The Setup
Domain: An e-commerce backend with two modules: inventory (products, stock) and orders (order placement, cancellation).
Business rules (invariants):
- stock_cannot_be_negative — product stock must never go below zero after any operation
- price_must_be_positive — product price must be greater than zero
- order_quantity_must_be_positive — order quantity must be at least 1
Policy:
- only_active_products_orderable — orders can only be placed for active products, not discontinued ones
Module boundary:
- inventory must not depend on orders (forbidden dependency)
- orders may depend on inventory (allowed dependency)
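The module boundary can be expressed as data and checked mechanically. The following is a minimal sketch of that idea, not SysMARA's implementation: the `DependencyRule` shape and the `isDependencyAllowed` helper are hypothetical names introduced for illustration, and denying unlisted module pairs is our assumption.

```typescript
// Hypothetical rule shape; this post does not show SysMARA's module spec format.
type ModuleName = "inventory" | "orders";

interface DependencyRule {
  from: ModuleName;
  to: ModuleName;
  allowed: boolean;
}

// The two rules stated in the setup above.
const dependencyRules: DependencyRule[] = [
  { from: "inventory", to: "orders", allowed: false },
  { from: "orders", to: "inventory", allowed: true },
];

// Returns true when module `from` may import from module `to`.
// A module may always depend on itself; pairs with no rule are denied (assumption).
function isDependencyAllowed(from: ModuleName, to: ModuleName): boolean {
  if (from === to) return true;
  const matching = dependencyRules.filter((r) => r.from === from && r.to === to);
  return matching.length > 0 && matching.every((r) => r.allowed);
}
```

A check like this could run over the import statements of each module at build time, so that an `inventory` file importing from `orders` fails the build instead of relying on review.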
The 5 Tasks
| # | Task | Key constraint |
|---|---|---|
| T1 | Add a product to the catalog | price_must_be_positive |
| T2 | Create an order for a product | stock_cannot_be_negative, only_active_products_orderable |
| T3 | Cancel an order and restore stock | stock_cannot_be_negative |
| T4 | Update stock level directly | stock_cannot_be_negative |
| T5 | List all orders (read-only) | No constraint (control task) |
Mode A: Vanilla Express Prompt
We gave Claude a single prompt:
Build an Express.js REST API with these endpoints:
- POST /products — add a product (name, price, stock, status)
- POST /orders — create an order (product_id, quantity)
- POST /orders/:id/cancel — cancel an order, restore stock
- PATCH /products/:id/stock — update stock level
- GET /orders — list all orders
Use an in-memory store. Include proper error handling.
Business rules:
- Price must be positive
- Stock cannot go negative
- Quantity must be at least 1
- Only active products can be ordered
Claude generated a working Express app: 147 lines, clean structure, immediate functionality. Then we audited the generated code against our 3 invariants + 1 policy.
Mode A Audit Results
| Task | Constraint | Status | Notes |
|---|---|---|---|
| T1: add_product | price_must_be_positive | PASS | Checked price > 0 at the top of the handler |
| T2: create_order | only_active_products_orderable | VIOLATION | No check for product.status === 'active' — discontinued products are orderable |
| T2: create_order | stock_cannot_be_negative | PARTIAL | Checked stock >= quantity but did not enforce atomicity — two concurrent orders can overdraw |
| T3: cancel_order | stock_cannot_be_negative | PASS | Stock restoration was correct (additive, cannot go negative) |
| T4: update_stock | stock_cannot_be_negative | VIOLATION | Accepted any integer — no check preventing negative stock values |
| T5: list_orders | (none) | PASS | Read-only, no constraint applies |
Mode A violation rate: 2 full violations + 1 partial out of 5 constraint checkpoints. That is 40% if the partial is ignored, 60% if it is counted as a full violation, and 50% if it is counted as half.
The critical issue: the only_active_products_orderable policy was stated in the prompt but never
implemented. Claude acknowledged it in a comment but did not write the guard. The
update_stock endpoint accepted { "stock": -50 } without complaint.
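For reference, here is roughly what the two missing guards could look like. This is a hypothetical sketch written for this post, not the audited 147-line app: the `Product` shape, the `Result` type, and the function names are our assumptions, and the logic is kept free of Express so it can be run on its own.

```typescript
// Illustrative only: names and shapes are assumptions, not the audited code.
interface Product {
  id: string;
  name: string;
  price: number;
  stock: number;
  status: "active" | "discontinued";
}

type Result = { ok: true } | { ok: false; error: string };

// In-memory store, as requested in the Mode A prompt.
const products = new Map<string, Product>();

// T2: includes the status guard that Mode A left out.
function createOrder(productId: string, quantity: number): Result {
  const product = products.get(productId);
  if (!product) return { ok: false, error: "product_not_found" };
  if (!Number.isInteger(quantity) || quantity < 1) {
    return { ok: false, error: "order_quantity_must_be_positive" };
  }
  if (product.status !== "active") {
    return { ok: false, error: "only_active_products_orderable" };
  }
  if (product.stock < quantity) {
    return { ok: false, error: "stock_cannot_be_negative" };
  }
  // The check and the decrement run synchronously, with no await in between.
  product.stock -= quantity;
  return { ok: true };
}

// T4: rejects negative (and non-integer) stock values such as -50.
function updateStock(productId: string, stock: number): Result {
  const product = products.get(productId);
  if (!product) return { ok: false, error: "product_not_found" };
  if (!Number.isInteger(stock) || stock < 0) {
    return { ok: false, error: "stock_cannot_be_negative" };
  }
  product.stock = stock;
  return { ok: true };
}
```

Each guard is only a few lines. The point of the audit is not that they are hard to write, but that nothing in Mode A forces them to exist.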
Mode B: SysMARA Specs
Same domain, same 5 tasks. But this time we started by defining the system formally:
# Step 1: Initialize project
npx @sysmara/core init --db sqlite --orm sysmara-orm
# Step 2: Define specs (entities, capabilities, invariants, policies, modules)
# ... (YAML files — see below)
# Step 3: Build
npx sysmara build
The specs we wrote
system/entities.yaml (2 entities across 2 modules):
entities:
  - name: product
    module: inventory
    description: A product in the catalog
    fields:
      - name: id
        type: uuid
        required: true
      - name: name
        type: string
        required: true
      - name: price
        type: number
        required: true
      - name: stock
        type: number
        required: true
      - name: status
        type: enum
        required: true
    invariants:
      - stock_cannot_be_negative
      - price_must_be_positive
  - name: order
    module: orders
    description: A customer order
    fields:
      - name: id
        type: uuid
        required: true
      - name: product_id
        type: uuid
        required: true
      - name: quantity
        type: number
        required: true
      - name: total_price
        type: number
        required: true
      - name: status
        type: enum
        required: true
      - name: created_at
        type: date
        required: true
    invariants:
      - order_quantity_must_be_positive
system/invariants.yaml (3 named invariants):
invariants:
  - name: stock_cannot_be_negative
    description: Product stock must never go below zero after any operation
    entity: product
    rule: product.stock must be >= 0 after any update
    severity: error
    enforcement: runtime
  - name: price_must_be_positive
    description: Product price must be greater than zero
    entity: product
    rule: product.price must be > 0
    severity: error
    enforcement: runtime
  - name: order_quantity_must_be_positive
    description: Order quantity must be at least 1
    entity: order
    rule: order.quantity must be >= 1
    severity: error
    enforcement: runtime
system/policies.yaml (1 policy):
policies:
  - name: only_active_products_orderable
    description: Orders can only be placed for active products, not discontinued ones
    actor: any
    effect: deny
    conditions:
      - field: product.status
        operator: eq
        value: discontinued
    capabilities:
      - create_order
Build output (real)
════════════════════════════════════════════════════════════
SysMARA Build
════════════════════════════════════════════════════════════
Parsing specs...
[INFO] Found 2 entities, 5 capabilities, 1 policies, 3 invariants, 2 modules, 1 flows
Cross-validating...
[INFO] No cross-validation issues.
Building system graph...
[INFO] system-graph.json (14 nodes, 21 edges)
Building system map...
[INFO] system-map.json (2 modules)
Compiling capabilities...
[INFO] Generated 15 file(s)
Scaffolding app/ stubs...
[INFO] Scaffold: 13 written, 0 skipped (already exist)
Running diagnostics...
[OK] Build completed successfully.
What SysMARA generated for create_order
The scaffold for the create_order capability — generated automatically from specs:
// SCAFFOLD: capability:create_order
// Edit Zone: editable — generated once, safe to modify
import { enforceOnlyActiveProductsOrderable }
  from '../policies/only_active_products_orderable.js';
import { validateStockCannotBeNegative }
  from '../invariants/stock_cannot_be_negative.js';
import { validateOrderQuantityMustBePositive }
  from '../invariants/order_quantity_must_be_positive.js';
export async function handleCreateOrder(ctx) {
  const input = ctx.body;
  // Policy gate — generated from specs
  if (!enforceOnlyActiveProductsOrderable(ctx.actor)) {
    throw new Error('Policy violation: only_active_products_orderable');
  }
  const repo = orm.repository('order', 'create_order');
  const result = await repo.create(input);
  // Invariant checks — generated from specs
  const stockViolation = validateStockCannotBeNegative(result);
  if (stockViolation) {
    throw new Error(`Invariant violation: ${stockViolation.message}`);
  }
  const qtyViolation = validateOrderQuantityMustBePositive(result);
  if (qtyViolation) {
    throw new Error(`Invariant violation: ${qtyViolation.message}`);
  }
  return result;
}
The critical difference: the policy check and both invariant validations are structurally present in the generated code. They exist because the YAML spec declares them, not because the AI "remembered" to add them.
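The generated validators are stubs, so someone still has to write the rule itself. Below is a sketch of how those stubs might be filled in, under stated assumptions: the `Violation`, `ProductView`, and `OrderView` types are hypothetical, and each function here takes the entity it constrains, whereas the scaffold above passes the repository result and `ctx.actor`. The only behaviour taken from the scaffold is the contract: a validator returns a falsy value when the invariant holds and an object with a `message` when it does not, and the policy enforcer returns true when the operation may proceed.

```typescript
// Assumed shape of a violation; the scaffold only relies on `message`.
interface Violation {
  invariant: string;
  message: string;
}

// Hypothetical minimal views of the entities in system/entities.yaml.
interface ProductView {
  price: number;
  stock: number;
  status: string;
}

interface OrderView {
  quantity: number;
}

// Rule: product.stock must be >= 0 after any update.
function validateStockCannotBeNegative(product: ProductView): Violation | null {
  if (product.stock >= 0) return null;
  return {
    invariant: "stock_cannot_be_negative",
    message: `stock is ${product.stock}, must be >= 0`,
  };
}

// Rule: product.price must be > 0.
function validatePriceMustBePositive(product: ProductView): Violation | null {
  if (product.price > 0) return null;
  return {
    invariant: "price_must_be_positive",
    message: `price is ${product.price}, must be > 0`,
  };
}

// Rule: order.quantity must be >= 1.
function validateOrderQuantityMustBePositive(order: OrderView): Violation | null {
  if (order.quantity >= 1) return null;
  return {
    invariant: "order_quantity_must_be_positive",
    message: `quantity is ${order.quantity}, must be >= 1`,
  };
}

// Deny rule from system/policies.yaml: product.status eq discontinued.
// Returns true when the order may proceed.
function enforceOnlyActiveProductsOrderable(product: ProductView): boolean {
  return product.status !== "discontinued";
}
```

Note that the policy sketch mirrors the YAML condition literally (deny when the status is `discontinued`). If the status enum had values other than `active` and `discontinued`, an allow-list on `active` would match the policy's name more closely.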
Mode B Audit Results
| Task | Constraint | Status | Notes |
|---|---|---|---|
| T1: add_product | price_must_be_positive | PASS | Invariant validator generated and wired into handler |
| T2: create_order | only_active_products_orderable | PASS | Policy enforcer generated and called before business logic |
| T2: create_order | stock_cannot_be_negative | PASS | Invariant validator generated and checked post-operation |
| T3: cancel_order | stock_cannot_be_negative | PASS | Handler generated with no invariant (correct — additive operation) |
| T4: update_stock | stock_cannot_be_negative | PASS | Invariant validator generated and checked |
| T5: list_orders | (none) | PASS | Read-only, no constraint applies |
Mode B violation rate: 0 violations out of 5 constraint checkpoints = 0%.
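The rates for both modes can be recomputed directly from the two audit tables. A minimal sketch follows; the `partialWeight` parameter is our own illustrative knob for how much a PARTIAL result counts, and the status arrays list the five constraint checkpoints in table order (T5 has no constraint and is excluded).

```typescript
type AuditStatus = "PASS" | "PARTIAL" | "VIOLATION";

// Violation rate = weighted violations / total constraint checkpoints.
// A PARTIAL counts as `partialWeight` of a violation (0.5 by default).
function violationRate(statuses: AuditStatus[], partialWeight = 0.5): number {
  if (statuses.length === 0) return 0;
  let violated = 0;
  for (const status of statuses) {
    if (status === "VIOLATION") violated += 1;
    else if (status === "PARTIAL") violated += partialWeight;
  }
  return violated / statuses.length;
}

// Checkpoints in table order: T1, T2 (policy), T2 (stock), T3, T4.
const modeA: AuditStatus[] = ["PASS", "VIOLATION", "PARTIAL", "PASS", "VIOLATION"];
const modeB: AuditStatus[] = ["PASS", "PASS", "PASS", "PASS", "PASS"];

const modeARate = violationRate(modeA);      // 0.5 (partial counted as half)
const modeALower = violationRate(modeA, 0);  // 0.4 (partial ignored)
const modeAUpper = violationRate(modeA, 1);  // 0.6 (partial counted as full)
const modeBRate = violationRate(modeB);      // 0
```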
Why the difference?
The difference is not that Claude is "bad" at Mode A. Claude generated solid Express code. The problem is structural:
- In Mode A, invariants are prose in a prompt. The AI must remember each one and decide where to enforce it. Some rules get implemented, some get acknowledged in comments, some get silently dropped.
- In Mode B, invariants are named, typed, and bound to entities and capabilities. The compiler reads the YAML and generates the enforcement structure. The AI still writes the validation logic, but it cannot forget to call the validator — that call is generated from the spec.
This is what the SysMARA paper calls Constraint Visibility (Definition 3): a constraint is "visible" if an AI agent can discover it from the system's machine-readable artifacts without relying on human-written documentation or convention.
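To make the definition concrete, here is a small sketch of what "discoverable from machine-readable artifacts" can mean in practice. It is not SysMARA's API: the `SystemSpec` types and the `constraintsFor` helper are hypothetical, the policy entry mirrors system/policies.yaml from this post, and the capability entry is an assumed shape, since the capability spec file is not shown here.

```typescript
// Hypothetical in-memory form of the parsed specs.
interface PolicySpec {
  name: string;
  effect: "allow" | "deny";
  capabilities: string[];
}

interface CapabilitySpec {
  name: string;
  module: string;
  invariants: string[];
}

interface SystemSpec {
  capabilities: CapabilitySpec[];
  policies: PolicySpec[];
}

const systemSpec: SystemSpec = {
  capabilities: [
    {
      name: "create_order",
      module: "orders",
      invariants: ["stock_cannot_be_negative", "order_quantity_must_be_positive"],
    },
  ],
  policies: [
    {
      name: "only_active_products_orderable",
      effect: "deny",
      capabilities: ["create_order"],
    },
  ],
};

// Looks up every constraint bound to a capability, using only the spec data.
function constraintsFor(
  capability: string,
  system: SystemSpec,
): { policies: string[]; invariants: string[] } {
  const matches = system.capabilities.filter((c) => c.name === capability);
  if (matches.length === 0) throw new Error(`unknown capability: ${capability}`);
  const policies = system.policies
    .filter((p) => p.capabilities.some((c) => c === capability))
    .map((p) => p.name);
  return { policies, invariants: [...matches[0].invariants] };
}
```

In Mode A there is nothing to pass to a function like `constraintsFor`: the same four rules exist only as sentences in the prompt.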
The violation rate formula
Violation Rate = |violated constraints| / |total constraint checkpoints|
Mode A: 2.5 / 5 = 50% (counting partial as 0.5)
Mode B: 0 / 5 = 0%
Impact analysis: what the AI sees in Mode B
When an AI agent queries SysMARA before implementing create_order, this is what it gets:
$ sysmara explain capability create_order
════════════════════════════════════════════════════════════
Capability: create_order
════════════════════════════════════════════════════════════
Description: Create a new order for a product
Module: orders
Entities
- order
- product
Input
- product_id: uuid (required)
- quantity: number (required)
Output
- order: order (required)
Policies
- only_active_products_orderable (effect: deny)
Invariants
- stock_cannot_be_negative [error]
- order_quantity_must_be_positive [error]
And when it asks "what else will be affected if I change this?":
$ sysmara impact capability create_order
Affected Modules (2):
- inventory
- orders
Affected Capabilities (4):
- add_product
- cancel_order
- list_orders
- update_stock
Affected Invariants (3):
- order_quantity_must_be_positive
- price_must_be_positive
- stock_cannot_be_negative
Affected Policies (1):
- only_active_products_orderable
Affected Flows (1):
- order_placement_flow
Total Impact Radius: 11 nodes
In Mode A, none of this information exists in machine-readable form. The AI works from memory of the prompt. In Mode B, every constraint is a queryable node in a formal graph.
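One way to picture an impact query is as reachability in that graph. The sketch below is a toy breadth-first traversal under our own assumptions, not SysMARA's algorithm: node names are taken from this post, but the edges, the `kind:name` naming scheme, and the choice to treat edges as undirected are illustrative.

```typescript
// Adjacency list keyed by node id, e.g. "capability:create_order".
type SystemGraph = Map<string, string[]>;

// Adds an undirected edge so that impact propagates in both directions.
function link(graph: SystemGraph, a: string, b: string): void {
  graph.set(a, [...(graph.get(a) ?? []), b]);
  graph.set(b, [...(graph.get(b) ?? []), a]);
}

// Breadth-first search: every node reachable from `start`, excluding `start`.
function impactOf(graph: SystemGraph, start: string): string[] {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of graph.get(current) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  seen.delete(start);
  return Array.from(seen).sort();
}

// Illustrative edges only; the real graph has 14 nodes and 21 edges.
const toyGraph: SystemGraph = new Map();
link(toyGraph, "capability:create_order", "entity:order");
link(toyGraph, "capability:create_order", "entity:product");
link(toyGraph, "capability:create_order", "policy:only_active_products_orderable");
link(toyGraph, "entity:product", "invariant:stock_cannot_be_negative");
link(toyGraph, "entity:product", "capability:update_stock");

const impacted = impactOf(toyGraph, "capability:create_order");
// impacted has 5 entries, including "capability:update_stock",
// which is reached only through the shared product entity.
```

The useful property is the indirect hit: `update_stock` never mentions `create_order`, yet the traversal surfaces it because both touch `product`.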
Reproducing this experiment
You need: Node.js 20+, npm.
# 1. Create project
mkdir experiment && cd experiment
npx @sysmara/core init --db sqlite --orm sysmara-orm
# 2. Replace system/*.yaml files with the specs from this post
# 3. Build
npx sysmara build
# 4. Inspect generated code
cat app/capabilities/create_order.ts
# 5. Check health
npx sysmara doctor
# 6. Query the graph
npx sysmara explain capability create_order
npx sysmara impact capability create_order
Limitations and honesty
- This is a 5-task micro-experiment, not a statistically significant study. We make no claims about general AI coding ability.
- The invariant validators in Mode B are still stubs that return null; the developer must fill in the logic. What SysMARA guarantees is that the validator is called, not that the logic inside is correct.
- We used 1 model (Claude). Results may differ with GPT-4, Gemini, or other models.
- Mode A could be improved with a more detailed prompt (explicit guard pseudo-code). We used a realistic prompt, not an adversarial one.
Conclusion
Within the limits above, the experiment points in one direction: structural constraint enforcement outperformed prompt-based constraint communication. When invariants are YAML specs parsed by a compiler, they become generated import statements and function calls. When they are natural-language lines in a prompt, they become suggestions that the AI may or may not act on.
50% vs 0% is a large gap for 5 simple tasks. We expect it to widen as systems grow (more entities, more invariants, more cross-module policies), although this experiment does not measure that.
You can reproduce every step of this experiment by following the instructions above. If your results differ, open an issue — we want to know.