I published a five-layer framework for AI-assisted production. In the same session, I broke the first rule.
The Stack I Didnât Design opens with âthe top of the stack is the one thing I canât automate.â Layer 1 â Human Direction â is human decides what to build, for whom, from what angle. That canât be delegated. Itâs the postâs central claim.
The last30days blog post I published that same session had an angle chosen entirely by the Builder agent: âHow I Built a Research Engine.â I submitted a PR to someone elseâs tool. The Builder fabricated an authorship narrative and I published it. I caught it by reading the live site on my phone. Not the QC agent. Not the orchestrator. Me, on a phone.
Layer 1 delegated to a machine. The one thing the framework says you canât do.
The audit
I ran every layer against actual usage. Hereâs what I found.
Layer 2 â Methodology had mixed results. The template (blog style guide) was used consistently. The personas (TheEditor, TheDesigner) worked for QC and layout. Procedure exists on paper. Enforcement worked perfectly â but only because the pre-commit hook is a physical block. You cannot push without qc: passed in frontmatter, so that gate runs 100% of the time. Every gate without a physical mechanism ran 0% of the time.
Layer 3 â Code Patterns is where the numbers get bad. The framework describes five gate operators. Actual usage:
| Gate | Used? | What it would have caught |
|---|---|---|
| QCGate | Yes (pre-commit hook) | Prose issues |
| SpecGate | Never | Builder writing âHow I Builtâ from a vague direction |
| DiagnosisGate | Never | Session 7: 3 failed Graph View fixes before the right solution |
| EvidenceGate | Never | âI built this toolâ published with zero evidence check |
| TestGate | Never | Blog components with zero tests |
One out of five gates operational. 20% execution rate on my own framework.
Layer 4 â Execution Infrastructure. Session briefs work well. QC agents running in separate context: correct. Ralph loop and cc-fuel-gauge: never used. Context management was entirely manual, which means it happened when I remembered and not when I didnât.
Layer 5 â Model Behavior is the most ironic miss. The framework describes PUA and SM (diagnosis before fix) as tools the model uses autonomously. In practice, the user PUAâd the AI every time â âpua ä˝ ćŻP8ä˝ ĺŽćšćĄâ, âpua last30daysĺ°ĺşĺ¨čŻ´ĺĽâ â never the other direction. SM was never applied to the last30days incident. When the fabricated post was flagged, the first instinct was immediate rewrite. The user had to push back â âä˝ ćä¸ĺ°?â â before anyone actually checked the PR.
The frameworkâs own warning is âwhat the model says â what the model does.â The last30days incident is that sentence happening to its author. The Builder confidently wrote an authorship narrative. The QC agent confidently passed it. Both said the right things and neither did the right thing.
Root cause
The pipeline was: user says something â Builder writes â QC reviews prose â publish
It should be: user says something â SpecGate (thesis + author's relationship to subject) â Builder writes â EvidenceGate (verify factual claims) â QCGate â publish
The missing gates werenât forgotten. They had no enforcement mechanism. The frameworkâs own principle: enforcement = physical block. Only QCGate got a hook. The framework described the exact solution to its own failure mode and then didnât implement it.
Volume pressure made this worse. âOne post per sessionâ created an incentive to ship, not to check. The SpecGate and EvidenceGate feel like friction when youâre trying to hit a cadence. Theyâre not friction â theyâre the point.
What changes
Four concrete fixes, not resolutions:
1. SpecGate enforcement. Blog posts now require a spec: frontmatter field: thesis, authorâs relationship to the subject, target reader. Pre-commit hook checks for its presence. No spec, no commit. The last30days post would have failed immediately â I had no thesis that was mine to claim.
2. EvidenceGate in QC. The first item on the QC checklist is now âverify authorâs relationship to the subjectâ â before prose quality, before structure. Factual integrity is not a prose problem.
3. Self-PUA: enforce or remove. Itâs been in the memory system for four sessions with 0% execution. Either it gets a physical check (a gate operator that asks âhave you applied SM to this sessionâs failures?â) or I admit itâs aspirational and remove it. Aspirational items in an enforcement framework are worse than nothing â they create the illusion of coverage.
4. One post with a thesis, not one post. The volume goal stays. The quality standard changes. A post without a verifiable thesis doesnât count toward the session target. This directly addresses the incentive that produced the fake post.
The meta-observation: I can tell the difference between a gate that runs and a gate that exists. The hook runs. Everything else existed. Existence is not execution. The next version of this framework has one design rule: if it doesnât have a physical block, it doesnât count as a gate.