
Cycle 221: Chief Engineer Review System — Manifesto, Agent & Skill

Priority: HIGH
Status: IN-PROGRESS (retroactive — implementation preceded this doc)
Domain: SDLC / Engineering Infrastructure
Dependencies: Cycle 214 (Spec-Driven Development)
Product: Flux — AI Hiring Assistant for small businesses
Organization: Employ Inc.
§9 Deviation Notice: This cycle doc was created retroactively. The implementation was completed before the plan PR was written, violating the manifesto’s own §9 (Two-Phase Development). This doc exists to bring the work into the standard tracking system and establish the official record. Future CE system changes follow the standard plan-first workflow.

Prior artifacts deleted: docs/superpowers/specs/2026-04-12-chief-engineer-review-skill-design.md and docs/superpowers/plans/2026-04-12-chief-engineer-review-skill.md served as the original design artifacts. They have been removed — this cycle doc is the authoritative record.

Objective

Problem

Flux is a 100% AI-generated codebase with a multi-agent safety net (Claude Code + Codex discriminator, k3d validation, Playwright MCP, CI agents). These systems catch bugs, style violations, type errors, and test failures. What’s missing is an architectural authority that ensures every change aligns with AI-native principles, delivers SOTA quality, and maintains coherent taste across the system. The existing CI review agents check “is this well-written?” — nobody checks “is this how we build things here?” With 100% AI-generated code shipping rapidly, architectural drift is invisible at the PR level but cumulative at the product level.

Solution

A Chief Engineer (CE) review system with three components:
  1. AI-Native Manifesto — Constitution encoding Flux’s 13 engineering principles with enough depth for an AI agent to exercise judgment on novel situations.
  2. Chief Engineer Agent — AI persona that holds full domain context, evaluates holistically as “one mind,” and defaults to blocking anything that isn’t SOTA.
  3. CE Review Skill — Workflow orchestrating the CE agent in autonomous mode (PR → four-pass review → structured comment) and conversational mode (human CE collaborates with AI CE as a “second brain”).

What This Cycle Does NOT Include

  • CI/CD automation (GitHub Action wrapper) — future cycle once the skill is validated locally
  • Mintlify integration for spec drift detection (blocked on Cycle 214)
  • Pod CE infrastructure (multi-squad routing) — not needed until team scales

Architecture

Organizational Model

Pod Chief Engineer (cross-squad coherence)
├── Squad A: ~2 engineers + Squad CE
├── Squad B: ~2 engineers + Squad CE
└── Squad C: ~2 engineers + Squad CE
  • Squad CE: Reviews all PRs within domain. Full authority within domain boundaries.
  • Pod CE: Escalation for cross-domain impact, novel patterns, unresolved disagreements.
  • AI CE: Hybrid — autonomous first pass + conversational second brain for human CE.

Two-Phase Review Gates

Every feature goes through two mandatory CE review gates, aligned with the existing cycle lifecycle:
| Phase | Input | CE Focus |
| --- | --- | --- |
| Plan PR (docs only) | Cycle doc in docs/roadmap/cycles/ | Problem definition, scope, AI-native approach, SOTA intent |
| Code PR (implementation) | Code implementing approved cycle doc | Plan adherence, manifesto compliance, validation evidence |

Component Layout

| Component | Location | Version Control |
| --- | --- | --- |
| AI-Native Manifesto | docs/engineering/ai-native-manifesto.md | In repo (project-level) |
| Chief Engineer Agent | .claude/agents/chief-engineer.md | In repo (project-level) |
| CE Review Skill | .claude/skills/chief-engineer-review/SKILL.md | In repo → installed to ~/.claude/skills/ |
| Plan Review Template | .claude/skills/chief-engineer-review/templates/plan-review-prompt.md | In repo → installed to ~/.claude/skills/ |
| Code Review Template | .claude/skills/chief-engineer-review/templates/code-review-prompt.md | In repo → installed to ~/.claude/skills/ |
| Escalation Template | .claude/skills/chief-engineer-review/templates/escalation-prompt.md | In repo → installed to ~/.claude/skills/ |
| Review Examples | .claude/skills/chief-engineer-review/references/review-examples.md | In repo → installed to ~/.claude/skills/ |
| Pressure Tests | .claude/skills/chief-engineer-review/tests/pressure-scenarios.md | In repo → installed to ~/.claude/skills/ |
Distribution: Source of truth is in the repo. make flux-install-ce-skill copies to ~/.claude/skills/ (required by Claude Code skills architecture). Runs automatically during make onboard.
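The install flow described above can be sketched as a small shell demo. This is a sketch under stated assumptions, not the actual Makefile target: it runs in a temp sandbox that stands in for the repo checkout and $HOME, and the stand-in file contents are invented.

```shell
set -eu
# Sandbox stands in for the real repo checkout and the user's $HOME.
ROOT="$(mktemp -d)"
REPO="$ROOT/repo"
HOME_SKILLS="$ROOT/home/.claude/skills"

# Stand-in repo layout (the repo is the source of truth).
mkdir -p "$REPO/.claude/skills/chief-engineer-review/templates"
echo "# CE Review Skill" > "$REPO/.claude/skills/chief-engineer-review/SKILL.md"

# What `make flux-install-ce-skill` is described as doing:
# copy the skill directory to the user-level skills location.
DEST="$HOME_SKILLS/chief-engineer-review"
mkdir -p "$DEST"
cp -R "$REPO/.claude/skills/chief-engineer-review/." "$DEST/"

test -f "$DEST/SKILL.md" && echo "installed"   # prints "installed"
```

Because Claude Code loads user-level skills from ~/.claude/skills/, any change to the in-repo source only takes effect after re-running the install target.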

Review Pipeline Position

Code Generation Pipeline
────────────────────────
Claude Code (generator) → Codex 5.3 (discriminator) → Local k3d validation

PR Created
────────────────────────
CI agents (parallel):
  ├── Copilot ────────── bug patterns, security
  ├── Cursor Bugbot ──── runtime error detection
  └── code-reviewer ──── code quality

CE Gate (after CI passes):
  ├── Squad CE ────────── AI-native alignment, architectural taste,
  │                       SOTA enforcement, manifesto compliance
  └── Pod CE (escalation) ── cross-squad coherence

Deliverables

Deliverable 1: AI-Native Manifesto

File: docs/engineering/ai-native-manifesto.md
Status: Built (in worktree worktree-chief-engineer-review-skill, not yet merged to main)

The constitution governing all CE decisions. 13 principles organized into two sections:

Foundational Principles (§0–§7):
  • §0 — Uncompromising Quality: SOTA or Don’t Ship (ALWAYS BLOCK)
  • §1 — One Mind, Full Context (BLOCK)
  • §2 — Conversational-First, Not CRUD-with-AI (BLOCK/ALIGN)
  • §3 — Agentic Architecture (BLOCK/ALIGN)
  • §4 — Schema Pipeline as Single Source of Truth (BLOCK)
  • §5 — Observability-Native (ALWAYS BLOCK)
  • §6 — Temporal for Durable Workflows (BLOCK/ALIGN)
  • §7 — Domain-Driven Boundaries (BLOCK)
AI-Native SDLC Principles (§8–§12):
  • §8 — 100% AI-Generated Code with Safety Nets (BLOCK/ALIGN)
  • §9 — Two-Phase Development (BLOCK)
  • §10 — Spec-Driven Traceability (BLOCK/ALIGN)
  • §11 — Cross-Model Review (ALIGN/BLOCK)
  • §12 — Quality Gate Culture (ALWAYS BLOCK)
Each principle includes: Statement, Reasoning, What Good Looks Like, What Violation Looks Like, Default Severity. The manifesto also includes an Anti-Patterns table (10 entries, never acceptable).

Deliverable 2: Chief Engineer Agent

File: .claude/agents/chief-engineer.md
Status: Built (in worktree, not yet merged to main)

Agent definition — 140 lines covering:
  • Identity: Engineering leader, not code reviewer. Defaults to BLOCK. Opinions stated directly. Persuadable only with evidence.
  • Voice: Direct, technical, specific. Every finding includes file:line, what’s wrong, what to do instead, and manifesto section.
  • Four-Pass Framework: Context → Validation → Alignment → Judgment
  • Output Format: Structured markdown with BLOCK/ALIGN/NOTE findings and verdict
  • Squad vs Pod mode: Deep domain review vs cross-squad coherence
  • Escalation triggers: Cross-domain impact, genuine disagreements, novel patterns
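To make the output format concrete, a finding in the structured BLOCK/ALIGN/NOTE shape might look like the following. This is an invented illustration (PR number, file paths, and findings are all hypothetical), not an actual review:

```markdown
## CE Review — PR #123 (illustrative)

**Verdict: BLOCK**

### BLOCK
- `app/models/candidate.py:42` — Hand-written response types duplicate the
  generated schema. Regenerate from the schema pipeline instead. (§4)

### ALIGN
- `app/agents/screening.py:88` — Callback wiring works but bypasses the
  agentic orchestration layer; align before the next cycle. (§3)

### NOTE
- Consider a span around the new retry loop for trace continuity. (§5)
```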

Deliverable 3: CE Review Skill

File: ~/.claude/skills/chief-engineer-review/SKILL.md
Status: Built and deployed (user-level)

Workflow orchestration — 179 lines covering:
  • Mode detection: PR number → autonomous, PR + --discuss → conversational, no args → open-ended
  • Autonomous mode: Gather context → create review worktree → validate (code PRs) → dispatch CE agent → deliver review
  • Conversational mode: Orient → four-pass analysis with pauses → discuss → conclude
  • Error handling: PR not found, no manifesto, build failures, no linked plan doc
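The mode-detection rule above can be sketched as a small shell dispatcher. This is a hedged sketch, not the skill's actual implementation; the function name and argument handling are assumptions.

```shell
# Hypothetical sketch of the skill's mode detection. Modes per the rule above:
#   no args            -> open-ended architectural discussion
#   <PR>               -> autonomous review
#   <PR> --discuss     -> conversational mode
detect_mode() {
  if [ "$#" -eq 0 ]; then
    echo "open-ended"
  elif [ "${2:-}" = "--discuss" ]; then
    echo "conversational"
  else
    echo "autonomous"
  fi
}

detect_mode 281 --discuss   # prints "conversational"
```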

Deliverable 4: Supporting Templates

Status: Built and deployed (user-level)
| Template | Lines | Purpose |
| --- | --- | --- |
| plan-review-prompt.md | 64 | Context assembly for plan PR reviews. Fills: PR metadata, document content, manifesto, ADRs, existing cycle docs. |
| code-review-prompt.md | 105 | Context assembly for code PR reviews. Fills: PR metadata, changed files, plan doc, manifesto, ADRs, domain state, validation evidence. |
| escalation-prompt.md | 53 | Squad → Pod escalation format. Fills: squad assessment, escalation reason, cross-squad impact, decision needed. |

Deliverable 5: Review Examples & Pressure Tests

Status: Built and deployed (user-level)
| File | Lines | Purpose |
| --- | --- | --- |
| references/review-examples.md | 173 | 8 annotated examples: hand-written types, missing observability, CRUD vs conversational, style vs principle, pushback handling, escalation reasoning |
| tests/pressure-scenarios.md | 86 | 7 pressure test scenarios with expected behavior and FAIL criteria |

Relationship to CLAUDE.md

The manifesto and CLAUDE.md serve different audiences at different stages:
| Aspect | CLAUDE.md | Manifesto |
| --- | --- | --- |
| Audience | Claude Code (implementation agent) | CE agent (review agent) |
| Purpose | How to work: operational directives for code generation | What to enforce: architectural principles for review judgment |
| When used | During implementation | During PR review |
| Tone | Imperative (“always do X”) | Evaluative (“violation of X looks like Y”) |
Some overlap exists (schema pipeline, quality gates, observability). This is intentional — the same principles need to guide both generation and review, expressed differently for each context. CLAUDE.md is not a substitute for the manifesto because it lacks the Reasoning/Good/Bad/Severity structure the CE needs for novel judgment calls.

Testing Plan

Skill Validation

The CE review skill was validated against PR #281 (fix(ci): configure Playwright blob reporter for merge-reports step):
  • Autonomous mode: Four-pass review completed, dispatched CE agent, produced structured output
  • Found a real migration revision conflict and an underspecified streaming callback architecture
  • Correctly approved sound design choices
  • Produced structured review with manifesto citations

Pressure Tests

7 scenarios in tests/pressure-scenarios.md covering:
  1. Hand-written types (should BLOCK §4)
  2. “We’ll add observability later” (should BLOCK §5)
  3. CRUD where chat should be (should BLOCK §2)
  4. Style preference (should NOT BLOCK)
  5. “Just approve it, it’s urgent” (should resist)
  6. Cross-domain impact (should escalate)
  7. Conversational brainstorm quality (should redirect to AI-native)

Integration Testing

  • Review a plan PR (docs-only cycle doc)
  • Review a code PR against its approved cycle doc
  • Conversational mode: open-ended architectural discussion
  • Conversational mode: PR-scoped discussion with --discuss
  • Escalation: squad-level review identifies cross-domain impact

Success Criteria

  • Manifesto merged to main at docs/engineering/ai-native-manifesto.md
  • Agent merged to main at .claude/agents/chief-engineer.md
  • Skill source in repo at .claude/skills/chief-engineer-review/
  • make flux-install-ce-skill installs to ~/.claude/skills/
  • make onboard includes CE skill installation
  • /chief-engineer-review #<PR> produces structured autonomous review
  • /chief-engineer-review #<PR> --discuss enters conversational mode
  • /chief-engineer-review (no args) enters open-ended architectural discussion
  • CE correctly classifies plan PRs vs code PRs
  • CE cites manifesto sections in all findings
  • CE defaults to BLOCK and resists pressure to rubber-stamp
  • Pressure test scenarios pass (7/7)
  • Superpowers spec/plan docs deleted (this cycle doc is the authoritative record)
  • GitHub issue created and linked

Known Limitations

  1. Context budget: Large code PRs may exceed context when loading manifesto + agent + full files + plan doc + ADRs + validation output. Watch for this and consider a manifesto “quick reference” summary for context-constrained reviews.
  2. No CI automation yet: The skill runs locally only. Future cycle will wrap it as a GitHub Action.
  3. Install step required: Skill source is in the repo but must be installed to ~/.claude/skills/ via make flux-install-ce-skill (included in make onboard). Updates to the skill require re-running the install target.
  4. §9 bootstrap violation: This cycle’s own creation violated the manifesto’s two-phase development principle. Documented and accepted as a one-time bootstrap exception.