
Cycle 221: Chief Engineer Review System — Manifesto, Agent & Skill

Priority: HIGH
Status: IN-PROGRESS (retroactive — implementation preceded this doc)
Domain: SDLC / Engineering Infrastructure
Dependencies: Cycle 214 (Spec-Driven Development)
Product: Flux — AI Hiring Assistant for small businesses
Organization: Employ Inc.
§9 Deviation Notice: This cycle doc was created retroactively. The implementation was completed before the plan PR was written, violating the manifesto’s own §9 (Two-Phase Development). This doc exists to bring the work into the standard tracking system and establish the official record. Future CE system changes follow the standard plan-first workflow.

Prior artifacts deleted: docs/superpowers/specs/2026-04-12-chief-engineer-review-skill-design.md and docs/superpowers/plans/2026-04-12-chief-engineer-review-skill.md served as the original design artifacts. They have been removed — this cycle doc is the authoritative record.

Objective

Problem

Flux is a 100% AI-generated codebase with a multi-agent safety net (Claude Code + Codex discriminator, k3d validation, Playwright MCP, CI agents). These systems catch bugs, style violations, type errors, and test failures. What’s missing is an architectural authority that ensures every change aligns with AI-native principles, delivers SOTA quality, and maintains coherent taste across the system. The existing CI review agents check “is this well-written?” — nobody checks “is this how we build things here?” With 100% AI-generated code shipping rapidly, architectural drift is invisible at the PR level but cumulative at the product level.

Solution

A Chief Engineer (CE) review system with three components:
  1. AI-Native Manifesto — Constitution encoding Flux’s 13 engineering principles with enough depth for an AI agent to exercise judgment on novel situations.
  2. Chief Engineer Agent — AI persona that holds full domain context, evaluates holistically as “one mind,” and defaults to blocking anything that isn’t SOTA.
  3. CE Review Skill — Workflow orchestrating the CE agent in autonomous mode (PR → four-pass review → structured comment) and conversational mode (human CE collaborates with AI CE as a “second brain”).

What This Cycle Does NOT Include

  • CI/CD automation (GitHub Action wrapper) — future cycle once the skill is validated locally
  • Mintlify integration for spec drift detection (blocked on Cycle 214)
  • Pod CE infrastructure (multi-squad routing) — not needed until team scales

Architecture

Organizational Model

Pod Chief Engineer (cross-squad coherence)
├── Squad A: ~2 engineers + Squad CE
├── Squad B: ~2 engineers + Squad CE
└── Squad C: ~2 engineers + Squad CE
  • Squad CE: Reviews all PRs within domain. Full authority within domain boundaries.
  • Pod CE: Escalation for cross-domain impact, novel patterns, unresolved disagreements.
  • AI CE: Hybrid — autonomous first pass + conversational second brain for human CE.

Two-Phase Review Gates

Every feature goes through two mandatory CE review gates, aligned with the existing cycle lifecycle:
| Phase | Input | CE Focus |
| --- | --- | --- |
| Plan PR (docs only) | Cycle doc in docs/roadmap/cycles/ | Problem definition, scope, AI-native approach, SOTA intent |
| Code PR (implementation) | Code implementing approved cycle doc | Plan adherence, manifesto compliance, validation evidence |

Component Layout

| Component | Location | Version Control |
| --- | --- | --- |
| AI-Native Manifesto | docs/engineering/ai-native-manifesto.md | In repo (project-level) |
| Chief Engineer Agent | .claude/agents/chief-engineer.md | In repo (project-level) |
| CE Review Skill | .claude/skills/chief-engineer-review/SKILL.md | In repo → installed to ~/.claude/skills/ |
| Plan Review Template | .claude/skills/chief-engineer-review/templates/plan-review-prompt.md | In repo → installed to ~/.claude/skills/ |
| Code Review Template | .claude/skills/chief-engineer-review/templates/code-review-prompt.md | In repo → installed to ~/.claude/skills/ |
| Escalation Template | .claude/skills/chief-engineer-review/templates/escalation-prompt.md | In repo → installed to ~/.claude/skills/ |
| Review Examples | .claude/skills/chief-engineer-review/references/review-examples.md | In repo → installed to ~/.claude/skills/ |
| Pressure Tests | .claude/skills/chief-engineer-review/tests/pressure-scenarios.md | In repo → installed to ~/.claude/skills/ |
Distribution: Source of truth is in the repo. make flux-install-ce-skill copies to ~/.claude/skills/ (required by Claude Code skills architecture). Runs automatically during make onboard.
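The install flow described above can be sketched as a small shell demo. This is a sketch under stated assumptions, not the actual Makefile target: it runs in a temp sandbox that stands in for the repo checkout and $HOME, and the stand-in file contents are invented.

```shell
set -eu
# Sandbox stands in for the real repo checkout and the user's $HOME.
ROOT="$(mktemp -d)"
REPO="$ROOT/repo"
HOME_SKILLS="$ROOT/home/.claude/skills"

# Stand-in repo layout (the repo is the source of truth).
mkdir -p "$REPO/.claude/skills/chief-engineer-review/templates"
echo "# CE Review Skill" > "$REPO/.claude/skills/chief-engineer-review/SKILL.md"

# What `make flux-install-ce-skill` is described as doing:
# copy the skill directory to the user-level skills location.
DEST="$HOME_SKILLS/chief-engineer-review"
mkdir -p "$DEST"
cp -R "$REPO/.claude/skills/chief-engineer-review/." "$DEST/"

test -f "$DEST/SKILL.md" && echo "installed"   # prints "installed"
```

Because Claude Code loads user-level skills from ~/.claude/skills/, any change to the in-repo source only takes effect after re-running the install target.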

Review Pipeline Position

Code Generation Pipeline
────────────────────────
Claude Code (generator) → Codex 5.3 (discriminator) → Local k3d validation

PR Created
────────────────────────
CI agents (parallel):
  ├── Copilot ────────── bug patterns, security
  ├── Cursor Bugbot ──── runtime error detection
  └── code-reviewer ──── code quality

CE Gate (after CI passes):
  ├── Squad CE ────────── AI-native alignment, architectural taste,
  │                       SOTA enforcement, manifesto compliance
  └── Pod CE (escalation) ── cross-squad coherence

Deliverables

Deliverable 1: AI-Native Manifesto

File: docs/engineering/ai-native-manifesto.md
Status: Built (in worktree worktree-chief-engineer-review-skill, not yet merged to main)

The constitution governing all CE decisions. 13 principles organized into two sections:

Foundational Principles (§0–§7):
  • §0 — Uncompromising Quality: SOTA or Don’t Ship (ALWAYS BLOCK)
  • §1 — One Mind, Full Context (BLOCK)
  • §2 — Conversational-First, Not CRUD-with-AI (BLOCK/ALIGN)
  • §3 — Agentic Architecture (BLOCK/ALIGN)
  • §4 — Schema Pipeline as Single Source of Truth (BLOCK)
  • §5 — Observability-Native (ALWAYS BLOCK)
  • §6 — Temporal for Durable Workflows (BLOCK/ALIGN)
  • §7 — Domain-Driven Boundaries (BLOCK)
AI-Native SDLC Principles (§8–§12):
  • §8 — 100% AI-Generated Code with Safety Nets (BLOCK/ALIGN)
  • §9 — Two-Phase Development (BLOCK)
  • §10 — Spec-Driven Traceability (BLOCK/ALIGN)
  • §11 — Cross-Model Review (ALIGN/BLOCK)
  • §12 — Quality Gate Culture (ALWAYS BLOCK)
Each principle includes: Statement, Reasoning, What Good Looks Like, What Violation Looks Like, Default Severity. The manifesto also includes an Anti-Patterns table (10 entries, never acceptable).

Deliverable 2: Chief Engineer Agent

File: .claude/agents/chief-engineer.md
Status: Built (in worktree, not yet merged to main)

Agent definition — 140 lines covering:
  • Identity: Engineering leader, not code reviewer. Defaults to BLOCK. Opinions stated directly. Persuadable only with evidence.
  • Voice: Direct, technical, specific. Every finding includes file:line, what’s wrong, what to do instead, and manifesto section.
  • Four-Pass Framework: Context → Validation → Alignment → Judgment
  • Output Format: Structured markdown with BLOCK/ALIGN/NOTE findings and verdict
  • Squad vs Pod mode: Deep domain review vs cross-squad coherence
  • Escalation triggers: Cross-domain impact, genuine disagreements, novel patterns
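To make the output format concrete, a finding in the structured BLOCK/ALIGN/NOTE shape might look like the following. This is an invented illustration (PR number, file paths, and findings are all hypothetical), not an actual review:

```markdown
## CE Review — PR #123 (illustrative)

**Verdict: BLOCK**

### BLOCK
- `app/models/candidate.py:42` — Hand-written response types duplicate the
  generated schema. Regenerate from the schema pipeline instead. (§4)

### ALIGN
- `app/agents/screening.py:88` — Callback wiring works but bypasses the
  agentic orchestration layer; align before the next cycle. (§3)

### NOTE
- Consider a span around the new retry loop for trace continuity. (§5)
```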

Deliverable 3: CE Review Skill

File: ~/.claude/skills/chief-engineer-review/SKILL.md
Status: Built and deployed (user-level)

Workflow orchestration — 179 lines covering:
  • Mode detection: PR number → autonomous, PR + --discuss → conversational, no args → open-ended
  • Autonomous mode: Gather context → create review worktree → validate (code PRs) → dispatch CE agent → deliver review
  • Conversational mode: Orient → four-pass analysis with pauses → discuss → conclude
  • Error handling: PR not found, no manifesto, build failures, no linked plan doc
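The mode-detection rule above can be sketched as a small shell dispatcher. This is a hedged sketch, not the skill's actual implementation; the function name and argument handling are assumptions.

```shell
# Hypothetical sketch of the skill's mode detection. Modes per the rule above:
#   no args            -> open-ended architectural discussion
#   <PR>               -> autonomous review
#   <PR> --discuss     -> conversational mode
detect_mode() {
  if [ "$#" -eq 0 ]; then
    echo "open-ended"
  elif [ "${2:-}" = "--discuss" ]; then
    echo "conversational"
  else
    echo "autonomous"
  fi
}

detect_mode 281 --discuss   # prints "conversational"
```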

Deliverable 4: Supporting Templates

Status: Built and deployed (user-level)
| Template | Lines | Purpose |
| --- | --- | --- |
| plan-review-prompt.md | 64 | Context assembly for plan PR reviews. Fills: PR metadata, document content, manifesto, ADRs, existing cycle docs. |
| code-review-prompt.md | 105 | Context assembly for code PR reviews. Fills: PR metadata, changed files, plan doc, manifesto, ADRs, domain state, validation evidence. |
| escalation-prompt.md | 53 | Squad → Pod escalation format. Fills: squad assessment, escalation reason, cross-squad impact, decision needed. |

Deliverable 5: Review Examples & Pressure Tests

Status: Built and deployed (user-level)
| File | Lines | Purpose |
| --- | --- | --- |
| references/review-examples.md | 173 | 8 annotated examples: hand-written types, missing observability, CRUD vs conversational, style vs principle, pushback handling, escalation reasoning |
| tests/pressure-scenarios.md | 86 | 7 pressure test scenarios with expected behavior and FAIL criteria |

Relationship to CLAUDE.md

The manifesto and CLAUDE.md serve different audiences at different stages:
| Aspect | CLAUDE.md | Manifesto |
| --- | --- | --- |
| Audience | Claude Code (implementation agent) | CE agent (review agent) |
| Purpose | How to work: operational directives for code generation | What to enforce: architectural principles for review judgment |
| When used | During implementation | During PR review |
| Tone | Imperative (“always do X”) | Evaluative (“violation of X looks like Y”) |
Some overlap exists (schema pipeline, quality gates, observability). This is intentional — the same principles need to guide both generation and review, expressed differently for each context. CLAUDE.md is not a substitute for the manifesto because it lacks the Reasoning/Good/Bad/Severity structure the CE needs for novel judgment calls.

Testing Plan

Skill Validation

The CE review skill was validated against PR #281 (fix(ci): configure Playwright blob reporter for merge-reports step):
  • Autonomous mode: Four-pass review completed, dispatched CE agent, produced structured output
  • Found a real migration revision conflict and an underspecified streaming callback architecture
  • Correctly approved sound design choices
  • Produced structured review with manifesto citations

Pressure Tests

7 scenarios in tests/pressure-scenarios.md covering:
  1. Hand-written types (should BLOCK §4)
  2. “We’ll add observability later” (should BLOCK §5)
  3. CRUD where chat should be (should BLOCK §2)
  4. Style preference (should NOT BLOCK)
  5. “Just approve it, it’s urgent” (should resist)
  6. Cross-domain impact (should escalate)
  7. Conversational brainstorm quality (should redirect to AI-native)

Integration Testing

  • Review a plan PR (docs-only cycle doc)
  • Review a code PR against its approved cycle doc
  • Conversational mode: open-ended architectural discussion
  • Conversational mode: PR-scoped discussion with --discuss
  • Escalation: squad-level review identifies cross-domain impact

Success Criteria

  • Manifesto merged to main at docs/engineering/ai-native-manifesto.md
  • Agent merged to main at .claude/agents/chief-engineer.md
  • Skill source in repo at .claude/skills/chief-engineer-review/
  • make flux-install-ce-skill installs to ~/.claude/skills/
  • make onboard includes CE skill installation
  • /chief-engineer-review #<PR> produces structured autonomous review
  • /chief-engineer-review #<PR> --discuss enters conversational mode
  • /chief-engineer-review (no args) enters open-ended architectural discussion
  • CE correctly classifies plan PRs vs code PRs
  • CE cites manifesto sections in all findings
  • CE defaults to BLOCK and resists pressure to rubber-stamp
  • Pressure test scenarios pass (7/7)
  • Superpowers spec/plan docs deleted (this cycle doc is the authoritative record)
  • GitHub issue created and linked

Known Limitations

  1. Context budget: Large code PRs may exceed context when loading manifesto + agent + full files + plan doc + ADRs + validation output. Watch for this and consider a manifesto “quick reference” summary for context-constrained reviews.
  2. No CI automation yet: The skill runs locally only. Future cycle will wrap it as a GitHub Action.
  3. Install step required: Skill source is in the repo but must be installed to ~/.claude/skills/ via make flux-install-ce-skill (included in make onboard). Updates to the skill require re-running the install target.
  4. §9 bootstrap violation: This cycle’s own creation violated the manifesto’s two-phase development principle. Documented and accepted as a one-time bootstrap exception.