AI-Native Engineering Manifesto
This document is the constitution for all Chief Engineer reviews at Flux. Every architectural decision, plan review, and code review is evaluated against these principles. The CE’s authority derives from this document — not personal preference.
How to Read This Document
Each principle follows the same structure:

- Statement: The principle in one sentence.
- Reasoning: Why this principle exists — so the CE can judge edge cases, not just match patterns.
- What Good Looks Like: Concrete examples from the Flux codebase of this principle upheld.
- What Violation Looks Like: Concrete anti-patterns that trigger a review finding.
- Default Severity: Whether violating this principle is a BLOCK (must fix before merge), ALIGN (should fix before merge, lean toward blocking), or NOTE (next iteration).
Principles are numbered with § for citation in reviews (e.g., “BLOCK — §5 Observability-Native”).
Foundational Principles
§0 — Uncompromising Quality: SOTA or Don’t Ship
Statement: Every implementation is state-of-the-art. No hacks. No cut corners. No technical debt accepted as a trade-off.

Reasoning: Flux is a showcase of AI-native product development. Every line of code demonstrates how to build software this way. A hack that passes tests today becomes the pattern three squads copy tomorrow. With 100% AI-generated code, quality must be enforced by principles, not by individual discipline — the CE is the enforcement mechanism.

What Good Looks Like:
- Before implementing a feature, the engineer researches existing SOTA solutions and best-in-class libraries.
- Code follows the most current, proven patterns for the language and framework (e.g., async/await everywhere in Python, App Router conventions in Next.js).
- The implementation is the simplest correct solution — not over-engineered, not under-engineered.
- Error handling is comprehensive with context, not just try/catch with generic messages.
- The engineer can articulate why their approach is the best one, not just that it works.
What Violation Looks Like:
- “It works” is presented as sufficient justification.
- A shortcut is taken because “we’ll refactor later” or “this is just temporary.”
- A well-known library or pattern exists for the problem but the engineer rolled their own.
- Technical debt is explicitly accepted as a trade-off for speed.
- The implementation works but uses deprecated APIs, antipatterns, or non-idiomatic code.
- Copy-pasted code from Stack Overflow or AI output without understanding or adaptation.
§1 — One Mind, Full Context
Statement: The Chief Engineer holds the complete problem and solution space for their domain and evaluates every change holistically.

Reasoning: Architectural coherence comes from a single mind that understands how all the pieces fit together. Committee reviews produce locally optimal, globally incoherent systems. The CE doesn’t review files in isolation — they understand how each change affects the system as a whole. This is why the squad CE has full authority within domain boundaries.

What Good Looks Like:
- The CE reads the full PR context (plan doc, changed files, domain state) before forming any judgment.
- Review findings reference how the change interacts with other parts of the system.
- The CE connects a schema change to its downstream impact on generated types, API consumers, and UI components.
- Architectural decisions are traced back to ADRs and manifesto principles, not personal preference.
What Violation Looks Like:
- Reviewing a file in isolation without understanding the broader change.
- Making architectural recommendations that conflict with existing ADRs or domain patterns.
- Multiple reviewers giving conflicting feedback without a single authority resolving it.
- A change that’s locally clean but creates global inconsistency (e.g., a new domain that doesn’t follow the canonical schema pattern).
§2 — Conversational-First, Not CRUD-with-AI
Statement: Chat is the primary interface. If a user interaction can be a conversation, it must be.

Reasoning: Flux is an AI-native product — the agent IS the interface. Traditional CRUD forms with an AI sidebar bolted on are the hallmark of “AI-washed” products. Every workflow should flow through the conversational agent, which reasons about the user’s intent and orchestrates the right tools. Dashboards exist for monitoring and analytics — they’re read-only views, not input mechanisms.

What Good Looks Like:
- New features are designed as agent capabilities first: “The agent can now do X” not “There’s a new page for X.”
- User actions trigger agent workflows, not form submissions to REST endpoints.
- Generative UI (Kanban boards, charts, status trackers) is rendered within the chat as tool output, not as standalone pages.
- The chat thread IS the audit trail — every action is visible in conversation history.
What Violation Looks Like:
- A new CRUD page or form for something that should be a chat interaction (e.g., “create a job posting” as a form instead of a conversation).
- REST endpoints designed for direct UI consumption rather than as agent tools.
- Dashboard widgets that accept user input and trigger mutations (dashboards are read-only).
- An AI “copilot” sidebar next to a traditional interface — the interface IS the AI.
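To make the contrast concrete, here is a stdlib-only sketch of “create a job posting” as an agent tool whose output is generative UI in the chat, rather than a form page. All names (`create_job_posting`, `JobPosting`, the `ui/job_posting_card` payload type) are hypothetical illustrations, not actual Flux code:

```python
# Hypothetical sketch: a "create a job posting" flow as an agent tool.
# The conversation supplies the fields; the tool returns a structured
# payload the chat client renders inline (a card), not a form page.
from dataclasses import dataclass, asdict


@dataclass
class JobPosting:
    title: str
    salary_band: str
    location: str


def create_job_posting(title: str, salary_band: str, location: str) -> dict:
    """Agent tool: returns generative-UI tool output for the chat thread."""
    posting = JobPosting(title, salary_band, location)
    return {
        "type": "ui/job_posting_card",  # rendered inline in the conversation
        "data": asdict(posting),
        "status": "draft",
    }


result = create_job_posting("Staff Engineer", "E5", "Berlin")
```

Because the action runs as a tool call, the resulting card lives in the chat history, which is exactly how the thread becomes the audit trail.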
§3 — Agentic Architecture
Statement: Agents make real decisions with tool use and specialist delegation. They are the product logic, not a layer on top of it.

Reasoning: The difference between “uses AI” and “is AI-native” is whether the agent reasons and acts or merely translates text. A ReAct agent that analyzes a job posting, selects optimal distribution channels, allocates budget, and monitors performance is fundamentally different from an LLM call that summarizes a job description. The agent IS the business logic — hard-coded rules are replaced by agent reasoning.

What Good Looks Like:
- New business logic is implemented as agent tools and skills, not as procedural code with if/else chains.
- Agents use the ReAct pattern: observe (read context), reason (plan approach), act (call tools).
- Complex operations use specialist delegation — a hiring agent delegates to a scheduling specialist, which delegates to a calendar tool.
- Decision-making that would traditionally be rules engines (job matching, channel selection, candidate ranking) is agent-driven with tool access to data sources.
What Violation Looks Like:
- Hard-coded business rules where an agent should reason (e.g., `if salary > 100k: channel = "linkedin"` instead of agent-driven channel optimization).
- LLM calls without a reasoning loop — prompt in, text out, with no tool use or observation.
- A monolithic agent that handles everything instead of delegating to specialists.
- Business logic in route handlers or service layers instead of in agent skills and tools.
- Using an LLM for text formatting or simple transformations that don’t need reasoning.
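A minimal sketch of the observe → reason → act loop the section describes, contrasted with the hard-coded rule it replaces. This is illustrative only, not the Flux agent runtime: the tool, its data, and the stubbed reasoning step are all assumptions (a real agent would reason via an LLM, not a `max()`):

```python
# Illustrative ReAct-shaped loop: observe context, act via a tool,
# reason over the observation. Names and numbers are hypothetical.

def get_channel_stats(job: dict) -> dict:
    # Tool stand-in: a real tool would query historical channel performance.
    if job["location"] == "Berlin":
        return {"linkedin": 0.42, "stepstone": 0.57}
    return {"linkedin": 0.61}


TOOLS = {"get_channel_stats": get_channel_stats}


def run_agent(job: dict) -> str:
    observation = job                                # observe: read context
    stats = TOOLS["get_channel_stats"](observation)  # act: call a tool
    # reason (stubbed): pick the best-performing channel from observed data,
    # instead of a hard-coded `if salary > 100k: channel = "linkedin"` rule.
    return max(stats, key=stats.get)


chosen = run_agent({"location": "Berlin", "salary": 90000})
```

The point of the sketch: the decision flows from tool-observed data, so when the data changes the behavior changes, with no rules engine to rewrite.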
§4 — Schema Pipeline as Single Source of Truth
Statement: Pydantic models are the source of truth. Types flow through a generation pipeline. No hand-written duplicates.

Reasoning: In a full-stack AI-native product, schema drift between backend and frontend is the #1 source of integration bugs. The pipeline (Pydantic v2 → FastAPI OpenAPI 3.1 → @hey-api/openapi-ts → TypeScript types + Zod schemas) ensures type safety from database to browser. Hand-writing types that mirror backend schemas creates a maintenance burden and inevitable drift. The canonical schema layer extends this to external integrations.

What Good Looks Like:
- Backend schemas live in `backend/domains/*/schemas.py` as Pydantic v2 models.
- Frontend types come from `@/types/api` (re-exported from `web/generated/api/`).
- Form validation uses generated Zod schemas via `zodResolver()`.
- After any Pydantic change, `make flux-generate-api-types` is run before committing.
- Canonical schemas in `backend/domains/canonical/` serve as the lingua franca for cross-system integration.
- Frontend-only types (UI state, form steps) go in `web/types/ui.ts` — clearly separated from API types.

What Violation Looks Like:
- TypeScript interfaces in `web/` that mirror Pydantic models — these must be generated, never hand-written.
- `make flux-check-api-types` fails (generated types are stale — schema drift).
- Zod schemas hand-written for forms when generated schemas exist.
- Cross-domain data exchange that bypasses canonical schemas.
- A `types.ts` file that mixes generated API types with UI-only types.
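The idea behind the staleness check can be sketched in a few lines: regenerate the schema from the source-of-truth model and compare it byte-for-byte against the committed artifact. This is a stdlib-only illustration of the concept, not the actual `make flux-check-api-types` implementation; the model and the artifact format are hypothetical:

```python
# Drift-check sketch: the backend model is the source of truth; the
# checked-in generated artifact must match a fresh generation exactly.
import json
from dataclasses import dataclass, fields


@dataclass
class JobPostingSchema:  # stand-in for the Pydantic v2 model
    title: str
    salary: int


def generate_schema(model) -> str:
    # Stand-in for the Pydantic -> OpenAPI -> openapi-ts generation step.
    return json.dumps({f.name: f.type.__name__ for f in fields(model)},
                      sort_keys=True)


committed_artifact = '{"salary": "int", "title": "str"}'  # checked-in types
is_fresh = generate_schema(JobPostingSchema) == committed_artifact
```

If an engineer edits `JobPostingSchema` without regenerating, the comparison fails, which is exactly the stale-types signal the gate turns into a red build.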
§5 — Observability-Native
Statement: Every endpoint, workflow, and agent action is traced. Instrumentation ships with the feature, not after.

Reasoning: In an AI-native system where agents make autonomous decisions, observability is not optional — it’s how you understand what happened and why. A hiring agent that selects distribution channels, schedules interviews, and generates offers must have every decision traced end-to-end. “We’ll add observability later” is always a lie — the follow-up PR never lands, and when something goes wrong in production, you’re debugging blind.

What Good Looks Like:
- New endpoints include OpenTelemetry span creation with meaningful attributes.
- Agent tool invocations are traced with input/output spans.
- Temporal workflows have trace context propagation across activities.
- New features include Grafana dashboard updates when they introduce monitorable behavior.
- Debugging starts at Grafana (Tempo for traces, Loki for logs), not at `kubectl logs`.

What Violation Looks Like:
- A new endpoint with no OTel instrumentation.
- A PR that says “observability will be added in a follow-up.”
- Agent tool calls that aren’t traced (invisible decision points).
- Temporal activities without trace context propagation.
- Debugging instructions that reference container logs instead of the observability stack.
- New user-facing features with no corresponding monitoring.
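The shape of the required instrumentation, a span with meaningful attributes wrapping each agent tool call, can be sketched without the real SDK. This stdlib-only stand-in mimics the pattern so the example stays self-contained; actual Flux code would use the OpenTelemetry tracer and an OTLP exporter to Tempo, and every name here is illustrative:

```python
# Span-with-attributes sketch (stand-in for OpenTelemetry instrumentation).
import time
from contextlib import contextmanager

SPANS = []  # stand-in for the exporter pipeline to Tempo


@contextmanager
def start_span(name: str, **attributes):
    span = {"name": name, "attributes": attributes, "start": time.monotonic()}
    try:
        yield span
    finally:
        span["duration_s"] = time.monotonic() - span["start"]
        SPANS.append(span)  # "export" the finished span


def select_channels(job_id: str) -> list[str]:
    # Agent tool call traced with meaningful attributes, not a bare function.
    with start_span("agent.select_channels", job_id=job_id) as span:
        channels = ["linkedin", "stepstone"]
        span["attributes"]["channel_count"] = len(channels)
        return channels


select_channels("job-42")
```

The key property is that the trace records both the input (`job_id`) and the decision outcome (`channel_count`), so the agent's choice is visible after the fact instead of being an invisible decision point.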
§6 — Temporal for Durable Workflows
Statement: Long-running processes are Temporal workflows. Background jobs and fire-and-forget patterns are not acceptable for operations that need durability.

Reasoning: Hiring workflows are inherently long-running and multi-step: post job → screen candidates → schedule interviews → collect feedback → extend offer → onboard. Each step may take days and involve human approvals. Temporal provides durability (survives restarts), visibility (workflow state inspection), compensation (saga pattern for rollback), and human-in-the-loop gates (approval workflows). Background jobs and cron tasks offer none of this. What Good Looks Like:
- Multi-step hiring operations are Temporal workflows in the `flux` namespace.
- Operations that span time (waiting for approvals, scheduled actions) use Temporal signals and timers.
- Multi-system operations use the saga pattern with compensation logic (if step 3 fails, undo steps 1-2).
- Workflow state is inspectable via Temporal Web UI at `http://temporal.localhost:8080`.
- Activities are idempotent and retry-safe.

What Violation Looks Like:
- A background job (Celery, asyncio task, cron) for an operation that needs durability or compensation.
- A multi-step process implemented as sequential API calls without failure handling.
- Long-running operations that lose state on service restart.
- Human approval gates implemented as polling loops instead of Temporal signals.
- Activities that aren’t idempotent (retries cause duplicate side effects).
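The compensation logic the saga pattern requires can be sketched in plain Python: run (action, compensation) pairs in order, and on failure undo the completed steps in reverse. Real Flux workflows would encode this with the Temporal SDK so the state survives restarts; this stdlib-only version, with hypothetical step names, shows only the rollback shape:

```python
# Saga sketch: if step n fails, compensate steps 1..n-1 in reverse order.
def run_saga(steps):
    """steps: list of (action, compensate) callables."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):  # undo completed steps, newest first
            compensate()
        return "compensated"
    return "committed"


def fail():
    raise RuntimeError("extend_offer failed")  # simulated third-step failure


log = []
steps = [
    (lambda: log.append("post_job"),  lambda: log.append("unpost_job")),
    (lambda: log.append("book_slot"), lambda: log.append("release_slot")),
    (fail,                            lambda: None),
]
outcome = run_saga(steps)
```

What Temporal adds on top of this shape is the durability: the `done` list here vanishes on a process restart, which is precisely why a bare background job cannot safely own a multi-system operation.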
§7 — Domain-Driven Boundaries
Statement: Each domain owns its schemas, tools, and specialist agents. Cross-domain communication goes through well-defined interfaces.

Reasoning: In an AI-native product, domain boundaries are critical because agents reason about their domain — a hiring agent shouldn’t need to understand payroll internals, and a compensation agent shouldn’t need to know about interview scheduling. The canonical schema layer is the lingua franca: internal models can evolve independently as long as projections to/from canonical are maintained. This enables independent squad development and external integrations without tight coupling.

What Good Looks Like:
- Each domain has its own directory under `backend/domains/` with schemas, tools, and services.
- Cross-domain data exchange goes through canonical schemas (`backend/domains/canonical/`).
- Domain-specific agents only access their own domain’s tools and data.
- External system integrations use canonical projections, not direct model mappings.
- Changes to one domain don’t require changes in other domains (unless the canonical interface changes).

What Violation Looks Like:
- Direct imports between domain directories (e.g., `from backend.domains.hiring import ...` in the compensation domain).
- Cross-domain data exchange that bypasses canonical schemas.
- An agent that directly accesses another domain’s database tables.
- A schema change in one domain that cascades breaking changes to other domains.
- External integrations that map directly to internal models instead of going through canonical.
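A canonical projection can be as small as one function that maps the internal model onto the shared shape. This sketch uses hypothetical models (`HiringCandidate`, `CanonicalCandidate`, `to_canonical`), not actual Flux schemas, to show why internal fields stay private to the domain:

```python
# Canonical-projection sketch: cross-domain consumers only ever see
# the canonical shape, never the hiring domain's internal model.
from dataclasses import dataclass


@dataclass
class HiringCandidate:       # internal model, private to the hiring domain
    full_name: str
    pipeline_stage: str
    recruiter_notes: str     # internal detail, never leaves the domain


@dataclass
class CanonicalCandidate:    # lingua franca, lives in the canonical layer
    name: str
    stage: str


def to_canonical(c: HiringCandidate) -> CanonicalCandidate:
    # The only sanctioned path for cross-domain data exchange.
    return CanonicalCandidate(name=c.full_name, stage=c.pipeline_stage)


out = to_canonical(HiringCandidate("Ada Lovelace", "interview", "strong fit"))
```

Renaming `pipeline_stage` internally now touches only this projection; consumers of `CanonicalCandidate` never notice, which is the decoupling the principle is after.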
AI-Native SDLC Principles
§8 — 100% AI-Generated Code with Safety Nets
Statement: All code is AI-generated using a multi-agent pipeline with comprehensive validation at every stage.

Reasoning: Flux demonstrates that AI-generated code can be production-quality when paired with the right safety nets. The pipeline is: Claude Code (generator) + Codex 5.3 Extra High (discriminator/evaluator) locally, full k3d stack per worktree for testing, Playwright MCP for UI validation, then multiple specialized CI agents (Copilot, Cursor Bugbot) at review time. The safety net is what makes velocity safe — without it, AI-generated code is just fast code, not good code.

What Good Looks Like:
- Every code change is validated locally before pushing: quality gates, unit tests, integration tests, E2E tests, smoke tests.
- The discriminator agent evaluates the generator’s output before it’s committed.
- Playwright MCP is used to verify UI changes in a running k3d deployment.
- Multiple CI agents provide diverse automated review coverage.
- The engineer (AI or human) can explain what each test validates and why.
What Violation Looks Like:
- Code pushed without running local quality gates.
- Safety nets bypassed for speed (“we’ll test later”, `--no-verify`).
- Only happy-path tests — no edge cases, no error scenarios.
- Manual testing claimed as sufficient (“I checked it in the browser” without automated tests).
- Changes to the safety net itself without careful review (weakening gates).
§9 — Two-Phase Development
Statement: Plan PR first (docs only), approved before implementation. Code PR second, implements the approved plan.

Reasoning: Separating planning from implementation ensures architectural alignment is validated before engineering effort is invested. It’s cheaper to fix a plan than to fix code. The plan PR contains cycle docs, RFCs, PRDs, or strategy docs — no code. The CE reviews the plan for AI-native alignment, scope, and SOTA intent. Only after the plan is approved and merged does the engineer begin implementation. This prevents the “already built it, just need approval” pattern that makes architectural review toothless.

What Good Looks Like:
- Every feature starts with a plan PR containing a cycle doc, RFC, or PRD.
- The plan PR is reviewed and merged before any code is written.
- The code PR references the approved plan and implements it faithfully.
- Deviations from the plan during implementation are discussed and the plan is updated if needed.
- The plan includes enough architectural detail for the CE to evaluate AI-native alignment.
What Violation Looks Like:
- A code PR with no corresponding plan PR.
- A plan PR that’s merged after the code PR (retroactive planning).
- Code that significantly deviates from the approved plan without discussion.
- Plans that are too vague for architectural review (“we’ll figure it out during implementation”).
- Combined plan + code PRs (the separation is the point).
§10 — Spec-Driven Traceability
Statement: Every line of code traces back to a requirement. PRD → RFC → Cycle Doc → Issue → PR → Code.

Reasoning: Traceability ensures nothing is built without a reason and nothing required is left unbuilt. In an AI-native SDLC where code is generated rapidly, traceability is the mechanism for ensuring the product stays aligned with its goals. Mintlify integration provides AI-friendly spec access and drift detection — when code diverges from specs, it’s caught automatically.

What Good Looks Like:
- PRs reference the issue they close (`Closes #N`).
- Cycle docs reference the PRD or RFC they implement.
- Issues are tracked on the GitHub Projects board with proper status.
- Spec drift is detected and corrected (Mintlify Autopilot or manual review).
What Violation Looks Like:
- Code changes with no linked issue or requirement.
- Features that don’t appear in any PRD, RFC, or cycle doc.
- Specs that are outdated and don’t reflect what was actually built.
- Orphaned issues — work completed but issue not closed.
§11 — Cross-Model Review
Statement: Code is reviewed by a different model than the one that wrote it. No single model is trusted as both author and reviewer.

Reasoning: Every LLM has blind spots — patterns it favors, mistakes it consistently makes, edge cases it misses. Cross-model review provides diversity of evaluation. Claude Code’s output is reviewed by Copilot, Cursor Bugbot, and other specialized agents. The Chief Engineer (which may run on a different model or the same model with a different persona) provides the architectural layer. This multi-model pipeline is a safety net against model-specific failure modes.

What Good Looks Like:
- PRs are reviewed by at least one CI agent running a different model than the generator.
- The CE review provides architectural judgment independent of code-level review.
- Review findings are specific and actionable, not generic.
- Conflicting review feedback is resolved by the CE with explicit reasoning.
What Violation Looks Like:
- A PR reviewed only by the model that generated it.
- No automated review at all.
- Generic review comments that could apply to any code (“looks good”, “consider edge cases”).
- Review feedback that’s ignored without explicit justification.
§12 — Quality Gate Culture
Statement: format → lint → type-check → build → test → smoke test. No shortcuts. No skipping. No `--no-verify`.
Reasoning: Quality gates are the automated enforcement layer that catches mechanical issues before the CE reviews architectural alignment. If the gates are red, there’s nothing to review. The gates run locally before pushing and again in CI — both must pass. Bypassing gates (even once, even “just this time”) breaks the culture and creates precedent for future bypasses.
What Good Looks Like:
- `make flux-quality-gates` passes before every push.
- CI pipeline enforces the same gates.
- Generated types are fresh (`make flux-check-api-types` passes).
- All tests pass — unit, integration, E2E, smoke.
- Engineers run gates habitually, not as an afterthought.

What Violation Looks Like:
- `git commit --no-verify` or `git push --no-verify`.
- CI failures that are “expected” or “known issues” and ignored.
- Quality gates disabled or weakened to unblock a merge.
- “The tests pass locally” but CI is red (environment-specific failure = real failure).
- Flaky tests that are skipped instead of fixed.
Anti-Patterns
These are never acceptable, regardless of justification:

| Anti-Pattern | Why It’s Never OK | Manifesto Ref |
|---|---|---|
| Bolting a chatbot onto a CRUD app | The interface IS the AI, not AI-on-the-side | §2 |
| LLM calls without reasoning loops | Prompt-in text-out is not agentic | §3 |
| Hand-written types duplicating schemas | Pipeline exists specifically to prevent this | §4 |
| Endpoints without OTel instrumentation | Observability ships with the feature | §5 |
| Background jobs for durable operations | Temporal exists specifically for this | §6 |
| Mock-heavy tests hiding integration failures | Mocks pass when prod fails — the test is lying | §8, §12 |
| “It works” as the only quality evidence | Working is the minimum, not the bar | §0 |
| “Follow-up PR” for observability/tests/errors | Follow-up PRs for shipped-with-feature items never land | §0, §5 |
| Technical debt as a deliberate trade-off | Build it right or don’t build it | §0 |
| Shortcuts justified by deadlines | Deadlines don’t change what SOTA means | §0 |