
Cycle 451: Autonomous Release Notes — Per-PR “What’s Changed” Blog Posts in Mintlify

Priority: HIGH (pre-pilot DX investment, AI-native publishing showcase)
Status: DONE
Domain: infra
Wave: 10 (Process & Spec-Driven Dev)
Milestone: Pre-pilot DX
Owner: @pj
Dependencies: Cycle 214 (Mintlify platform foundation, docs.json, root AGENTS.md)
Issue: #451
Plan PR: #452
Product: Flux — AI-native hiring platform
Organization: Employ Inc. (employ-inc GitHub org)

Overview

Per-PR release communication today is limited to PR descriptions (uneven quality), commit messages (no narrative), and — once cycle 400 lands — a Slack digest of merged PRs (team-facing, daily, multi-PR). What’s missing is a single, polished, navigable changelog surface that evaluators, testers, employer customers, and external readers can use to see what shipped and click through to try it.

Rapid SOTA teams (Linear, Resend, Knock, PostHog, Vercel) all publish blog-style changelogs, and almost all of them are human-written — which is why most rapid teams’ changelogs are stale, terse, or missing. Flux has an asymmetric advantage here: cycle docs already carry the “why” (motivation, scope, design), diffs are fully accessible, Playwright is already in the stack, and Mintlify (cycle 214) is the publishing surface. Feed those four inputs into a three-pass Claude pipeline — Facts → Narrative → Verifier, with a hard-coded slop-voice guard — and we can produce SOTA narrative changelog entries on every merge to main with no human in the loop, without sounding like AI slop.

This cycle ships the autonomous generator end to end: GitHub Action trigger, context assembly, Playwright screenshots, three-pass synthesis, Mintlify MDX output, and a graceful human-handoff failure mode. The post is the canonical artifact for “what shipped” — not the PR description, not the commit message, not the Slack digest.

This is also a deliberate showcase of Mintlify’s full AI-native surface: MDX, <Update> component, section-level AGENTS.md, Autopilot as a secondary reviewer, auto-generated MCP server (so AI agents can answer “what shipped this week?”), llms.txt for downstream agent grounding, and Mintlify’s contextual buttons for one-click handoff to Claude/Cursor.

Current State

What Works

  • Cycle 214 has merged the spec-driven dev model and selected Mintlify as the docs platform. The docs.json config, root AGENTS.md, and Mintlify Autopilot are all part of the platform foundation.
  • Cycle 400 (open implementation PR #418) ships a /changelog Claude Code skill that generates a Slack-formatted digest of merged PRs — different surface, different cadence, different audience.
  • Cycle 380 (UI design skill) establishes Playwright screenshot conventions and docs/design/cycle-{N}/ artifact patterns we can mirror.
  • CI infrastructure: gh api access to PR metadata, check runs, and diff is already wired through GitHub Actions.

What’s Missing

  • No per-PR changelog surface — readers (evaluators, testers, customers) have no canonical place to see what shipped, formatted for human consumption.
  • No Mintlify changelog page — docs/changelog/ does not exist.
  • No autonomous publishing pipeline — Mintlify has Autopilot for spec drift, but no out-of-the-box “PR → narrative blog post” generator.
  • No voice-quality enforcement — nothing prevents AI-generated marketing-speak slop from being published.
  • No diff → screenshot pipeline — Playwright is wired for cycle 380’s design workflow but not for changelog automation.

Scope

In Scope (Phase 1 — this cycle)

1. GitHub Action autonomous-changelog.yml (~0.5 day)

Triggers (parity with cycle 400’s daily-changelog.yml):
  • on: pull_request: types: [closed] filtered to merged == true against main — primary path
  • on: workflow_dispatch: with pr_number input — manual regeneration / debugging / replaying a PR after prompt iteration
Permissions: contents: write (commit MDX), pull-requests: write (open follow-up PR on failure), issues: write (post status comment). Concurrency: scoped per-PR to prevent duplicate generation on retry.

2. Context assembly module (~1 day)

scripts/autonomous_changelog/context_assembly.py. Pure function assemble_context(pr_number) -> ChangelogContext that gathers, in parallel:
  • PR metadata — title, body, author, labels, conventional commit prefix from title
  • Diff — gh api repos/.../pulls/{n}/files for file list; git diff base...merge_sha -U3 for hunks
  • Commits — full commit messages and bodies via gh pr view --json commits
  • Cycle doc — resolved from branch name regex cycle(\d+(\.\d+)?)/... → docs/roadmap/cycles/cycle{N}-*.md. Falls back to PR body link extraction if branch doesn’t match.
  • CI results — gh api repos/.../check-runs for the merge SHA. Captures pass/fail summary per check, not full logs.
  • Linked issues — parse Closes #N / Fixes #N from PR body; fetch issue titles for context.
  • Preview/staging URL — read from existing GH deployment status API (cycle 209.2 preview env) or fall back to staging.
Output: a single ChangelogContext Pydantic model (defined in schemas.py). Heavy diffs are truncated to the first 10 000 changed lines + a one-line summary per truncated file, to bound token cost.
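A minimal sketch of the parallel gather described above, using stdlib asyncio only. The fetcher bodies are placeholders (the real module shells out to gh api), and the plain dict stands in for the ChangelogContext Pydantic model, so this is a shape illustration rather than the implementation:

```python
import asyncio

# Placeholder fetchers; the real context_assembly.py calls `gh api` / git.
async def fetch_pr_metadata(pr: int) -> dict:
    return {"title": f"PR #{pr}", "labels": []}

async def fetch_diff(pr: int) -> str:
    return ""

async def fetch_ci_results(pr: int) -> dict:
    return {}

async def assemble_context(pr: int) -> dict:
    # All inputs are independent, so they can be gathered concurrently.
    meta, diff, ci = await asyncio.gather(
        fetch_pr_metadata(pr), fetch_diff(pr), fetch_ci_results(pr)
    )
    return {"pr_number": pr, "meta": meta, "diff": diff, "ci": ci}

ctx = asyncio.run(assemble_context(451))
```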

3. Playwright screenshot pass (~1 day)

scripts/autonomous_changelog/screenshot_runner.py. Diff-driven route discovery:
  • Identify changed user-facing routes by scanning web/app/**/page.tsx paths in the diff and mapping to URL paths.
  • Identify changed components by file path; for each, find a containing route via static analysis (best-effort).
  • Spin up Playwright (Chromium, stable) against the already-deployed preview or main URL (no local app spin-up). Capture each route at desktop (1440×900) and mobile (390×844).
  • Save to docs/changelog/images/<slug>/<route-slug>-{desktop,mobile}.png.
  • Capture metadata: route, viewport, response status, capture timestamp.
Failure tolerance: a single bad route never aborts the pass — log and continue. If no user-facing routes are detected (pure backend PR), skip screenshots entirely.

4. Three-pass Claude synthesis (~2 days)

scripts/autonomous_changelog/synthesis/. Implemented per the claude-api skill (Anthropic SDK, prompt caching, structured outputs). Detailed in Three-Pass Synthesis below.

5. Voice & anti-slop guardrails (~1 day)

scripts/autonomous_changelog/synthesis/voice_guard.py. Detailed in Voice & Anti-Slop Guardrails. Includes:
  • Hard-coded regex blocklist (~30 phrases)
  • Soft heuristics (sentence length, adjective density, opener patterns)
  • Few-shot voice samples in docs/changelog/_examples/
  • Voice guide in docs/changelog/_voice-guide.md (consumed by prompts)

6. Mintlify integration (~1 day)

  • scripts/autonomous_changelog/mintlify_writer.py — emits per-PR MDX files using the <Update> component
  • docs/changelog/index.mdx — landing page that aggregates entries (newest first, grouped by month)
  • docs/changelog/AGENTS.md — section-level AI customization (immutability rules, MCP grounding instructions)
  • docs/docs.json — adds a Changelog tab with auto-grouped pages via glob changelog/*
  • Mintlify Autopilot is invoked as a secondary review on the generated MDX (catches markdown/component syntax errors before publish)
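A hedged sketch of the MDX emission step in mintlify_writer.py: frontmatter plus an <Update> wrapper. The frontmatter keys mirror those named in this cycle (title, description, date, pr); the <Update> props shown (label, description) follow Mintlify's documented component but should be treated as assumptions here, as should the render_entry name:

```python
from datetime import date

def render_entry(title: str, description: str, pr: int, body_mdx: str,
                 day: date) -> str:
    """Emit one changelog entry: YAML frontmatter + <Update>-wrapped body."""
    frontmatter = "\n".join([
        "---",
        f"title: {title!r}",
        f"description: {description!r}",
        f"date: {day.isoformat()}",
        f"pr: {pr}",            # used to detect re-fires on the same PR
        "---",
    ])
    update = (f'<Update label="{day.isoformat()}" description={description!r}>\n'
              f"{body_mdx}\n</Update>")
    return f"{frontmatter}\n\n{update}\n"
```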

7. Failure mode + human handoff (~0.5 day)

scripts/autonomous_changelog/failure_handoff.py. When Pass 3 rejects, slop-guard fires, or any pass errors:
  • Open a follow-up PR titled chore(changelog): handoff for #{N} — {failure reason} containing the draft MDX and a structured comment with flagged issues.
  • Post a non-blocking PR comment on the original merged PR linking to the handoff PR.
  • Never revert the original merge; never block CI on changelog generation.

8. Golden set + voice samples (~1 day)

  • Hand-curate five reference posts in docs/changelog/_examples/ covering: a frontend feature, a backend feature, a bug fix, a refactor with no user-facing change, and a complex multi-component cycle. These are the few-shot exemplars for Pass 2.
  • Hand-write docs/changelog/_voice-guide.md (one page) — voice rules, what to avoid, what good looks like. Prompts cite this guide.
  • Hand-curate the slop blocklist seed list (~30 phrases) from public Linear / Resend / Knock / PostHog / Vercel changelogs (positive examples) versus AI-generated marketing copy (negative examples).

9. Verification, observability, documentation (~0.5 day)

  • Unit tests for voice guard (≥ 50 phrase test cases)
  • Unit tests for context assembly (3 fixture PRs)
  • Integration test: end-to-end on a known-good past PR (e.g., cycle 365 plan PR), output reviewed manually
  • Token cost emitted to GitHub Action summary per run
  • Failure mode tested by injecting a deliberate slop phrase into Pass 2 output
  • Operator/contributor guide: docs/guides/autonomous-changelog.md
Total: ~8.5 engineer-days (≈ 1.5 weeks)

Out of Scope (Phase 2+)

  • Slack notification on publish — adjacent to cycle 400; deferred to keep cycles separate.
  • Weekly AI-synthesized “Shipped” roll-up post — a separate generator that consumes the per-PR posts.
  • Eval harness auto-trigger from generated “Try it” steps — ties to cycle 209.7 (post-merge validation + evals).
  • Internal-only “evaluator notes” section — role-gated content via Mintlify auth tiers.
  • Customer email digest — monthly newsletter sourced from changelog.
  • Multi-PR release-level summaries — group merged PRs in a release window into a single post.
  • Author-edit loop — letting authors comment /changelog edit on a PR to trigger regeneration with hints. Phase 2 if friction emerges.

Architecture

Pipeline Flow

PR merged to main


┌──────────────────────────────────────────────────────────────┐
│ .github/workflows/autonomous-changelog.yml                    │
│   trigger: pull_request closed && merged == true              │
│   concurrency: per-PR (cancel-in-progress: false)             │
└──────────────────────────────────────────────────────────────┘


┌──────────────────────────────────────────────────────────────┐
│ scripts/autonomous_changelog/pipeline.py (orchestrator)       │
└──────────────────────────────────────────────────────────────┘

    ├─▶ context_assembly.py       (PR + diff + cycle doc + CI)

    ├─▶ screenshot_runner.py      (Playwright on changed routes)


┌──────────────────────────────────────────────────────────────┐
│ synthesis/                                                    │
│   pass 1: facts.py        (Opus 4.7, T=0, structured JSON)    │
│   pass 2: narrative.py    (Opus 4.7, T=0.4, MDX output)       │
│   pass 3: verifier.py     (Sonnet 4.6, T=0, verdict JSON)     │
│   voice_guard.py          (regex blocklist + heuristics)      │
└──────────────────────────────────────────────────────────────┘

    ├─[verdict: publish]─▶ mintlify_writer.py
    │                          │
    │                          ▼
    │                      docs/changelog/YYYY-MM-DD-<slug>.mdx
    │                          │
    │                          ▼
    │                      git commit + push to main
    │                          │
    │                          ▼
    │                      Mintlify auto-deploys
    │                          │
    │                          ▼
    │                      PR comment: "Published → <Mintlify URL>"

    └─[verdict: human_review]─▶ failure_handoff.py


                               open follow-up PR with draft MDX


                               PR comment: "Handoff PR opened → #{M}"

Repository Layout

.github/workflows/
└── autonomous-changelog.yml          NEW — trigger workflow

scripts/autonomous_changelog/         NEW — Python module
├── __init__.py
├── pipeline.py                       Orchestrator (CLI entrypoint)
├── context_assembly.py               PR/diff/cycle/CI gather (parallel async)
├── screenshot_runner.py              Playwright route discovery + capture
├── mintlify_writer.py                MDX file generation, frontmatter, <Update>
├── failure_handoff.py                Open follow-up PR on rejection
├── schemas.py                        Pydantic models (ChangelogContext, FactList, Verdict)
├── synthesis/
│   ├── __init__.py
│   ├── facts.py                      Pass 1: extract verified facts from diff
│   ├── narrative.py                  Pass 2: write MDX in Linear voice
│   ├── verifier.py                   Pass 3: cross-check + slop check
│   ├── voice_guard.py                Regex blocklist + heuristics
│   └── prompts/
│       ├── facts.md                  System prompt for Pass 1
│       ├── narrative.md              System prompt for Pass 2 (cites voice guide)
│       └── verifier.md               System prompt for Pass 3

tests/autonomous_changelog/           NEW — root-level test suite (matches cycle 400 convention)
├── test_voice_guard.py
├── test_context_assembly.py
├── test_synthesis_e2e.py             Integration test on fixture PR
└── fixtures/
    ├── pr-frontend-feature.json
    ├── pr-backend-only.json
    └── pr-large-refactor.json

docs/changelog/                       NEW — Mintlify-published surface
├── index.mdx                         Landing page (newest first, by month)
├── AGENTS.md                         Section-level AI customization
├── _voice-guide.md                   Voice rules (read by Pass 2 prompt)
├── _examples/                        Few-shot voice samples (5 hand-curated)
│   ├── frontend-feature.mdx
│   ├── backend-feature.mdx
│   ├── bug-fix.mdx
│   ├── refactor.mdx
│   └── multi-component-cycle.mdx
├── images/                           Per-post screenshots (one dir per slug)
└── YYYY-MM-DD-<slug>.mdx             Per-PR posts (generated)

docs/docs.json                        EDIT — add Changelog tab
docs/guides/autonomous-changelog.md   NEW — operator/contributor guide

Three-Pass Synthesis

The core IP of this cycle. Every detail matters because the difference between a great post and AI slop lives in the prompts, model choice, and verifier rigor.

Pass 1 — Facts

| Setting | Value |
| --- | --- |
| Model | claude-opus-4-7 (deepest reasoning for code-diff understanding) |
| Temperature | 0 |
| Tools | None |
| Output | Structured JSON, validated against FactList Pydantic schema |
| Caching | System prompt + voice guide cached (5-min TTL) |
System prompt directs Claude to extract a flat list of factual claims from the diff. Each fact carries:
from pydantic import BaseModel
from typing import Literal

class Fact(BaseModel):
    claim: str                          # one-sentence factual statement
    evidence: list[FileLineRef]         # ≥1 file:line references; FileLineRef defined in schemas.py
    user_facing: bool                   # affects users vs. internal-only
    surface: Literal["backend", "frontend", "infra", "docs", "config", "test"]
    confidence: float                   # 0.0–1.0
Examples of good facts:
  • claim: "JobGet channel adapter posts jobs to JobGet's /jobs API", evidence: [{file: "backend/domains/hiring/distribution/channels/jobget.py", line: 42}], user_facing: false, surface: "backend", confidence: 0.95
  • claim: "Candidate portal sidebar collapses to icon-only at <768px viewport", evidence: [{file: "web/components/candidate/Sidebar.tsx", line: 87}], user_facing: true, surface: "frontend", confidence: 0.9
Failure modes:
  • Empty fact list → abort, post a PR comment “diff too sparse to summarize” (e.g., dependency bumps with no behavior change).
  • Output fails Pydantic validation → retry once with stricter schema reminder; second failure → abort with handoff.
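The validate-and-retry behavior above can be sketched as a small loop. Here call_model stands in for the Anthropic API call so the logic is testable offline, and the shape check is a stand-in for full FactList Pydantic validation; extract_facts is an illustrative name:

```python
import json

def extract_facts(call_model, max_attempts: int = 2) -> list[dict]:
    """One retry with a stricter schema reminder; second failure -> handoff."""
    prompt = "Extract facts from the diff as a JSON array."  # placeholder prompt
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            facts = json.loads(raw)
            # Stand-in for Pydantic validation against FactList.
            if isinstance(facts, list) and all("claim" in f for f in facts):
                return facts
        except json.JSONDecodeError:
            pass
        prompt += "\nReminder: output MUST be a JSON array of fact objects."
    raise RuntimeError("fact extraction failed twice -> human handoff")
```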

Pass 2 — Narrative

| Setting | Value |
| --- | --- |
| Model | claude-opus-4-7 (voice + structure) |
| Temperature | 0.4 (some creativity within guardrails) |
| Tools | None |
| Inputs | FactList (Pass 1 output) + cycle doc text + voice guide + few-shot samples + screenshot URLs + linked issues |
| Output | Raw MDX (no frontmatter — writer adds frontmatter) |
| Caching | System prompt + voice guide + few-shot samples cached |
System prompt:
  • Cites docs/changelog/_voice-guide.md verbatim
  • Includes the five _examples/*.mdx posts as in-context few-shot demonstrations
  • Names the slop blocklist explicitly (“never use these phrases: …”)
  • Instructs Claude to lead with the change (not an announcement), use specifics over abstractions, prefer active voice
  • Tells Claude to use Mintlify components (<Frame>, <CardGroup>, <Card>, <CodeGroup>) where appropriate
  • Requires a “Try it” section if preview_url is present
Output structure (target template, not enforced rigidly):
{Hero image — first screenshot, or cycle-doc diagram if backend-only}

{Opening paragraph — ≤3 sentences. Lead with the change. Specific.}

{Body — what changed, surfaced through screenshots / specifics / numbers.
 Inline screenshots via <Frame>. Avoid sub-headings unless the post is long.}

## Try it
{Link to preview URL or staging environment, with one-line "what to look at".}

## Under the hood
{Terse bullet list with file:line links to GitHub. For curious readers.}
Failure modes:
  • Invalid MDX (component misuse, unmatched tag) → retry once with error feedback; second failure → handoff.
  • Pass 2 ignores few-shot voice → caught by Pass 3 verifier or slop guard.

Pass 3 — Verifier

| Setting | Value |
| --- | --- |
| Model | claude-sonnet-4-6 (cheaper, faster, sufficient for cross-check) |
| Temperature | 0 |
| Tools | None |
| Inputs | FactList (Pass 1) + narrative MDX (Pass 2) + slop blocklist |
| Output | Structured JSON: Verdict |
from pydantic import BaseModel
from typing import Literal

class Verdict(BaseModel):
    verified_claims: list[ClaimMapping]      # narrative claim → fact ID; ClaimMapping defined in schemas.py
    unsupported_claims: list[str]            # claims with no fact backing
    slop_phrases_detected: list[str]         # blocklist hits
    voice_concerns: list[str]                # heuristic violations
    verdict: Literal["publish", "human_review"]
    reasoning: str                           # brief justification
Decision rule:
  • verdict = "publish" iff: len(unsupported_claims) == 0 AND len(slop_phrases_detected) == 0 AND len(voice_concerns) <= 2.
  • Otherwise verdict = "human_review" and the failure handoff PR opens with the verdict JSON included for context.
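The decision rule, transcribed directly into a small pure function over the Verdict fields:

```python
def decide(unsupported_claims: list[str], slop_phrases: list[str],
           voice_concerns: list[str]) -> str:
    """Publish only when nothing is unsupported, nothing is slop,
    and at most 2 soft voice concerns were raised."""
    if not unsupported_claims and not slop_phrases and len(voice_concerns) <= 2:
        return "publish"
    return "human_review"
```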
Slop guard runs as a deterministic regex pass after Pass 3 (defense in depth — Pass 3’s slop detection is LLM-judged, slop guard is regex-judged).

Cost & Caching Strategy

Per the claude-api skill, the implementation must use prompt caching:
  • System prompt + voice guide + few-shot samples are cached across all three passes (same Anthropic API key, 5-min TTL). Pass 2’s call hits the cache established by Pass 1; Pass 3 also hits it.
  • Cycle doc is cached when present (used by Pass 2; also referenced by Pass 1’s reasoning).
  • Diff is the only large per-PR input that cannot be cached — it changes every PR.
Estimated per-PR cost (with caching):
  • Pass 1: ~15k input (mostly diff) + ~2k output, Opus 4.7 → ~$0.05
  • Pass 2: ~5k input (mostly cached) + ~3k output, Opus 4.7 → ~$0.04
  • Pass 3: ~5k input + ~1k output, Sonnet 4.6 → ~$0.01
  • Total per post: ~$0.10
At 100 PRs/week, this costs ~$10/week. At 1,000 PRs/week (extreme), ~$100/week. Cheap relative to the human time saved.
Token budget enforcement:
  • Hard cap: 50k input tokens per pass. Diffs over the cap are truncated by file (whole files preserved, tail dropped) with a “(truncated)” marker.
  • If the cap forces truncation of more than 30 % of the diff, the post adds an “Under the hood” disclaimer and links to the full diff on GitHub.
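A sketch of the by-file truncation rule above (whole files preserved in order, tail replaced with the marker). A crude four-characters-per-token estimate stands in for a real tokenizer, and truncate_diff is an illustrative name:

```python
def truncate_diff(file_diffs: list[tuple[str, str]],
                  cap_tokens: int = 50_000) -> list[tuple[str, str]]:
    """Keep whole per-file diffs until the cap; mark everything past it."""
    kept: list[tuple[str, str]] = []
    used = 0
    for name, diff in file_diffs:
        est_tokens = len(diff) // 4 + 1   # rough chars-to-tokens estimate
        if used + est_tokens > cap_tokens:
            kept.append((name, "(truncated)"))  # whole file dropped, marked
            continue
        kept.append((name, diff))
        used += est_tokens
    return kept
```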

Voice & Anti-Slop Guardrails

This is the single most important section of this cycle. The whole pipeline fails to deliver value if the output reads like AI slop. Three layers of defense at generation time, plus a post-publish sampling audit:

Layer 1 — Prompt-level (Pass 2 system prompt)

Voice rules embedded in the prompt:
  1. Lead with the change, not the announcement. Bad: “We’re excited to announce a new way to schedule interviews.” Good: “Interview scheduling now suggests time slots based on the candidate’s stated availability.”
  2. One specific over three abstractions. Bad: “powerful, intuitive, seamless experience.” Good: “creates a 30-minute slot in the next 48 hours that fits both calendars.”
  3. Show, don’t tell — screenshots beat adjectives. If you’d reach for an adjective (“clean”, “polished”, “intuitive”), reach for a screenshot instead.
  4. Active voice, present tense. Bad: “A new feature has been added that allows users to…” Good: “The candidate portal now shows pending interview requests at the top.”
  5. Names and numbers > generalities. Bad: “much faster”. Good: “p95 search latency dropped from 1.4 s to 240 ms.”
  6. Say what’s NEW, not what’s “now possible”. Bad: “It’s now possible to filter candidates by skill.” Good: “Candidate list has a Skill filter.”
  7. Don’t editorialize. No “we think this is going to be transformative.” Just say what shipped.

Layer 2 — Few-shot exemplars (Pass 2 in-context)

Five hand-curated reference posts in docs/changelog/_examples/:
| Example | Purpose |
| --- | --- |
| frontend-feature.mdx | A new user-facing feature with screenshots |
| backend-feature.mdx | A backend capability with no UI, but downstream impact |
| bug-fix.mdx | A reported bug, now fixed — terse, specific |
| refactor.mdx | An internal refactor with no behavior change — minimal post |
| multi-component-cycle.mdx | A cycle that touched 5+ surfaces — structured, with sections |
Each example is reviewed by a human and considered the gold standard for that PR archetype. Pass 2’s prompt selects the closest archetype based on Pass 1’s surface distribution.

Layer 3 — Deterministic slop guard

scripts/autonomous_changelog/synthesis/voice_guard.py runs after Pass 3, regex-only, no LLM. Seed blocklist (sample — full list in code, ~30 entries):
import re

SLOP_PATTERNS: list[re.Pattern] = [
    re.compile(r"\bwe(?:'re| are) (?:excited|thrilled|delighted|pleased) to\b", re.I),
    re.compile(r"\bseamless(?:ly)?\b", re.I),
    re.compile(r"\bsupercharg(?:e|ed|ing)\b", re.I),
    re.compile(r"\bworld[- ]class\b", re.I),
    re.compile(r"\bleverag(?:e|es|ing|ed)\b", re.I),  # except proper noun "Leverage"
    re.compile(r"\bcutting[- ]edge\b", re.I),
    re.compile(r"\bgame[- ]chang(?:er|ing)\b", re.I),
    re.compile(r"\brobust\b", re.I),
    re.compile(r"\bunder the hood\b", re.I),  # except as section title — handled by structural exclusion
    re.compile(r"\bblazing(?:ly)? fast\b", re.I),
    re.compile(r"\bnext[- ]generation\b", re.I),
    re.compile(r"\brevolutioniz(?:e|es|ing|ed)\b", re.I),
    re.compile(r"\bempower(?:s|ing|ed)?\b", re.I),
    # … ~17 more
]
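A minimal runner over a blocklist like the one above, including the structural exclusion mentioned for "under the hood": heading lines are skipped so a section title never counts as a hit. Only two sample patterns are repeated here, and find_slop is an illustrative name:

```python
import re

# Two sample patterns standing in for the full ~30-entry blocklist.
PATTERNS = [
    re.compile(r"\bseamless(?:ly)?\b", re.I),
    re.compile(r"\bunder the hood\b", re.I),
]

def find_slop(mdx: str) -> list[str]:
    """Return blocklist hits in body text, skipping markdown headings."""
    hits: list[str] = []
    for line in mdx.splitlines():
        if line.lstrip().startswith("#"):   # structural exclusion: headings
            continue
        for pat in PATTERNS:
            m = pat.search(line)
            if m:
                hits.append(m.group(0))
    return hits
```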
Soft heuristics (warnings, not failures, surfaced in Verdict.voice_concerns):
  • Sentence average length > 28 words
  • Adjective density > 18% of tokens (per nltk POS tag)
  • Opening sentence starts with “We ” (lead with the change, not the team)
  • More than 2 marketing adjectives in any single sentence
  • Use of em-dash chains (3+ in one paragraph — a known Claude tic)

Layer 4 — Sampling audit (post-publish)

Weekly: a human (rotating, owner = cycle owner this iteration) reads the last 5 published posts and rates each on:
  • Specificity (1–5)
  • Voice match to references (1–5)
  • Would I publish this if I’d written it? (yes/no)
Results logged to docs/changelog/_audit-log.md. When patterns emerge (e.g., posts about backend changes are too dry), the voice guide and few-shot examples are updated.

Mintlify Primitives Used (Showcase)

This cycle exercises the full Mintlify AI-native surface. This table is part of the cycle on purpose: the goal is not just “publish a changelog” — it is to demonstrate Mintlify’s AI-native publishing model end to end.
| Primitive | Usage in this cycle |
| --- | --- |
| MDX files | Native authoring surface — generator emits MDX directly, no transformation layer |
| <Update> component | Wraps each entry with a date label, description, and content slot — Mintlify’s first-class changelog primitive |
| <Frame>, <CardGroup>, <Card> | Hero images, “Under the hood” file links, “Try it” callouts |
| <CodeGroup>, <Tabs> | Multi-language code samples (rare in changelog, supported when needed) |
| Frontmatter | title, description, date, tags, pr, cycle, preview_url, authors — drives navigation, search, AI indexing |
| docs.json navigation | Adds a top-level Changelog tab; pages auto-grouped by month via glob pattern changelog/2026-04-* |
| Root AGENTS.md | Already configured by cycle 214; we extend with a Changelog section |
| Section AGENTS.md | docs/changelog/AGENTS.md declares: entries are immutable; Autopilot must not edit them; MCP queries should treat changelog as canonical “what shipped” source |
| Mintlify Autopilot | Runs as a secondary review on each generated MDX — catches markdown/component syntax errors before publish; if Autopilot rejects, generator falls through to human handoff |
| Auto-generated MCP server | Evaluators ask Claude/Cursor “what shipped this week?” via specs.flux.employinc.io/mcp; changelog entries are first-class MCP resources |
| llms.txt / llms-full.txt | Auto-includes changelog entries; downstream agents (support bot, sales bot) can ground answers in shipped features without a separate KB |
| Contextual buttons | Each entry surfaces “Copy”, “Open in Claude”, “Open in Cursor”, “MCP” buttons (configured in docs.json) |
| AI traffic analytics | Mintlify dashboard reports which agents read which entries and where they 404 — feedback loop for entry quality |
| Tags + filtering | Domain tags (hiring, distribution, frontend, agents, etc.) drive Mintlify’s tag-filter UI; readers can scope to their area of interest |
| Search | Mintlify’s built-in search indexes entries; tagged for relevance boost on cycle-related queries |
| Bi-directional sync | Generator commits MDX to main; Mintlify auto-deploys within seconds; PMs/engineers can hand-edit a published entry via Mintlify’s web editor and the change syncs back to the repo |

Failure Modes & Recovery

| Stage | Failure | Behavior |
| --- | --- | --- |
| Workflow trigger | Concurrent PR merges | Per-PR concurrency group; each PR processed independently |
| Workflow trigger | Re-fire on already-published PR (label change, manual workflow_dispatch, re-merge after revert) | Detect existing entry by PR number in frontmatter; overwrite only if both Pass 3 and slop guard pass on the new run; otherwise open a handoff PR with a diff-of-diffs explaining what changed |
| Context assembly | Cycle doc not found by branch regex | Continue without cycle doc; log warning; Pass 2 falls back to PR body for “why” |
| Context assembly | Diff is empty (revert, no-op merge) | Skip post entirely; post non-blocking PR comment “no changelog entry — no diff” |
| Context assembly | Diff > 5 000 lines / > 50 files | Generate post but flag as “large change — review recommended”; truncate diff input |
| Playwright | No user-facing routes detected | Generate post without screenshots (backend-only style) |
| Playwright | Browser crash / route 500 | Capture error-state screenshot; note in narrative; continue |
| Pass 1 — Facts | Empty fact list | Abort; PR comment “diff too sparse to summarize” |
| Pass 1 — Facts | JSON validation fails | Retry once with stricter schema reminder; second failure → handoff |
| Pass 2 — Narrative | Invalid MDX (parse fails) | Retry once with error feedback; second failure → handoff |
| Pass 2 — Narrative | Slop voice (caught by Pass 3) | Handoff PR opened with flagged phrases |
| Pass 3 — Verifier | Unsupported claims detected | Handoff PR opened with claim list and fact list for human review |
| Pass 3 — Verifier | Pass 3 itself errors | Default to handoff (fail closed) |
| Slop guard | Regex hit | Handoff PR opened with matched phrases highlighted |
| Mintlify writer | MDX file write fails | Retry; if persistent, handoff PR with content as artifact |
| Git commit/push | Push conflict | Pull latest, retry; on second failure, open handoff PR |
| Mintlify deploy | Mintlify Autopilot rejects MDX | Open handoff PR with Autopilot feedback included |
Invariant: a changelog generation failure never blocks the original PR’s merge. The merge has already happened. Worst case is a follow-up PR for human polish.

Quality Bar — Definition of “Not AI Slop”

A passing post must satisfy all of the following:
  1. Specificity — every benefit claim has a concrete artifact (screenshot, code link, number, named feature)
  2. Brevity — opening paragraph ≤ 3 sentences; full post ≤ 400 words for a typical PR (multi-component cycles get more)
  3. Voice — zero hits on the slop blocklist; ≤ 2 soft heuristic violations
  4. Grounding — every factual claim in the narrative maps to a fact from Pass 1 (verified by Pass 3)
  5. Visual — at least 1 screenshot if the diff touches user-facing routes
  6. Navigability — “Try it” link present and resolves (curl HEAD check at publish time)
  7. Cycle context — if a cycle doc exists, the “why” is reflected (verified by Pass 3 — the narrative must contain at least one phrase semantically aligned with the cycle doc’s overview)
These are tested in tests/test_synthesis_e2e.py against fixture PRs and enforced by Pass 3 + slop guard at runtime.
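The "Try it" link check in the quality bar (the curl HEAD equivalent) can be sketched as a small helper. Here urlopen is injectable so the check can be exercised without network access; try_it_link_resolves is an illustrative name:

```python
import urllib.request

def try_it_link_resolves(url: str,
                         urlopen=urllib.request.urlopen) -> bool:
    """HEAD-check a preview/staging URL at publish time; any error fails soft."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=10) as resp:
            return resp.status < 400
    except Exception:
        return False
```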

Implementation Plan

Step 1 — Scaffold + GH Action skeleton (~0.5 day)

  • Create scripts/autonomous_changelog/ package with __init__.py, pipeline.py stub
  • Create .github/workflows/autonomous-changelog.yml with trigger + Python setup, calling pipeline.py --pr-number <N>
  • Wire dry-run mode (no commit, prints MDX to logs) for testing
  • Permissions: contents: write, pull-requests: write, issues: write
  • Concurrency group: changelog-pr-${{ github.event.pull_request.number }}

Step 2 — Context assembly module (~1 day)

  • Pydantic schemas in schemas.py (ChangelogContext, Fact, FactList, Verdict, FileLineRef)
  • context_assembly.py with parallel async fetches via asyncio.gather
  • Branch-name → cycle doc resolution
  • Diff truncation logic (whole-file preservation, tail-drop)
  • Unit tests with 3 fixture PRs (frontend feature, backend-only, large refactor)

Step 3 — Playwright screenshot runner (~1 day)

  • screenshot_runner.py with route discovery from diff paths
  • Playwright Chromium (pinned version), desktop + mobile viewports
  • Screenshot output to docs/changelog/images/<slug>/
  • Failure tolerance: per-route try/except, never aborts the pass
  • Skip when no user-facing routes touched

Step 4 — Three-pass synthesis (~2 days)

  • Anthropic SDK with prompt caching (per claude-api skill)
  • synthesis/facts.py (Pass 1) — Opus 4.7, structured output with Pydantic
  • synthesis/narrative.py (Pass 2) — Opus 4.7, MDX output, few-shot from _examples/
  • synthesis/verifier.py (Pass 3) — Sonnet 4.6, structured Verdict output
  • Prompt files in synthesis/prompts/ — reviewable, version-controlled
  • Token cost emitted per pass to GH Action summary
  • Integration test: end-to-end on cycle 365 plan PR fixture; manual review of output

Step 5 — Voice guard + slop blocklist (~1 day)

  • voice_guard.py with seed regex blocklist (~30 patterns)
  • Soft heuristics (sentence length, adjective density, opener pattern, em-dash chain)
  • Unit tests with ≥ 50 phrase test cases (positive + negative)
  • Integration: voice guard runs after Pass 3, results merged into Verdict

Step 6 — Mintlify integration (~1 day)

  • mintlify_writer.py — emits MDX with frontmatter + <Update> wrapper
  • docs/changelog/index.mdx — landing page with monthly grouping
  • docs/changelog/AGENTS.md — section-level AI customization (immutability, MCP grounding)
  • Edit docs/docs.json — add Changelog tab with auto-glob pages
  • _voice-guide.md — voice rules (consumed by Pass 2 prompt)
  • Verify Mintlify renders generated entries correctly (manual check on a deployed preview)

Step 7 — Failure mode + human handoff (~0.5 day)

  • failure_handoff.py — open follow-up PR with draft MDX + structured comment
  • PR comment integration on the original merged PR
  • Test by injecting deliberate slop into Pass 2 output

Step 8 — Golden set + voice samples (~1 day)

  • Hand-curate 5 reference posts in _examples/
  • Hand-write _voice-guide.md
  • Curate slop blocklist seed (~30 patterns from real changelog corpora)

Step 9 — Verification + observability + documentation (~0.5 day)

  • Operator/contributor guide: docs/guides/autonomous-changelog.md
  • Token cost monitoring (GH Action summary + Mintlify analytics)
  • Sampling audit log: docs/changelog/_audit-log.md template
  • Final E2E test: run pipeline on 3 historical PRs, manually review outputs
Total: ~8.5 engineer-days (≈ 1.5 weeks)

Verification Plan

  • .github/workflows/autonomous-changelog.yml triggers only on PR merge to main; a PR closed without merging does not fire it
  • Workflow completes in < 10 minutes for a typical PR (≤ 1,000 changed lines)
  • Generated MDX validates against Mintlify schema (Autopilot review passes or workflow rejects)
  • Slop blocklist catches all 50 phrase test cases in test_voice_guard.py
  • Pass 3 verifier catches injected hallucinations in 5 deliberate test cases
  • Generated post for cycle 365 plan PR (test sample) passes voice review by a human
  • Generated post for cycle 401 implementation PR (test sample) passes voice review by a human
  • Generated post for a hypothetical “fixes typo” PR is either suppressed (per quality bar) or is appropriately terse
  • docs/docs.json Changelog tab navigates to entries; entries render with <Update> wrapper
  • docs/changelog/AGENTS.md is detected by Mintlify (verified in Mintlify dashboard)
  • Mintlify MCP server returns changelog entries to a query “what shipped this week?”
  • llms.txt includes changelog entries (verified at the deployed /llms.txt URL)
  • Token cost per PR ≤ $0.20 (caching working — Pass 2 + 3 inputs largely cached)
  • Token cost monitor reports per-pass token usage in GH Action summary
  • Failure handoff opens a follow-up PR within 60 seconds of verifier rejection
  • Failure handoff PR contains the draft MDX and a structured list of flagged issues
  • Non-blocking PR comment posted on the original merged PR (link to either published entry or handoff PR)
  • Sampling audit log template exists at docs/changelog/_audit-log.md
  • Operator guide docs/guides/autonomous-changelog.md exists and explains: configuration, debugging, prompt iteration, audit cadence
  • make quality-gates green (lint + format + typecheck + tests)
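
The merge-only trigger in the first checklist item can be sketched as a workflow skeleton. Only the trigger shape is the point; the job contents are illustrative:

```yaml
# .github/workflows/autonomous-changelog.yml (sketch)
name: Autonomous changelog
on:
  pull_request:
    types: [closed]
    branches: [main]
jobs:
  generate:
    # `closed` fires for both merged and unmerged PRs; this guard keeps
    # the job merge-only, per the verification plan.
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # context assembly, three-pass synthesis, and the Mintlify MDX
      # commit would follow here
```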

Risks and Mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Generated narratives still feel AI-written despite the three-layer guard | Defeats the whole purpose | Few-shot from real human-curated samples; verifier slop check; deterministic regex guard; weekly sampling audit; iterate prompts when patterns emerge from audit |
| Pass 3 verifier false positives block legitimate posts | Toil — every PR needs human polish | Calibrate verdict threshold against 20 hand-labeled fixtures before launch; track override rate as quality signal; allow author label changelog:approve-handoff to publish a handoff draft as-is |
| Pass 3 verifier false negatives let slop through | Quality leak | Sampling audit (weekly, last 5 posts); deterministic regex guard as defense in depth; voice guide updated quarterly based on audit findings |
| Cost per PR exceeds estimate (large diffs, many PRs) | Token spend | Hard 50k input cap per pass with truncation; Pass 3 uses cheaper Sonnet; cost emitted to GH Action summary; alert if weekly spend > $50 |
| Cycle doc not found for a branch (legacy or non-cycle work) | Loss of “why” context | Fallback to PR body and linked issues; over time cycle 381 enforces cycle docs per cycle; document the fallback in operator guide |
| Diff is too large or too unfocused to summarize meaningfully | Generic post | Skip post (diff > 5,000 lines or > 50 files) and open a non-blocking PR comment “large change — manual changelog recommended”; provide a starter template |
| Race condition: two PRs merge in same minute | Filename collision | Filename uses merge-commit SHA suffix on collision; fall back to YYYY-MM-DD-<slug>-<sha7>.mdx |
| Mintlify Autopilot accidentally edits historical posts | Loss of immutable record | docs/changelog/AGENTS.md declares entries immutable; entries’ frontmatter contains immutable: true; Autopilot configuration set to ignore the directory by default |
| Playwright dependency makes CI slow or flaky | Workflow latency / failure | Pin Playwright Docker image; per-route try/except so a single bad route never aborts; investigate Mintlify preview screenshot service as a Phase 2 optimization once cycle 214’s Mintlify implementation lands |
| Author objects to autogenerated post about their PR | Process friction | changelog:skip label suppresses generation; changelog:edit label triggers handoff (draft only); Mintlify web editor lets author hand-edit a published entry, which syncs back |
| Voice guide drift — what feels SOTA today feels stale in 6 months | Long-term staleness | Quarterly review of _voice-guide.md and _examples/ against current SOTA changelogs (Linear, Resend, etc.); voice guide is version-controlled; refresh is a one-day chore |
| Generator publishes sensitive details (e.g., security fix details before disclosure) | Disclosure risk | security label on a PR routes to handoff PR (no auto-publish); Pass 1 prompt instructed to never describe vulnerability mechanics |
| First N posts will need iteration after launch | Early-life messiness | Reserve a follow-up cycle (after first 20 posts ship) for prompt + voice-guide iteration informed by audit |
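
The collision fallback in the race-condition row above can be sketched as follows; the slug rule and function signature are this sketch's assumptions:

```python
import re
from datetime import date

def entry_filename(title: str, merged_on: date, sha: str,
                   existing: set[str]) -> str:
    """Default filename is date + slug; on collision (two PRs merged the
    same day with colliding titles) fall back to a 7-char SHA suffix."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    base = f"{merged_on.isoformat()}-{slug}"
    if f"{base}.mdx" not in existing:
        return f"{base}.mdx"
    return f"{base}-{sha[:7]}.mdx"
```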

Phase 2 Roadmap (Future Cycles)

| Phase 2 Item | Surface | Likely Cycle Number |
| --- | --- | --- |
| Slack publish notification | Cross-post Mintlify URL + hero image to a Slack channel on publish | TBD — coordinate with cycle 400’s surface |
| Weekly “Shipped” roll-up | A separate generator that consumes the week’s per-PR posts and writes a narrative weekly summary for external readers | TBD |
| Eval harness auto-trigger | Generated “Try it” steps fed into eval scenario nominator | Ties to cycle 209.7 (post-merge validation + evals) |
| Internal-only “evaluator notes” section | Role-gated content via Mintlify auth tiers; technical detail for testers | TBD |
| Customer email digest | Monthly newsletter sourced from changelog tags; uses email service | TBD |
| Multi-PR release-level summaries | Group merged PRs in a release window into a single post (vs per-PR) | TBD |
| Author-edit loop | /changelog edit <hint> PR comment triggers regeneration with author hint | TBD if friction emerges |
| Translation (es/pt-BR for pilot regions) | Mintlify supports i18n; auto-translate via Claude as a fourth pass | TBD post-pilot |
| Visual diff comparison | Use the screenshot pass to capture before/after of the same route across the merge | TBD if value is demonstrated |

Relationship to Other Cycles

  • Cycle 214 — Spec-driven dev with Mintlify (REQUIRED dependency). Provides the Mintlify platform, docs.json navigation pattern, root AGENTS.md, and Autopilot configuration that this cycle extends. This cycle cannot ship until 214’s Mintlify implementation PR has merged.
  • Cycle 400 — /changelog Slack digest skill (ADJACENT, complementary). Slack-facing team digest of merged PRs. Different surface (Slack vs Mintlify), different cadence (daily cron vs per-PR merge), different audience (team vs external + evaluators). Phase 2 will tie the two together (publish notification cross-posts to Slack).
  • Cycle 401 — /standup skill (ADJACENT, similar shape). Both cycle 400 and 401 are interactive Claude Code skills. Cycle 451 is fully autonomous (GH Action only) but shares conventions for gh api PR fetching and conventional-commit grouping.
  • Cycle 209.7 — Post-merge validation + evals (PHASE 2 INTEGRATION TARGET). The “Try it” sections this cycle generates can feed back into eval scenario nomination — when the changelog says “candidate portal sidebar collapses at <768px”, that becomes a candidate eval fixture.
  • Cycle 221 — Chief Engineer review (COMPLEMENTARY). CE reviews quality of code; this cycle reports quality of shipped product. Both feed the AI-native quality loop.
  • Cycle 380 — UI design skill (REFERENCED). Screenshot conventions and docs/design/cycle-{N}/ artifact patterns inform this cycle’s docs/changelog/images/<slug>/ structure.
  • Cycle 365 — Pilot evals harness (REFERENCED). LLM-as-judge model selection pattern (distinct judge model from agent model) informs Pass 3 verifier model choice (Sonnet for Pass 3 vs Opus for Pass 1/2).
  • Cycle 381 — Issue-number cycle IDs (CONVENTION). This cycle follows the new convention; cycle number 451 = issue 451.

AI-Native Manifesto Alignment

| § Principle | How This Cycle Embodies It |
| --- | --- |
| §0 Uncompromising Quality | Three-pass synthesis with verifier; deterministic slop blocklist; sampling audit; no shortcuts on output quality. The whole cycle exists because terse PR descriptions are not SOTA enough. |
| §1 One Mind, Full Context | Generator reads full diff + cycle doc + CI results + linked issues + screenshots — holistic context per post, not file-by-file. |
| §3 Agentic Architecture | Three Claude passes are agents with distinct roles (Facts extractor, Narrative writer, Verifier judge), not LLM wrappers. Each pass observes (reads inputs), reasons (within prompt rules), acts (produces structured output). |
| §5 Observability-Native | Token cost per post tracked and emitted to GH Action summary; verifier pass-rate tracked; sampling audit results logged to _audit-log.md; Mintlify AI traffic analytics tracks consumption. |
| §8 100% AI-Generated Code with Safety Nets | Generator is itself AI-generated (this cycle and its implementation). Safety nets: deterministic slop guard, verifier pass, sampling audit, human handoff fallback. |
| §10 Spec-Driven Traceability | Cycle doc → implementation PR → autonomous changelog entry → Mintlify-published — full traceability from spec to public surface. The changelog entry frontmatter cites cycle and PR. |
| §11 Cross-Model Review | Pass 1 (Opus) extracts; Pass 2 (Opus, different temperature) writes; Pass 3 (Sonnet) reviews — cross-model verification within a single workflow. |

Notes

  • This cycle is meta in a productive way: when the implementation PR for this cycle merges, the resulting changelog entry will be the first autonomous post, generated by the system describing the system. That’s the canonical validation — if Cycle 451’s own changelog post is good, the system works.
  • The post is canonical; the PR description is not. Authors can put rough notes in PR bodies (or skip them) and trust the generator to polish the public-facing artifact. This should reduce PR-description toil over time.
  • Treat the first 20 posts as a prototype run. Reserve a follow-up cycle for prompt and voice-guide iteration informed by sampling audit results — the slop blocklist will need expansion as new patterns emerge.
  • AGENTS.md policies open the door for downstream agents (support bot, sales bot) to consume changelog entries via Mintlify’s MCP server to answer “does Flux do X?” — turns the changelog into a queryable product knowledge base. Phase 2.
  • The dependency on cycle 214’s Mintlify implementation PR is hard. If that PR is delayed, this cycle’s code phase waits. The plan PR (this doc) does not depend on it.
  • Cost is not a meaningful constraint. The pipeline costs ~$0.10 per post, which is dwarfed by the human time saved. The only real budget concern is keeping verifier false-positive rates low so engineers don’t spend 10 minutes per handoff PR.