
Canonical Schema & Integration-First Strategy

Owner: @pj (CTO, Employ Inc.) · Created: 2026-03-29 · Status: Active · Implements: Cycle 210 (Canonical Schema Platform)

Strategic Context

Employ Inc. operates multiple ATS products (JazzHR, Lever, Jobvite, Recruit Marketing) serving different market segments. The company is building Flux as the next-generation AI-native hiring platform. Rather than forcing customers through painful data migrations, the platform uses a strangler pattern: a canonical data layer sits between legacy systems and modern experiences, allowing customers to upgrade without migrating. The Canonical Schema Platform is the data contract backbone that makes this architecture work.
  LEGACY STACK                 CANONICAL SPINE                  MODERN EXPERIENCES
  ────────────                 ───────────────                  ──────────────────

  JazzHR Data & BE ──┐    ┌─ Canonical API Event Platform ─┐    ┌── Modern ATS UI
  Lever Data & BE  ──┤    │  + Tenant Router                │    ├── Companions v2
  Jobvite Data & BE ─┼───→│                                 │───→├── HRIS integrations
  RM Data & BE ──────┘    │  Canonical BFAI Platform        │    ├── Analytics
                          │  (SSE, MCP)                     │    └── Other ATS
                          │                                 │
                          │  3rd Party Data &               │
                          │  Integration Platform           │
                          │                                 │
                          │  Canonical ATS Data & BE        │
                          └────────────┬────────────────────┘

                               Context Engine (Cerebe)

Core Principle: Customers Upgrade, They Don’t Migrate

When a JazzHR customer “upgrades” to the Modern ATS UI, their data stays in JazzHR’s backend. The Canonical API Event Platform + Tenant Router serves the same API surface by projecting JazzHR data through the canonical schema layer. Over time, data can be incrementally moved to the Canonical ATS Data & BE, but the customer never experiences a migration — features just appear.

Core Principle: Integration as a Primary Capability

Every external system — HRIS, payroll, job boards, background checks, assessments — connects through canonical schemas. This means:
  1. An integration built once works for all ATS products (JazzHR, Lever, Jobvite, RM, Flux)
  2. AI agents can build, test, and deploy new integrations by reasoning about schema contracts via MCP
  3. The integration surface is standards-aligned (HR Open, Schema.org, O*NET) so it speaks the industry’s language

The Canonical Schema Layer

What It Is

A set of HR-standard-aligned Pydantic v2 models that represent the canonical shape of hiring data. These are NOT internal domain models — they are the external contract that all systems project to and from. Internal Flux models can evolve independently (rename fields, restructure, add domain-specific concepts) as long as the projection adapters keep the canonical shape stable.

Canonical Entities

| Entity | Aligned With | Purpose |
| --- | --- | --- |
| CanonicalJob | HR Open 4.5 PositionOpening + Schema.org JobPosting | Job postings across all systems |
| CanonicalCandidate | HR Open Candidate + Merge ATS Candidate | Applicant profiles |
| CanonicalApplication | Merge ATS Application | Links candidate to job pipeline |
| CanonicalInterview | HR Open Interview/Assessment | Scheduled interviews + feedback |
| CanonicalOffer | HR Open Offer | Compensation offers |
| CanonicalEmployee | HR Open Employment + Finch Employee | Post-hire HRIS data |
| CanonicalOrganization | Schema.org Organization | Employer entity |
| CanonicalSkill | O*NET + ESCO | Skills/competences taxonomy |
| CanonicalCompensation | HR Open PositionCompensation + Finch Income | Pay structure |

Open Standards Alignment

| Standard | Org | Format | Usage |
| --- | --- | --- | --- |
| HR Open 4.5 | HR Open Standards Consortium | JSON Schema | Reference model for entity structure and field naming. Flux canonical schemas align where practical. |
| Schema.org JobPosting | Schema.org (W3C) | JSON-LD | All Flux jobs emit JSON-LD for Google for Jobs via CanonicalJob.to_schema_org(). Near-universal adoption. |
| O*NET / SOC | US Dept of Labor | REST API + CSV | US occupation codes and skills taxonomy. CanonicalSkill and CanonicalJob.onet_soc_code. |
| ESCO | European Commission | RDF/JSON-LD | EU skills/competences taxonomy. Cross-walked with O*NET for international support. CanonicalJob.esco_uri. |
| Merge Common Models | Merge.dev | OpenAPI 3.0 | Practical industry consensus on ATS data model shape. Used as validation reference. |
| Apideck Unified API | Apideck | OpenAPI (MIT) | Open-source ats.yml, hris.yml specs. Used as reference for integration contract shapes. |
| Finch Unified API | Finch | OpenAPI | Unified payroll/HRIS model covering 220+ systems. Reference for employee/compensation projection. |

Why Not Just Use HR Open Directly?

HR Open 4.5 is comprehensive but verbose — designed for enterprise EDI, not API-first platforms. Our canonical models take the field semantics and naming from HR Open but use a flat, JSON-friendly structure that maps cleanly to REST APIs, Pydantic validation, and frontend consumption. Think of it as “HR Open for the API era.”
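As a rough illustration of the "flat, JSON-friendly" point, here is a hedged sketch of how a Schema.org projection might emit Google for Jobs JSON-LD. The canonical-side field names and the exact JobPosting properties included are assumptions, not the real to_schema_org() implementation.

```python
# Hedged sketch of a canonical -> Schema.org JobPosting projection (names illustrative).
import json


def job_to_schema_org(job: dict) -> str:
    """Project a flat canonical job dict to a Schema.org JobPosting JSON-LD string."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "JobPosting",
        "title": job["title"],
        "description": job.get("description", ""),
        "datePosted": job.get("date_posted"),
        # Crosswalk the canonical enum to Schema.org's employmentType vocabulary.
        "employmentType": {"full_time": "FULL_TIME", "contract": "CONTRACTOR"}.get(
            job.get("employment_type", "full_time"), "OTHER"
        ),
    }
    return json.dumps(doc, indent=2)


payload = json.loads(job_to_schema_org({"title": "Recruiter", "employment_type": "contract"}))
```

The flat canonical dict maps to JSON-LD in a single pass, with no EDI-style nesting to unwind.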

Projection Architecture

Three Projection Directions

                        EXTERNAL STANDARDS & SYSTEMS
                        (HR Open, Schema.org, Merge, Finch, BambooHR, ADP...)

                           External Projections
                      canonical/projections/external/

                          CANONICAL SCHEMA LAYER
                           canonical/schemas/

                          Internal Projections
                        canonical/projections/flux_*

                         INTERNAL DOMAIN MODELS
                       domains/hiring/schemas.py

                          Legacy Projections
                       canonical/projections/legacy/

                            LEGACY SYSTEMS
                     (JazzHR, Lever, Jobvite, RM)

Projection Adapter Contract

Every projection adapter implements bidirectional mapping:
class SomeProjection:
    @staticmethod
    def to_canonical(source_data: dict) -> CanonicalEntity:
        """Convert source system data to canonical form."""
        ...

    @staticmethod
    def from_canonical(canonical: CanonicalEntity) -> dict:
        """Convert canonical form back to source system shape."""
        ...

    @staticmethod
    def coverage_report() -> dict[str, float]:
        """What % of canonical fields can this source populate?"""
        ...
Round-trip integrity is tested: source → canonical → source must preserve all mappable data.
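A round-trip check can be sketched with a toy projection. The field names here are hypothetical, not the real shape of any source system:

```python
# Minimal round-trip integrity check for a hypothetical employee projection.
# Field names (firstName/lastName) are illustrative only.

def to_canonical(source: dict) -> dict:
    """Source system shape -> canonical shape."""
    return {"first_name": source["firstName"], "last_name": source["lastName"]}


def from_canonical(canonical: dict) -> dict:
    """Canonical shape -> source system shape."""
    return {"firstName": canonical["first_name"], "lastName": canonical["last_name"]}


def test_round_trip():
    # source -> canonical -> source must preserve all mappable data
    source = {"firstName": "Ada", "lastName": "Lovelace"}
    assert from_canonical(to_canonical(source)) == source


test_round_trip()
```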

Crosswalk Tables

Enum values differ across systems. Crosswalk tables provide deterministic mapping:
# Employment type varies by system
FLUX_TO_SCHEMA_ORG = {"full_time": "FULL_TIME", "contract": "CONTRACTOR", ...}
HR_OPEN_TO_FLUX = {"FullTime": "full_time", "Contract": "contract", ...}
MERGE_TO_FLUX = {"FULL_TIME": "full_time", "CONTRACTOR": "contract", ...}
Crosswalks exist for: employment type, candidate stage, job status, interview type, offer status, location type.
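A crosswalk lookup can be a plain dict access; the sketch below additionally raises on unknown values rather than silently defaulting, which is an assumption about the desired failure mode:

```python
# Deterministic crosswalk lookup (failing loudly on unmapped values is an assumption).
MERGE_TO_FLUX = {"FULL_TIME": "full_time", "CONTRACTOR": "contract"}


def crosswalk(table: dict[str, str], value: str) -> str:
    """Map a source-system enum value to its canonical equivalent."""
    try:
        return table[value]
    except KeyError:
        raise ValueError(f"No crosswalk entry for {value!r}") from None


assert crosswalk(MERGE_TO_FLUX, "CONTRACTOR") == "contract"
```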

Integration Contract Registry

The registry tracks all known integration targets — what systems Flux can connect to, what entities are mapped, and the quality of each projection:
@dataclass(frozen=True, slots=True)
class IntegrationContract:
    system_name: str               # e.g., "bamboohr"
    display_name: str              # e.g., "BambooHR"
    spec_path: str                 # Path to cached OpenAPI spec
    supported_entities: tuple[str, ...]  # e.g., ("employee", "job")
    projection_module: str | None  # e.g., "canonical.projections.external.bamboohr"
    status: str                    # "available" | "planned" | "community"
    auth_type: str                 # "oauth2" | "api_key" | "bearer"

External API Spec Cache

Vendor OpenAPI specs are cached locally so AI agents can reason about them without live API calls:
| System | Source | License | Entities Covered |
| --- | --- | --- | --- |
| Kombo | api.kombo.dev/openapi.json | Proprietary (public) | ATS + HRIS |
| Apideck ATS | github.com/apideck-libraries/openapi-specs | MIT | Jobs, candidates, applications |
| Apideck HRIS | Same repo | MIT | Employees, companies, departments |
| Merge ATS | docs.merge.dev | Reference | Candidates, applications, jobs, interviews, offers |
| Finch | developer.tryfinch.com | Reference | Employees, companies, payments, benefits |
Specs are refreshed via make flux-update-external-specs.

AI-Driven Integration Building via MCP

The BFAI Platform (Flux’s AI agent layer) uses MCP tools to discover, build, validate, and deploy integrations:

MCP Integration Tools

| Tool | Purpose |
| --- | --- |
| list_canonical_schemas | Returns JSON Schema for all canonical entities — the AI's starting point for understanding what data is available |
| get_external_api_spec | Returns the cached OpenAPI spec for an external system (BambooHR, Gusto, etc.) |
| validate_projection | Given a source schema, target schema, and field mapping, validates type compatibility and required field coverage |
| generate_projection_adapter | Generates a Python projection adapter class + test suite from a validated field mapping |

Integration Building Flow

Employer: "Connect our BambooHR to sync employee data"

  1. Agent calls list_canonical_schemas()
     → Learns CanonicalEmployee fields
  2. Agent calls get_external_api_spec("bamboohr")
     → Reads BambooHR API shape
  3. Agent generates a field mapping: BambooHR → CanonicalEmployee
  4. Agent calls validate_projection(mapping, test_data)
     → Validates types match, required fields covered
  5. Agent calls generate_projection_adapter(...)
     → Gets Python adapter code + test suite
  6. Agent deploys via Temporal sync workflow
     → Runs on schedule, syncs data through canonical layer
This means adding a new HRIS integration does NOT require a developer to write custom code. The AI reads the vendor’s API spec, maps fields to canonical schemas, validates the mapping, generates the adapter, and deploys it — all through MCP tools.
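The core check behind validate_projection can be sketched as required-field coverage: every required canonical field must be populated by some source field. This is a simplified assumption about the tool's internals (the real tool also validates type compatibility), and all names below are illustrative.

```python
# Hedged sketch of validate_projection's coverage check (names illustrative).

def validate_projection(mapping: dict[str, str], required: set[str]) -> list[str]:
    """Return required canonical fields the source->canonical mapping fails to populate."""
    covered = set(mapping.values())  # canonical fields reachable from the source
    return sorted(required - covered)


missing = validate_projection(
    {"firstName": "first_name", "workEmail": "email"},
    required={"first_name", "last_name", "email"},
)
```

An empty result means the mapping is safe to hand to generate_projection_adapter; a non-empty result tells the agent exactly which canonical fields still need a source.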

Internal Type Safety Pipeline

Before schemas can project outward, they must be enforced inward. The internal pipeline ensures compile-time AND runtime type safety from database to UI:
Pydantic v2 ──→ FastAPI OpenAPI 3.1 ──→ @hey-api/openapi-ts ──→ TypeScript + Zod + SDK
(backend)       (auto-generated)        (codegen)                (frontend)

Key Properties

  • Single source of truth: Pydantic schemas in backend/domains/*/schemas.py
  • Generated, never hand-written: web/generated/api/ contains TypeScript types, Zod schemas, SDK client
  • Runtime validation: Every API response is Zod-parsed (throws in dev, logs in prod)
  • Form validation: Uses the same generated Zod schemas via zodResolver()
  • CI enforcement: oasdiff detects breaking changes; codegen freshness check blocks stale types
  • Tool output types: LangChain tool outputs are Pydantic-modeled, flow through the same pipeline to frontend generative UI

Developer Workflow

  1. Change a Pydantic schema in the backend
  2. Run make flux-generate-api-types
  3. Frontend types, Zod schemas, and SDK update automatically
  4. If you forget, CI fails with “Generated API types are stale”

Execution Strategy

Sequencing

┌──────────────────────┐     ┌───────────────────┐     ┌──────────────────────┐
│  Cycle 210 Part 1    │     │  Cycle 205.1       │     │  Cycle 210 Part 2    │
│  Internal Pipeline   │ ──→ │  API Wiring        │ ──→ │  Canonical Schema    │
│  (Days 1-4)          │     │  (uses generated   │     │  & Integration       │
│                      │     │   types from 210)  │     │  Projection          │
│  - OpenAPI export    │     │                    │     │  (Days 5-8)          │
│  - hey-api codegen   │     │  - Wire CRUD pages │     │                      │
│  - Zod + forms       │     │  - Seed test data  │     │  - Canonical models  │
│  - CI drift detect   │     │  - Remove mocks    │     │  - Projections       │
│                      │     │                    │     │  - MCP tools         │
└──────────────────────┘     └───────────────────┘     │  - Legacy stubs      │
                                                        └──────────────────────┘
210 Part 1 must land before 205.1 begins. Otherwise 205.1 cements hand-maintained types into ~15+ more files and 210 becomes a migration instead of a foundation. 210 Part 2 can run after or in parallel with 205.1 — it extends schemas outward but doesn’t block frontend work.

Incremental Expansion

The canonical layer starts with hiring lifecycle entities (Job, Candidate, Interview, Offer) and expands:
| Phase | Entities | Driven By |
| --- | --- | --- |
| Cycle 210 | Job, Candidate, Application, Interview, Offer | Core hiring flow |
| Cycle 206 | Candidate portal types (candidate-facing projections) | Candidate portal |
| HRIS cycle | Employee, Organization, Compensation | Paychex/HRIS integrations |
| Distribution cycle | Job distribution, posting analytics | Job board integrations |
| Compliance cycle | Audit trail, consent records | EU AI Act, EEOC |
Each cycle adds entities to the canonical layer. The pipeline, projections, and MCP tools are built once and reused.

Decision Record

Why Pydantic as Source of Truth (Not Zod, Not JSON Schema)

  • Backend is Python. Pydantic is the natural validation layer.
  • FastAPI auto-generates OpenAPI from Pydantic. No manual spec authoring.
  • @hey-api/openapi-ts generates Zod from OpenAPI. Pipeline is fully automated.
  • Alternative (Zod-first) would require maintaining schemas in two languages or running Node.js in the backend build.

Why @hey-api/openapi-ts Over openapi-typescript

  • hey-api generates types + Zod + SDK in one pass. openapi-typescript generates types only.
  • hey-api’s Zod plugin produces runtime validators. openapi-typescript is compile-time only.
  • hey-api’s SDK plugin replaces hand-maintained query hooks.
  • ~977k npm weekly downloads, used by Vercel/PayPal. Production-proven.

Why Zod v4 Over Valibot/ArkType

  • Ecosystem dominance: react-hook-form, shadcn/ui, hey-api, TanStack all have first-class Zod support.
  • v4 closed the performance gap (6-14x faster than v3, 57% smaller).
  • @zod/mini at 1.9 KB for bundle-sensitive paths.
  • Standard Schema compliant — exit path to Valibot/ArkType if ever needed.

Why Canonical Models Separate From Domain Models

  • Internal models serve business logic (validation rules, ORM mapping, domain events).
  • Canonical models serve integration contracts (field naming, standards alignment, cross-system compatibility).
  • They evolve at different rates. A Flux refactor shouldn’t break every integration.
  • Projection adapters absorb the difference. Round-trip tests verify integrity.

Why Not a Unified API Platform (Merge/Finch) for All Integrations

  • Unified API platforms (Merge, Finch, Kombo) are excellent for rapid integration coverage.
  • But they own the data pipeline — adding latency, cost, and a dependency on their uptime.
  • Flux’s canonical layer enables BOTH: use Merge/Finch as an integration method (their data projects through canonical) AND build direct integrations for high-value systems.
  • The canonical layer is the abstraction. Unified APIs are one implementation strategy behind it.
