
Canonical Schema & Integration-First Strategy

Owner: @pj (CTO, Employ Inc.) · Created: 2026-03-29 · Status: Active · Implements: Cycle 210 (Canonical Schema Platform)

Strategic Context

Employ Inc. operates multiple ATS products (JazzHR, Lever, Jobvite, Recruit Marketing) serving different market segments. The company is building Flux as the next-generation AI-native hiring platform. Rather than forcing customers through painful data migrations, the platform uses a strangler pattern: a canonical data layer sits between legacy systems and modern experiences, allowing customers to upgrade without migrating. The Canonical Schema Platform is the data contract backbone that makes this architecture work.
  LEGACY STACK                 CANONICAL SPINE                  MODERN EXPERIENCES
  ────────────                 ───────────────                  ──────────────────

  JazzHR Data & BE ──┐    ┌─ Canonical API Event Platform ─┐    ┌── Modern ATS UI
  Lever Data & BE  ──┤    │  + Tenant Router                │    ├── Companions v2
  Jobvite Data & BE ─┼───→│                                 │───→├── HRIS integrations
  RM Data & BE ──────┘    │  Canonical BFAI Platform        │    ├── Analytics
                          │  (SSE, MCP)                     │    └── Other ATS
                          │                                 │
                          │  3rd Party Data &               │
                          │  Integration Platform           │
                          │                                 │
                          │  Canonical ATS Data & BE        │
                          └────────────┬────────────────────┘

                               Context Engine (Cerebe)

Core Principle: Customers Upgrade, They Don’t Migrate

When a JazzHR customer “upgrades” to the Modern ATS UI, their data stays in JazzHR’s backend. The Canonical API Event Platform + Tenant Router serves the same API surface by projecting JazzHR data through the canonical schema layer. Over time, data can be incrementally moved to the Canonical ATS Data & BE, but the customer never experiences a migration — features just appear.

Core Principle: Integration as a Primary Capability

Every external system — HRIS, payroll, job boards, background checks, assessments — connects through canonical schemas. This means:
  1. An integration built once works for all ATS products (JazzHR, Lever, Jobvite, RM, Flux)
  2. AI agents can build, test, and deploy new integrations by reasoning about schema contracts via MCP
  3. The integration surface is standards-aligned (HR Open, Schema.org, O*NET) so it speaks the industry’s language

The Canonical Schema Layer

What It Is

A set of HR-standard-aligned Pydantic v2 models that represent the canonical shape of hiring data. These are NOT internal domain models — they are the external contract that all systems project to and from. Internal Flux models can evolve independently (rename fields, restructure, add domain-specific concepts) as long as the projection adapters keep the canonical shape stable.

Canonical Entities

| Entity | Aligned With | Purpose |
| --- | --- | --- |
| CanonicalJob | HR Open 4.5 PositionOpening + Schema.org JobPosting | Job postings across all systems |
| CanonicalCandidate | HR Open Candidate + Merge ATS Candidate | Applicant profiles |
| CanonicalApplication | Merge ATS Application | Links candidate to job pipeline |
| CanonicalInterview | HR Open Interview/Assessment | Scheduled interviews + feedback |
| CanonicalOffer | HR Open Offer | Compensation offers |
| CanonicalEmployee | HR Open Employment + Finch Employee | Post-hire HRIS data |
| CanonicalOrganization | Schema.org Organization | Employer entity |
| CanonicalSkill | O*NET + ESCO | Skills/competences taxonomy |
| CanonicalCompensation | HR Open PositionCompensation + Finch Income | Pay structure |

Open Standards Alignment

| Standard | Org | Format | Usage |
| --- | --- | --- | --- |
| HR Open 4.5 | HR Open Standards Consortium | JSON Schema | Reference model for entity structure and field naming. Flux canonical schemas align where practical. |
| Schema.org JobPosting | Schema.org (W3C) | JSON-LD | All Flux jobs emit JSON-LD for Google for Jobs via CanonicalJob.to_schema_org(). Near-universal adoption. |
| O*NET / SOC | US Dept of Labor | REST API + CSV | US occupation codes and skills taxonomy. CanonicalSkill and CanonicalJob.onet_soc_code. |
| ESCO | European Commission | RDF/JSON-LD | EU skills/competences taxonomy. Cross-walked with O*NET for international support. CanonicalJob.esco_uri. |
| Merge Common Models | Merge.dev | OpenAPI 3.0 | Practical industry consensus on ATS data model shape. Used as validation reference. |
| Apideck Unified API | Apideck | OpenAPI (MIT) | Open-source ats.yml, hris.yml specs. Used as reference for integration contract shapes. |
| Finch Unified API | Finch | OpenAPI | Unified payroll/HRIS model covering 220+ systems. Reference for employee/compensation projection. |

Why Not Just Use HR Open Directly?

HR Open 4.5 is comprehensive but verbose — designed for enterprise EDI, not API-first platforms. Our canonical models take the field semantics and naming from HR Open but use a flat, JSON-friendly structure that maps cleanly to REST APIs, Pydantic validation, and frontend consumption. Think of it as “HR Open for the API era.”
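As a rough illustration of the "flat, JSON-friendly" point, here is a hedged sketch of how a Schema.org projection might emit Google for Jobs JSON-LD. The canonical-side field names and the exact JobPosting properties included are assumptions, not the real to_schema_org() implementation.

```python
# Hedged sketch of a canonical -> Schema.org JobPosting projection (names illustrative).
import json


def job_to_schema_org(job: dict) -> str:
    """Project a flat canonical job dict to a Schema.org JobPosting JSON-LD string."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "JobPosting",
        "title": job["title"],
        "description": job.get("description", ""),
        "datePosted": job.get("date_posted"),
        # Crosswalk the canonical enum to Schema.org's employmentType vocabulary.
        "employmentType": {"full_time": "FULL_TIME", "contract": "CONTRACTOR"}.get(
            job.get("employment_type", "full_time"), "OTHER"
        ),
    }
    return json.dumps(doc, indent=2)


payload = json.loads(job_to_schema_org({"title": "Recruiter", "employment_type": "contract"}))
```

The flat canonical dict maps to JSON-LD in a single pass, with no EDI-style nesting to unwind.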

Projection Architecture

Three Projection Directions

                        EXTERNAL STANDARDS & SYSTEMS
                        (HR Open, Schema.org, Merge, Finch, BambooHR, ADP...)

                           External Projections
                      canonical/projections/external/

                          CANONICAL SCHEMA LAYER
                           canonical/schemas/

                          Internal Projections
                        canonical/projections/flux_*

                         INTERNAL DOMAIN MODELS
                       domains/hiring/schemas.py

                          Legacy Projections
                       canonical/projections/legacy/

                            LEGACY SYSTEMS
                     (JazzHR, Lever, Jobvite, RM)

Projection Adapter Contract

Every projection adapter implements bidirectional mapping:
class SomeProjection:
    @staticmethod
    def to_canonical(source_data: dict) -> CanonicalEntity:
        """Convert source system data to canonical form."""
        ...

    @staticmethod
    def from_canonical(canonical: CanonicalEntity) -> dict:
        """Convert canonical form back to source system shape."""
        ...

    @staticmethod
    def coverage_report() -> dict[str, float]:
        """What % of canonical fields can this source populate?"""
        ...
Round-trip integrity is tested: source → canonical → source must preserve all mappable data.
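A round-trip check can be sketched with a toy projection. The field names here are hypothetical, not the real shape of any source system:

```python
# Minimal round-trip integrity check for a hypothetical employee projection.
# Field names (firstName/lastName) are illustrative only.

def to_canonical(source: dict) -> dict:
    """Source system shape -> canonical shape."""
    return {"first_name": source["firstName"], "last_name": source["lastName"]}


def from_canonical(canonical: dict) -> dict:
    """Canonical shape -> source system shape."""
    return {"firstName": canonical["first_name"], "lastName": canonical["last_name"]}


def test_round_trip():
    # source -> canonical -> source must preserve all mappable data
    source = {"firstName": "Ada", "lastName": "Lovelace"}
    assert from_canonical(to_canonical(source)) == source


test_round_trip()
```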

Crosswalk Tables

Enum values differ across systems. Crosswalk tables provide deterministic mapping:
# Employment type varies by system
FLUX_TO_SCHEMA_ORG = {"full_time": "FULL_TIME", "contract": "CONTRACTOR", ...}
HR_OPEN_TO_FLUX = {"FullTime": "full_time", "Contract": "contract", ...}
MERGE_TO_FLUX = {"FULL_TIME": "full_time", "CONTRACTOR": "contract", ...}
Crosswalks exist for: employment type, candidate stage, job status, interview type, offer status, location type.
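A crosswalk lookup can be a plain dict access; the sketch below additionally raises on unknown values rather than silently defaulting, which is an assumption about the desired failure mode:

```python
# Deterministic crosswalk lookup (failing loudly on unmapped values is an assumption).
MERGE_TO_FLUX = {"FULL_TIME": "full_time", "CONTRACTOR": "contract"}


def crosswalk(table: dict[str, str], value: str) -> str:
    """Map a source-system enum value to its canonical equivalent."""
    try:
        return table[value]
    except KeyError:
        raise ValueError(f"No crosswalk entry for {value!r}") from None


assert crosswalk(MERGE_TO_FLUX, "CONTRACTOR") == "contract"
```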

Integration Contract Registry

The registry tracks all known integration targets — what systems Flux can connect to, what entities are mapped, and the quality of each projection:
@dataclass(frozen=True, slots=True)
class IntegrationContract:
    system_name: str               # e.g., "bamboohr"
    display_name: str              # e.g., "BambooHR"
    spec_path: str                 # Path to cached OpenAPI spec
    supported_entities: tuple[str, ...]  # e.g., ("employee", "job")
    projection_module: str | None  # e.g., "canonical.projections.external.bamboohr"
    status: str                    # "available" | "planned" | "community"
    auth_type: str                 # "oauth2" | "api_key" | "bearer"

External API Spec Cache

Vendor OpenAPI specs are cached locally so AI agents can reason about them without live API calls:
| System | Source | License | Entities Covered |
| --- | --- | --- | --- |
| Kombo | api.kombo.dev/openapi.json | Proprietary (public) | ATS + HRIS |
| Apideck ATS | github.com/apideck-libraries/openapi-specs | MIT | Jobs, candidates, applications |
| Apideck HRIS | Same repo | MIT | Employees, companies, departments |
| Merge ATS | docs.merge.dev | Reference | Candidates, applications, jobs, interviews, offers |
| Finch | developer.tryfinch.com | Reference | Employees, companies, payments, benefits |
Specs are refreshed via make flux-update-external-specs.

AI-Driven Integration Building via MCP

The BFAI Platform (Flux’s AI agent layer) uses MCP tools to discover, build, validate, and deploy integrations:

MCP Integration Tools

| Tool | Purpose |
| --- | --- |
| list_canonical_schemas | Returns JSON Schema for all canonical entities — the AI's starting point for understanding what data is available |
| get_external_api_spec | Returns the cached OpenAPI spec for an external system (BambooHR, Gusto, etc.) |
| validate_projection | Given a source schema, target schema, and field mapping, validates type compatibility and required field coverage |
| generate_projection_adapter | Generates a Python projection adapter class + test suite from a validated field mapping |

Integration Building Flow

Employer: "Connect our BambooHR to sync employee data"

  1. Agent calls list_canonical_schemas()
     → Learns CanonicalEmployee fields
  2. Agent calls get_external_api_spec("bamboohr")
     → Reads BambooHR API shape
  3. Agent generates a field mapping: BambooHR → CanonicalEmployee
  4. Agent calls validate_projection(mapping, test_data)
     → Validates types match, required fields covered
  5. Agent calls generate_projection_adapter(...)
     → Gets Python adapter code + test suite
  6. Agent deploys via Temporal sync workflow
     → Runs on schedule, syncs data through canonical layer
This means adding a new HRIS integration does NOT require a developer to write custom code. The AI reads the vendor’s API spec, maps fields to canonical schemas, validates the mapping, generates the adapter, and deploys it — all through MCP tools.
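The core check behind validate_projection can be sketched as required-field coverage: every required canonical field must be populated by some source field. This is a simplified assumption about the tool's internals (the real tool also validates type compatibility), and all names below are illustrative.

```python
# Hedged sketch of validate_projection's coverage check (names illustrative).

def validate_projection(mapping: dict[str, str], required: set[str]) -> list[str]:
    """Return required canonical fields the source->canonical mapping fails to populate."""
    covered = set(mapping.values())  # canonical fields reachable from the source
    return sorted(required - covered)


missing = validate_projection(
    {"firstName": "first_name", "workEmail": "email"},
    required={"first_name", "last_name", "email"},
)
```

An empty result means the mapping is safe to hand to generate_projection_adapter; a non-empty result tells the agent exactly which canonical fields still need a source.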

Internal Type Safety Pipeline

Before schemas can project outward, they must be enforced inward. The internal pipeline ensures compile-time AND runtime type safety from database to UI:
Pydantic v2 ──→ FastAPI OpenAPI 3.1 ──→ @hey-api/openapi-ts ──→ TypeScript + Zod + SDK
(backend)       (auto-generated)        (codegen)                (frontend)

Key Properties

  • Single source of truth: Pydantic schemas in backend/domains/*/schemas.py
  • Generated, never hand-written: web/generated/api/ contains TypeScript types, Zod schemas, SDK client
  • Runtime validation: Every API response is Zod-parsed (throws in dev, logs in prod)
  • Form validation: Uses the same generated Zod schemas via zodResolver()
  • CI enforcement: oasdiff detects breaking changes; codegen freshness check blocks stale types
  • Tool output types: LangChain tool outputs are Pydantic-modeled, flow through the same pipeline to frontend generative UI

Developer Workflow

  1. Change a Pydantic schema in the backend
  2. Run make flux-generate-api-types
  3. Frontend types, Zod schemas, and SDK update automatically
  4. If you forget, CI fails with “Generated API types are stale”

Execution Strategy

Sequencing

┌──────────────────────┐     ┌───────────────────┐     ┌──────────────────────┐
│  Cycle 210 Part 1    │     │  Cycle 205.1       │     │  Cycle 210 Part 2    │
│  Internal Pipeline   │ ──→ │  API Wiring        │ ──→ │  Canonical Schema    │
│  (Days 1-4)          │     │  (uses generated   │     │  & Integration       │
│                      │     │   types from 210)  │     │  Projection          │
│  - OpenAPI export    │     │                    │     │  (Days 5-8)          │
│  - hey-api codegen   │     │  - Wire CRUD pages │     │                      │
│  - Zod + forms       │     │  - Seed test data  │     │  - Canonical models  │
│  - CI drift detect   │     │  - Remove mocks    │     │  - Projections       │
│                      │     │                    │     │  - MCP tools         │
└──────────────────────┘     └───────────────────┘     │  - Legacy stubs      │
                                                        └──────────────────────┘
210 Part 1 must land before 205.1 begins. Otherwise 205.1 cements hand-maintained types into ~15+ more files and 210 becomes a migration instead of a foundation. 210 Part 2 can run after or in parallel with 205.1 — it extends schemas outward but doesn’t block frontend work.

Incremental Expansion

The canonical layer starts with hiring lifecycle entities (Job, Candidate, Interview, Offer) and expands:
| Phase | Entities | Driven By |
| --- | --- | --- |
| Cycle 210 | Job, Candidate, Application, Interview, Offer | Core hiring flow |
| Cycle 206 | Candidate portal types (candidate-facing projections) | Candidate portal |
| HRIS cycle | Employee, Organization, Compensation | Paychex/HRIS integrations |
| Distribution cycle | Job distribution, posting analytics | Job board integrations |
| Compliance cycle | Audit trail, consent records | EU AI Act, EEOC |
Each cycle adds entities to the canonical layer. The pipeline, projections, and MCP tools are built once and reused.

Decision Record

Why Pydantic as Source of Truth (Not Zod, Not JSON Schema)

  • Backend is Python. Pydantic is the natural validation layer.
  • FastAPI auto-generates OpenAPI from Pydantic. No manual spec authoring.
  • @hey-api/openapi-ts generates Zod from OpenAPI. Pipeline is fully automated.
  • Alternative (Zod-first) would require maintaining schemas in two languages or running Node.js in the backend build.

Why @hey-api/openapi-ts Over openapi-typescript

  • hey-api generates types + Zod + SDK in one pass. openapi-typescript generates types only.
  • hey-api’s Zod plugin produces runtime validators. openapi-typescript is compile-time only.
  • hey-api’s SDK plugin replaces hand-maintained query hooks.
  • ~977k npm weekly downloads, used by Vercel/PayPal. Production-proven.

Why Zod v4 Over Valibot/ArkType

  • Ecosystem dominance: react-hook-form, shadcn/ui, hey-api, TanStack all have first-class Zod support.
  • v4 closed the performance gap (6-14x faster than v3, 57% smaller).
  • @zod/mini at 1.9 KB for bundle-sensitive paths.
  • Standard Schema compliant — exit path to Valibot/ArkType if ever needed.

Why Canonical Models Separate From Domain Models

  • Internal models serve business logic (validation rules, ORM mapping, domain events).
  • Canonical models serve integration contracts (field naming, standards alignment, cross-system compatibility).
  • They evolve at different rates. A Flux refactor shouldn’t break every integration.
  • Projection adapters absorb the difference. Round-trip tests verify integrity.

Why Not a Unified API Platform (Merge/Finch) for All Integrations

  • Unified API platforms (Merge, Finch, Kombo) are excellent for rapid integration coverage.
  • But they own the data pipeline — adding latency, cost, and a dependency on their uptime.
  • Flux’s canonical layer enables BOTH: use Merge/Finch as an integration method (their data projects through canonical) AND build direct integrations for high-value systems.
  • The canonical layer is the abstraction. Unified APIs are one implementation strategy behind it.
