RFC-001: Canonical Schema Layer — HR-Standard-Aligned Lingua Franca

Summary

Adopt a canonical schema layer at backend/domains/canonical/ aligned with HR Open 4.5, Schema.org JobPosting, O*NET/SOC, ESCO, and the Merge / Finch / Apideck common models. Internal Flux models (backend/domains/hiring/schemas.py) project to and from canonical; legacy systems and external integrations are coded against canonical, never against internal Flux schemas. The canonical layer is the external contract of the platform.

Motivation

Flux integrates with many external systems: legacy ATS (JazzHR, Lever, Jobvite), HRIS / payroll (via Merge / Finch), job boards (Indeed, JobGet, VONQ), and customer-specific HRIS instances (Paychex). Coding integrations against internal Flux models couples external partners to our internal evolution and makes it impossible to support an industry-standard contract. The PRD-driving constraint is that AI agents must be able to discover and validate integration contracts autonomously — meaning the contract has to be machine-readable, standards-aligned, and stable. Internal Flux models churn (new fields, renamed enums) at a rate incompatible with that requirement.

Detailed Design

Architecture

External / Legacy Systems
      ↑↓
┌───────────────────────────────────────────────┐
│  Canonical Schema Layer                       │
│  backend/domains/canonical/                   │
│                                               │
│   schemas/         9 canonical entities       │
│   crosswalks/      enum mappings ↔ standards  │
│   projections/                                │
│     flux_*         internal Flux ↔ canonical  │
│     external/      Schema.org / Merge / Finch │
│     legacy/        per-vendor projections     │
│   api/             REST endpoints             │
│   tools/           LangChain MCP tools        │
│   external_specs/  cached vendor OpenAPI      │
└───────────────────────────────────────────────┘
      ↑↓
Internal Flux Domain Models
backend/domains/hiring/schemas.py

The 9 canonical entities are: Job, Candidate, Application, Interview, Offer, Employee, Organization, Skill, Compensation. Every canonical model carries provenance: source_system + source_id.
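
As a rough illustration of the entity list and the provenance pair, here is a minimal sketch. It uses stdlib dataclasses as a dependency-free stand-in; the class names CanonicalEntity and Provenance and the key() helper are assumptions made for this example, not names taken from the codebase.

```python
from dataclasses import dataclass
from enum import Enum


class CanonicalEntity(str, Enum):
    """The nine canonical entities named in this RFC."""
    JOB = "Job"
    CANDIDATE = "Candidate"
    APPLICATION = "Application"
    INTERVIEW = "Interview"
    OFFER = "Offer"
    EMPLOYEE = "Employee"
    ORGANIZATION = "Organization"
    SKILL = "Skill"
    COMPENSATION = "Compensation"


@dataclass(frozen=True)
class Provenance:
    """Where a canonical record came from (hypothetical helper class)."""
    source_system: str  # e.g. the integration that produced the record
    source_id: str      # the record's identifier inside that system

    def key(self) -> str:
        # Hypothetical convenience: a single string usable as a lookup key.
        return f"{self.source_system}:{self.source_id}"
```

Keeping source_system and source_id together makes it possible to trace any canonical record back to the system that produced it.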

Data Model

Canonical models are Pydantic v2. They include:
  • Standards alignment fields — fields named to match HR Open 4.5 / Schema.org wherever possible (e.g., Job.employmentType aligned with Schema.org’s JobPosting.employmentType)
  • Crosswalk-backed enums — values map to standard codes via canonical/crosswalks/ tables (O*NET/SOC, ESCO, ISO country/currency)
  • Provenance — every record records where it came from
  • Optional Flux-extension namespace (_flux: dict) for fields specific to Flux that don’t fit a standard
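
The four properties listed above can be sketched together on a single entity. This is an illustration under stated assumptions, not the actual schema: it uses stdlib dataclasses in place of Pydantic v2 so it runs without dependencies, the enum values and the "example_ats" vendor strings are made up, and the extension field is called flux_ext purely for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class EmploymentType(str, Enum):
    # Illustrative values only; the real enum would follow the chosen standard.
    FULL_TIME = "FULL_TIME"
    PART_TIME = "PART_TIME"
    CONTRACTOR = "CONTRACTOR"


# Hypothetical crosswalk table: source system -> raw value -> canonical enum.
EMPLOYMENT_TYPE_CROSSWALK: dict[str, dict[str, EmploymentType]] = {
    "example_ats": {
        "Full Time": EmploymentType.FULL_TIME,
        "Part Time": EmploymentType.PART_TIME,
    },
}


def to_canonical_employment_type(source_system: str, raw: str) -> EmploymentType:
    """Map a vendor-specific value to the canonical enum, failing loudly."""
    try:
        return EMPLOYMENT_TYPE_CROSSWALK[source_system][raw]
    except KeyError as exc:
        raise ValueError(
            f"no crosswalk entry for {source_system!r} value {raw!r}"
        ) from exc


@dataclass
class CanonicalJob:
    title: str
    employmentType: EmploymentType  # named to match Schema.org JobPosting
    source_system: str              # provenance
    source_id: str                  # provenance
    # Flux-only fields that do not fit a standard (field name is an assumption).
    flux_ext: dict[str, Any] = field(default_factory=dict)
```

Raising on an unmapped value, rather than defaulting, keeps gaps in a crosswalk table visible instead of silently producing a wrong code.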

API Changes

New REST endpoints under /api/canonical/ (paths below are shown relative to /api):
  • GET /canonical/schemas/{entity}: JSON Schema for the canonical entity
  • GET /canonical/integrations: registry of integration contracts
  • POST /canonical/projections/validate: validate an external payload against a canonical entity
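
To make the validation endpoint more concrete, the sketch below shows the kind of check its handler might perform. Everything here is an assumption: the trimmed-down Job schema, the function name validate_projection, and the {"valid", "errors"} response shape are hypothetical, and a real implementation would more likely delegate to a full JSON Schema validator.

```python
from typing import Any

# Hypothetical, heavily trimmed schema for the canonical Job entity.
JOB_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["title", "source_system", "source_id"],
    "properties": {
        "title": {"type": "string"},
        "employmentType": {"type": "string"},
        "source_system": {"type": "string"},
        "source_id": {"type": "string"},
    },
}

SCHEMAS: dict[str, dict[str, Any]] = {"job": JOB_SCHEMA}

# Only the JSON types used by the sketch schema above.
_JSON_TYPES: dict[str, type] = {"string": str}


def validate_projection(entity: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Check required fields and basic types; return a result dict."""
    schema = SCHEMAS.get(entity)
    if schema is None:
        return {"valid": False, "errors": [f"unknown entity: {entity}"]}

    errors: list[str] = []
    for name in schema["required"]:
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, value in payload.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # unknown fields are tolerated in this sketch
        if not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(f"field {name}: expected {spec['type']}")
    return {"valid": not errors, "errors": errors}
```

Returning the full list of errors, rather than stopping at the first, lets an integrator or an AI agent fix a payload in one pass.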

Security Considerations

Canonical projection adapters must enforce tenant isolation — projecting a record for tenant A must never accidentally surface tenant B’s fields. The LegacyProjection protocol takes a tenant context as an explicit parameter.
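
One way to read "tenant context as an explicit parameter" is sketched below. Only the name LegacyProjection comes from this RFC; the to_canonical method, the TenantContext and TenantMismatchError types, the record layout, and the example adapter are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical tenant context passed to every projection call."""
    tenant_id: str


class TenantMismatchError(Exception):
    """Raised when a record does not belong to the requesting tenant."""


class LegacyProjection(Protocol):
    # Method name and signature are assumptions; the RFC only states that
    # the tenant context is an explicit parameter.
    def to_canonical(
        self, record: dict[str, Any], tenant: TenantContext
    ) -> dict[str, Any]: ...


class ExampleJobProjection:
    """Made-up adapter for a fictional legacy system."""

    source_system = "example_ats"

    def to_canonical(
        self, record: dict[str, Any], tenant: TenantContext
    ) -> dict[str, Any]:
        # Refuse to project anything that belongs to a different tenant.
        if record.get("tenant_id") != tenant.tenant_id:
            raise TenantMismatchError(
                f"record {record.get('id')!r} is not owned by {tenant.tenant_id!r}"
            )
        return {
            "title": record["job_title"],
            "source_system": self.source_system,
            "source_id": str(record["id"]),
        }
```

Because the tenant is a required argument rather than ambient state, an adapter cannot be called without the caller stating whose data it expects.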

Performance Considerations

Projection overhead is bounded: O(n) over the field count of the entity. For high-volume paths (job posting fanout), projections are computed once per record per channel and cached for the duration of the workflow.
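
The "once per record per channel" caching could look roughly like the following. The class name, method name, and the idea of creating one cache object per workflow run are assumptions for this sketch, not a description of the actual implementation.

```python
from typing import Any, Callable


class WorkflowProjectionCache:
    """Caches projections for one workflow run, keyed by (record, channel)."""

    def __init__(self) -> None:
        self._cache: dict[tuple[str, str], Any] = {}
        self.misses = 0  # number of times a projection was actually computed

    def get_or_project(
        self, record_id: str, channel: str, project: Callable[[], Any]
    ) -> Any:
        key = (record_id, channel)
        if key not in self._cache:
            # First request for this record on this channel: compute and store.
            self.misses += 1
            self._cache[key] = project()
        return self._cache[key]
```

A short usage example: one cache is created at the start of a fanout, and repeated requests for the same job on the same channel reuse the stored projection.

```python
cache = WorkflowProjectionCache()
first = cache.get_or_project("job-1", "indeed", lambda: {"title": "Line Cook"})
again = cache.get_or_project("job-1", "indeed", lambda: {"title": "Line Cook"})
assert first is again and cache.misses == 1
```

Dropping the cache object when the workflow ends bounds its lifetime to that workflow.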

Alternatives Considered

| Alternative | Pros | Cons | Why Rejected |
|---|---|---|---|
| Code integrations directly against internal Flux models | Simple, no projection overhead | Couples external partners to internal evolution; cannot adopt industry standards; AI agents have no stable contract to ground against | Violates the AI-discoverability constraint |
| Adopt one external standard wholesale (e.g., HR Open as our internal model) | Single schema, no projection | HR Open has fields Flux doesn’t need, missing fields Flux does need; standards evolve slower than the product | Internal velocity dies; we’d be perpetually waiting for standards bodies |
| Per-vendor adapters with no canonical layer | Each adapter is small | N² adapter explosion as integrations grow; no shared semantics; no AI-discoverable contract | Doesn’t scale past ~10 integrations |
| Generate canonical schemas from external specs (auto) | Less manual schema authoring | External specs vary in quality; generated schemas would be unstable; no opportunity to enforce Flux conventions | Loses schema design control |

Migration Strategy

The canonical layer is additive and introduces no breaking changes to internal Flux models. Migration proceeds in four steps:
  1. Land canonical schemas + projection protocols (this RFC)
  2. New integrations always go through canonical
  3. Existing JazzHR / Lever / Jobvite integrations are migrated gradually to LegacyProjection
  4. Internal Flux model evolution continues independently — projection adapters absorb the change

Validation Plan

  • All 9 canonical entities defined in canonical/schemas/
  • At least one external projection per direction (Schema.org JobPosting, Merge candidate)
  • At least one legacy projection (JazzHR job)
  • Internal Flux Job ↔ canonical Job projection round-trips (no data loss)
  • AI tool query_canonical_schema returns valid JSON Schema for each entity
  • Integration contract registry is queryable via MCP
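
The round-trip criterion in the list above can be expressed as a small test. The sketch below is hypothetical throughout: FluxJob is a stand-in for the internal model, its fields and the kind codes are invented, and the two projection functions only illustrate the shape of a lossless mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluxJob:
    """Made-up stand-in for the internal Flux Job model."""
    id: int
    name: str
    kind: str  # invented internal code, e.g. "ft" or "pt"


@dataclass(frozen=True)
class CanonicalJob:
    title: str
    employmentType: str
    source_system: str
    source_id: str


# Invented internal codes mapped to illustrative canonical values.
_KIND_TO_CANONICAL = {"ft": "FULL_TIME", "pt": "PART_TIME"}
_CANONICAL_TO_KIND = {v: k for k, v in _KIND_TO_CANONICAL.items()}


def flux_to_canonical(job: FluxJob) -> CanonicalJob:
    return CanonicalJob(
        title=job.name,
        employmentType=_KIND_TO_CANONICAL[job.kind],
        source_system="flux",
        source_id=str(job.id),
    )


def canonical_to_flux(job: CanonicalJob) -> FluxJob:
    return FluxJob(
        id=int(job.source_id),
        name=job.title,
        kind=_CANONICAL_TO_KIND[job.employmentType],
    )


def test_job_round_trip() -> None:
    # Projecting out and back must reproduce the original record exactly.
    original = FluxJob(id=7, name="Line Cook", kind="ft")
    assert canonical_to_flux(flux_to_canonical(original)) == original
```

A test of this shape only proves losslessness for the fields it exercises, so a CI gate would presumably need fixtures that populate every field of each entity.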

Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Canonical schemas drift from internal Flux models | Medium | Medium | Round-trip test gate in CI; projection adapter is the sync point |
| External standards evolve and break compatibility | Low | Medium | Pin to a major version; bump cycles tracked as RFCs |
| Projection performance degrades on high-volume paths | Low | Medium | Benchmark per-entity; cache per-workflow; fall back to direct internal model in non-external paths |

Open Questions

| # | Question | Owner | Target Date | Resolution |
|---|---|---|---|---|
| 1 | Should canonical models be exposed via GraphQL in addition to REST? | @pj | 2026-06-01 | Defer until a customer asks |

Decision

Accepted by @pj on 2026-04-18 as the foundation for all external integrations and legacy system projections going forward. New integration cycles cite this RFC in source_rfcs. No new integration code may bypass the canonical layer without a superseding RFC.
This RFC established the canonical schema layer pattern. All integration cycles inherit this contract.