What is the Canonical Data Model?

The Canonical Data Model (CDM) is DIBOP's universal data schema -- a single, shared definition of how business data is structured. Every connector and orchestration reads and writes data in terms of this schema, which is what lets otherwise incompatible systems exchange records.

The Problem Without a CDM

Imagine you have five systems: an OEM portal, a DMS, a CRM, a finance platform, and a telematics provider. Each system uses its own names and formats for the same concepts:

Concept	OEM Portal	DMS	CRM	Finance	Telematics
Vehicle ID	`fahrzeug_id`	`stock_vin`	`asset_number`	`collateral_id`	`device_vehicle_id`
Customer Name	`kunde_name`	`cust_full_name`	`contact_name`	`borrower_name`	`owner`
Mileage	`kilometerstand`	`odometer_reading`	`mileage`	N/A	`total_distance_km`

Without a common model, connecting these systems requires building a separate translation for every pair and direction: OEM-to-DMS, OEM-to-CRM, DMS-to-CRM, DMS-to-Finance, and so on. For N systems, that is N x (N-1) one-way translations: 20 for the five systems above, and the count grows quadratically as systems are added.

The CDM Solution

The CDM introduces a single, authoritative definition for each business concept:

CDM Field	Definition
`vin`	The 17-character Vehicle Identification Number
`customer_name`	The customer's full name
`mileage_km`	Current odometer reading in kilometres

Each connector maps its native fields to and from the CDM -- once. Any pair of systems can then exchange data by going through the CDM:

OEM Portal → CDM → DMS
OEM Portal → CDM → CRM
DMS → CDM → Finance
Telematics → CDM → DMS
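
The routes above amount to two field renames per record: native to CDM, then CDM to the target's native names. The sketch below uses the field names from the comparison table; the `to_cdm` and `from_cdm` helpers and the dictionary layout are illustrative assumptions, not DIBOP's connector API, and value conversions (units, formats) are left out.

```python
# Hypothetical per-connector mappings: native field name -> CDM field name.
OEM_TO_CDM = {
    "fahrzeug_id": "vin",
    "kunde_name": "customer_name",
    "kilometerstand": "mileage_km",
}
DMS_TO_CDM = {
    "stock_vin": "vin",
    "cust_full_name": "customer_name",
    "odometer_reading": "mileage_km",
}


def to_cdm(record, mapping):
    """Rename native fields to CDM fields, dropping unmapped fields."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def from_cdm(record, mapping):
    """Rename CDM fields back to one system's native field names."""
    reverse = {cdm: native for native, cdm in mapping.items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}


# Made-up sample record in the OEM portal's native shape.
oem_record = {
    "fahrzeug_id": "WDB1234567890ABCD",
    "kunde_name": "Erika Mustermann",
    "kilometerstand": 48210,
}

# OEM Portal -> CDM -> DMS
cdm_record = to_cdm(oem_record, OEM_TO_CDM)
dms_record = from_cdm(cdm_record, DMS_TO_CDM)
print(dms_record)
```

Neither mapping mentions the other system: the OEM mapping only knows the CDM, and so does the DMS mapping.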

For N systems, you need only 2N translations (one inbound, one outbound per system) instead of N x (N-1); for five systems that is 10 instead of 20. Adding a new system requires only its own pair of mappings, after which it can exchange data with every existing system.
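
The two counts can be compared directly. This is a minimal sketch of the arithmetic only; the function names are illustrative.

```python
def point_to_point(n):
    """One-way translations when every system maps to every other system."""
    return n * (n - 1)


def via_cdm(n):
    """Translations when each system maps to and from the CDM once."""
    return 2 * n


for n in (3, 5, 10, 20):
    print(f"{n:>2} systems: {point_to_point(n):>3} point-to-point vs {via_cdm(n):>2} via CDM")
```

With three systems both approaches need six translations; from four systems onward the CDM route needs fewer, and the gap widens with every system added.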

CDM Structure

The CDM is organised into domains, each representing a business entity:

Canonical Data Model
├── Vehicle
│   ├── vin (string, required)
│   ├── make (string)
│   ├── model (string)
│   ├── year (integer)
│   ├── mileage_km (number)
│   └── ... (30+ fields)
├── Customer
│   ├── customer_id (string, required)
│   ├── customer_name (string, PII)
│   ├── email (string, PII)
│   ├── phone (string, PII)
│   └── ... (20+ fields)
├── Order
│   ├── order_id (string, required)
│   ├── order_date (datetime)
│   ├── total_amount (number)
│   └── ...
└── ... (22 domains total)

Each domain has:

Required fields: Must be present for the record to be valid
Optional fields: Available for mapping but not required
PII classification: Each field is marked as None, PII, or Sensitive PII
Data types: Strict typing (string, number, integer, boolean, datetime)
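
Required fields and strict typing together define what a valid record is. The sketch below checks a record against an excerpt of the Vehicle domain shown in the tree; the schema layout, the `validate` function, and the mapping from CDM types to Python types are assumptions made for illustration, not DIBOP's validation engine.

```python
from datetime import datetime

# Python types accepted for each CDM data type (an assumed mapping).
CDM_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "datetime": datetime,
}

# Hypothetical excerpt of the Vehicle domain, taken from the tree above.
VEHICLE_FIELDS = {
    "vin": {"type": "string", "required": True},
    "make": {"type": "string", "required": False},
    "model": {"type": "string", "required": False},
    "year": {"type": "integer", "required": False},
    "mileage_km": {"type": "number", "required": False},
}


def validate(record, fields):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for name, spec in fields.items():
        if name not in record:
            if spec["required"]:
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        expected = CDM_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_misuse = isinstance(value, bool) and spec["type"] != "boolean"
        if bool_misuse or not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


print(validate({"vin": "WDB1234567890ABCD", "year": 2021}, VEHICLE_FIELDS))
print(validate({"make": "Mercedes-Benz", "year": "2021"}, VEHICLE_FIELDS))
```

The first call passes; the second reports the missing `vin` and the string-typed `year`.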

Why "Canonical"?

The word "canonical" means "according to a standard rule or authority." In data modelling, the canonical model is the single agreed representation that every other format is translated to and from. When two systems disagree about how to name a field, the CDM provides the definitive answer.

This means:

There is exactly one definition of what a "vehicle" looks like
All connectors translate to and from this one definition
All orchestrations work with this one set of field names
No ambiguity about what a field means or what format it uses

How the CDM Is Used

In Connectors

Each connector includes mappings that translate its native fields to CDM fields and back:

Mercedes-Benz OneAPI          CDM
────────────────────          ───
fin                    →      vin
marke                  →      make
modell                 →      model
erstzulassung          →      first_registration_date
kilometerstand         →      mileage_km

These mappings are defined when the connector is built (using the Connector SDK) and can be enhanced with AI-assisted mapping.
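
Translating "and back" only works if the mapping is one-to-one. The sketch below derives the reverse direction from the listing above and rejects ambiguous mappings. It is plain Python written for illustration and does not use the Connector SDK, whose interface is not shown here; `invert` and `rename` are hypothetical helpers.

```python
# Native field -> CDM field, copied from the mapping listing above.
NATIVE_TO_CDM = {
    "fin": "vin",
    "marke": "make",
    "modell": "model",
    "erstzulassung": "first_registration_date",
    "kilometerstand": "mileage_km",
}


def invert(mapping):
    """Build the CDM -> native mapping, refusing many-to-one mappings."""
    inverse = {}
    for native, cdm in mapping.items():
        if cdm in inverse:
            raise ValueError(
                f"{cdm!r} is mapped from both {inverse[cdm]!r} and {native!r}"
            )
        inverse[cdm] = native
    return inverse


def rename(record, mapping):
    """Rename the keys of a record, dropping keys the mapping does not cover."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


CDM_TO_NATIVE = invert(NATIVE_TO_CDM)

# Made-up sample record in the connector's native shape.
native = {"fin": "WDB1234567890ABCD", "marke": "Mercedes-Benz", "kilometerstand": 48210}
cdm = rename(native, NATIVE_TO_CDM)
round_trip = rename(cdm, CDM_TO_NATIVE)
print(cdm)
print(round_trip == native)
```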

In Orchestrations

When you build an orchestration, you work with CDM fields:

{
  "step": "enrich_vehicle",
  "input": {
    "vin": "${fetch_vehicle.vin}",
    "make": "${fetch_vehicle.make}",
    "model": "${fetch_vehicle.model}"
  }
}

The CDM field names are consistent regardless of which source or target system is involved.
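
To make the `${step.field}` references concrete, the sketch below resolves them against the output of an earlier step. The placeholder syntax is taken from the example above; the `resolve` function, the `outputs` structure, and the sample values are assumptions, and DIBOP's actual expression engine may behave differently.

```python
import re

# Matches placeholders of the form ${step_name.field_name}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(value, outputs):
    """Replace ${step.field} references using earlier steps' CDM outputs."""
    match = PLACEHOLDER.fullmatch(value)
    if match:
        # A value that is exactly one placeholder keeps its original type.
        step_name, field = match.groups()
        return outputs[step_name][field]
    # Otherwise substitute each placeholder into the surrounding string.
    return PLACEHOLDER.sub(lambda m: str(outputs[m[1]][m[2]]), value)


step = {
    "step": "enrich_vehicle",
    "input": {
        "vin": "${fetch_vehicle.vin}",
        "make": "${fetch_vehicle.make}",
        "model": "${fetch_vehicle.model}",
    },
}

# Hypothetical output of the earlier fetch_vehicle step, already in CDM fields.
outputs = {
    "fetch_vehicle": {"vin": "WDB1234567890ABCD", "make": "Mercedes-Benz", "model": "C 200"}
}

resolved = {name: resolve(value, outputs) for name, value in step["input"].items()}
print(resolved)
```

Because `fetch_vehicle` exposes CDM field names, the same step definition works whichever source system supplied the vehicle.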

In Monitoring

The CDM enables cross-system analytics. Because all data is normalised to the same fields, you can:

Count vehicles across all systems using a single query
Track customer records across DMS and CRM using a shared `customer_id`
Identify data discrepancies between systems (e.g., different mileage values for the same VIN)
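
Once every system's records share the `vin` and `mileage_km` fields, the checks above reduce to grouping by VIN. The sketch below is illustrative only: the `source` tag, the sample records, and the `mileage_discrepancies` function are assumptions rather than a DIBOP monitoring API.

```python
from collections import defaultdict

# Hypothetical CDM-normalised vehicle records, tagged with their source system.
records = [
    {"source": "DMS", "vin": "WDB1234567890ABCD", "mileage_km": 48210},
    {"source": "Telematics", "vin": "WDB1234567890ABCD", "mileage_km": 48950},
    {"source": "DMS", "vin": "WDB1234567890ABCE", "mileage_km": 12000},
    {"source": "OEM Portal", "vin": "WDB1234567890ABCE", "mileage_km": 12000},
]


def mileage_discrepancies(records, tolerance_km=0):
    """Return {vin: {source: mileage_km}} where sources differ beyond tolerance."""
    by_vin = defaultdict(dict)
    for rec in records:
        by_vin[rec["vin"]][rec["source"]] = rec["mileage_km"]
    return {
        vin: readings
        for vin, readings in by_vin.items()
        if max(readings.values()) - min(readings.values()) > tolerance_km
    }


# Distinct vehicles across all systems, counted with one expression.
vehicle_count = len({rec["vin"] for rec in records})
print(vehicle_count)
print(mileage_discrepancies(records))
```

A non-zero `tolerance_km` is useful when sources report at different times, since a telematics reading will usually run slightly ahead of the last recorded DMS value.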

PII Classification

Every field in the CDM is classified for personally identifiable information:

Classification	Meaning	Examples	How DIBOP Handles It
None	Not personally identifiable	Vehicle make, model, year	Shown in full in logs and exports
PII	Personally identifiable	Name, email, phone, address	Partially masked in logs; encrypted at rest
Sensitive PII	Highly sensitive	Government ID, SSN, financial accounts	Fully redacted in logs; encrypted at rest

PII classification is enforced automatically:

Execution logs and API call logs mask PII fields
Exports include PII warnings
Retention policies can be configured separately for PII data
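
The log-handling column of the table can be sketched as a function of a field's classification. The classification lookup, the `government_id` field name, the partial-mask format (first two characters kept), and the `[REDACTED]` marker are all assumptions made for illustration; the source does not specify DIBOP's actual masking format.

```python
# Hypothetical classification excerpt; real classifications come from the CDM.
PII_CLASS = {
    "make": "None",
    "customer_name": "PII",
    "email": "PII",
    "government_id": "Sensitive PII",  # illustrative field name
}


def mask_for_log(field, value):
    """Apply the log-handling rule for the field's PII classification."""
    # Unknown fields are treated as Sensitive PII (a conservative assumption).
    level = PII_CLASS.get(field, "Sensitive PII")
    text = str(value)
    if level == "None":
        return text
    if level == "PII":
        # Partial mask: keep the first two characters (assumed format).
        return text[:2] + "*" * max(len(text) - 2, 0)
    return "[REDACTED]"


record = {
    "make": "Mercedes-Benz",
    "email": "erika@example.com",
    "government_id": "X1234567",
}
log_line = {field: mask_for_log(field, value) for field, value in record.items()}
print(log_line)
```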

See the Canonical Explorer to view PII classifications for every field.

Extensibility

The CDM is not static. It grows as DIBOP adds support for new business domains:

DIBOP ships with 22 standard domains
Platform Admins can request new domains for industry-specific needs
Custom fields can be added to existing domains without breaking existing mappings

Standard First

Always check the existing CDM domains and fields before requesting custom additions. The more data that maps to standard fields, the more interoperable your integrations become.

Next Steps

Available Domains -- browse all 22 domains with field lists
Field Mapping -- learn how to map connector fields to the CDM
Canonical Explorer -- explore the CDM interactively in the DIBOP UI
CDM in Orchestrations -- how orchestrations use CDM fields