
What is the Canonical Data Model?

The Canonical Data Model (CDM) is DIBOP's universal data schema -- a single, shared definition of how business data is structured. It is the foundation that makes DIBOP's integration capabilities possible.


The Problem Without a CDM

Imagine you have five systems: an OEM portal, a DMS, a CRM, a finance platform, and a telematics provider. Each system uses its own names and formats for the same concepts:

Concept        OEM Portal      DMS               CRM           Finance        Telematics
─────────────  ──────────────  ────────────────  ────────────  ─────────────  ─────────────────
Vehicle ID     fahrzeug_id     stock_vin         asset_number  collateral_id  device_vehicle_id
Customer Name  kunde_name      cust_full_name    contact_name  borrower_name  owner
Mileage        kilometerstand  odometer_reading  mileage       N/A            total_distance_km

Without a common model, connecting these systems requires building a separate translation for every pair: OEM-to-DMS, OEM-to-CRM, DMS-to-CRM, DMS-to-Finance, and so on. For N systems, that is N x (N-1) one-way translations, a number that grows quadratically: the five systems above already need 20.


The CDM Solution

The CDM introduces a single, authoritative definition for each business concept:

CDM Field      Definition
─────────────  ──────────────────────────────────────────────
vin            The 17-character Vehicle Identification Number
customer_name  The customer's full name
mileage_km     Current odometer reading in kilometres

Each connector maps its native fields to and from the CDM -- once. Any pair of systems can then exchange data by going through the CDM:

OEM Portal → CDM → DMS
OEM Portal → CDM → CRM
DMS → CDM → Finance
Telematics → CDM → DMS
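
As a rough illustration of one such route (OEM Portal → CDM → DMS), the sketch below renames fields using the native names from the comparison table above. The mapping dictionaries, the helper functions `to_cdm` and `from_cdm`, and the sample record are hypothetical and are not DIBOP's actual connector API. Real mappings may also need value conversions (units, date formats), which this sketch leaves out.

```python
# Hypothetical mapping tables: native field name -> CDM field name.
OEM_TO_CDM = {
    "fahrzeug_id": "vin",
    "kunde_name": "customer_name",
    "kilometerstand": "mileage_km",
}
DMS_TO_CDM = {
    "stock_vin": "vin",
    "cust_full_name": "customer_name",
    "odometer_reading": "mileage_km",
}


def to_cdm(record, mapping):
    """Rename native fields to CDM fields; unmapped fields are dropped."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def from_cdm(record, mapping):
    """Rename CDM fields back to a system's native fields."""
    inverse = {cdm: native for native, cdm in mapping.items()}
    return {inverse[name]: value for name, value in record.items() if name in inverse}


# Made-up sample data. Assumes both systems report the odometer in kilometres.
oem_record = {
    "fahrzeug_id": "WDD2050071F123456",
    "kunde_name": "Anna Schmidt",
    "kilometerstand": 48210,
}

cdm_record = to_cdm(oem_record, OEM_TO_CDM)    # OEM Portal -> CDM
dms_record = from_cdm(cdm_record, DMS_TO_CDM)  # CDM -> DMS
print(dms_record)
```

Neither mapping table mentions the other system. Each one only knows the CDM, which is what keeps the routes independent.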

For N systems, you need only 2N translations (one inbound, one outbound per system) instead of N x (N-1). With five systems, that is 10 instead of 20. Adding a new system requires only one new set of mappings, and it instantly works with all existing systems.
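
The two formulas can be compared directly. This small sketch tabulates both counts for a few system counts; the function names are illustrative only.

```python
def point_to_point(n: int) -> int:
    """One-way translations when every system is wired to every other system."""
    return n * (n - 1)


def via_cdm(n: int) -> int:
    """One inbound and one outbound mapping per system."""
    return 2 * n


for n in (3, 5, 10, 20):
    print(f"{n:>2} systems: {point_to_point(n):>3} point-to-point, {via_cdm(n):>2} via CDM")
```

The counts are equal at three systems, and the CDM approach needs fewer translations from four systems onward.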


CDM Structure

The CDM is organised into domains, each representing a business entity:

Canonical Data Model
├── Vehicle
│   ├── vin (string, required)
│   ├── make (string)
│   ├── model (string)
│   ├── year (integer)
│   ├── mileage_km (number)
│   └── ... (30+ fields)
├── Customer
│   ├── customer_id (string, required)
│   ├── customer_name (string, PII)
│   ├── email (string, PII)
│   ├── phone (string, PII)
│   └── ... (20+ fields)
├── Order
│   ├── order_id (string, required)
│   ├── order_date (datetime)
│   ├── total_amount (number)
│   └── ...
└── ... (22 domains total)

Each domain has:

  • Required fields: Must be present for the record to be valid
  • Optional fields: Available for mapping but not required
  • PII classification: Each field is marked as None, PII, or Sensitive PII
  • Data types: Strict typing (string, number, integer, boolean, datetime)
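
To make these rules concrete, here is a minimal sketch of validating a record against a domain definition. The schema dictionary covers only the Vehicle fields shown in the tree above, and both its layout and the `validate` function are assumptions made for illustration, not DIBOP's internal schema format.

```python
# Hypothetical, partial description of the Vehicle domain.
VEHICLE_SCHEMA = {
    "vin": {"type": str, "required": True},
    "make": {"type": str, "required": False},
    "model": {"type": str, "required": False},
    "year": {"type": int, "required": False},
    "mileage_km": {"type": (int, float), "required": False},
}


def type_ok(value, expected):
    """Strict type check for a single value."""
    # bool is a subclass of int in Python, so reject it unless bool is expected.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def validate(record, schema):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for name, rule in schema.items():
        if name not in record:
            if rule["required"]:
                errors.append(f"missing required field: {name}")
            continue
        if not type_ok(record[name], rule["type"]):
            errors.append(f"wrong type for field: {name}")
    return errors


print(validate({"vin": "WDD2050071F123456", "year": 2021}, VEHICLE_SCHEMA))
print(validate({"make": "Mercedes-Benz", "year": "2021"}, VEHICLE_SCHEMA))
```

Optional fields are skipped when absent but still type-checked when present.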

Why "Canonical"?

The word "canonical" means "according to a standard rule or authority." In data modelling, the canonical model is the one true representation. When two systems disagree about how to name a field, the CDM provides the definitive answer.

This means:

  • There is exactly one definition of what a "vehicle" looks like
  • All connectors translate to and from this one definition
  • All orchestrations work with this one set of field names
  • No ambiguity about what a field means or what format it uses

How the CDM Is Used

In Connectors

Each connector includes mappings that translate its native fields to CDM fields and back:

Mercedes-Benz OneAPI          CDM
────────────────────          ───
fin                    →      vin
marke                  →      make
modell                 →      model
erstzulassung          →      first_registration_date
kilometerstand         →      mileage_km

These mappings are defined when the connector is built (using the Connector SDK) and can be enhanced with AI-assisted mapping.
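
Because each mapping is used in both directions, every CDM field should have exactly one native source. The sketch below checks that property for the mapping shown above. The dictionary form and the `invert` helper are hypothetical and do not represent the Connector SDK.

```python
# The Mercedes-Benz OneAPI mapping from above, written as native -> CDM.
MB_TO_CDM = {
    "fin": "vin",
    "marke": "make",
    "modell": "model",
    "erstzulassung": "first_registration_date",
    "kilometerstand": "mileage_km",
}


def invert(mapping):
    """Build the CDM -> native mapping, refusing ambiguous definitions."""
    inverse = {}
    for native, cdm in mapping.items():
        if cdm in inverse:
            raise ValueError(
                f"{cdm!r} is mapped from both {inverse[cdm]!r} and {native!r}"
            )
        inverse[cdm] = native
    return inverse


CDM_TO_MB = invert(MB_TO_CDM)
print(CDM_TO_MB["mileage_km"])
```

If two native fields pointed at the same CDM field, the outbound direction would be ambiguous, so the helper raises instead of silently picking one.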

In Orchestrations

When you build an orchestration, you work with CDM fields:

{
  "step": "enrich_vehicle",
  "input": {
    "vin": "${fetch_vehicle.vin}",
    "make": "${fetch_vehicle.make}",
    "model": "${fetch_vehicle.model}"
  }
}

The CDM field names are consistent regardless of which source or target system is involved.
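
One way to picture how `${step.field}` references in a step's input could be filled from an earlier step's output is the sketch below. The `resolve` function, the regular expression, and the sample output values are assumptions for illustration. DIBOP's actual expression engine is not described here and may behave differently.

```python
import re

# Matches references of the form ${step_name.field_name}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(step_input, outputs):
    """Replace each ${step.field} reference with the value from `outputs`."""

    def lookup(match):
        step, field = match.groups()
        # Values are converted to text here; a real engine might keep types.
        return str(outputs[step][field])

    return {key: PLACEHOLDER.sub(lookup, template) for key, template in step_input.items()}


# Made-up output of a previous "fetch_vehicle" step, already in CDM fields.
outputs = {
    "fetch_vehicle": {"vin": "WDD2050071F123456", "make": "Mercedes-Benz", "model": "C 200"}
}
enrich_input = {
    "vin": "${fetch_vehicle.vin}",
    "make": "${fetch_vehicle.make}",
    "model": "${fetch_vehicle.model}",
}
print(resolve(enrich_input, outputs))
```

The step never needs to know which connector produced `fetch_vehicle`, since the output already uses CDM field names.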

In Monitoring

The CDM enables cross-system analytics. Because all data is normalised to the same fields, you can:

  • Count vehicles across all systems using a single query
  • Track customer records across DMS and CRM using a shared customer_id
  • Identify data discrepancies between systems (e.g., different mileage values for the same VIN)
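
The last point can be sketched in a few lines. The example assumes records have already been normalised to CDM fields and tagged with their source system. The function name, source labels, and sample data are hypothetical.

```python
from collections import defaultdict


def mileage_discrepancies(records):
    """Return {vin: {source: mileage_km}} for VINs whose sources disagree."""
    by_vin = defaultdict(dict)
    for source, record in records:
        by_vin[record["vin"]][source] = record["mileage_km"]
    return {
        vin: readings
        for vin, readings in by_vin.items()
        if len(set(readings.values())) > 1
    }


# Made-up CDM records from two systems.
records = [
    ("dms", {"vin": "WDD2050071F123456", "mileage_km": 48210}),
    ("telematics", {"vin": "WDD2050071F123456", "mileage_km": 48975}),
    ("dms", {"vin": "WDD2050071F654321", "mileage_km": 12000}),
    ("telematics", {"vin": "WDD2050071F654321", "mileage_km": 12000}),
]
print(mileage_discrepancies(records))
```

Because every source uses `vin` and `mileage_km`, the comparison needs no per-system logic.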

PII Classification

Every field in the CDM is classified for personally identifiable information:

Classification  Meaning                      Examples                                How DIBOP Handles It
──────────────  ───────────────────────────  ──────────────────────────────────────  ───────────────────────────────────────────
None            Not personally identifiable  Vehicle make, model, year               Shown in full in logs and exports
PII             Personally identifiable      Name, email, phone, address             Partially masked in logs; encrypted at rest
Sensitive PII   Highly sensitive             Government ID, SSN, financial accounts  Fully redacted in logs; encrypted at rest

PII classification is enforced automatically:

  • Execution logs and API call logs mask PII fields
  • Exports include PII warnings
  • Retention policies can be configured separately for PII data
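
As an illustration of classification-driven log masking, the sketch below applies the three levels to a record before it is logged. The masking format (first and last character kept), the `[REDACTED]` marker, the `government_id` field name, and the fallback for unclassified fields are all assumptions. DIBOP's actual masking rules may differ.

```python
# Hypothetical classification lookup for a handful of fields.
PII_LEVELS = {
    "make": "None",
    "customer_name": "PII",
    "email": "PII",
    "government_id": "Sensitive PII",
}


def mask_value(value, level):
    """Mask one value according to its PII classification."""
    text = str(value)
    if level == "None":
        return text
    if level == "PII":
        if len(text) <= 2:
            return "*" * len(text)
        # Keep the first and last character, hide everything in between.
        return text[0] + "*" * (len(text) - 2) + text[-1]
    # "Sensitive PII" and anything unrecognised are fully redacted.
    return "[REDACTED]"


def mask_for_log(record, levels):
    """Return a copy of `record` that is safe to write to a log."""
    # Fields without a classification fall back to the strictest level.
    return {
        name: mask_value(value, levels.get(name, "Sensitive PII"))
        for name, value in record.items()
    }


record = {"make": "Mercedes-Benz", "customer_name": "Anna", "government_id": "X1234567"}
print(mask_for_log(record, PII_LEVELS))
```

Defaulting unknown fields to full redaction is a deliberately cautious choice for this sketch: a field that was never classified is treated as the most sensitive kind.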

See the Canonical Explorer to view PII classifications for every field.


Extensibility

The CDM is not static. It grows as DIBOP adds support for new business domains:

  • DIBOP ships with 22 standard domains
  • Platform Admins can request new domains for industry-specific needs
  • Custom fields can be added to existing domains without breaking existing mappings

Standard First

Always check the existing CDM domains and fields before requesting custom additions. The more data that maps to standard fields, the more interoperable your integrations become.


Next Steps