Data Platforms
OnParts OEM Lookup Ingestion & Azure SQL Incremental Load Pipeline
OEM reference-data ingestion pipeline that validates lookup files, normalizes OEM numbers, parses fitment years, prevents duplicate SQL records, and incrementally loads clean reference data into Azure SQL.
Reference data command view
OnParts OEM Lookup Ingestion
OEM lookup intake, validation, year parsing, normalization, existing-key comparison, and append-only Azure SQL loading
Receive
Local / Blob
Validate
Columns / Schema
Normalize
OEM / Years
Compare
Existing keys
Load
Azure SQL
Validation checks
Source → SQLReference load preview
Append-only| Field | Check | Output | Status |
|---|---|---|---|
| OEM Number | Normalize | Canonical key | Ready |
| Year Range | Parse | YearStart / YearEnd | Mapped |
| PingedCount | Clean | Optional metric | Validated |
| Existing Key | Compare | Append decision | De-duped |
SQL-ready outputs
Business problem
OEM lookup data needed repeatable ingestion while preventing duplicates. Source files could contain inconsistent OEM formatting, year ranges that needed parsing, optional fields that required cleanup, and records that may already exist in SQL.
The workflow needed a controlled reference-data loader that could validate the incoming file, normalize key fields, compare against existing records, and only append clean new rows into the database.
System built
Built local and blob ingestion, required-column validation, OEM-number cleanup, YearStart and YearEnd parsing, optional PingedCount normalization, schema validation, SQLAlchemy loading, and existing-key comparison.
The system turns OEM lookup files into a safer incremental SQL load process that supports downstream reference lookups, matching workflows, reporting, and future application logic.
Reference-data controls
Signals reviewed
The pipeline evaluates source readiness, schema quality, OEM key structure, fitment-year parsing, duplicate risk, and append-only load status before publishing lookup records into SQL.
Incremental loading workflow
How it works
Receive
Collect OEM lookup files from local or blob-based intake paths and prepare them for controlled loading.
The pipeline starts by treating lookup files as managed reference-data inputs instead of one-off spreadsheet uploads.
Validate
Check required columns, source structure, schema readiness, and field-level quality before transformation.
Validation protects the reference table from incomplete files, missing identifiers, and malformed records.
Normalize
Standardize OEM numbers, parse year ranges, normalize optional PingedCount values, and clean lookup fields.
Normalization turns inconsistent reference rows into predictable records that can be compared and loaded safely.
Compare
Compare incoming records against existing SQL keys to prevent duplicate reference-data inserts.
The comparison layer gives the loader memory so repeated runs can add only new records instead of reloading the same OEM references.
Load
Append clean, new OEM lookup records into Azure SQL for downstream reporting and lookup workflows.
The final layer publishes reference data into a stable SQL structure that can support analytics, matching, and operational lookup use cases.
System layers
What the loader coordinates
Source intake
Accepts local or blob-based lookup files and brings them into a controlled reference-data ingestion workflow.
Validation layer
Checks required columns, schema readiness, year fields, optional counters, and field-level quality before loading.
Normalization layer
Standardizes OEM numbers, parses YearStart and YearEnd values, and cleans fields used by downstream lookup logic.
Incremental load layer
Compares incoming keys against existing SQL records and inserts only new, clean reference-data rows.
Impact signals
What the pipeline improved
Incremental Azure SQL loading for OEM reference data
OEM number de-duplication and existing-key comparison
YearStart and YearEnd parsing for cleaner fitment context
Required-column and schema validation before publication
Reference-data readiness for downstream lookup and matching
Operational value
Reference data made safer for downstream use
Cleaner reference data
Turns messy lookup files into structured OEM records that are easier to search, match, and use downstream.
Safer repeated loads
Existing-key comparison and append-only logic reduce the risk of duplicate inserts across recurring runs.
Better fitment context
YearStart and YearEnd parsing gives the lookup table more useful structure for reporting and matching.
Operational SQL readiness
Publishes clean records into Azure SQL so the data can support analytics, APIs, lookup tools, and future pipelines.
Why this project matters
Reference-data pipelines matter because lookup quality affects every downstream match, report, and application workflow.
This project shows how OEM lookup data can move from file-based reference material into a controlled SQL asset. Validation, normalization, fitment-year parsing, duplicate prevention, and append-only loading make the reference table more reliable over time.
The value is not only loading a file. The value is protecting the quality of the reference layer that other systems depend on.
Confidentiality note
Visuals and descriptions are sanitized conceptual representations. They do not expose private company data, OEM source files, customer records, internal pricing, Azure SQL credentials, operational screenshots, proprietary lookup tables, or source pipeline logic.