Data Platforms

Customer Parts Movement History Loader & Azure SQL De-Dupe Pipeline

CSV-to-Azure SQL ingestion pipeline that normalizes customer movement history, validates period and quantity fields, prevents duplicate records, and loads append-ready data into a reporting foundation.

PythonPandasAzure SQLSQLAlchemyCSV LoadingData QualityDe-Dupe LogicAppend-Only Inserts

Movement-history loader view

Customer History Loader

CSV source files, header mapping, validation, de-dupe, and Azure SQL loading

de-dupe ready

Read

CSV

Map

Headers

Validate

Month / qty

De-dupe

Natural key

Load

Azure SQL

Data quality controls

Header aliases

Required fields

Month / year

Quantity coercion

Natural-key check

Load preview

append-only

Customer	Part	Period	Qty	Status
Customer A	100-4412	2026-05	14	New
Customer B	225-0198	2026-05	8	Exists
Customer C	300-7710	2026-06	21	New
Customer D	410-2250	2026-06	3	Review

Loader flow

CSVNormalizeValidateDe-dupeAzure SQL

CSV → SQL

Movement-history loading

Canonical Keys

Duplicate prevention

Append-Only

Controlled inserts

Business problem

Customer movement history required reliable loading, duplicate prevention, and cleaner structure before it could support reporting or downstream analytics. Repeated files, inconsistent headers, and period fields created risk for bad or duplicated history.

The process needed a loader that could normalize incoming CSV data, validate records, prevent duplicate inserts, and preserve a clean Azure SQL history table.

System built

Built a Python and Pandas loader with header mapping, month/year validation, quantity coercion, part-number cleanup, natural-key de-dupe logic, Azure SQL table creation, SQLAlchemy loading, and append-only inserts.

The system turns customer movement CSV files into a cleaner, de-duplicated database layer ready for reporting, movement analytics, and historical analysis.

Load signals

Signals reviewed

The loader evaluates file readiness, source structure, validation rules, and existing SQL records before inserting new movement-history data.

CSV file readiness

Header aliases

Required column coverage

Month / year validation

Quantity coercion

Part number normalization

Customer movement history

Natural-key identity

Existing SQL records

Duplicate candidate checks

Append-only insert readiness

Ingestion flow

How it works

Read

Load customer movement-history CSV files and inspect the incoming structure before transformation.

The loader starts by turning source files into a controlled dataframe so the rest of the pipeline can apply validation and mapping rules.

Map

Apply header mapping and alias handling so inconsistent source names align to the expected schema.

This makes the loader more resilient when source files use slightly different column names or export formats.

Validate

Validate month, year, quantity, required fields, and source readiness before records move into SQL.

The validation layer protects the target table from malformed dates, bad quantities, missing fields, and unusable rows.

De-dupe

Build canonical natural keys and compare incoming records against existing SQL history to prevent duplicate inserts.

The de-dupe logic gives the loader memory and makes repeated runs safer by avoiding duplicate movement history.

Load

Create or validate the Azure SQL table and insert only new, clean, append-ready records.

The final layer delivers a clean movement-history table that can support analytics, reporting, and downstream business logic.

Data engineering layers

What the loader coordinates

Header mapping

Normalizes inconsistent CSV column names into a stable schema that the loader can trust.

Validation layer

Checks month, year, quantity, required fields, and conversion readiness before SQL loading.

Canonical de-dupe

Uses natural-key logic to identify records that already exist and prevent duplicate movement history.

Azure SQL loading

Creates or validates the target table and performs append-only inserts through SQLAlchemy.

Impact signals

What the loader improved

Canonical de-duplication for safer repeated loads

SQLAlchemy loading into Azure SQL

Header alias handling for source-file flexibility

Month / year and quantity validation before insert

Scheduled-loader readiness for repeatable movement-history ingestion

Operational value

Movement history turned into a reliable SQL foundation

Cleaner history loading

Moves customer movement history from manual file handling into a repeatable ingestion pipeline.

Duplicate prevention

Natural-key logic helps prevent repeated source files from inflating movement history.

More reliable SQL tables

Validation and type coercion reduce the chance of malformed records landing in the database.

Analytics-ready foundation

Creates a cleaner history layer that can support customer behavior, movement, sales, and inventory analysis.

Why this project matters

Customer movement history converted into trusted database history.

This project shows how raw historical files can become a controlled data platform layer. Header mapping, validation, type cleanup, canonical de-dupe, and append-only SQL loading create a stronger foundation for analytics.

The value is not just loading rows. The value is protecting historical movement data from duplication, malformed records, and inconsistent source-file structure.

Confidentiality note

Visuals and descriptions are sanitized conceptual representations. They do not expose private company data, customer records, credentials, raw exports, internal pricing, operational screenshots, or proprietary source files.