CRUD Kit¶
The CRUD Kit is the data-engineering layer of django-mindoff. It runs model-aware bulk create, read, and update operations over Polars frames while enforcing Django model constraints.
It is designed for high-throughput tabular workflows where API-level validation and storage-level consistency both matter.
For usage-level CRUD implementation patterns, see Developer Guide - Data Operations (CRUD).
Architecture & Data Flow¶
At a high level, write operations pass through one shared validation pipeline before they reach database execution.
Core Runtime Components¶
| Component | Responsibility | Examples |
|---|---|---|
| MindoffCRUDHandler | Public API surface and lifecycle orchestration (create, read, update). | mo_crud_kit.create(...), mo_crud_kit.read(...) |
| Validators | Normalize and validate structure, field values, and FK integrity. | ColumnValidator, RowValidator, ForeignKeyValidator |
| _ModelFrmsValidInvalidSplitter | Splits valid/invalid rows and propagates invalid dependency chains across related models. | Parent-child FK invalidation propagation |
| CRUDProcessor | Executes DB writes through SQLAlchemy/Polars with dialect-specific upsert behavior. | Append inserts, staging merge upserts |
Validation Pipeline¶
create(...) and update(...) share the same ordered write pipeline when is_validate=True.
1. Column Contract Validation¶
ColumnValidator enforces model-frame alignment before any row-level work:
- Requires UUID primary keys.
- Requires explicit db_column on primary key and foreign keys.
- Normalizes field names to model db_column names.
- Handles missing/extra columns using configured behavior (CRUD defaults: add missing, remove extra).
- Fails fast on schema-contract errors.
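The alignment rules above can be sketched in a few lines. This is a minimal illustration, not the kit's implementation: a plain dictionary of lists stands in for a Polars frame so the example has no dependencies, and `FIELD_TO_DB_COLUMN` and `align_columns` are hypothetical names.

```python
from typing import Any

# Hypothetical mapping from Django field names to db_column names.
FIELD_TO_DB_COLUMN = {"id": "author_id", "name": "author_name", "email": "author_email"}


def align_columns(frame: dict[str, list[Any]], n_rows: int) -> dict[str, list[Any]]:
    """Rename to db_column names, drop extra columns, add missing ones as nulls."""
    model_columns = set(FIELD_TO_DB_COLUMN.values())
    aligned: dict[str, list[Any]] = {}
    for name, values in frame.items():
        db_name = FIELD_TO_DB_COLUMN.get(name, name)
        if db_name in model_columns:  # columns unknown to the model are removed
            aligned[db_name] = values
    for db_name in model_columns - set(aligned):  # missing columns are added
        aligned[db_name] = [None] * n_rows
    return aligned


frame = {"id": ["a1"], "name": ["Ann"], "nickname": ["annie"]}
aligned = align_columns(frame, n_rows=1)
```

Here `nickname` is dropped as an extra column and `author_email` is added as a null column, mirroring the CRUD defaults described above.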
2. Row Sanitization & Field Rules¶
RowValidator sanitizes and checks row values against Django field metadata:
- Text/slug/email/url normalization and type coercion.
- Numeric/date/time/duration/UUID/IP/JSON transformation.
- Default value application.
- Constraint checks (required, max_length, max_digits, min/max validators, dtype matching).
- Unsupported fields are explicitly rejected (for example ManyToManyField, BinaryField).
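As a rough illustration of the row-level checks, the sketch below applies two of the listed constraints (required and max_length) to one row. The `RULES` table and `check_row` function are hypothetical; the kit derives its rules from Django field metadata and operates on whole Polars frames rather than on single dictionaries.

```python
from typing import Any, Optional

# Hypothetical field rules; in the kit these would come from Django field metadata.
RULES = {
    "title": {"required": True, "max_length": 10},
    "slug": {"required": False, "max_length": 5},
}


def check_row(row: dict[str, Any]) -> Optional[str]:
    """Return a message for the first violated rule, or None if the row is valid."""
    for field, rule in RULES.items():
        value = row.get(field)
        if isinstance(value, str):
            value = value.strip()  # simple text normalization before checking
        if value in (None, ""):
            if rule["required"]:
                return f"{field}: required"
            continue  # optional and empty: nothing more to check
        if len(value) > rule["max_length"]:
            return f"{field}: exceeds max_length {rule['max_length']}"
    return None
```

Returning the message rather than raising keeps the invalid row available for the later valid/invalid split.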
3. Foreign-Key Integrity¶
ForeignKeyValidator verifies FK references:
- Uses in-memory related frames when related models are present in the same operation.
- Falls back to database existence checks when related frames are absent.
- Rejects unresolved FK references as validation failures.
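The lookup order described above might look like the following sketch. `find_unresolved_fks` and its parameters are hypothetical names, and the assumption that an in-memory related frame is used exclusively when present (with no additional database check) is taken from the wording above rather than from the kit's source.

```python
from typing import Callable, Optional


def find_unresolved_fks(
    child_fk_values: list[str],
    related_ids: Optional[set[str]],
    exists_in_db: Callable[[str], bool],
) -> list[str]:
    """Return FK values that cannot be resolved to a related row."""
    unresolved = []
    for fk in child_fk_values:
        if related_ids is not None:
            found = fk in related_ids  # related frame is part of this operation
        else:
            found = exists_in_db(fk)   # fall back to a database existence check
        if not found:
            unresolved.append(fk)
    return unresolved


# A set stands in for the database in this example.
stored_ids = {"p9"}
in_memory = find_unresolved_fks(["p1", "p2"], {"p1"}, stored_ids.__contains__)
from_db = find_unresolved_fks(["p9", "p3"], None, stored_ids.__contains__)
```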
4. Relationship-Aware Invalid Propagation¶
_ModelFrmsValidInvalidSplitter does more than row splitting:
- Collects invalid IDs per model.
- Propagates invalidity upward and downward across FK edges.
- Produces valid_model_frms and invalid_model_frms that remain relationally consistent.
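One way to picture the propagation is as a fixed-point loop over FK edges. The sketch below assumes a single parent-child relationship and treats "upward" as child-to-parent and "downward" as parent-to-child; the function name and the exact propagation rules are assumptions for illustration, not the splitter's actual algorithm.

```python
def propagate_invalid(
    invalid_parents: set[str],
    invalid_children: set[str],
    child_to_parent: dict[str, str],
) -> tuple[set[str], set[str]]:
    """Spread invalidity across FK edges until no new IDs are marked."""
    parents, children = set(invalid_parents), set(invalid_children)
    changed = True
    while changed:
        changed = False
        for child, parent in child_to_parent.items():
            if parent in parents and child not in children:
                children.add(child)  # downward: invalid parent invalidates child
                changed = True
            if child in children and parent not in parents:
                parents.add(parent)  # upward: invalid child invalidates parent
                changed = True
    return parents, children


edges = {"c1": "p1", "c2": "p1", "c3": "p2"}
bad_parents, bad_children = propagate_invalid(set(), {"c1"}, edges)
```

In this example, the invalid child `c1` invalidates its parent `p1`, which in turn invalidates the sibling `c2`; `p2` and `c3` are untouched, so both result sets stay relationally consistent.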
5. Partial vs Fail-Fast Behavior¶
After validation:
- Returns fail when no valid rows remain.
- Returns fail when invalid rows exist and is_partial=False.
- Returns partial_ok when invalid rows exist and is_partial=True.
- Returns ok when all rows are valid.
When is_validate=False, CRUD writes proceed directly and emit runtime warnings about unsafe persistence.
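The post-validation status rules listed above reduce to a small decision function. `resolve_status` is a hypothetical helper written for illustration; only the status strings and the is_partial flag come from the description above.

```python
def resolve_status(valid_count: int, invalid_count: int, is_partial: bool) -> str:
    """Map validation counts and the is_partial flag to a result status."""
    if valid_count == 0:
        return "fail"  # nothing left to write
    if invalid_count > 0:
        return "partial_ok" if is_partial else "fail"
    return "ok"
```

Checking the "no valid rows" case first means is_partial=True can never turn an entirely invalid batch into a partial success.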
Read Path Architecture¶
read(...) is intentionally strict on input and flexible on output.
Input Contract¶
- Queryset must be .values()-based.
- batch_size must be >= 0.
- When batch_size=0, defaults are auto-selected to 1000 for streaming mode and 100 for pagination mode.
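The batch-size rules above can be expressed as a short helper. `resolve_batch_size` is a hypothetical name, and raising `ValueError` for a negative value is an assumption about how the contract is enforced.

```python
from typing import Optional


def resolve_batch_size(batch_size: int, page_number: Optional[int]) -> int:
    """Validate batch_size and apply the mode-specific default when it is 0."""
    if batch_size < 0:
        raise ValueError("batch_size must be >= 0")
    if batch_size == 0:
        # page_number=None means streaming mode; otherwise pagination mode.
        return 1000 if page_number is None else 100
    return batch_size
```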
Execution Modes¶
- Streaming mode (page_number=None): iterates queryset in chunks and concatenates into one Polars frame.
- Pagination mode (page_number=<n>): returns only one page plus paginator metadata.
Each mode supports eager (pl.DataFrame) and lazy (pl.LazyFrame) output.
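The difference between the two modes can be sketched with plain lists of row dictionaries standing in for a .values() queryset and a Polars frame. All function names here are hypothetical, and the 1-based page numbering is an assumption borrowed from Django's paginator convention.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_chunks(rows: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield successive chunks of at most batch_size rows."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, batch_size)):
        yield chunk


def read_streaming(rows: Iterable[dict], batch_size: int) -> list[dict]:
    """Streaming mode: consume every chunk and concatenate the results."""
    combined: list[dict] = []
    for chunk in iter_chunks(rows, batch_size):
        combined.extend(chunk)
    return combined


def read_page(rows: list[dict], batch_size: int, page_number: int) -> list[dict]:
    """Pagination mode: return only the requested 1-based page."""
    start = (page_number - 1) * batch_size
    return rows[start:start + batch_size]


rows = [{"n": i} for i in range(5)]
```

Streaming returns all five rows regardless of chunk size, while `read_page(rows, 2, 2)` returns only the third and fourth rows.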
Read Response Metadata¶
read(...) returns (frame, stats) where stats includes:
- mode
- batch_size
- total_count
- total_pages
- current_page
- has_next
- has_previous
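For pagination mode, the stats keys above could be derived roughly as follows. This is a sketch under assumptions: `pagination_stats` is a hypothetical helper, the "pagination" value for mode is a guess, and reporting at least one page for an empty result follows Django's default paginator behaviour rather than anything stated here.

```python
import math


def pagination_stats(total_count: int, batch_size: int, page_number: int) -> dict:
    """Build a stats dictionary for one page of a paginated read."""
    total_pages = max(1, math.ceil(total_count / batch_size))
    return {
        "mode": "pagination",  # assumed label; the kit's actual value may differ
        "batch_size": batch_size,
        "total_count": total_count,
        "total_pages": total_pages,
        "current_page": page_number,
        "has_next": page_number < total_pages,
        "has_previous": page_number > 1,
    }


stats = pagination_stats(total_count=250, batch_size=100, page_number=2)
```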
Write Engine Architecture¶
CRUDProcessor handles database writes and update strategies.
Engine & Dialect Resolution¶
- Builds a SQLAlchemy engine from Django DATABASES.
- Supports sqlite, postgresql, and mysql.
- Verifies table existence before writes.
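To make the dialect resolution concrete, the sketch below maps a Django DATABASES entry to a SQLAlchemy-style connection URL. `DIALECTS` and `build_url` are hypothetical; the Django backend paths and the URL layout are standard, but how the kit itself builds its engine (including driver selection) is not shown here and may differ.

```python
from urllib.parse import quote_plus

# Django backend path -> SQLAlchemy dialect name for the three supported databases.
DIALECTS = {
    "django.db.backends.sqlite3": "sqlite",
    "django.db.backends.postgresql": "postgresql",
    "django.db.backends.mysql": "mysql",
}


def build_url(db: dict) -> str:
    """Translate one Django DATABASES entry into a SQLAlchemy URL string."""
    engine = db["ENGINE"]
    if engine not in DIALECTS:
        raise ValueError(f"unsupported database engine: {engine}")
    dialect = DIALECTS[engine]
    if dialect == "sqlite":
        return f"sqlite:///{db['NAME']}"  # file-based: no host or credentials
    user = quote_plus(db.get("USER", ""))
    password = quote_plus(db.get("PASSWORD", ""))  # escape characters such as '@'
    host = db.get("HOST") or "localhost"
    port = f":{db['PORT']}" if db.get("PORT") else ""
    return f"{dialect}://{user}:{password}@{host}{port}/{db['NAME']}"
```

Rejecting unknown engines up front keeps unsupported dialects from reaching the write path.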
Create Strategy¶
- Uses append semantics (if_table_exists="append").
- Supports DataFrame and LazyFrame inputs.
Update Strategies (Upsert)¶
update(...) supports two upsert paths:
- Staging merge mode (is_temp_table=True, default): writes to temp table, then merges into target table.
- Direct upsert mode (is_temp_table=False): performs dialect-specific conflict update directly.
Before validation, the update flow also fills in any missing model columns by fetching the current database values for the supplied primary keys.
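The two upsert paths can be illustrated with the standard-library sqlite3 module. This is a sketch of the general technique in SQLite's dialect only, not the SQL that CRUDProcessor generates; the table, columns, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (author_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO author VALUES ('a1', 'Old Name')")


def upsert_direct(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Direct mode: conflict-aware insert straight into the target table."""
    conn.executemany(
        "INSERT INTO author (author_id, name) VALUES (?, ?) "
        "ON CONFLICT(author_id) DO UPDATE SET name = excluded.name",
        rows,
    )


def upsert_via_staging(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Staging mode: load a temp table, then merge it into the target table."""
    conn.execute("CREATE TEMP TABLE author_staging (author_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO author_staging VALUES (?, ?)", rows)
    conn.execute(
        "INSERT INTO author (author_id, name) "
        # 'WHERE true' lets SQLite parse ON CONFLICT after a SELECT unambiguously.
        "SELECT author_id, name FROM author_staging WHERE true "
        "ON CONFLICT(author_id) DO UPDATE SET name = excluded.name"
    )
    conn.execute("DROP TABLE author_staging")


upsert_via_staging(conn, [("a1", "New Name"), ("a2", "Second Author")])
result = conn.execute("SELECT author_id, name FROM author ORDER BY author_id").fetchall()
```

Both functions end in the same table state for the same input; the staging variant separates the bulk load from the merge, at the cost of an extra table.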
Operational Assumptions¶
The kit is opinionated. The following assumptions are part of normal operation, not optional conventions:
- Models should use UUID primary keys.
- PK and FK columns should define explicit db_column.
- read() expects .values() querysets.
- delete() is not currently exposed in mo_crud_kit.
Troubleshooting the Kit¶
Most issues are validation-contract mismatches:
- Column errors at create/update start: check UUID PK and explicit db_column requirements.
- Rows unexpectedly invalid: inspect validator error column (POLARS_VALIDATOR_ERROR_COL or __error__info).
- FK validation failures: ensure related IDs exist in input frames or database.
- Read rejects queryset: confirm .values() is called before read().
- Update conflict behavior differs by DB: verify chosen mode (is_temp_table) and target dialect support.