Polars Kit¶
The Polars Kit is the frame-processing utility layer of django-mindoff. It provides shared DataFrame/LazyFrame helpers used by CRUD, validation, and payload-flattening workflows.
Its purpose is to keep frame operations predictable across eager/lazy execution modes while preserving model-aware data semantics.
For usage-focused examples, see Developer Guide - Polars Utilities.
Stability Warning
mo_polars_kit is not part of the core offering. It is experimental and may change without prior notice. Check this page for updates on the Polars utilities.
Architecture & Intent¶
Polars Kit has two runtime surfaces:
- mo_polars_kit (MindoffPolarsKit) for frame utilities.
- _polars_kit/json_to_frame.py for JSON-to-frame flattening and model-frame construction.
Together they support ingestion (json_to_frame), normalization (sync_model_frms_type), transformation (frm_fill_*), and operational checks (is_frm_empty, get_frm_height).
Core Runtime Components¶
| Component | Responsibility | Examples |
|---|---|---|
| MindoffPolarsKit | Public utility surface for emptiness checks, frame normalization, null transforms, and row counts. | is_frm_empty, frm_fill_null |
| Batch transform engine | Shared map/sink-map execution path for column mutation. | _apply_batch_transform |
| Payload flattener | Converts nested JSON payloads into table-like Polars frames. | PayloadFlattener.flatten() |
| Model-frame builder | Maps flattened path outputs to Django model classes. | build_model_frms |
Frame Utility Architecture (mo_polars_kit)¶
Emptiness & Height Primitives¶
- is_frm_empty: handles both DataFrame and LazyFrame, including schema-only lazy frames.
- is_model_frms_empty / is_model_frms_not_empty: map-wide emptiness predicates.
- get_frm_height: row count for eager and lazy frames, using a streaming collect for lazy frames.
These methods are used as control-flow gates in CRUD and validation pipelines.
Model-Frame Normalization¶
- collect_model_frms: materializes all lazy values in a model-frame mapping.
- sync_model_frms_type: if any frame is lazy, converts all eager frames to lazy.
This enforces a single execution mode per operation and avoids mixed eager/lazy surprises.
Null-State Splitting¶
- has_nulls_in_frm_col: null detection per column.
- split_model_frms_on_null: splits frames into valid/invalid partitions using an error marker column (default __error__info).
If the split column is absent, it is added as null and rows route to the valid partition.
Column Mutation Pipeline¶
frm_fill_null and frm_fill_notnull provide three strategies:
- lit: literal replacement (or callable evaluated once for literal mode).
- map: batch map transformation in memory.
- sink_map: lazy sink-to-parquet and re-scan path for large lazy pipelines.
Shared rules:
- map/sink_map require callable fill_value.
- frm_fill_notnull supports row_param only with map/sink_map.
- dtype can be explicitly controlled; otherwise inferred from schema.
JSON Flattening Architecture (json_to_frame)¶
PayloadFlattener transforms nested payloads into table-style frames keyed by path.
Flatten Lifecycle¶
- Initialize root frame from payload union-of-keys.
- Ensure/normalize root PK column using id_map["__root__"].
- Process nested paths in depth order.
- Extract subtables via explode/unnest semantics.
- Ensure each path-specific primary key column exists and is normalized.
- Drop nested object/list columns from parent tables after extraction.
Structural Rules¶
- Root PK mapping is mandatory.
- Nested values must be dict or list[dict] (or null).
- Missing nested key yields empty subtable frame.
- Missing path PK mapping raises error.
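The extraction step and the nested-value rules above can be sketched in plain Python over a list of records. This is not PayloadFlattener itself, which works on Polars frames: `extract_subtable` is a hypothetical helper, and carrying the parent key in a `parent_id` field is an assumption made only so the child rows stay traceable.

```python
def extract_subtable(rows, key, parent_pk="id"):
    """Split nested `key` values out of parent rows into child rows."""
    parents, children = [], []
    for row in rows:
        # The nested column is dropped from the parent table after extraction.
        parents.append({k: v for k, v in row.items() if k != key})
        nested = row.get(key)
        if nested is None:
            continue  # null or missing key contributes no child rows
        if isinstance(nested, dict):
            nested = [nested]  # a single object behaves like a one-item list
        if not isinstance(nested, list) or not all(
            isinstance(item, dict) for item in nested
        ):
            raise TypeError(f"{key!r} must be dict, list[dict], or null")
        for item in nested:
            # Assumption: children keep a reference to the parent key.
            children.append({**item, "parent_id": row.get(parent_pk)})
    return parents, children


payload = [
    {"id": 1, "items": [{"sku": "a"}, {"sku": "b"}]},
    {"id": 2, "items": None},
    {"id": 3},
]
parents, children = extract_subtable(payload, "items")
```

When no row carries the nested key, `children` is an empty list, which mirrors the empty-subtable rule.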
UUID Normalization¶
Supports uuid_mode:
- hex: 32-char compact UUID.
- standard: hyphenated UUID format.
Existing IDs are normalized; missing IDs are generated via vectorized Polars expressions.
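The two output formats correspond to what Python's standard `uuid` module produces. The scalar sketch below only illustrates the formats: the kit does this with vectorized Polars expressions, `normalize_uuid` is a hypothetical name, and using `uuid4` for missing IDs is an assumption.

```python
import uuid


def normalize_uuid(value, uuid_mode="hex"):
    """Normalize one ID to the requested format, generating it when missing."""
    # Assumption: missing IDs get a random (version 4) UUID.
    parsed = uuid.uuid4() if value is None else uuid.UUID(str(value))
    if uuid_mode == "hex":
        return parsed.hex   # 32 characters, no hyphens
    if uuid_mode == "standard":
        return str(parsed)  # 8-4-4-4-12 hyphenated form
    raise ValueError(f"unsupported uuid_mode: {uuid_mode!r}")


compact = normalize_uuid("12345678-1234-5678-1234-567812345678", "hex")
hyphenated = normalize_uuid("12345678123456781234567812345678", "standard")
generated = normalize_uuid(None, "hex")
```

`uuid.UUID` accepts both the compact and hyphenated spellings as input, so either form of an existing ID normalizes to the same value.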
Frame Type Strategy¶
- frame_type="dataframe": eager output.
- frame_type="lazyframe": lazy output.
- frame_type="auto": lazy when payload size exceeds lazy_threshold.
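The selection rule reduces to a small decision function. In this sketch `resolve_frame_type` is a hypothetical name, and "payload size" is treated as a record count, which is an assumption since the unit is not stated here.

```python
def resolve_frame_type(frame_type, payload_size, lazy_threshold):
    """Return 'dataframe' or 'lazyframe' for the requested strategy."""
    if frame_type in ("dataframe", "lazyframe"):
        return frame_type  # explicit choices are honored as-is
    if frame_type == "auto":
        # "Exceeds" is read as strictly greater than the threshold.
        return "lazyframe" if payload_size > lazy_threshold else "dataframe"
    raise ValueError(f"unsupported frame_type: {frame_type!r}")
```

Under this reading, a payload exactly at the threshold stays eager.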
Model Mapping Architecture (build_model_frms)¶
build_model_frms maps flattened path tables back to model classes.
Key constraints:
- Exactly one root model path is required ("", ".", or __root__).
- Duplicate normalized paths are rejected.
- Model PK db columns are used as ID columns in generated id_map.
Outputs preserve eager/lazy type consistency from flattening.
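The root-path and duplicate constraints can be sketched as a validation pass over a path-to-model mapping. This is an illustration only: `normalize_model_paths` is a hypothetical helper, models are represented by strings, and only the three root aliases are normalized because the kit's rules for other paths are not described here.

```python
ROOT_ALIASES = ("", ".", "__root__")


def normalize_model_paths(path_to_model):
    """Normalize root aliases and enforce the root and uniqueness rules."""
    normalized = {}
    for path, model in path_to_model.items():
        key = "__root__" if path in ROOT_ALIASES else path
        if key in normalized:
            # Two spellings of the same path would make the mapping ambiguous.
            raise ValueError(f"duplicate normalized path: {key!r}")
        normalized[key] = model
    if "__root__" not in normalized:
        raise ValueError("exactly one root model path is required")
    return normalized


mapping = normalize_model_paths({"": "Order", "items": "OrderItem"})
```

Passing both "" and "." fails the duplicate check, which also guarantees that at most one root survives.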
Integration With Other Runtime Layers¶
- CRUD Kit: uses emptiness checks, type sync, height checks, and null transforms.
- Row validation: uses fill helpers for default/value normalization.
- Payload ingestion workflows: use JSON flattening for model-aware frame generation.
- Validation Kit: guards mode/callable constraints inside mutation helpers.
Operational Caveats¶
- sink_map for lazy frames writes temporary parquet files under temp storage and returns scan_parquet lazy frames.
- is_model_frms_empty({}) returns True and is_model_frms_not_empty({}) returns False by Python all/any semantics.
- Split-on-null adds missing marker column, which can affect downstream schema expectations if not anticipated.
- JSON flattening assumes declared id_map path coverage; undeclared nested PK paths fail fast.
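The empty-mapping caveat follows directly from how `all` and `any` treat an empty iterable. The sketch below stands in plain lists for frames to stay dependency-free. The function names are hypothetical, and the exact predicate shapes (all-empty versus any-non-empty) are an assumption consistent with the results stated above.

```python
def frames_all_empty(frames):
    """True when every frame is empty; vacuously True for an empty mapping."""
    return all(len(frm) == 0 for frm in frames.values())


def frames_any_not_empty(frames):
    """True when at least one frame has rows; False for an empty mapping."""
    return any(len(frm) > 0 for frm in frames.values())


no_frames = {}
some_rows = {"order": [], "item": [{"id": 1}]}
```

Callers that treat "empty" as "nothing to do" may want an explicit check for an empty mapping before relying on either predicate.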
Troubleshooting the Kit¶
- Callable mode errors in fill helpers: verify fill_value is callable for map/sink_map.
- Unexpected eager/lazy behavior: check sync_model_frms_type and frame_type/lazy_threshold inputs.
- Flattening fails on nested payload: ensure nested values are dict or list-of-dict and id_map contains path PK.
- Model frame mapping errors: verify exactly one root model and no duplicate normalized paths.
- Temporary-file concerns with sink mode: prefer map for smaller datasets or manage temp storage lifecycle.