Polars Kit
The Polars Kit is the frame-processing utility layer of django-mindoff. It provides shared DataFrame/LazyFrame helpers used by CRUD, validation, and payload-flattening workflows.
Its purpose is to keep frame operations predictable across eager/lazy execution modes while preserving model-aware data semantics.
For usage-focused examples, see Developer Guide - Polars Utilities.
Stability Warning
mo_polars_kit is experimental and not yet feature-complete, and it may change without prior notice. Check this page to stay informed about development of the Polars utilities.
Architecture & Intent
Polars Kit has two runtime surfaces:
- mo_polars_kit (MindoffPolarsKit) for frame utilities.
- _polars_kit/json_to_frame.py for JSON-to-frame flattening and model-frame construction.
Together they support ingestion (json_to_frame), normalization (sync_model_frms_type), transformation (frm_fill_*), and operational checks (is_frm_empty, get_frm_height).
Core Runtime Components
| Component | Responsibility | Examples |
|---|---|---|
| MindoffPolarsKit | Public utility surface for emptiness checks, frame normalization, null transforms, and row counts. | is_frm_empty, frm_fill_null |
| Batch transform engine | Shared map/sink-map execution path for column mutation. | _apply_batch_transform |
| Payload flattener | Converts nested JSON payloads into table-like Polars frames. | PayloadFlattener.flatten() |
| Model-frame builder | Maps flattened path outputs to Django model classes. | build_model_frms |
Frame Utility Architecture (mo_polars_kit)
Emptiness & Height Primitives
- is_frm_empty: handles both DataFrame and LazyFrame, including schema-only lazy frames.
- is_model_frms_empty / is_model_frms_not_empty: emptiness predicates over a whole model-frame mapping.
- get_frm_height: row count for eager and lazy frames; lazy frames are counted with a streaming collect.
These methods are used as control-flow gates in CRUD and validation pipelines.
Model-Frame Normalization
- collect_model_frms: materializes all lazy values in a model-frame mapping.
- sync_model_frms_type: if any frame is lazy, converts all eager frames to lazy.
This enforces a single execution mode per operation and avoids mixed eager/lazy surprises.
Null-State Splitting
- has_nulls_in_frm_col: null detection per column.
- split_model_frms_on_null: splits frames into valid/invalid partitions using an error marker column (default: __error__info).
If the marker column is absent, it is added as an all-null column and every row is routed to the valid partition.
Column Mutation Pipeline
frm_fill_null and frm_fill_notnull provide three strategies:
- lit: literal replacement; a callable fill_value is evaluated once and its result is used as the literal.
- map: in-memory batch map transformation.
- sink_map: for large lazy pipelines, sinks the frame to parquet and re-scans it lazily.
Shared rules:
- map/sink_map require a callable fill_value.
- frm_fill_notnull supports row_param only with map/sink_map.
- dtype can be set explicitly; otherwise it is inferred from the schema.
JSON Flattening Architecture (json_to_frame)
PayloadFlattener transforms nested payloads into table-style frames keyed by path.
Flatten Lifecycle
- Initialize the root frame from the union of keys across the payload.
- Ensure and normalize the root PK column using id_map["__root__"].
- Process nested paths in depth order.
- Extract subtables via explode/unnest semantics.
- Ensure each path-specific primary key column exists and is normalized.
- Drop nested object/list columns from parent tables after extraction.
Structural Rules
- Root PK mapping is mandatory.
- Nested values must be dict or list[dict] (or null).
- A missing nested key yields an empty subtable frame.
- A missing path PK mapping raises an error.
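A hypothetical validator that mirrors these rules for one nested path. check_nested_value is not part of the kit, and the id_map shape (path mapped to a PK column name) is an assumption.

```python
from typing import Any, Mapping


def check_nested_value(path: str, value: Any, id_map: Mapping[str, str]) -> None:
    """Hypothetical check mirroring the flattener's structural rules."""
    if "__root__" not in id_map:
        raise KeyError("id_map must declare the root PK under '__root__'")
    if path not in id_map:
        # Undeclared nested PK path: fail fast instead of guessing a key column.
        raise KeyError(f"id_map has no PK column for path {path!r}")
    if value is None or isinstance(value, dict):
        return
    if isinstance(value, list) and all(isinstance(item, dict) for item in value):
        return
    raise TypeError(
        f"{path!r} must be a dict, list[dict], or null, not {type(value).__name__}"
    )


id_map = {"__root__": "id", "items": "item_id"}
check_nested_value("items", [{"sku": "x", "qty": 1}], id_map)  # accepted
check_nested_value("items", None, id_map)                      # null is accepted
```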
UUID Normalization
uuid_mode supports two formats:
- hex: 32-character compact UUID.
- standard: hyphenated UUID format.
Existing IDs are normalized; missing IDs are generated via vectorized Polars expressions.
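The two formats correspond to Python's uuid.UUID.hex and str(uuid.UUID). The helper below works one value at a time with the standard library, whereas the kit uses vectorized Polars expressions, so treat it only as a reference for the expected output formats. Generating a UUID4 for missing values is an assumption of the sketch.

```python
import uuid
from typing import Optional


def normalize_uuid(value: Optional[str], uuid_mode: str = "hex") -> str:
    """Per-value sketch of uuid_mode output formats (not the kit's API)."""
    # Parse an existing ID (hyphenated or compact); generate one when missing.
    parsed = uuid.UUID(value) if value else uuid.uuid4()
    if uuid_mode == "hex":
        return parsed.hex        # 32 lowercase hex characters, no hyphens
    if uuid_mode == "standard":
        return str(parsed)       # 8-4-4-4-12 hyphenated form
    raise ValueError(f"unknown uuid_mode: {uuid_mode!r}")


print(normalize_uuid("12345678-1234-5678-1234-567812345678", "hex"))
# 12345678123456781234567812345678
print(normalize_uuid("12345678123456781234567812345678", "standard"))
# 12345678-1234-5678-1234-567812345678
```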
Frame Type Strategy
- frame_type="dataframe": eager output.
- frame_type="lazyframe": lazy output.
- frame_type="auto": lazy output when the payload size exceeds lazy_threshold.
Model Mapping Architecture (build_model_frms)
build_model_frms maps flattened path tables back to model classes.
Key constraints:
- Exactly one root model path is required ("", ".", or __root__).
- Duplicate normalized paths are rejected.
- Model PK db columns are used as ID columns in the generated id_map.
Outputs keep the eager/lazy frame type produced by flattening.
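The two path constraints can be checked with a small sketch. The normalization rule used here (trim whitespace and surrounding dots, map every root alias to __root__) is an assumption, model names stand in for Django model classes, and validate_model_paths is hypothetical.

```python
from __future__ import annotations

ROOT_KEY = "__root__"


def normalize_path(path: str) -> str:
    # Assumed rule: trim whitespace and surrounding dots; "", "." and
    # "__root__" all collapse to the single root key.
    cleaned = path.strip().strip(".")
    return cleaned or ROOT_KEY


def validate_model_paths(model_paths: dict[str, str]) -> dict[str, str]:
    """Return {normalized path: model name}, enforcing one root and unique paths."""
    by_path: dict[str, str] = {}
    for model, path in model_paths.items():
        key = normalize_path(path)
        if key in by_path:
            # Two models (including two roots) normalize to the same path.
            raise ValueError(f"duplicate path {key!r}: {by_path[key]} and {model}")
        by_path[key] = model
    if ROOT_KEY not in by_path:
        raise ValueError("exactly one root model path is required")
    return by_path


print(validate_model_paths({"Order": "", "OrderItem": "items"}))
# {'__root__': 'Order', 'items': 'OrderItem'}
```

Treating a second root as a duplicate path lets one check cover both the "exactly one root" and "no duplicates" rules.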
Integration With Other Runtime Layers
- CRUD Kit: uses emptiness checks, type sync, height checks, and null transforms.
- Row validation: uses fill helpers for default/value normalization.
- Payload ingestion workflows: use JSON flattening for model-aware frame generation.
- Validation Kit: guards mode/callable constraints inside mutation helpers.
Operational Caveats
- sink_map for lazy frames writes temporary parquet files under temp storage and returns scan_parquet lazy frames.
- is_model_frms_empty({}) returns True and is_model_frms_not_empty({}) returns False, following Python all/any semantics on an empty mapping.
- Split-on-null adds a missing marker column, which can affect downstream schema expectations if not anticipated.
- JSON flattening assumes declared id_map path coverage; undeclared nested PK paths fail fast.
Troubleshooting the Kit
- Callable mode errors in fill helpers: verify that fill_value is callable for map/sink_map.
- Unexpected eager/lazy behavior: check sync_model_frms_type and the frame_type/lazy_threshold inputs.
- Flattening fails on a nested payload: ensure nested values are dict or list-of-dict and that id_map contains the path PK.
- Model frame mapping errors: verify there is exactly one root model and no duplicate normalized paths.
- Temporary-file concerns with sink mode: prefer map for smaller datasets, or manage the temp storage lifecycle.