Polars Utilities

mo_polars_kit provides frame-level utilities for transformation, normalization, and null handling in Polars-based API workflows. If you are preparing frames for bulk writes, review Data Operations (CRUD).

Stability Warning

`mo_polars_kit` is experimental and not part of the core offering. It may change without prior notice. Check this page for updates on the Polars utilities.

Implementation

Start by importing the polars kit in your API or utility module:

from django_mindoff.components.polars_kit import mo_polars_kit

1. Is Frm Empty

Check whether a Polars frame has zero rows.

Usage:

is_empty = mo_polars_kit.is_frm_empty(frm)

Parameters:

  • frm (pl.DataFrame | pl.LazyFrame): Frame to inspect.

Possible responses:

  • Returns True when frame has no rows.
  • Returns False when at least one row exists.

2. Is Model Frms Empty

Check whether every frame in a model-frame map is empty.

Usage:

all_empty = mo_polars_kit.is_model_frms_empty(model_frms)

Parameters:

  • model_frms (dict[Any, pl.DataFrame|pl.LazyFrame]): Model-frame mapping.

Possible responses:

  • Returns True when all mapped frames are empty.
  • Returns False when any mapped frame contains rows.

3. Is Model Frms Not Empty

Check whether at least one frame in a model-frame map is not empty.

Usage:

has_data = mo_polars_kit.is_model_frms_not_empty(model_frms)

Parameters:

  • model_frms (dict[Any, pl.DataFrame|pl.LazyFrame]): Model-frame mapping.

Possible responses:

  • Returns True when at least one frame has rows.
  • Returns False when all frames are empty.

4. Collect Model Frms

Collect all lazy frames in a model-frame map into eager DataFrames.

Usage:

collected = mo_polars_kit.collect_model_frms(model_frms, streaming=True)

Parameters:

  • df_dict (dict[Any, pl.DataFrame|pl.LazyFrame]): Input model-frame mapping.
  • streaming (bool, default=True): Uses streaming collection for LazyFrame inputs.

Possible responses:

  • Returns dict[Any, pl.DataFrame] with all values materialized as DataFrame.

5. Sync Model Frms Type

Normalize a model-frame map so that every value uses the same Polars execution type (all eager or all lazy).

Usage:

synced = mo_polars_kit.sync_model_frms_type(model_frms)

Parameters:

  • model_frms (dict[Any, pl.DataFrame|pl.LazyFrame]): Input mapping.

Behavior:

  • If any value is LazyFrame, all DataFrame values are converted to LazyFrame.
  • If all values are DataFrame, mapping is returned unchanged.

Possible responses:

  • Returns normalized model-frame mapping.

6. Has Nulls In Frm Col

Check whether a frame column contains null values.

Usage:

has_nulls = mo_polars_kit.has_nulls_in_frm_col(frm, "email")

Parameters:

  • frm (pl.DataFrame | pl.LazyFrame): Frame to inspect.
  • column (str): Target column name.

Possible responses:

  • Returns True when at least one null is present.
  • Returns False when no nulls exist.

7. Split Model Frms On Null

Split a model-frame map into valid and invalid partitions, based on whether an error column is null.

Usage:

valid, invalid = mo_polars_kit.split_model_frms_on_null(
    model_frms,
    column="__error__info",
)

Parameters:

  • model_frms (dict[Any, pl.DataFrame|pl.LazyFrame]): Model-frame mapping.
  • column (str, default="__error__info"): Error/status column used for split.

Possible responses:

  • Returns (valid_model_frms, invalid_model_frms).
  • Valid partition contains rows where column is null.
  • Invalid partition contains rows where column is not null.

8. Frm Fill Null

Fill null values in a column using literal or callable modes.

Usage:

frm = mo_polars_kit.frm_fill_null(
    frm,
    column="status",
    fill_value="draft",
    mode="lit",
)

Parameters:

  • fr (pl.DataFrame | pl.LazyFrame): Target frame.
  • column (str): Column to update.
  • fill_value (Any | Callable): Literal value or callable for generated values.
  • mode ("lit" | "map" | "sink_map"): Fill strategy.
  • dtype (type | None, default=None): Target dtype for generated values.
  • **custom_params: Additional kwargs passed to callable fill function.

Modes:

  • lit: direct literal fill.
  • map: in-memory batch mapping.
  • sink_map: lazy sink to parquet then scan back for large lazy pipelines.

Possible responses:

  • Returns transformed DataFrame/LazyFrame with nulls filled.

9. Frm Fill Notnull

Transform non-null values in a column using literal or callable modes.

Usage:

frm = mo_polars_kit.frm_fill_notnull(
    frm,
    column="email",
    fill_value=lambda row: row.lower(),
    mode="map",
    row_param="row",
    dtype=pl.Utf8,
)

Parameters:

  • fr (pl.DataFrame | pl.LazyFrame): Target frame.
  • column (str): Column to update.
  • fill_value (Any | Callable): Literal value or callable transform.
  • mode ("lit" | "map" | "sink_map"): Transformation strategy.
  • row_param (str | None, default=None): Name of the keyword argument through which the callable receives the current non-null value.
  • dtype (type | None, default=None): Target dtype.
  • **custom_params: Extra kwargs for callable transform.

Modes:

  • lit: replace all non-null values with one literal.
  • map: apply callable across batches.
  • sink_map: lazy sink-based map flow for large lazy frames.

Possible responses:

  • Returns transformed DataFrame/LazyFrame with non-null values updated.

10. Get Frm Height

Get frame row count for eager or lazy Polars inputs.

Usage:

total_rows = mo_polars_kit.get_frm_height(frm)

Parameters:

  • frm (pl.DataFrame | pl.LazyFrame): Target frame.

Possible responses:

  • Returns row count as int.

Example Usage

import polars as pl
from django_mindoff.components.polars_kit import mo_polars_kit

frm = pl.DataFrame({
    "email": ["A@EXAMPLE.COM", None],
    "status": [None, "active"],
})

normalized = mo_polars_kit.frm_fill_notnull(
    frm,
    column="email",
    fill_value=lambda row: row.lower(),
    mode="map",
    row_param="row",
    dtype=pl.Utf8,
)

filled = mo_polars_kit.frm_fill_null(
    normalized,
    column="status",
    fill_value="draft",
    mode="lit",
)

Troubleshooting

  • frm_fill_null and frm_fill_notnull support lit, map, and sink_map modes.
  • Use sink_map for lazy/streaming pipelines where full in-memory materialization is not desired.