Data Operations (CRUD)

Use mo_crud_kit for model-aware bulk create/read/update workflows with Polars frames. This workflow is designed for data-heavy endpoints where serializer-by-row patterns become a bottleneck. It applies directly to high-volume ingestion and update APIs.

Prerequisites

  • Models are migrated and use UUID primary and foreign keys.

Implementation

mo_crud_kit provides model-aware operations for bulk create, read, and update. It uses model metadata plus Polars validation to keep writes fast and consistent.

1. Model Frames

mo_crud_kit works on a model-frame mapping:

{
    OrderModel: order_df_or_lazy,
    OrderItemModel: item_df_or_lazy,
}

Each write operation returns:

  • status: ok, partial_ok, or fail
  • valid_model_frms: rows that passed validation
  • invalid_model_frms: rows with error metadata

Sample Model Frame

import polars as pl
from apps.orders.models import OrderModel

order_df = pl.DataFrame(
    {
        "order_id": ["f2fa1a5b7abf4f37a3f7e14725c0b211"],
        "status": ["draft"],
        "total_amount": [120.50],
    }
)

model_frms = {
    OrderModel: order_df,
}
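
The sample stores order_id as a 32-character hex string. If IDs are generated on the client side, the standard library's uuid module produces values of the same shape. This is a sketch only: `new_order_columns` is a hypothetical helper, and whether mo_crud_kit also accepts dashed strings or uuid.UUID objects is not stated here.

```python
import uuid


def new_order_columns(n):
    """Build column data for n draft orders (hypothetical helper)."""
    return {
        # uuid4().hex is 32 lowercase hex characters without dashes,
        # the same shape as the order_id in the sample frame.
        "order_id": [uuid.uuid4().hex for _ in range(n)],
        "status": ["draft"] * n,
        "total_amount": [0.0] * n,
    }


columns = new_order_columns(3)
# The dict can then be passed to pl.DataFrame(columns) as in the sample above.
```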

2. Create

Create rows from model-to-frame mappings, with an optional validation pipeline.

Usage:

from django_mindoff.components.crud_kit import mo_crud_kit

status, valid_model_frms, invalid_model_frms = mo_crud_kit.create(
    {
        OrderModel: order_df,
        OrderItemModel: order_item_df,
    },
    is_partial=False,
    is_validate=True,
    batch_size=1000,
)

Parameters:

  • model_frms (dict[type[models.Model], pl.DataFrame|pl.LazyFrame]): Input model-frame mapping for bulk insert.
  • is_partial (bool, default=False): If True, allows a partial save when only a subset of rows is valid.
  • is_validate (bool, default=True): If True, runs column, row, and foreign-key validation before insert.
  • batch_size (int, default=1000): Batch size used by the underlying write process.

Varieties:

  • Validation mode: is_validate=True runs ColumnValidator -> RowValidator -> ForeignKeyValidator.
  • Partial-save mode: is_partial=False fails if any invalid rows exist; is_partial=True saves valid rows and returns invalid rows separately.

Possible responses:

  • Returns ("ok", valid_model_frms, {}) when all rows are valid and inserted.
  • Returns ("partial_ok", valid_model_frms, invalid_model_frms) when partial mode is enabled and some rows are invalid.
  • Returns ("fail", valid_model_frms, invalid_model_frms) when validation fails and no write should proceed.

Notes:

  • Invalid rows carry an error column, named by POLARS_VALIDATOR_ERROR_COL (default __error__info).
  • is_validate=False skips safety checks and may persist unsafe data.

3. Read

Read queryset data into Polars with streaming/pagination variants.

Usage:

from django_mindoff.components.crud_kit import mo_crud_kit

frm, stats = mo_crud_kit.read(
    OrderModel.objects.filter(is_active=True).values(),
    page_number=1,
    is_lazy=False,
    batch_size=100,
)

Parameters:

  • qs (models.QuerySet): Queryset that must use .values() output.
  • page_number (int|None, default=None): When provided, enables pagination mode. When None, uses streaming mode.
  • is_lazy (bool, default=False): If True, returns pl.LazyFrame; otherwise returns pl.DataFrame.
  • batch_size (int, default=0): Chunk/page size. Auto-resolved when 0.

Varieties:

  • Streaming mode (page_number=None): reads full dataset in chunks.
  • Pagination mode (page_number=<n>): reads one page and returns paging metadata.
  • Materialization mode: eager (DataFrame) or lazy (LazyFrame).

Possible responses:

  • Returns (frm, stats) where frm is DataFrame/LazyFrame and stats includes: mode, batch_size, total_count, total_pages, current_page, has_next, has_previous.
  • Returns empty frame with zeroed stats for empty querysets.
  • Raises validation error if queryset is not .values()-based.
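
In pagination mode, the has_next entry in stats can drive a loop over all pages. The sketch below shows the pattern with a stand-in reader so it runs without a database. `iter_pages` and `fake_fetch` are hypothetical names, and the sketch assumes stats is dict-like and pages are numbered from 1, as in the usage example.

```python
def iter_pages(fetch_page, start=1, max_pages=10_000):
    """Yield (frame, stats) per page until stats reports no next page."""
    page_number = start
    for _ in range(max_pages):  # hard cap so a bad has_next cannot loop forever
        frm, stats = fetch_page(page_number)
        yield frm, stats
        if not stats["has_next"]:
            return
        page_number += 1


# Stand-in for mo_crud_kit.read; returns a list instead of a Polars frame.
def fake_fetch(page_number, total_pages=3):
    stats = {"current_page": page_number, "has_next": page_number < total_pages}
    return [f"rows of page {page_number}"], stats


pages = [stats["current_page"] for _, stats in iter_pages(fake_fetch)]
```

With the real API, fetch_page could be a small wrapper such as `lambda n: mo_crud_kit.read(qs, page_number=n, batch_size=100)`, where qs is a .values() queryset.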

4. Update

Upsert rows from model-to-frame mappings, with an optional staged merge strategy.

Usage:

from django_mindoff.components.crud_kit import mo_crud_kit

status, valid_model_frms, invalid_model_frms = mo_crud_kit.update(
    {
        OrderModel: order_updates_df,
    },
    is_partial=True,
    is_validate=True,
    batch_size=1000,
    is_temp_table=True,
)

Parameters:

  • model_frms (dict[type[models.Model], pl.DataFrame|pl.LazyFrame]): Input model-frame mapping for bulk update/upsert.
  • is_partial (bool, default=False): If True, allows valid rows to proceed even when invalid rows exist.
  • is_validate (bool, default=True): If True, applies model-based column/row/FK validation before update.
  • batch_size (int, default=1000): Batch size used for missing-column fetch and update processing.
  • is_temp_table (bool, default=True): If True, writes to staging table then merges; if False, performs direct dialect-specific upsert.

Varieties:

  • Validation mode: enabled (is_validate=True) or skipped (is_validate=False).
  • Partial mode: fail-fast (is_partial=False) or partial success (is_partial=True).
  • Upsert mode: staging merge (is_temp_table=True) or direct upsert (is_temp_table=False).

Possible responses:

  • Returns ("ok", valid_model_frms, {}) when all rows are valid and updated.
  • Returns ("partial_ok", valid_model_frms, invalid_model_frms) when partial mode is enabled and some rows are invalid.
  • Returns ("fail", valid_model_frms, invalid_model_frms) when validation blocks update.

Notes:

  • Missing DB columns are auto-fetched using primary key before validation.
  • Invalid rows include model-aware error details in the error column.

Example Usage

from django_mindoff.components.crud_kit import mo_crud_kit
from apps.orders.models import OrderModel

# CREATE
create_status, create_valid, create_invalid = mo_crud_kit.create(
    {OrderModel: order_df},
    is_validate=True,
    is_partial=False,
)

# READ (queryset must use values())
orders_frm, stats = mo_crud_kit.read(
    OrderModel.objects.filter(is_active=True).values(),
    page_number=1,
    batch_size=100,
)

# UPDATE
update_status, update_valid, update_invalid = mo_crud_kit.update(
    {OrderModel: order_df},
    is_validate=True,
    is_partial=True,
    is_temp_table=True,
)

Core Concepts

1. Polars Serialization

mo_crud_kit uses model metadata plus Polars validators (ColumnValidator, RowValidator, ForeignKeyValidator) to sanitize and validate rows before DB writes. This is the intended replacement for serializer-driven bulk validation in data-heavy pipelines.

What this gives you:

  • Type normalization aligned with Django field definitions.
  • Constraint checks (required/nullability, choices, min/max, length, FK consistency).
  • Structured invalid-row capture in the configured error column (POLARS_VALIDATOR_ERROR_COL, default __error__info).
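
To make invalid-row capture concrete, here is a plain-Python illustration of the idea. It is not mo_crud_kit's implementation: the `split_rows` helper, the two rule types, and the error message format are all assumptions, and only the `__error__info` column name comes from the text above.

```python
ERROR_COL = "__error__info"  # default error column name mentioned above


def split_rows(rows, required, choices):
    """Split dict rows into (valid, invalid); invalid rows gain ERROR_COL."""
    valid, invalid = [], []
    for row in rows:
        errors = []
        for name in required:
            if row.get(name) is None:
                errors.append(f"{name}: required")
        for name, allowed in choices.items():
            value = row.get(name)
            if value is not None and value not in allowed:
                errors.append(f"{name}: not one of {sorted(allowed)}")
        if errors:
            # Keep the original values and attach the reasons alongside them.
            invalid.append({**row, ERROR_COL: "; ".join(errors)})
        else:
            valid.append(row)
    return valid, invalid


rows = [
    {"order_id": "a1", "status": "draft"},
    {"order_id": None, "status": "shipped?"},
]
valid, invalid = split_rows(
    rows, required=["order_id"], choices={"status": {"draft", "paid"}}
)
```

Keeping the rejected row's original values next to its error text is what lets a caller fix and resubmit only the failing rows.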

Validate before write

mo_crud_kit is built for validated tabular data. Keep request-level validation in API code before CRUD execution.

2. Limitations

  • ManyToManyField is not supported in row validation.
  • BinaryField is not supported in row validation.
  • Any Django field type not mapped in the row validator's dtype map is unsupported.
  • Models must use UUID primary keys for CRUD validation flow.
  • Primary key and foreign key fields are expected to define explicit db_column.
  • mo_crud_kit.read() requires queryset .values() input.
  • mo_crud_kit.delete() is not currently exposed.
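
Several of these limitations can be checked up front from Django's model metadata. The sketch below is a hypothetical pre-flight helper (`find_unsupported` is not part of mo_crud_kit). It relies only on the standard `_meta` attributes of a model class and does not cover the dtype-map limitation, since that map is not described here.

```python
def find_unsupported(model):
    """List limitation violations for a Django model class."""
    meta = model._meta
    problems = []
    if meta.pk.get_internal_type() != "UUIDField":
        problems.append(f"{meta.pk.name}: primary key is not a UUIDField")
    for field in meta.local_many_to_many:
        problems.append(f"{field.name}: ManyToManyField is not supported")
    for field in meta.local_fields:
        if field.get_internal_type() == "BinaryField":
            problems.append(f"{field.name}: BinaryField is not supported")
        # db_column is None unless it was set explicitly on the field.
        if (field.primary_key or field.is_relation) and not field.db_column:
            problems.append(f"{field.name}: missing explicit db_column")
    return problems
```

Running a check like this in a test suite or at startup surfaces unsupported models before any bulk write is attempted.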

Troubleshooting

  • read() fails with shape/type errors
    Confirm queryset input uses .values() and field names align with frame columns.
  • Rows are silently excluded from writes
    Inspect invalid_model_frms and the configured error column to trace validation failures.
  • FK validation fails unexpectedly
    Check UUID types and explicit db_column configuration on related fields.