crandas.check_recording

Functionality for checking script recordings.

This module detects potential problems that may cause a recorded script to fail when it is used in production.

Note that the checks described on this page are only performed during script recording (and not, e.g., during script use).

Conditional call detection

Conditional call detection checks that crandas calls are not performed conditionally during script recording; see Crandas commands should be outside conditional branches.

For example, in the code below, the call cd.DataFrame(...) is performed only if some_condition() holds. If this condition is true during script recording but false when the script is applied to real data, using the script will cause an error:

import crandas as cd

cd.script.record()

if some_condition():
    cd.DataFrame(...)

The same is true, for example, when calling crandas functions from an if or while statement. Because such a crandas call is potentially problematic, crandas issues a ConditionalCallDetected warning in these cases.

Note that if the crandas call is not actually performed during script recording (e.g., if some_condition() is False), no warning is given.

Suppressing warnings

In many cases, using crandas from an if, for, etc., can be legitimate (for example, a for loop that always runs the same number of iterations). In such cases, the warning can be suppressed by adding a comment containing the text crandas-dontwarn to the conditional statement, e.g.:

import crandas as cd

cd.script.record()

if some_condition():  # crandas-dontwarn
    cd.DataFrame(...)
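
As a rough illustration of the idea behind this check, the sketch below statically scans script source for calls made through the cd alias inside an if, for, or while statement, and skips statements whose first line carries the suppression label. It is not how crandas implements the check: crandas performs its check while the script is being recorded and only warns when the call is actually executed, whereas this sketch flags a call whether or not the branch would run. The function name find_conditional_calls and its alias parameter are hypothetical.

```python
import ast

SUPPRESS_LABEL = "crandas-dontwarn"  # default label named in the text above
CONDITIONAL_NODES = (ast.If, ast.For, ast.While)


def find_conditional_calls(source, alias="cd"):
    """Return (conditional_line, call_line) pairs for alias.* calls nested
    inside if/for/while statements. Hypothetical helper, for illustration only."""
    lines = source.splitlines()
    findings = []

    def is_alias_call(node):
        # True for calls such as cd.DataFrame(...) or cd.script.record()
        if not isinstance(node, ast.Call):
            return False
        target = node.func
        while isinstance(target, ast.Attribute):
            target = target.value
        return isinstance(target, ast.Name) and target.id == alias

    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, CONDITIONAL_NODES):
            continue
        # Simplification: look for the label anywhere on the statement's first line.
        if SUPPRESS_LABEL in lines[node.lineno - 1]:
            continue
        for child in node.body + node.orelse:
            for inner in ast.walk(child):
                if is_alias_call(inner):
                    findings.append((node.lineno, inner.lineno))
    return findings


example = '''import crandas as cd

cd.script.record()

if some_condition():
    cd.DataFrame(...)

for i in range(3):  # crandas-dontwarn
    cd.demo_table(1, 1)
'''

print(find_conditional_calls(example))  # → [(5, 6)]
```

Only the unsuppressed if statement is reported; the call at module level and the labelled for loop are not. A call nested in several conditionals would be reported once per enclosing statement.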

Performing joins during script recording

When a join is performed during script recording, crandas warns that the join can cause errors in production if the values of the join columns are not unique. For example, the following causes a warning:

import crandas as cd
cd.script.record()

cdf = cd.demo_table(10, 10)
cd.merge(cdf, cdf, left_on="col1", right_on="col2")
cd.merge(cdf, cdf, left_on=cdf.groupby("col1"), right_on="col2")
cd.merge(cdf, cdf, left_on="col1", right_on=cdf.groupby("col2"))

To suppress this warning, pass the appropriate validate argument to the merge function: 1:1 for a one-to-one join; 1:m for a one-to-many join; or m:1 for a many-to-one join, e.g.:

(...)
cd.merge(cdf, cdf, on="col1", validate="1:1")
cd.merge(cdf, cdf, left_on=cdf.groupby("col1"), right_on="col2", validate="m:1")
cd.merge(cdf, cdf, left_on="col1", right_on=cdf.groupby("col2"), validate="1:m")

This warning can also be disabled entirely using the check_recording_join configuration option; see below.
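
To make the three validate values concrete, the plain-Python sketch below classifies two lists of join keys by whether each side is unique. It assumes the values carry the same meaning as in pandas (1:m requires unique keys on the left, m:1 requires unique keys on the right); the helper infer_validate is hypothetical and is not part of crandas.

```python
from collections import Counter


def infer_validate(left_keys, right_keys):
    """Return the strictest validate value the key lists satisfy,
    or None for a many-to-many relationship. Illustration only."""
    left_unique = all(n == 1 for n in Counter(left_keys).values())
    right_unique = all(n == 1 for n in Counter(right_keys).values())
    if left_unique and right_unique:
        return "1:1"
    if left_unique:
        return "1:m"  # each left key may match many right rows
    if right_unique:
        return "m:1"  # many left rows may match each right key
    return None


print(infer_validate([1, 2, 3], [1, 2, 3]))  # → 1:1
print(infer_validate([1, 2, 3], [1, 1, 2]))  # → 1:m
print(infer_validate([1, 1, 2], [1, 2, 3]))  # → m:1
```

On real data the key values are typically not visible in the clear, so the choice of validate value has to come from what is known about the data rather than from inspecting it like this.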

Configuring the checking functionality

The checking functionality can be configured using the following configuration options (see crandas.config):

  • check_recording (bool, default: True): enable/disable recording check

  • check_recording_label (str, default: crandas-dontwarn): label to suppress warnings (see above)

    NOTE: be careful when changing this setting from outside the script, because doing so makes scripts less reusable when they are shared with people who use other settings

  • check_recording_throw (bool, default: False): throw exception instead of warning

  • check_recording_conditional (bool, default: True): if recording check is enabled, check for conditional calls

  • check_recording_join (bool, default: True): if recording check is enabled, check for joins

For example, the snippet below changes the label to suppress warnings and then uses the updated label:

import crandas as cd
from crandas.config import settings

settings.check_recording_label = "ignore"

cd.script.record()

if True:  # ignore
    cd.demo_table(1, 1)

exception crandas.check_recording.ConditionalCallDetected(filename, conditional_node)

Bases: UserWarning

Represents the detection of a conditional crandas call.

exception crandas.check_recording.JoinDetected(*, left_on, right_on, **kwargs)

Bases: UserWarning

Warning that a join was performed without the validate argument during script recording.
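
Because both classes derive from UserWarning, Python's standard warnings machinery can in principle be used to capture them or escalate them to exceptions, for example in tests. The sketch below demonstrates this with a stand-in class and a placeholder function record_step, both hypothetical, so that it runs without crandas. It assumes the real warnings are emitted through the warnings module, which this page does not state explicitly; the documented way to get exceptions is the check_recording_throw option.

```python
import warnings


class ConditionalCallDetected(UserWarning):
    """Stand-in for crandas.check_recording.ConditionalCallDetected (illustration only)."""


def record_step():
    # Hypothetical placeholder for code that triggers the recording check.
    warnings.warn("conditional crandas call detected", ConditionalCallDetected)


# Capture warnings instead of printing them, e.g. to assert on them in a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    record_step()
caught_categories = [w.category for w in caught]

# Escalate this one category to an exception within a limited scope.
with warnings.catch_warnings():
    warnings.simplefilter("error", ConditionalCallDetected)
    try:
        record_step()
        escalated = False
    except ConditionalCallDetected:
        escalated = True

print(caught_categories == [ConditionalCallDetected], escalated)  # → True True
```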