
crandas.script

Functionalities to record scripts.

The user can package a crandas script by first executing crandas.script.record(path=filename). Then, every query that is executed will be recorded in a script. This does not modify the output: the user still needs to be able to execute the queries, e.g. by being connected to a test environment, or by setting session.dry_run = True.

After executing all their queries, the user can run crandas.script.end(), which will package the script into a JSON-formatted file. The session.analyst_key is included in the file. This file can be sent to the approvers to be signed off.

After receiving the signed query file from the approvers, the user can load it using crandas.script.load(filename). The user, who may now be connected to a different engine server that requires authorization, can then execute the same script that they recorded. Note that exactly the same queries have to be executed as before, in the same order.

Script(mode, name=None, auth_tag=None, session=None, map_dummy_handles=False, stepless=False, linear=True, path=None, analyst_key=None, script_extractor=None, include_outputs=False)

A script represents a sequence of queries that can be automatically recorded, then approved, and later executed on a different engine instance. Within the context of a script, each query is called a "step", and it is assigned a number 0, 1, 2, ..., called its script_step.

The objective of a script is to internally link the inputs/outputs of operations, so that for example the output of step 0 is the input to step 1.

An active (not-closed) script is either recording or executing. When recording, each crandas command that is executed and sent to the server is locally appended to a list, with a script_step that is one higher than the last script_step. When all of the desired commands have been run, the script can be saved to a JSON-formatted file using Script.end(). This script can subsequently be authorized by approvers.

An authorized script can be loaded using load(). The resulting script is put in executing mode. For each crandas command that is executed, it is expected to exactly match the next command in the script. The matching authorization is then used, and sent along to the server.
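The record-then-match behaviour described above can be illustrated with a small toy model. This is only a sketch in plain Python: the ToyScript class, its methods, and the command dictionaries are hypothetical names invented for illustration, and it does not reflect how crandas stores or authorizes scripts.

```python
import json


class ToyScript:
    """Toy model of a recorded script; not the crandas implementation."""

    def __init__(self, steps=None):
        # A fresh instance records; one built from saved steps executes.
        self.mode = "executing" if steps is not None else "recording"
        self.steps = list(steps) if steps is not None else []
        self.position = 0  # index of the next step expected when executing

    def run(self, command):
        """Record the command, or check it against the next recorded step."""
        if self.mode == "recording":
            # The new step number is one higher than the last one.
            step = {"script_step": len(self.steps), "command": command}
            self.steps.append(step)
            return step["script_step"]
        if self.position >= len(self.steps):
            raise RuntimeError("no steps left in the script")
        expected = self.steps[self.position]
        if expected["command"] != command:
            raise ValueError(
                f"step {expected['script_step']} does not match the recorded command"
            )
        self.position += 1
        return expected["script_step"]

    def dumps(self):
        """Serialize the recorded steps as JSON."""
        return json.dumps(self.steps)


# Record two commands, then replay them from the serialized form.
recording = ToyScript()
recording.run({"op": "get_table", "name": "people"})
recording.run({"op": "filter", "column": "age"})

replay = ToyScript(steps=json.loads(recording.dumps()))
first = replay.run({"op": "get_table", "name": "people"})
second = replay.run({"op": "filter", "column": "age"})
print(first, second)
```

In the toy model, replaying the commands in a different order raises a ValueError, which mirrors the requirement that each executed command exactly matches the next command in the script.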

By default, scripts are recorded and executed in "linear mode". This means that all steps need to be executed in order and that if a step results in an error during a script execution, the script execution is not allowed to continue. To allow the script to continue after certain errors, a crandas.script.allow_errors block can be used. It is also possible to disable linear mode (this needs to be done both when recording a script and when loading it); in this case, it is in principle allowed to perform the steps of the script in any order and also to continue executing the script in case an error occurred.
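The effect of linear mode and of an allow_errors block can be sketched with a toy runner. Everything here (ToyLinearRunner, its attributes, the use of ValueError as the failing error) is a hypothetical stand-in chosen for illustration; the real crandas behaviour is enforced by the library and the server and may differ in detail.

```python
from contextlib import contextmanager


class ToyLinearRunner:
    """Toy model of linear mode; not the crandas implementation."""

    def __init__(self, linear=True):
        self.linear = linear
        self.failed = False     # set once an error occurs outside an allowed block
        self._allowing = False  # True while inside allow_errors()

    @contextmanager
    def allow_errors(self):
        """Errors raised by run() inside this block do not block later steps."""
        previous = self._allowing
        self._allowing = True
        try:
            yield
        finally:
            self._allowing = previous

    def run(self, func):
        """Run one step; refuse if an earlier step failed in linear mode."""
        if self.linear and self.failed:
            raise RuntimeError("script cannot continue after an error")
        try:
            return func()
        except ValueError:
            if not self._allowing:
                self.failed = True
            raise


def too_few_rows():
    # Hypothetical failing step, e.g. a threshold that is not met.
    raise ValueError("threshold not met")


# An error inside allow_errors() leaves the script usable.
runner = ToyLinearRunner()
with runner.allow_errors():
    try:
        runner.run(too_few_rows)
    except ValueError:
        pass
after_allowed = runner.run(lambda: "ok")

# The same error outside such a block stops a linear script.
strict = ToyLinearRunner()
try:
    strict.run(too_few_rows)
except ValueError:
    pass
try:
    strict.run(lambda: "ok")
    blocked = False
except RuntimeError:
    blocked = True
print(after_allowed, blocked)
```

Constructing the toy runner with linear=False removes the check entirely, which corresponds to disabling linear mode.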

Stepless mode

Besides the above approach of having an increasing script_step for each executed crandas command, scripts may additionally contain stepless queries that do not have any associated script_step. These commands may be recorded once and executed arbitrarily many times, and are useful for commands that need to be executed a variable number of times. Use crandas.script.stepless to temporarily enable stepless mode:

with cd.script.stepless():
    STEPLESS_QUERIES

When recording, any crandas command that is executed is then added to the script without a script step. When executing, any crandas command is then matched to the stepless queries in the script. In linear mode, a script can still continue in case of an error in a stepless query.

Example usage:
# Outside the script
handles = [cd.DataFrame({"a": [i]}).handle for i in range(5)]

# Inside the script
tables = []
with cd.script.stepless():
    for handle in handles:
        table = cd.get_table(handle, map_dummy_handles=False)
        tables.append(table)

all_rows = cd.concat(placeholders.Any(tables))

It is also possible to directly set script.stepless to enable stepless mode:

script.stepless = True
STEPLESS_QUERIES
script.stepless = False

Limitations

Since the step numbers are used to link the outputs and inputs of different steps, the outputs of stepless queries cannot be used as the inputs of other (stepless or stepped) queries. Stepless queries are especially useful together with placeholders, e.g. Any.
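The difference between stepped and stepless queries can be sketched as follows. The ToySteplessScript class and the idea of matching on an operation name are simplifying assumptions made for this illustration only; crandas matches full commands and this is not its implementation.

```python
class ToySteplessScript:
    """Toy model of stepped and stepless queries; not the crandas implementation."""

    def __init__(self):
        self.stepped = []   # ordered operations; the list index is the script_step
        self.stepless = []  # operations recorded without a script_step
        self.position = 0   # next stepped operation expected during execution

    def record(self, operation, stepless=False):
        """Record an operation and return its script_step (None if stepless)."""
        if stepless:
            if operation not in self.stepless:
                self.stepless.append(operation)
            return None
        self.stepped.append(operation)
        return len(self.stepped) - 1

    def execute(self, operation, stepless=False):
        """Match an operation against the recording and return its script_step."""
        if stepless:
            # A stepless query may be matched any number of times.
            if operation not in self.stepless:
                raise ValueError("no matching stepless query")
            return None
        if self.position >= len(self.stepped):
            raise ValueError("no stepped queries left")
        if self.stepped[self.position] != operation:
            raise ValueError("operation does not match the next recorded step")
        self.position += 1
        return self.position - 1


script = ToySteplessScript()
script.record("get_table", stepless=True)  # recorded once
concat_recorded = script.record("concat")  # ordinary step 0

# During execution the stepless query runs five times, the stepped one once.
stepless_results = [script.execute("get_table", stepless=True) for _ in range(5)]
concat_executed = script.execute("concat")
print(stepless_results, concat_recorded, concat_executed)
```

Because a stepless query returns no step number in this model, nothing downstream can refer to its output by step, which is the limitation described above.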

run_tag property

For each script execution, we generate a random 32-byte run_tag, to distinguish different executions of the same script to the server.
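As a rough illustration of what a random 32-byte tag looks like, the sketch below draws one from Python's secrets module. The function name new_run_tag is hypothetical, and the source does not state which random source or encoding crandas uses, so treat both as assumptions.

```python
import secrets


def new_run_tag():
    """Return 32 random bytes, one fresh value per script execution."""
    return secrets.token_bytes(32)


tag_a = new_run_tag()
tag_b = new_run_tag()

# Two executions get tags of the same length that differ from each other
# (a collision between two random 32-byte values is practically impossible).
print(len(tag_a), tag_a.hex())
print(len(tag_b), tag_b.hex())
```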

apply_to_command(cmd)

Add step information according to the current script to the given JSON command

close()

Deprecated

Use Script.end() instead.

current_step()

Returns current step number (starting at zero), or None if in a stepless context

end()

Denotes the end of a script. Either saves the current script (if it is being recorded) or closes it (if it is being executed).

To close a recording without writing to the .recording file, use crandas.script.reset.

finish_step()

Finish step by incrementing the current step counter if not stepless

get_inputs()

Get the input tables used in the current script.

This returns a dictionary of Reference -> List[str], where the reference maps to a list of descriptions (currently, source code lines) of the queries.

handle_to_script_step(handle)

Returns (script_step, transaction_index) tuple for handle.

Returns the tuple for the last script_step (that is not None) in which the given handle was previously used, or None if there is no such step. The last instance, rather than the first, is returned for backward compatibility.
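The lookup rule described here (last use wins, uses without a step number are ignored) can be sketched in a few lines. The function last_step_for_handle, the usages list, and the example handles are hypothetical; how crandas tracks handle usage internally is not shown in this reference.

```python
def last_step_for_handle(usages, handle):
    """Return (script_step, transaction_index) of the last stepped use, or None.

    `usages` is a list of (handle, script_step, transaction_index) tuples
    in the order in which the handles were used.
    """
    result = None
    for used_handle, script_step, transaction_index in usages:
        # Uses without a step number (script_step is None) are skipped.
        if used_handle == handle and script_step is not None:
            result = (script_step, transaction_index)
    return result


usages = [
    ("AB12", 0, 0),
    ("CD34", 1, 0),
    ("AB12", 2, 1),
    ("AB12", None, 0),  # use without a script_step, ignored by the lookup
]

print(last_step_for_handle(usages, "AB12"))  # (2, 1)
print(last_step_for_handle(usages, "EF56"))  # None
```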

print_inputs()

Print the input tables used in the current script.

reference(handle)

Obtain a script_step reference to the handle if it is known, or None

save(target=None, analyst_key=None)

Deprecated

Use Script.end instead. Save the recorded script to a file.

PARAMETER DESCRIPTION
target

The path to save the script to, or a file-like object (such as the result of a call to open). This should be specified here, or as path at record().

TYPE: str | Path | file-like object DEFAULT: None

analyst_key

A script will generally include the analyst_key of the user that created the script. The user may specify a signing or verification key to include, either as a key object, or a path to load the key from disk. This should be specified here, or at record().

Alternatively, if True (the default), the current crandas.base.session analyst_key will be used. If False, no analyst key will be included in the script.

TYPE: VerifyKey | SigningKey | str | Path | False | True(default) DEFAULT: None

skip(steps=1)

Skip the specified number of steps in the recorded script

allow_errors(session=None)

Context manager for allowing a script to continue in case of errors

Normally, if a script is run in linear mode (see Script) and an error occurs at any point during the script, the script cannot be executed further. Running crandas commands in an allow_errors block overrides this behaviour for the commands in that block. For this to work, the commands must be in an allow_errors block both during recording and during execution.

Example usage:

cdf = cd.get_table("people")

# Analysis on people > 65
with cd.script.allow_errors():
    print(cdf.filter(cdf["age"]>65, threshold=10).describe())

# Analysis on people <= 65, allowed regardless of whether or
# not the previous analysis succeeded
with cd.script.allow_errors():
    print(cdf.filter(cdf["age"]<=65, threshold=10).describe())

current_script(session=None)

Returns the currently recording/loaded script.

end(*, session=None)

Denote the end of a script.

See Script.end.

is_executing(session=None)

Check if a script is being executed

PARAMETER DESCRIPTION
session

Session in which to check; if not given, crandas.base.session is used

TYPE: Session(optional) DEFAULT: None

RETURNS DESCRIPTION
bool

True if the session has an active script and this script is being executed

is_recording(session=None)

Check if a script is being recorded

PARAMETER DESCRIPTION
session

Session in which to check; if not given, crandas.base.session is used

TYPE: Session(optional) DEFAULT: None

RETURNS DESCRIPTION
bool

True if the session has an active script and this script is being recorded

load(path, *, session=None)

Load a script from a file for execution.

PARAMETER DESCRIPTION
path

File name of script to load

TYPE: str

record(*, path=None, name=None, map_dummy_handles=True, analyst_key=None, session=None, include_python_script=None, include_full_script=None, include_outputs=None, linear=True)

Start recording a new script.

Use this command before running a sequence of crandas commands. Afterwards, use end to save the script to a file (or reset to reset the script without saving). The script can subsequently be authorized by approvers. When the approved script is received, use load to load it, and then you may execute the exact same sequence of commands, in the same order, as you did when recording the script.

PARAMETER DESCRIPTION
path

The path to save the script to, or a file-like object (such as the result of a call to open).

TYPE: str | Path | file-like object DEFAULT: None

name

A name to attach to the script

TYPE: str(optional) DEFAULT: None

map_dummy_handles

Unless the user sets this to False, this causes all hexadecimal table handles that are used during recording to map to the table name dummy_name(handle). To upload tables with this name, save an object with the dummy_for argument, e.g., crandas.upload_pandas_dataframe(df).save(dummy_for="<handle>").

TYPE: bool DEFAULT: True

analyst_key

A script will generally include the analyst_key of the user that created the script. The user may specify a signing or verification key to include, either as a key object, or a path to load the key from disk.

Alternatively, if True (the default), the current crandas.base.session analyst_key will be used. If False, no analyst key will be included in the script.

TYPE: VerifyKey | SigningKey | str | Path | False | True(default) DEFAULT: None

include_python_script

Automatically include the recorded Python script used for the analysis in the .recording file.

TYPE: bool DEFAULT: True; the default is configurable via config.settings.include_python_script (see crandas.config)

include_full_script

When include_full_script is True, include the full Python script. When False, only include the analysis part of the recorded script, consisting of script.record() up to script.end().

TYPE: bool DEFAULT: False; the default is configurable via config.settings.include_full_script (see crandas.config)

include_outputs

Include the outputs revealed by the recorded script on the design data. Similar outputs are revealed when the approved script is executed on production data. When include_python_script is False, no positional information about the outputs can be provided.

TYPE: bool DEFAULT: True; the default is configurable via config.settings.include_outputs (see crandas.config)

linear

Record the script in linear mode. Scripts recorded in this mode need to be executed in the same order as they were recorded, without skipping steps and without raising errors. The latter condition can be relaxed by using crandas.script.allow_errors.

TYPE: bool DEFAULT: True

reset(session=None)

Reset the current script execution or recording (without saving the recording to file)

save(target=None, analyst_key=None, *, session=None)

Deprecated

Use Script.end instead. Save the recorded script to a file.

See Script.save.

stepless()

Context manager to facilitate stepless recording/execution. See crandas.script.Script for details on stepless queries.