Authorization: Production

After creating a design script and receiving an approval, we can now execute the same analysis using real data.

The analysis must be identical to the one that was approved. Any deviation will prevent the script from running.

Attention

This script must be run in the authorized/production environment

When running the script on production data, you can duplicate the design script and make the following minor changes:

Remove the dummy data creation and uploads. The script will use get_table() to access the real data through the handles obtained from the upload.
Replace cd.script.record() with cd.script.load('{ANALYSIS NAME}.approved'), filling in the name of the .approved file you downloaded from the web portal.
Replace cd.script.save() with cd.script.close().

Changing any other part of the script when executing it in production will produce a NoMatchingAuthorization error. Make sure that you reference your analyst key, since it is what allows you to execute the script. This key must correspond to the user who uploaded the script for approval.
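The two call replacements listed above are mechanical, so they can be sketched as a small text transformation. This is only an illustration: to_production is a hypothetical helper, not part of crandas, and the design script shown is a made-up stand-in. Removing the dummy data creation still has to be done by hand, because the helper cannot know which lines create dummy data.

```python
import re


def to_production(design_source: str, approved_file: str) -> str:
    """Rewrite the record/save calls of a design script (hypothetical helper)."""
    load_call = f"cd.script.load({approved_file!r})"
    # cd.script.record(...) becomes cd.script.load('<name>.approved')
    source = re.sub(r"cd\.script\.record\([^)]*\)", lambda _: load_call, design_source)
    # cd.script.save(...) becomes cd.script.close()
    source = re.sub(r"cd\.script\.save\([^)]*\)", lambda _: "cd.script.close()", source)
    return source


# Made-up design script, used only to show the transformation
design = """\
import crandas as cd
script = cd.script.record()
party1_table = cd.get_table('INPUT_TABLE_1_HANDLE_HERE')
party1_table['condition_y'].mean()
cd.script.save()
"""

production = to_production(design, "analysis.approved")
print(production)
```

Every line that is not one of the two calls passes through unchanged, which matches the requirement that the production script may not differ from the approved one in any other way.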

# Load libraries
import crandas as cd
import plotly.express as px


# In a Jupyter environment provided by Roseman Labs, session variables are set automatically in the background, unless you wish to test the approval flow within Jupyter (shown below).
# Set the session base_path and session.endpoint manually when executing this notebook in any other environment.

# To test the approval flow in Jupyter, use the following line to set your analyst key (downloaded from the portal). The path must point to the correct key file.
cd.base.session.analyst_key = 'PATH_TO_ANALYST_KEY_HERE'

# session.endpoint = 'https://localhost:9820/api/v1'

Load the approval file

After downloading an approval from the web portal, it must be loaded into crandas.

# The approval file should be in the same directory as this script
script = cd.script.load('analysis.approved')

Note: the above file can only be downloaded after your uploaded analysis has been approved.
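Since the approval file is expected next to the script, it can help to check that it is really there before loading it. The following is a minimal sketch using only the standard library; find_approval_files and require_approval_file are hypothetical helper names, and the empty file created in the demo is merely a stand-in for a real downloaded approval.

```python
import tempfile
from pathlib import Path


def find_approval_files(directory="."):
    """Return the .approved files found in a directory, sorted by name."""
    return sorted(Path(directory).glob("*.approved"))


def require_approval_file(name, directory="."):
    """Return the path to an approval file, or raise a descriptive error."""
    path = Path(directory) / name
    if not path.is_file():
        available = [p.name for p in find_approval_files(directory)]
        raise FileNotFoundError(
            f"{name!r} not found in {Path(directory).resolve()}; "
            f"approval files present: {available}"
        )
    return path


# Demo in a temporary directory with an empty stand-in file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "analysis.approved").touch()
    found_name = require_approval_file("analysis.approved", tmp).name
    print(found_name)
```

In the notebook, the returned path could then be passed on, for example as cd.script.load(str(require_approval_file('analysis.approved'))).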

The handles below should be updated to refer to the production datasets that each party uploaded.

# Replace the handles from the original input
party1_table = cd.get_table('INPUT_TABLE_1_HANDLE_HERE')

# Replace the handles from the original input
party2_table = cd.get_table('INPUT_TABLE_2_HANDLE_HERE')

Join tables

After doing this, we can run the same analysis that we did in the design step.

merged = cd.merge(party1_table, party2_table, how='inner', on=['year', 'month', 'day', 'article_nr', 'batch_nr', 'smokes'])

merged['condition_y'].mean()

merged[(merged["smokes"]==1)]["condition_y"].mean()

def compute_sum(month):
    try:
        # The sum is only computed if at least 3 rows match the filter for this month
        result = merged[((merged["condition_y"]==1) & (merged['month']==month)).with_threshold(3)]["condition_y"].sum()
        return result
    except Exception:
        # Threshold not met (or another error occurred): no output is given
        return None

dic = {"Month": ['Jan', 'May', 'September'],
       "Sum of Asthma": [compute_sum(1), compute_sum(5), compute_sum(9)]}

# Data visualization uses data that is already in the clear, so it does not need to be part of the script,
# but it can be.
plot=px.bar(dic, x="Month", y="Sum of Asthma")
plot.show()
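The behaviour of compute_sum can be pictured with a plain-Python analogue. This sketch assumes that with_threshold(3) means "proceed only if at least 3 rows match the filter"; thresholded_sum is a hypothetical function and the rows are made up for illustration. In crandas the check is performed on the secret-shared data, so the matching rows themselves are never revealed.

```python
def thresholded_sum(rows, month, threshold=3):
    """Sum condition_y over the matching rows, or return None below the threshold."""
    matching = [r for r in rows if r["condition_y"] == 1 and r["month"] == month]
    if len(matching) < threshold:
        return None  # too few cases: no output is given
    return sum(r["condition_y"] for r in matching)


# Made-up rows: three cases in month 1, a single case in month 5
rows = [
    {"month": 1, "condition_y": 1},
    {"month": 1, "condition_y": 1},
    {"month": 1, "condition_y": 1},
    {"month": 5, "condition_y": 1},
    {"month": 5, "condition_y": 0},
]

print(thresholded_sum(rows, 1))  # 3
print(thresholded_sum(rows, 5))  # None
```

Returning None instead of a small count is what keeps months with very few cases out of the chart data.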

The following code diverges from the approved script. Uncommenting it should result in a NoMatchingAuthorization error.

#merged[(merged["smokes"]==0)]["condition_y"].mean()

Finally, we reset the script, as we now only need to run local operations.

script = cd.script.reset()

As mentioned in the design step, the visualization does not involve the VDL, so it can be done outside of the script.

plot=px.bar(dic, x="Month", y="Sum of Asthma")
plot.show()

We have now executed a script on private data, and only after receiving the appropriate permissions. We have also seen that attempting to run any other analysis will not work.