Authorization: Production¶
After creating a design script and receiving an approval, we can now execute the same analysis using real data.
The analysis must be the same as the one that was approved. Any deviation will prevent the script from being run.
Attention
This script must be run in the authorized/production environment
When running the script on production data, you can duplicate the design script and make a few minor changes:
- Remove the dummy data creation/uploads, as the script will use get_table() to access the real data using the handles from the upload.
- Replace cd.script.record() with cd.script.load('{ANALYSIS NAME}.approved') (fill in the name of the .approved file you downloaded from the web portal).
- Replace cd.script.save() with cd.script.close().
Changing any other part of the script when executing in production will produce a NoMatchingAuthorization error. Ensure that you reference your analyst key so that you are allowed to execute the script. This key must correspond to the user that uploaded the script for approval.
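The two call substitutions described above are mechanical, so they can be sketched as a plain text transformation. This is only an illustration: the helper name to_production_script is hypothetical and not part of crandas, and it assumes the design script contains the calls exactly as written. Removing the dummy data uploads and inserting the production handles still has to be done by hand.

```python
def to_production_script(design_source: str, approved_file: str) -> str:
    """Apply the two script-call substitutions to design-script source text.

    Hypothetical helper for illustration; not part of crandas.
    """
    # Recording is replaced by loading the approved file.
    source = design_source.replace(
        "cd.script.record()", f"cd.script.load({approved_file!r})"
    )
    # Saving the recording is replaced by closing the loaded script.
    source = source.replace("cd.script.save()", "cd.script.close()")
    return source


# A tiny stand-in for a design script (text only, nothing is executed).
design = (
    "script = cd.script.record()\n"
    "merged['condition_y'].mean()\n"
    "cd.script.save()\n"
)
production = to_production_script(design, "analysis.approved")
print(production)
```

The analysis line in the middle is left untouched, which is the point: only the script bookkeeping calls change between design and production.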
#load libraries
import crandas as cd
import plotly.express as px
# To test the approval flow in Jupyter, you will need the following lines to set your analyst key (downloaded from the portal). This must be referenced correctly.
cd.base.session.analyst_key = 'PATH_TO_ANALYST_KEY_HERE'
Load the approval file¶
After downloading an approval from the web portal, it must be loaded to crandas.
# The approval file should be in the same directory as this script
script = cd.script.load('analysis.approved')
Note: the above file can only be downloaded following approval of your uploaded analysis
The handles for each party below should be updated to refer to the handles linked to the production datasets uploaded.
# Replace the handles from the original input
party1_table = cd.get_table('INPUT_TABLE_1_HANDLE_HERE')
# Replace the handles from the original input
party2_table = cd.get_table('INPUT_TABLE_2_HANDLE_HERE')
Join tables¶
After doing this, we can go through the same analysis that we did in the design stage.
merged = cd.merge(party1_table, party2_table, how='inner', on=['year', 'month', 'day', 'article_nr', 'batch_nr', 'smokes'])
merged['condition_y'].mean()
merged[(merged["smokes"]==1)]["condition_y"].mean()
def compute_sum(month):
    try:
        # The sum is computed only if at least 3 cases match for the month
        result = merged[((merged["condition_y"]==1) & (merged['month']==month)).with_threshold(3)]["condition_y"].sum()
        return result
    except Exception:
        return None  # if the threshold is not met, no output is given

dic = {
    "Month": ['Jan', 'May', 'September'],
    "Sum of Asthma": [compute_sum(1), compute_sum(5), compute_sum(9)],
}
# Data visualization is done with data that is already in the clear, so it does not need to be in the script.
# But it can be.
plot=px.bar(dic, x="Month", y="Sum of Asthma")
plot.show()
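The thresholded sum in compute_sum can be pictured with a plain-Python analogue that runs locally on made-up rows. This is a sketch of the intent only, assuming that with_threshold(3) means "at least three matching rows". The function thresholded_sum and the sample rows are hypothetical and do not use crandas.

```python
def thresholded_sum(rows, month, threshold=3):
    """Sum condition_y over matching rows, or return None below the threshold."""
    # Keep only rows with the condition present in the requested month.
    matching = [r for r in rows if r["condition_y"] == 1 and r["month"] == month]
    # Too few matching rows: return nothing rather than a small-count result.
    if len(matching) < threshold:
        return None
    return sum(r["condition_y"] for r in matching)


# Made-up rows for illustration only.
rows = [
    {"month": 1, "condition_y": 1},
    {"month": 1, "condition_y": 1},
    {"month": 1, "condition_y": 1},
    {"month": 5, "condition_y": 1},
    {"month": 5, "condition_y": 0},
]
print(thresholded_sum(rows, 1))  # 3
print(thresholded_sum(rows, 5))  # None
```

Returning None for months with too few cases mirrors what compute_sum does when the thresholded filter fails, so the chart simply shows no bar for those months.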
The following code diverges from the approved script. Uncommenting it should result in a NoMatchingAuthorization error.
#merged[(merged["smokes"]==0)]["condition_y"].mean()
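To make the idea of "diverging from the approved script" concrete, here is a toy model in plain Python. It is not how crandas implements authorization. The classes ToyApprovedScript and ToyNoMatchingAuthorization are hypothetical, and the in-order matching of commands is an assumption made for the sake of the example.

```python
class ToyNoMatchingAuthorization(Exception):
    """Hypothetical stand-in for the error raised on a diverging command."""


class ToyApprovedScript:
    """Toy model: holds approved commands and checks each executed one in order."""

    def __init__(self, approved_commands):
        self._approved = list(approved_commands)
        self._position = 0

    def execute(self, command):
        # A command is allowed only if it equals the next approved command.
        if (
            self._position >= len(self._approved)
            or self._approved[self._position] != command
        ):
            raise ToyNoMatchingAuthorization(command)
        self._position += 1
        return f"ran: {command}"


toy = ToyApprovedScript(["mean(condition_y)", "mean(condition_y | smokes == 1)"])
toy.execute("mean(condition_y)")
toy.execute("mean(condition_y | smokes == 1)")

try:
    # This query was never approved, so the toy model rejects it.
    toy.execute("mean(condition_y | smokes == 0)")
    diverged = False
except ToyNoMatchingAuthorization:
    diverged = True
print(diverged)
```

In this model the two approved queries run and the third, which filters on smokes == 0 instead of smokes == 1, is rejected, which matches the behaviour described for the commented-out line.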
Finally, we reset the script, as we now only need to run local operations.
script = cd.script.reset()
As mentioned in the design step, the visualization does not involve the engine, so it can be done outside of the script.
plot=px.bar(dic, x="Month", y="Sum of Asthma")
plot.show()
We have now executed a script on private data only after receiving the appropriate permissions. We have also seen that attempting to do any different analysis will not work.