Authorization: Design ===================== In this script, we merge and analyze the data from 2 pharmaceutical companies. This tutorial shows the process of creating a script that has to be approved by the approvers. At the end of this script, we will have a file that must be signed by the participating parties before the *same* analysis can be executed over “real” data. .. important:: This script must be run in design mode While we perform data merging, compute conditional values and visualize the results using bar plots, the main goal of this tutorial is script recording. We will learn to use the ``record`` function to record the script for approval. This function creates a ``.recording`` file that can be uploaded to the platform. .. code:: python #load libraries import crandas as cd from pathlib import Path import plotly.express as px In order to record a script, we need to create a table with the same structure as the real data (i.e. the data that will be used in production). This *dummy* data will be used in the script recording as a ‘stand-in’ for the actual data. In order to do this for another party’s data, the data owner must have communicated the data structure. Alternatively, the analyst could obtain this information from the platform data source page. The variable ``dummy_for`` is used to link the dummy data from script recording with the real data when using the recorded script. *Note: Both dummy datasets are uploaded to the platform* **prior** *to the recording* .. code:: python # Insert the handles of each table from the platform prod_table1_handle = 'INSERT_TABLE_1_HANDLE_HERE' prod_table2_handle = 'INSERT_TABLE_2_HANDLE_HERE' .. code:: python # Upload dummy data for party 1 party1_data = cd.read_csv('../../data/auth_data/party1_dummy.csv', auto_bounds=True, dummy_for=prod_table1_handle) # Don't forget to change the handle above We also need to create dummy data for the table belonging to the other party. In this case, we assume that the other party provided a dummy data file ``party2_dummy.csv``. When uploading this data, we also have to specify the handle of the *real* table that this table represents using ``dummy_for``. .. code:: python # Upload dummy data for party 2 party2_data = cd.read_csv('../../data/auth_data/party2_dummy.csv', auto_bounds=True, dummy_for=prod_table2_handle) # Don't forget to change the handle above Start the script recording -------------------------- .. code:: python script = cd.script.record() Now that the recording has started we need to access the production tables using their handles (for the recording it will use the dummy dataframe/csv we uploaded above). .. code:: python # Add the handle of Party 1's data party1_table = cd.get_table(prod_table1_handle) Here we use the handle from the data source that Party 2 uploaded to the platform. .. code:: python # Add the handle of Party 2's data party2_table = cd.get_table(prod_table2_handle) Join tables ----------------------------------- Here we do a simple join that we will query afterwards. Note that a join requires the keys they are being joined on to be identical. .. code:: python merged = cd.merge(party1_table, party2_table, how='inner', on=['year', 'month', 'day', 'article_nr', 'batch_nr', 'smokes']) repr(merged) Retrieve statistics ------------------------ .. code:: python merged['condition_y'].mean() .. code:: python merged[(merged["smokes"]==1)]["condition_y"].mean() .. code:: python def compute_sum(month): try: result = merged[((merged["condition_y"]==1) & (merged['month']==month)).with_threshold(3)]["condition_y"].sum() return result # computes means only if there are 3 cases for the month except: return None # if not, no output is given dic={"Month": ['Jan', 'May', 'September'], "Sum of Asthma": [ compute_sum(1),compute_sum(5),compute_sum(9)]} # Data visualization is done with data that is already in the clear, so it does not need to be in the script. # But it can plot=px.bar(dic, x="Month", y="Sum of Asthma") plot.show() Finally we save the script (everything from ``script.record()`` up to here) as a ``.recording`` file for you to download and upload to the platform for approval. .. code:: python script = cd.script.save('analysis.recording') After getting approval for the script, it can be executed with production data. In order to do this, use the **Production** script in authorized mode.