.. _approvals:

The Approval Workflow
######################

When working with sensitive data, you have to be careful that no information is leaked.
Roseman Labs provides a system of script approvals to ensure that only the right computations are performed.
To do this, crandas offers two distinct modes.
The first is a design mode for building the analysis, which contains no real data and therefore carries no risk of leaking it; the second is an authorized mode, in which only pre-approved computations are allowed.
This way, only the computations that have been agreed upon to be safe can be executed on the sensitive data.

In summary, there are two different modes for crandas:

- **Design Mode:** Any query is allowed and can be immediately executed by the engine. Allows for an interactive workflow over *dummy data*. Used for script design and recording.
- **Authorized Mode:** All queries require prior approval by a fixed set of *approvers* before they can be executed. Sensitive data should only be accessible through this mode. Used to execute scripts that have been recorded and approved.

When connecting to the engine in design mode, data from the authorized mode is not available, and vice versa.
Please check the `help center <https://support.rosemanlabs.com/design-and-production>`_ for more information on authorized and design modes.

Working in authorized mode
==========================
These are the steps that need to be followed in order to execute any computation in authorized mode.
We will go into detail on each of them:

1. Uploading sensitive data to the engine in authorized mode.
2. Creating *dummy data* that mirrors the sensitive data and uploading it to the engine in design mode.
3. Creating and recording an analysis in design mode to create a ``.recording`` *script file*.
4. Submitting an analysis script for approval using the `platform <https://support.rosemanlabs.com/design-and-production>`__.
5. Examining, approving and cryptographically signing an analysis.
6. Downloading an *approved script* from the platform.
7. Loading the approved script into crandas and running the analysis in authorized mode on the sensitive data.


Step 1: Uploading production data
---------------------------------

The standard way to upload data is through `the platform <https://support.rosemanlabs.com/how-to-upload-a-data-source>`__.
Before an analysis can be designed, you must know the structure of the data.
The most straightforward case is when the data has already been uploaded to the platform.
Uploaded tables have a random *handle* through which they are identified.
This handle is necessary to record a script.
Additionally, a *schema* can be downloaded with the structure of the table.

In certain situations, the data has not been uploaded before attempting to design a script.
While the handle will need to be known at the time of the recording, the table schemas can be known before the data is uploaded.
Such is the case of `data requests <https://support.rosemanlabs.com/how-to-create-a-data-request>`__ and `surveys <https://support.rosemanlabs.com/how-to-create-a-survey>`__.

Crandas can also be used to upload tables directly, but this will require an authorized script.
Either the data upload is part of the execution of a script or a specific script can be recorded to approve *any* data upload.
Such a script would look like this:

.. code:: python

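    import crandas as cd
    import crandas.script

    # Start recording; the commands below become part of the script
    cd.script.record()
    table = cd.read_csv("dummy_data.csv")
    # Note the table handle: it is needed to refer to this table later
    print(table.handle)
    cd.script.save("upload-dummy-data.recording")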
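
Because the handle is generated at upload time, it can help to store it somewhere more durable than terminal output.
The following is a minimal sketch that keeps handles in a JSON file.
``save_handle``, ``load_handles`` and the file name ``handles.json`` are hypothetical helpers, not part of crandas, and the sketch assumes that a handle can be converted to text with ``str()``.

.. code:: python

    import json
    from pathlib import Path


    def save_handle(name, handle, path="handles.json"):
        """Store a table handle under a readable name in a JSON file."""
        file = Path(path)
        handles = json.loads(file.read_text()) if file.exists() else {}
        handles[name] = str(handle)  # assumption: the handle has a text form
        file.write_text(json.dumps(handles, indent=2))
        return handles


    def load_handles(path="handles.json"):
        """Return the stored name -> handle mapping (empty if no file exists)."""
        file = Path(path)
        return json.loads(file.read_text()) if file.exists() else {}

With these helpers, ``save_handle("dummy_table", table.handle)`` could replace the ``print`` call in the script above.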


.. _approvals_dummy_for:

Step 2: Exploring with dummy data
---------------------------------

The process of recording a script will require executing all of the functions in it.
In order to do that, there has to be data over which the functions will be executed.
Because these data cannot be the real data that must be kept private, we need to use *dummy data*.
Not just any data will do: it must have the same structure (column names and types) as the real data.

.. important::

    The structure of the tables used in the design script must be the same as in the authorized mode (except for the content and number of rows).
    This means that tables must have the same number of columns, with the same types (including nullability), names and order.
    Tables should also be referenced in the same order.
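
The requirements in the note above can be checked mechanically before recording.
The following is a minimal sketch in plain Python, assuming each table structure is written down by hand as an ordered list of columns.
``Column`` and ``schema_mismatches`` are hypothetical names, not part of crandas, and the type strings are purely illustrative.

.. code:: python

    from typing import NamedTuple


    class Column(NamedTuple):
        name: str
        dtype: str
        nullable: bool = False


    def schema_mismatches(real, dummy):
        """List the differences between two ordered lists of Column entries."""
        problems = []
        if len(real) != len(dummy):
            problems.append(f"column count differs: {len(real)} vs {len(dummy)}")
        # Compare position by position, because column order matters too
        for position, (r, d) in enumerate(zip(real, dummy)):
            if r.name != d.name:
                problems.append(f"column {position}: name {d.name!r} should be {r.name!r}")
            if r.dtype != d.dtype:
                problems.append(f"column {position}: type {d.dtype!r} should be {r.dtype!r}")
            if r.nullable != d.nullable:
                problems.append(f"column {position}: nullability differs")
        return problems


    real = [Column("id", "int"), Column("age", "int", nullable=True), Column("city", "varchar")]
    dummy = [Column("id", "int"), Column("city", "varchar"), Column("age", "int")]

    for problem in schema_mismatches(real, dummy):
        print(problem)

Here the dummy table has the right columns but in the wrong order and with different nullability, so the check reports mismatches; an empty list means the two structures agree.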

Dummy data can be generated by means of any packages that are available, such as `Faker <https://faker.readthedocs.io/en/master/>`_.
Alternatively, the analyst can create and fill a dummy table directly.
To ensure that the table has the right structure, you must use a table schema.
Such a schema can be extracted from a table (using ``.schema``) or from a data request or survey in the platform.
In-depth information on how to upload and work with dummy data can be found in :ref:`dummy_data`.
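
As an illustration of filling a dummy table by hand, the sketch below writes a small CSV file of random values using only the Python standard library.
The column names, value ranges and the helper ``write_dummy_csv`` are hypothetical; in practice they should follow the schema of your own table.

.. code:: python

    import csv
    import random
    import string

    # Hypothetical structure: (column name, generator for one dummy value).
    # Replace it with the columns and types of your own table schema.
    COLUMNS = [
        ("patient_id", lambda rng: rng.randint(1, 10_000)),
        ("age", lambda rng: rng.randint(18, 90)),
        ("city", lambda rng: "".join(rng.choices(string.ascii_uppercase, k=6))),
    ]


    def write_dummy_csv(path, rows=10, seed=0):
        """Write a CSV of random values with the column names and order of COLUMNS."""
        rng = random.Random(seed)  # fixed seed so the file is reproducible
        with open(path, "w", newline="") as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow([name for name, _ in COLUMNS])
            for _ in range(rows):
                writer.writerow([make(rng) for _, make in COLUMNS])


    write_dummy_csv("dummy_data.csv", rows=5)

The resulting file can then be uploaded in design mode, for example with ``cd.read_csv("dummy_data.csv")`` as in the upload script of step 1.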

Once dummy data is uploaded to a design environment, you can create an analysis interactively.
Design mode places no limitations on the computations that can be done, so you can build the analysis as needed.
After the analysis is finalized, it must be recorded.

Step 3: Recording an analysis
-----------------------------

To record the analysis, call :func:`.script.record` before executing any crandas commands.
Any commands executed afterwards are appended to the recorded script.
After the analysis is complete, call :func:`.script.save`, which saves the recorded analysis as a ``.recording`` script file.
This file can then be submitted for approval.

For information on how to do script recordings, including the limitations and best practices, see :ref:`recording`.


Step 4: Submitting for approval
-------------------------------

The ``.recording`` script file from the previous step is now ready to be uploaded to the platform for approval.
Follow the steps presented `here <https://support.rosemanlabs.com/how-do-i-create-an-analysis>`__ to do so.

Step 5: Approving a script
---------------------------

The approvers will use the platform to examine the script.
They will approve it if they determine all information produced by the analysis meets the parameters of the collaboration agreement.
To approve, they digitally sign the script using their private key (*approver key*), and authorize the script for execution.

.. important::

    Scripts are always approved to be executed by a *designated analyst*, who holds a secret analyst key.
    Without this key, you will not be able to perform the analysis.


Step 6: Downloading the approved script
---------------------------------------

After approval from all the approvers, you can download an ``.approved`` script file from the platform, in the same place where it was uploaded.
This is a digitally signed version of the script file that was submitted in step 4.

Step 7: Executing the analysis
------------------------------

Once the script has been approved and the signed file downloaded, you can perform the analysis.

The script should be modified to connect to the engine in authorized mode by using the appropriate connection file.
Instead of calling :func:`.script.record` at the top, call :func:`.script.load()` with the path to the ``.approved`` file as input.
Then, at the bottom, replace :meth:`.script.Script.save` with ``cd.script.reset()``.
You should also load your *analyst key* by passing its path as the ``analyst_key`` parameter of :func:`.connect`.

.. code:: python

    import crandas as cd
    import crandas.script

    # connect to authorized environment with the analyst key
    # corresponding to the analysis to be executed
    cd.connect('auth-environment', analyst_key='path_to_analyst_key')

    # The following line was changed to load the script
    script = cd.script.load("path_to_approved_file")

    # ...

    # The analysis goes here

    # ...

    # This line was also changed
    script.reset()

Since the analysis is identical to the recorded script, except for the two ``script`` commands, it will match the authorization and execute in authorized mode.
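
Why the commands must match can be illustrated with a toy model.
The sketch below is *not* how the engine is implemented and uses no crandas API; ``ToyApprovedScript`` is a hypothetical class that simply accepts commands only in the order in which they were recorded.

.. code:: python

    class ToyApprovedScript:
        """Toy model: accepts commands only in the exact recorded order."""

        def __init__(self, approved_commands):
            self.approved = list(approved_commands)
            self.position = 0

        def run(self, command):
            if self.position >= len(self.approved):
                raise PermissionError(f"{command!r} is beyond the end of the approved script")
            expected = self.approved[self.position]
            if command != expected:
                raise PermissionError(f"expected {expected!r}, got {command!r}")
            self.position += 1


    recorded = ["read table", "filter age > 50", "count rows"]

    # Replaying the commands in the recorded order succeeds.
    script = ToyApprovedScript(recorded)
    for command in recorded:
        script.run(command)

    # Running a step out of order is rejected.
    script = ToyApprovedScript(recorded)
    try:
        script.run("count rows")
    except PermissionError as error:
        print("rejected:", error)

In this model, adding, skipping or reordering a command makes the run fail, which is the reason for keeping the analysis code unchanged between recording and execution.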

.. admonition:: Using Jupyter
   :class: jupyter

   After the notebook is set up, you can use ``Kernel -> Restart & Run all`` to execute your notebook from top to bottom.
   It is essential that the cells are executed in **exactly the same order** as they were when recording the script.

The details of successfully recording a script can be found in :ref:`the next section <recording>`.