.. _installing:

Installation (local python)
===========================

This page explains how to install crandas locally. Note that crandas can run in
either a design or a production environment. More information about these
environments and how they relate to each other can be found on the `help center `_.

.. warning::
   This entire installation section only applies if you are working in a production
   environment or running crandas locally. In demo environments and Jupyter
   environments provided by Roseman Labs, crandas is pre-installed, so you can skip
   ahead to :ref:`the next section `.

To complete this manual, you need to download and install `Python 3.9 or higher `_.
If you are installing on Windows, remember to tick the box that says
**Add python.exe to PATH** so that you can run ``python`` from any directory.

.. warning::
   For step 1, follow either 1a or 1b, **not both**. Follow 1a if you are generating
   your own keys; otherwise, follow 1b.

Overview of certificates and files involved
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before you begin, it is useful to understand what each of the files does.

- **Server public keys (.pk)**: These keys are used to encrypt input data that is
  sent to the three servers (the data is encrypted once with each server's key
  before it is sent to that server). The crandas client connects to one server; the
  input data is encrypted with the public keys and then sent to all servers.
- **Certificates**: When the crandas client connects to the Virtual Data Lake (VDL)
  server, the certificate is used to authenticate the server to the client. This
  ensures that the client is connecting to the correct server.
- **Analyst key**: When an analyst uploads an analysis to the portal, the analysis
  is signed with the analyst's private key (.sk). When the analysis is executed in
  production, this key is needed to ensure that the analyst who uploaded the
  analysis is the one who actually executes it (this is verified with the
  corresponding public key).

1a. Generate key pairs using the ``2023keygen.py`` script
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Key pairs are needed to secure any communication with the Virtual Data Lake (such
as requesting and approving transactions). To generate the key pairs, follow these
steps:

1. Save the 2023keygen script as a file named ``2023keygen_analystapprover.py`` in
   your home directory. On Windows, you can open your home directory by typing
   ``%HOMEPATH%`` into the address bar of a File Explorer window, or by typing
   ``C:\Users\`` into the address bar and clicking the folder with your username.

   .. note::
      You can also drag and drop the Python file into the home directory once you
      have navigated to it. You can double-check the path by pressing ``Alt + D``.

2. Open a command prompt (press ``Windows key`` + ``R`` and type ``cmd``, or search
   for it) and navigate to the home directory where you saved
   ``2023keygen_analystapprover.py`` (``cd %HOMEPATH%`` on Windows, ``cd ~`` on
   Linux). If the prompt already shows ``C:\Users\{your username}``, you are
   already there.

3. Install the required PyNaCl library: ``pip install pynacl``

4. Run the script: ``python 2023keygen_analystapprover.py``

5. The script creates a folder called ``vdl_certs`` (in the same folder where you
   saved the script) that contains the key pairs: a number of files with the
   ``.sk`` and ``.pk`` extensions.

   - The ``.sk`` files are secret keys that must be kept private. Store them in a
     local folder on your system.
   - The ``.pk`` files are public keys. Your admin will ask you to share these files
     in order to set up and operate the VDL.

After completing these steps, you have generated your key pairs and can continue
with installing crandas (step 2).

1b. Download the certificates and store them in your home folder
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Because the certificates are the means of authentication to the production
environment, they are provided out of band. You should have received an e-mail
containing a link to `our secure filesharing system `_ [1]_. The first step is to
store these certificates in a sub-folder of your home directory.

**Windows**

1. Download the ZIP file from the Roseman Labs fileshare and store it in your
   ``Downloads`` folder.
2. Extract the ZIP file.
3. Open your home folder:

   a. Press WIN+R; the Windows Run window should open.
   b. Enter :code:`%HOMEDRIVE%%HOMEPATH%` and press Enter.
   c. A new Windows Explorer window should open that shows your home folder.

4. In this home folder, create a new folder called ``vdl_certs``.
5. Move the contents of the extracted ZIP file into the ``vdl_certs`` folder you
   just created.

**Linux**

1. Download the ZIP file from the Roseman Labs fileshare and store it in your
   ``Downloads`` folder.
2. Extract the ZIP file.
3. Open your home folder in a Terminal window (:code:`cd ~`).
4. In this home folder, create a new folder called ``vdl_certs``
   (:code:`mkdir vdl_certs`).
5. Move the contents of the extracted ZIP file into the ``vdl_certs`` folder you
   just created.

2. Installing crandas
^^^^^^^^^^^^^^^^^^^^^^^^

To use crandas in your Python scripts, install it with
``pip install crandas --index-url=https://pypi.rosemancloud.com``.
We pass ``--index-url`` because crandas is hosted on the Roseman Labs PyPI server
rather than on `pypi.org <https://pypi.org>`_.

**Install a specific version**

To install a specific version of crandas, run
``pip install crandas==<version> --index-url=https://pypi.rosemancloud.com``,
replacing ``<version>`` with the version of crandas you wish to install, e.g.
``v1.8.0``.

**In a virtual environment on Windows**

To install crandas in a virtual environment using ``venv`` on Windows:

1. Open a Command Prompt window:

   a. Press WIN+R; the Windows Run window should open.
   b. Enter ``cmd`` and press Enter.
   c. Navigate to your home directory: ``cd %HOMEPATH%`` (as when saving the
      script in step 1a).

2. Create a virtual environment: ``python -m venv .crandas``
3. Activate the virtual environment: ``.\.crandas\Scripts\activate.bat``
   (once it is active, your prompt starts with ``(.crandas)``).
4. Install crandas: ``pip install crandas --index-url=https://pypi.rosemancloud.com``

**In a virtual environment on Linux**

To install crandas in a virtual environment using ``venv`` on Linux:

1. Open a Terminal window and navigate to your home directory: ``cd ~``
2. Create a virtual environment: ``python3 -m venv .crandas``
3. Activate the virtual environment: ``source .crandas/bin/activate``
4. Install crandas: ``pip install crandas --index-url=https://pypi.rosemancloud.com``

**For use with Jupyter**

To install crandas for use with Jupyter, run
``pip install crandas[notebook] --index-url=https://pypi.rosemancloud.com``.
This also installs the dependencies that crandas needs to work well from Jupyter,
in particular to show a progress bar for long-running operations.

**For use with Jupyter in Visual Studio Code**

Currently, the Jupyter extension for Visual Studio Code does not support some of the
additional notebook features in crandas. A workaround is to explicitly downgrade the
following dependencies:

.. code:: bash

   pip install --force-reinstall -v "ipywidgets == 7.7.2"
   pip install --force-reinstall -v "jupyterlab_widgets == 1.1.1"

For more information, see https://github.com/microsoft/vscode-jupyter/issues/11014.

3. Setting up the session variables in your script
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We should still be in the virtual environment we created, as shown by the
``(.crandas)`` prefix in the prompt.
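
To double-check from Python itself that the interpreter belongs to a virtual
environment, the minimal sketch below compares ``sys.prefix`` with
``sys.base_prefix``. For an interpreter started from a ``venv`` environment such as
``.crandas``, the two differ. The helper name ``in_virtualenv`` is hypothetical and
not part of crandas.

.. code:: python

   import sys


   def in_virtualenv() -> bool:
       """Return True when running inside a venv-style virtual environment.

       Inside a venv, sys.prefix points at the environment folder (for example
       .crandas), while sys.base_prefix still points at the base Python install.
       """
       return sys.prefix != sys.base_prefix


   if __name__ == "__main__":
       print("Running in a virtual environment:", in_virtualenv())
       print("Environment location:", sys.prefix)

If this prints ``False``, activate the ``.crandas`` environment again before
installing further packages, so that they are installed next to crandas.
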
Next, we install a development environment that makes working with crandas easier.
For example, ``pip install notebook`` installs Jupyter Notebook in the virtual
environment (when it has finished, pip prints ``Successfully installed ...``).

.. note::
   Once you have installed crandas in your virtual environment, you can use it with
   any Python editor of your choice.

For example, we can start creating our analysis by running ``jupyter notebook`` and
clicking **New** to start a new notebook.

Finally, we need to tell crandas which VDL endpoint and which certificates to use
when running the analysis. An example is included below.

.. code:: python

   # Import the crandas package we have installed
   import crandas as cd
   # Import the session object, used to set the connection variables
   from crandas.base import session
   # Import Path to build file-system paths
   from pathlib import Path

   # Update to https://NODE_IP:NODE_PORT/api/v1
   # (e.g. https://vdl-1c-cr-node2.rosemancloud.com:32601/api/v1);
   # this will be provided by the Roseman Labs admin
   session.endpoint = '_____'

   # Set the base path to the folder where we have stored the certificates
   session.base_path = Path.home() / 'vdl_certs'

   # Set the path to the analyst signing key (analystsign<id>.sk); check the
   # vdl_certs folder in your home directory for the id at the end of the file name
   session.query_signing_key = Path.home() / 'vdl_certs' / 'analystsign0.sk'

   # Set the path to the HTTP certificate used to communicate with the VDL node
   # (this will be provided by the Roseman Labs admin)
   session.certificate_path = Path('httpd0.crt')

   # (On-premise only) Uncomment and set to the correct DNS host name
   # (this will be provided by the Roseman Labs admin)
   # session.assert_hostname = '_____'

   # Set the path to the JSONL file containing the signed transactions;
   # this can be downloaded from the Web Portal (script approval platform)
   # once the cluster is running
   session.authorization_file = 'signed-transactions.jsonl'

   # Set to True to print every JSON message that is sent to the VDL node
   session.print_json = None

After this, we can check which VDL server we will connect to, to double-check that
we have set it correctly.

.. code:: python

   # Show which VDL server we will connect to
   print("Virtual Data Lake URL: " + cd.base.session.endpoint)

To confirm that the setup is correct, send any query to the server, for example
``cd.demo_table()``. Once this succeeds, crandas is ready for us to use for secure
computations.

.. [1] If you have not received the certificates, please contact Roseman Labs.

.. warning::
   To activate your virtual environment again later:

   - **Windows**: press ``Windows key`` + ``R``, type ``cmd``, navigate to your home
     directory (``cd %HOMEPATH%``) and run ``.\.crandas\Scripts\activate.bat``.
   - **Linux**: open a Terminal and run ``source ~/.crandas/bin/activate``.
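
Several of the session settings in step 3 point at files in ``vdl_certs``, and the
signing key's file name ends in an id that you need to look up. As a convenience,
the following minimal sketch, which is independent of crandas, lists the secret
keys, public keys and certificates found in that folder before you configure the
session. The helper name ``list_vdl_files`` is hypothetical and not part of crandas.
The sketch also assumes that certificates use the ``.crt`` extension, as
``httpd0.crt`` does in step 3.

.. code:: python

   from pathlib import Path


   def list_vdl_files(folder: Path) -> dict:
       """Group the key and certificate files in a folder by extension.

       Returns a dict mapping ".sk", ".pk" and ".crt" (assumed certificate
       extension) to sorted lists of file names; other files and sub-folders
       are ignored. Raises FileNotFoundError if the folder does not exist.
       """
       if not folder.is_dir():
           raise FileNotFoundError(f"Folder not found: {folder}")
       groups = {".sk": [], ".pk": [], ".crt": []}
       for path in sorted(folder.iterdir()):
           if path.is_file() and path.suffix in groups:
               groups[path.suffix].append(path.name)
       return groups


   if __name__ == "__main__":
       certs = Path.home() / "vdl_certs"
       if not certs.is_dir():
           print(f"No folder found at {certs}; see step 1a or 1b.")
       else:
           found = list_vdl_files(certs)
           for ext, names in found.items():
               print(f"{ext}: {', '.join(names) or '(none)'}")
           # session.query_signing_key should point at one of these files
           signing = [n for n in found[".sk"] if n.startswith("analystsign")]
           print("Analyst signing key(s):", ", ".join(signing) or "(none)")

The sketch only uses the standard library, so it runs in the ``.crandas``
environment without installing anything else.
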