Installation (local Python)
This page explains how crandas can be installed locally. Note that crandas can run in either a design or a production environment. More information about these environments and how they relate to each other can be found on the help center.
Warning
The entire installation section only applies if you are working in a production environment or running crandas locally. In demo environments or Jupyter environments provided by Roseman Labs, crandas is pre-installed – so skip ahead to the next section.
To complete this manual you will need to download and install Python 3.9 or higher. If you are installing on Windows, don’t forget to tick the box that says Add python.exe to PATH to ensure you can run python from any directory.
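To confirm that the interpreter on your PATH meets the version requirement, a minimal check can be sketched as follows. The helper name `python_is_supported` is illustrative and not part of crandas.

```python
import sys

# Minimum version stated in this guide.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python", sys.version.split()[0], "is supported.")
else:
    print("Python 3.9 or higher is required; found", sys.version.split()[0])
```

Save this in a file and run it with the same `python` command you intend to use for the rest of this guide, so that the check applies to the right interpreter.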
Warning
For step 1, only do 1a or 1b, not both. If you are generating your own keys, follow 1a; otherwise, follow 1b.
Overview of certificates and files involved
Before you begin this process, it is useful to understand what each of the files does.
Server public keys (.pk): These keys are used to encrypt input data that is sent to the 3 servers (the data is encrypted once with each key before it is sent to one of the servers). The crandas client connects to 1 server; the input data is then encrypted with the public keys and sent to all servers.
Certificates: When the crandas client connects to the Virtual Data Lake server, we use the certificate in order to authenticate the server for the client. This ensures that the client is connecting to the correct server.
Analyst key: When an analysis is uploaded to the portal, it is signed with the analyst's private key (.sk). When executing in production, this key is needed to ensure that the analyst who uploaded the analysis is the one who actually executes it (verified with the public key).
1a. Generate key pairs using the 2023keygen.py script
Key pairs are needed to securely encrypt any communication with the Virtual Data Lake (such as requesting and approving transactions).
In order to generate the key pairs, follow these steps:
Save the 2023keygen script as a file named 2023keygen_analystapprover.py in your home directory (you can type %HOMEPATH% into the address bar of the File Explorer window, or type C:\Users\ into the address bar and click the folder with your username).
Note
You can also drag and drop the Python file into the home directory once you have navigated to it. You can double-check the path by pressing Alt + D.
Open a command prompt (press Windows key + R, then type cmd, or just search for it) and navigate to your home directory (cd %HOMEPATH% on Windows or cd ~ on Linux), where you saved the 2023keygen_analystapprover.py file (if the prompt already shows C:\Users\{your username}, you are already there).
Run the following command to install the required PyNaCl library (imported in Python as nacl):
pip install pynacl
Now, run the script by executing:
python 2023keygen_analystapprover.py
The script will generate a folder called vdl_certs (in the same folder where you saved the keygen file), and the key pairs will be inside.
There will be a number of files with the “.sk” and “.pk” extensions.
The .sk files are secret keys that should be kept private. Save them in a local folder on your system.
The .pk files are public keys. Your admin will request these files in order to set up and operate the VDL.
After completing these steps, you will have successfully generated the key pairs and can proceed to use crandas in your Python scripts.
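To verify the result of the key generation, the contents of the vdl_certs folder can be listed by extension. The sketch below assumes the script was saved in your home directory, so the folder is `~/vdl_certs`. The helper name `summarize_key_files` is illustrative.

```python
from pathlib import Path


def summarize_key_files(cert_dir):
    """Return the sorted names of the .sk and .pk files found in cert_dir."""
    cert_dir = Path(cert_dir)
    if not cert_dir.is_dir():
        # Folder does not exist (yet): report no keys instead of failing.
        return {"secret": [], "public": []}
    return {
        "secret": sorted(p.name for p in cert_dir.glob("*.sk")),
        "public": sorted(p.name for p in cert_dir.glob("*.pk")),
    }


# Assumption: the keygen script was run from the home directory.
summary = summarize_key_files(Path.home() / "vdl_certs")
print("Secret keys (keep private):", summary["secret"])
print("Public keys (share with your admin):", summary["public"])
```

If either list is empty, the script most likely wrote vdl_certs to a different folder than the one checked here.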
1b. Download the certificates and store them in your home folder
As the certificates are the means of authentication to the production environment, they will be provided out of band. You should have received an e-mail that contains a link to our secure filesharing system. The first step is storing these certificates in a sub-folder in your home directory.
Windows
Download the ZIP from the RL fileshare, and store it in your Downloads folder.
Extract the ZIP file
Open your home folder:
Press WIN+R; the Windows Run window should open.
Enter %HOMEDRIVE%%HOMEPATH% and press Enter.
A new Windows Explorer window should open that shows your home folder.
In this home folder create a new folder called vdl_certs
Move the contents of the ZIP file you unpacked to the vdl_certs folder you just created.
Linux
Download the ZIP from the RL fileshare, and store it in your Downloads folder.
Extract the ZIP file
Open your home folder in a Terminal window (cd ~).
In this home folder create a new folder called vdl_certs (mkdir vdl_certs).
Move the contents of the ZIP file you unpacked to the vdl_certs folder you just created.
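The extract-and-move steps above can also be scripted in Python, which works the same way on Windows and Linux. This is a minimal sketch. The ZIP file name `vdl_certs.zip` is hypothetical (use the name of the file you actually downloaded), and `extract_certificates` is an illustrative helper, not part of crandas.

```python
import zipfile
from pathlib import Path


def extract_certificates(zip_path, target_dir):
    """Extract zip_path into target_dir and return the names of the entries now in target_dir."""
    target_dir = Path(target_dir)
    # Create the vdl_certs folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return sorted(p.name for p in target_dir.iterdir())


# Hypothetical file name: replace with the ZIP you downloaded.
zip_path = Path.home() / "Downloads" / "vdl_certs.zip"
if zip_path.is_file():
    print(extract_certificates(zip_path, Path.home() / "vdl_certs"))
else:
    print("ZIP file not found:", zip_path)
```

If the ZIP contains a top-level folder, the files end up one level deeper than vdl_certs and need to be moved up by hand.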
2. Installing crandas
To use crandas in your Python scripts, install it by running:
pip install crandas --index-url=https://pypi.rosemancloud.com
We specify the index URL because crandas is hosted on the Roseman Labs PyPI index rather than on [pypi.org](https://pypi.org).
Install a specific version
To install a specific version of crandas, run:
pip install crandas==<version> --index-url=https://pypi.rosemancloud.com
Replace <version> with the version of crandas you wish to install, e.g. v1.8.0.
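After installing, you can check from Python which version ended up in the active environment. The sketch below uses the standard-library `importlib.metadata` module. The helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if crandas is not installed in the active environment.
print("crandas version:", installed_version("crandas"))
```

A result of None usually means that pip installed the package into a different Python environment than the one running this check.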
In a virtual environment on Windows
To install crandas in a virtual environment using venv on Windows:
Open a Command Prompt window:
Press WIN+R; the Windows Run window should open.
Enter cmd and press Enter.
Navigate to the home directory (the same one where you saved the script above):
cd %HOMEPATH%
Create the virtual environment by executing:
python -m venv .crandas
Activate the virtual environment by executing:
.\.crandas\Scripts\activate.bat
(You will know it has been activated because the prompt will show (.crandas).)
Install crandas:
pip install crandas --index-url=https://pypi.rosemancloud.com
In a virtual environment on Linux
To install crandas in a virtual environment using venv on Linux:
Open a Terminal window and navigate to the home directory:
cd ~
Create the virtual environment:
python3 -m venv .crandas
Activate the virtual environment:
source .crandas/bin/activate
Install crandas:
pip install crandas --index-url=https://pypi.rosemancloud.com
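To confirm that Python is actually running inside the virtual environment (and not the system-wide installation), the interpreter's prefixes can be compared. This is a minimal sketch that relies on how the standard venv module works. The helper name `in_virtual_environment` is illustrative.

```python
import sys


def in_virtual_environment(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True if the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base installation.
    return prefix != base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

When the .crandas environment is active, the printed interpreter path should contain the .crandas folder.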
For use with Jupyter
To install crandas for use with Jupyter, run:
pip install crandas[notebook] --index-url=https://pypi.rosemancloud.com
This also installs dependencies that are needed to let crandas function well from Jupyter, in particular to show the progress bar for long-running operations.
For use with Jupyter in Visual Studio Code
Currently the Jupyter extension for Visual Studio Code does not support some of the additional notebook features in crandas. A workaround is to explicitly downgrade the following dependencies:
```
pip install --force-reinstall -v "ipywidgets == 7.7.2"
pip install --force-reinstall -v "jupyterlab_widgets == 1.1.1"
```
For more information refer to https://github.com/microsoft/vscode-jupyter/issues/11014.
3. Setting up the session variables in your script
We should still be in the virtual environment we created, as shown by (.crandas) in the prompt. Now we need to install a development environment so that we can work with crandas more easily. For example, pip install notebook will install Jupyter Notebook in the virtual environment (you will know it has finished when it says Successfully installed ...).
Note
Once you have installed crandas in your virtual environment, you can use it with any Python editor of your choice.
We can start creating our analysis by executing jupyter notebook (as an example) and clicking New to start a new notebook.
Finally we need to tell crandas which VDL endpoint and which certificates to use when running your analysis. An example is included below.
# Import the crandas package we have installed
import crandas as cd
# Import the session class, to set variables
from crandas.base import session
# Import pathlib
from pathlib import Path
# Update to https://**NODE_IP**:**NODE_PORT**/api/v1
# (e.g. https://vdl-1c-cr-node2.rosemancloud.com:32601/api/v1);
# this will be provided by the Roseman Labs admin
session.endpoint = '_____'
# Set the base path to the folder where we have stored the certificates
session.base_path = Path.home() / 'vdl_certs'
# Set the path to analystsign.sk
# (check the folder vdl_certs in your home directory to see the id added to the end of the file name)
session.query_signing_key = Path.home() / 'vdl_certs/analystsign0.sk'
# Set the path to the HTTP certificate used to communicate with the VDL node
# (this will be provided by the Roseman Labs admin)
session.certificate_path = Path("httpd0.crt")
# (For on-premise only) Set the assert hostname to the correct
# DNS host name (this will be provided by the Roseman Labs admin)
session.assert_hostname = ' '
# Set the path to the JSON file that contains the signed transactions;
# this can be downloaded from the Web Portal (script approval platform) once the cluster is running.
session.authorization_file = 'signed-transactions.jsonl'
# Set to True if all JSONs that are sent to the VDL node need to be printed
session.print_json = None
After this, we can check which VDL server we will connect to (to double-check we have set it correctly).
#Show which VDL server we will connect to
print("Virtual Data Lake URL: " + cd.base.session.endpoint)
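Beyond printing the endpoint, its shape can be checked against the `https://NODE_IP:NODE_PORT/api/v1` pattern given in the configuration comments. The sketch below uses only the standard library and does not contact the server. The helper name `endpoint_problems` is illustrative and not part of crandas.

```python
from urllib.parse import urlsplit


def endpoint_problems(endpoint):
    """Return a list of problems found in a VDL endpoint URL (empty if none)."""
    problems = []
    parts = urlsplit(endpoint)
    if parts.scheme != "https":
        problems.append("scheme should be https")
    if not parts.hostname:
        problems.append("host is missing")
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        port = None
    if port is None:
        problems.append("port is missing or invalid")
    if parts.path.rstrip("/") != "/api/v1":
        problems.append("path should be /api/v1")
    return problems


# Example endpoint taken from the comment in the configuration.
print(endpoint_problems("https://vdl-1c-cr-node2.rosemancloud.com:32601/api/v1"))
```

An empty list only means the URL is well-formed; whether the node is reachable is confirmed by sending an actual query.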
To confirm that the setup has been done correctly, send any query to the server, for example cd.demo_table().
After this, crandas is ready to use for secure computations.
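Because the signing key file name carries an id suffix that differs per installation (analystsign0.sk in the example configuration), the file can be looked up instead of typed by hand. This is a minimal sketch, assuming the keys are stored in `~/vdl_certs`. The helper name `find_signing_keys` is illustrative.

```python
from pathlib import Path


def find_signing_keys(cert_dir):
    """Return the analystsign*.sk files in cert_dir, sorted by name."""
    cert_dir = Path(cert_dir)
    if not cert_dir.is_dir():
        return []
    return sorted(cert_dir.glob("analystsign*.sk"))


keys = find_signing_keys(Path.home() / "vdl_certs")
if len(keys) == 1:
    print("Use this path for session.query_signing_key:", keys[0])
else:
    print("Expected exactly one signing key, found:", [k.name for k in keys])
```

If several keys are found, ask your admin which one belongs to the environment you are connecting to.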
Warning
To activate your virtual environment again:
- Press Windows key + R, then type cmd.
- Run .\.crandas\Scripts\activate.bat