Installation (local python)¶
This page explains how to install crandas locally. Note that crandas can run in either a design or a production environment. More information about these environments and how they relate to each other can be found on the help center.
Warning
The entire installation section only applies if you are working in a production environment or running crandas locally. In demo environments or Jupyter environments provided by Roseman Labs, crandas is pre-installed – so skip ahead to the next section.
To complete this manual you will need to download and install Python 3.9 or higher. If you are installing on Windows, don’t forget to tick the box that says Add python.exe to PATH to ensure you can run python from any directory.
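Before continuing, it can help to confirm that the interpreter on your PATH meets the version requirement. The following is a minimal sketch using only the standard library; the function name python_is_supported is hypothetical and not part of crandas.

```python
import sys

# Minimum version stated on this page.
MINIMUM_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python", sys.version.split()[0], "meets the requirement")
else:
    print("Python 3.9 or higher is required; found", sys.version.split()[0])
```

Run it with the same python command you intend to use for the installation, so that the check reflects the interpreter that pip will install into.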
Overview of certificates and files involved¶
Before you begin this process, it is useful to understand what each of the files does.
Server public keys (.pk): There is a public key for each of the 3 servers. These keys are used to encrypt input data for each server. The crandas client connects to one of the servers and sends the encrypted input data. The connected server, which can only decrypt 1 of the 3 encrypted streams, then forwards the encrypted data to the other two servers.
Certificates: When the crandas client connects to the Virtual Data Lake server, the certificate is used to authenticate the server to the client. This ensures that the client is connecting to the correct server.
Analyst key: When an analysis is uploaded to the portal, it is signed with the analyst's secret key (.sk). When executing in production, this key is needed to ensure that the analyst who uploaded the analysis is the one who actually executes it (verified with the public key).
Connection file: It contains the URL and certificate for one of the servers, and public keys for the 3 servers to encrypt the uploads.
Note
There are two different ways to connect to the Virtual Data Lake (VDL): using the connection file, or using the certificate and the server public keys. From version 1.9, the connection file is the recommended way to connect to the VDL.
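To see which of these files you already have, a small inventory by file extension can be useful. This is only a sketch: the helper count_by_suffix is hypothetical, and it assumes the files carry the extensions mentioned on this page (.pk, .sk, .vdlconn). The extension of the certificate file is not stated here, so the sketch does not look for it specifically.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(folder):
    """Count the files directly inside `folder`, grouped by file extension."""
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        # A missing folder simply yields an empty inventory.
        return Counter()
    return Counter(p.suffix for p in folder.iterdir() if p.is_file())


# Example call (the folder name is an assumption about where you keep the files):
# print(dict(count_by_suffix("~/vdl_certs")))
```

With three servers, a complete set of server public keys would show up as three .pk files.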
1. Downloading the files and storing them in your home folder¶
1a. Connection file (v1.9.0 or higher)¶
Log in to the portal and go to Settings >> Account. Click on ‘Download connection file’; the file <your-environment>.vdlconn will be stored automatically in your Downloads folder.
Windows¶
- Open your home folder:
  - Press WIN+R; the Windows Run window should open.
  - Enter %HOMEDRIVE%%HOMEPATH% and press Enter. A new Windows Explorer window should open that shows your home folder.
- In this home folder, create a new folder called .config (unless it already exists).
- Go inside your .config folder and create a new folder named crandas (unless it already exists).
- Move the file <your-environment>.vdlconn to the crandas folder you just created.
Linux¶
- Open your home folder in a Terminal window (cd ~).
- Create a new crandas folder inside the .config folder (mkdir -p ~/.config/crandas). If the .config folder doesn’t exist, it will be created.
- Move the file <your-environment>.vdlconn to the crandas folder you just created (mv ~/Downloads/<your-environment>.vdlconn ~/.config/crandas).
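The same steps can also be scripted in a platform-independent way. The sketch below is not part of crandas; install_connection_file is a hypothetical helper, and it assumes that Path.home() resolves to the same home folder as described above (on Windows this is normally the case, but check if your profile is redirected).

```python
import shutil
from pathlib import Path


def install_connection_file(source, home=None):
    """Move a .vdlconn file into <home>/.config/crandas and return its new path."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".config" / "crandas"
    # Create .config and crandas if needed (same effect as mkdir -p).
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(source).name
    shutil.move(str(source), str(target))
    return target


# Example call, with the placeholder replaced by your own environment name:
# install_connection_file(Path.home() / "Downloads" / "<your-environment>.vdlconn")
```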
1b. Certificate and public keys¶
As the certificates are the means of authentication to the production environment, they will be provided out of band. You should have received an e-mail that contains a link to our secure filesharing system. The first step is storing these certificates in a sub-folder in your home directory.
Windows¶
- Download the ZIP or TAR file from the RL fileshare, and store it in your Downloads folder.
- Extract the ZIP or TAR file. You can do this with a program such as 7-Zip.
- Open your home folder:
  - Press WIN+R; the Windows Run window should open.
  - Enter %HOMEDRIVE%%HOMEPATH% and press Enter. A new Windows Explorer window should open that shows your home folder.
- In this home folder, create a new folder called vdl_certs.
- Move the contents of the ZIP/TAR file you unpacked to the vdl_certs folder you just created.
Linux¶
- Download the ZIP or TAR file from the RL fileshare, and store it in your Downloads folder.
- Extract the ZIP or TAR file (go to the folder where the file is located, right-click on the file and choose “Extract here” or “Extract to…”).
- Open your home folder in a Terminal window (cd ~).
- In this home folder, create a new folder called vdl_certs (mkdir vdl_certs).
- Move the contents of the ZIP/TAR file you unpacked to the vdl_certs folder you just created.
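If you prefer to script the extraction, the standard library can unpack both ZIP and TAR archives. The following is a sketch rather than an official procedure: extract_certificates is a hypothetical helper, and it assumes the archive has a recognizable extension such as .zip, .tar or .tar.gz.

```python
import shutil
from pathlib import Path


def extract_certificates(archive, home=None):
    """Unpack a ZIP/TAR archive into <home>/vdl_certs and list what landed there."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / "vdl_certs"
    target_dir.mkdir(exist_ok=True)
    # The archive format is detected from the file extension.
    shutil.unpack_archive(str(archive), str(target_dir))
    return sorted(p.name for p in target_dir.iterdir())


# Example call (the archive name is a placeholder):
# print(extract_certificates(Path.home() / "Downloads" / "<archive-name>.zip"))
```

If the archive wraps its files in a top-level folder, they will end up in a sub-folder of vdl_certs; in that case move them up one level so that they sit directly in vdl_certs.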
2. Installing crandas¶
To be able to use crandas in your Python scripts, install it with:
pip install crandas --index-url=https://pypi.rosemancloud.com
crandas is hosted on a private Roseman Labs server rather than pypi.org, so it is necessary to explicitly add the server URL.
Install a specific version of crandas¶
To install a specific version of crandas, run:
pip install crandas==<version> --index-url=https://pypi.rosemancloud.com
Replace <version> with the version of crandas you wish to install, e.g. v1.8.0.
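When the installation is automated from a Python script, the pip command can be assembled as an argument list and run through the current interpreter, which guarantees that the package lands in the environment that is actually running. This is a minimal sketch; pip_install_command is a hypothetical helper, not something crandas provides.

```python
import sys

# Index URL given on this page.
INDEX_URL = "https://pypi.rosemancloud.com"


def pip_install_command(version=None, extras=None):
    """Build the argument list for installing crandas with pip."""
    requirement = "crandas"
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    if version:
        requirement += "==" + version
    return [sys.executable, "-m", "pip", "install", requirement,
            "--index-url=" + INDEX_URL]


# Show the command that would be run for the example version on this page.
print(" ".join(pip_install_command(version="v1.8.0")))
```

To execute the command instead of printing it, pass the list to subprocess.run(..., check=True).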
Hint
We recommend the use of virtual environments to install crandas and its dependencies, especially for beginner users.
Install crandas in a virtual environment on Windows¶
To install crandas in a virtual environment using venv on Windows:
- Open a Command Prompt window:
  - Press WIN+R; the Windows Run window should open.
  - Enter cmd and press Enter.
- Navigate to the home directory (or the same location as when saving the script above):
  cd %HOMEPATH%
- Create the virtual environment by executing:
  python -m venv .crandas
- Activate the virtual environment by executing:
  .\.crandas\Scripts\activate.bat
  (you will know it has been activated because the prompt will show (.crandas))
- Install crandas:
  pip install crandas --index-url=https://pypi.rosemancloud.com
Note
While installing crandas, we might encounter missing dependencies, for example Visual C++ is needed to build pandas. In that case, install the missing dependencies and reboot before attempting to install crandas again.
Install crandas in a virtual environment on Linux¶
To install crandas in a virtual environment using venv on Linux:
- Open a Terminal window and navigate to the home directory:
  cd ~
- Create the virtual environment:
  python3 -m venv .crandas
- Activate the virtual environment:
  source .crandas/bin/activate
- Install crandas:
  pip install crandas --index-url=https://pypi.rosemancloud.com
Note
On Debian/Ubuntu systems, you first need to install the python3-venv package using the following command: apt install python3-venv (or the version-specific package for your Python version, for example apt install python3.10-venv). You might need superuser (sudo) privileges to execute this command.
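On either platform, you can confirm from within Python that the virtual environment is active, in addition to looking for (.crandas) in the prompt. The sketch below relies on the fact that, inside an environment created with venv, sys.prefix differs from sys.base_prefix; the function name is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter prefix:", sys.prefix)
```

If the first line reports False after activation, the python command is probably resolving to a different interpreter than the one inside .crandas.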
For use with Jupyter notebooks¶
To install crandas for use with Jupyter, use:
pip install crandas[notebook] --index-url=https://pypi.rosemancloud.com
This also installs dependencies that are needed to let crandas function well with Jupyter, in particular to show the progress bar for long-running operations.
Note
When using Jupyter notebooks in Visual Studio Code, make sure to have the latest versions of packages, as earlier versions of VS Code did not correctly display the progress bar. More information about which package versions are needed can be found here.
3. Setting up the session variables in your script¶
You should still be in the virtual environment you created, as shown by (.crandas) in the prompt. Next, install a development environment so that you can work with crandas more easily. For example, pip install notebook will install Jupyter Notebook in the virtual environment (you will know it has finished when it says Successfully installed...).
Note
Once you have installed crandas in your virtual environment, you can use it with any Python editor of your choice.
We can start creating our analysis by executing jupyter notebook (this is just one example) and clicking New to start a new notebook.
3a. Setting up the connection file¶
import crandas as cd
from crandas.base import session
# connect your session to the VDL
session.connect("<your-environment>")
3b. Setting up the path to the certificate and the server public keys¶
Finally, we need to tell crandas which VDL endpoint and which certificates to use when running your analysis. An example is included below.
import crandas as cd
from crandas.base import session
from pathlib import Path
# Update to provided endpoint https://**NODE_IP**:**NODE_PORT**/api/v1
# (e.g. https://vdl-1c-cr-node2.rosemancloud.com:32601/api/v1)
session.endpoint = '_____'
# Set the base path to the folder where we have stored the certificates
session.base_path = Path('./vdl_certs')
# connect your session to the VDL
session.connect()
After this, we can check which VDL server we will connect to (to double-check we have set it correctly).
# Show which VDL server we will connect to
print("Virtual Data Lake URL: " + cd.base.session.endpoint)
To confirm that the setup has been done correctly, send any query to the server, for example cd.demo_table().
After this, crandas is ready to use for secure computations.
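If connecting with the certificate and server public keys fails, a mistyped endpoint is a common cause, so it can be worth checking its shape before setting session.endpoint. The following sketch uses only the standard library. check_endpoint is a hypothetical helper, and the rules it applies (https scheme, explicit port, /api/v1 path) are assumptions derived from the template shown in the comment of the example above, not a specification of what the server accepts.

```python
from urllib.parse import urlsplit


def check_endpoint(url):
    """Return a list of problems found in a VDL endpoint URL (empty if none)."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "https":
        problems.append("scheme should be https")
    if not parts.hostname:
        problems.append("host is missing")
    try:
        port = parts.port
    except ValueError:
        # urlsplit raises ValueError for a non-numeric or out-of-range port.
        port = None
    if port is None:
        problems.append("port is missing or invalid")
    if parts.path.rstrip("/") != "/api/v1":
        problems.append("path should be /api/v1")
    return problems


# The example endpoint from this page passes all checks.
print(check_endpoint("https://vdl-1c-cr-node2.rosemancloud.com:32601/api/v1"))
```

An empty list means the URL matches the expected pattern; it does not prove that the server is reachable.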
Note
To activate your virtual environment again:
- For Windows users: press the Windows key + R, then type cmd. In the Command Prompt window, run .\.crandas\Scripts\activate.bat
- For Linux users: press Ctrl + Alt + T to open a terminal, then run source .crandas/bin/activate