.. _salesparty2:

Joining data - [Party 2]
===================================

In this tutorial, we demonstrate how the second party in a collaboration uploads its data to the engine, for the analysis to be performed by :ref:`Party 1`.

Step 1: Start the Roseman Labs Engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this step, we import the necessary libraries, including crandas and pandas.

.. code:: python

    import crandas as cd
    import pandas as pd
    from pathlib import Path

Step 2: Input data
~~~~~~~~~~~~~~~~~~

In this step, we define some parameters to limit the amount of data and set a table name as a reference for the other party (same as for Party 1). We also specify the relevant columns that we will use in our analysis.

.. code:: python

    # Set this to limit the amount of data
    rows_per_dataset = 1000

    # Use this name as a reference for the other party
    nutrition_table_name = 'nutrition'

.. code:: python

    # Select relevant columns
    relevant_columns = ['ean_code',
                        'SPP_DESCRIPTION',
                        'ENERGY_VALUE_IN_KJ',
                        'ENERGY_VALUE_IN_KCAL',
                        'SODIUM_IN_MG',
                        'SATURATED_FATTY_ACIDS_IN_G',
                        'TOTAL_PROTEIN_IN_G',
                        'MONO_AND_DISACCHARIDES_IN_G',
                        'DIETARY_FIBER_IN_G',
                        'STANDARD_PORTION_SIZE',
                        'NUMBER_OF_PORTIONS_PER_PACKAGE']

Read the local file and upload it:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, we read the local CSV file containing the nutrition data using pandas and limit the number of rows to the ``rows_per_dataset`` we set earlier. We then convert the ``ean_code`` column to a string data type and upload the relevant columns to the engine.

.. code:: python

    file_path = '../../data/sales_nutrition_data/nutrition_data.csv'

    # Read the local csv using pandas
    nutrition_table_data = pd.read_csv(file_path, nrows=rows_per_dataset)

    # Convert the ean_code column to a string data type
    nutrition_table_data['ean_code'] = nutrition_table_data['ean_code'].astype(str)

    # Upload nutrition data to the engine
    nutrition_table = cd.upload_pandas_dataframe(nutrition_table_data[relevant_columns])

    # Show metadata for the table (column titles and field types, i=integer, s=string)
    print("Table meta-data:\n", repr(nutrition_table))

.. parsed-literal::

    Reading data...
    Uploading data...
    Table meta-data:
     Handle: 94EDD8E55C616AEB5C734AE212064DA5DFB4D59018BD8DCC8DC44C3F71BF610F
    Size: 1000 rows x 11 columns
    CIndex([Col("ean_code", "s", 14), Col("SPP_DESCRIPTION", "s", 14), Col("ENERGY_VALUE_IN_KJ", "i", 1), Col("ENERGY_VALUE_IN_KCAL", "i", 1), Col("SODIUM_IN_MG", "i", 1), Col("SATURATED_FATTY_ACIDS_IN_G", "i", 1), Col("TOTAL_PROTEIN_IN_G", "i", 1), Col("MONO_AND_DISACCHARIDES_IN_G", "i", 1), Col("DIETARY_FIBER_IN_G", "i", 1), Col("STANDARD_PORTION_SIZE", "i", 1), Col("NUMBER_OF_PORTIONS_PER_PACKAGE", "i", 1)])

.. important::

    When executing this script, the table handle will be different. Copy it and paste it in the right place in the :ref:`Party 1` script.

Our data is now in the engine, ready to be used by the other party, who can retrieve it with the table handle shown above.
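
The string conversion of ``ean_code`` mentioned in the upload step can be checked locally before anything is sent to the engine. The sketch below uses only pandas and a small in-memory CSV with made-up codes (hypothetical values, not taken from ``nutrition_data.csv``). It assumes the codes may carry leading zeros, which the source file may or may not contain. In that case, converting after the read is too late, because pandas has already parsed the column as integers. Passing ``dtype`` to ``pd.read_csv`` keeps the codes intact.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for nutrition_data.csv
csv_text = "ean_code,SODIUM_IN_MG\n08712345678906,120\n08700000000017,45\n"

# Default parsing infers an integer column, so leading zeros are dropped
# before the later string conversion can preserve them
as_int = pd.read_csv(io.StringIO(csv_text))
converted = as_int["ean_code"].astype(str)

# Reading the column as a string from the start keeps the codes unchanged
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"ean_code": str})

print(converted.tolist())           # ['8712345678906', '8700000000017']
print(as_str["ean_code"].tolist())  # ['08712345678906', '08700000000017']
```

Both parties need the join key in the same textual form, so whichever approach is used here should match what Party 1 does with its own ``ean_code`` column.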