.. _basic:

Basic table operations
#############################

This section will cover essential basic functionality in crandas, including inspecting, assigning new columns, computing with columns, boolean logic, and manipulating column names.

Inspecting :class:`CDataFrames<.CDataFrame>`
--------------------------------------------

While you cannot normally access the data stored in a :class:`.CDataFrame`, you can use the :meth:`.CDataFrame.describe` method to inspect a :class:`.CDataFrame` to get summary statistics of the numeric data.

.. code:: python

    import crandas as cd

    df = cd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [-1, -1, -1, -1, -1]})

>>> print(df.describe())
                    A        B
    0   type   integer  integer
    1  count         5        5
    2   mean       3.0     -1.0
    3    std  1.581139      0.0
    4    min         1       -1
    5    max         5       -1

See :ref:`numeric` for details

Assigning new columns
-----------------------
Assigning new columns to an existing table unfortunately is not as simple as it would be in pandas.
Using ``df["new_col"] = [5,4,3,2,1]`` will not work in crandas.
To add a column to a :class:`.CDataFrame`, we use the :meth:`.CDataFrame.assign` method.

.. code:: python

    df = cd.DataFrame({"col1":[1,2,3,4,5], "col2": [6,7,8,9,10]})

    # Create a new column called 'col3' which is equal to 'col1' - 1
    df = df.assign(col3=df["col1"] - 1)

    # Create a 2 new columns: "col4" which is equal to 'col1' + 1, and "col5" which is equal to "col2" - 1
    df = df.assign(col4=lambda x: x.col1 + 1, col5=lambda x: x.col2 - 1)

>>> print(df.open())
        col1  col2  col3  col4  col5
    0     1     6     0     2     5
    1     2     7     1     3     6
    2     3     8     2     4     7
    3     4     9     3     5     8
    4     5    10     4     6     9


Computing with columns
----------------------

Given a column, we can check whether its values are equal, greater (or equal) than or less (or equal) than a given value.

.. code:: python

    df = cd.DataFrame({
       "id": [1,2,3,4,5],
        "is_member": [0,1,1,0,1],
        "height": [175,162,151,160,180]
        }
    )

    #We check who has id=2
    id_is2 = df["id"] == 2
>>> print(id_is2.as_table().open())
    0  0
    1  1
    2  0
    3  0
    4  0

.. code:: python

    #We check which rows hold the information of someone who is 160cm or taller
    at_least160 = df["height"] >= 160
>>> print(at_least160.as_table().open())
    0  1
    1  1
    2  0
    3  1
    4  1

Of course, we probably want to do more than just creating this column. We can also filter based on this value (Find out more about filtering in :ref:`selecting`):

.. code:: python

    df = cd.DataFrame({
       "id": [1,2,3,4,5],
        "is_member": [0,1,1,0,1],
        "height": [175,162,151,160,180]
        }
    )

    # Filter the df such that only members are included (using ==)
    member = df[df["is_member"] == 1]

    # Print the mean member height
    print(member['height'].mean())

    # For not_member we can change '==' to '!='
    not_member = df[df['is_member'] != 1]

    # Print the mean not_member height
    print(not_member['height'].mean())

.. note::

    You can also use boolean operations over columns, like ``(df['is_member'] > 3) & (df['height'] < 160)``. We support *and* ``&``, *or* ``|`` and *xor* ``^``.

.. _basic_conditionals:

Conditionals over columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 1.15

    Addition of ``where()``, ``when().then()`` as well as conditionals over tables and support for null values.

Once you are able to check whether a column fulfills a certain property, you might want to create a new column with a value based on that.
Crandas has multiple ways to make this happen.
We can use the :meth:`.CSeries.if_else` method to create simple conditionals.

.. code:: python

    df = cd.DataFrame({
       "id": [1,2,3,4,5],
        "is_member": [0,1,1,0,1],
        "height": [175,162,151,160,180]
        }
    )

    # Create a column that shows the cms above 160 in the person's height or -1 if they are shorter
    df = df.assign(cms_above_160=(df["height"] >= 160).if_else(df["height"] - 160, -1))

We can use :meth:`.CSeries.if_else` to make categorical columns out of numerical ones, by nesting the function:

.. note::
    Crandas provides easier ways to create categorical columns, as seen in :ref:`categorical`.

.. code:: python

    # Creates 3 categories for height: 
    # 0 for below 160, 1 for between 160 and 170 and 2 for above 170
    df = df.assign(
        height_cats=(df["height"] >= 170).if_else(2, (df["height"] >= 160).if_else(1, 0))
    )


We can also use the :meth:`.CSeries.where` method when we want to only keep the value when a condition is met and leave a null value instead.

.. code:: python

    # This command returns a column where only the heights of members are shown
    # Non-member heights are turned to null values
    df["height"].where(df["is_member"]==1)

    # If we add an else value, we can fill that in whenever the condition is not met
    # Here, the height of non-members is set to 175 
    df["height"].where(df["is_member"]==1,175)


If we want to use multiple nested conditions, this syntax will quickly become hard to parse.
Crandas also provides ``when().then()`` syntax similar to polars.
This syntax can be used to more easily represent multiple conditionals and values using the function :func:`when`:

.. code:: python

    df = df.assign(
        is_it_three=cd.when(df["height"] > 175)
        .then("bigger")
        .when(df["height"] < 175)
        .then("smaller")
        .when(df["height"] == 175)
        .then("perfect")
    )


    # The function `cd.when()`` can be used to add multiple conditions
    df = df.assign(
        val=cd.when(
            cd.col("member") == 1,  # cd.col() allows us to refer to the column in the context of the table
            cd.col("height") > 175, # we can add multiple conditions that refer to different columns
        )
        .then("tall member")
        .otherwise(-1)          # .otherwise() gives us the value in the case no condition is met
    )

The :meth:`.CSeries.if_else` function can also be applied to a :class:`.CDataFrame` as a whole.
This is useful in situations where multiple columns are determined under the same condition.
For example, consider a table of food products where several columns need to be set based on the food's type.
Instead of applying :meth:`.CSeries.if_else` on the respective columns, it is also possible make a :class:`.CDataFrame` for each type of food, and select the correct instance using :meth:`.CSeries.if_else`:

.. code:: python

    cdf = cd.DataFrame({"product": ["apple", "crisps", "beer"],
                        "type": ["fruit", "snack", "alcoholic drink"]})
    cdf_fruit = cdf.assign(healthy=1, vat_tariff=0.09)     # values for case 1
    cdf_snack = cdf.assign(healthy=0, vat_tariff=0.09)    # values for case 2
    cdf_drink = cdf.assign(healthy=0, vat_tariff=0.21)    # values for case 3
    # Select cdf_fruit, cdf_snack, or cdf_drink based on cdf["type"]
    (cdf["type"]=="fruit").if_else(cdf_fruit,
        (cdf["type"]=="snack").if_else(cdf_snack, cdf_drink)).open()
    #   product             type  healthy  vat_tariff
    # 0   apple            fruit        1        0.09
    # 1  crisps            snack        0        0.09
    # 2    beer  alcoholic drink        0        0.21

Manipulating column names
-------------------------

Crandas provides methods to rename, add suffixes, or add prefixes to column names.

The :meth:`.CDataFrame.rename` allows you to rename specific columns by providing a dictionary, where the keys are the current column names, and the values are the new column names.

.. code:: python

    df = cd.DataFrame({"col1": [1,2,3], "col2": [4,5,6]})
    df = df.rename(columns={"col1": "A", "col2": "B"})
    >>> print(list(df.columns))
    ['A','B']

If instead you would like to add a prefix or a suffix to all column names in a :class:`.CDataFrame` then you should use either of the following methods:

- :meth:`.CDataFrame.add_suffix`
- :meth:`.CDataFrame.add_prefix`

.. code:: python

    df = cd.DataFrame({"col1": [1,2,3], "col2": [4,5,6]})

    df = df.add_prefix("_1")

    df = df.add_suffix('_2')

    >>> print(list(df.columns))
    ['_1col1_2','_1col2_2']

.. hint::

    Adding prefixes or suffixes to your column names can be useful when :ref:`merging tables<merge>` which have columns with the same names.

Now that we know how to work with columns, in the next section we will find out how to select specific rows and how to filter based on values found in the data.

.. _return-value-modes:
Return value modes
------------------
Crandas has different return value modes. By default, the result of a computation (e.g. an aggregation such as :meth:`.CSeriesColRef.mean()` or :meth:`.CSeriesColRef.sum()`) is directly 
disclosed to the user. However, it is also possible to keep the value secret, and only use it as an intermediate result to compute the final outcome. This can be done using ``mode="regular"``:

.. code:: python

    max1 = table1["col1"].max(mode="regular")
    max2 = table2["col1"].max(mode="regular")

    >>> max1
    <crandas.crandas.ReturnValue at 0x7fa019c94f90>

    # compute the product of the maximum elements in table1 and table2, without opening 
    # either of them
    product_max = max1 * max2
    >>> product_max.open()[0]
    12

    # filter on the elements having the maximum value, without revealing this value
    table1[table1['col1']==table1['col1'].max(mode='regular')]