.. _basic:

Basic table operations
#############################

This section covers essential functionality in crandas: inspecting tables, assigning new columns, computing with columns, boolean logic, and manipulating column names.

Inspecting :class:`CDataFrames<.CDataFrame>`
--------------------------------------------

While you cannot normally access the data stored in a :class:`.CDataFrame`, you can use the :meth:`.CDataFrame.describe` method to get summary statistics of its numeric data.

.. code:: python

    import crandas as cd

    df = cd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [-1, -1, -1, -1, -1]})

    >>> print(df.describe())
                     A        B
    0   type   integer  integer
    1  count         5        5
    2   mean       3.0     -1.0
    3    std  1.581139      0.0
    4    min         1       -1
    5    max         5       -1

See :ref:`numeric` for details.

Assigning new columns
-----------------------

Assigning new columns to an existing table works differently than in pandas: item assignment such as ``df["new_col"] = [5,4,3,2,1]`` does not work in crandas. To add a column to a :class:`.CDataFrame`, use the :meth:`.CDataFrame.assign` method.

.. code:: python

    df = cd.DataFrame({"col1": [1,2,3,4,5], "col2": [6,7,8,9,10]})

    # Create a new column called 'col3' which is equal to 'col1' - 1
    df = df.assign(col3=df["col1"] - 1)

    # Create two new columns: 'col4', which is equal to 'col1' + 1,
    # and 'col5', which is equal to 'col2' - 1
    df = df.assign(col4=lambda x: x.col1 + 1, col5=lambda x: x.col2 - 1)

    >>> print(df.open())
       col1  col2  col3  col4  col5
    0     1     6     0     2     5
    1     2     7     1     3     6
    2     3     8     2     4     7
    3     4     9     3     5     8
    4     5    10     4     6     9

Computing with columns
----------------------

Given a column, we can check whether its values are equal to, greater than (or equal to), or less than (or equal to) a given value.

.. code:: python

    df = cd.DataFrame(
        {
            "id": [1,2,3,4,5],
            "is_member": [0,1,1,0,1],
            "height": [175,162,151,160,180]
        }
    )

    # We check who has id=2
    id_is2 = df["id"] == 2

    >>> print(id_is2.as_table().open())
    0  0
    1  1
    2  0
    3  0
    4  0

.. code:: python

    # We check which rows hold the information of someone who is 160cm or taller
    at_least160 = df["height"] >= 160

    >>> print(at_least160.as_table().open())
    0  1
    1  1
    2  0
    3  1
    4  1

Of course, we probably want to do more than just create this column. We can also filter based on its values (find out more about filtering in :ref:`selecting`):

.. code:: python

    df = cd.DataFrame(
        {
            "id": [1,2,3,4,5],
            "is_member": [0,1,1,0,1],
            "height": [175,162,151,160,180]
        }
    )

    # Filter the df such that only members are included (using ==)
    member = df[df["is_member"] == 1]

    # Print the mean member height
    print(member['height'].mean())

    # For not_member we can change '==' to '!='
    not_member = df[df['is_member'] != 1]

    # Print the mean not_member height
    print(not_member['height'].mean())

.. note::

    You can also combine comparisons with boolean operations over columns, like ``(df['is_member'] == 1) & (df['height'] < 160)``. crandas supports *and* ``&``, *or* ``|`` and *xor* ``^``.

Conditionals over columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once you are able to check whether a column fulfills a certain property, you might want to create a new column whose values depend on the result. crandas has the :meth:`.CSeries.if_else` method for this. For example, by nesting calls to it you can turn a numerical column into a categorical one.

.. code:: python

    df = cd.DataFrame(
        {
            "id": [1,2,3,4,5],
            "is_member": [0,1,1,0,1],
            "height": [175,162,151,160,180]
        }
    )

    # Create a column that holds the number of cm above 160 in the person's height,
    # or -1 if they are shorter than 160
    df = df.assign(cms_above_160=(df["height"] >= 160).if_else(df["height"] - 160, -1))

    # Create 3 categories for height: 0 for below 160, 1 for 160 up to (but not including) 170,
    # and 2 for 170 and above
    df = df.assign(height_cats=(df["height"] >= 170).if_else(2, (df["height"] >= 160).if_else(1, 0)))

.. note::

    Crandas provides easier ways to create categorical columns, as seen in :ref:`categorical`.

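To see which values the nested :meth:`.CSeries.if_else` example above is expected to produce, the following plain-Python sketch mimics the element-wise selection on ordinary lists. The ``if_else`` helper here is a hypothetical stand-in written for illustration only; it is not the crandas API and performs no computation on secret-shared data.

.. code:: python

    def if_else(condition, if_true, if_false):
        """Element-wise selection over plain lists (hypothetical stand-in, not crandas).

        ``if_true`` and ``if_false`` may each be a list or a single scalar.
        """
        def at(value, i):
            # Pick the i-th element of a list, or reuse a scalar for every row
            return value[i] if isinstance(value, list) else value

        return [at(if_true, i) if c else at(if_false, i) for i, c in enumerate(condition)]

    heights = [175, 162, 151, 160, 180]

    # Comparison results encoded as 0/1, as in the comparison outputs shown earlier
    at_least_160 = [int(h >= 160) for h in heights]
    at_least_170 = [int(h >= 170) for h in heights]

    # Same logic as the cms_above_160 and height_cats columns
    cms_above_160 = if_else(at_least_160, [h - 160 for h in heights], -1)
    height_cats = if_else(at_least_170, 2, if_else(at_least_160, 1, 0))

    print(cms_above_160)  # [15, 2, -1, 0, 20]
    print(height_cats)    # [2, 1, 0, 1, 2]

Note that the inner condition is only consulted for rows where the outer one is false, which is why a height of exactly 160 falls into category 1 and a height of 170 or more falls into category 2.
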
Manipulating column names
----------------------------

Crandas provides methods to rename columns and to add prefixes or suffixes to column names. The :meth:`.CDataFrame.rename` method allows you to rename specific columns by providing a dictionary in which the keys are the current column names and the values are the new column names.

.. code:: python

    df = cd.DataFrame({"col1": [1,2,3], "col2": [4,5,6]})

    df = df.rename(columns={"col1": "A", "col2": "B"})

    >>> print(list(df.columns))
    ['A', 'B']

If you would instead like to add a prefix or a suffix to all column names in a :class:`.CDataFrame`, use one of the following methods:

- :meth:`.CDataFrame.add_suffix`
- :meth:`.CDataFrame.add_prefix`

.. code:: python

    df = cd.DataFrame({"col1": [1,2,3], "col2": [4,5,6]})

    df = df.add_prefix("_1")
    df = df.add_suffix('_2')

    >>> print(list(df.columns))
    ['_1col1_2', '_1col2_2']

.. hint::

    Adding prefixes or suffixes to your column names can be useful when :ref:`merging tables` which have columns with the same names.

Now that we know how to work with columns, the next section shows how to select specific rows and how to filter based on values found in the data.