.. _data-structures:

Data structures in crandas
############################

In crandas, there are a number of data structures that you need to know in order to get started. The two fundamental ones are :class:`.CSeries` and :class:`CDataFrames<.CDataFrame>`. These are very similar to the ``Series`` and ``DataFrame`` objects found in pandas, so they may already be familiar to you.

A :class:`.CDataFrame` represents a table and is made up of :class:`.CSeries` that represent its columns. They behave similarly to the pandas ``DataFrame`` and ``Series`` respectively, although they are structurally very different. These classes allow you to interact with the tables that are stored in the Virtual Data Lake (VDL), but they do not actually contain any data (in fact, they are not even actual tables!). Their values are stored remotely in `secret-shared` form across a number of servers.

DataFrames and Series
=====================

:class:`.CDataFrame`
--------------------

A :class:`.CDataFrame` represents a table and allows you to interact with the table stored in the VDL as if it were a Python object. In most cases, you can treat it as if it were simply a table. For example, if we have the name of a column, we can use standard bracket notation to access it:

.. code:: python

    import crandas as cd

    df = cd.DataFrame({'A': [1, 2], 'B': [3, 4]})
    colA = df['A']

There are some important differences. As expected, attempting to print a :class:`.CDataFrame` will not print the table, but only some of the metadata that crandas needs to keep track of the table:

>>> print(df)
7C7621642C15C81A52D6DA9562E3C5F8...[2x2]

.. note::

    Due to its structure, a :class:`.CDataFrame` does not have an index.

:class:`.CSeries`
-----------------

A :class:`.CSeries` is an object representing a column or a function applied to one or more table columns. Unlike a :class:`.CDataFrame`, it is not as straightforward to manipulate a :class:`.CSeries` directly.
crandas processes operations lazily, only communicating with the server when an output is needed. The object that a :class:`.CSeries` represents can be accessed through :meth:`.CSeries.as_table`.

.. code:: python

    df = cd.DataFrame({'A': [1, 2], 'B': [3, 4]})
    series1 = df["A"]
    series2 = df["A"] + 1

    >>> print(series1)

    >>> print(series2)

    >>> print(series1.as_table().open())
    0  1
    1  2
    >>> print(series2.as_table(column_name="A+1").open())
       A+1
    0    2
    1    3

The names :class:`.CSeriesFun` and :class:`.CSeriesColRef` refer to types of :class:`.CSeries`, the former representing a function and the latter a column reference.

Columns
^^^^^^^^

When accessing a column of a :class:`.CDataFrame`, we receive a :class:`.CSeriesColRef`. This crandas object contains additional information, such as the index of the column in its table and the type of the column. These are accessed with :meth:`.CSeriesColRef.ix` and :meth:`.CSeriesColRef.type` respectively.

>>> df["B"].ix()  # Show the index of column "B".
1
>>> df["B"].type()
Col("B", "i", 1)

Here, the ``i`` in ``Col("B", "i", 1)`` indicates that the column contains integer values. For more information on this, see :ref:`numeric`.

In the :ref:`next section`, we will see how to do simple tabular manipulations of our :class:`CDataFrames<.CDataFrame>`.
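
As a closing illustration of the lazy evaluation described for :class:`.CSeries`, the following is a toy sketch in plain Python. It is not how crandas is implemented and it does not use the crandas API: the ``LazyColumn`` class and the ``round_trips`` list are hypothetical names invented for this example. It only shows the general idea that arithmetic on a column builds a deferred expression, and that work happens only when the result is requested.

.. code:: python

    class LazyColumn:
        """Toy stand-in for a lazily evaluated column (hypothetical, not a crandas class)."""

        def __init__(self, compute, log):
            self._compute = compute  # zero-argument function that produces the values
            self._log = log          # shared list recording simulated server round trips

        def __add__(self, other):
            # Build a new deferred expression; nothing is computed at this point.
            return LazyColumn(lambda: [v + other for v in self._compute()], self._log)

        def open(self):
            # Only now is the (simulated) server contacted and the result materialised.
            self._log.append("round trip")
            return self._compute()


    round_trips = []
    col_a = LazyColumn(lambda: [1, 2], round_trips)
    col_a_plus_1 = col_a + 1              # still deferred, no round trip yet
    calls_before_open = len(round_trips)
    result = col_a_plus_1.open()          # triggers the evaluation
    calls_after_open = len(round_trips)

In this sketch, ``col_a + 1`` plays the role of ``df["A"] + 1`` and ``open()`` plays the role of requesting an output: the simulated round trip is recorded only when ``open()`` is called.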