Data structures in crandas
In crandas, there are a number of data structures that you need to know in
order to get started. The two fundamental ones are CSeries and DataFrames.
These are very similar to the Series and DataFrame objects that you find in
pandas, so they may be familiar to you already.
A DataFrame represents a table and is made up of CSeries that represent
the columns. They work similarly to the pandas DataFrame and Series
respectively, although they are structurally very different. These
classes allow you to interact with the tables that are stored in the
engine, but do not actually contain any data (in fact, they're not even
actual tables!). Their values are stored remotely in secret-shared form
across a number of servers.
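To give a rough intuition for what "secret-shared form" means, the toy sketch below splits each integer into additive shares, one per server, so that no single share reveals the value. This is a generic textbook illustration in plain Python and not necessarily the scheme crandas uses; the names `share`, `reconstruct`, and the modulus `PRIME` are assumptions made for this example.

```python
import secrets

# Illustrative modulus (an assumption for this sketch, not taken from crandas).
PRIME = 2**61 - 1


def share(value, n_servers=3):
    """Split `value` into `n_servers` additive shares modulo PRIME."""
    # All but the last share are uniformly random.
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    # The last share is chosen so that all shares sum to `value` mod PRIME.
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine the shares; this requires every share."""
    return sum(shares) % PRIME


column = [1, 2]
shared_column = [share(v) for v in column]            # one list of shares per value
recovered = [reconstruct(s) for s in shared_column]
print(recovered)  # [1, 2]
```

Each server would hold only one share per value, which on its own looks like a random number.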
DataFrame
A DataFrame represents a table and
allows you to interact with the table stored in the engine as if it were
a Python object. In most cases, you can treat it as if it simply were
a table. For example, if we have the name of a column, we can use
standard bracket notation to access it, as in table["A"].
There are some important differences. As expected, attempting to print a
DataFrame will not print the table,
but only some of the metadata that crandas needs to keep track of the
table.
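The idea of a handle that supports bracket access but prints only metadata can be sketched with a small stand-in class. This is a toy model in plain Python, not crandas code: `TableHandle`, `ColumnRef`, and the metadata fields shown are hypothetical, and the real crandas metadata may look different.

```python
class ColumnRef:
    """Toy stand-in for a reference to a remote column (holds no values)."""

    def __init__(self, table_handle, name):
        self.table_handle = table_handle
        self.name = name


class TableHandle:
    """Toy stand-in for a table object: metadata only, no data."""

    def __init__(self, handle, columns, n_rows):
        self.handle = handle          # identifier of the remote table
        self.columns = list(columns)  # column names
        self.n_rows = n_rows

    def __getitem__(self, name):
        # Bracket notation returns a reference to the column, not its values.
        if name not in self.columns:
            raise KeyError(name)
        return ColumnRef(self.handle, name)

    def __repr__(self):
        # Printing shows bookkeeping information, never the table contents.
        return (f"TableHandle(handle={self.handle!r}, "
                f"columns={self.columns}, rows={self.n_rows})")


table = TableHandle("example-handle", ["A", "B"], n_rows=2)
col = table["A"]
print(table)     # TableHandle(handle='example-handle', columns=['A', 'B'], rows=2)
print(col.name)  # A
```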
Privacy
To prevent information leakage, DataFrames
do not have an index.
CSeries
A CSeries is an object representing a
column or a function applied to one or more table columns. Crandas
processes operations lazily, only communicating with the server
when an output is needed. Some operations, such as a filter, only work
on a DataFrame and not on a
CSeries. For these cases, a
CSeries can be converted to a
DataFrame using
CSeries.as_table().
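>>> import crandas as cd
>>> # Assumed setup: a table with a single integer column A
>>> table = cd.DataFrame({"A": [1, 2]})
>>> series1 = table["A"]
>>> series2 = table["A"] + 1
>>> print(series1)
<crandas.crandas.CSeriesColRef object at 0x7fa2cc1dec80>
>>> print(series2)
<crandas.crandas.CSeriesFun object at 0x7fa2aa4505e0>
>>> print(series1.open())
0    1
1    2
# A column name can be added when using as_table()
>>> print(series2.as_table(column_name="A+1").open())
   A+1
0    2
1    3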
The names CSeriesFun and
CSeriesColRef refer to types of
CSeries: the former represents a
function and the latter a column reference.
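The distinction between a column reference and a function, together with lazy evaluation, can be illustrated with a toy model. The classes below (`FakeServer`, `LazyColRef`, `LazyFun`) are hypothetical stand-ins written for this sketch and are not part of crandas; in particular, the toy adds the constant locally, whereas crandas performs the computation on the engine.

```python
class FakeServer:
    """Stands in for the remote engine and counts how often it is contacted."""

    def __init__(self, columns):
        self.columns = columns
        self.requests = 0

    def fetch(self, name):
        self.requests += 1
        return list(self.columns[name])


class LazyColRef:
    """Toy column reference: remembers which column it points to."""

    def __init__(self, server, name):
        self.server = server
        self.name = name

    def __add__(self, constant):
        # Building an expression does not contact the server.
        return LazyFun(self, constant)

    def open(self):
        return self.server.fetch(self.name)


class LazyFun:
    """Toy function node: a pending 'column + constant' computation."""

    def __init__(self, source, constant):
        self.source = source
        self.constant = constant

    def open(self):
        # Only now is the underlying column requested and the result produced.
        return [v + self.constant for v in self.source.open()]


server = FakeServer({"A": [1, 2]})
series1 = LazyColRef(server, "A")   # column reference
series2 = series1 + 1               # function of a column
requests_before_open = server.requests
result = series2.open()
print(requests_before_open)  # 0
print(result)                # [2, 3]
print(server.requests)       # 1
```

No request is made while `series2` is being built; the single request happens when `open()` asks for an output.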
Columns
When accessing a column of a DataFrame,
we receive a CSeriesColRef.
This crandas object contains additional information, such as
the type of the column and the index of that column in the
table. These are accessed by .type()
and .ix()
respectively.
In Col("B", "i", 1), the i indicates that the column
contains integer values. For more information on this, you can look at
this section.
In the next section, we will see
how to do simple tabular manipulations of our
DataFrames.