Working with dates

Dates can be a very useful datatype to work with. We can use crandas to deal with date data in a way where dates that were uploaded with different formats can still be correctly compared. In this section we will learn how to upload dates to the engine and the date specific functions we have that allow us to extract properties from the date data that we have.

Uploading Dates

When uploading dates as strings we have to be more explicit than when uploading integers or strings. In order for crandas to know that the column that you are trying to upload contains dates, you must specify the ctype.

import crandas as cd
import pandas as pd

# The dates are in ISO 8601 format, but formats accepted by strptime are accepted
dates = ["19910204", "20230830", "20230906", "20160314"]
# We need to add the ctype or it won't work
tab =  cd.DataFrame({"dates" : dates}, ctype={"dates": "date"})

There are certain data formats that are ambiguous. For example, it is not clear whether "01/02/2000" refers to the first of February or the second of January. When using ambiguous formats, you can add the following format specification to the ctype:

# We need to add the ctype or it won't work
tab =  cd.DataFrame({"ambiguous_dates" : ["01/02/2000", "02/04/1991", "03/12/1993"]}, ctype={"ambiguous_dates": "date[format=%d/%m/%Y]"})

The accepted options can be found in here. Currently, only -, /, / and empty space are accepted as separators (not ,).

Python has multiple ways to represent dates, including datetime.date, pandas.timestamp and numpy.datetime64. When uploading an existing DataFrame with either representation, the resulting CDataFrames will have the relevant column as a date.

pydates = pd.to_datetime(["19910204", "10/08/23", "2023-09-06", "March 14, 2016", "19010101", "20991231"], format="mixed")
tab = cd.DataFrame({"dates" : pydates})

Warning

Crandas assumes time-zone agnostic datetimes are uploaded. When uploading datetime objects, make sure that they share this format or there might be off-by-one errors.

Date functions

Given a column of dates, we might be interested on which of them happen in the same year or month. Crandas contains functions that allow the extraction of year, month, day, day of the year and weekday (starting on Monday) from any date.

tab = tab.assign(weekday=tab["dates"].weekday())
tab = tab.assign(day_of_year=tab["dates"].day_of_year())
tab = tab.assign(day=tab["dates"].day())
tab = tab.assign(month=tab["dates"].month())
tab = tab.assign(year=tab["dates"].year())
>>> tab.open()
       dates  weekday  day_of_year  day  month  year
0 1991-02-04        0           35    4      2  1991
1 2023-08-30        2          242   30      8  2023
2 2023-09-06        2          249    6      9  2023
3 2016-03-14        0           74   14      3  2016

When using this operation, the resulting column is an integer column and no longer has a date format, but acts like any other integer column.

Note

We also allow the typical pandas notation tab["dates"].dt.year (also for day_of_week, day_of_year, day, and month)

Calculations over dates

We can also use crandas to find how many days happened between two dates or finding the date after a determined amount of days. To do this, we can simply add and subtract columns as normal.

dates = ["19910204", "30/08/23", "2023-09-06", "March 14, 2016"]
tab =  cd.DataFrame({"dates" : dates, "days_for_next_app" : [12, 7, 20, 60]}, ctype={"dates": "date"})

tab = tab.assign(next_app=tab["dates"]+tab["days_for_next_app"])
tab = tab.assign(day_diff=tab["next_app"]-tab["dates"])
>>> tab.open()
dates  days_for_next_app   next_app  day_diff
0 1991-02-04                 12 1991-02-16        12
1 2023-08-30                  7 2023-09-06         7
2 2023-09-06                 20 2023-09-26        20
3 2016-03-14                 60 2016-05-13        60

Of course, some operations are not allowed, like adding two dates together or subtracting a date from an integer.

We can also add an integer directly or a python date:

>>> (tab["dates"]+30).as_table().open()
0 1991-03-06
1 2023-09-29
2 2023-10-06
3 2016-04-13
>>> (pd.to_datetime("20230906") - tab["dates"]).as_table().open()
0  11902
1      7
2      0
3   2732

Note

In crandas, we can only add and subtract days. It is not possible to add months or years.

Filtering through dates

We can also use crandas to filter over a table, we can choose a specific date range or choose any date after a particular one. Similar to the arithmetic operations, we can treat them as any column type, as long as we provide python date objects.

>>> tab[tab["dates"] >= filter_date]
       dates
0 2023-08-30
1 2023-09-06
2 2016-03-14

Warning

Due to technical limitations, crandas can only handle dates between 1901-01-01 and 2099-12-31.

After learning about dates, we will see what happens when we are working with missing data in the next section.