Working with dates¶
Dates can be a very useful datatype to work with. We can use crandas to deal with date data in a way where dates that were uploaded with different formats can still be correctly compared. In this section we will learn how to upload dates to the engine and the date specific functions we have that allow us to extract properties from the date data that we have.
Uploading Dates¶
When uploading dates as strings we have to be more explicit than when uploading integers or strings. In order for crandas to know that the column that you are trying to upload contains dates, you must specify the ctype.
import crandas as cd
import pandas as pd
# The dates are in ISO 8601 format, but formats accepted by strptime are accepted
dates = ["19910204", "20230830", "20230906", "20160314"]
# We need to add the ctype or it won't work
tab = cd.DataFrame({"dates" : dates}, ctype={"dates": "date"})
There are certain data formats that are ambiguous. For example, it is not clear whether "01/02/2000"
refers to the first of February or the second of January.
When using ambiguous formats, you can add the following format specification to the ctype:
# We need to add the ctype or it won't work
tab = cd.DataFrame({"ambiguous_dates" : ["01/02/2000", "02/04/1991", "03/12/1993"]}, ctype={"ambiguous_dates": "date[format=%d/%m/%Y]"})
The accepted options can be found in here.
Currently, only -
, /
, /
and empty space are accepted as separators (not ,
).
Python has multiple ways to represent dates, including datetime.date
, pandas.timestamp
and numpy.datetime64
.
When uploading an existing DataFrame
with either representation, the resulting CDataFrames
will have the relevant column as a date.
pydates = pd.to_datetime(["19910204", "10/08/23", "2023-09-06", "March 14, 2016", "19010101", "20991231"], format="mixed")
tab = cd.DataFrame({"dates" : pydates})
Warning
Crandas assumes time-zone agnostic datetimes are uploaded. When uploading datetime objects, make sure that they share this format or there might be off-by-one errors.
Date functions¶
Given a column of dates, we might be interested on which of them happen in the same year or month. Crandas contains functions that allow the extraction of year, month, day, day of the year and weekday (starting on Monday) from any date.
tab = tab.assign(weekday=tab["dates"].weekday())
tab = tab.assign(day_of_year=tab["dates"].day_of_year())
tab = tab.assign(day=tab["dates"].day())
tab = tab.assign(month=tab["dates"].month())
tab = tab.assign(year=tab["dates"].year())
>>> tab.open()
dates weekday day_of_year day month year
0 1991-02-04 0 35 4 2 1991
1 2023-08-30 2 242 30 8 2023
2 2023-09-06 2 249 6 9 2023
3 2016-03-14 0 74 14 3 2016
When using this operation, the resulting column is an integer column and no longer has a date format, but acts like any other integer column.
Note
We also allow the typical pandas notation tab["dates"].dt.year
(also for day_of_week
, day_of_year
, day
, and month
)
Calculations over dates¶
We can also use crandas to find how many days happened between two dates or finding the date after a determined amount of days. To do this, we can simply add and subtract columns as normal.
dates = ["19910204", "30/08/23", "2023-09-06", "March 14, 2016"]
tab = cd.DataFrame({"dates" : dates, "days_for_next_app" : [12, 7, 20, 60]}, ctype={"dates": "date"})
tab = tab.assign(next_app=tab["dates"]+tab["days_for_next_app"])
tab = tab.assign(day_diff=tab["next_app"]-tab["dates"])
>>> tab.open()
dates days_for_next_app next_app day_diff
0 1991-02-04 12 1991-02-16 12
1 2023-08-30 7 2023-09-06 7
2 2023-09-06 20 2023-09-26 20
3 2016-03-14 60 2016-05-13 60
Of course, some operations are not allowed, like adding two dates together or subtracting a date from an integer.
We can also add an integer directly or a python date:
>>> (tab["dates"]+30).as_table().open()
0 1991-03-06
1 2023-09-29
2 2023-10-06
3 2016-04-13
>>> (pd.to_datetime("20230906") - tab["dates"]).as_table().open()
0 11902
1 7
2 0
3 2732
Note
In crandas, we can only add and subtract days. It is not possible to add months or years.
Filtering through dates¶
We can also use crandas to filter over a table, we can choose a specific date range or choose any date after a particular one. Similar to the arithmetic operations, we can treat them as any column type, as long as we provide python date objects.
>>> tab[tab["dates"] >= filter_date]
dates
0 2023-08-30
1 2023-09-06
2 2016-03-14
Warning
Due to technical limitations, crandas can only handle dates between 1901-01-01 and 2099-12-31.
After learning about dates, we will see what happens when we are working with missing data in the next section.