.. _datetime: Working with dates ###################### Dates can be a very useful datatype to work with. We can use crandas to deal with date data, regardless of the format. In this section we will learn how to upload dates to the VDL and the date specific functions we have that allow us to extract properties from the date data that we have. Uploading Dates ---------------- When uploading dates as strings we have to be more explicit than when uploading integers or strings. In order for crandas to know that the column that you are trying to upload contains dates, you *must* specify the :ref:`ctype<_ctypes_section>`. .. code:: python import crandas as cd import pandas as pd # We generally don't recommend attempting to upload dates in different formats, as it might lead to issues dates = ["19910204", "30/08/23", "2023-09-06", "March 14, 2016"] # We need to add the ctype or it won't work tab = cd.DataFrame({"dates" : dates}, ctype={"dates": "date"}) There are certain data formats that are ambiguous. For example, it is not clear whether ``"01/02/2000"`` refers to the first of February or the second of January. When using ambiguous formats, you can add the following format specification to the ctype: .. code:: python # We need to add the ctype or it won't work tab = cd.DataFrame({"ambiguous_dates" : ["01/02/2000", "02/04/1991", "03/12/1993"]}, ctype={"ambiguous_dates": "date[format=d/m/Y]"}) The accepted options follow the python conventions. The accepted options can be found in `here `_. Currently, only ``-``, ``/``, ``/`` and empty space are accepted as separators (not ``,``). Python has multiple ways to represent dates, including ``datetime.date``, ``pandas.timestamp`` and ``numpy.datetime64``. When uploading an existing ``DataFrame`` with either representation, the resulting :class:`CDataFrames<.CDataFrame>` will have the relevant column as a date. .. code:: python pydates = pd.to_datetime(["19910204", "10/08/23", "2023-09-06", "March 14, 2016", "19010101", "20991231"]) tab = cd.DataFrame({"dates" : pydates}) .. warning :: Crandas assumes time-zone agnostic datetimes are uploaded. When uploading datetime objects, make sure that they share this format or there might be of-by-one errors. Date functions --------------- Given a column of dates, we might be interested on which of them happen in the same year or month. Crandas contains functions that allow the extraction of year, month, day, day of the year and weekday (starting on Monday) from any date. .. code:: python tab = tab.assign(weekday=tab["dates"].weekday()) tab = tab.assign(day_of_year=tab["dates"].day_of_year()) tab = tab.assign(day=tab["dates"].day()) tab = tab.assign(month=tab["dates"].month()) tab = tab.assign(year=tab["dates"].year()) >>> tab.open() dates weekday day_of_year day month year 0 1991-02-04 0 35 4 2 1991 1 2023-08-30 2 242 30 8 2023 2 2023-09-06 2 249 6 9 2023 3 2016-03-14 0 74 14 3 2016 When using this operation, the resulting column is an integer column and no longer has a date format, but acts like any other integer column. .. note:: We also allow the typical pandas notation ``tab["dates"].dt.year`` (also for ``day_of_week``, ``day_of_year``, ``day``, and ``month``) Calculations over dates ------------------------- We can also use crandas to find how many days happened between two dates or finding the date after a determined amount of days. To do this, we can simply add and subtract columns as normal. .. code:: python dates = ["19910204", "30/08/23", "2023-09-06", "March 14, 2016"] tab = cd.DataFrame({"dates" : dates, "days_for_next_app" : [12, 7, 20, 60]}, ctype={"dates": "date"}) tab = tab.assign(next_app=tab["dates"]+tab["days_for_next_app"]) tab = tab.assign(day_diff=tab["next_app"]-tab["dates"]) >>> tab.open() dates days_for_next_app next_app day_diff 0 1991-02-04 12 1991-02-16 12 1 2023-08-30 7 2023-09-06 7 2 2023-09-06 20 2023-09-26 20 3 2016-03-14 60 2016-05-13 60 Of course, some operations are not allowed, like adding two dates together or subtracting a date from an integer. We can also add an integer directly or a python date: >>> (tab["dates"]+30).as_table().open() 0 1991-03-06 1 2023-09-29 2 2023-10-06 3 2016-04-13 >>> (pd.to_datetime("20230906") - tab["dates"]).as_table().open() 0 11902 1 7 2 0 3 2732 .. note :: In crandas, we can only add and subtract *days*. It is not possible to add *months* or *years*. Filtering through dates ------------------------ We can also use crandas to filter over a table, we can choose a specific date range or choose any date after a particular one. Similar to the arithmetic operations, we can treat them as any column type, as long as we provide python date objects. .. code::python dates = ["19910204", "30/08/23", "2023-09-06", "March 14, 2016"] tab = cd.DataFrame({"dates" : dates}, ctype={"dates": "date"}) filter_date = pd.to_datetime("20160314") >>> tab[tab["dates"] >= filter_date] dates 0 2023-08-30 1 2023-09-06 2 2016-03-14 .. warning :: Due to technical limitations, crandas can only handle dates between `1901-01-01` and `2099-12-31`. After learning about dates, we will see what happens when we are working with missing data in the :ref:`next section`.