Changelog
=========

.. _engine-and-crandas-release-1.17.2:

1.17.2
------

Crandas
~~~~~~~

New features and enhancements:

-  Add support for Python 3.14.

.. _engine-and-crandas-release-1.17.1:

1.17.1
------

.. _crandas-1:

Crandas
~~~~~~~

Changes:

-  Provide progress updates to the user while crandas is connecting to
   the server.

.. _engine-and-crandas-release-1.17.0:

1.17.0
------

.. _crandas-2:

Crandas
~~~~~~~

Changes:

-  **BREAKING**: Enforce that scripts are executed in linear order in
   authorized mode. This functionality can be disabled (e.g. for
   backward compatibility) by passing ``linear=False`` to
   ``cd.script.record()``.
-  **BREAKING**: Enforce that approved scripts in authorized mode do
   not continue after an error. This check does not apply to stepless
   queries, when linear mode is disabled, or when
   ``cd.script.allow_errors`` is used.
-  **BREAKING APPROVAL**: Fix bug where ``train_test_split`` could not
   be used in script recordings if no ``random_state`` was given or if
   ``train_size``/``test_size`` were provided as floats. This fix breaks
   all existing approvals of scripts that use ``train_test_split``. To
   use such existing approvals, use
   ``cd.compat.model_selection.train_test_split`` instead. Note that the
   specific rows selected as training/test data also differ from those
   selected by the previous version of the function.
-  Add ``cd.script.end()`` which saves a script (if it is being
   recorded) or closes it (if it is being executed). The old
   ``cd.script.save()`` and ``cd.script.close()`` functions are now
   deprecated.
-  Add ``path`` parameter to ``cd.script.record()`` to specify the
   location of the ``.recording`` file at the start of the recording. If
   the script filename is specified here, it should not also be given at the end of
   the recording.
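
The row counts behind the ``train_size``/``test_size`` changes above
can be sketched in plain Python. This is an illustration, not crandas
code: ``split_sizes`` is a hypothetical helper that follows
scikit-learn's documented rounding convention (a float ``test_size`` is
rounded up, a float ``train_size`` is rounded down), and it is an
assumption that crandas behaves the same way. When both sizes are given
and do not add up to the full dataset, the remaining rows end up in
neither split.

.. code:: python

   import math


   def split_sizes(n_rows, train_size=None, test_size=None):
       # Hypothetical helper, not part of crandas: ints are row counts,
       # floats are fractions. Rounding follows scikit-learn (assumed here).
       if train_size is None and test_size is None:
           test_size = 0.25  # scikit-learn's default when neither is given
       n_test = n_train = None
       if test_size is not None:
           n_test = math.ceil(test_size * n_rows) if isinstance(test_size, float) else test_size
       if train_size is not None:
           n_train = math.floor(train_size * n_rows) if isinstance(train_size, float) else train_size
       # A missing size defaults to the complement of the other one
       if n_train is None:
           n_train = n_rows - n_test
       if n_test is None:
           n_test = n_rows - n_train
       if n_train + n_test > n_rows:
           raise ValueError("train and test sets together exceed the dataset")
       return n_train, n_test


   print(split_sizes(10, test_size=0.25))                  # (7, 3)
   print(split_sizes(10, train_size=0.5, test_size=0.25))  # (5, 3): 2 rows unused
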

New features and enhancements:

-  Add support for ``col.any()`` and ``col.all()`` on nullable columns.
-  Add option to slice a dataset based on a fraction, e.g.,
   ``cd.demo_table(10, 10).slice(slice(0.1), allow_fractions=True)``.

Fixes:

-  Fix bug where ``train_test_split`` would not work correctly if
   ``train_size`` and ``test_size`` were both provided and did not add
   up to the full dataset.
-  Fix bug where the ``shuffle`` option of ``train_test_split`` did not
   work.
-  Fix bug where R2-score in linear regression was computed incorrectly
   due to wrong scaling.
-  Fix bug where consistency of placeholders between multiple
   invocations of ``astype`` and ``concat`` would not be enforced.
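
For cross-checking the corrected R2 score on small plaintext datasets,
the conventional definition is ``1 - SS_res / SS_tot``, where
``SS_res`` is the sum of squared residuals and ``SS_tot`` the total sum
of squares around the mean. The plain-Python sketch below computes it;
``r2_score`` here is a hypothetical helper and does not reproduce
crandas' secure implementation or its internal scaling.

.. code:: python

   def r2_score(y_true, y_pred):
       # Plain-Python reference, not crandas code: R2 = 1 - SS_res / SS_tot
       mean = sum(y_true) / len(y_true)
       ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
       ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
       return 1 - ss_res / ss_tot


   print(r2_score([1, 2, 3], [1, 2, 4]))  # 0.5

A score of 1 means a perfect fit; predicting the mean for every row
gives a score of 0.
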

.. _engine-and-crandas-release-1.16.1:

1.16.1
------

.. _crandas-3:

Crandas
~~~~~~~

Changes:

-  Fix bug where, under high load conditions, the engine nodes could get
   out of sync and stop accepting new incoming queries.
-  Raise an error when attempting to aggregate over a column that is
   also used as a group-by key.

.. _engine-and-crandas-release-1.16.0:

1.16.0
------

This version of crandas includes the user's Python source code when
recording a script, so that users no longer need to upload their
analysis source code to the platform separately. Another notable
feature is support for standard errors in linear regression.

.. _crandas-4:

Crandas
~~~~~~~

Changes:

-  **Changed behavior:** The Python source code is now included by
   default when creating a ``.recording`` file. The default of
   ``config.settings.include_python_script`` is changed to ``True``.
-  Combine functions ``DataFrame()``, ``LinearRegression()``, … and
   classes ``CDataFrame``, ``CLinearRegression``, … into one. Instead of
   ``DataFrame(...)`` returning an object of type ``CDataFrame``, it now
   returns an object of type ``DataFrame``, and likewise for the other
   classes. The original names
   ``CDataFrame``, ``CLinearRegression``, … are kept as aliases for
   ``DataFrame``, ``LinearRegression``, …
-  Deprecate the ``name`` and ``dummy_for`` query arguments. To assign a
   name or dummy data mapping to an object ``obj``, please use
   ``obj.save(name=...)`` or ``obj.save(dummy_for=...)`` instead of
   directly supplying the name or dummy_for argument to the function
   that constructs ``obj``. For example, when uploading a table, use
   ``cd.upload_pandas_dataframe(df).save(name=...)`` instead of
   ``cd.upload_pandas_dataframe(df, name=...)``. This is to circumvent
   the problem that, when using the deprecated form, the name is
   overwritten before the upload succeeds.
-  **BREAKING** Fix a bug where ``CSeries.isin`` would exceed the
   recursion limit when called with collections containing about 1000
   items or more. This fix breaks existing authorizations.

New features and enhancements:

-  Add support for standard errors (``standard_error_``) of the estimated
   coefficients in linear regression. When a linear regression model is
   opened, the standard error of each estimated coefficient is now also
   provided:

   .. code:: python

      import crandas as cd
      from crandas.crlearn.linear_model import LinearRegression

      model = LinearRegression()
      model = model.fit(cd.demo_table(10, 1), cd.demo_table(10))
      model.open()["standard_error_"]

-  Add support for ``tab[name] = value`` syntax. For example,
   ``tab["name"] = tab["name"].lower()`` is now possible.

-  Add type hints to return types, making autocomplete work in most
   editors.

-  Add the ``ascending`` argument to ``CDataFrame.sort_values()``, to
   allow sorting on a column in descending order:

   .. code:: python

      table = cd.DataFrame({"a": [3, 1, 4, 5, 2], "b": [1, 2, 3, 4, 5]}, auto_bounds=True)
      # Sorts by column `a` in descending order
      sorted_table = table.sort_values("a", ascending=False)

-  Add support for ``col.min()`` and ``col.max()`` on nullable columns.

-  Add support for ``col.std()``.

-  Add support for boolean columns (previously, these were stored as
   integer columns).

-  Add support for ``cdf.set_axis`` to change column names.

-  Let the ``.save()`` method of ``DataFrame``, ``LogisticRegression``,
   etc. return the object itself, allowing syntax like
   ``cd.DataFrame(...).save("name").open()``.

-  When fitting a linear regression model ``model``, crandas now detects
   whether an all-zero feature column was provided. Such a column makes
   the fit singular, which can cause significant numerical instability
   and unreliable results. The outcome of this check is reported in the
   boolean ``model.singular_``.

-  Validate string-to-int conversion: an error is raised when the input
   contains invalid values.

-  Provide clearer error messages when using the result of a cancelled
   upload/computation, and when applying an operation to an object in a
   faulty state.
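
The standard errors and the singularity check (``model.singular_``)
described above follow textbook ordinary least squares. The plain-Python
sketch below shows both for a single feature plus an intercept: the
slope's standard error is ``sqrt(s2 / Sxx)`` and the intercept's is
``sqrt(s2 * (1/n + mean(x)**2 / Sxx))``, where ``s2`` is the residual
variance with ``n - 2`` degrees of freedom and ``Sxx`` is the sum of
squared deviations of the feature. The function ``fit_simple_ols`` is
hypothetical: crandas handles multiple features on secret-shared data,
and the sketch treats any constant feature column (not only an all-zero
one) as singular, since with an intercept such a column has ``Sxx = 0``.

.. code:: python

   import math


   def fit_simple_ols(x, y):
       # Hypothetical plain-Python reference: OLS with one feature plus an
       # intercept. Returns (coefficients, standard errors, singular flag).
       n = len(x)
       if n < 3:
           raise ValueError("need at least 3 rows for n - 2 degrees of freedom")
       x_mean = sum(x) / n
       y_mean = sum(y) / n
       sxx = sum((xi - x_mean) ** 2 for xi in x)
       if sxx == 0:
           # Constant (e.g. all-zero) feature: the normal equations are singular
           return None, None, True
       sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
       slope = sxy / sxx
       intercept = y_mean - slope * x_mean
       residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
       s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
       se_slope = math.sqrt(s2 / sxx)
       se_intercept = math.sqrt(s2 * (1 / n + x_mean ** 2 / sxx))
       return (intercept, slope), (se_intercept, se_slope), False


   coef, se, singular = fit_simple_ols([1, 2, 3], [1, 3, 2])
   print(coef, singular)  # (1.0, 0.5) False
   print(fit_simple_ols([0, 0, 0], [1, 2, 3]))  # (None, None, True)
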

Fixes:

-  Fix engine crash in exceptional circumstances (e.g. triggered by
   error handling in complex transaction queries).
-  Fix bug that prevented ``tab.astype()`` and ``tab.fillna()`` from
   working when the table contained a column with the name ``"name"``.
-  Fix bug where parsing and out-of-memory errors would sometimes be
   reported by the engine as an “Internal server error”.
-  Fix bug where some validated conversions using ``.astype()`` could
   lead to an “Internal server error”.

.. _engine-release-1.15.3:

1.15.3
------

.. _crandas-release-1.15.2:

1.15.2
------

This is a crandas-only bugfix release.

.. _crandas-5:

Crandas
~~~~~~~

-  Fix bug where, when connected to the engine in authorized mode,
   script recordings would be performed in design mode. Performing a
   script recording when connected in authorized mode is no longer
   possible (unless performing a dry run).

.. _engine-and-crandas-release-1.15.1:

1.15.1
------

This is a crandas-only bugfix release.

.. _crandas-6:

Crandas
~~~~~~~

-  Fix ``ModuleNotFoundError: No module named 'IPython'`` by making the
   IPython import optional.

.. _engine-and-crandas-release-1.15.0:

1.15.0
------

This release contains performance improvements, various usability
improvements, and some improvements to script recording. For script
recording, crandas can now automatically record the Python commands
used in the analysis, as well as the outputs on dummy data.

See the full changelog below.

.. _crandas-7:

Crandas
~~~~~~~

-  Add support for ``subset`` parameter in ``dropna``.
-  Add support for placeholders in model parameters (linear regression,
   random forest).
-  Add support for ``series.isin``.
-  Add support for ``.where``, ``.mask`` as an alternative to
   ``.if_else``.
-  Add support for encoding Unicode strings as bytes using UTF-8.
-  Add support for ``ddof`` parameter in ``col.var()``.
-  Add support for Polars-style ``.when().then()`` conditionals.
-  Add Pearson correlation (matrix) for CSeries and CDataFrames.
-  Add ``MinMaxScaler`` that replaces ``min_max_normalize`` and that
   can, after fitting, be applied multiple times.
-  Improve support for NULL values in ``if_else``, including using
   ``None``/``pd.NA`` as the if/else clause.
-  Extend the range of supported values for ``col.sqrt()``.
-  When making a recording, crandas can now automatically include the
   Python script used for the analysis in the ``.recording`` file. This
   is supported when using Jupyter Notebook or when directly executing a
   Python file. A future version of the platform will include this
   Python script in the approval flow. By default, all code, including
   comments, between ``script.record()`` and ``script.save()`` is
   included. To enable this functionality, use
   ``script.record(include_python_script=True)`` or
   ``config.settings.include_python_script = True``. To include the full
   Python script, additionally use
   ``script.record(include_full_script=True)`` or
   ``config.settings.include_full_script = True``.
-  When making a recording, crandas can now automatically include the
   outputs revealed by the recorded script on the design data in the
   ``.recording`` file. Similar outputs are revealed when the approved
   script is executed on production data. A future version of the
   platform will include these outputs in the approval flow. To enable
   this functionality, use ``script.record(include_outputs=True)`` or
   ``config.settings.include_outputs = True``.
-  Allow applying ``if_else`` to DataFrames, e.g.,
   ``cdf["a"].if_else(cdf, cdf2)``.
-  Major performance improvements to linear regression: its performance
   is improved by a factor of around 25.
-  Only show relevant digits for fixed-point bounds.
-  Improve performance of ``sort_values`` and ``groupby`` by a factor of
   around 2.
-  **BREAKING** When a placeholder is used at multiple places in a
   script, it is now enforced that the same value is used at each place.
-  Provide new API for logistic regression models:

   -  Logistic regressions now adhere to the ``CModel`` API, allowing
      among other things to upload/download fitted models, change
      parameters, re-fit, and use placeholders.
   -  **BREAKING** The output column names of ``.predict()``,
      ``.predict_proba()`` have changed for consistency with other
      types.
   -  Some functions and arguments have been moved or deprecated; see
      the deprecation warnings that are given.
   -  The original API remains accessible via
      ``crandas.compat.logistic_regression``.
   -  Make progress reporting work for all combinations of type
      (binomial/ordinal/multinomial) and optimizer (LBFGS/…).

-  Let a function applied to a value return a value instead of a series.
-  Fix bug in ``tab.shuffle()`` that dropped the nullability information
   when an explicit ``random_state`` was given.
-  Fix bug where, in some cases, a fixed-point value exactly equal to
   the provided maximum bound could not be uploaded.
-  Fix bug in ``astype`` when converting fixedpoint columns with
   non-default precision to a fixedpoint column type without specifying
   the precision. Now specifying a precision is mandatory in these
   cases.
-  The fractional column ctype is removed. When computing a ``mean`` or
   ``var`` of a column with ``mode="regular"``, the return value now has
   type ``fp``. This means that further secure computations can be
   performed on the secret return value. For example, the following is
   supported:

   .. code:: python

      import crandas as cd

      df = cd.DataFrame({"x": [1, 2, 3, 4, 5]})
      y = df["x"].var(mode="regular").sqrt()
      y.open()

-  Fix bug where crandas could not detect the type of a column with
   NULLs and non-unique index values.
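
The ``ddof`` ("delta degrees of freedom") parameter is presumably used
in the same sense as in NumPy and pandas: the sum of squared deviations
is divided by ``n - ddof``. The plain-Python sketch below applies this
to the data ``[1, 2, 3, 4, 5]`` from the ``var`` example above. The
helper ``variance`` is hypothetical, and the sketch makes no claim about
which ``ddof`` default crandas uses.

.. code:: python

   import statistics


   def variance(values, ddof=0):
       # Plain-Python illustration: divide the sum of squared deviations by n - ddof
       n = len(values)
       if n - ddof <= 0:
           raise ValueError("ddof must be smaller than the number of values")
       mean = sum(values) / n
       return sum((v - mean) ** 2 for v in values) / (n - ddof)


   data = [1, 2, 3, 4, 5]
   print(variance(data, ddof=0))  # 2.0 (population variance)
   print(variance(data, ddof=1))  # 2.5 (sample variance)

   # The standard library offers the same two variants
   print(statistics.pvariance(data), statistics.variance(data))

In the standard library, ``statistics.pvariance`` corresponds to
``ddof=0`` and ``statistics.variance`` to ``ddof=1``.
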

.. _engine-and-crandas-release-1.14.3:

1.14.3
------

This release adds a mechanism that allows the engine to recover from
certain faulty on-disk persistence states, which could arise from large
table uploads in versions 1.14.1 and earlier.

.. _engine-and-crandas-release-1.14.2:

1.14.2
------

.. _crandas-8:

Crandas
~~~~~~~

-  Add progress reporting for uploads and downloads.
-  Fix an issue with uploading a table: instead of a single POST
   request, crandas now uses several requests, each lasting up to
   ``settings.http_write_timeout`` seconds (default: 30). This mitigates
   the problem that proxy servers (including the one used by Roseman
   Labs) commonly set a timeout of 60 seconds on such requests, causing
   the connection to be killed.
-  Fix performance issue when uploading a large date column.

.. _engine-and-crandas-release-1.14.1:

1.14.1
------

.. _engine-and-crandas-release-1.14.0:

1.14.0
------

.. _crandas-9:

Crandas
~~~~~~~

New features
^^^^^^^^^^^^

-  Major performance improvements to fixed-point division: its
   performance is improved by a factor of around 10.
-  Add support for random forests (see
   ``crandas.crlearn.ensemble.RandomForestClassifier``).
-  Add conversion from non-nullable column types to nullable column
   types.
-  Add a save mode to ``StateObject.save()`` to save an object only if
   it does (or does not) already exist.
-  Add progress reporting for unsafe many-to-many join.
-  Add support for groupby on nullable columns when using
   ``dropna=False``.
-  Add function ``cd.get()`` to retrieve an object from the engine
   regardless of its type (to be used in place of ``cd.get_table``,
   etc.).
-  Add ``col.any()`` and ``col.all()`` functions for boolean columns.
-  Add compatibility with NumPy versions 1 and 2
-  Add support for conversion of fixedpoint column types to fixedpoint
   column types with lower precision.
-  Add support for conversion of fixedpoint column types to integer
   column types.
-  Add support for non-string column names and constructing a crandas
   dataframe from a NumPy array.
-  Add a more powerful interface for machine learning models, including
   linear regression models, via the ``CModel`` class. This interface
   allows users to upload and download models, retrieve and set their
   parameters, and retrieve their public attributes.

Changes
^^^^^^^

-  Allow using ``**query_args`` in ``obj.remove()``.
-  Report the size of the data in a CDataFrame in its
   Jupyter/command-line representation.
-  Improve dry-run mode such that table uploads and downloads can be
   simulated without needing a connection to the engine.
-  **BREAKING** The inner product function ``col1.inner(col2)`` requires
   that ``col1`` and ``col2`` have equal ``elements_per_value``.
-  **BREAKING** The inner product function on vectors
   ``col1.inner(col2)`` (e.g., ctypes ``int_vec`` or ``fp_vec``) now
   requires that ``col1`` and ``col2`` have vectors of equal length, and
   otherwise raises an error. Previously, the inner product was taken
   over the first *n* entries of both vectors, where *n* is the smaller
   of the two vector lengths.
-  Remove ``ClientError`` from crandas; instead, a relevant Python error
   or the new ``ConnectionFileError`` or ``ModelError`` is raised.
-  **BREAKING** Replace ``ServerError`` with ``EngineError``, adding new
   error codes and a new error structure.
-  When the computation of the variance in ``tab.describe()`` results in
   an overflow, return ``NaN`` instead of raising an exception.
-  Column information from ``table.columns`` is now better styled and
   reports data sizes per column.
-  Remove ``crandas.get2()``, which was only used for testing.
-  Remove support for slices with negative step sizes, which could give
   incorrect results.

Fixes
^^^^^

-  Fix parsing of fixed-point bounds of the form ``fp[min=0,max=1e+20]``
   (with a ``+`` in the exponent), which were incorrectly rejected.
-  Fix performance issues of ``col.min()`` and ``col.max()``.
-  Fix bug where the repr/Jupyter representation of tables with unknown
   structure (e.g., from running ``get_table`` in a transaction or in
   dry-run mode) would raise an error.
-  Fix bug where, in dry-run mode, the length of the randomly generated
   handles would be incorrect.
-  Fix bug where, if a placeholder was used in a script execution but no
   placeholder was used in the script recording, mismatches would not be
   detected.
-  Fix equality check between a ``ReturnValue`` and a constant, such as
   ``col.sum(mode="regular") == 0``.
-  Fix problem where performing a concat on a very large number of
   tables could result in I/O errors.
-  Fix incorrect slices on bytes columns in some cases.
-  Fix problem preventing uploading large nullable integers.
-  Fix bug where ``print_json`` query argument did not work outside of
   script recording.
-  Fix progress reporting for logistic regression.
-  Fix bug in ``function_to_json`` to compute correct ctype bounds.
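
The breaking change to the inner product on vectors (``int_vec``,
``fp_vec``) listed under Changes above can be illustrated on plain
Python lists. The functions ``inner_old`` and ``inner_new`` are
hypothetical stand-ins, not crandas code; the new behavior is shown as
raising ``ValueError``, which is not necessarily the exception type that
crandas raises.

.. code:: python

   def inner_old(u, v):
       # Illustration of the pre-1.14.0 behavior: zip() stops at the shorter
       # vector, so only the first min(len(u), len(v)) entries are used.
       return sum(a * b for a, b in zip(u, v))


   def inner_new(u, v):
       # Illustration of the 1.14.0 behavior: unequal lengths are rejected.
       if len(u) != len(v):
           raise ValueError(f"vector lengths differ: {len(u)} != {len(v)}")
       return sum(a * b for a, b in zip(u, v))


   print(inner_old([1, 2, 3], [4, 5]))     # 14, the third entry is silently ignored
   print(inner_new([1, 2, 3], [4, 5, 6]))  # 32
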

.. _engine-and-crandas-release-1.13.0:

1.13.0
------

.. _crandas-10:

Crandas
~~~~~~~

-  Improve progress reporting for multi-column groupby and sort_values.

-  Add integer to string conversion using ``int_col.astype("varchar")``.

-  ``cd.read_parquet`` now accepts keyword arguments, which are passed
   through to ``pyarrow.Table.to_pandas``/``ParquetFile``.

-  Add functions ``sum_squares()``, ``mean()``, ``var()``, ``count()``,
   ``min()``, ``max()`` to work on the result of series functions. For
   example, the following now works:

   .. code:: python

      table = cd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, auto_bounds=True)
      y = (table["a"] * table["b"]).mean()

   Previously, these functions only worked on direct columns of a
   DataFrame. Note that all of these functions take a ``mode`` keyword
   argument, which defaults to ``"open"`` (i.e., open the resulting
   value). To keep the result secret-shared, specify ``mode="regular"``.

   **BREAKING**: the ``sum()`` function was already implemented, but its
   ``mode`` argument is now keyword-only instead of positional. That is,
   ``(table["a"] * table["b"]).sum("regular")`` now needs to be
   specified as ``(table["a"] * table["b"]).sum(mode="regular")``.

-  Fix a problem where, in some Python configurations (e.g., from the
   Spyder Console), the time counter of the progress bar could stall

-  If crandas loses its connection to the server while running a query,
   it now retries instead of immediately raising an error (see the
   ``wait_for_reconnect`` configuration option).

-  Make information shown in the progress bar cleaner

-  **BREAKING**: Allow a script to continue running after a
   ``NoMatchingAuthorizationError``. Previously, in case of a mismatch
   during the use of a recorded script, it was not possible to fix the
   mismatch and continue using the script. This was because crandas
   moved on to try the next step of the script, even if a step failed.
   Now, crandas will keep trying the authorization for the failed step.
   It is still possible to manually skip a script step by using
   ``cd.script.current_script().skip()``.

-  Crandas can now connect to the same VDL server instance either in
   authorized or in design mode.

   -  The mode is typically specified by the connection file and can be
      retrieved using ``cd.base.session.mode``.
   -  A warning is now given when connecting in design mode. This
      warning can be disabled using the ``warn_design`` setting.
   -  **BREAKING**: The ``crandas.tool.package_connection_file`` script
      now requires ``--design``, ``--authorized``, or ``--no-mode`` to
      indicate which mode to use.

-  Add unsafe many-to-many join ``cd.unsafe.merge_m2m``. This join
   function is unsafe because it leaks statistical information to the
   servers; see documentation for details. Use with care.

-  Remove support for the ``bitlength`` query argument (deprecated since
   version 1.4.0).

-  When using crandas in Jupyter, the output of a CDataFrame is now
   styled using HTML.

-  Change the recommended extension of a script recording from ``.json``
   to ``.recording``.

-  Fix ``predict_proba`` to work on small numbers of rows (smaller than
   the number of features)

-  Fix ``predict`` and ``predict_proba`` functionality for ordinal
   logistic regression. Previously these functions could return
   incorrect results for this regression type.

-  Major performance improvements to fixed-point multiplication: its
   performance is improved by a factor of around 10, with network usage
   similar to that of integer multiplication.

-  Fix bug where, if no connection file is found, crandas would give an
   error message saying that “there are multiple connection files
   present”

-  Provide more detailed error messages for missing authorization,
   including a high-level hint for some frequently occurring mismatches

-  **BREAKING**: The ``crandas.experimental.stats`` module has been
   marked as stable and was moved to ``crandas.stats``. In addition:

   -  The ``stats.rankdata()`` function was added
   -  The ``stats.tiecorrect()`` function was added
   -  The output of ``stats.chi2_contingency()`` was extended to also
      provide the field ``expected_freq``, which contains the expected
      frequencies of the observations. The structure remains
      backwards-compatible otherwise. In addition, accuracy was
      improved.
   -  The Kruskal-Wallis test ``stats.kruskal()`` was generalized to
      properly handle duplicates. The option ``allow_duplicates`` has
      been removed since it is no longer applicable. The function
      performs stricter input validation to ensure the test is
      well-defined.
   -  The ``stats.chisquare()`` and ``stats.chi2_contingency()``
      functions now perform stricter input validation to ensure the test
      is well-defined and can be evaluated accurately.
   -  The ``stats.contingency.expected_freq()`` function performs
      stricter input validation to ensure it can be computed accurately.
      In addition, accuracy was improved.
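
For checking ``stats.rankdata()`` and ``stats.tiecorrect()`` on small
plaintext inputs, the plain-Python sketch below implements the
definitions used by SciPy's functions of the same names: tied values
receive the average of their ranks, and the tie-correction factor is
``1 - sum(t**3 - t) / (n**3 - n)`` over all groups of ``t`` tied values.
Both helpers are hypothetical references, and it is an assumption that
crandas uses the same tie-breaking method (SciPy's default,
``"average"``).

.. code:: python

   from collections import Counter


   def rankdata_avg(values):
       # Hypothetical reference: 1-based ranks, ties get the average of their ranks
       order = sorted(range(len(values)), key=lambda i: values[i])
       ranks = [0.0] * len(values)
       start = 0
       while start < len(order):
           end = start
           # Extend the run while the next sorted value equals the current one
           while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
               end += 1
           average = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
           for k in range(start, end + 1):
               ranks[order[k]] = average
           start = end + 1
       return ranks


   def tie_correction(values):
       # Hypothetical reference: 1 - sum(t**3 - t) / (n**3 - n) over tie groups.
       # SciPy's tiecorrect takes ranks; ties in the ranks match ties in the data.
       n = len(values)
       if n < 2:
           return 1.0
       ties = sum(t ** 3 - t for t in Counter(values).values())
       return 1.0 - ties / (n ** 3 - n)


   print(rankdata_avg([10, 20, 20, 30]))    # [1.0, 2.5, 2.5, 4.0]
   print(tie_correction([10, 20, 20, 30]))  # 0.9
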

.. _engine-and-crandas-release-1.12.3:

1.12.3
------

.. _engine-and-crandas-release-1.12.2:

1.12.2
------

.. _crandas-11:

Crandas
~~~~~~~

-  Fix a bug where reducing the bounds (e.g., using ``astype``) broke
   the server when it tried to switch to a smaller internal
   representation.

-  Allow fixedpoint ctype specification to define min and max as floats.
   For example:

   .. code:: python

      table = cd.DataFrame({"vals": [0.75, 1.25, 2.25]}, ctype={"vals": "fp[min=0.5,max=2.5]"})

-  **BREAKING**: Statistics functionality is moved to the
   ``cd.experimental`` namespace because of potential inaccuracies for
   large inputs

-  Add ``cd.experimental.stats.contingency.crosstab`` for computing
   contingency tables

-  Add ``crandas.CSeries.as_series`` to obtain a ``ReturnValue`` for the
   series

-  Add ``crandas.experimental.stats.chi2_contingency`` for performing a
   chi-square test on a contingency table

-  Add ``crandas.experimental.stats.contingency.expected_freq`` for
   computing expected frequencies

-  Fix ``crandas.experimental.stats.ttest_ind()`` to work with unequal
   variances for larger data

-  Fix ``crandas.experimental.stats.chisquare`` to provide better error
   message when input contains zeros

-  Include the ``crandas.compat``, ``crandas.crypto`` and
   ``crandas.experimental.stats`` packages in ``pyproject.toml`` via
   automatic package discovery.
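
For a two-dimensional contingency table, the expected frequencies are
the counts one would expect if rows and columns were independent: the
row total times the column total, divided by the grand total. The
plain-Python sketch below (hypothetical helper ``expected_frequencies``)
follows this textbook definition, which SciPy's
``scipy.stats.contingency.expected_freq`` also uses; it is not crandas'
implementation and only covers 2-D tables.

.. code:: python

   def expected_frequencies(observed):
       # Hypothetical reference for 2-D tables, not crandas code: the count
       # expected under independence is row_total * column_total / grand_total.
       row_totals = [sum(row) for row in observed]
       col_totals = [sum(col) for col in zip(*observed)]
       total = sum(row_totals)
       return [[r * c / total for c in col_totals] for r in row_totals]


   print(expected_frequencies([[10, 20], [30, 40]]))  # [[12.0, 18.0], [28.0, 42.0]]
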

.. _engine-and-crandas-release-1.12.1:

1.12.1
------

.. _crandas-12:

Crandas
~~~~~~~

-  Fix a bug in fixed-point multiplication; in certain cases this bug
   caused incorrect fixed-point multiplication results.
-  Fix an issue related to the printing of the min and max value of a
   fixed-point ctype
-  Fix bug where, in some cases, a schema mismatch between design and
   production would cause an exception rather than a user-friendly error
   message

.. _engine-and-crandas-release-1.12.0:

1.12.0
------

.. _crandas-13:

Crandas
~~~~~~~

-  Major performance improvements for comparisons and related operations
   like ``sort_values`` and ``groupby``. The performance of these
   operations is improved by a factor of around 10.

-  Fix bugs in ``substitute`` in the following edge-cases:

   -  Fix crash when the ``output_size`` parameter is set too high.
   -  Fix potential invalid substitutions when Unicode substitutions are
      applied to ASCII strings.

-  Add support for fixed-point arrays, including array functions
   ``inner`` and ``vsum``.

-  Add support for ``cd.cut`` on fixed-point values, bins and labels.
   For example:

   .. code:: python

      table = cd.DataFrame({"vals":[1.1, 2.1, 2.2, 3.3]}, ctype={"vals":"fp[precision=30]"})
      table = table.assign(cut=cd.cut(table["vals"], [1.5,2.5], labels=[0.1,0.2,0.3], add_inf=True))

-  Fix bug which prevented uploading bytes columns with more than 100
   000 values.

-  Add support for Base64 encoding and decoding:
   ``bytes_col.b64encode()`` and ``string_col.b64decode()``:

   .. code:: python

      table = cd.DataFrame({"base64": ["TWFu", "bGlnaHQgd29yaw=="]}, auto_bounds=True)
      table = table.assign(bytes=table["base64"].b64decode()) # table["bytes"] set to [b"Man", b"light work"]
      table["bytes"].b64encode() # equal to table["base64"]

-  Add support for performing a groupby on more than 100 000 unique
   values.

-  Add support for computing the square root ``cdf["col"].sqrt()``

-  Add module ``crandas.stats`` for performing a number of statistical
   tests

-  ``override_version_check`` can now be supplied to the Session
   constructor, so
   ``cd.connect("connection-file", override_version_check=False)`` now
   works. Additionally, the ``CRANDAS_OVERRIDE_VERSION_CHECK``
   environment variable was moved to the ``cd.config`` framework.

-  Support ``mode`` and other query arguments for ``CSeries.sum()``,
   e.g. so that the following now works without opening values

   .. code:: python

      table = cd.DataFrame({"values": [1, 2, 3]})
      x = (table["values"] ** 2).sum(mode="regular")
      # x is not opened

-  Improve support for using multiple sessions at the same time with the
   ``session`` query argument and the use of a session as a context
   manager

   **BREAKING**: A check is introduced to ensure that objects from
   different sessions are not mixed. (Previously, if multiple sessions
   to the same endpoint were used, it was possible to mix objects.)
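
The ``cd.cut`` example in this release (bins ``[1.5, 2.5]`` with
``add_inf=True`` and three labels) can be mirrored in plain Python to
predict its output. The ``cut_reference`` function below is a
hypothetical reference, not the crandas implementation; it assumes
right-closed intervals as in ``pandas.cut`` and that ``add_inf=True``
adds edges at minus and plus infinity around the given bins.

.. code:: python

   import bisect
   import math


   def cut_reference(values, bins, labels, add_inf=False):
       # Hypothetical plain-Python reference, not crandas code. Assumes
       # right-closed intervals (a, b] as in pandas.cut, and that add_inf=True
       # adds edges at -inf and +inf around the given bins.
       edges = [-math.inf, *bins, math.inf] if add_inf else list(bins)
       if len(labels) != len(edges) - 1:
           raise ValueError("need exactly one label per interval")
       result = []
       for v in values:
           i = bisect.bisect_left(edges, v)  # index of the first edge >= v
           if i == 0 or i == len(edges):
               raise ValueError(f"{v} lies outside the bins")
           result.append(labels[i - 1])  # v lies in (edges[i - 1], edges[i]]
       return result


   print(cut_reference([1.1, 2.1, 2.2, 3.3], [1.5, 2.5], [0.1, 0.2, 0.3], add_inf=True))
   # [0.1, 0.2, 0.2, 0.3]
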

.. _engine-and-crandas-release-1.11.0:

1.11.0
------

.. _crandas-14:

Crandas
~~~~~~~

-  Preserve the length (``len(df)``) of pandas DataFrames without columns

-  Add support for concatenating strings. For example:

   .. code:: python

      table = cd.DataFrame({"first_name": ["John", "Jan"], "last_name": ["Doe", "Jansen"]}, auto_bounds=True)
      full_names = table["first_name"] + " " + table["last_name"]
      full_names.open()

-  Add support for ``upper()``, similar to the existing ``lower()``. In
   addition, it is now also possible to only change the case of specific
   indices:

   .. code:: python

      table = cd.DataFrame({"name": ["john", "Jansen"]}, auto_bounds=True)
      table["name"].upper([0]).open() # Returns ["John", "Jansen"]
      table["name"].upper([1, 3, 5]).open() # Returns ["jOhN", "JAnSeN"]

-  ``crandas.tool.check_connection`` now cleans up tables after running
   a dummy computation

-  Perform parquet uploads through ``read_parquet``

-  Improved implementation of ``cd.merge``:

   -  Add support for right and one-to-many joins. Now, any combination
      of left/right/inner/outer one-to-one/one-to-many/many-to-one joins
      is supported.
   -  Improved support for nullable columns. When performing a non-inner
      join, now nullable columns are output (previously, a non-nullable
      column with the value zero was returned in some cases)
   -  Deal with key columns with different column names consistently
      with pandas, e.g., when joining with ``left_on="a"`` and
      ``right_on="b"``, return two columns with the left and right key
      values, respectively
   -  Add support for suffixes
   -  Improved progress reporting for one-to-many joins
   -  Allow trivial grouping-based join where the right table only has
      key columns

   **BREAKING**: The signature of the ``cd.merge`` function has not
   changed, but, because of the above changes, the resulting table may
   have a different set of columns and/or columns of different types
   than in previous versions. Moreover, the underlying VDL API command
   has been renamed and changed internally so that existing
   authorizations do not apply to the new merge. The old merge is
   available via crandas as ``cd.compat.merge_v1``.

-  Fix bug in bytes column when using non-ASCII characters. Opening such
   values could give incorrect results.

-  Add support for (fixed-point) division (``/``). For example:

   .. code:: python

      table = cd.DataFrame({"num": [1.2, 3.4, 5.6], "denom": [2.1, 4.3, 5.4]}, auto_bounds=True)
      table = table.assign(div = lambda x: x.num / x.denom)
      table = table.assign(rec_num = lambda x: 1 / x.num) # computing the reciprocal of num
      table.open()

-  Add support for ``strip()`` on string columns, which will remove the
   leading and trailing spaces.

-  Add support for floor division (``//``).

-  Add support for RIPEMD-160:

   .. code:: python

      import crandas.crypto.hash as hash

      tab = cd.DataFrame({"a": [b"Test 1", b"Test 2"]}, auto_bounds=True)
      h = hash.RIPEMD_160
      h.digest(tab["a"]).open()

      # HMAC is also supported
      tab_key = cd.DataFrame({"key": [bytes.fromhex("0123456789abcdef0123456789abcdef01234567")]}, auto_bounds=True)
      hmac = hash.HMAC_RIPEMD_160(tab_key["key"].as_value())
      hmac.digest(tab["a"]).open()

-  Add support for AES encryption:

   .. code:: python

      import crandas.crypto.cipher as cipher

      table = cd.DataFrame({"a": [bytes.fromhex("00112233445566778899aabbccddeeff")] * 2}, ctype={"a": "bytes[16]"})
      tab_key = cd.DataFrame({"key": [bytes.fromhex("000102030405060708090a0b0c0d0e0f")]}, ctype={"key": "bytes[16]"})
      aes_128 = cipher.AES_128(tab_key["key"].as_value())
      aes_128.encrypt(table["a"]).open()

-  Add support for ``len`` and slicing on bytes columns:
   ``bytes_col.len()`` and ``bytes_col[16:32]``

-  Add support for conversion to and from lowercase hex strings:
   ``string_col.hex_to_bytes()`` and ``bytes_col.bytes_to_hex()``

-  Add support for encoding ASCII strings as bytes:
   ``ascii_col.encode()``

-  Add bitwise operations on bytes columns: AND (``&``), OR (``|``), XOR
   (``^``), NEGATE (``~``)

-  Add support for string substitution:

   .. code:: python

      table = cd.DataFrame({"a": ["PÆR", "á"]}, auto_bounds=True)
      table["a"].substitute({"a": ["á", "à", "ä"], "AE": ["Æ"]}, output_size=4).open()

-  Add support for filtering characters:

   .. code:: python

      table = cd.DataFrame({"a": ["Test string", "More"]}, auto_bounds=True)
      table["a"].filter_chars(["a", "e", "i", "o", "u"]).open()

-  Add support for reading stored analyst, approver, and server keys in
   PEM format

-  Fix bug where uploading a series with only NULL values would give an
   error

-  Fix bug where ``repr(cdf)``, ``str(cdf)`` would not deal correctly
   with zero-row dataframes

-  Move ``auto_bounds`` from a ``Session`` property to a configuration
   variable (``crandas.config.settings.auto_bounds``). Setting this
   variable to ``True`` suppresses warnings about data-derived column
   bounds. Each session object now has a deprecated ``auto_bounds``
   property that gets/sets the configuration variable.

   **BREAKING:** This breaks the possibility of a user having two
   concurrent sessions with different ``auto_bounds`` values set.

-  Allow passing ``pd.read_csv`` arguments (e.g., ``delimiter``) to
   ``cd.read_csv``

-  Warn user when calling crandas in a conditional context (e.g., an
   ``if`` statement) during script recording. See documentation of the
   ``crandas.check_recording`` module for details.

-  Warn users to specify a ``validate`` argument when using ``merge``
   during script recording. See documentation of the
   ``crandas.check_recording`` module for details.

-  Allow specifying ``ctype`` as an argument to ``cd.Series``

-  Expose ``ctype`` and ``schema`` of a column as properties of the
   classes ``Col`` (``cdf.columns.cols[ix]``) and ``CSeriesColRef``
   (``cdf["col"]``); and of a ``CDataFrame``

-  Add function ``CDataFrame.astype()`` that converts the type of
   individual columns (via the ``ctype`` parameter) or of the full
   CDataFrame (via the ``schema`` parameter)

-  Add a ``schema`` parameter to the ``upload_pandas_dataframe``,
   ``read_csv``, ``DataFrame``, and ``read_parquet`` functions. For the
   ``ctype`` parameter, a warning is given if the corresponding column
   does not exist

-  Add functions ``pandas_dataframe_schema`` and ``read_csv_schema``
   that return the schema corresponding to a DataFrame or CSV file

-  A server-side schema check for ``get_table`` is introduced. When
   ``get_table`` is used in a script, the schema of the resulting table
   is stored in the recorded script. When the script is executed, the
   server checks that the table adheres to this schema.

   **BREAKING**: using ``get_table`` in a script now produces an error
   if the tables do not match between recording and executing the
   script; see the documentation of ``get_table`` for details

-  Add tilde (``~``) expansion to ``cd.base.Session.connect()``

-  Improved error messages for: using ``get_table`` on a non-dummy
   handle in script recording; invalid arguments to ``cut``,
   e.g. non-integer bins or labels; sending unauthorized queries where
   authorization is needed; invalid ``how`` argument to ``merge``; use
   of ``None``-like values in functions (e.g., ``x.if_else(y, None)``);
   use of unknown ctypes (e.g., ``ctype={"a": "str"}``); uploading
   fixed-point columns where integers may be intended (e.g., uploading
   ``pd.Series([1, 2, None])``)

-  Fix bug where the use of a value placeholder (e.g.,
   ``cdf.assign(b=lambda x: x.a + cd.placeholders.Any(1))``) did not
   work in many cases

.. _engine-and-crandas-release-1.10.2:

1.10.2
------

This is a bugfix release.

.. _crandas-15:

Crandas
~~~~~~~

-  We update the pyformlang dependency to fix bugs in character ranges

.. _engine-and-crandas-release-1.10.1:

1.10.1
------

This is a bugfix release.

.. _crandas-16:

Crandas
~~~~~~~

-  We give better error messages when receiving unexpected responses
   from the server.
-  We fix a performance regression of the groupby operation, when it is
   performed on a single F64 column. It is now again as fast as in
   version 1.9.

.. _engine-and-crandas-release-1.10.0:

1.10.0
------

The major new feature is expanded support for fixed-point columns.

.. _crandas-17:

Crandas
~~~~~~~

-  Expanded support for fixed-point columns:

   -  Fixed point columns now support larger range and precision (96
      bits).
   -  Fixed point columns now support various statistical functions
      (``min()``, ``max()``, ``sum()``, ``sum_squares()``,
      ``mean()``, ``var()``).
   -  Support for arithmetic operations between two fixed point columns,
      and between fixed-point and integer columns is added. (NB: we do
      not yet support division; this will be added in a later release.)
   -  Support for concatenation of integer and fixed point columns
      (resulting in a fixed-point column) is added.
   -  Support for join and filtering on fixed point columns is added.
   -  Parsing of floats is supported in column operations that are used
      as filters or in ``assign``.

-  The new ``dropna`` function removes rows with any missing values from
   a CDataFrame.

-  The new ``save`` function can be used to save an object such as a
   CDataFrame. If persistence is enabled on the server, the object is
   kept across server restarts. The ``save`` command may also be used
   to attach a *name* to a computed table,
   e.g. ``table.save(name="my_table")``.

-  The connection file and ``Session`` now both have an optional
   ``api_token`` property. This is sent to the server and may be used
   for authentication purposes.

-  The functions ``obj.remove()`` and ``cd.remove_objects()`` have been
   changed to provide more information when non-existent objects are
   removed.

   **BREAKING**: when removing multiple objects using
   ``cd.remove_objects(lst)``, the new behavior is to try to remove all
   objects even if errors are encountered. The old behavior was to abort
   on the first error. See the documentation for details.

-  Support for division is added.

.. _engine-and-crandas-release-1.9.2:

1.9.2
-----

.. _engine-and-crandas-release-1.9.1:

1.9.1
-----

.. _crandas-18:

Crandas
~~~~~~~

No changes.

.. _engine-and-crandas-release-1.9.0:

1.9.0
-----

.. _crandas-19:

Crandas
~~~~~~~

-  The ``Session`` object now has two settings modes, depending on
   whether an engine connection file is used (recommended method), or
   whether the endpoint, certificate, and server public keys are
   specified manually (legacy method). These are reflected in the
   ``settings_mode`` attribute of the ``Session`` object.

   When ``endpoint`` is set by the user, the ``Session`` is set to
   legacy mode; otherwise, the connection file method is assumed. When
   the user does not configure anything, the default is to load the
   ``default.vdlconn`` file, residing in the configuration folder
   (default: ``~/.config/crandas``, overridable by the ``CRANDAS_HOME``
   environment variable). The name ``default.vdlconn`` can be overridden
   through the ``default_connection_file`` variable. If that file is not
   present, crandas scans the configuration folder for files with the
   extension ``.vdlconn``. If there is exactly one such file, it is
   used; if there are multiple, an error is raised.
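
   The lookup order can be sketched in plain Python. The helper
   ``find_connection_file`` is hypothetical and not part of crandas. The
   ``default_name`` parameter stands in for the
   ``default_connection_file`` variable. Raising ``FileNotFoundError``
   when no ``.vdlconn`` file exists at all is an assumption, because that
   case is not described here.

   .. code:: python

      import os
      from pathlib import Path

      def find_connection_file(default_name: str = "default.vdlconn") -> Path:
          """Sketch of the connection file lookup (hypothetical helper)."""
          # Configuration folder: CRANDAS_HOME if set, otherwise ~/.config/crandas
          config_dir = Path(os.environ.get("CRANDAS_HOME", Path.home() / ".config" / "crandas"))
          default_file = config_dir / default_name
          if default_file.is_file():
              return default_file
          # Fall back to scanning the folder for *.vdlconn files
          candidates = sorted(config_dir.glob("*.vdlconn"))
          if len(candidates) == 1:
              return candidates[0]
          if len(candidates) > 1:
              raise RuntimeError(f"multiple connection files found in {config_dir}")
          # Assumption: behavior when no file exists is not specified in the changelog
          raise FileNotFoundError(f"no connection file found in {config_dir}")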

   ``analyst_key`` is now a read-write property that returns the nacl
   ``SigningKey``. It can be set to a ``SigningKey``, a filename, a
   path, or None; when set to None, the default key is loaded. Both the
   default key file and the location that relative paths are resolved
   against depend on the settings mode:

   -  In connection file mode, the default key file is ``analyst.sk``. A
      path (a ``Path``, or a string that includes a slash “/”) is
      resolved relative to the current working directory; a filename (a
      string that does not include a slash) is assumed to reside in the
      configuration folder.
   -  In legacy mode, the default key file is ``clientsign.sk`` and it is
      resolved relative to ``base_path`` (to maintain backwards
      compatibility).
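
   These resolution rules can be sketched as a hypothetical helper,
   ``resolve_key_path``, which is not the crandas API. The mode strings
   ``"connection_file"`` and ``"legacy"`` are placeholders for the
   actual ``settings_mode`` values. Resolving every legacy-mode value
   against ``base_path`` is an assumption.

   .. code:: python

      from pathlib import Path
      from typing import Optional, Union

      def resolve_key_path(value: Optional[Union[str, Path]], mode: str,
                           config_dir: Path, base_path: Path) -> Path:
          """Sketch of analyst key resolution (hypothetical helper)."""
          if mode == "legacy":
              # Legacy mode: default clientsign.sk, resolved against base_path
              return base_path / (value if value is not None else "clientsign.sk")
          if value is None:
              value = "analyst.sk"  # default key file in connection file mode
          if isinstance(value, Path) or "/" in value:
              # A path is resolved relative to the current working directory
              return Path.cwd() / value
          # A bare filename is looked up in the configuration folder
          return config_dir / value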

-  Besides the ``Session`` object, which is used to configure the
   connection to the engine, we introduce
   `Dynaconf <https://www.dynaconf.com/>`__ for configuring settings
   that are not directly related to the connection. The new method
   provides an easy way for the user to set variables, either in code,
   using environment variables, or using a settings file (default:
   ``settings.toml`` in the same configuration folder referred to
   above).

-  We make displaying progress bars configurable using the
   ``show_progress_bar`` and ``show_progress_bar_after`` (for the delay
   in seconds) variables.

-  To create the configuration folder and open it in the user’s file
   browser, the user can now call ``python -m crandas config``.

-  We support the ``Any`` placeholder for ``get_table``

-  We support ``stepless`` mode in scripts, that can be manually enabled
   to remove ``script_step`` numbers from certain queries. This can be
   useful together with the ``Any`` placeholder, to have queries that
   can be executed a variable number of times.

-  Add a ``map_dummy_handles`` override in call to ``get_table``

-  In ``CDataFrame.assign``, we now support the use of column names that
   correspond to engine query arguments (e.g. “name”, “bitlength”).

   **BREAKING**: existing scripts that use these engine query arguments
   will now give an error message explaining how these arguments should
   be specified. Existing authorizations are not affected.

-  Add support for the following operators in regular expressions:

   -  ``{n}``: match exactly n times
   -  ``{min,}``: match at least min times
   -  ``{,max}``: match at most max times
   -  ``{min,max}``: match at least min and at most max times
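
   Python's standard ``re`` module uses the same notation for bounded
   repetition, so it can serve as a plaintext illustration of what each
   operator matches. This sketch assumes that the crandas operators have
   the standard meaning. Crandas regular expressions themselves are built
   with ``crandas.re.Re``, not with ``re``.

   .. code:: python

      import re

      # (pattern, text) pairs illustrating the bounded-repetition operators
      examples = [
          (r"a{3}", "aaa"),     # exactly 3 times: match
          (r"a{3}", "aa"),      # exactly 3 times: no match
          (r"a{2,}", "aaaaa"),  # at least 2 times: match
          (r"a{,2}", ""),       # at most 2 times (zero allowed): match
          (r"a{,2}", "aaa"),    # at most 2 times: no match
          (r"a{1,3}", "aa"),    # between 1 and 3 times: match
      ]
      results = [re.fullmatch(p, t) is not None for p, t in examples]
      print(results)  # [True, False, True, True, False, True]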

-  Support was added to disable HTTP Keep-Alive in connections to the
   engine server. This can help solve connection stability issues.
   Keep-Alive can be disabled in the connection file by setting
   ``keepalive = false``. The setting can be overridden by the user
   using the ``keepalive`` parameter of ``crandas.connect``.

-  Add ``sort_values`` function to a ``CDataFrame``, which sorts the
   dataframe according to a column. Example:

   .. code:: python

      cdf = cd.DataFrame({"a": [3, 1, 4, 5, 2], "b": [1, 2, 3, 4, 5]}, auto_bounds=True)
      cdf = cdf.sort_values("a")

   Currently, sorting on strings is not supported.

-  Add support for groupby on multiple columns and on all non-nullable
   column types.

   For example, this is now possible:

   .. code:: python

      cdf = cd.DataFrame({"a": ["foo", "bar", "foo", "bar"], "b": [1, 1, 1, 2]}, auto_bounds=True)
      tab = cdf.groupby(["a", "b"]).as_table()
      sorted(zip(tab["a"].open(), tab["b"].open()))

   The parameter name of the groupby is renamed from ``col`` to ``cols``
   to reflect these changes. Currently, a maximum of around 100 000
   unique values is supported. Above that, the groupby will fail and
   give an error message. Note that this is the number of *unique*
   values. The number of rows can be significantly higher as long as
   there are fewer than 100 000 different values in the groupby
   column(s). Furthermore, as a consequence of the new implementation,
   the output order is no longer stable but random.

-  Add k-nearest neighbors functionality. This allows the target value
   of a new data point to be predicted based on the existing data using
   its k nearest neighbors. Example:

   .. code:: python

      import crandas as cd
      from crandas.crlearn.neighbors import KNeighborsRegressor
      X_train = cd.DataFrame({"input": [0, 1, 2, 3]}, auto_bounds=True)
      y_train = cd.DataFrame({"output": [0, 0, 1, 1]}, auto_bounds=True)
      X_test = cd.DataFrame({"input": [1]}, auto_bounds=True)
      neigh = KNeighborsRegressor(n_neighbors=3)
      neigh.fit(X_train, y_train)
      neigh.predict_value(X_test)

   For more information, see
   ``crandas.crlearn.neighbors.KNeighborsRegressor``.
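
   The prediction rule can be illustrated with a plain-Python sketch of
   k-nearest-neighbors regression on a single feature. It assumes that
   the prediction is the unweighted mean of the targets of the k closest
   training points, as with scikit-learn's default uniform weights. This
   changelog does not state whether crandas weights neighbors the same
   way. The helper ``knn_predict`` is hypothetical.

   .. code:: python

      # Plain-Python k-NN regression sketch (hypothetical helper, uniform weights)
      def knn_predict(x_train: list, y_train: list, x: float, k: int) -> float:
          # Sort training indices by absolute distance to the query point;
          # sorted() is stable, so ties keep their input order
          order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
          nearest = order[:k]
          return sum(y_train[i] for i in nearest) / k

      # Same data as the example above: the 3 nearest inputs to 1 are 1, 0 and 2
      print(knn_predict([0, 1, 2, 3], [0, 0, 1, 1], 1, k=3))  # 0.3333333333333333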

-  Add a new aggregator ``crandas.groupby.any`` that takes any value
   from the set of values and is faster than
   ``crandas.groupby.max``/``crandas.groupby.min``

-  In the HTTP connection to the engine server, use retries for certain
   HTTP requests to improve robustness

-  Add ``created`` property to dataframes and other objects indicating
   the date and time when they were uploaded or computed

-  Handle cancellation of a query by raising a
   ``QueryInterruptedError``. This replaces the previous behaviour of
   returning ``None`` and printing “Computation cancelled”. In ipython,
   the “Computation cancelled” message is still shown.

-  In the progress bar for long-running computations, show “no estimate
   available yet” as long as progress is at 0% (instead of a more
   cryptic notation).

-  Add functionality to list uploads to the engine. For more
   information, see: ``crandas.stateobject.list_uploads`` and
   ``crandas.stateobject.get_upload_handles``.

.. _vdl-and-crandas-release-1.8.1:

1.8.1
-----

Crandas fixes
~~~~~~~~~~~~~

-  ``crandas.get_table()`` now ensures ``connect()`` is called first
-  Fix upload and decoding of positive 64-bit numbers. Previously,
   uploading and downloading numbers in the range
   ``R = [2^63, 2^64 - 1]`` failed. We fix this issue by mimicking
   pandas behavior: a number in the range ``R`` is returned as an
   ``np.uint64``. In addition, ``np.uint64``, ``np.uint32``, and
   ``np.uint16`` are now recognized as integers when uploading.
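
   The decoding rule can be sketched with NumPy as a hypothetical helper
   (not crandas code). The sketch assumes that the value already fits in
   64 bits. Values in ``R`` become ``np.uint64`` and all other values
   become ``np.int64``.

   .. code:: python

      import numpy as np

      UINT64_ONLY_MIN = 2**63  # smallest value that does not fit in np.int64
      UINT64_MAX = 2**64 - 1

      def decode_int(value: int):
          """Pick a numpy integer type for a decoded value (hypothetical helper)."""
          if UINT64_ONLY_MIN <= value <= UINT64_MAX:
              return np.uint64(value)  # range R: only representable as unsigned
          return np.int64(value)       # every other 64-bit value fits in int64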

.. _vdl-and-crandas-release-1.8.0:

1.8.0
-----

Major new features include:

-  Support for bigger (96 bit) integers
-  Progress bars for running queries and the possibility of cancelling
   running queries
-  Memory usage improvements (client & server)
-  Null value (missing values) support for all column types
-  Searching strings using regular expressions
-  Added a date column type

.. _new-features-1:

New features
~~~~~~~~~~~~

-  Support for columns with bigger (96 bit) integers

   Just like in the previous version, integers have the ctype ``int``.
   When specifying the ctype, minimum and maximum bounds for the values
   can be supplied using the ``min`` and ``max`` parameters,
   e.g. ``int[min=0, max=1000]``. Bounds (strictly) between -2^95 and
   2^95 are now supported.

   For example, to upload a column ``"col": [1, 2, 3, 4]`` as an ``int``
   use the following ``ctype spec``:

   .. code:: python

      table = cd.DataFrame({"col":[1, 2, 3, 4]},  ctype={"col": "int[min=1,max=4]"})

   as before.

   To force usage of a particular modulus the integer ctype accepts the
   keyword argument ``modulus``, which can be set to either of the
   moduli that are hardcoded in ``crandas.moduli``. For example, to
   force usage of large integers one can run:

   .. code:: python

      from crandas.moduli import moduli
      table = cd.DataFrame({"col":[1, 2, 3, 4]},  ctype={"col": f"int[min=1,max=4,modulus={moduli[128]}]"})

   Notes:

   -  crandas will automatically switch to
      ``int[modulus={moduli[128]}]`` if the (derived) bounds do not fit
      in an ``int32``.
   -  crandas will throw an error if the bounds do not fit in an
      ``int96``.

   We refer to 32-bit integer columns as F64, and 96-bit integer columns
   as F128, because they are internally represented as 64-bit and
   128-bit numbers, respectively, to account for a necessary security
   margin.
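
   The choice between F64 and F128 can be sketched as a hypothetical
   helper, ``select_field``, which is not the crandas API. It assumes
   that "fits in an ``int32``" means both bounds lie in
   ``[-2^31, 2^31 - 1]``. It also applies the strict ``-2^95 < bound <
   2^95`` limit stated above.

   .. code:: python

      INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
      INT96_BOUND = 2**95  # bounds must lie strictly between -2^95 and 2^95

      def select_field(min_value: int, max_value: int) -> str:
          """Choose an integer representation from bounds (hypothetical helper)."""
          if INT32_MIN <= min_value and max_value <= INT32_MAX:
              return "F64"   # 32-bit values, represented as 64-bit numbers
          if -INT96_BOUND < min_value and max_value < INT96_BOUND:
              return "F128"  # up to 96-bit values, represented as 128-bit numbers
          raise ValueError("bounds do not fit in an int96")

      print(select_field(1, 4), select_field(0, 2**40))  # F64 F128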

   Supported features for large integers:

   -  Basic binary arithmetic ``(+, -, *, ==, <, >, <=, >=)`` between
      any two integer columns
   -  Groupby and filter on large integers
   -  Unary functions on large integer columns, such as
      ``mean(), var(), sum(), ...``
   -  ``if_else`` where the 3 arguments ``guard``, ``ifval``,
      ``elseval`` may be any integer column
   -  Conversion from 32-bit integer columns to large integer columns
      via ``astype`` and vice versa
   -  Vertical concatenation of integer columns based on different
      moduli
   -  Performing a join on columns based on different moduli

   Current limitations:

   -  We do not yet support string conversion to large integers
   -  ``json_to_val`` only supports integers up to int32 for now
   -  IntegerList is only defined over F64 for now

   Changes:

   -  ``base.py``: deprecated ``session.modulus``
   -  ``crandas.py``: classes ``Col`` and ``ReturnValue`` now also expose
      the ``modulus``
   -  ``ctypes.py``:

      -  added support to encode/decode integers of 128 bits
      -  made ctype class decoding modulus dependent

   -  ``input.py``: ``mask`` and ``unmask`` are now dependent on the
      modulus
   -  ``placeholders.py``: class ``Masker`` now also contains a modulus
   -  NEW FILE ``moduli.py``: contains the default moduli for F64 as
      well as F128.

-  Searching strings and regular expressions

   To search a string column for a particular substring, use the
   ``CSeries.contains`` function:

   .. code:: python

      table = cd.DataFrame({"col": ["this", "is", "a", "text", "column"]})
      only_is_rows = table["col"].contains("is")
      table[only_is_rows].open()

   Regular expressions are also supported, using the new
   ``CSeries.fullmatch`` function:

   .. code:: python

      import crandas as cd
      import crandas.re  # makes cd.re.Re available
      table = cd.DataFrame({"col": ["this", "is", "a", "text", "column"]})
      starts_with_t = table["col"].fullmatch(cd.re.Re("t.*"))
      table[starts_with_t].open()

   Regular expressions support the following operations:

   -  ``|``: union
   -  ``*``: Kleene star (zero or more)
   -  ``+``: one or more
   -  ``?``: zero or one
   -  ``.``: any character (note that this also matches non-printable
      characters)
   -  ``(``, ``)``: regexp grouping
   -  ``[...]``: set of characters (including character ranges, e.g.,
      ``[A-Za-z]``)
   -  ``\d``: digits (equivalent to ``[0-9]``)
   -  ``\s``: whitespace (equivalent to ``[ \t\n\r\f\v]``)
   -  ``\w``: alphanumeric and underscore (equivalent to
      ``[a-zA-Z0-9_]``)
   -  ``(?1)``, ``(?2)``, …: substring (given as additional argument to
      ``CSeries.fullmatch()``)

   Regular expressions are represented by the class ``crandas.re.Re``.
   It uses pyformlang’s functionality under the hood.

-  Efficient text operations for ASCII strings

   The ``varchar`` ctype now has an ASCII mode for increased efficiency
   with strings that contain only ASCII characters (no “special”
   characters; all codepoints <= 127). Before this change, we only
   supported general Unicode strings. Certain operations (in particular
   comparison, searching, and regular expression matching) are more
   efficient for ASCII strings.

   By default, crandas autodetects whether or not the more efficient
   ASCII mode can be used. This information (whether or not ASCII mode
   is used) becomes part of the public metadata of the column, and
   crandas will give a ``ColumnBoundDerivedWarning`` to indicate that
   the column metadata is derived from the data in the column, unless
   ``auto_bounds`` is set to True.

   Instead of auto-detection, it is also possible to explicitly specify
   the ctype ``varchar[ascii]`` or ``varchar[unicode]``, e.g.:

   .. code:: python

      import crandas as cd

      # ASCII autodetected: efficient operations available; warning given
      cdf = cd.DataFrame({"a": ["string"]})

      # Unicode autodetected: efficient operations not available; warning given
      cdf = cd.DataFrame({"a": ["stri\U0001F600ng"]})

      # ASCII annotated; efficient operations available; no warning given
      cdf = cd.DataFrame({"a": ["string"]}, ctype={"a": "varchar[ascii]"})

      # Unicode annotated; efficient operations not available; no warning given
      cdf = cd.DataFrame({"a": ["string"]}, ctype={"a": "varchar[unicode]"})
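
   The autodetection rule (every codepoint <= 127) can be sketched as a
   hypothetical helper. It mirrors the description above and is not the
   crandas implementation.

   .. code:: python

      def detect_varchar_mode(values: list) -> str:
          """Return "ascii" if every codepoint is <= 127, else "unicode" (hypothetical)."""
          if all(ord(ch) <= 127 for value in values for ch in value):
              return "ascii"
          return "unicode"

      print(detect_varchar_mode(["string"]))            # ascii
      print(detect_varchar_mode(["stri\U0001F600ng"]))  # unicode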

-  Running computations can now be cancelled

   Locally aborting a computation (e.g. Ctrl+C) will now cause it to be
   cancelled on the server as well.

   -  Rename ``crandas.query`` to ``crandas.command`` to be consistent
      with the server-side implementation and to differentiate it from
      the new ``crandas.queries`` module
   -  Add module ``crandas.queries`` providing a client-side
      implementation of the task-oriented VDL query API, and use this
      for all queries performed via ``vdl_query``. To perform queries, a
      block-then-poll strategy is used: first, a blocking query with a
      timeout of 5 seconds is performed; if the result is not ready by
      then, status update polls are done at a 1-second interval
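
   The block-then-poll strategy can be sketched as follows. The
   callables ``submit`` and ``poll`` are hypothetical stand-ins for the
   actual server requests, and ``run_query`` is not the crandas API. The
   default timeout and interval match the values described above.

   .. code:: python

      import time
      from typing import Callable, Optional

      def run_query(submit: Callable[[float], Optional[str]],
                    poll: Callable[[], Optional[str]],
                    block_timeout: float = 5.0,
                    poll_interval: float = 1.0) -> str:
          """Block-then-poll sketch with hypothetical ``submit``/``poll`` callables.

          ``submit(timeout)`` sends the query and blocks for up to ``timeout``
          seconds, returning the result or None if it is not ready yet.
          ``poll()`` requests a status update and returns the result or None.
          """
          result = submit(block_timeout)  # first, one blocking request
          while result is None:           # then poll at a fixed interval
              time.sleep(poll_interval)
              result = poll()
          return result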

-  All column types now support missing values

   All ctypes now support a ``nullable`` flag, indicating that values
   may be missing. It may also be specified using a question mark,
   e.g. ``varchar?``.

-  Progress reporting for long-running queries

   Queries that take at least 5 seconds now result in a progress bar
   being displayed that estimates the progress of the computation.

   To enable this in Jupyter notebooks, crandas should be installed with
   the ``notebook`` dependency flag; see below.

-  Various memory improvements for both server and client

-  Large data uploads and downloads are now automatically chunked

   Uploads are processed in batches of size
   ``crandas.ctypes.ENCODING_CHUNK_SIZE``.
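
   Batching of this kind can be sketched with a hypothetical ``chunks``
   helper. Its ``chunk_size`` parameter stands in for
   ``crandas.ctypes.ENCODING_CHUNK_SIZE``, whose value is not given here.

   .. code:: python

      from typing import Iterator, Sequence

      def chunks(rows: Sequence, chunk_size: int) -> Iterator[Sequence]:
          """Yield consecutive batches of at most ``chunk_size`` rows (hypothetical)."""
          if chunk_size <= 0:
              raise ValueError("chunk_size must be positive")
          for start in range(0, len(rows), chunk_size):
              yield rows[start:start + chunk_size]

      print([list(c) for c in chunks(list(range(7)), 3)])  # [[0, 1, 2], [3, 4, 5], [6]]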

-  Added a date column type

   Dates can now be encoded using the ``date`` ctype.

   -  Dates are limited to the range 1901/01/01 to 2099/12/31 for
      leap-year reasons
   -  Ability to subtract two dates to get number of days and add days
      to a date
   -  All comparison operators apply for date
   -  Created functions for ``year``, ``month``, ``day`` and ``weekday``
   -  Able to group over dates, merge and filter
   -  New ctype ``DateCtype`` converts strings (through
      ``pd.to_datetime``) and Python dates (``datetime.date``,
      ``datetime64`` and ``pd.Timestamp``) into crandas dates
   -  Helper subclass of ``CSeries`` ``_DT`` allows for pandas-style
      calling of date retrieval functions (``col.dt.year``) *and*
      standard calls (``col.year``).
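
   The changelog does not spell out the leap-year reasons. One likely
   explanation, offered here as an assumption, is that every year from
   1901 to 2099 that is divisible by 4 is a leap year. Within that range
   the simple divisible-by-4 rule is therefore exact. The check below
   uses Python's standard ``calendar`` and ``datetime`` modules, not
   crandas. It also illustrates date subtraction and day addition in
   plain Python.

   .. code:: python

      import calendar
      import datetime

      # Within 1901-2099, "divisible by 4" is exactly the leap-year rule:
      # 2000 is a leap year (divisible by 400), while 1900 and 2100 are not
      assert all(calendar.isleap(y) == (y % 4 == 0) for y in range(1901, 2100))
      assert not calendar.isleap(1900) and not calendar.isleap(2100)

      # Subtracting two dates gives a number of days; adding days gives a date
      delta = datetime.date(2024, 3, 1) - datetime.date(2024, 2, 1)
      print(delta.days)  # 29 (2024 is a leap year)
      print(datetime.date(2024, 2, 28) + datetime.timedelta(days=1))  # 2024-02-29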

.. _crandas-20:

Crandas
~~~~~~~

-  New dependencies: ``tqdm`` and ``pyformlang``

-  New dependency flag: ``notebook``, for features related to Jupyter
   notebooks. Use ``pip install crandas[notebook]`` to install these.

-  The urllib3 dependency is updated to ensure that
   ``assert_hostname = False`` works as expected

-  Documentation updates

-  Recording or loading a new script when there is already another
   script active now no longer gives an error, but a warning message is
   printed instead.

-  Support ``with_threshold`` for aggregation

   This adds support for
   e.g. ``table["column"].with_threshold(10).sum()``. Before this
   change, ``with_threshold()`` was only supported for filtering
   operations, e.g. ``table[filter.with_threshold(5)]``, and not for
   aggregation operations (min, max, sum, etc.).

   Note that the alternative that worked before
   ``table["column"].sum(threshold=5)`` is still supported, for both
   aggregation and filtering operations.

   Minor change: supplying both ``with_threshold()`` and a ``threshold``
   argument now raises a ``ValueError`` instead of a ``TypeError`` when
   these are different.

-  Implement a setter for ``base_path``

   The crandas ``Session`` object now supports setting ``base_path`` to
   either a string, a Path, or None. Retrieving the property always
   returns a Path.

-  Fix problem where calling size() on a groupby object would fail for
   int32 columns

-  Improved message for auto-determined bounds

   -  Collect all auto_bounds warnings from a data upload into a single
      warning message
   -  Allow setting ``auto_bounds`` globally in ``crandas.base.session``