crandas.crlearn

class crandas.crlearn.linear_model.CLinearRegression(instance=None)

Bases: CModel

Linear ridge regression model corresponding to the scikit-learn Ridge class.

Parameters:

  • alpha: regularization strength (see scikit-learn documentation); defaults to 1.0

Attributes:

  • n_features_in_: number of input features

  • feature_names_in_: input feature names

  • beta_: (encrypted) fitted parameters (intercept and respective feature coefficients)

fit(X, y, **query_args)

Fit a Linear Regression model on the data

Parameters:
  • X (CDataFrame) – Training data

  • y (CDataFrame) – Target data (should have only 1 column)

  • query_args – See queryargs

Return type:

self

get_beta(**query_args)

Get the fitted parameters (i.e. intercept and feature coefficients) as a table

This function is deprecated; instead, use CModel.open() to open the model, and use the returned beta_ attribute.

predict(X, **query_args)

Make predictions on a dataset using a linear regression model

Note: this returns predictions on the target, not probabilities!

Parameters:
  • X (CDataFrame) – predictor variables

  • query_args – See queryargs

Returns:

table containing the column consisting of the predicted target values

Return type:

CDataFrame

score(X, y, **query_args)

Score the linear regression model using the R^2 metric

Parameters:
  • X (CDataFrame) – Test data

  • y (CDataFrame) – Target test data (should have only 1 column)

  • query_args – See queryargs

Return type:

self

crandas.crlearn.linear_model.LinearRegression(alpha=0.0, *, fit_intercept=True, copy_X=True, n_jobs=None, positive=False, **query_args)

Create a new linear regression model (CLinearRegression) with given alpha (0.0 by default)

Other parameters are for compatibility with scikit-learn and cannot be overridden.

crandas.crlearn.linear_model.Ridge(alpha=1.0, *, fit_intercept=True, copy_X=True, max_iter=None, tol=None, solver='cholesky', positive=False, random_state=None, **params_and_query_args)

Create a new ridge regression model (CLinearRegression) with given alpha (1.0 by default)

Other parameters are for compatibility with scikit-learn and cannot be overridden.
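The effect of alpha can be illustrated on plaintext data. The following is a minimal sketch of the standard ridge closed form for a single feature with an unpenalized intercept (the scikit-learn convention); `fit_ridge_1d` is a hypothetical helper, not part of crandas, and it says nothing about how crandas computes the fit on secret-shared data.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one feature; the intercept is not penalized."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # alpha is added to the denominator, which shrinks the slope towards 0
    slope = sxy / (sxx + alpha)
    intercept = y_mean - slope * x_mean
    return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x

ols = fit_ridge_1d(xs, ys, alpha=0.0)    # the LinearRegression default
ridge = fit_ridge_1d(xs, ys, alpha=1.0)  # the Ridge default
print("alpha=0.0:", ols)
print("alpha=1.0:", ridge)
```

With alpha=0.0 the fit recovers the ordinary least-squares line; with alpha=1.0 the slope is shrunk and the intercept adjusts to compensate.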

class crandas.crlearn.logistic_regression.CLogisticRegression(instance=None)

Bases: CModel

Logistic regression classifier with the same parameters as the scikit-learn LogisticRegression class

See the scikit-learn documentation for its parameters.

Parameters:

  • type: type (binomial/multinomial/ordinal)

  • optimizer: optimizer used to fit the model (see crandas.crlearn.optimizer.OptimizerParams)

  • max_iter: number of iterations to perform

  • warm_start: whether to continue fitting from the previous optimizer state

Attributes:

  • feature_names_in_: input feature names

  • n_classes_: number of output classes

  • feature_name_out_: output feature name

  • optimizer_: attributes of the optimizer used to fit the model (see crandas.crlearn.optimizer.OptimizerAttributes)

  • beta_: (encrypted) fitted parameters (intercept(s) and coefficients)

fit(X, y, *, n_classes=None, sample_weight=None, **query_args)

Fit a Logistic Regression model on the data

Parameters:
  • X (CDataFrame) – Training data

  • y (CDataFrame) – Target data (should have only 1 column)

  • n_classes (int or None) – Number of output classes (categories). For binomial models, if not given, n_classes is assumed to be equal to two. For other models, if not given, the number of classes is derived from the metadata of y.

  • sample_weight (None) – Not supported

  • query_args – See queryargs

Returns:

self

Return type:

CLogisticRegression

from_beta(*, type='binomial', n_classes=2, feature_names_in, feature_name_out='out')

Upload a pre-fitted logistic regression model

Parameters:
  • beta (list[float]) – Fitted parameters

  • type (str, default "binomial") – Type of model ("binomial"/"multinomial"/"ordinal")

  • n_classes (int, default 2) – Number of classes

  • feature_names_in (list[str]) – Input feature names

  • feature_name_out (str, default "out") – Output feature name

Returns:

Logistic regression model with given parameters

Return type:

CLogisticRegression

predict(X, decision_boundary=0.5, **query_args)

Make (binary) predictions on a dataset using a logistic regression model

Note: this returns binary predictions, not probabilities!

Parameters:
  • X (CDataFrame) – predictor variables

  • decision_boundary (float) – number between 0 and 1; records with a probability below this value are classified as 0, and records with a probability greater than or equal to it are classified as 1

  • query_args – See queryargs

Returns:

table containing the column consisting of the predicted target values

Return type:

CDataFrame
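The decision boundary rule can be sketched on plaintext probabilities. This is an illustration of the thresholding described above, not the crandas implementation; `predict_binary` is a hypothetical helper.

```python
def predict_binary(probabilities, decision_boundary=0.5):
    """Map probabilities to 0/1: below the boundary gives 0, otherwise 1."""
    return [1 if p >= decision_boundary else 0 for p in probabilities]


probs = [0.2, 0.5, 0.8]
default_labels = predict_binary(probs)      # boundary 0.5
strict_labels = predict_binary(probs, 0.6)  # a higher boundary yields fewer 1s
print(default_labels, strict_labels)
```

Note that a probability exactly equal to the boundary is classified as 1.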

predict_proba(X, **query_args)

Make (probability) predictions on a dataset using a logistic regression model

Note: this returns probabilities, not binary predictions

Parameters:
  • X (CDataFrame) – predictor variables

  • query_args – See queryargs

Returns:

table with columns representing predicted class probabilities per input record

Return type:

CDataFrame
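For a binomial model, the predicted probability is conventionally the logistic function applied to the linear predictor. The sketch below assumes that the fitted parameters are ordered as intercept followed by feature coefficients, and it returns only the probability of class 1; both the ordering and the output layout are assumptions for illustration, and `predict_proba_binomial` is a hypothetical plaintext helper.

```python
import math


def predict_proba_binomial(beta, rows):
    """Return P(y = 1) per row; beta[0] is assumed to be the intercept."""
    probabilities = []
    for row in rows:
        # Linear predictor: intercept plus weighted sum of the features
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        # Logistic (sigmoid) function maps z to a probability in (0, 1)
        probabilities.append(1.0 / (1.0 + math.exp(-z)))
    return probabilities


beta = [-1.0, 2.0]            # assumed layout: [intercept, coefficient]
rows = [[0.0], [0.5], [1.0]]  # one feature per record
probs = predict_proba_binomial(beta, rows)
print(probs)
```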

crandas.crlearn.logistic_regression.LogisticRegression(penalty='l2', *, optimizer='lbfgs', type='binomial', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver=None, max_iter=10, verbose=0, warm_start=False, n_jobs=None, l1_ratio=None, **query_args)

Create a new logistic regression model (CLogisticRegression).

See CLogisticRegression for the meaning of the parameters. Parameters not listed in that class have the same meaning as in scikit-learn but cannot be changed from their defaults.

crandas.crlearn.metrics.classification_accuracy(y, y_pred, n_classes=2, **query_args)

Compute the classification accuracy on class predictions

Parameters:
  • y (CDataFrame) – column with the actual values in range

  • y_pred (CDataFrame) – column with the predictions in range

  • n_classes (int) – number of classes (default = 2)

  • query_args – See queryargs

Returns:

fixed point number between 0 and 1

Return type:

CDataFrame
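As a plaintext illustration, classification accuracy is the fraction of records whose predicted class equals the actual class. The helper name `accuracy` is hypothetical; the crandas function computes the corresponding value on CDataFrame columns.

```python
def accuracy(y, y_pred):
    """Fraction of positions where the prediction matches the actual class."""
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    correct = sum(1 for actual, predicted in zip(y, y_pred) if actual == predicted)
    return correct / len(y)


y = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
acc = accuracy(y, y_pred)  # 4 of 5 predictions are correct
print(acc)
```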

crandas.crlearn.metrics.confusion_matrix(y, y_pred, n_classes=2, **query_args)

Compute the confusion matrix on class predictions

The y-axis of the result represents the true class; the x-axis represents the predicted class.

Parameters:
  • y (CDataFrame) – column with the actual values in range

  • y_pred (CDataFrame) – column with the predictions in range

  • n_classes (int) – number of classes (default = 2)

  • query_args – See queryargs

Returns:

matrix of size n_classes * n_classes

Return type:

CDataFrame
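The layout described above can be sketched in plain Python, with one row per true class and one column per predicted class. This assumes classes are encoded as integers 0 to n_classes - 1; `confusion_matrix_plain` is a hypothetical helper, not the crandas function.

```python
def confusion_matrix_plain(y, y_pred, n_classes=2):
    """matrix[t][p] counts the records with true class t and predicted class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y, y_pred):
        matrix[actual][predicted] += 1
    return matrix


y = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix_plain(y, y_pred, n_classes=3)
for row in cm:
    print(row)
```

The diagonal holds the correctly classified records, so its sum divided by the total count gives the classification accuracy.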

crandas.crlearn.metrics.mcfadden_r2(model, X, y, **query_args)

Compute the McFadden R^2 metric

Parameters:
  • model (CLogisticRegression) – logistic regression model

  • X (CDataFrame) – predictor variables

  • y (CDataFrame) – binary response variable (should have only 1 column)

  • query_args – See queryargs

Returns:

fixed point number between 0 and 1

Return type:

CDataFrame
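For reference, the textbook McFadden R^2 compares the log-likelihood of the fitted model with that of an intercept-only model. The sketch below works on plaintext predicted probabilities rather than on a model object, so its signature differs from the crandas function; the helper names are hypothetical.

```python
import math


def log_likelihood(y, probs):
    """Bernoulli log-likelihood of binary labels under predicted probabilities."""
    return sum(
        math.log(p) if label == 1 else math.log(1.0 - p)
        for label, p in zip(y, probs)
    )


def mcfadden_r2_plain(y, probs):
    """1 - lnL(model) / lnL(null), where the null model predicts the mean of y."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, probs)
    return 1.0 - ll_model / ll_null


y = [0, 0, 1, 1]
probs = [0.1, 0.2, 0.8, 0.9]
r2 = mcfadden_r2_plain(y, probs)
print(r2)
```

Because the deviance is -2 times the log-likelihood, the same value results from 1 - model deviance / null deviance.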

crandas.crlearn.metrics.model_deviance(model, X, y, **query_args)

Compute the model deviance

Parameters:
  • model (CLogisticRegression) – logistic regression model

  • X (CDataFrame) – predictor variables

  • y (CDataFrame) – binary response variable (should have only 1 column)

  • query_args – See queryargs

Returns:

non-negative fixed point number

Return type:

CDataFrame

crandas.crlearn.metrics.null_deviance(y, **query_args)

Compute the null deviance

Parameters:
  • y (CDataFrame) – binary response variable (should have only 1 column)

  • query_args – See queryargs

Note: both classes need to be present in y; otherwise the computation is undefined internally (logarithm of 0).

Returns:

non-negative fixed point number

Return type:

CDataFrame
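The requirement that both classes occur in y follows from the textbook definition of the null deviance, sketched here on plaintext data. The definition used (-2 times the log-likelihood of an intercept-only model, without normalization) is an assumption about what crandas computes, and `null_deviance_plain` is a hypothetical helper.

```python
import math


def null_deviance_plain(y):
    """-2 * log-likelihood of a model that always predicts the mean of y."""
    base_rate = sum(y) / len(y)
    if base_rate in (0.0, 1.0):
        # With a single class present, one of the terms would be log(0)
        raise ValueError("both classes must be present in y")
    log_lik = sum(
        math.log(base_rate) if label == 1 else math.log(1.0 - base_rate)
        for label in y
    )
    return -2.0 * log_lik


deviance = null_deviance_plain([0, 1, 1, 1])
print(deviance)

try:
    null_deviance_plain([1, 1, 1])
    single_class_failed = False
except ValueError:
    single_class_failed = True
print("single-class input rejected:", single_class_failed)
```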

crandas.crlearn.metrics.precision_recall(y, y_pred, **query_args)

Compute the precision and recall on predictions

Parameters:
  • y (CDataFrame) – column with the actual values (binary)

  • y_pred (CDataFrame) – column with the predictions (binary)

  • query_args – See queryargs

Returns:

two fixed point numbers between 0 and 1

Return type:

CDataFrame
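A plaintext sketch of the two quantities, using the standard definitions: precision is the share of predicted positives that are actually positive, and recall is the share of actual positives that were predicted as positive. `precision_recall_plain` is a hypothetical helper, and the order of the two returned values is an assumption.

```python
def precision_recall_plain(y, y_pred):
    """Return (precision, recall) for binary labels and binary predictions."""
    pairs = list(zip(y, y_pred))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


y = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1]
precision, recall = precision_recall_plain(y, y_pred)
print(precision, recall)
```

In this example there are 3 true positives, 2 false positives and 1 false negative. The sketch does not handle the case where there are no predicted or no actual positives.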

crandas.crlearn.metrics.score_r2(y, y_pred, **query_args)

Compute the R^2 metric on predictions

Parameters:
  • y (CDataFrame) – column with the actual values

  • y_pred (CDataFrame) – column with the predictions

  • query_args – See queryargs

Returns:

fixed point number of at most 1

Return type:

CDataFrame
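The standard definition of R^2 is one minus the ratio of the residual sum of squares to the total sum of squares. The plaintext sketch below follows that definition; `r2_plain` is a hypothetical helper used only for illustration.

```python
def r2_plain(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((actual - predicted) ** 2 for actual, predicted in zip(y, y_pred))
    ss_tot = sum((actual - y_mean) ** 2 for actual in y)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
good = r2_plain(y, [1.1, 1.9, 3.2, 3.8])  # close predictions, value near 1
bad = r2_plain(y, [5.0, 5.0, 5.0, 5.0])   # worse than predicting the mean
print(good, bad)
```

The value is 1 for perfect predictions and has no lower bound: predictions that are worse than always predicting the mean of y give a negative result.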

crandas.crlearn.metrics.tjur_r2(y, y_pred, **query_args)

Compute the Tjur R^2 metric on predictions

Parameters:
  • y (CDataFrame) – column with the actual values (binary)

  • y_pred (CDataFrame) – column with the predictions (probabilities!)

  • query_args – See queryargs

Returns:

fixed point number between -1 and 1

Return type:

CDataFrame
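Tjur's R^2 (the coefficient of discrimination) is conventionally defined as the mean predicted probability among the actual positives minus the mean predicted probability among the actual negatives. A plaintext sketch under that definition, with the hypothetical helper `tjur_r2_plain`:

```python
def tjur_r2_plain(y, probs):
    """Mean predicted probability for y == 1 minus the mean for y == 0."""
    positives = [p for label, p in zip(y, probs) if label == 1]
    negatives = [p for label, p in zip(y, probs) if label == 0]
    return sum(positives) / len(positives) - sum(negatives) / len(negatives)


y = [0, 0, 1, 1]
probs = [0.1, 0.2, 0.8, 0.9]
tjur = tjur_r2_plain(y, probs)
print(tjur)
```

This is why y_pred must contain probabilities rather than binary predictions, and why both classes must occur in y for the value to be defined.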