crandas.crlearn

class crandas.crlearn.linear_model.LinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)

Bases: Ridge

Linear regression object using ordinary least squares, with the same parameters as the scikit-learn LinearRegression class.

See the scikit-learn LinearRegression documentation for its parameters.

Currently, this class inherits from Ridge because it is implemented in terms of ridge regression: setting alpha = 0 in Ridge yields ordinary least squares.
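The relationship between ridge regression and ordinary least squares can be checked on a small single-feature example. The sketch below is plain Python, not crandas code; the helper name `fit_ridge_1d` is hypothetical, and it assumes the scikit-learn convention in which the intercept is not penalized.

```python
def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge fit for one feature with an unpenalized intercept.

    Illustrative helper only; not part of crandas.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered sums of squares and cross-products.
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # The penalty alpha only enters the denominator of the slope.
    slope = s_xy / (s_xx + alpha)
    intercept = y_mean - slope * x_mean
    return intercept, slope


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # generated from y = 1 + 2x

# alpha = 0 removes the penalty, so the fit is ordinary least squares.
ols_intercept, ols_slope = fit_ridge_1d(x, y, alpha=0.0)
# alpha > 0 shrinks the slope towards zero.
ridge_intercept, ridge_slope = fit_ridge_1d(x, y, alpha=1.0)
```

With alpha = 0 the fit recovers the generating line exactly, while any positive alpha shrinks the slope.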

class crandas.crlearn.linear_model.LinearRegressionStateObject(reg_type=None, **kwargs)

Bases: StateObject

class crandas.crlearn.linear_model.Ridge(alpha=1.0, *, fit_intercept=True, copy_X=True, max_iter=None, tol=None, solver='auto', positive=None, random_state=None)

Bases: object

Linear ridge regression object with the same parameters as the scikit-learn Ridge class.

See the scikit-learn Ridge documentation for its parameters.

fit(X, y, sample_weight=None, **query_args)

Fit a Linear Regression model on the data

Parameters:
  • X (CDataFrame) – Training data

  • y (CDataFrame) – Target data (should have only 1 column)

  • sample_weight – array of weights assigned to individual samples (not yet supported)

  • query_args – See queryargs

Return type:

self

get_beta(**kwargs)

Get the fitted parameters (i.e. intercept_ and coef_ combined in 1 table named beta).

predict(X, **query_args)

Make predictions on a dataset using a linear regression model

Note: this returns predictions on the target, not probabilities!

Parameters:
  • X (CDataFrame) – predictor variables

  • query_args – See queryargs

Returns:

table containing the column consisting of the predicted target values

Return type:

CDataFrame

score(X, y, **query_args)

Scores the linear regression model using the R^2 metric

Parameters:
  • X (CDataFrame) – Test data

  • y (CDataFrame) – Target test data (should have only 1 column)

  • query_args – See queryargs

Return type:

self

class crandas.crlearn.logistic_regression.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=10, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None, classes=[], n_classes=2)

Bases: object

Logistic Regression Classifier Object with the same parameters as the Scikit-learn Logistic Regression Class

See the scikit-learn LogisticRegression documentation for its parameters.

fit(X, y, sample_weight=None, max_iter=None, warm_start=None, **query_args)

Fit a Logistic Regression model on the data

Parameters:
  • X (CDataFrame) – predictor variables

  • y (CDataFrame) – response variable (should have only 1 column); that column should contain integers.

  • sample_weight – array of weights assigned to individual samples (not yet supported)

  • max_iter (int) – per-call override of the max_iter setting; a deviation from scikit-learn (see the note below)

  • warm_start (bool) – per-call override of the warm_start setting; a deviation from scikit-learn (see the note below). If True, successive fits continue the approximation from where the previous fit stopped; if False, each successive fit starts from scratch.

  • query_args – See queryargs

Returns:

self

Return type:

LogisticRegression

Notes

Note

Compared to scikit-learn, fit() takes the additional parameters max_iter and warm_start. Scikit-learn treats max_iter and warm_start as object configuration that is set at construction. We prefer to give the user the freedom to deviate from the global setting in successive calls to fit().

The corresponding class attributes are instead used as the default values for each call to fit().
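The override pattern described in this note can be sketched as follows. The class `IterativeModelSketch` and its iteration counter are hypothetical stand-ins used only to show how per-call arguments fall back to constructor values; this is not the crandas implementation.

```python
class IterativeModelSketch:
    """Hypothetical stand-in; not the crandas LogisticRegression class."""

    def __init__(self, max_iter=10, warm_start=False):
        self.max_iter = max_iter
        self.warm_start = warm_start
        self.iterations_run = 0

    def fit(self, max_iter=None, warm_start=None):
        # Fall back to the constructor values when no override is given.
        max_iter = self.max_iter if max_iter is None else max_iter
        warm_start = self.warm_start if warm_start is None else warm_start
        if not warm_start:
            self.iterations_run = 0  # start from scratch
        self.iterations_run += max_iter  # placeholder for real solver steps
        return self


model = IterativeModelSketch(max_iter=10)
model.fit()                             # constructor defaults: 10 iterations
after_first = model.iterations_run
model.fit(max_iter=5, warm_start=True)  # continues from the previous fit
after_second = model.iterations_run
model.fit()                             # defaults again: restarts from scratch
after_third = model.iterations_run
```

Using None as the sentinel keeps the constructor values as the single source of defaults while still allowing a one-off deviation per call.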

get_beta(**kwargs)

Get the fitted parameters (i.e. intercept_ and coef_ combined in 1 table named beta).

predict(X, decision_boundary=0.5, **query_args)

Make (binary) predictions on a dataset using a logistic regression model

Note: this returns binary predictions, not probabilities!

Parameters:
  • X (CDataFrame) – predictor variables

  • decision_boundary (float) – number between 0 and 1; records with a probability below this value are classified as 0, and records with a probability greater than or equal to it are classified as 1

  • query_args – See queryargs

Returns:

column consisting of the predicted binary classes

Return type:

CDataFrame
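The effect of decision_boundary can be illustrated on plain Python lists. This is a minimal sketch of the thresholding rule stated above, with a hypothetical helper name `classify`; it does not call crandas and operates on unencrypted values.

```python
def classify(probabilities, decision_boundary=0.5):
    """Map probabilities to 0/1: below the boundary is 0, at or above is 1."""
    return [1 if p >= decision_boundary else 0 for p in probabilities]


probs = [0.10, 0.49, 0.50, 0.93]

labels_default = classify(probs)                         # boundary 0.5
labels_strict = classify(probs, decision_boundary=0.9)   # fewer positives
```

Note that a probability exactly equal to the boundary is classified as 1.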

predict_proba(X, **query_args)

Make (probability) predictions on a dataset using a logistic regression model

Note: this returns probabilities, not binary predictions

Parameters:
  • X (CDataFrame) – predictor variables

  • query_args – See queryargs

Returns:

column consisting of the predicted probabilities

Return type:

CDataFrame
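For intuition, the standard logistic model turns an intercept and coefficients (the quantities that get_beta combines into one table) into probabilities through the sigmoid function. The sketch below is plain Python under that textbook assumption; `predict_proba_sketch` is a hypothetical name, and how crandas evaluates this on secret data is not shown here.

```python
import math


def predict_proba_sketch(rows, intercept, coefficients):
    """Textbook logistic prediction: sigmoid of the linear predictor."""
    probabilities = []
    for row in rows:
        z = intercept + sum(c * v for c, v in zip(coefficients, row))
        probabilities.append(1.0 / (1.0 + math.exp(-z)))
    return probabilities


rows = [[0.0, 0.0], [1.0, 2.0]]
probs = predict_proba_sketch(rows, intercept=0.0, coefficients=[2.0, -0.5])
```

A linear predictor of 0 maps to a probability of exactly 0.5, which is why 0.5 is the natural default decision boundary.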

class crandas.crlearn.logistic_regression.LogisticRegressionStateObject(reg_type=None, **kwargs)

Bases: StateObject

crandas.crlearn.metrics.classification_accuracy(y, y_pred, n_classes=2, **query_args)

Compute the classification accuracy on class predictions

Parameters:
  • y (CDataFrame) – column with the actual values in range

  • y_pred (CDataFrame) – column with the predictions in range

  • n_classes (int) – number of classes (default = 2)

  • query_args – See queryargs

Returns:

fixed point number between 0 and 1

Return type:

CDataFrame
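Classification accuracy is the fraction of records whose predicted class equals the actual class. A minimal plain-Python sketch, with the hypothetical helper name `accuracy_sketch` and ordinary lists standing in for the CDataFrame columns:

```python
def accuracy_sketch(y_true, y_pred):
    """Fraction of positions where prediction and actual value agree."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
acc = accuracy_sketch(y_true, y_pred)  # 4 of 5 predictions are correct
```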

crandas.crlearn.metrics.confusion_matrix(y, y_pred, n_classes=2, **query_args)

Compute the confusion matrix on class predictions

The y-axis of the result represents the true class; the x-axis represents the predicted class.

Parameters:
  • y (CDataFrame) – column with the actual values in range

  • y_pred (CDataFrame) – column with the predictions in range

  • n_classes (int) – number of classes (default = 2)

  • query_args – See queryargs

Returns:

matrix of size n_classes * n_classes

Return type:

CDataFrame
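The axis convention can be made concrete with a small plain-Python sketch. The helper `confusion_matrix_sketch` is hypothetical and assumes classes are encoded as integers starting at 0; it follows the orientation described above, with rows indexed by the true class and columns by the predicted class.

```python
def confusion_matrix_sketch(y_true, y_pred, n_classes=2):
    """Count (true class, predicted class) pairs; rows are the true class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix_sketch(y_true, y_pred)
# cm[1][0] counts records whose true class is 1 but were predicted as 0.
```

The diagonal entries count correct predictions, so their sum divided by the number of records gives the classification accuracy.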

crandas.crlearn.metrics.mcfadden_r2(model, X, y, **query_args)

Compute the McFadden R^2 metric

Parameters:
  • model (LogisticModel) – logistic regression model

  • X (CDataFrame) – predictor variables

  • y (CDataFrame) – binary response variable (should have only 1 column)

  • query_args – See queryargs

Returns:

fixed point number between 0 and 1

Return type:

CDataFrame
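As a reference point, the textbook McFadden R^2 is one minus the ratio of the model deviance to the null deviance. The sketch below is plain Python that takes predicted probabilities directly instead of a model and X; the helper names are hypothetical, and whether crandas computes the metric in exactly this way is an assumption.

```python
import math


def deviance_sketch(y_true, probabilities):
    """-2 times the Bernoulli log-likelihood of the labels."""
    log_likelihood = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, probabilities)
    )
    return -2.0 * log_likelihood


def mcfadden_r2_sketch(y_true, probabilities):
    """1 - model deviance / null deviance (textbook definition)."""
    # The null model predicts the overall share of positives for every record.
    base_rate = sum(y_true) / len(y_true)
    null = deviance_sketch(y_true, [base_rate] * len(y_true))
    model = deviance_sketch(y_true, probabilities)
    return 1.0 - model / null


y_true = [0, 0, 1, 1]
probs = [0.2, 0.3, 0.7, 0.8]
r2 = mcfadden_r2_sketch(y_true, probs)
r2_null = mcfadden_r2_sketch(y_true, [0.5, 0.5, 0.5, 0.5])
```

A model that only predicts the base rate scores 0, and better-separated probabilities move the value towards 1.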

crandas.crlearn.metrics.model_deviance(model, X, y, **query_args)

Compute the model deviance

Parameters:
  • model (LogisticModel) – logistic regression model

  • X (CDataFrame) – predictor variables

  • y (CDataFrame) – binary response variable (should have only 1 column)

  • query_args – See queryargs

Returns:

non-negative fixed point number

Return type:

CDataFrame

crandas.crlearn.metrics.null_deviance(y, **query_args)

Compute the null deviance

Parameters:
  • y (CDataFrame) – binary response variable (should have only 1 column)

  • query_args – See queryargs

Note: both classes need to be present in y; otherwise the computation is undefined internally (logarithm of 0).

Returns:

non-negative fixed point number

Return type:

CDataFrame
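The requirement that both classes are present follows from the textbook formula for the null deviance, which takes the logarithm of the share of each class. A minimal plain-Python sketch, assuming that standard definition and using the hypothetical helper name `null_deviance_sketch`:

```python
import math


def null_deviance_sketch(y_true):
    """Deviance of the intercept-only model that predicts the base rate."""
    n = len(y_true)
    n_pos = sum(y_true)
    base_rate = n_pos / n
    # Both logarithms are evaluated, so base_rate must lie strictly
    # between 0 and 1, i.e. both classes must occur in y_true.
    return -2.0 * (
        n_pos * math.log(base_rate)
        + (n - n_pos) * math.log(1.0 - base_rate)
    )


balanced = null_deviance_sketch([0, 1, 0, 1])

try:
    null_deviance_sketch([1, 1, 1, 1])  # only one class present
    single_class_failed = False
except ValueError:
    # math.log(0.0) raises ValueError in Python.
    single_class_failed = True
```

In plain Python the single-class case fails loudly; in a setting where values are hidden the same input cannot be detected this way, which is why the caller must guarantee that both classes occur.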

crandas.crlearn.metrics.precision_recall(y, y_pred, **query_args)

Compute the precision and recall on predictions

Parameters:
  • y (CDataFrame) – column with the actual values (binary)

  • y_pred (CDataFrame) – column with the predictions (binary)

  • query_args – See queryargs

Returns:

two fixed point numbers between 0 and 1

Return type:

CDataFrame
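Precision is the share of predicted positives that are actually positive, and recall is the share of actual positives that were predicted as positive. The sketch below is plain Python with a hypothetical helper name; it assumes at least one predicted positive and at least one actual positive, so that neither denominator is zero.

```python
def precision_recall_sketch(y_true, y_pred):
    """Return (precision, recall) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0]
precision, recall = precision_recall_sketch(y_true, y_pred)
```

Here one of the two predicted positives is correct, and one of the three actual positives is found.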

crandas.crlearn.metrics.score_r2(y, y_pred, **query_args)

Compute the R^2 metric on predictions

Parameters:
  • y (CDataFrame) – column with the actual values

  • y_pred (CDataFrame) – column with the predictions

  • query_args – See queryargs

Returns:

fixed point number less than or equal to 1

Return type:

CDataFrame
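The usual definition of R^2 is one minus the ratio of the residual sum of squares to the total sum of squares, which explains why the value is bounded above by 1 but not below by 0. A plain-Python sketch with the hypothetical helper name `r2_sketch`:

```python
def r2_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

r2_perfect = r2_sketch(y_true, [1.0, 2.0, 3.0, 4.0])  # exact predictions
r2_mean = r2_sketch(y_true, [2.5, 2.5, 2.5, 2.5])     # always predict the mean
r2_bad = r2_sketch(y_true, [4.0, 3.0, 2.0, 1.0])      # worse than the mean
```

Predicting the mean of y scores 0, and predictions that are worse than the mean give a negative value.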

crandas.crlearn.metrics.tjur_r2(y, y_pred, **query_args)

Compute the Tjur R^2 metric on predictions

Parameters:
  • y (CDataFrame) – column with the actual values (binary)

  • y_pred (CDataFrame) – column with the predictions (probabilities!)

  • query_args – See queryargs

Returns:

fixed point number between -1 and 1

Return type:

CDataFrame
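Tjur's R^2 (the coefficient of discrimination) is the mean predicted probability among actual positives minus the mean predicted probability among actual negatives, which is why y_pred must hold probabilities and the result lies between -1 and 1. A plain-Python sketch with the hypothetical helper name `tjur_r2_sketch`, assuming both classes occur in y:

```python
def tjur_r2_sketch(y_true, probabilities):
    """Mean predicted probability of positives minus that of negatives."""
    pos = [p for y, p in zip(y_true, probabilities) if y == 1]
    neg = [p for y, p in zip(y_true, probabilities) if y == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


y_true = [0, 0, 1, 1]
probs = [0.2, 0.3, 0.7, 0.8]
tjur = tjur_r2_sketch(y_true, probs)
```

Passing binary predictions instead of probabilities would collapse the metric to a difference of class-wise hit rates, so the input should come from predict_proba rather than predict.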