crandas.crlearn
- class crandas.crlearn.linear_model.CLinearRegression(instance=None)
Bases: CModel
Linear ridge regression model corresponding to the scikit-learn Ridge class (see here).
Parameters:
alpha – regularization strength (see scikit-learn documentation); defaults to 1.0
Attributes:
n_features_in_ – number of input features
feature_names_in_ – input feature names
beta_ – (encrypted) fitted parameters (intercept and respective feature coefficients)
- fit(X, y, **query_args)
Fit a Linear Regression model on the data
- Parameters:
X (CDataFrame) – Training data
y (CDataFrame) – Target data (should have only 1 column)
query_args – See queryargs
- Return type:
self
- get_beta(**query_args)
Get the fitted parameters (i.e. intercept and feature coefficients) as a table.
This function is deprecated; instead, use CModel.open() to open the model, and use the beta_ attribute of the result.
- predict(X, **query_args)
Make predictions on a dataset using a linear regression model
Note: this returns predictions on the target, not probabilities!
- Parameters:
X (CDataFrame) – predictor variables
query_args – See queryargs
- Returns:
table containing the column consisting of the predicted target values
- Return type:
- score(X, y, **query_args)
Scores the linear regression model using the R2 metric
- Parameters:
X (CDataFrame) – Test data
y (CDataFrame) – Target test data (should have only 1 column)
query_args – See queryargs
- Return type:
self
- crandas.crlearn.linear_model.LinearRegression(alpha=0.0, *, fit_intercept=True, copy_X=True, n_jobs=None, positive=False, **query_args)
Create a new linear regression model (CLinearRegression) with the given alpha (0.0 by default).
Other parameters are for compatibility with scikit-learn and cannot be overridden.
- crandas.crlearn.linear_model.Ridge(alpha=1.0, *, fit_intercept=True, copy_X=True, max_iter=None, tol=None, solver='cholesky', positive=False, random_state=None, **params_and_query_args)
Create a new ridge regression model (CLinearRegression) with the given alpha (1.0 by default).
Other parameters are for compatibility with scikit-learn and cannot be overridden.
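To illustrate what the regularization strength alpha does, the following is a plaintext sketch in plain Python (it does not use crandas) of ridge regression with a single feature. It assumes the scikit-learn convention that the intercept is not penalized; the helper name ridge_fit_1d is hypothetical, and the fit that crandas performs on encrypted data may differ in its details.

```python
def ridge_fit_1d(xs, ys, alpha=1.0):
    """Fit y ~ intercept + slope * x with an L2 penalty on the slope only.

    Assumption: the intercept is left unpenalized, as in scikit-learn's Ridge.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered cross-product and sum of squares
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # alpha shrinks the slope towards zero; alpha = 0 gives ordinary least squares
    slope = sxy / (sxx + alpha)
    intercept = y_mean - slope * x_mean
    return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
print(ridge_fit_1d(xs, ys, alpha=0.0))  # ordinary least squares, as with LinearRegression's default
print(ridge_fit_1d(xs, ys, alpha=1.0))  # shrunken slope, as with Ridge's default
```

With alpha=0.0 the sketch recovers the exact line; with alpha=1.0 the slope is pulled towards zero and the intercept adjusts accordingly.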
- class crandas.crlearn.logistic_regression.CLogisticRegression(instance=None)
Bases: CModel
Logistic regression classifier object with the same parameters as the scikit-learn LogisticRegression class.
See here for its parameters.
Parameters:
type – type of model (binomial/multinomial/ordinal)
optimizer – optimizer used to fit the model (see crandas.crlearn.optimizer.OptimizerParams)
max_iter – number of iterations to perform
warm_start – whether to continue fitting from the previous optimizer state
Attributes:
feature_names_in_ – input feature names
n_classes_ – number of output classes
feature_name_out_ – output feature name
optimizer_ – attributes of the optimizer used to fit the model (see crandas.crlearn.optimizer.OptimizerAttributes)
beta_ – (encrypted) fitted parameters (intercept(s) and coefficients)
- fit(X, y, *, n_classes=None, sample_weight=None, **query_args)
Fit a logistic regression model on the data
- Parameters:
X (CDataFrame) – Training data
y (CDataFrame) – Target data (should have only 1 column)
n_classes (int or None) – Number of output classes (categories). For binomial models, if not given, n_classes is assumed to be two. For other models, if not given, the number of classes is derived from the metadata of y.
sample_weight (None) – Not supported
query_args – See queryargs
- Returns:
self
- Return type:
- from_beta(*, type='binomial', n_classes=2, feature_names_in, feature_name_out='out')
Upload a pre-fitted logistic regression model
- Parameters:
beta (list[float]) – Fitted parameters
type (str, default "binomial") – Type of model ("binomial"/"multinomial"/"ordinal")
n_classes (int, default 2) – Number of classes
feature_names_in (list[str]) – Input feature names
feature_name_out (str, default "out") – Output feature name
- Returns:
Logistic regression model with given parameters
- Return type:
- predict(X, decision_boundary=0.5, **query_args)
Make (binary) predictions on a dataset using a logistic regression model
Note: this returns binary predictions, not probabilities!
- Parameters:
X (CDataFrame) – predictor variables
decision_boundary (float) – number between 0 and 1; records with a probability below this value are classified as 0, and records with a probability greater than or equal to it are classified as 1
query_args – See queryargs
- Returns:
table containing the column consisting of the predicted target values
- Return type:
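The effect of decision_boundary can be sketched on plaintext probabilities in plain Python (not crandas). The helper name classify is hypothetical; it only mirrors the rule stated above, where a probability equal to the boundary is classified as 1.

```python
def classify(probabilities, decision_boundary=0.5):
    """Map each probability to 0 (below the boundary) or 1 (at or above it)."""
    if not 0.0 <= decision_boundary <= 1.0:
        raise ValueError("decision_boundary must be between 0 and 1")
    return [1 if p >= decision_boundary else 0 for p in probabilities]


probs = [0.2, 0.5, 0.9]
print(classify(probs))                         # [0, 1, 1]
print(classify(probs, decision_boundary=0.6))  # [0, 0, 1]
```

Raising the boundary makes the positive class harder to reach, which typically trades recall for precision.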
- predict_proba(X, **query_args)
Make (probability) predictions on a dataset using a logistic regression model
Note: this returns probabilities, not binary predictions
- Parameters:
X (CDataFrame) – predictor variables
query_args – See queryargs
- Returns:
table with columns representing predicted class probabilities per input record
- Return type:
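For a binomial model, the relationship between fitted parameters and predicted probabilities can be sketched in plain Python (not crandas). This assumes that the parameter list is ordered with the intercept first, followed by one coefficient per input feature; that ordering and the helper name predict_proba_binomial are assumptions made for illustration.

```python
import math


def predict_proba_binomial(beta, rows):
    """Return (P(class 0), P(class 1)) per row for a binomial logistic model.

    Assumption: beta[0] is the intercept and beta[1:] are feature coefficients.
    """
    intercept, coefficients = beta[0], beta[1:]
    result = []
    for row in rows:
        if len(row) != len(coefficients):
            raise ValueError("row length must match the number of coefficients")
        # Linear predictor followed by the logistic (sigmoid) function
        z = intercept + sum(c * x for c, x in zip(coefficients, row))
        p1 = 1.0 / (1.0 + math.exp(-z))
        result.append((1.0 - p1, p1))
    return result


beta = [0.0, 1.0]  # hypothetical parameters: intercept 0, one coefficient of 1
for p0, p1 in predict_proba_binomial(beta, [[0.0], [2.0]]):
    print(round(p0, 4), round(p1, 4))
```

Each row of the output sums to 1, which matches the description of one probability column per class.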
- crandas.crlearn.logistic_regression.LogisticRegression(penalty='l2', *, optimizer='lbfgs', type='binomial', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver=None, max_iter=10, verbose=0, warm_start=False, n_jobs=None, l1_ratio=None, **query_args)
Create a new logistic regression model (CLogisticRegression).
See CLogisticRegression for the meaning of the parameters. Parameters not listed in that class have the same meaning as in scikit-learn but cannot be changed from their defaults.
- crandas.crlearn.metrics.classification_accuracy(y, y_pred, n_classes=2, **query_args)
Compute the classification accuracy on class predictions
- Parameters:
y (CDataFrame) – column with the actual values in range
y_pred (CDataFrame) – column with the predictions in range
n_classes (int) – number of classes (default = 2)
query_args – See queryargs
- Returns:
fixed point number between 0 and 1
- Return type:
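As a plaintext reference for what this metric measures, here is a minimal sketch in plain Python (not crandas). It uses the usual definition of accuracy as the fraction of matching labels; the helper name classification_accuracy_plain is hypothetical.

```python
def classification_accuracy_plain(y, y_pred):
    """Fraction of records whose predicted class equals the actual class."""
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    if not y:
        raise ValueError("at least one record is required")
    matches = sum(1 for actual, predicted in zip(y, y_pred) if actual == predicted)
    return matches / len(y)


print(classification_accuracy_plain([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```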
- crandas.crlearn.metrics.confusion_matrix(y, y_pred, n_classes=2, **query_args)
Compute the confusion matrix on class predictions
The y-axis of the result represents the true class; the x-axis represents the predicted class.
- Parameters:
y (CDataFrame) – column with the actual values in range
y_pred (CDataFrame) – column with the predictions in range
n_classes (int) – number of classes (default = 2)
query_args – See queryargs
- Returns:
matrix of size n_classes * n_classes
- Return type:
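The layout described above (true class along the y-axis, predicted class along the x-axis) can be sketched on plaintext labels in plain Python (not crandas). The sketch assumes classes are labeled 0 to n_classes - 1, which is an assumption, and the helper name confusion_matrix_plain is hypothetical.

```python
def confusion_matrix_plain(y, y_pred, n_classes=2):
    """Count records per (true class, predicted class) pair.

    matrix[i][j] is the number of records with true class i predicted as j.
    Assumption: class labels are the integers 0 .. n_classes - 1.
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y, y_pred):
        matrix[actual][predicted] += 1
    return matrix


y = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
for row in confusion_matrix_plain(y, y_pred):
    print(row)
```

The diagonal holds the correctly classified records, so its sum divided by the number of records gives the classification accuracy.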
- crandas.crlearn.metrics.mcfadden_r2(model, X, y, **query_args)
Compute the McFadden R^2 metric
- Parameters:
model (CLogisticRegression) – logistic regression model
X (CDataFrame) – predictor variables
y (CDataFrame) – binary response variable (should have only 1 column)
query_args – See queryargs
- Returns:
fixed point number between 0 and 1
- Return type:
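For reference, the standard definition of McFadden's R^2 is one minus the ratio of the model log-likelihood to the log-likelihood of an intercept-only model. The sketch below computes it in plain Python (not crandas) from plaintext labels and predicted probabilities; it assumes crandas follows this standard definition, and the helper names are hypothetical.

```python
import math


def log_likelihood(y, probabilities):
    """Bernoulli log-likelihood of binary labels under predicted P(y = 1)."""
    return sum(
        math.log(p) if actual == 1 else math.log(1.0 - p)
        for actual, p in zip(y, probabilities)
    )


def mcfadden_r2_plain(y, probabilities):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(y) / len(y)
    ll_model = log_likelihood(y, probabilities)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - ll_model / ll_null


y = [0, 0, 1, 1]
print(mcfadden_r2_plain(y, [0.5, 0.5, 0.5, 0.5]))  # no better than the null model
print(mcfadden_r2_plain(y, [0.1, 0.2, 0.8, 0.9]))  # closer to 1
```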
- crandas.crlearn.metrics.model_deviance(model, X, y, **query_args)
Compute the model deviance
- Parameters:
model (CLogisticRegression) – logistic regression model
X (CDataFrame) – predictor variables
y (CDataFrame) – binary response variable (should have only 1 column)
query_args – See queryargs
- Returns:
fixed point number between 0 and 1
- Return type:
- crandas.crlearn.metrics.null_deviance(y, **query_args)
Compute the null deviance
- Parameters:
y (CDataFrame) – binary response variable (should have only 1 column)
query_args – See queryargs
Note: both classes need to be present in y; otherwise the computation is undefined internally (logarithm of 0).
- Returns:
fixed point number between 0 and 1
- Return type:
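The reason both classes must be present can be seen in a plaintext sketch in plain Python (not crandas). It uses the standard, unnormalized definition of deviance as minus two times the log-likelihood, which may not match the scaling of the value crandas returns; the helper names are hypothetical.

```python
import math


def deviance(y, probabilities):
    """-2 times the Bernoulli log-likelihood (standard definition, assumed here)."""
    log_lik = sum(
        math.log(p if actual == 1 else 1.0 - p)
        for actual, p in zip(y, probabilities)
    )
    return -2.0 * log_lik


def null_deviance_plain(y):
    """Deviance of the intercept-only model, which predicts the base rate."""
    base_rate = sum(y) / len(y)
    # With a single class the base rate is 0 or 1, so a log(0) term appears
    # in the general formula; reject that case explicitly.
    if base_rate in (0.0, 1.0):
        raise ValueError("both classes must be present in y")
    return deviance(y, [base_rate] * len(y))


print(null_deviance_plain([0, 1, 1, 0]))
```

The model deviance is the same quantity evaluated at the fitted probabilities instead of the base rate.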
- crandas.crlearn.metrics.precision_recall(y, y_pred, **query_args)
Compute the precision and recall on predictions
- Parameters:
y (CDataFrame) – column with the actual values (binary)
y_pred (CDataFrame) – column with the predictions (binary)
query_args – See queryargs
- Returns:
two fixed point numbers between 0 and 1
- Return type:
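As a plaintext reference, precision and recall for binary labels can be sketched in plain Python (not crandas) using their usual definitions. The helper name precision_recall_plain is hypothetical, and how crandas handles the case with no positive predictions or no positive records is not covered here.

```python
def precision_recall_plain(y, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    true_pos = sum(1 for a, p in zip(y, y_pred) if a == 1 and p == 1)
    false_pos = sum(1 for a, p in zip(y, y_pred) if a == 0 and p == 1)
    false_neg = sum(1 for a, p in zip(y, y_pred) if a == 1 and p == 0)
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision or recall is undefined for this input")
    precision = true_pos / (true_pos + false_pos)  # correct share of predicted positives
    recall = true_pos / (true_pos + false_neg)     # found share of actual positives
    return precision, recall


print(precision_recall_plain([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```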
- crandas.crlearn.metrics.score_r2(y, y_pred, **query_args)
Compute the R^2 metric on predictions
- Parameters:
y (CDataFrame) – column with the actual values
y_pred (CDataFrame) – column with the predictions
query_args – See queryargs
- Returns:
fixed point number less than or equal to 1
- Return type:
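The R^2 metric has no lower bound, which a plaintext sketch in plain Python (not crandas) makes concrete. It uses the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the helper name r2_plain is hypothetical.

```python
def r2_plain(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((actual - predicted) ** 2 for actual, predicted in zip(y, y_pred))
    ss_tot = sum((actual - mean) ** 2 for actual in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0]
print(r2_plain(y, [1.0, 2.0, 3.0]))  # 1.0, perfect predictions
print(r2_plain(y, [2.0, 2.0, 2.0]))  # 0.0, no better than predicting the mean
print(r2_plain(y, [3.0, 2.0, 1.0]))  # -3.0, worse than predicting the mean
```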
- crandas.crlearn.metrics.tjur_r2(y, y_pred, **query_args)
Compute the Tjur R^2 metric on predictions
- Parameters:
y (CDataFrame) – column with the actual values (binary)
y_pred (CDataFrame) – column with the predictions (probabilities!)
query_args – See queryargs
- Returns:
fixed point number between -1 and 1
- Return type:
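Tjur's R^2 (the coefficient of discrimination) is usually defined as the mean predicted probability among records with actual value 1 minus the mean predicted probability among records with actual value 0. The sketch below computes it in plain Python (not crandas) under that definition; the helper name tjur_r2_plain is hypothetical.

```python
def tjur_r2_plain(y, probabilities):
    """Mean predicted probability for class 1 records minus that for class 0 records."""
    positives = [p for actual, p in zip(y, probabilities) if actual == 1]
    negatives = [p for actual, p in zip(y, probabilities) if actual == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present in y")
    return sum(positives) / len(positives) - sum(negatives) / len(negatives)


y = [0, 0, 1, 1]
print(tjur_r2_plain(y, [0.1, 0.3, 0.7, 0.9]))
```

Because the inputs are probabilities, passing binary predictions from predict instead of the output of predict_proba would give a different quantity.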