crandas.crlearn
- class crandas.crlearn.linear_model.LinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
- Bases: Ridge
- Linear regression model using ordinary least squares, with the same parameters as the scikit-learn LinearRegression class; see https://github.com/scikit-learn/scikit-learn/blob/7f9bad99d/sklearn/linear_model/_base.py#L534 for a description of these parameters.
- This class currently inherits from Ridge because it is implemented in terms of ridge regression: ridge regression with alpha = 0 reduces to ordinary least squares.
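The alpha = 0 reduction can be checked in plaintext with the closed-form ridge solution beta = (X^T X + alpha*I)^-1 X^T y. The sketch below is a pure-Python illustration under stated assumptions, not the crandas implementation (which works on secret-shared CDataFrames). The helpers `solve` and `ridge_coefficients` are hypothetical, and the intercept is modelled as an explicit column of ones, which (unlike scikit-learn) would also be penalized when alpha > 0.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copies rows so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_coefficients(X: Matrix, y: List[float], alpha: float) -> List[float]:
    """Hypothetical helper: beta = (X^T X + alpha*I)^-1 X^T y (intercept = column of ones)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y_k for row, y_k in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # first column: intercept
    y = [1.0, 3.0, 5.0, 7.0]                              # exactly y = 1 + 2x
    print(ridge_coefficients(X, y, alpha=0.0))  # approximately [1.0, 2.0], the OLS fit
    print(ridge_coefficients(X, y, alpha=1.0))  # coefficients shrunk towards 0
```

With alpha = 0 the penalty term vanishes and the normal equations of ordinary least squares remain, which is the property the class description relies on.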
- class crandas.crlearn.linear_model.LinearRegressionStateObject(reg_type, **kwargs)
- Bases: StateObject
- class crandas.crlearn.linear_model.Ridge(alpha=1.0, *, fit_intercept=True, copy_X=True, max_iter=None, tol=None, solver='auto', positive=None, random_state=None)
- Bases: object
- Linear ridge regression model with the same parameters as the scikit-learn Ridge class; see https://github.com/scikit-learn/scikit-learn/blob/364c77e047ca08a95862becf40a04fe9d4cd2c98/sklearn/linear_model/_ridge.py for a description of these parameters.
 - fit(X, y, sample_weight=None, **query_args)
- Fit a linear (ridge) regression model to the data.
- Parameters:
- X (CDataFrame) – Training data 
- y (CDataFrame) – Target data (should have only 1 column) 
- sample_weight – array of weights assigned to individual samples (not yet supported) 
 
- Returns:
- self 
 
 - get_beta(**kwargs)
- Get the fitted parameters, i.e. intercept_ and coef_ combined into a single table named beta. 
 - predict(X, **query_args)
- Make predictions on a dataset using a linear regression model.
- Note: this returns predicted target values, not probabilities.
- Parameters:
- X (CDataFrame) – predictor variables 
- Returns:
- table containing the column consisting of the predicted target values 
- Return type:
 
 - score(X, y, **query_args)
- Score the linear regression model using the R^2 metric.
- Parameters:
- X (CDataFrame) – Test data 
- y (CDataFrame) – Target test data (should have only 1 column) 
 
- Return type:
- self 
 
 
- class crandas.crlearn.logistic_regression.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=10, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None, classes=[], n_classes=2)
- Bases: object
- Logistic regression classifier with the same parameters as the scikit-learn LogisticRegression class; see https://github.com/scikit-learn/scikit-learn/blob/98cf537f5/sklearn/linear_model/_logistic.py#L783 for a description of these parameters.
 - fit(X, y, sample_weight=None, max_iter=None, warm_start=None, **query_args)
- Fit a logistic regression model to the data.
- NOTE: Compared to scikit-learn, fit takes the additional parameters max_iter and warm_start. scikit-learn treats max_iter and warm_start as estimator settings that are fixed at construction rather than passed to fit. Here, the user is free to deviate from these settings in successive calls to fit; the values given at construction serve as the defaults for each call.
- Parameters:
- X (CDataFrame) – predictor variables 
- y (CDataFrame) – response variable (should have only 1 column) 
- sample_weight – array of weights assigned to individual samples (not yet supported) 
- max_iter (int) – maximum number of iterations for this call to fit; overrides the value set at construction (see note above) 
- warm_start (bool) – if True, this fit continues the approximation from where the previous fit stopped; if False, the fit starts from scratch. Overrides the value set at construction (see note above). 
 
- Returns:
- self 
- Return type:
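The per-call max_iter and warm_start overrides described in the note can be illustrated with a toy plaintext model. This is a hypothetical sketch, not the crandas solver: it uses plain batch gradient descent (swapped in for brevity; it is neither the 'lbfgs' solver named in the signature nor necessarily what crandas runs), and the class name `TinyLogisticRegression` and learning rate `lr` are assumptions.

```python
import math
from typing import List, Optional, Sequence


class TinyLogisticRegression:
    """Hypothetical plaintext stand-in illustrating max_iter/warm_start overrides in fit."""

    def __init__(self, max_iter: int = 10, warm_start: bool = False, lr: float = 0.5):
        self.max_iter = max_iter
        self.warm_start = warm_start
        self.lr = lr  # assumed learning rate for plain gradient descent
        self.beta: Optional[List[float]] = None  # [intercept, coef_1, ..., coef_p]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int],
            max_iter: Optional[int] = None, warm_start: Optional[bool] = None):
        # Per-call arguments override the constructor settings only when given.
        max_iter = self.max_iter if max_iter is None else max_iter
        warm_start = self.warm_start if warm_start is None else warm_start
        if self.beta is None or not warm_start:
            self.beta = [0.0] * (len(X[0]) + 1)  # start from scratch
        n = len(X)
        for _ in range(max_iter):
            grad = [0.0] * len(self.beta)
            for row, target in zip(X, y):
                z = self.beta[0] + sum(b * x for b, x in zip(self.beta[1:], row))
                err = 1.0 / (1.0 + math.exp(-z)) - target  # d(log-loss)/dz
                grad[0] += err / n
                for j, x in enumerate(row):
                    grad[j + 1] += err * x / n
            self.beta = [b - self.lr * g for b, g in zip(self.beta, grad)]
        return self


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
# Ten iterations in one call ...
full = TinyLogisticRegression(max_iter=10).fit(X, y)
# ... match two calls of five iterations when the second continues with warm_start=True.
split = TinyLogisticRegression(max_iter=5).fit(X, y)
split.fit(X, y, warm_start=True)
```

Because the constructor values are only defaults, the same object can be refit from scratch or continued without being rebuilt.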
 
 - get_beta(**kwargs)
- Get the fitted parameters, i.e. intercept_ and coef_ combined into a single table named beta. 
 - predict(X, decision_boundary=0.5, **query_args)
- Make (binary) predictions on a dataset using a logistic regression model.
- Note: this returns binary predictions, not probabilities.
- Parameters:
- X (CDataFrame) – predictor variables 
- decision_boundary (float) – number between 0 and 1; records with a predicted probability below this value are classified as 0, and records with a probability greater than or equal to it as 1 
 
- Returns:
- column consisting of the predicted classes (0 or 1) 
- Return type:
 
 - predict_proba(X, **query_args)
- Make (probability) predictions on a dataset using a logistic regression model.
- Note: this returns probabilities, not binary predictions.
- Parameters:
- X (CDataFrame) – predictor variables 
- Returns:
- column consisting of the predicted probabilities 
- Return type:
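The relation between predict_proba and predict can be sketched in plaintext: the probability is the logistic sigmoid of the linear predictor, and predict thresholds it at decision_boundary. The function names mirror the methods but are hypothetical free functions, and the layout of beta as [intercept, coef_1, ..., coef_p] is an assumption about the combined table returned by get_beta.

```python
import math
from typing import List, Sequence


def predict_proba(X: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Sigmoid of intercept + coefficients . row (assumed beta layout: [intercept, coefs...])."""
    probs = []
    for row in X:
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


def predict(X: Sequence[Sequence[float]], beta: Sequence[float],
            decision_boundary: float = 0.5) -> List[int]:
    # Probability below the boundary -> class 0; greater than or equal -> class 1.
    return [1 if p >= decision_boundary else 0 for p in predict_proba(X, beta)]


X = [[-2.0], [0.0], [2.0]]
beta = [0.0, 1.0]
print(predict_proba(X, beta))  # the middle record has probability exactly 0.5
print(predict(X, beta))        # [0, 1, 1]: 0.5 is "greater than or equal to" the boundary
```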
 
 
- class crandas.crlearn.logistic_regression.LogisticRegressionStateObject(reg_type, **kwargs)
- Bases: StateObject
- crandas.crlearn.metrics.classification_accuracy(y, y_pred, n_classes=2, **query_args)
- Compute the classification accuracy on class predictions.
- Parameters:
- y (CDataFrame) – column with the actual values in range 
- y_pred (CDataFrame) – column with the predictions in range 
- n_classes (int) – number of classes (default = 2) 
 
- Returns:
- fixed point number between 0 and 1 
- Return type:
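In plaintext, classification accuracy is simply the fraction of records whose prediction equals the actual class. The sketch below is a hypothetical plain-Python equivalent on lists; it does not need n_classes, which the crandas function presumably uses for its secure computation (an assumption).

```python
from typing import Sequence


def classification_accuracy(y: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where the predicted class equals the actual class."""
    if len(y) != len(y_pred) or not y:
        raise ValueError("y and y_pred must be non-empty and of equal length")
    return sum(1 for actual, pred in zip(y, y_pred) if actual == pred) / len(y)


print(classification_accuracy([0, 1, 2, 1], [0, 2, 2, 1]))  # 0.75
```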
 
- crandas.crlearn.metrics.confusion_matrix(y, y_pred, n_classes=2, **query_args)
- Compute the confusion matrix on class predictions.
- The y-axis of the result represents the true class and the x-axis the predicted class.
- Parameters:
- y (CDataFrame) – column with the actual values in range 
- y_pred (CDataFrame) – column with the predictions in range 
- n_classes (int) – number of classes (default = 2) 
 
- Returns:
- matrix of size n_classes * n_classes 
- Return type:
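A plaintext sketch of the layout described above, assuming classes are labelled 0 to n_classes - 1: row i (y-axis) holds the records whose true class is i, and column j (x-axis) counts how many of them were predicted as class j. This is an illustration on lists, not the crandas implementation.

```python
from typing import List, Sequence


def confusion_matrix(y: Sequence[int], y_pred: Sequence[int],
                     n_classes: int = 2) -> List[List[int]]:
    """n_classes x n_classes counts; rows = true class, columns = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, pred in zip(y, y_pred):
        matrix[actual][pred] += 1
    return matrix


print(confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1]))  # [[1, 1], [0, 2]]
```

The diagonal holds the correct predictions, so its sum divided by the number of records gives the classification accuracy.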
 
- crandas.crlearn.metrics.mcfadden_r2(model, X, y, **query_args)
- Compute the McFadden R^2 metric.
- Parameters:
- model (LogisticModel) – logistic regression model 
- X (CDataFrame) – predictor variables 
- y (CDataFrame) – binary response variable (should have only 1 column) 
 
- Returns:
- fixed point number between 0 and 1 
- Return type:
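McFadden's R^2 compares the log-likelihood of the fitted model with that of an intercept-only (null) model that predicts mean(y) for every record: R^2 = 1 - LL(model) / LL(null). The plaintext sketch below takes the model's predicted probabilities directly instead of (model, X), so its interface is hypothetical.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of 0/1 responses y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y: Sequence[int], p_model: Sequence[float]) -> float:
    """McFadden R^2 = 1 - LL(model) / LL(null), where the null model predicts mean(y)."""
    y_bar = sum(y) / len(y)
    ll_null = log_likelihood(y, [y_bar] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null


y = [0, 1, 0, 1]
print(mcfadden_r2(y, [0.5, 0.5, 0.5, 0.5]))  # 0.0: no better than the null model
print(mcfadden_r2(y, [0.1, 0.9, 0.1, 0.9]))  # close to 1 for confident, correct predictions
```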
 
- crandas.crlearn.metrics.model_deviance(model, X, y, **query_args)
- Compute the model deviance.
- Parameters:
- model (LogisticModel) – logistic regression model 
- X (CDataFrame) – predictor variables 
- y (CDataFrame) – binary response variable (should have only 1 column) 
 
- Returns:
- non-negative fixed point number 
- Return type:
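For a binary response the saturated model has log-likelihood 0, so the deviance reduces to minus twice the model's Bernoulli log-likelihood. The plaintext sketch below uses the standard definition and takes predicted probabilities instead of (model, X), so its interface is a hypothetical simplification; the result is non-negative and grows as predictions move away from the observed labels.

```python
import math
from typing import Sequence


def model_deviance(y: Sequence[int], p: Sequence[float]) -> float:
    """Deviance D = -2 * log-likelihood for 0/1 responses and predicted probabilities p."""
    ll = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))
    return -2.0 * ll


print(model_deviance([1, 0], [0.5, 0.5]))  # 4 * ln(2), about 2.77
print(model_deviance([1, 0], [0.9, 0.1]))  # smaller: the predictions fit better
```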
 
- crandas.crlearn.metrics.null_deviance(y, **query_args)
- Compute the null deviance.
- Parameters:
- y (CDataFrame) – binary response variable (should have only 1 column) 
- NOTE: both classes need to be present in y; otherwise the computation is undefined internally (logarithm of 0). 
 
- Returns:
- non-negative fixed point number 
- Return type:
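The null deviance is the deviance of the intercept-only model, which predicts p = mean(y) for every record. The plaintext sketch below (standard definition, not the crandas code) shows why the note requires both classes: if y contains only one class, p is 0 or 1 and the formula needs log(0).

```python
import math
from typing import Sequence


def null_deviance(y: Sequence[int]) -> float:
    """Deviance of the intercept-only model predicting p = mean(y) for every record."""
    n = len(y)
    n_pos = sum(y)
    p = n_pos / n
    if p in (0.0, 1.0):
        # Only one class present: log(0) would be needed, so the value is undefined.
        raise ValueError("both classes need to be present in y")
    return -2.0 * (n_pos * math.log(p) + (n - n_pos) * math.log(1.0 - p))


print(null_deviance([0, 1, 0, 1]))  # 8 * ln(2), about 5.55
```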
 
- crandas.crlearn.metrics.precision_recall(y, y_pred, **query_args)
- Compute the precision and recall on predictions.
- Parameters:
- y (CDataFrame) – column with the actual values (binary) 
- y_pred (CDataFrame) – column with the predictions (binary) 
 
- Returns:
- two fixed point numbers between 0 and 1 (precision and recall) 
- Return type:
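For binary labels, precision is TP / (TP + FP) and recall is TP / (TP + FN). A plaintext sketch on lists, assuming 1 denotes the positive class; it raises if a denominator is zero, since either metric is undefined in that case.

```python
from typing import Sequence, Tuple


def precision_recall(y: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary 0/1 labels, with 1 as the positive class."""
    tp = sum(1 for a, b in zip(y, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y, y_pred) if a == 1 and b == 0)
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined (zero denominator)")
    return tp / (tp + fp), tp / (tp + fn)


print(precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))  # (1.0, 0.666...)
```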
 
- crandas.crlearn.metrics.score_r2(y, y_pred, **query_args)
- Compute the R^2 metric on predictions.
- Parameters:
- y (CDataFrame) – column with the actual values 
- y_pred (CDataFrame) – column with the predictions 
 
- Returns:
- fixed point number of at most 1 (negative values are possible) 
- Return type:
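The coefficient of determination is R^2 = 1 - SS_res / SS_tot, which explains the upper bound: it equals 1 for perfect predictions, 0 when always predicting the mean, and becomes negative for predictions worse than the mean. A plaintext sketch of the standard formula, not the crandas code:

```python
from typing import Sequence


def score_r2(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot (standard definition)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
print(score_r2(y, y))                     # 1.0
print(score_r2(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0
print(score_r2(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0
```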
 
- crandas.crlearn.metrics.tjur_r2(y, y_pred, **query_args)
- Compute the Tjur R^2 metric on predictions.
- Parameters:
- y (CDataFrame) – column with the actual values (binary) 
- y_pred (CDataFrame) – column with the predicted probabilities (not binary predictions) 
 
- Returns:
- fixed point number between -1 and 1 
- Return type:
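Tjur's R^2 (coefficient of discrimination) is the mean predicted probability among actual positives minus the mean predicted probability among actual negatives, which is why it lies between -1 and 1 and why y_pred must hold probabilities. A plaintext sketch of that definition:

```python
from typing import Sequence


def tjur_r2(y: Sequence[int], y_pred: Sequence[float]) -> float:
    """Mean probability for y == 1 minus mean probability for y == 0."""
    pos = [p for yi, p in zip(y, y_pred) if yi == 1]
    neg = [p for yi, p in zip(y, y_pred) if yi == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


print(tjur_r2([1, 1, 0, 0], [0.9, 0.7, 0.2, 0.4]))  # 0.8 - 0.3, about 0.5
```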
 
- crandas.crlearn.utils.min_max_normalize(table, columns=None, **query_args)
- Apply min-max normalization to columns of a table so that their values lie in [0, 1].
- Parameters:
- table (CDataFrame) – table to normalize 
- columns (list of strings, optional) – columns to normalize; columns not in this list are left untouched. If None (the default), all columns are normalized. 
 
- Returns:
- new table with normalized columns 
- Return type:
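Min-max normalization maps each selected column through (v - min) / (max - min). The plaintext sketch below uses a dict of lists as a stand-in for a CDataFrame and mirrors the columns semantics described above. How crandas treats a constant column (max equal to min) is not stated here; mapping it to 0.0 is an assumption of this sketch.

```python
from typing import Dict, List, Optional


def min_max_normalize(table: Dict[str, List[float]],
                      columns: Optional[List[str]] = None) -> Dict[str, List[float]]:
    """Return a new table with the selected columns scaled to [0, 1]."""
    selected = list(table) if columns is None else columns
    result: Dict[str, List[float]] = {}
    for name, values in table.items():
        if name not in selected:
            result[name] = list(values)  # untouched column
            continue
        lo, hi = min(values), max(values)
        span = hi - lo
        # Assumption: a constant column is mapped to 0.0 to avoid dividing by zero.
        result[name] = [(v - lo) / span if span else 0.0 for v in values]
    return result


table = {"a": [2.0, 4.0, 6.0], "b": [10.0, 20.0, 30.0]}
print(min_max_normalize(table, columns=["a"]))  # "a" -> [0.0, 0.5, 1.0], "b" unchanged
```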
 
- class crandas.crlearn.neighbors.KNeighborsRegressor(n_neighbors=5, *, weights='uniform', algorithm='auto', p=2, metric='minkowski', metric_weights=None)
- Bases: object
- Regression based on k-nearest neighbors, used in a similar way to the scikit-learn KNeighborsRegressor class.
- The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.
- See https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- Parameters:
- n_neighbors (int, default=5) – Number of neighbors to use. 
- p (int, default=2) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Currently, integer values between 1 and 5 are supported. 
- metric_weights (CDataFrame, default=None) – Weights given to the different columns for the metric. The per-column differences are multiplied by the corresponding factors in metric_weights, which is equivalent to multiplying each column by its weight. None means no extra factors, equivalent to all weights being 1.
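The effect of p and metric_weights can be written as a weighted Minkowski distance, d(u, v) = (sum_i |w_i (u_i - v_i)|^p)^(1/p). The plaintext sketch below (plain lists instead of CDataFrames; the function name is hypothetical) shows the stated equivalence: weighting the differences gives the same distance as first scaling each column by its weight.

```python
from typing import Optional, Sequence


def weighted_minkowski(u: Sequence[float], v: Sequence[float], p: int = 2,
                       weights: Optional[Sequence[float]] = None) -> float:
    """(sum_i |w_i * (u_i - v_i)|^p)^(1/p); weights=None means all weights are 1."""
    if weights is None:
        weights = [1.0] * len(u)
    return sum(abs(w * (a - b)) ** p for a, b, w in zip(u, v, weights)) ** (1.0 / p)


print(weighted_minkowski([0, 0], [3, 4]))       # 5.0 (euclidean, p = 2)
print(weighted_minkowski([0, 0], [3, 4], p=1))  # 7.0 (manhattan, p = 1)
# Weighting the differences equals scaling the columns beforehand:
w = [2.0, 1.0]
print(weighted_minkowski([0, 0], [3, 4], weights=w))
print(weighted_minkowski([0 * 2.0, 0 * 1.0], [3 * 2.0, 4 * 1.0]))
```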
 
 - Notes
- Warning: regarding the nearest neighbors algorithms, if two neighbors, neighbor k+1 and neighbor k, have identical distances but different labels, the results depend on the ordering of the training data.
 - fit(X, y)
- Fit the k-nearest neighbors regressor on the training dataset.
- Parameters:
- X (CDataFrame) – Predictor variables. 
- y (CDataFrame) – Response variable (should have only 1 column). 
 
- Returns:
- self 
- Return type:
 
 - predict_value(X, **query_args)¶
- Predict the target value for the provided data. - Parameters:
- X (CDataFrame) – Predictor variables. Required to contain a single row. 
- Returns:
- y – Predicted value. 
- Return type:
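With weights='uniform', a k-nearest-neighbors regressor predicts the plain mean of the targets of the k closest training rows; that is how scikit-learn defines it, and assuming crandas follows the same convention, a plaintext sketch for a single query row looks like this (the function name is hypothetical). The stable sort also reproduces the warning above: when neighbor k and neighbor k+1 are equally distant, the training-data order decides which one is used.

```python
from typing import Sequence


def knn_predict_value(X_train: Sequence[Sequence[float]], y_train: Sequence[float],
                      x: Sequence[float], n_neighbors: int = 5, p: int = 2) -> float:
    """Uniform-weight k-NN regression for one query row with the Minkowski metric."""
    distances = [
        (sum(abs(a - b) ** p for a, b in zip(row, x)) ** (1.0 / p), target)
        for row, target in zip(X_train, y_train)
    ]
    # sorted() is stable: equal distances keep training-data order.
    nearest = sorted(distances, key=lambda d: d[0])[:n_neighbors]
    # Uniform weights: the prediction is the mean of the neighbors' targets.
    return sum(target for _, target in nearest) / len(nearest)


X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 10.0, 20.0, 100.0]
print(knn_predict_value(X_train, y_train, [1.2], n_neighbors=2))  # 15.0
```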