edamame.classifier#
classification#
- class edamame.classifier.classification.TrainClassifier(X_train, y_train, X_test, y_test)[source]#
Bases:
object
This class represents a pipeline for training and handling classification models.
- X_train#
The input training data.
- Type:
pd.DataFrame
- y_train#
The target training data.
- Type:
pd.Series
- X_test#
The input test data.
- Type:
pd.DataFrame
- y_test#
The target test data.
- Type:
pd.Series
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> logistic = classifier.logistic()
>>> classifier.model_metrics(model_name="logistic")
>>> classifier.save_model(model_name="logistic")
>>> nb = classifier.gaussian_nb()
>>> knn = classifier.knn()
>>> tree = classifier.tree()
>>> rf = classifier.random_forest()
>>> xgb = classifier.xgboost()
>>> svm = classifier.svm()
>>> classifier.model_metrics()
>>> # using AutoML
>>> models = classifier.auto_ml()
>>> classifier.save_model()
- auto_ml(n_folds: int = 5, data: Literal['train', 'test'] = 'train') List [source]#
Perform automated machine learning with cross validation on a list of classification models.
- Parameters:
n_folds (int) – Number of cross-validation folds. Defaults to 5.
data (Literal['train', 'test']) – Target dataset for cross-validation. Must be either ‘train’ or ‘test’. Defaults to ‘train’.
- Returns:
List of best-fit classification models for each algorithm.
- Return type:
List
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> model_list = classifier.auto_ml()
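As a rough illustration of what n_folds controls, the sketch below splits sample indices into k validation folds in plain Python. It is not edamame's implementation, which this reference does not show; the helper name k_fold_indices is hypothetical, and real cross-validation utilities typically also shuffle or stratify the data.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs for a simple, unshuffled k-fold split."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Hypothetical data size: 12 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=12, n_folds=5))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

Each sample lands in exactly one validation fold, so every model is scored once on data it was not fitted on in that round.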
- gaussian_nb(**kwargs) GaussianNB [source]#
Trains a Gaussian Naive Bayes classifier using the training data and returns the fitted model.
- Parameters:
**kwargs – Arbitrary keyword arguments to be passed to the Gaussian NB constructor.
- Returns:
The trained Gaussian Naive Bayes classifier.
- Return type:
GaussianNB
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> nb = classifier.gaussian_nb()
- knn(n_neighbors: Tuple[int, int, int] = (1, 50, 50), n_folds: int = 5, **kwargs) KNeighborsClassifier [source]#
Train a k-Nearest Neighbors classification model using the training data, and perform a grid search to find the best value of ‘n_neighbors’ hyperparameter.
- Parameters:
n_neighbors (Tuple[int, int, int]) – A tuple of three integers. The first two are the lower and upper bounds of the ‘n_neighbors’ range explored by the grid search, and the third is the number of values to generate in the interval [n_neighbors[0], n_neighbors[1]]. Default is (1, 50, 50).
n_folds (int) – The number of cross-validation folds to use for the grid search. Default is 5.
**kwargs – Arbitrary keyword arguments to be passed to the KNN constructor.
- Returns:
The trained k-Nearest Neighbors classification model with the best ‘n_neighbors’ hyperparameter found by the grid search.
- Return type:
KNeighborsClassifier
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> knn = classifier.knn(n_neighbors=(1,50,50), n_folds=3)
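To make the (start, stop, num) tuple convention concrete, here is a minimal sketch of how such a tuple could expand into candidate values. It assumes evenly spaced sampling with truncation to integers; the helper expand_range is hypothetical, and the library may space or round the values differently.

```python
def expand_range(spec, as_int=True):
    """Expand a (start, stop, num) tuple into evenly spaced candidate values."""
    start, stop, num = spec
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        values = [float(start)]
    else:
        step = (stop - start) / (num - 1)
        values = [start + i * step for i in range(num)]
    if not as_int:
        return values
    # Truncate to integers and drop duplicates while keeping the order.
    return list(dict.fromkeys(int(v) for v in values))


knn_grid = expand_range((1, 50, 50))      # every integer from 1 to 50
rf_grid = expand_range((50, 1000, 5))     # five values between 50 and 1000
print(knn_grid[:5], rf_grid)
```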
- logistic(**kwargs) LogisticRegression [source]#
Trains a logistic regression model using the training data and returns the fitted model.
- Parameters:
**kwargs – Arbitrary keyword arguments to be passed to the logistic constructor.
- Returns:
The trained logistic regression model.
- Return type:
LogisticRegression
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> logistic = classifier.logistic()
- model_metrics(model_name: Literal['all', 'logistic', 'gaussian_nb', 'knn', 'tree', 'random_forest', 'xgboost', 'svm'] = 'all', cm: bool = False) None [source]#
Display classification metrics (confusion matrix and classification report) for specified or all trained models.
- Parameters:
model_name (Literal["all", "logistic", "gaussian_nb", "knn", "tree", "random_forest", "xgboost", "svm"]) – The name of the model to display the metrics for. Defaults to ‘all’.
cm (bool) – Whether to display the confusion matrix. Defaults to False.
- Returns:
None
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> xgboost = classifier.xgboost(n_estimators=(10, 100, 5), n_folds=2)
>>> classifier.model_metrics(model_name="xgboost")
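For readers unfamiliar with the quantities behind a confusion matrix and classification report, the following plain-Python sketch computes them for a toy label set. It only illustrates the definitions; the function names and sample labels are made up here, and this is not how model_metrics is implemented.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class label."""
    counts = confusion_counts(y_true, y_pred)
    tp = counts[(label, label)]
    fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
    fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred, label=1)
print(dict(counts), precision, recall)
```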
- random_forest(n_estimators: Tuple[int, int, int] = (50, 1000, 5), n_folds: int = 2, **kwargs) RandomForestClassifier [source]#
Train a Random Forest classifier using the training data and return the fitted model.
- Parameters:
n_estimators (Tuple[int, int, int]) – The range of the number of trees in the forest. Default is (50, 1000, 5).
n_folds (int) – The number of folds in cross-validation. Default is 2.
**kwargs – Arbitrary keyword arguments to be passed to the random forest constructor.
- Returns:
The trained Random Forest classifier.
- Return type:
RandomForestClassifier
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> rf = classifier.random_forest(n_estimators=(50, 1000, 5), n_folds=2)
- save_model(model_name: Literal['all', 'logistic', 'gaussian_nb', 'knn', 'tree', 'random_forest', 'xgboost', 'svm'] = 'all') None [source]#
Saves the specified machine learning model or all models in the instance to a pickle file.
- Parameters:
model_name (Literal["all", "logistic", "gaussian_nb", "knn", "tree", "random_forest", "xgboost", "svm"]) – The name of the model to save. Defaults to ‘all’.
- Returns:
None
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> model_list = classifier.auto_ml()
>>> classifier.save_model(model_name="all")
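Since save_model writes pickle files, a saved model can presumably be restored with the standard pickle module. The sketch below shows the round trip with a plain dictionary standing in for a fitted model; the file name logistic.pkl is hypothetical, as this reference does not state where or under what name the files are written.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model so the sketch runs without scikit-learn.
model = {"name": "logistic", "coef": [0.4, -1.2], "intercept": 0.1}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "logistic.pkl"  # hypothetical file name
    # Serialize the object to disk.
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    # Read it back into a new, equal object.
    with path.open("rb") as fh:
        restored = pickle.load(fh)

print(restored)
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.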
- svm(n_folds: int = 2, **kwargs) SVC [source]#
Trains an SVM classifier using the training data and returns the fitted model.
- Parameters:
n_folds (int) – The number of folds in cross-validation. Default is 2.
**kwargs – Arbitrary keyword arguments to be passed to the SVC constructor.
- Returns:
The trained SVM classifier.
- Return type:
SVC
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> svm = classifier.svm(kernel="linear", C=1.0, gamma="auto")
- tree(alpha: Tuple[float, float, int] = (0.0, 0.001, 5), impurity: Tuple[float, float, int] = (0.0, 1e-05, 5), n_folds: int = 5, **kwargs) DecisionTreeClassifier [source]#
Trains a decision tree classifier using the training data and returns the fitted model.
- Parameters:
alpha (Tuple[float, float, int]) – A tuple containing the minimum and maximum values of ccp_alpha and the number of values to try (default: (0., 0.001, 5)).
impurity (Tuple[float, float, int]) – A tuple containing the minimum and maximum values of min_impurity_decrease and the number of values to try (default: (0., 0.00001, 5)).
n_folds (int) – The number of cross-validation folds to use for grid search (default: 5).
**kwargs – Arbitrary keyword arguments to be passed to the tree constructor.
- Returns:
The trained decision tree classifier model.
- Return type:
DecisionTreeClassifier
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> tree = classifier.tree(alpha=(0., 0.001, 5), impurity=(0., 0.00001, 5), n_folds=3)
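The cost of this search grows with both tuples and the fold count. The sketch below estimates it under the assumption that the search is exhaustive over the Cartesian product of ccp_alpha and min_impurity_decrease values, with evenly spaced candidates; the linspace helper is written here for illustration and is not part of edamame.

```python
from itertools import product


def linspace(start, stop, num):
    """Return num evenly spaced floats from start to stop, inclusive."""
    if num < 2:
        raise ValueError("num must be at least 2")
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Values taken from the example call above.
alphas = linspace(0.0, 0.001, 5)
impurities = linspace(0.0, 1e-05, 5)
n_folds = 3

# One candidate per (ccp_alpha, min_impurity_decrease) pair.
grid = [
    {"ccp_alpha": a, "min_impurity_decrease": m}
    for a, m in product(alphas, impurities)
]
total_fits = len(grid) * n_folds  # one fit per candidate per fold
print(len(grid), total_fits)
```

Under these assumptions, doubling the number of values in either tuple doubles the number of fits, so small grids are a sensible starting point.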
- xgboost(n_estimators: Tuple[int, int, int] = (10, 100, 5), n_folds: int = 2, **kwargs) XGBClassifier [source]#
Train an XGBoost classifier using the training data and return the fitted model.
- Parameters:
n_estimators (Tuple[int, int, int]) – The range of the number of boosting rounds. Default is (10, 100, 5).
n_folds (int) – The number of folds in cross-validation. Default is 2.
**kwargs – Arbitrary keyword arguments to be passed to the xgboost constructor.
- Returns:
The trained XGBoost classifier.
- Return type:
XGBClassifier
Example
>>> from edamame.classifier import TrainClassifier
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> xgboost = classifier.xgboost(n_estimators=(10, 100, 5), n_folds=2)
- edamame.classifier.classification.classifier_metrics(model: LogisticRegression | GaussianNB | KNeighborsClassifier | DecisionTreeClassifier | RandomForestClassifier | XGBClassifier | SVC, X: DataFrame, y: DataFrame, cm: bool = False) None [source]#
Display classification metrics (confusion matrix and classification report) for the model passed as input to the function.
- Parameters:
model (Union[LogisticRegression, GaussianNB, KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, XGBClassifier, SVC]) – Classification model.
X (pd.DataFrame) – Input features.
y (pd.DataFrame) – Target feature.
cm (bool) – Whether to display the confusion matrix. Defaults to False.
- Returns:
None
diagnose#
- class edamame.classifier.diagnose.ClassifierDiagnose(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame)[source]#
Bases:
object
A class for diagnosing classification models.
- X_train#
The input training data.
- Type:
pd.DataFrame
- y_train#
The target training data.
- Type:
pd.Series
- X_test#
The input test data.
- Type:
pd.DataFrame
- y_test#
The target test data.
- Type:
pd.Series
Examples
>>> from edamame.classifier import TrainClassifier
>>> from edamame.classifier.diagnose import ClassifierDiagnose
>>> classifier = TrainClassifier(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> nb = classifier.gaussian_nb()
>>> classifiers_diagnose = ClassifierDiagnose(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
>>> classifiers_diagnose.class_prediction_error(model=nb)
- class_prediction_error(model: LogisticRegression | GaussianNB | KNeighborsClassifier | DecisionTreeClassifier | RandomForestClassifier | XGBClassifier | SVC, figsize: Tuple[float, float] = (8.0, 6.0), train_data: bool = True) None [source]#
Plots the support (number of training samples) for each class in the fitted classification model as a stacked bar chart. Each bar is segmented to show the proportion of predictions (including false negatives and false positives, as in a confusion matrix) for each class.
- Parameters:
model (Union[LogisticRegression, GaussianNB, KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, XGBClassifier, SVC]) – Classification model.
figsize (Tuple[float, float]) – Figure size for the plot. Defaults to (8, 6).
train_data (bool) – Whether to draw the stacked bar plot on the training data (True) or the test data (False). Defaults to True.
- Returns:
None
- plot_roc_auc(model: LogisticRegression | GaussianNB | KNeighborsClassifier | DecisionTreeClassifier | RandomForestClassifier | XGBClassifier | SVC, figsize: Tuple[float, float] = (8.0, 6.0), train_data: bool = True) None [source]#
Method for plotting the ROC curve and calculating the AUC values for a given model.
- Parameters:
model (Union[LogisticRegression, GaussianNB, KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, XGBClassifier, SVC]) – Classification model.
figsize (Tuple[float, float]) – Figure size for the plot. Defaults to (8, 6).
train_data (bool) – Whether to plot the ROC curve on the training data (True) or the test data (False). Defaults to True.
- Returns:
None
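As background for reading the AUC value, the sketch below computes it for a binary problem as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This is an illustration of the metric only, with made-up scores; it does not reproduce how plot_roc_auc computes or draws its curves.

```python
def roc_auc(y_true, scores):
    """Binary ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted scores for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.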
- random_forest_fi(model: RandomForestClassifier, figsize: Tuple[float, float] = (8.0, 6.0)) None [source]#
Displays the feature importance plot of the random forest model.
- Parameters:
model (RandomForestClassifier) – The input random forest model.
figsize (Tuple[float, float]) – Figure size for the plot. Defaults to (8, 6).
- Returns:
None
- xgboost_fi(model: XGBClassifier, figsize: Tuple[float, float] = (8.0, 6.0)) None [source]#
Displays the feature importance plot of the xgboost model.
- Parameters:
model (XGBClassifier) – The input xgboost model.
figsize (Tuple[float, float]) – Figure size for the plot. Defaults to (8, 6).
- Returns:
None
- edamame.classifier.diagnose.check_random_forest(model: RandomForestClassifier) None [source]#
Checks whether the model passed is a random forest classifier.
- Parameters:
model (RandomForestClassifier) – The input model to be checked.
- Raises:
TypeError – If the input model is not a random forest classifier.
- Returns:
None
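A type guard of this kind can be sketched in a few lines. The version below uses a stand-in class so that it runs without scikit-learn; the names RandomForestStandIn and check_model_type are hypothetical, and the actual error message raised by edamame is not shown in this reference.

```python
class RandomForestStandIn:
    """Stand-in for a random forest classifier (hypothetical, for illustration)."""


def check_model_type(model, expected_type):
    """Raise TypeError unless model is an instance of expected_type."""
    if not isinstance(model, expected_type):
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(model).__name__}"
        )


# A matching model passes silently.
check_model_type(RandomForestStandIn(), RandomForestStandIn)

# Anything else raises TypeError.
try:
    check_model_type("not a model", RandomForestStandIn)
except TypeError as exc:
    message = str(exc)
    print(message)
```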
- edamame.classifier.diagnose.check_xgboost(model: XGBClassifier) None [source]#
Checks whether the model passed is an XGBoost classifier.
- Parameters:
model (XGBClassifier) – The input model to be checked.
- Raises:
TypeError – If the input model is not an XGBoost classifier.
- Returns:
None