edamame.regressor#

diagnose#

class edamame.regressor.diagnose.RegressorDiagnose(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame)[source]#

Bases: object

A class for diagnosing regression models.

X_train#

The input training data.

Type:

pd.DataFrame

y_train#

The target training data.

Type:

pd.Series

X_test#

The input test data.

Type:

pd.DataFrame

y_test#

The target test data.

Type:

pd.Series

Example

>>> from edamame.regressor import TrainRegressor, RegressorDiagnose
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> linear = regressor.linear()
>>> diagnose = RegressorDiagnose(X_train, np.log(y_train), X_test, np.log(y_test))
>>> diagnose.coefficients(model=linear)
>>> diagnose.prediction_error(model=linear)
>>> diagnose.residual_plot(model=linear)
>>> diagnose.qqplot(model=linear)
coefficients(model: LinearRegression | Lasso | Ridge) → None[source]#

Display the coefficients of a fitted Linear, Lasso, or Ridge model.

Parameters:

model (Union[LinearRegression, Lasso, Ridge]) – The input model for which coefficients need to be displayed.

Returns:

None
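
Example

A minimal sketch, reusing the diagnose object and the fitted linear model from the class-level example above:

>>> diagnose.coefficients(model=linear)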

prediction_error(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor, train_data: bool = True, figsize: Tuple[float, float] = (8.0, 6.0)) → None[source]#

Display a scatterplot of the ground-truth target values against the values predicted by the given model.

Parameters:
  • model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – The input model.

  • train_data (bool) – Whether to plot on the training data (True) or the test data (False). Defaults to True.

  • figsize (Tuple[float, float]) – The size of the prediction error plot. Defaults to (8.0, 6.0).

Returns:

None
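
Example

A minimal sketch, reusing diagnose and linear from the class-level example; train_data=False plots the error on the test set:

>>> diagnose.prediction_error(model=linear, train_data=False)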

qqplot(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor) → None[source]#

Q-Q plot for the train and test data.

Parameters:

model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – The input model.

Returns:

None
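
Example

A minimal sketch, reusing diagnose and linear from the class-level example:

>>> diagnose.qqplot(model=linear)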

random_forest_fi(model: RandomForestRegressor, figsize: Tuple[float, float] = (12, 10)) → None[source]#

Display the feature importance plot for the fitted random forest model.

Parameters:

model (RandomForestRegressor) – The input random forest model.

Returns:

None
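
Example

A minimal sketch, assuming regressor and diagnose are built as in the class-level example:

>>> rf = regressor.random_forest()
>>> diagnose.random_forest_fi(model=rf)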

residual_plot(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor) → None[source]#

Residual plot for train and test data.

Parameters:

model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – The input model.

Returns:

None
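
Example

A minimal sketch, reusing diagnose and linear from the class-level example:

>>> diagnose.residual_plot(model=linear)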

xgboost_fi(model: XGBRegressor, figsize: Tuple[float, float] = (14, 12)) → None[source]#

Display the feature importance plot for the fitted XGBoost model.

Parameters:

model (xgb.XGBRegressor) – The input xgboost model.

Returns:

None
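
Example

A minimal sketch, assuming regressor and diagnose are built as in the class-level example (xgb_model is an illustrative name):

>>> xgb_model = regressor.xgboost()
>>> diagnose.xgboost_fi(model=xgb_model)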

edamame.regressor.diagnose.check_random_forest(model: RandomForestRegressor) → None[source]#

Check whether the passed model is a random forest regression model.

Parameters:

model (RandomForestRegressor) – The input model to be checked.

Raises:

TypeError – If the input model is not a random forest regression model.

Returns:

None
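
Example

A minimal sketch; rf is assumed to be a model returned by TrainRegressor.random_forest():

>>> from edamame.regressor.diagnose import check_random_forest
>>> check_random_forest(rf)  # raises TypeError if rf is not a RandomForestRegressor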

edamame.regressor.diagnose.check_xgboost(model: XGBRegressor) → None[source]#

Check whether the passed model is an XGBoost regression model.

Parameters:

model (xgb.XGBRegressor) – The input model to be checked.

Raises:

TypeError – If the input model is not an XGBoost regression model.

Returns:

None
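
Example

A minimal sketch; xgb_model is assumed to be a model returned by TrainRegressor.xgboost():

>>> from edamame.regressor.diagnose import check_xgboost
>>> check_xgboost(xgb_model)  # raises TypeError if the model is not an xgb.XGBRegressor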

regression#

class edamame.regressor.regression.TrainRegressor(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame)[source]#

Bases: object

This class represents a pipeline for training and handling regression models.

X_train#

The input training data.

Type:

pd.DataFrame

y_train#

The target training data.

Type:

pd.Series

X_test#

The input test data.

Type:

pd.DataFrame

y_test#

The target test data.

Type:

pd.Series

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> linear = regressor.linear()
>>> regressor.model_metrics(model_name="linear")
>>> regressor.save_model(model_name="linear")
>>> lasso = regressor.lasso()
>>> ridge = regressor.ridge()
>>> tree = regressor.tree()
>>> rf = regressor.random_forest()
>>> xgb = regressor.xgboost()
>>> regressor.model_metrics()
>>> # using AutoML
>>> models = regressor.auto_ml()
>>> regressor.model_metrics()
>>> regressor.save_model()
auto_ml(n_folds: int = 5, data: Literal['train', 'test'] = 'train') → List[source]#

Perform automated machine learning with cross-validation on a list of regression models.

Parameters:
  • n_folds (int) – Number of cross-validation folds. Defaults to 5.

  • data (Literal['train', 'test']) – Target dataset for cross-validation. Must be either ‘train’ or ‘test’. Defaults to ‘train’.

Returns:

List of best-fit regression models for each algorithm.

Return type:

List

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> model_list = regressor.auto_ml()
lasso(alpha: Tuple[float, float, int] = (0.0001, 10.0, 50), n_folds: int = 5, **kwargs) → Lasso[source]#

Train a Lasso regression model using the training data and return the fitted model.

Parameters:
  • alpha (Tuple[float, float, int]) – The range of alpha values to test for hyperparameter tuning. Default is (0.0001, 10., 50).

  • n_folds (int) – The number of cross-validation folds to use for hyperparameter tuning. Default is 5.

  • **kwargs – Arbitrary keyword arguments to be passed to the lasso constructor.

Returns:

The trained Lasso regression model.

Return type:

Lasso

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> lasso = regressor.lasso(alpha=(0.0001, 10., 50), n_folds=5)
linear(**kwargs) → LinearRegression[source]#

Train a linear regression model using the training data and return the fitted model.

Parameters:

**kwargs – Arbitrary keyword arguments to be passed to the linear constructor.

Returns:

The trained linear regression model.

Return type:

LinearRegression

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> linear = regressor.linear()
model_metrics(model_name: Literal['all', 'linear', 'lasso', 'ridge', 'tree', 'random_forest', 'xgboost'] = 'all') → None[source]#

Displays the metrics of a trained regression model. The metrics displayed are R2, MSE, and MAE for both the training and test sets.

Parameters:

model_name (Literal["all", "linear", "lasso", "ridge", "tree", "random_forest", "xgboost"]) – The name of the model to display metrics for. Can be one of ‘all’, ‘linear’, ‘lasso’, ‘ridge’, ‘tree’, ‘random_forest’, or ‘xgboost’. Defaults to ‘all’.

Returns:

None

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> xgboost = regressor.xgboost(n_estimators=(10, 200, 10), n_folds=5)
>>> regressor.model_metrics(model_name="xgboost")
random_forest(n_estimators: Tuple[int, int, int] = (50, 1000, 5), n_folds: int = 2, **kwargs) → RandomForestRegressor[source]#

Trains a Random Forest regression model on the training data and returns the best estimator found by GridSearchCV.

Parameters:
  • n_estimators (Tuple[int, int, int]) – A tuple of integers specifying the minimum and maximum number of trees to include in the forest and the step size between them. Defaults to (50, 1000, 5).

  • n_folds (int) – The number of cross-validation folds to use when evaluating models. Defaults to 2.

  • **kwargs – Arbitrary keyword arguments to be passed to the random forest constructor.

Returns:

The best Random Forest model found by GridSearchCV.

Return type:

RandomForestRegressor
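
Example

A minimal sketch in the style of the other training methods; the n_estimators range shown is just the default:

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> rf = regressor.random_forest(n_estimators=(50, 1000, 5), n_folds=2)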

ridge(alpha: Tuple[float, float, int] = (0.1, 50.0, 50), n_folds: int = 5, **kwargs) → Ridge[source]#

Train a Ridge regression model using the training data and return the fitted model.

Parameters:
  • alpha (Tuple[float, float, int]) – The range of alpha values to test for hyperparameter tuning. Default is (0.1, 50, 50).

  • n_folds (int) – The number of cross-validation folds to use for hyperparameter tuning. Default is 5.

  • **kwargs – Arbitrary keyword arguments to be passed to the ridge constructor.

Returns:

The trained Ridge regression model.

Return type:

Ridge

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> ridge = regressor.ridge(alpha=(0.1, 50., 50), n_folds=5)
save_model(model_name: Literal['all', 'linear', 'lasso', 'ridge', 'tree', 'random_forest', 'xgboost'] = 'all') → None[source]#

Saves the specified machine learning model or all models in the instance to a pickle file.

Parameters:

model_name (Literal["all", "linear", "lasso", "ridge", "tree", "random_forest", "xgboost"]) – The name of the model to save. Defaults to ‘all’.

Returns:

None

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> model_list = regressor.auto_ml()
>>> regressor.save_model(model_name="all")
tree(alpha: Tuple[float, float, int] = (0.0, 0.001, 5), impurity: Tuple[float, float, int] = (0.0, 1e-05, 5), n_folds: int = 5, **kwargs) → DecisionTreeRegressor[source]#

Fits a decision tree regression model using the provided training data and hyperparameters.

Parameters:
  • alpha (Tuple[float, float, int]) – A tuple specifying the range of values to use for the ccp_alpha hyperparameter. The range is given as a tuple (start, stop, num), where start is the start of the range, stop is the end of the range, and num is the number of values to generate within the range. Defaults to (0., 0.001, 5).

  • impurity (Tuple[float, float, int]) – A tuple specifying the range of values to use for the min_impurity_decrease hyperparameter. The range is given as a tuple (start, stop, num), where start is the start of the range, stop is the end of the range, and num is the number of values to generate within the range. Defaults to (0., 0.00001, 5).

  • n_folds (int) – The number of folds to use for cross-validation. Defaults to 5.

  • **kwargs – Arbitrary keyword arguments to be passed to the tree constructor.

Returns:

The fitted decision tree regressor model.

Return type:

DecisionTreeRegressor

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> tree = regressor.tree(alpha=(0., 0.001, 5), impurity=(0., 0.00001, 5), n_folds=3)
xgboost(n_estimators: Tuple[int, int, int] = (10, 100, 5), n_folds: int = 2, **kwargs) → XGBRegressor[source]#

Trains an XGBoost model using the specified hyperparameters.

Parameters:
  • n_estimators (Tuple[int, int, int]) – A tuple containing the start, end, and step values for the number of estimators. Default is (10, 100, 5).

  • n_folds (int) – The number of folds to use in the cross-validation process. Default is 2.

  • **kwargs – Arbitrary keyword arguments to be passed to the xgboost constructor.

Returns:

The trained XGBoost model.

Return type:

xgb.XGBRegressor

Example

>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> xgboost = regressor.xgboost(n_estimators=(10, 200, 10), n_folds=5)
edamame.regressor.regression.regression_metrics(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor, X: DataFrame, y: DataFrame) → None[source]#

Compute and display the regression metrics R2, MSE, and MAE for the input model.

Parameters:
  • model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – Regression model.

  • X (pd.DataFrame) – Input features.

  • y (pd.DataFrame) – Target feature.

Returns:

None
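
Example

A minimal sketch; assumes linear is a fitted model and X_test, y_test are the held-out data from the TrainRegressor example:

>>> from edamame.regressor.regression import regression_metrics
>>> regression_metrics(model=linear, X=X_test, y=y_test)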