edamame.regressor
diagnose
- class edamame.regressor.diagnose.RegressorDiagnose(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame)[source]#
Bases:
object
A class for diagnosing regression models.
- X_train#
The input training data.
- Type:
pd.DataFrame
- y_train#
The target training data.
- Type:
pd.Series
- X_test#
The input test data.
- Type:
pd.DataFrame
- y_test#
The target test data.
- Type:
pd.Series
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor, RegressorDiagnose
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> linear = regressor.linear()
>>> diagnose = RegressorDiagnose(X_train, np.log(y_train), X_test, np.log(y_test))
>>> diagnose.coefficients(model=linear)
>>> diagnose.prediction_error(model=linear)
>>> diagnose.residual_plot(model=linear)
>>> diagnose.qqplot(model=linear)
- coefficients(model: LinearRegression | Lasso | Ridge) None [source]#
Display the coefficients of a Linear, Lasso, or Ridge model.
- Parameters:
model (Union[LinearRegression, Lasso, Ridge]) – The input model for which coefficients need to be displayed.
- Returns:
None
- prediction_error(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor, train_data: bool = True, figsize: Tuple[float, float] = (8.0, 6.0)) None [source]#
Draw a scatterplot of the ground-truth values (ygt) against the predictions (ypred) of the model passed.
- Parameters:
model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – The input model.
train_data (bool) – Whether to plot the training data (True) or the test data (False). Defaults to True.
figsize (Tuple[float, float]) – The size of the prediction error plot. Defaults to (8.0, 6.0).
- Returns:
None
- qqplot(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor) None [source]#
Display a Q-Q plot for the train and test data.
- Parameters:
model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – The input model.
- Returns:
None
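As a rough illustration of what a Q-Q plot compares, the sketch below pairs sorted, standardized residuals with theoretical standard-normal quantiles, using only the standard library. The helper name `qq_points` and the plotting positions `(i - 0.5) / n` are assumptions made for this example; the source does not state how `qqplot` computes or standardizes its points.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(y_true, y_pred):
    """Return (theoretical, sample) quantile pairs for the residuals.

    Hypothetical helper, not part of edamame.
    """
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mu, sigma = mean(residuals), pstdev(residuals)
    # Standardize and sort the residuals to obtain the sample quantiles.
    # (Assumes the residuals are not all identical, so sigma > 0.)
    sample = sorted((r - mu) / sigma for r in residuals)
    n = len(sample)
    # Theoretical quantiles at plotting positions (i - 0.5) / n (assumed convention).
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


y_true = [3.0, 5.0, 7.5, 9.0, 11.0]
y_pred = [2.8, 5.4, 7.0, 9.3, 10.6]
points = qq_points(y_true, y_pred)
```

If the residuals are approximately normal, the pairs fall close to the line y = x when plotted.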
- random_forest_fi(model: RandomForestRegressor, figsize: Tuple[float, float] = (12, 10)) None [source]#
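Display the feature importance plot for a random forest model.
- Parameters:
model (RandomForestRegressor) – The input random forest model.
figsize (Tuple[float, float]) – The size of the feature importance plot. Defaults to (12, 10).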
- Returns:
None
- residual_plot(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor) None [source]#
Residual plot for train and test data.
- Parameters:
model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – The input model.
- Returns:
None
- edamame.regressor.diagnose.check_random_forest(model: RandomForestRegressor) None [source]#
Check whether the model passed is a random forest regressor.
- Parameters:
model (RandomForestRegressor) – The input model to be checked.
- Raises:
TypeError – If the input model is not a random forest regression model.
- Returns:
None
- edamame.regressor.diagnose.check_xgboost(model: XGBRegressor) None [source]#
Check whether the model passed is an XGBoost regressor.
- Parameters:
model (xgb.XGBRegressor) – The input model to be checked.
- Raises:
TypeError – If the input model is not an XGBoost regression model.
- Returns:
None
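A type guard like the two checks above is commonly written with `isinstance`. The sketch below assumes that approach, which the source does not confirm. The classes are empty stand-ins so the example runs without scikit-learn, and `check_random_forest_sketch` is a hypothetical name.

```python
class RandomForestRegressor:
    """Stand-in for sklearn.ensemble.RandomForestRegressor (keeps the sketch dependency-free)."""


class LinearRegression:
    """Stand-in for sklearn.linear_model.LinearRegression."""


def check_random_forest_sketch(model) -> None:
    # Raise TypeError for anything that is not a random forest regressor.
    if not isinstance(model, RandomForestRegressor):
        raise TypeError(
            f"Expected a RandomForestRegressor, got {type(model).__name__}"
        )


# A matching model passes silently and the function returns None.
check_random_forest_sketch(RandomForestRegressor())

# A different model type is rejected.
try:
    check_random_forest_sketch(LinearRegression())
    rejected = False
except TypeError:
    rejected = True
```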
regression
- class edamame.regressor.regression.TrainRegressor(X_train: DataFrame, y_train: DataFrame, X_test, y_test)[source]#
Bases:
object
This class represents a pipeline for training and handling regression models.
- X_train#
The input training data.
- Type:
pd.DataFrame
- y_train#
The target training data.
- Type:
pd.Series
- X_test#
The input test data.
- Type:
pd.DataFrame
- y_test#
The target test data.
- Type:
pd.Series
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> linear = regressor.linear()
>>> regressor.model_metrics(model_name="linear")
>>> regressor.save_model(model_name="linear")
>>> lasso = regressor.lasso()
>>> ridge = regressor.ridge()
>>> tree = regressor.tree()
>>> rf = regressor.random_forest()
>>> xgb = regressor.xgboost()
>>> regressor.model_metrics()
>>> # using AutoML
>>> models = regressor.auto_ml()
>>> regressor.model_metrics()
>>> regressor.save_model()
- auto_ml(n_folds: int = 5, data: Literal['train', 'test'] = 'train') List [source]#
Perform automated machine learning with cross validation on a list of regression models.
- Parameters:
n_folds (int) – Number of cross-validation folds. Defaults to 5.
data (Literal['train', 'test']) – Target dataset for cross-validation. Must be either ‘train’ or ‘test’. Defaults to ‘train’.
- Returns:
List of best-fit regression models for each algorithm.
- Return type:
List
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> model_list = regressor.auto_ml()
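To make the role of `n_folds` concrete, the sketch below shows how k-fold cross-validation partitions sample indices into training and validation sets. It uses contiguous, unshuffled folds; whether `auto_ml` shuffles or stratifies is not stated in the source, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=12, n_folds=5))
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx))
```

Each sample appears in exactly one validation fold, so every model is scored on data it was not fitted on.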
- lasso(alpha: Tuple[float, float, int] = (0.0001, 10.0, 50), n_folds: int = 5, **kwargs) Lasso [source]#
Train a Lasso regression model using the training data and return the fitted model.
- Parameters:
alpha (Tuple[float, float, int]) – The range of alpha values to test for hyperparameter tuning. Default is (0.0001, 10., 50).
n_folds (int) – The number of cross-validation folds to use for hyperparameter tuning. Default is 5.
**kwargs – Arbitrary keyword arguments to be passed to the lasso constructor.
- Returns:
The trained Lasso regression model.
- Return type:
Lasso
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> lasso = regressor.lasso(alpha=(0.0001, 10., 50), n_folds=5)
- linear(**kwargs) LinearRegression [source]#
Train a linear regression model using the training data and return the fitted model.
- Parameters:
**kwargs – Arbitrary keyword arguments to be passed to the linear constructor.
- Returns:
The trained linear regression model.
- Return type:
LinearRegression
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> linear = regressor.linear()
- model_metrics(model_name: Literal['all', 'linear', 'lasso', 'ridge', 'tree', 'random_forest', 'xgboost'] = 'all') None [source]#
Displays the metrics of a trained regression model. The metrics displayed are R2, MSE, and MAE for both the training and test sets.
- Parameters:
model_name (Literal["all", "linear", "lasso", "ridge", "tree", "random_forest", "xgboost"]) – The name of the model to display metrics for. Can be one of ‘all’, ‘linear’, ‘lasso’, ‘ridge’, ‘tree’, ‘random_forest’, or ‘xgboost’. Defaults to ‘all’.
- Returns:
None
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> xgboost = regressor.xgboost(n_estimators=(10, 200, 10), n_folds=5)
>>> regressor.model_metrics(model_name="xgboost")
- random_forest(n_estimators: Tuple[int, int, int] = (50, 1000, 5), n_folds: int = 2, **kwargs) RandomForestRegressor [source]#
Trains a Random Forest regression model on the training data and returns the best estimator found by GridSearchCV.
- Parameters:
n_estimators (Tuple[int, int, int]) – A tuple of integers specifying the minimum and maximum number of trees to include in the forest, and the step size between them. Default is (50, 1000, 5).
n_folds (int) – The number of cross-validation folds to use when evaluating models. Default is 2.
**kwargs – Arbitrary keyword arguments to be passed to the random forest constructor.
- Returns:
The best Random Forest model found by GridSearchCV.
- Return type:
RandomForestRegressor
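The size of the search implied by `n_estimators` is easy to underestimate. The sketch below expands a `(min, max, step)` tuple into candidate values and counts the cross-validation fits, assuming the tuple is expanded like Python's `range`. Whether the maximum is included is not stated in the source, so the hypothetical helper `n_estimators_grid` exposes both options.

```python
def n_estimators_grid(spec, include_stop=False):
    """Expand a (start, stop, step) tuple into candidate tree counts."""
    start, stop, step = spec
    # range() excludes its end point; shift it by one to include the maximum.
    end = stop + 1 if include_stop else stop
    return list(range(start, end, step))


grid = n_estimators_grid((50, 1000, 5))  # default tuple from the signature
n_folds = 2
cv_fits = len(grid) * n_folds  # one fit per candidate per fold

print(len(grid), cv_fits)
```

Under this assumption the default tuple yields 190 candidates and 380 cross-validation fits, so a coarser step such as `(50, 1000, 50)` may be worth considering on large datasets.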
- ridge(alpha: Tuple[float, float, int] = (0.1, 50.0, 50), n_folds: int = 5, **kwargs) Ridge [source]#
Train a Ridge regression model using the training data and return the fitted model.
- Parameters:
alpha (Tuple[float, float, int]) – The range of alpha values to test for hyperparameter tuning. Default is (0.1, 50, 50).
n_folds (int) – The number of cross-validation folds to use for hyperparameter tuning. Default is 5.
**kwargs – Arbitrary keyword arguments to be passed to the ridge constructor.
- Returns:
The trained Ridge regression model.
- Return type:
Ridge
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> ridge = regressor.ridge(alpha=(0.1, 50., 50), n_folds=5)
- save_model(model_name: Literal['all', 'linear', 'lasso', 'ridge', 'tree', 'random_forest', 'xgboost'] = 'all') None [source]#
Saves the specified machine learning model or all models in the instance to a pickle file.
- Parameters:
model_name (Literal["all", "linear", "lasso", "ridge", "tree", "random_forest", "xgboost"]) – The name of the model to save. Defaults to ‘all’.
- Returns:
None
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> model_list = regressor.auto_ml()
>>> regressor.save_model(model_name="all")
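For readers unfamiliar with pickle files, the sketch below shows a save and reload round trip with the standard `pickle` module. The dictionary is a stand-in for a fitted estimator, and the file name `linear.pkl` is hypothetical; the source does not say where `save_model` writes its files or how it names them.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted estimator; any picklable Python object behaves the same way.
model = {"name": "linear", "coef": [0.5, -1.2], "intercept": 3.0}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "linear.pkl"  # hypothetical file name
    # Serialize the object to disk.
    with path.open("wb") as f:
        pickle.dump(model, f)
    # Load it back into a new, equal object.
    with path.open("rb") as f:
        restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.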
- tree(alpha: Tuple[float, float, int] = (0.0, 0.001, 5), impurity: Tuple[float, float, int] = (0.0, 1e-05, 5), n_folds: int = 5, **kwargs) DecisionTreeRegressor [source]#
Fits a decision tree regression model using the provided training data and hyperparameters.
- Parameters:
alpha (Tuple[float, float, int]) – A tuple specifying the range of values to use for the ccp_alpha hyperparameter. The range is given as a tuple (start, stop, num), where start is the start of the range, stop is the end of the range, and num is the number of values to generate within the range. Defaults to (0., 0.001, 5).
impurity (Tuple[float, float, int]) – A tuple specifying the range of values to use for the min_impurity_decrease hyperparameter. The range is given as a tuple (start, stop, num), where start is the start of the range, stop is the end of the range, and num is the number of values to generate within the range. Defaults to (0., 0.00001, 5).
n_folds (int) – The number of folds to use for cross-validation. Defaults to 5.
**kwargs – Arbitrary keyword arguments to be passed to the tree constructor.
- Returns:
The fitted decision tree regressor model.
- Return type:
DecisionTreeRegressor
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> tree = regressor.tree(alpha=(0., 0.001, 5), impurity=(0., 0.00001, 5), n_folds=3)
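The `(start, stop, num)` tuples described above can be expanded into a hyperparameter grid as follows. This is a sketch under two assumptions: that the values are evenly spaced with both end points included, and that the two ranges are searched jointly as a full grid. The local `linspace` helper is written for this example.

```python
from itertools import product


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, both inclusive."""
    step = (stop - start) / (num - 1) if num > 1 else 0.0
    return [start + i * step for i in range(num)]


ccp_alphas = linspace(0.0, 0.001, 5)              # alpha=(0., 0.001, 5)
min_impurity_decreases = linspace(0.0, 1e-05, 5)  # impurity=(0., 0.00001, 5)

# Every combination of the two hyperparameters becomes one candidate.
param_grid = [
    {"ccp_alpha": a, "min_impurity_decrease": m}
    for a, m in product(ccp_alphas, min_impurity_decreases)
]

n_folds = 3
cv_fits = len(param_grid) * n_folds  # one fit per candidate per fold
```

With five values per range, the joint grid has 25 candidates, so increasing `num` on either tuple multiplies the search cost.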
- xgboost(n_estimators: Tuple[int, int, int] = (10, 100, 5), n_folds: int = 2, **kwargs) XGBRegressor [source]#
Trains an XGBoost model using the specified hyperparameters.
- Parameters:
n_estimators (Tuple[int, int, int]) – A tuple containing the start, end and step values for number of estimators. Default is (10, 100, 5).
n_folds (int) – The number of folds to use in the cross-validation process. Default is 2.
**kwargs – Arbitrary keyword arguments to be passed to the xgboost constructor.
- Returns:
The trained XGBoost model.
- Return type:
xgb.XGBRegressor
Example
>>> import numpy as np
>>> from edamame.regressor import TrainRegressor
>>> regressor = TrainRegressor(X_train, np.log(y_train), X_test, np.log(y_test))
>>> xgboost = regressor.xgboost(n_estimators=(10, 200, 10), n_folds=5)
- edamame.regressor.regression.regression_metrics(model: LinearRegression | Lasso | Ridge | DecisionTreeRegressor | RandomForestRegressor | XGBRegressor, X: DataFrame, y: DataFrame) None [source]#
Compute and display the regression metrics R2, MSE and MAE of the input model.
- Parameters:
model (Union[LinearRegression, Lasso, Ridge, DecisionTreeRegressor, RandomForestRegressor, xgb.XGBRegressor]) – Regression model.
X (pd.DataFrame) – Input features.
y (pd.DataFrame) – Target feature.
- Returns:
None
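For reference, the three metrics named above can be computed by hand as in the sketch below, which uses the standard definitions of R2, MSE, and MAE in plain Python. The function name `regression_metrics_sketch` is hypothetical and takes predictions directly rather than a model; it is not the library's implementation.

```python
def regression_metrics_sketch(y_true, y_pred):
    """Return R2, MSE, and MAE for paired true and predicted values."""
    n = len(y_true)
    # Mean squared error and mean absolute error.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    # R2 = 1 - (residual sum of squares) / (total sum of squares).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"R2": r2, "MSE": mse, "MAE": mae}


metrics = regression_metrics_sketch([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```

When the model was fitted on a log-transformed target, as in the examples on this page, these metrics are on the log scale.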