The API
miraiml provides the following classes:
- miraiml.HyperSearchSpace represents the search space of hyperparameters
- miraiml.Config defines the general behavior of miraiml.Engine
- miraiml.Engine manages the optimization process
You can import them with:
>>> from miraiml import HyperSearchSpace, Config, Engine
miraiml.HyperSearchSpace

class miraiml.HyperSearchSpace(model_class, id, parameters_values=None, parameters_rules=<function HyperSearchSpace.<lambda>>)
Bases: object
This class represents the search space of hyperparameters for a base model.
Parameters:
- model_class (type) – Any class that represents a statistical model. It must implement fit, as well as predict for regression or predict_proba for classification problems.
- id (str) – The id that will be associated with the models generated within this search space.
- parameters_values (dict, optional, default=None) – A dictionary containing lists of values to be tested as parameters when instantiating objects of model_class.
- parameters_rules (function, optional, default=lambda x: None) – A function that constrains certain parameters based on the values assumed by others. It must receive a dictionary as input and modify it in place; it doesn't need to return anything. Not used if parameters_values has no keys.

Warning: Make sure that the parameters accessed in parameters_rules exist in the set of parameters defined in parameters_values; otherwise the engine will attempt to access an invalid key.

Raises: NotImplementedError, TypeError, ValueError
Example:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from miraiml import HyperSearchSpace

    def logistic_regression_parameters_rules(parameters):
        # These solvers don't support the 'l1' penalty
        if parameters['solver'] in ['newton-cg', 'sag', 'lbfgs']:
            parameters['penalty'] = 'l2'

    hyper_search_space = HyperSearchSpace(
        model_class = LogisticRegression,
        id = 'Logistic Regression',
        parameters_values = {
            'penalty': ['l1', 'l2'],
            'C': np.arange(0.1, 2, 0.1),
            'max_iter': np.arange(50, 300),
            'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
            'random_state': [0]
        },
        parameters_rules = logistic_regression_parameters_rules
    )
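To make the role of parameters_rules concrete, here is a minimal sketch of how a search loop might draw one candidate from parameters_values and then let the rule function adjust it in place. The helper sample_parameters is hypothetical and uses plain Python lists. It is not MiraiML's internal sampler.

```python
import random


def logistic_regression_parameters_rules(parameters):
    # Same rule as the example above: these solvers don't support 'l1'
    if parameters['solver'] in ['newton-cg', 'sag', 'lbfgs']:
        parameters['penalty'] = 'l2'


def sample_parameters(parameters_values, parameters_rules, rng):
    """Hypothetical helper: pick one value per key, then apply the rules."""
    parameters = {name: rng.choice(list(values))
                  for name, values in parameters_values.items()}
    # The rule function mutates the dictionary in place and returns nothing
    parameters_rules(parameters)
    return parameters


parameters_values = {
    'penalty': ['l1', 'l2'],
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
    'random_state': [0],
}

rng = random.Random(42)
for _ in range(3):
    print(sample_parameters(parameters_values, logistic_regression_parameters_rules, rng))
```

Whatever the random draws, any candidate that uses newton-cg, sag or lbfgs ends up with penalty 'l2'.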
Warning: Do not allow random_state to assume multiple values. If model_class has a random_state parameter, force the engine to always choose the same value by providing a list with a single element. Allowing random_state to assume multiple values will confuse the engine, because the scores will be unstable even with the same choice of hyperparameters and features.
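One way to guard against this before building a HyperSearchSpace is to run a small check over parameters_values. The function below is a hypothetical helper, not part of miraiml. It only inspects the dictionary you are about to pass in.

```python
def check_random_state_is_fixed(parameters_values):
    """Hypothetical helper: raise if 'random_state' has more than one candidate value."""
    values = parameters_values.get('random_state')
    if values is None:
        # No random_state in the search space: nothing to check
        return
    n_values = len(list(values))
    if n_values != 1:
        raise ValueError(
            "random_state should be given exactly one value, got %d" % n_values
        )


check_random_state_is_fixed({'C': [0.1, 1.0], 'random_state': [0]})  # passes silently
```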
miraiml.Config

class miraiml.Config(local_dir, problem_type, hyper_search_spaces, score_function, use_all_features=False, n_folds=5, stratified=True, ensemble_id=None)
Bases: object
This class defines the general behavior of the engine.
Parameters:
- local_dir (str) – The name of the folder in which the engine will save its internal files. If the directory doesn't exist, it will be created automatically. The sequences .. and / are not allowed in local_dir.
- problem_type (str) – The problem type, either 'classification' or 'regression'. Multi-class classification problems are not supported.
- hyper_search_spaces (list) – The list of miraiml.HyperSearchSpace objects to optimize. If hyper_search_spaces has length 1, the engine will not run ensemble cycles.
- score_function (function) – A function that receives the “truth” and the predictions (in this order) and returns the score. Bigger scores must mean better models.
- use_all_features (bool, optional, default=False) – Whether to force MiraiML to always use all features or not.
- n_folds (int, optional, default=5) – The number of folds for the fitting/predicting process.
- stratified (bool, optional, default=True) – Whether to stratify folds on the target or not. Only used if problem_type == 'classification'.
- ensemble_id (str, optional, default=None) – The id for the ensemble. If none is given, the engine will not ensemble base models.
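The stratified option refers to the usual idea of stratified k-fold splitting: each fold keeps roughly the same class proportions as the full target. The sketch below shows that idea in plain Python with a hypothetical stratified_fold_ids function. It does not shuffle, and MiraiML's own folding code may work differently, so treat it only as an illustration of what stratification means.

```python
from collections import defaultdict


def stratified_fold_ids(target, n_folds=5):
    """Assign a fold id to each row so every class is spread evenly across folds."""
    rows_by_class = defaultdict(list)
    for row, label in enumerate(target):
        rows_by_class[label].append(row)

    fold_ids = [None] * len(target)
    for rows in rows_by_class.values():
        # Deal the rows of each class round-robin, like cards, over the folds
        for position, row in enumerate(rows):
            fold_ids[row] = position % n_folds
    return fold_ids


target = [0] * 10 + [1] * 5
folds = stratified_fold_ids(target, n_folds=5)
for fold in range(5):
    labels = [target[row] for row in range(len(target)) if folds[row] == fold]
    print(fold, labels)
```

With 10 rows of class 0 and 5 of class 1, every fold receives two rows of class 0 and one of class 1, matching the 2:1 ratio of the whole target.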
Raises: NotImplementedError, TypeError, ValueError
Example:

    from sklearn.metrics import roc_auc_score
    from miraiml import Config

    # hyper_search_spaces is a list of miraiml.HyperSearchSpace objects
    config = Config(
        local_dir = 'miraiml_local',
        problem_type = 'classification',
        hyper_search_spaces = hyper_search_spaces,
        score_function = roc_auc_score,
        use_all_features = False,
        n_folds = 5,
        stratified = True,
        ensemble_id = 'Ensemble'
    )
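Because bigger scores must mean better models, an error metric has to be flipped before it is passed as score_function. Below is a minimal sketch for a regression problem, assuming plain sequences of numbers as inputs. negative_mean_squared_error is a hypothetical name, and a library metric could be wrapped the same way.

```python
def negative_mean_squared_error(y_true, y_pred):
    """Receives the truth and the predictions (in this order); higher is better."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Negate the mean squared error so that a smaller error gives a bigger score
    return -sum(squared_errors) / len(squared_errors)


print(negative_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```

You would then pass score_function = negative_mean_squared_error together with problem_type = 'regression'.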
miraiml.Engine

class miraiml.Engine(config, on_improvement=None)
Bases: object
This class offers the controls for the engine.
Parameters:
- config (miraiml.Config) – The configurations for the behavior of the engine.
- on_improvement (function, optional, default=None) – A function that will be executed every time the engine finds an improvement for some id. It must receive a status parameter, which is the return value of the method request_status().

Raises: TypeError

Example:

    from miraiml import Engine

    def on_improvement(status):
        print('Scores:', status['scores'])

    engine = Engine(config, on_improvement=on_improvement)
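Since on_improvement receives the same dictionary returned by request_status(), a callback can also keep a history of improvements. The sketch below relies only on the 'best_id' and 'scores' keys documented for request_status(). make_improvement_logger is a hypothetical helper, and the status dictionary in the last line is a made-up stand-in for what the engine would pass.

```python
import time


def make_improvement_logger():
    """Hypothetical factory: returns an on_improvement function and its history list."""
    history = []

    def on_improvement(status):
        best_id = status['best_id']
        best_score = status['scores'][best_id]
        # Record when the improvement happened, the best id and its score
        history.append((time.time(), best_id, best_score))
        print('New best: %s (%.4f)' % (best_id, best_score))

    return on_improvement, history


on_improvement, history = make_improvement_logger()
# With miraiml: engine = Engine(config, on_improvement=on_improvement)
on_improvement({'best_id': 'Logistic Regression', 'scores': {'Logistic Regression': 0.81}})
```

A closure is used instead of a callable class so that on_improvement stays a plain function, as the parameter description asks.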
is_running()
Tells whether the engine is running or not.
Return type: bool
Returns: True if the engine is running and False otherwise.
interrupt()
Makes the engine stop at the first opportunity.
Note: This method is not asynchronous. It will wait for the engine to stop.
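A common pattern is to let the engine work for a fixed amount of time and then read its status. The sketch below assumes that restart() starts the optimization and returns while the engine keeps running, and, as documented, that interrupt() waits until the engine stops. run_for is a hypothetical helper that works with any object exposing these methods.

```python
import time


def run_for(engine, seconds):
    """Hypothetical helper: run the engine for `seconds`, stop it and return its status."""
    engine.restart()          # assumed to return while the engine keeps working
    time.sleep(seconds)
    engine.interrupt()        # waits until the engine has actually stopped
    return engine.request_status()
```

With a configured engine whose data has been loaded through load_data, status = run_for(engine, 60) would give roughly one minute of optimization before reading the results.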
load_data(train_data, target_column, test_data=None, restart=False)
Interrupts the engine and loads a new pair of train/test datasets. The labels of all their columns must be instances of str or int.
Parameters:
- train_data (pandas.DataFrame) – The training data.
- target_column (str or int) – The target column identifier.
- test_data (pandas.DataFrame, optional, default=None) – The testing data. Use the default value if you don't need to make predictions for data with unknown labels.
- restart (bool, optional, default=False) – Whether to restart the engine after updating data or not.
Raises: TypeError, ValueError
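Before calling load_data, you can check the column-label requirement yourself. The function below is a hypothetical helper that takes any iterable of labels (for a pandas.DataFrame, that would be its columns attribute). It only mirrors the documented str-or-int rule and is not the engine's own validation.

```python
def invalid_column_labels(columns):
    """Hypothetical helper: return the labels that are neither str nor int."""
    return [label for label in columns if not isinstance(label, (str, int))]


print(invalid_column_labels(['age', 'income', 3, ('a', 'b'), 2.5]))  # → [('a', 'b'), 2.5]
```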
shuffle_train_data(restart=False)
Interrupts the engine and shuffles the training data.
Parameters: restart (bool, optional, default=False) – Whether to restart the engine after shuffling data or not.
Raises: RuntimeError
Note: It's a good practice to shuffle the training data periodically to avoid overfitting on a certain folding pattern.
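Following that advice, one could alternate optimization rounds with reshuffles. This sketch assumes a configured engine whose data is already loaded, and that restart() returns while the engine keeps running. optimize_with_shuffles is a hypothetical helper built only from the documented methods restart(), shuffle_train_data(restart=True), interrupt() and request_status().

```python
import time


def optimize_with_shuffles(engine, n_rounds, seconds_per_round):
    """Hypothetical helper: reshuffle the training data between optimization rounds."""
    engine.restart()
    try:
        for round_index in range(n_rounds):
            if round_index > 0:
                # Interrupts the engine, shuffles the training data and restarts it
                engine.shuffle_train_data(restart=True)
            time.sleep(seconds_per_round)
    finally:
        # Always leave the engine stopped, even if something goes wrong
        engine.interrupt()
    return engine.request_status()
```

Shuffling only between rounds avoids a wasted reshuffle right before the final interrupt.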
reconfigure(config, restart=False)
Interrupts the engine and loads a new configuration.
Parameters:
- config (miraiml.Config) – The configurations for the behavior of the engine.
- restart (bool, optional, default=False) – Whether to restart the engine after reconfiguring it or not.
restart()
Interrupts the engine and starts again from the last checkpoint (if any).
Raises: RuntimeError, KeyError
request_status()
Queries the current status of the engine.
Return type: dict or None
Returns: The current status of the engine in the form of a dictionary. If no score has been computed yet, returns None.

The available keys and their respective values on the status dictionary are:
- 'best_id': The current best id
- 'scores': A dictionary containing the score of each id
- 'predictions': A pandas.DataFrame object containing the predictions from each id for the testing dataset. If no testing dataset was provided, the value associated with this key is None
- 'ensemble_weights': A dictionary containing the ensemble weights for each base model. If no ensembling cycle has been executed, the value associated with this key is None
- 'base_models': A dictionary containing the current description of each base model, which can be accessed by their ids

Each base model description in the dictionary associated with the 'base_models' key contains the following keys and respective values:
- 'model_class': The name of the base model's class
- 'parameters': The dictionary of hyperparameters
- 'features': The list of features
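To illustrate how the nested status dictionary fits together, the hypothetical helper below looks up the current best id and its base-model description. request_report() already produces a formatted report; this sketch only shows how to reach the raw values. The example dictionary is made up and contains only the keys used here, whereas a real status also carries 'predictions' and 'ensemble_weights'.

```python
def describe_best(status):
    """Hypothetical helper: summarize the best id from a request_status() dictionary."""
    if status is None:
        return 'No score has been computed yet.'
    best_id = status['best_id']
    lines = ['Best id: %s (score %.4f)' % (best_id, status['scores'][best_id])]
    # The best id may be the ensemble, which may not be a base model, so use .get
    base_model = status['base_models'].get(best_id)
    if base_model is not None:
        lines.append('Model class: %s' % base_model['model_class'])
        lines.append('Parameters: %s' % base_model['parameters'])
        lines.append('Features: %s' % ', '.join(map(str, base_model['features'])))
    return '\n'.join(lines)


example_status = {  # made-up values, for illustration only
    'best_id': 'Logistic Regression',
    'scores': {'Logistic Regression': 0.8123},
    'base_models': {
        'Logistic Regression': {
            'model_class': 'LogisticRegression',
            'parameters': {'C': 0.5, 'solver': 'liblinear'},
            'features': ['age', 'income'],
        },
    },
}
print(describe_best(example_status))
```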
request_report(include_features=False)
Returns a report of the current status of the engine as a formatted string.
Parameters: include_features (bool, optional, default=False) – Whether to include the list of features in the report or not (may cause some visual mess).
Return type: str
Returns: The formatted report.
extract_model()
Generates an unfit model object with methods similar to scikit-learn models. The generated model is the result of the optimizations made by MiraiML, which takes care of the choices of hyperparameters, features and ensembling weights.
After extracting the model, you can use it to fit new data.
Return type: miraiml.core.MiraiModel
Returns: The optimized model object.
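The documentation does not say how the ensembling weights are applied. A natural reading, assumed here and not confirmed by the source, is a weighted average of the base models' predictions, normalized by the sum of the weights. The sketch below uses plain lists instead of a pandas.DataFrame and a hypothetical weighted_average_ensemble function to show that computation.

```python
def weighted_average_ensemble(predictions, weights):
    """Combine per-id predictions with a normalized weighted average (assumed scheme).

    predictions: dict mapping id -> list of predictions (all the same length)
    weights: dict mapping id -> non-negative weight, shaped like 'ensemble_weights'
    """
    total_weight = sum(weights[model_id] for model_id in predictions)
    if total_weight == 0:
        raise ValueError('at least one weight must be positive')
    n_rows = len(next(iter(predictions.values())))
    combined = [0.0] * n_rows
    for model_id, values in predictions.items():
        # Each model contributes in proportion to its share of the total weight
        share = weights[model_id] / total_weight
        for row, value in enumerate(values):
            combined[row] += share * value
    return combined


print(weighted_average_ensemble(
    {'A': [0.2, 0.8], 'B': [0.6, 0.4]},
    {'A': 3.0, 'B': 1.0},
))
```

Here model A gets 75% of the weight and B gets 25%, so the combined predictions are about 0.3 and 0.7.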