The API

miraiml provides the following classes:

miraiml.HyperSearchSpace represents the search space of hyperparameters
miraiml.Config defines the general behavior of miraiml.Engine
miraiml.Engine manages the optimization process

You can import them with:

>>> from miraiml import HyperSearchSpace, Config, Engine

miraiml.HyperSearchSpace

class miraiml.HyperSearchSpace(model_class, id, parameters_values=None, parameters_rules=lambda x: None)

Bases: object

This class represents the search space of hyperparameters for a base model.

Parameters:

model_class (type) – Any class that represents a statistical model. It must implement the methods `fit` as well as `predict` for regression or `predict_proba` for classification problems.
id (str) – The id that will be associated with the models generated within this search space.
parameters_values (dict, optional, default=None) – A dictionary containing lists of values to be tested as parameters when instantiating objects of `model_class`.
parameters_rules (function, optional, default=lambda x: None) – A function that constrains certain parameters because of the values assumed by others. It must receive a dictionary as input and doesn’t need to return anything. Not used if `parameters_values` has no keys.

Warning

Make sure that the parameters accessed in `parameters_rules` exist in the set of parameters defined on `parameters_values`, otherwise the engine will attempt to access an invalid key.

Raises:

`NotImplementedError`, `TypeError`, `ValueError`
Example:

import numpy as np
from sklearn.linear_model import LogisticRegression
from miraiml import HyperSearchSpace

def logistic_regression_parameters_rules(parameters):
    # These solvers do not support the 'l1' penalty, so force 'l2' for them
    if parameters['solver'] in ['newton-cg', 'sag', 'lbfgs']:
        parameters['penalty'] = 'l2'

hyper_search_space = HyperSearchSpace(
    model_class = LogisticRegression,
    id = 'Logistic Regression',
    parameters_values = {
        'penalty': ['l1', 'l2'],
        'C': np.arange(0.1, 2, 0.1),
        'max_iter': np.arange(50, 300),
        'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
        'random_state': [0]
    },
    parameters_rules = logistic_regression_parameters_rules
)

Warning

Do not allow random_state to assume multiple values. If model_class has a random_state parameter, force the engine to always choose the same value by providing a list with a single element.

Allowing random_state to assume multiple values will confuse the engine because the scores will be unstable even with the same choice of hyperparameters and features.
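To illustrate how `parameters_values` and `parameters_rules` interact, here is a minimal sketch of one way a set of parameters could be drawn and then constrained. The helper `sample_parameters` is hypothetical and is not part of MiraiML; the engine's actual search strategy may differ.

```python
import random

def sample_parameters(parameters_values, parameters_rules=lambda parameters: None, seed=None):
    """Hypothetical helper: pick one value per parameter, then apply the rules in place."""
    rng = random.Random(seed)
    parameters = {
        name: rng.choice(list(values))
        for name, values in parameters_values.items()
    }
    if parameters:  # the rules are not used when there are no parameters
        parameters_rules(parameters)
    return parameters

def logistic_regression_parameters_rules(parameters):
    # These solvers do not support the 'l1' penalty, so force 'l2' for them
    if parameters['solver'] in ['newton-cg', 'sag', 'lbfgs']:
        parameters['penalty'] = 'l2'

parameters_values = {
    'penalty': ['l1', 'l2'],
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
    'random_state': [0],  # a single value, so it never changes
}

# Draw a few candidate parameter sets with different seeds
samples = [
    sample_parameters(parameters_values, logistic_regression_parameters_rules, seed=seed)
    for seed in range(20)
]
```

Because the rules function mutates the dictionary it receives, it does not need to return anything. Every key it reads (`'solver'`) or writes (`'penalty'`) is present in `parameters_values`, as the warning above requires.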

miraiml.Config

class miraiml.Config(local_dir, problem_type, hyper_search_spaces, score_function, use_all_features=False, n_folds=5, stratified=True, ensemble_id=None)

Bases: object

This class defines the general behavior of the engine.

Parameters:

local_dir (str) – The name of the folder in which the engine will save its internal files. If the directory doesn’t exist, it will be created automatically. `local_dir` must not contain `..` or `/`.
problem_type (str) – The problem type: `'classification'` or `'regression'`. Multi-class classification problems are not supported.
hyper_search_spaces (list) – The list of `miraiml.HyperSearchSpace` objects to optimize. If `hyper_search_spaces` has length 1, the engine will not run ensemble cycles.
score_function (function) – A function that receives the “truth” and the predictions (in this order) and returns the score. Bigger scores must mean better models.
use_all_features (bool, optional, default=False) – Whether to force MiraiML to always use all features or not.
n_folds (int, optional, default=5) – The number of folds for the fitting/predicting process.
stratified (bool, optional, default=True) – Whether to stratify folds on target or not. Only used if `problem_type == 'classification'`.
ensemble_id (str, optional, default=None) – The id for the ensemble. If none is given, the engine will not ensemble base models.

Raises:

`NotImplementedError`, `TypeError`, `ValueError`
Example:

from sklearn.metrics import roc_auc_score
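from miraiml import Config

# Assumes `hyper_search_space` is a miraiml.HyperSearchSpace object,
# such as the one built in the HyperSearchSpace example
hyper_search_spaces = [hyper_search_space]

config = Config(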
    local_dir = 'miraiml_local',
    problem_type = 'classification',
    hyper_search_spaces = hyper_search_spaces,
    score_function = roc_auc_score,
    use_all_features = False,
    n_folds = 5,
    stratified = True,
    ensemble_id = 'Ensemble'
)
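Since bigger scores must mean better models, an error metric cannot be passed as `score_function` directly. A minimal sketch of one way to adapt such a metric for a regression problem is to negate it. The name `negative_rmse` is hypothetical and not part of MiraiML; it is written in plain Python so that it accepts any two equally long iterables of numbers.

```python
import math

def negative_rmse(y_true, y_pred):
    """Return the negated root mean squared error, so that bigger is better.

    Receives the "truth" first and the predictions second, in the order
    that `score_function` is documented to use.
    """
    squared_errors = [(truth - prediction) ** 2
                      for truth, prediction in zip(y_true, y_pred)]
    return -math.sqrt(sum(squared_errors) / len(squared_errors))
```

A perfect prediction scores 0 and every worse prediction scores below 0, so the ordering matches what the engine expects. It could then be passed as `score_function = negative_rmse` together with `problem_type = 'regression'`.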

miraiml.Engine

class miraiml.Engine(config, on_improvement=None)

Bases: object

This class offers the controls for the engine.

Parameters:

config (miraiml.Config) – The configurations for the behavior of the engine.
on_improvement (function, optional, default=None) – A function that will be executed every time the engine finds an improvement for some id. It must receive a `status` parameter, which is the return value of the method `request_status()`.

Raises:

`TypeError`
Example:

from miraiml import Engine

def on_improvement(status):
    print('Scores:', status['scores'])

engine = Engine(config, on_improvement=on_improvement)

is_running()

Tells whether the engine is running or not.

Return type:	bool
Returns:	`True` if the engine is running and `False` otherwise.

interrupt()

Makes the engine stop at the first opportunity.

Note

This method is not asynchronous. It will wait for the engine to stop.
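As a rough illustration of how `is_running()` and `interrupt()` fit together, the sketch below lets an engine work for a fixed amount of time and then stops it. The helper `run_for` is hypothetical, and it assumes that `restart()` returns immediately while the engine keeps optimizing in the background; it also assumes that data has already been loaded.

```python
import time

def run_for(engine, seconds):
    """Hypothetical helper: optimize for roughly `seconds`, then stop the engine."""
    engine.restart()        # start (or resume) the optimization
    time.sleep(seconds)     # assumption: the engine keeps working meanwhile
    engine.interrupt()      # waits until the engine has actually stopped
    if engine.is_running():
        raise RuntimeError('The engine is still running after interrupt()')
    return engine.request_status()
```

Because `interrupt()` waits for the engine to stop, the `is_running()` check that follows it is expected to report `False`; the check is only there as a safeguard.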

load_data(train_data, target_column, test_data=None, restart=False)

Interrupts the engine and loads a new pair of train/test datasets. All of their column names must be instances of str or int.

Parameters:

train_data (pandas.DataFrame) – The training data.
target_column (str or int) – The target column identifier.
test_data (pandas.DataFrame, optional, default=None) – The testing data. Use the default value if you don’t need to make predictions for data with unknown labels.
restart (bool, optional, default=False) – Whether to restart the engine after updating data or not.

Raises:

TypeError, ValueError

shuffle_train_data(restart=False)

Interrupts the engine and shuffles the training data.

Parameters:	restart (bool, optional, default=False) – Whether to restart the engine after shuffling data or not.
Raises:	`RuntimeError`

Note

It’s a good practice to shuffle the training data periodically to avoid overfitting on a certain folding pattern.
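One way to follow this practice is to alternate optimization rounds with shuffles. The sketch below is only an illustration: the helper `optimize_with_shuffles` and its parameters are hypothetical, and it assumes that training data has already been loaded with `load_data` and that the engine keeps optimizing in the background after being restarted.

```python
import time

def optimize_with_shuffles(engine, rounds, seconds_per_round):
    """Hypothetical helper: reshuffle the training data before each optimization round."""
    for _ in range(rounds):
        # Interrupts the engine, shuffles the training data and restarts it
        engine.shuffle_train_data(restart=True)
        # Assumption: the engine keeps working in the background meanwhile
        time.sleep(seconds_per_round)
    engine.interrupt()  # stop the engine once all rounds are done
```

Choosing `seconds_per_round` is a trade-off: rounds that are too short leave the engine little time to improve between shuffles, while rounds that are too long give it more time to adapt to a single folding pattern.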

reconfigure(config, restart=False)

Interrupts the engine and loads a new configuration.

Parameters:

config (miraiml.Config) – The configurations for the behavior of the engine.
restart (bool, optional, default=False) – Whether to restart the engine after reconfiguring it or not.

restart()

Interrupts the engine and starts again from the last checkpoint (if any).

Raises:	`RuntimeError`, `KeyError`

request_status()

Queries the current status of the engine.

Return type: dict or None

Returns:

The current status of the engine in the form of a dictionary. If no score has been computed yet, returns None. The available keys and their respective values on the status dictionary are:

'best_id': The current best id
'scores': A dictionary containing the score of each id
'predictions': A pandas.DataFrame object containing the predictions from each id for the testing dataset. If no testing dataset was provided, the value associated with this key is None
'ensemble_weights': A dictionary containing the ensemble weights for each base model. If no ensembling cycle has been executed, the value associated with this key is None
'base_models': A dictionary containing the current description of each base model, which can be accessed by their ids

The dictionary associated with the 'base_models' key contains the following keys and respective values:

'model_class': The name of the base model’s class
'parameters': The dictionary of hyperparameters
'features': The list of features
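The structure described above can be consumed with ordinary dictionary access. The following sketch builds a short text summary from a status dictionary and handles the `None` case; `summarize_status` is a hypothetical helper, and the example values are made up purely to show the shape of the data.

```python
def summarize_status(status):
    """Hypothetical helper: describe a status dictionary in a few lines of text."""
    if status is None:
        return 'No score has been computed yet.'
    best_id = status['best_id']
    lines = ['Best id: {} (score: {:.4f})'.format(best_id, status['scores'][best_id])]
    for model_id, description in status['base_models'].items():
        lines.append('{}: {} with {} feature(s)'.format(
            model_id, description['model_class'], len(description['features'])))
    return '\n'.join(lines)

# Made-up values that only mirror the documented keys
example_status = {
    'best_id': 'Ensemble',
    'scores': {'Ensemble': 0.9, 'Logistic Regression': 0.85},
    'predictions': None,  # no testing dataset was provided
    'ensemble_weights': {'Logistic Regression': 1.0},
    'base_models': {
        'Logistic Regression': {
            'model_class': 'LogisticRegression',
            'parameters': {'C': 0.5},
            'features': ['age', 'income'],
        },
    },
}

print(summarize_status(example_status))
```

A function with this signature could also serve as the body of an `on_improvement` callback, since that callback receives the same `status` dictionary.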

request_report(include_features=False)

Returns the report of the current status of the engine in a formatted string.

Parameters:	include_features (bool, optional, default=False) – Whether to include the list of features in the report or not (long feature lists can clutter the output).
Return type:	str
Returns:	The formatted report.

extract_model()

Generates an unfitted model object with methods similar to those of scikit-learn models. The generated model is the result of the optimizations made by MiraiML, which takes care of the choices of hyperparameters, features and ensembling weights.

After extracting the model, you can use it to fit new data.

Return type:	miraiml.core.MiraiModel
Returns:	The optimized model object.