The API

miraiml provides the following classes: HyperSearchSpace, Config and Engine.

You can import them with:

>>> from miraiml import HyperSearchSpace, Config, Engine

miraiml.HyperSearchSpace

class miraiml.HyperSearchSpace(model_class, id, parameters_values=None, parameters_rules=lambda x: None)

Bases: object

This class represents the search space of hyperparameters for a base model.

Parameters:
  • model_class (type) – Any class that represents a statistical model. It must implement fit, together with predict (for regression problems) or predict_proba (for classification problems).
  • id (str) – The id that will be associated with the models generated within this search space.
  • parameters_values (dict, optional, default=None) – A dictionary containing lists of values to be tested as parameters when instantiating objects of model_class.
  • parameters_rules (function, optional, default=lambda x: None) –

    A function that constrains certain parameters because of the values assumed by others. It must receive a dictionary as input and doesn’t need to return anything. Not used if parameters_values has no keys.

    Warning

    Make sure that every parameter accessed in parameters_rules is also a key of parameters_values; otherwise the engine will attempt to access an invalid key.

Raises:

NotImplementedError, TypeError, ValueError

Example:
import numpy as np
from sklearn.linear_model import LogisticRegression
from miraiml import HyperSearchSpace

def logistic_regression_parameters_rules(parameters):
    if parameters['solver'] in ['newton-cg', 'sag', 'lbfgs']:
        parameters['penalty'] = 'l2'

hyper_search_space = HyperSearchSpace(
    model_class = LogisticRegression,
    id = 'Logistic Regression',
    parameters_values = {
        'penalty': ['l1', 'l2'],
        'C': np.arange(0.1, 2, 0.1),
        'max_iter': np.arange(50, 300),
        'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
        'random_state': [0]
    },
    parameters_rules = logistic_regression_parameters_rules
)

Warning

Do not allow random_state to assume multiple values. If model_class has a random_state parameter, force the engine to always choose the same value by providing a list with a single element.

Allowing random_state to assume multiple values will confuse the engine because the scores will be unstable even with the same choice of hyperparameters and features.
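
Because parameters_rules only receives a dictionary and changes it in place, a rules function can be exercised by hand before it is handed to the engine. The sketch below reuses the rule from the example above; the helper apply_rules and the sample dictionaries are hypothetical, and the way the engine itself samples values is not reproduced.

```python
def logistic_regression_parameters_rules(parameters):
    # These solvers do not support the l1 penalty, so override whatever was sampled.
    if parameters['solver'] in ['newton-cg', 'sag', 'lbfgs']:
        parameters['penalty'] = 'l2'


def apply_rules(rules, parameters):
    """Return a constrained copy of `parameters` (hypothetical helper)."""
    constrained = dict(parameters)
    rules(constrained)  # the rules function edits the dict in place and returns nothing
    return constrained


# Hand-made candidates, only for illustration.
print(apply_rules(logistic_regression_parameters_rules,
                  {'penalty': 'l1', 'solver': 'sag', 'C': 0.5}))
print(apply_rules(logistic_regression_parameters_rules,
                  {'penalty': 'l1', 'solver': 'liblinear', 'C': 0.5}))
```

Calling the rules function on a dictionary that has no 'solver' key raises KeyError, which is the situation the warning about invalid keys describes.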

miraiml.Config

class miraiml.Config(local_dir, problem_type, hyper_search_spaces, score_function, use_all_features=False, n_folds=5, stratified=True, ensemble_id=None)

Bases: object

This class defines the general behavior of the engine.

Parameters:
  • local_dir (str) – The name of the folder in which the engine will save its internal files. If the directory doesn’t exist, it will be created automatically. local_dir must not contain .. or /.
  • problem_type (str) – The problem type: 'classification' or 'regression'. Multi-class classification problems are not supported.
  • hyper_search_spaces (list) – The list of miraiml.HyperSearchSpace objects to optimize. If hyper_search_spaces has length 1, the engine will not run ensemble cycles.
  • score_function (function) – A function that receives the “truth” and the predictions (in this order) and returns the score. Bigger scores must mean better models.
  • use_all_features (bool, optional, default=False) – Whether to force MiraiML to always use all features or not.
  • n_folds (int, optional, default=5) – The number of folds for the fitting/predicting process.
  • stratified (bool, optional, default=True) – Whether to stratify folds on target or not. Only used if problem_type == 'classification'.
  • ensemble_id (str, optional, default=None) – The id for the ensemble. If none is given, the engine will not ensemble base models.
Raises:

NotImplementedError, TypeError, ValueError

Example:
from sklearn.metrics import roc_auc_score
from miraiml import Config

config = Config(
    local_dir = 'miraiml_local',
    problem_type = 'classification',
    hyper_search_spaces = hyper_search_spaces,  # a list of HyperSearchSpace objects defined beforehand
    score_function = roc_auc_score,
    use_all_features = False,
    n_folds = 5,
    stratified = True,
    ensemble_id = 'Ensemble'
)
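
The score_function contract (truth first, predictions second, bigger is better) means that error metrics have to be negated before they are passed to Config. A minimal sketch for a regression problem, using only the standard library; the name negative_mse is hypothetical and not part of miraiml.

```python
def negative_mse(y_true, y_pred):
    """Negated mean squared error, so that bigger scores mean better models."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must be non-empty and of equal length')
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return -sum(squared_errors) / len(y_true)


# A perfect prediction scores higher than an imperfect one.
print(negative_mse([1, 2, 3], [1, 2, 3]) > negative_mse([1, 2, 3], [2, 2, 2]))
```

Such a function could then be passed as score_function together with problem_type = 'regression'.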

miraiml.Engine

class miraiml.Engine(config, on_improvement=None)

Bases: object

This class offers the controls for the engine.

Parameters:
  • config (miraiml.Config) – The configurations for the behavior of the engine.
  • on_improvement (function, optional, default=None) – A function that will be executed every time the engine finds an improvement for some id. It must receive a status parameter, which is the value returned by the method request_status().
Raises:

TypeError

Example:
from miraiml import Engine

def on_improvement(status):
    print('Scores:', status['scores'])

engine = Engine(config, on_improvement=on_improvement)
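
A callback can also keep a record of how the best score evolves. This sketch relies only on the 'best_id' and 'scores' keys documented under request_status(), and assumes that the best id always appears in 'scores'. The names history and record_improvement are hypothetical, and the status dictionary at the end is hand-made for illustration.

```python
history = []  # (best_id, best_score) pairs, one per reported improvement


def record_improvement(status):
    """Callback sketch: remember the best id and its score at each improvement."""
    best_id = status['best_id']
    history.append((best_id, status['scores'][best_id]))


# Hand-made status dictionary, only to show the shape the callback expects.
record_improvement({
    'best_id': 'Ensemble',
    'scores': {'Logistic Regression': 0.81, 'Ensemble': 0.83},
})
print(history)
```

It would be registered in the same way as the callback above, with Engine(config, on_improvement=record_improvement).
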
is_running()

Tells whether the engine is running or not.

Return type: bool
Returns: True if the engine is running and False otherwise.

interrupt()

Makes the engine stop on the first opportunity.

Note

This method is not asynchronous. It will wait for the engine to stop.
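
Together with restart() and request_status(), documented further down, these two methods are enough to run the engine on a time budget. The sketch below is one possible way to do it, assuming that restart() returns while the engine keeps working in the background and that data has already been loaded. The function run_for and its parameters are hypothetical; engine is any object exposing the documented methods.

```python
import time


def run_for(engine, seconds, poll_interval=1.0):
    """Run the engine for roughly `seconds`, then stop it and return its status."""
    engine.restart()
    deadline = time.monotonic() + seconds
    # Stop early if the engine is no longer running for any reason.
    while engine.is_running() and time.monotonic() < deadline:
        time.sleep(poll_interval)
    engine.interrupt()  # synchronous: returns once the engine has stopped
    return engine.request_status()
```

The returned value is whatever request_status() reports, so it may be None if no score was computed within the budget.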

load_data(train_data, target_column, test_data=None, restart=False)

Interrupts the engine and loads a new pair of train/test datasets. All of their column labels must be instances of str or int.

Parameters:
  • train_data (pandas.DataFrame) – The training data.
  • target_column (str or int) – The target column identifier.
  • test_data (pandas.DataFrame, optional, default=None) – The testing data. Use the default value if you don’t need to make predictions for data with unknown labels.
  • restart (bool, optional, default=False) – Whether to restart the engine after updating data or not.
Raises:

TypeError, ValueError

shuffle_train_data(restart=False)

Interrupts the engine and shuffles the training data.

Parameters:
  • restart (bool, optional, default=False) – Whether to restart the engine after shuffling data or not.
Raises:

RuntimeError

Note

It’s a good practice to shuffle the training data periodically to avoid overfitting on a certain folding pattern.
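
One way to follow this practice is to alternate shuffling with fixed periods of optimization. The sketch below assumes that training data has already been loaded (otherwise shuffle_train_data may raise RuntimeError); the function run_with_periodic_shuffle and its parameters are hypothetical, and engine is any object exposing the documented methods.

```python
import time


def run_with_periodic_shuffle(engine, rounds, seconds_per_round):
    """Shuffle the training data, let the engine run for a while, and repeat."""
    for _ in range(rounds):
        # Interrupts the engine, shuffles the data and restarts it in one call.
        engine.shuffle_train_data(restart=True)
        time.sleep(seconds_per_round)
    engine.interrupt()  # waits until the engine has actually stopped
    return engine.request_status()
```

For example, run_with_periodic_shuffle(engine, rounds=6, seconds_per_round=600) would shuffle every ten minutes for an hour.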

reconfigure(config, restart=False)

Interrupts the engine and loads a new configuration.

Parameters:
  • config (miraiml.Config) – The configurations for the behavior of the engine.
  • restart (bool, optional, default=False) – Whether to restart the engine after reconfiguring it or not.

restart()

Interrupts the engine and starts again from the last checkpoint (if any).

Raises:

RuntimeError, KeyError

request_status()

Queries the current status of the engine.

Return type: dict or None
Returns: The current status of the engine in the form of a dictionary. If no score has been computed yet, returns None. The available keys of the status dictionary and their respective values are:
  • 'best_id': The current best id
  • 'scores': A dictionary containing the score of each id
  • 'predictions': A pandas.DataFrame object containing the predictions from each id for the testing dataset. If no testing dataset was provided, the value associated with this key is None
  • 'ensemble_weights': A dictionary containing the ensemble weights for each base model. If no ensembling cycle has been executed, the value associated with this key is None
  • 'base_models': A dictionary containing the current description of each base model, which can be accessed by their ids

Each description stored in the dictionary associated with the 'base_models' key contains the following keys and respective values:

  • 'model_class': The name of the base model’s class
  • 'parameters': The dictionary of hyperparameters
  • 'features': The list of features
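
The structure above can be turned into a compact ranking. This is a minimal sketch that uses only the documented keys and handles the None case; the function summarize_status is hypothetical. It assumes that some ids in 'scores' (such as the ensemble) may have no entry in 'base_models'.

```python
def summarize_status(status):
    """Return one line per id, best score first; '*' marks the best id."""
    if status is None:  # request_status() returns None before any score exists
        return 'No scores computed yet.'
    ranked = sorted(status['scores'].items(), key=lambda item: item[1], reverse=True)
    lines = []
    for model_id, score in ranked:
        marker = '*' if model_id == status['best_id'] else ' '
        # Ids without a base model description are reported as 'n/a'.
        description = status['base_models'].get(model_id, {})
        model_class = description.get('model_class', 'n/a')
        lines.append('{} {}: {:.4f} ({})'.format(marker, model_id, score, model_class))
    return '\n'.join(lines)
```

Typical usage would be print(summarize_status(engine.request_status())).
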
request_report(include_features=False)

Returns the report of the current status of the engine in a formatted string.

Parameters:
  • include_features (bool, optional, default=False) – Whether to include the list of features in the report or not (may clutter the output).
Return type: str
Returns: The formatted report.

extract_model()

Generates an unfitted model object with methods similar to those of scikit-learn models. The generated model is the result of the optimizations made by MiraiML, which takes care of the choices of hyperparameters, features and ensembling weights.

After extracting the model, you can use it to fit new data.

Return type: miraiml.core.MiraiModel
Returns: The optimized model object.