Internal modules

The documentation related to these modules is meant for developers.

miraiml.core

miraiml.core contains internal classes responsible for the optimization process.

class miraiml.core.BaseModel(model_class, parameters, features)

Bases: object

Represents an element from the search space, defined by an instance of miraiml.HyperSearchSpace and a set of features.

Read more in the User Guide.

Parameters:
  • model_class (type) – A statistical model class that must implement the methods fit and predict for regression problems, or fit and predict_proba for classification problems.
  • parameters (dict) – The parameters that will be used to instantiate objects of model_class.
  • features (list) – The list of features that will be used to train the statistical model.
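
As an illustration of the interface described above, the sketch below defines a toy regression class with fit and predict and instantiates it from a parameters dictionary. MeanRegressor and its shift parameter are hypothetical, and unpacking the dictionary with ** is an assumption about how parameters reach model_class.

```python
class MeanRegressor:
    """Hypothetical regression model exposing the fit/predict interface."""

    def __init__(self, shift=0.0):
        self.shift = shift
        self.mean_ = None

    def fit(self, X, y):
        # Learn a single constant: the mean of the targets plus a shift.
        values = list(y)
        self.mean_ = sum(values) / len(values) + self.shift
        return self

    def predict(self, X):
        # Predict the learned constant for every input row.
        return [self.mean_ for _ in range(len(X))]


# Assumption: the parameters dict is unpacked into the constructor.
model_class = MeanRegressor
parameters = {"shift": 1.0}
model = model_class(**parameters)
model.fit([[0], [1], [2]], [1.0, 2.0, 3.0])
predictions = model.predict([[5], [6]])
```

A classification model would expose predict_proba instead of predict.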
predict(X_train, y_train, X_test, config)

Performs the predictions for the training and testing datasets and also computes the score of the model.

Parameters:
  • X_train (pandas.DataFrame) – The dataframe that contains the training inputs for the model.
  • y_train (pandas.Series or numpy.ndarray) – The training targets for the model.
  • X_test (pandas.DataFrame) – The dataframe that contains the testing inputs for the model.
  • config (miraiml.Config) – The configuration of the engine.
Return type:

tuple

Returns:

(train_predictions, test_predictions, score)

  • train_predictions: The predictions for the training dataset
  • test_predictions: The predictions for the testing dataset
  • score: The score of the model on the training dataset

Raises:

RuntimeError

class miraiml.core.MiraiSeeker(hyper_search_spaces, all_features, config)

Bases: object

This class implements a smarter way of searching for good parameters and sets of features.

Read more in the User Guide.

Parameters:
  • hyper_search_spaces (list) – The list of miraiml.HyperSearchSpace objects whose base models’ ids are kept track of.
  • all_features (list) – A list containing all available features.
  • config (miraiml.Config) – The configuration of the engine.
reset()

Deletes the registries of all base models.

parameters_features_to_dataframe(parameters, features, score)

Creates an entry for a history.

Parameters:
  • parameters (dict) – The set of parameters to transform.
  • features (list) – The set of features to transform.
  • score (float) – The score to transform.
register_base_model(id, base_model, score)

Registers the performance of a base model and its characteristics.

Parameters:
  • id (str) – The id associated with the base model.
  • base_model (miraiml.core.BaseModel) – The base model being registered.
  • score (float) – The score of base_model.
is_ready(id)

Tells whether the history of an id is large enough for more advanced strategies or not.

Parameters: id (str) – The id to be inspected.
Return type: bool
Returns: Whether id can be used to generate parameters and features lists or not.
seek(id)

Manages the search strategy for better solutions.

With a probability of 0.5, the random strategy is chosen. Otherwise, one of the other strategies is chosen, each with equal probability.

Parameters: id (str) – The id for which a new base model is required.
Return type: miraiml.core.BaseModel
Returns: The next base model for exploration.
Raises: KeyError
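
The selection rule described for seek can be sketched as follows. This is a minimal illustration, not the library's code: the strategy labels are placeholders, and falling back to the random strategy when the history is not ready is an assumption based on the description of is_ready.

```python
import random


def choose_strategy(ready, other_strategies, rng=random):
    """Pick "random" with probability 0.5, else one of the others uniformly."""
    # Assumption: without enough history, only the random strategy applies.
    if not ready or not other_strategies:
        return "random"
    if rng.random() < 0.5:
        return "random"
    return rng.choice(other_strategies)


# Placeholder labels for the non-random strategies.
rng = random.Random(0)
picks = [choose_strategy(True, ["naive", "linear_regression"], rng)
         for _ in range(2000)]
```

About half of the picks should be "random", with the rest split evenly between the other labels.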

Generates completely random sets of parameters and features.

Parameters: all_features (list) – The list of available features.
Return type: tuple
Returns: (parameters, features) Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
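
A completely random draw of this kind might look like the sketch below. The parameters_values dictionary, mapping each parameter name to its candidate values, is a hypothetical stand-in for the search space, and the function name is illustrative.

```python
import random


def random_parameters_features(parameters_values, all_features, rng=random):
    """Draw one value per parameter and a non-empty random feature subset."""
    parameters = {name: rng.choice(values)
                  for name, values in parameters_values.items()}
    # Between 1 and len(all_features) features, sampled without replacement.
    size = rng.randint(1, len(all_features))
    features = rng.sample(all_features, size)
    return parameters, features


parameters, features = random_parameters_features(
    {"max_depth": [2, 4, 8], "n_estimators": [10, 50]},
    ["age", "height", "weight"],
    random.Random(0),
)
```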

Characteristics that achieved higher scores have independently higher chances of being chosen again.

Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features) Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
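
One way to read "independently higher chances" is sketched below: each parameter value and each feature is decided by its own score-weighted draw from the history. The history layout, a list of (parameters, features, score) tuples with non-negative scores, and the function name are assumptions made for illustration.

```python
import random


def score_weighted_search(history, all_features, rng=random):
    """Choose each characteristic independently, weighting entries by score."""
    scores = [score for _, _, score in history]
    parameters = {}
    for name in history[0][0]:
        # A separate weighted draw for every parameter.
        entry = rng.choices(history, weights=scores, k=1)[0]
        parameters[name] = entry[0][name]
    features = []
    for feature in all_features:
        # A separate weighted draw decides whether each feature is kept.
        entry = rng.choices(history, weights=scores, k=1)[0]
        if feature in entry[1]:
            features.append(feature)
    if not features:
        # Never return an empty feature list.
        features = [rng.choice(all_features)]
    return parameters, features


history = [({"depth": 3}, ["a"], 1.0), ({"depth": 9}, ["b"], 0.0)]
result = score_weighted_search(history, ["a", "b"], random.Random(0))
```

In this usage example the second entry has zero weight, so every draw comes from the first entry.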

Uses the history to model the score with a linear regression, then predicts the scores of n/2 random sets of parameters and features, where n is the size of the history. The set with the highest predicted score is chosen.

Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features) Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
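
The idea can be sketched in plain Python under simplifying assumptions: only the features are modelled (as 0/1 indicators), the history is a hypothetical list of (features, score) pairs, and the regression is solved through the normal equations with a tiny ridge term for numerical stability. None of these names come from the library.

```python
import random


def fit_linear(rows, targets, ridge=1e-6):
    """Least-squares fit with intercept via the normal equations."""
    X = [[1.0] + list(row) for row in rows]
    n = len(X[0])
    # A = X^T X + ridge * I and b = X^T y.
    A = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def encode(features, all_features):
    """One 0/1 indicator per available feature."""
    return [1.0 if f in features else 0.0 for f in all_features]


def regression_guided_search(history, all_features, rng=random):
    """Score n/2 random feature sets with the fitted model; keep the best."""
    rows = [encode(features, all_features) for features, _ in history]
    coefs = fit_linear(rows, [score for _, score in history])

    def guess(features):
        row = encode(features, all_features)
        return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

    n_candidates = max(1, len(history) // 2)
    candidates = [rng.sample(all_features, rng.randint(1, len(all_features)))
                  for _ in range(n_candidates)]
    return max(candidates, key=guess)


history = [(["a"], 0.9), (["b"], 0.2), (["a", "b"], 0.8), (["c"], 0.1)]
chosen = regression_guided_search(history, ["a", "b", "c"], random.Random(0))
```

Parameters could be handled the same way by adding encoded parameter columns to each row.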
class miraiml.core.Ensembler(base_models_ids, y_train, train_predictions_df, test_predictions_df, scores, config)

Bases: object

Performs the ensemble of the base models.

Read more in the User Guide.

Parameters:
  • y_train (pandas.Series or numpy.ndarray) – The target column.
  • base_models_ids (list) – The list of base models’ ids to keep track of.
  • train_predictions_df (pandas.DataFrame) – The dataframe of predictions for the training dataset.
  • test_predictions_df (pandas.DataFrame) – The dataframe of predictions for the testing dataset.
  • scores (dict) – The dictionary of scores.
  • config (miraiml.Config) – The configuration of the engine.
interrupt()

Sets an internal flag to interrupt the optimization process on the first opportunity.

update()

Updates the ensemble with the newest predictions from the base models.

gen_weights()

Generates the ensemble weights according to the score of each base model. Higher scores have higher chances of generating higher weights.

Return type: dict
Returns: A dictionary containing the weights for each base model id.
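
A possible way to make higher scores favour higher weights is to draw each weight from a triangular distribution whose mode follows the normalized score. The sketch below assumes non-negative scores; the choice of distribution and the function name are illustrative guesses, not a description of the library's method.

```python
import random


def draw_weights(scores, rng=random):
    """Draw one weight in [0, 1] per id, skewed towards the relative score."""
    best = max(scores.values())
    weights = {}
    for model_id, score in scores.items():
        # The mode of the triangular distribution is the normalized score.
        mode = score / best if best > 0 else 0.5
        weights[model_id] = rng.triangular(0.0, 1.0, mode)
    return weights


rng = random.Random(0)
draws = [draw_weights({"strong": 0.9, "weak": 0.3}, rng) for _ in range(2000)]
mean_strong = sum(d["strong"] for d in draws) / len(draws)
mean_weak = sum(d["weak"] for d in draws) / len(draws)
```

Over many draws, the stronger model receives the larger weight on average, while any single draw may still favour the weaker one.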
ensemble(weights)

Performs the ensemble of the current predictions of each base model.

Parameters: weights (dict) – A dictionary containing the weights related to the id of each base model.
Return type: tuple
Returns: (train_predictions, test_predictions, score)
  • train_predictions: The ensemble predictions for the training dataset
  • test_predictions: The ensemble predictions for the testing dataset
  • score: The score of the ensemble on the training dataset
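
Assuming the ensemble is a weighted average of the base models' predictions (an assumption, since the combination rule is not stated here), the computation for one dataset can be sketched as:

```python
def weighted_average(predictions, weights):
    """Combine per-model prediction lists into one weighted average list."""
    total = sum(weights.values())
    n_rows = len(next(iter(predictions.values())))
    return [
        sum(weights[model_id] * predictions[model_id][i] for model_id in weights)
        / total
        for i in range(n_rows)
    ]


# Hypothetical predictions of two base models for two rows.
combined = weighted_average(
    {"a": [1.0, 2.0], "b": [3.0, 4.0]},
    {"a": 1.0, "b": 3.0},
)
```

The same function would be applied to the training and testing predictions, and the score computed from the combined training predictions.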
optimize(max_duration)

Performs ensembling cycles for max_duration seconds.

Parameters: max_duration (float) – The maximum duration allowed for the optimization process.
Return type: bool
Returns: True if a better set of weights was found and False otherwise.
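
A time-bounded search loop with this return contract can be sketched as follows. The state dictionary and the propose and evaluate callables are hypothetical placeholders for the ensembler's internals.

```python
import time


def optimize_for(state, propose, evaluate, max_duration):
    """Try new weights until max_duration seconds elapse; report improvement."""
    deadline = time.monotonic() + max_duration
    improved = False
    while time.monotonic() < deadline:
        weights = propose()
        score = evaluate(weights)
        # Keep the candidate only if it strictly beats the best score so far.
        if score > state["score"]:
            state["weights"], state["score"] = weights, score
            improved = True
    return improved


state = {"weights": None, "score": 0.0}
first = optimize_for(state, lambda: {"a": 1.0}, lambda w: 1.0, 0.05)
second = optimize_for(state, lambda: {"a": 1.0}, lambda w: 1.0, 0.05)
```

The second call returns False because no candidate beats the score stored by the first call.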
class miraiml.core.MiraiModel(base_models, weights, problem_type)

Bases: object

Represents a unified model optimized by MiraiML.

fit(X, y)

Fits all base models.

Parameters:
  • X (pandas.DataFrame) – The training data.
  • y (pandas.Series or numpy.ndarray) – The target.
predict(X)

Predicts the classes for classification problems and the output for regression problems.

Parameters: X (pandas.DataFrame) – The input for new predictions.
predict_proba(X)

Predicts the probabilities for each class. Only available for classification problems.

Parameters: X (pandas.DataFrame) – The input for new predictions.
Raises: RuntimeError

miraiml.util

miraiml.util provides utility functions that are used by higher level modules.

miraiml.util.load(path)

A clean pickle.load wrapper for binary files.

Parameters: path (str) – The path of the binary file to be loaded.
Return type: object
Returns: The loaded object.
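
A wrapper of this kind can be as small as the sketch below, which is an illustration and not necessarily the module's exact code.

```python
import pickle


def load(path):
    """Open a binary file and return the unpickled object."""
    with open(path, "rb") as file:
        return pickle.load(file)
```

As with any use of pickle, only files from trusted sources should be loaded.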
miraiml.util.dump(obj, path)

Speeds up writing objects to disk by performing the write in a separate thread.

Parameters:
  • obj (object) – The object to be dumped to the binary file.
  • path (str) – The path of the binary file to be written.
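
The threaded write can be sketched as follows. Returning the thread so that callers can wait for completion is an assumption added for illustration; the documented function does not state a return value.

```python
import pickle
import threading


def dump_in_thread(obj, path):
    """Pickle obj to path in a background thread and return the thread."""

    def write():
        with open(path, "wb") as file:
            pickle.dump(obj, file)

    thread = threading.Thread(target=write)
    thread.start()
    return thread
```

Calling join() on the returned thread guarantees that the file is complete before it is read back.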
miraiml.util.sample_random_len(lst)

Returns a sample of random size from the list lst. The minimum length of the returned list is 1.

Parameters: lst (list) – A list containing the elements to be sampled.
Return type: list
Returns: The randomly sampled elements from lst.
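
The behaviour described above can be sketched with the standard random module. The optional rng argument is an addition for reproducibility and is not part of the documented signature.

```python
import random


def sample_random_len(lst, rng=random):
    """Return a random-size sample of lst containing at least one element."""
    size = rng.randint(1, len(lst))
    return rng.sample(lst, size)


rng = random.Random(0)
samples = [sample_random_len(["a", "b", "c", "d"], rng) for _ in range(200)]
```

Sampling is done without replacement, so no element appears twice in a result.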
miraiml.util.is_valid_filename(filename)

Tells whether a string can be used as a safe file name or not.

Parameters: filename (str) – The file name.
Return type: bool
Returns: Whether filename is a valid file name or not.
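
The exact rule for a safe file name is not spelled out here. The sketch below uses a conservative, hypothetical whitelist (ASCII letters, digits, underscores, dots and hyphens) to illustrate the kind of check involved.

```python
import re

# Hypothetical whitelist: letters, digits, underscore, dot and hyphen.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.\-]+")


def is_safe_filename(filename):
    """Accept only non-empty names made of whitelisted characters."""
    if not isinstance(filename, str):
        return False
    if filename in {".", ".."}:
        return False
    return _SAFE_NAME.fullmatch(filename) is not None
```

Under this rule, "model_1.pkl" is accepted, while names containing path separators or spaces are rejected.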