Internal modules’ API

The documentation related to these modules is meant for developers.

miraiml.core

miraiml.core contains internal classes responsible for the optimization process.

class miraiml.core.BaseModel(model_class, parameters, features)[source]

Represents an element from the search space, defined by an instance of miraiml.SearchSpace and a set of features.

Read more in the User Guide.

Parameters:
  • model_class (type) – A statistical model class that must implement the methods fit and predict for regression problems, or fit and predict_proba for classification problems.
  • parameters (dict) – The parameters that will be used to instantiate objects of model_class.
  • features (list) – The list of features that will be used to train the statistical model.
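
As a minimal sketch of what a compatible model_class could look like for a regression problem, the hypothetical MeanRegressor below implements fit and predict using only the standard library. The class name and its shrinkage parameter are illustrative assumptions and are not part of miraiml. Passing parameters as keyword arguments is also an assumption about how the dictionary is used.

```python
from statistics import fmean


class MeanRegressor:
    """Hypothetical regressor: always predicts the (scaled) training mean."""

    def __init__(self, shrinkage=1.0):
        # `shrinkage` stands in for one entry of the `parameters` dict.
        self.shrinkage = shrinkage
        self.mean_ = None

    def fit(self, X, y):
        # Ignores X and stores the scaled mean of the targets.
        self.mean_ = self.shrinkage * fmean(y)
        return self

    def predict(self, X):
        # One prediction per input row.
        return [self.mean_ for _ in X]


# Assumption: the parameters dict is unpacked as keyword arguments.
parameters = {"shrinkage": 0.5}
model = MeanRegressor(**parameters).fit([[1], [2], [3]], [2.0, 4.0, 6.0])
predictions = model.predict([[4], [5]])
```

Any class exposing the same two methods should fit the description above, whatever it does internally.
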
predict(X_train, y_train, X_test, config)[source]

Performs the predictions for the training and testing datasets and also computes the score of the model.

Parameters:
  • X_train (pandas.DataFrame) – The dataframe that contains the training inputs for the model.
  • y_train (pandas.Series or numpy.ndarray) – The training targets for the model.
  • X_test (pandas.DataFrame) – The dataframe that contains the testing inputs for the model.
  • config (miraiml.Config) – The configuration of the engine.
Return type:

tuple

Returns:

(train_predictions, test_predictions, score)

  • train_predictions: The predictions for the training dataset
  • test_predictions: The predictions for the testing dataset
  • score: The score of the model on the training dataset

Raises:

RuntimeError if fitting or predicting fails.

miraiml.core.dump_base_model(base_model, path)[source]

Saves the characteristics of a base model as a checkpoint.

Parameters:
  • base_model (miraiml.core.BaseModel) – The base model to be saved
  • path (str) – The path to save the base model

miraiml.core.load_base_model(model_class, path)[source]

Loads the characteristics of a base model from disk and returns its respective instance of miraiml.core.BaseModel.

Parameters:
  • model_class (type) – The model class related to the base model
  • path (str) – The path to load the base model from
Return type:

miraiml.core.BaseModel

Returns:

The base model loaded from disk
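
One plausible reading of the two functions above is that only the plain-data characteristics (parameters and features) are written to disk, which would explain why load_base_model needs model_class again. The sketch below illustrates that idea under this assumption. Checkpointed, DummyModel, dump_characteristics and load_characteristics are hypothetical names, and the pickle-based file format is a guess rather than the library's documented format.

```python
import os
import pickle
import tempfile


class DummyModel:
    """Hypothetical placeholder for a statistical model class."""


class Checkpointed:
    # Hypothetical stand-in for a base model: a class plus its characteristics.
    def __init__(self, model_class, parameters, features):
        self.model_class = model_class
        self.parameters = parameters
        self.features = features


def dump_characteristics(base_model, path):
    # Only the characteristics are written, not the model class itself.
    attributes = {"parameters": base_model.parameters, "features": base_model.features}
    with open(path, "wb") as f:
        pickle.dump(attributes, f)


def load_characteristics(model_class, path):
    # The caller supplies the class; the file supplies everything else.
    with open(path, "rb") as f:
        attributes = pickle.load(f)
    return Checkpointed(model_class, attributes["parameters"], attributes["features"])


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint")
    dump_characteristics(Checkpointed(DummyModel, {"alpha": 0.1}, ["age"]), path)
    restored = load_characteristics(DummyModel, path)
```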

class miraiml.core.MiraiSeeker(search_spaces, all_features, config)[source]

This class implements a smarter way of searching good parameters and sets of features.

Read more in the User Guide.

Parameters:
  • search_spaces (list) – The list of search spaces to keep track of.
  • all_features (list) – A list containing all available features.
  • config (miraiml.Config) – The configuration of the engine.
reset()[source]

Deletes all base model registries.

parameters_features_to_dataframe(parameters, features, score)[source]

Creates an entry for a history.

Parameters:
  • parameters (dict) – The set of parameters to transform.
  • features (list) – The set of features to transform.
  • score (float) – The score to transform.
register_base_model(id, base_model, score)[source]

Registers the performance of a base model and its characteristics.

Parameters:
  • id (str) – The id associated with the base model.
  • base_model (miraiml.core.BaseModel) – The base model being registered.
  • score (float) – The score of base_model.
is_ready(id)[source]

Tells whether the history of an id is large enough for more advanced strategies or not.

Parameters: id (str) – The id to be inspected.
Return type: bool
Returns: Whether id can be used to generate parameters and features lists or not.
seek(id)[source]

Manages the search strategy for better solutions.

The random strategy is chosen with probability 0.5. Otherwise, one of the other strategies is chosen, each with equal probability.

Parameters: id (str) – The id for which a new base model is required.
Return type: miraiml.core.BaseModel
Returns: The next base model for exploration.
Raises: KeyError if parameters_rules tries to access an invalid key.
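
The selection rule described for seek can be sketched in a few lines. The function name choose_strategy and the strategy labels are hypothetical. Only the probabilities (0.5 for the random strategy, the remainder split evenly) come from the description above.

```python
import random


def choose_strategy(other_strategies, rng=random):
    # Half of the time, pick the random strategy.
    if rng.random() < 0.5:
        return "random"
    # Otherwise pick uniformly among the remaining strategies.
    return rng.choice(other_strategies)


# Empirical check of the selection frequencies with a seeded generator.
rng = random.Random(0)
counts = {"random": 0, "strategy_a": 0, "strategy_b": 0}
for _ in range(10_000):
    counts[choose_strategy(["strategy_a", "strategy_b"], rng)] += 1
```

With two other strategies, each of them ends up being selected about a quarter of the time.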

Search strategy that generates completely random sets of parameters and features.

Parameters: all_features (list) – The list of available features.
Return type: tuple
Returns: (parameters, features) Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.

Search strategy in which characteristics that achieved higher scores have independently higher chances of being chosen again.

Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features) Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
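
The idea that higher-scoring characteristics are independently more likely to be reused can be illustrated with score-weighted sampling. This is a sketch under stated assumptions: the history layout, the helper weighted_pick and the use of raw scores as sampling weights are all hypothetical, and the scores are assumed to be positive.

```python
import random


def weighted_pick(history, key, rng):
    # Each past value of `key` is weighted by the score of the entry it came from.
    values = [entry[key] for entry in history]
    weights = [entry["score"] for entry in history]
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical history of past parameter sets and their scores.
history = [
    {"max_depth": 3, "learning_rate": 0.1, "score": 0.9},
    {"max_depth": 5, "learning_rate": 0.3, "score": 0.1},
]
rng = random.Random(0)
# Each characteristic is drawn independently of the others.
candidate = {key: weighted_pick(history, key, rng) for key in ("max_depth", "learning_rate")}
```

Because each characteristic is drawn on its own, the candidate may combine values that never appeared together in the history.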

Search strategy that uses the history to model the score with a linear regression. It guesses the scores of n/2 random sets of parameters and features, where n is the size of the history, and chooses the one with the highest predicted score.

Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features) Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
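
To make the regression-guided search concrete, here is a deliberately simplified sketch with a single numeric parameter. The helpers fit_line and regression_guided_pick, the one-dimensional history and the uniform candidate sampling are assumptions made for illustration. Only the outline (fit on the history, score n/2 random candidates, keep the best) follows the description above.

```python
import random


def fit_line(xs, ys):
    # Ordinary least squares for a single explanatory variable.
    # Assumes xs contains at least two distinct values.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def regression_guided_pick(history, low, high, rng):
    xs = [value for value, _ in history]
    ys = [score for _, score in history]
    slope, intercept = fit_line(xs, ys)
    # Guess the scores of n/2 random candidates, n being the history size.
    candidates = [rng.uniform(low, high) for _ in range(max(1, len(history) // 2))]
    # Keep the candidate with the highest predicted score.
    return max(candidates, key=lambda x: slope * x + intercept)


# Hypothetical history of (parameter value, score) pairs.
history = [(0.1, 0.60), (0.2, 0.65), (0.4, 0.72), (0.8, 0.90)]
best = regression_guided_pick(history, 0.0, 1.0, random.Random(0))
```
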
class miraiml.core.Ensembler(base_models_ids, y_train, train_predictions_df, test_predictions_df, scores, config)[source]

Performs the ensemble of the base models and optimizes its weights.

Read more in the User Guide.

Parameters:
  • base_models_ids (list) – The list of base models’ ids to keep track of.
  • y_train (pandas.Series or numpy.ndarray) – The target column.
  • train_predictions_df (pandas.DataFrame) – The dataframe of predictions for the training dataset.
  • test_predictions_df (pandas.DataFrame) – The dataframe of predictions for the testing dataset.
  • scores (dict) – The dictionary of scores.
  • config (miraiml.Config) – The configuration of the engine.
interrupt()[source]

Sets an internal flag to interrupt the optimization process on the first opportunity.

update()[source]

Updates the ensemble with the newest predictions from the base models.

gen_weights()[source]

Generates the ensemble weights according to the score of each base model. Higher scores have higher chances of generating higher weights.

Return type: dict
Returns: A dictionary containing the weights for each base model id.
ensemble(weights)[source]

Performs the ensemble of the current predictions of each base model.

Parameters: weights (dict) – A dictionary containing the weights related to the id of each base model.
Return type: tuple
Returns: (train_predictions, test_predictions, score)
  • train_predictions: The ensemble predictions for the training dataset
  • test_predictions: The ensemble predictions for the testing dataset
  • score: The score of the ensemble on the training dataset
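
A weighted combination of per-model predictions, keyed by base model id, might look like the sketch below. The function weighted_ensemble is hypothetical, plain lists replace dataframes to keep the example dependency-free, and dividing by the total weight is an assumption about how the weights are applied.

```python
def weighted_ensemble(predictions, weights):
    # `predictions` maps each base model id to its list of predictions.
    total = sum(weights[model_id] for model_id in predictions)
    n_rows = len(next(iter(predictions.values())))
    return [
        sum(weights[model_id] * preds[i] for model_id, preds in predictions.items()) / total
        for i in range(n_rows)
    ]


predictions = {"model_a": [1.0, 2.0], "model_b": [3.0, 6.0]}
weights = {"model_a": 3.0, "model_b": 1.0}
combined = weighted_ensemble(predictions, weights)
```

Here model_a carries three times the weight of model_b, so each combined value sits closer to model_a's prediction.
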
optimize(max_duration)[source]

Performs ensembling cycles for max_duration seconds.

Parameters: max_duration (float) – The maximum duration allowed for the optimization process.
Return type: bool
Returns: True if a better set of weights was found and False otherwise.
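
The interplay between a time budget, an interruption flag and a boolean result can be sketched as follows. WeightOptimizer, its attributes and the toy objective are hypothetical. The sketch only mirrors the contract described for interrupt and optimize and does not reproduce how the ensemble weights are actually searched.

```python
import random
import time


class WeightOptimizer:
    """Hypothetical time-bounded optimizer with an interruption flag."""

    def __init__(self, score_fn, propose_fn, initial_weights):
        self.score_fn = score_fn
        self.propose_fn = propose_fn
        self.weights = initial_weights
        self.best_score = score_fn(initial_weights)
        self.must_interrupt = False

    def interrupt(self):
        # Checked at the start of every cycle.
        self.must_interrupt = True

    def optimize(self, max_duration):
        improved = False
        deadline = time.monotonic() + max_duration
        while time.monotonic() < deadline and not self.must_interrupt:
            candidate = self.propose_fn()
            score = self.score_fn(candidate)
            if score > self.best_score:
                self.weights, self.best_score, improved = candidate, score, True
        return improved


rng = random.Random(0)
optimizer = WeightOptimizer(
    score_fn=lambda w: -abs(w - 0.7),  # toy objective, maximal at w = 0.7
    propose_fn=rng.random,
    initial_weights=0.0,
)
found_better = optimizer.optimize(max_duration=0.05)
```

Using a monotonic clock keeps the time budget unaffected by system clock adjustments.
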
class miraiml.core.BasePipelineClass(**params)[source]

This is the base class for your custom pipeline classes.

Warning

Instantiating this class directly does not work.

get_params()[source]

Gets the list of parameters that can be set.

Return type: list
Returns: The list of allowed parameters
set_params(**params)[source]

Sets the parameters for the pipeline. You can check the parameters that are allowed to be set by calling get_params().

Return type: miraiml.core.BasePipelineClass
Returns: self
fit(X, y)[source]

Fits the pipeline to X using y as the target.

Parameters:
  • X (iterable) – The training data.
  • y (iterable) – The target.
Return type:

miraiml.core.BasePipelineClass

Returns:

self

predict(X)[source]

Predicts the class for each element of X in case of classification problems or the estimated target value in case of regression problems.

Parameters: X (iterable) – Data to predict on.
Return type: numpy.ndarray
Returns: The set of predictions
predict_proba(X)[source]

Returns the probabilities for each class. Available only if your end estimator implements it.

Parameters: X (iterable) – Data to predict on.
Return type: numpy.ndarray
Returns: The probabilities for each class

miraiml.util

miraiml.util provides utility functions that are used by higher level modules.

miraiml.util.load(path)[source]

A clean pickle.load wrapper for binary files.

Parameters: path (str) – The path of the binary file to be loaded.
Return type: object
Returns: The loaded object.
miraiml.util.dump(obj, path)[source]

Optimizes the process of writing objects to disk by triggering a thread.

Parameters:
  • obj (object) – The object to be dumped to the binary file.
  • path (str) – The path of the binary file to be written.
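
Writing a pickled object from a separate thread can be sketched with the standard library alone. The helper names _write and dump_in_thread are hypothetical, and returning the thread to the caller is an assumption made so that the example can wait for the write to finish. The description above does not say whether miraiml.util.dump does so.

```python
import os
import pickle
import tempfile
import threading


def _write(obj, path):
    # Plain pickle dump to a binary file.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def dump_in_thread(obj, path):
    # Start the write in a separate thread and hand the thread back.
    thread = threading.Thread(target=_write, args=(obj, path))
    thread.start()
    return thread


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "obj.pkl")
    # Joining here only so the file can be read back deterministically.
    dump_in_thread({"score": 0.9}, path).join()
    with open(path, "rb") as f:
        loaded = pickle.load(f)
```
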
miraiml.util.sample_random_len(lst)[source]

Returns a sample of random size from the list lst. The minimum length of the returned list is 1.

Parameters: lst (list) – A list containing the elements to be sampled.
Return type: list
Returns: The randomly sampled elements from lst.
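
A sample of random size with at least one element can be written as below. This is a sketch, not the library's implementation: sampling without replacement and allowing sizes up to len(lst) are assumptions, and the optional rng argument is added here only to make the example reproducible. An empty list would raise ValueError in this version.

```python
import random


def sample_random_len(lst, rng=random):
    # Draw a size between 1 and len(lst), then sample without replacement.
    size = rng.randint(1, len(lst))
    return rng.sample(lst, size)


sampled = sample_random_len(["a", "b", "c", "d"], random.Random(0))
```
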
miraiml.util.is_valid_filename(filename)[source]

Tells whether a string can be used as a safe file name or not.

Parameters: filename (str) – The file name.
Return type: bool
Returns: Whether filename is a valid file name or not.
miraiml.util.is_valid_pipeline_name(pipeline_name)[source]

Tells whether a string can be used to compose pipelines or not.

Parameters: pipeline_name (str) – The pipeline name.
Return type: bool
Returns: Whether pipeline_name is a valid name or not.