Internal modules¶

The documentation related to these modules is meant for developers.

miraiml.core¶

miraiml.core contains internal classes responsible for the optimization process.

miraiml.core.BaseModel represents a solution
miraiml.core.MiraiSeeker implements the strategies to search for good solutions
miraiml.core.Ensembler searches for smart ways of combining the current solutions to generate a better one

class miraiml.core.BaseModel(model_class, parameters, features)¶

Bases: object

Represents an element from the search space, defined by an instance of miraiml.HyperSearchSpace and a set of features.

Read more in the User Guide.

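class miraiml.core.MiraiSeeker(base_models_ids, all_features, config)¶

Implements the strategies to search for good solutions.

Parameters:

base_models_ids (list) – The list of base models’ ids to keep track of.
all_features (list) – A list containing all available features.
config (miraiml.Config) – The configuration of the engine.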

reset()¶

Deletes the registries of all base models.

parameters_features_to_dataframe(parameters, features, score)¶

Creates an entry for a history.

Parameters:

parameters (dict) – The set of parameters to transform.
features (list) – The set of features to transform.
score (float) – The score to transform.

register_base_model(id, base_model, score)¶

Registers the performance of a base model and its characteristics.

Parameters:

id (str) – The id associated with the base model.
base_model (miraiml.core.BaseModel) – The base model being registered.
score (float) – The score of `base_model`.

is_ready(id)¶

Tells whether the history of an id is large enough for more advanced strategies or not.

Parameters:	id (str) – The id to be inspected.
Return type:	bool
Returns:	Whether `id` can be used to generate parameters and features lists or not.

seek(id)¶

Manages the search strategy for better solutions.

The random strategy is chosen with a probability of 0.5. Otherwise, one of the other strategies is chosen, each with equal probability.

Parameters:	id (str) – The id for which a new base model is required.
Return type:	miraiml.core.BaseModel
Returns:	The next base model for exploration.
Raises:	`KeyError`
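
The selection rule described for seek can be sketched as follows. This is only an illustration, not MiraiML's implementation: the function name `choose_strategy` and the strategy labels are hypothetical, and the sketch assumes that random search is always used while the history of the id is not ready.

```python
import random

# Hypothetical labels for the non-random strategies described in this section.
ADVANCED_STRATEGIES = ["naive", "linear_regression"]


def choose_strategy(history_is_ready, rng=random):
    """Pick a search strategy name following the rule described for seek()."""
    # Assumption: without enough history, only random search is possible.
    if not history_is_ready or rng.random() < 0.5:
        return "random"
    # Otherwise, the remaining strategies are equally likely.
    return rng.choice(ADVANCED_STRATEGIES)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible.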

random_search(id)¶

Generates completely random sets of parameters and features.

Parameters:	id (str) – The id for which we want a new set of parameters and features.
Return type:	tuple
Returns:	`(parameters, features)`: respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
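
A minimal sketch of a fully random draw is shown below. It assumes a hypothetical `search_space` dictionary that maps each parameter name to a list of candidate values; the real search space object may be structured differently.

```python
import random


def random_search_sketch(search_space, all_features, rng=random):
    """Draw one random value per parameter and a random non-empty feature subset."""
    # Assumption: `search_space` maps each parameter name to a list of candidates.
    parameters = {name: rng.choice(values) for name, values in search_space.items()}
    # Sample between 1 and len(all_features) distinct features.
    n_features = rng.randint(1, len(all_features))
    features = rng.sample(all_features, n_features)
    return parameters, features
```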

naive_search(id)¶

Characteristics that achieved higher scores have independently higher chances of being chosen again.

Parameters:	id (str) – The id for which we want a new set of parameters and features.
Return type:	tuple
Returns:	`(parameters, features)`: respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
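
One way to give higher-scoring characteristics independently higher chances is to resample every characteristic from its own score-weighted draw over the history. The sketch below illustrates that idea under stated assumptions; the history layout and the function name are hypothetical, and the actual weighting used by MiraiML may differ.

```python
import random


def naive_search_sketch(history, all_features, rng=random):
    """Resample each characteristic independently, weighted by past scores.

    Assumption: `history` is a list of (parameters, features, score) tuples
    with non-negative scores, at least one of which is positive.
    """
    scores = [score for _, _, score in history]

    def draw_entry():
        # Entries with higher scores are proportionally more likely to be drawn.
        return rng.choices(history, weights=scores, k=1)[0]

    # Each parameter value comes from its own independent draw.
    parameters = {name: draw_entry()[0][name] for name in history[0][0]}
    # Each feature is kept if an independently drawn entry used it.
    features = [f for f in all_features if f in draw_entry()[1]]
    if not features:
        # Fall back to a single random feature so the list is never empty.
        features = [rng.choice(all_features)]
    return parameters, features
```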

linear_regression_search(id)¶

Uses the history to model the score with a linear regression. Guesses the scores of n/2 random sets of parameters and features, where n is the size of the history. The one with the highest score is chosen.

Parameters:	id (str) – The id for which we want a new set of parameters and features.
Return type:	tuple
Returns:	`(parameters, features)`: respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
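
The idea can be sketched in plain Python as follows. The encoding (an intercept, the raw parameter values and one 0/1 indicator per feature), the `search_space` dictionary of numeric candidates and the helper `fit_linear` are all assumptions made for illustration; they are not taken from MiraiML.

```python
import random


def fit_linear(X, y, ridge=1e-6):
    """Least-squares weights via the normal equations and Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X + ridge * I | X^T y]; the tiny ridge term keeps
    # the system solvable when columns are collinear.
    A = [
        [sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0) for j in range(n)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def linear_regression_search_sketch(history, search_space, all_features, rng=random):
    """Return the random candidate with the highest predicted score.

    Assumptions: `history` holds (parameters, features, score) tuples and
    `search_space` maps parameter names to lists of numeric values.
    """
    names = sorted(search_space)

    def encode(parameters, features):
        # Intercept, raw parameter values, then one 0/1 indicator per feature.
        return ([1.0] + [float(parameters[k]) for k in names]
                + [1.0 if f in features else 0.0 for f in all_features])

    weights = fit_linear([encode(p, f) for p, f, _ in history],
                         [score for _, _, score in history])
    best, best_guess = None, float("-inf")
    # Guess the scores of n/2 random candidates, n being the history size.
    for _ in range(max(1, len(history) // 2)):
        parameters = {k: rng.choice(search_space[k]) for k in names}
        features = rng.sample(all_features, rng.randint(1, len(all_features)))
        guess = sum(w * x for w, x in zip(weights, encode(parameters, features)))
        if guess > best_guess:
            best, best_guess = (parameters, features), guess
    return best
```

Categorical parameters would need a different encoding (for example one indicator per value), which this sketch leaves out.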

class miraiml.core.Ensembler(base_models_ids, y_train, train_predictions_df, test_predictions_df, scores, config)¶

Bases: object

Performs the ensemble of the base models.

Read more in the User Guide.

Parameters:

y_train (pandas.Series or numpy.ndarray) – The target column.
base_models_ids (list) – The list of base models’ ids to keep track of.
train_predictions_df (pandas.DataFrame) – The dataframe of predictions for the training dataset.
test_predictions_df (pandas.DataFrame) – The dataframe of predictions for the testing dataset.
scores (dict) – The dictionary of scores.
config (miraiml.Config) – The configuration of the engine.

interrupt()¶

Sets an internal flag to interrupt the optimization process at the first opportunity.

update()¶

Updates the ensemble with the newest predictions from the base models.

gen_weights()¶

Generates the ensemble weights according to the score of each base model. Higher scores have higher chances of generating higher weights.

Return type:	dict
Returns:	A dictionary containing the weights for each base model id.
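
A possible way to obtain this behaviour is to draw each weight from a triangular distribution whose mode grows with the relative score. The sketch below is an assumption about one workable scheme, not a description of what gen_weights actually does; the function name is hypothetical.

```python
import random


def gen_weights_sketch(scores, rng=random):
    """Draw one weight in [0, 1] per id, skewed towards 1 for higher scores."""
    # Assumption: scores are positive, so score / best lies in (0, 1].
    best = max(scores.values())
    weights = {}
    for model_id, score in scores.items():
        # The mode of the triangular distribution grows with the relative score.
        weights[model_id] = rng.triangular(0.0, 1.0, score / best)
    return weights
```

Because the draw is random, a weaker model can still receive a large weight occasionally, which keeps the search from settling too early.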

ensemble(weights)¶

Performs the ensemble of the current predictions of each base model.

Parameters:	weights (dict) – A dictionary containing the weights related to the id of each base model.
Return type:	tuple
Returns:	`(train_predictions, test_predictions, score)`

`train_predictions`: The ensemble predictions for the training dataset.
`test_predictions`: The ensemble predictions for the testing dataset.
`score`: The score of the ensemble on the training dataset.
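
A weighted average is one simple way to combine predictions. The sketch below assumes that combination rule, plain lists instead of dataframes, and a caller-supplied `score_function`; none of these details are confirmed by this page.

```python
def ensemble_sketch(weights, train_predictions, test_predictions, y_train, score_function):
    """Combine per-model predictions with a weighted average and score the result.

    Assumptions: predictions are dicts mapping each base model id to a list of
    numbers, and `score_function(y_true, y_pred)` returns a float.
    """
    total = sum(weights.values())

    def combine(predictions):
        n_rows = len(next(iter(predictions.values())))
        return [
            sum(weights[model_id] * predictions[model_id][i] for model_id in weights) / total
            for i in range(n_rows)
        ]

    train = combine(train_predictions)
    test = combine(test_predictions)
    return train, test, score_function(y_train, train)
```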

optimize(max_duration)¶

Performs ensembling cycles for max_duration seconds.

Parameters:	max_duration (float) – The maximum duration, in seconds, allowed for the optimization process.
Return type:	bool
Returns:	`True` if a better set of weights was found and `False` otherwise.
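
A time-bounded search loop of this kind might look like the sketch below. The callables `gen_weights` and `evaluate` and the extra return values are hypothetical conveniences for the example; the documented method returns only the boolean.

```python
import time


def optimize_sketch(max_duration, gen_weights, evaluate, best_score):
    """Try new weights until the time budget runs out, keeping the best ones.

    Assumptions: `gen_weights()` returns a candidate and `evaluate(candidate)`
    returns its score, where higher is better.
    """
    deadline = time.monotonic() + max_duration
    best_weights = None
    while time.monotonic() < deadline:
        weights = gen_weights()
        score = evaluate(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    # The first element mirrors the documented boolean result.
    return best_weights is not None, best_weights, best_score
```

`time.monotonic` is used instead of `time.time` so that system clock adjustments cannot stretch or shorten the budget.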

class miraiml.core.MiraiModel(base_models, weights, problem_type)¶

Bases: object

Represents a unified model optimized by MiraiML.

fit(X, y)¶

Fits all base models.

Parameters:

X (pandas.DataFrame) – The training data.
y (pandas.Series or numpy.ndarray) – The target.

predict(X)¶

Predicts the classes for classification problems and the output for regression problems.

Parameters:	X (pandas.DataFrame) – The input for new predictions.

predict_proba(X)¶

Predicts the probabilities for each class. Only available for classification problems.

Parameters:	X (pandas.DataFrame) – The input for new predictions.
Raises:	`RuntimeError`

miraiml.util¶

miraiml.util provides utility functions that are used by higher level modules.

miraiml.util.load(path)¶

A clean pickle.load wrapper for binary files.

Parameters:	path (str) – The path of the binary file to be loaded.
Return type:	object
Returns:	The loaded object.

miraiml.util.dump(obj, path)¶

Speeds up writing objects to disk by performing the write in a separate thread.

Parameters:

obj (object) – The object to be dumped to the binary file.
path (str) – The path of the binary file to be written.
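
The pair of helpers can be approximated with the standard library as shown below. This is a sketch under assumptions: the `_sketch` names are hypothetical, and returning the thread from the dump helper is a choice made here so that callers can wait for the write to finish.

```python
import pickle
import threading


def dump_sketch(obj, path):
    """Pickle `obj` to `path` in a background thread and return that thread."""
    def write():
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)

    thread = threading.Thread(target=write)
    thread.start()
    return thread


def load_sketch(path):
    """Load a pickled object from the binary file at `path`."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Calling `join()` on the returned thread before reading the file avoids loading a partially written pickle.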

miraiml.util.sample_random_len(lst)¶

Returns a sample of random size from the list lst. The minimum length of the returned list is 1.

Parameters:	lst (list) – A list containing the elements to be sampled.
Return type:	list
Returns:	The randomly sampled elements from `lst`.
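
The described behaviour can be reproduced with the standard `random` module, as in this sketch. It assumes `lst` is non-empty and samples without replacement; the function name is hypothetical.

```python
import random


def sample_random_len_sketch(lst, rng=random):
    """Return between 1 and len(lst) distinct elements of `lst`, in random order."""
    # Assumption: `lst` is non-empty; randint raises ValueError otherwise.
    size = rng.randint(1, len(lst))
    return rng.sample(lst, size)
```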

miraiml.util.is_valid_filename(filename)¶

Tells whether a string can be used as a safe file name or not.

Parameters:	filename (str) – The file name.
Return type:	bool
Returns:	Whether `filename` is a valid file name or not.
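
This page does not state which characters are considered safe. The sketch below assumes a deliberately strict rule (ASCII letters, digits, underscores and hyphens only), which is a hypothetical choice and may not match the rule used by miraiml.util.

```python
import re

# Assumed whitelist: ASCII letters, digits, underscores and hyphens.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_filename_sketch(filename):
    """Tell whether `filename` is a non-empty string of whitelisted characters."""
    return isinstance(filename, str) and _SAFE_NAME.fullmatch(filename) is not None
```

A whitelist rejects path separators and dots, so a name such as `../model` cannot escape the target directory.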