Internal modules’ API

The documentation related to these modules is meant for developers.

miraiml.core

miraiml.core contains internal classes responsible for the optimization process.

class miraiml.core.BaseModel(model_class, parameters, features)

Represents an element from the search space, defined by an instance of miraiml.SearchSpace and a set of features.

Read more in the User Guide.

Parameters:

model_class (type) – A statistical model class that must implement `fit` and either `predict` (for regression problems) or `predict_proba` (for classification problems).
parameters (dict) – The parameters that will be used to instantiate objects of `model_class`.
features (list) – The list of features that will be used to train the statistical model.
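
To make the `model_class` contract concrete, here is a minimal sketch of a regression class that would satisfy it. `MeanRegressor` and its `shift` parameter are hypothetical names, not part of miraiml, and plain lists stand in for dataframes; the sketch only shows the shape of `fit` and `predict` described above.

```python
class MeanRegressor:
    """Hypothetical toy model: predicts the mean of the training targets."""

    def __init__(self, shift=0.0):
        # `shift` stands in for a tunable value coming from `parameters`.
        self.shift = shift
        self.mean_ = None

    def fit(self, X, y):
        y = list(y)
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # One prediction per input row.
        return [self.mean_ + self.shift for _ in X]


# The parameters dictionary is unpacked to instantiate the class.
parameters = {"shift": 0.5}
model = MeanRegressor(**parameters).fit([[1], [2], [3]], [10.0, 20.0, 30.0])
predictions = model.predict([[4], [5]])
```

A classification counterpart would expose `predict_proba` instead of `predict`.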

predict(X_train, y_train, X_test, config)

Performs the predictions for the training and testing datasets and also computes the score of the model.

Parameters:

X_train (pandas.DataFrame) – The dataframe that contains the training inputs for the model.
y_train (pandas.Series or numpy.ndarray) – The training targets for the model.
X_test (pandas.DataFrame) – The dataframe that contains the testing inputs for the model.
config (miraiml.Config) – The configuration of the engine.

Return type:	tuple
Returns:	`(train_predictions, test_predictions, score)`

`train_predictions`: The predictions for the training dataset.
`test_predictions`: The predictions for the testing dataset.
`score`: The score of the model on the training dataset.

Raises:	`RuntimeError` if fitting or predicting fails.

miraiml.core.dump_base_model(base_model, path)

Saves the characteristics of a base model as a checkpoint.

Parameters:

base_model (miraiml.core.BaseModel) – The base model to be saved.
path (str) – The path to save the base model to.

Return type:	tuple
Returns:	`(train_predictions, test_predictions, score)`

miraiml.core.load_base_model(model_class, path)

Loads the characteristics of a base model from disk and returns its respective instance of miraiml.core.BaseModel.

Parameters:

model_class (type) – The model class related to the base model.
path (str) – The path to load the base model from.

Return type:	miraiml.core.BaseModel
Returns:	The base model loaded from disk

class miraiml.core.MiraiSeeker(search_spaces, all_features, config)

This class implements a smarter way of searching for good parameters and sets of features.

Read more in the User Guide.

Parameters:

base_models_ids (list) – The list of base models’ ids to keep track of.
all_features (list) – A list containing all available features.
config (miraiml.Config) – The configuration of the engine.

reset(): Deletes all base model registries.

parameters_features_to_dataframe(parameters, features, score)

Creates an entry for a history.

Parameters:

parameters (list) – The set of parameters to transform.
features – The set of features to transform.
score (float) – The score to transform.

register_base_model(id, base_model, score)

Registers the performance of a base model and its characteristics.

Parameters:

id (str) – The id associated with the base model.
base_model (miraiml.core.BaseModel) – The base model being registered.
score (float) – The score of `base_model`.

is_ready(id)

Tells whether the history of an id is large enough for more advanced strategies or not.

Parameters:	id (str) – The id to be inspected.
Return type:	bool
Returns:	Whether `id` can be used to generate parameters and features lists or not.

seek(id)

Manages the search strategy for better solutions.

With a probability of 0.5, the random strategy is chosen. Otherwise, one of the other strategies is chosen, each with equal probability.

Parameters:	id (str) – The id for which a new base model is required.
Return type:	miraiml.core.BaseModel
Returns:	The next base model for exploration.
Raises:	`KeyError` if `parameters_rules` tries to access an invalid key.
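
The selection rule described above can be sketched as follows. This is an illustration under assumptions, not the library's code: `choose_strategy` and the strategy labels are hypothetical names, and the sketch ignores any readiness check on the history.

```python
import random


def choose_strategy(other_strategies, rng=random):
    """Return "random" half of the time, else one of the others uniformly."""
    if rng.random() < 0.5:
        return "random"
    return rng.choice(other_strategies)


rng = random.Random(0)  # seeded only to make the illustration repeatable
others = ["naive", "linear_regression"]
picks = [choose_strategy(others, rng) for _ in range(10_000)]
share_random = picks.count("random") / len(picks)
```

With two other strategies, each of them ends up being picked roughly a quarter of the time.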

random_search(id)

Generates completely random sets of parameters and features.

Parameters:	id (str) – The id for which we want a new set of parameters and features.
Return type:	tuple
Returns:	`(parameters, features)` Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
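
As a rough sketch of what a completely random draw could look like, the snippet below picks one value per parameter and a random non-empty subset of features. The function name `random_candidate`, the dictionary-of-lists layout of `parameter_values`, and the sample feature names are all assumptions made for the illustration.

```python
import random


def random_candidate(parameter_values, all_features, rng=random):
    """Draw one value per parameter and a non-empty random subset of features."""
    parameters = {
        name: rng.choice(values) for name, values in parameter_values.items()
    }
    n_features = rng.randint(1, len(all_features))  # both bounds inclusive
    features = rng.sample(all_features, n_features)
    return parameters, features


rng = random.Random(42)
parameter_values = {"max_depth": [2, 4, 8], "criterion": ["gini", "entropy"]}
all_features = ["age", "income", "city"]
parameters, features = random_candidate(parameter_values, all_features, rng)
```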

naive_search(id)

Each characteristic that achieved higher scores has, independently of the others, a higher chance of being chosen again.

Parameters:	id (str) – The id for which we want a new set of parameters and features.
Return type:	tuple
Returns:	`(parameters, features)` Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
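
One way to read "independently higher chances" is score-weighted sampling applied to each characteristic on its own. The sketch below does this for a single parameter; `naive_pick`, the `(parameters, score)` history layout, and the choice of the mean score as the weight are assumptions, and scores are assumed to be positive.

```python
import random
from collections import defaultdict


def naive_pick(history, name, rng=random):
    """Pick a value for `name`, weighting each seen value by its mean score."""
    totals, counts = defaultdict(float), defaultdict(int)
    for entry, score in history:
        totals[entry[name]] += score
        counts[entry[name]] += 1
    values = list(totals)
    weights = [totals[v] / counts[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(0)
history = [({"max_depth": 2}, 0.1), ({"max_depth": 8}, 0.9), ({"max_depth": 8}, 0.9)]
draws = [naive_pick(history, "max_depth", rng) for _ in range(2000)]
```

Repeating the same draw for every parameter, and for the inclusion of every feature, would give a full `(parameters, features)` candidate.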

linear_regression_search(id)

Uses the history to model the score with a linear regression. Guesses the scores of n/2 random sets of parameters and features, where n is the size of the history. The one with the highest score is chosen.

Parameters:	id (str) – The id for which we want a new set of parameters and features.
Return type:	tuple
Returns:	`(parameters, features)` Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
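
The idea can be illustrated with a deliberately reduced case: a history of a single numeric parameter and its scores, fitted with ordinary least squares. `fit_line`, `regression_pick`, and the `pool` of candidate values are hypothetical; a real history would involve several parameters and features.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one input; needs at least two distinct xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_pick(history, pool, rng=random):
    """Guess the score of len(history) // 2 random candidates; keep the best."""
    xs, ys = zip(*history)
    slope, intercept = fit_line(xs, ys)
    n_guesses = max(1, len(history) // 2)
    guesses = [rng.choice(pool) for _ in range(n_guesses)]
    return max(guesses, key=lambda x: slope * x + intercept)


history = [(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.4)]
slope, intercept = fit_line(*zip(*history))
pick = regression_pick(history, pool=[1, 5, 10], rng=random.Random(3))
```

Because only n/2 candidates are scored, the pick is the best of a small random sample rather than the best of the whole pool.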

class miraiml.core.Ensembler(base_models_ids, y_train, train_predictions_df, test_predictions_df, scores, config)

Performs the ensemble of the base models and optimizes its weights.

Read more in the User Guide.

Parameters:

y_train (pandas.Series or numpy.ndarray) – The target column.
base_models_ids (list) – The list of base models’ ids to keep track of.
train_predictions_df (pandas.DataFrame) – The dataframe of predictions for the training dataset.
test_predictions_df (pandas.DataFrame) – The dataframe of predictions for the testing dataset.
scores (dict) – The dictionary of scores.
config (miraiml.Config) – The configuration of the engine.

interrupt(): Sets an internal flag to interrupt the optimization process at the first opportunity.

update(): Updates the ensemble with the newest predictions from the base models.

gen_weights()

Generates the ensemble weights according to the score of each base model. Higher scores have higher chances of generating higher weights.

Return type:	dict
Returns:	A dictionary containing the weights for each base model id.
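
A minimal sketch of score-biased weight generation is shown below. It assumes a triangular distribution whose mode follows the normalized score, which is only one of several ways to obtain the behavior described; `sketch_weights` and the model ids are hypothetical.

```python
import random


def sketch_weights(scores, rng=random):
    """Draw one weight in [0, 1] per id, skewed upwards for higher scores."""
    low, high = min(scores.values()), max(scores.values())
    spread = high - low
    weights = {}
    for model_id, score in scores.items():
        # Map the score to [0, 1]; with identical scores every mode is 1.
        mode = (score - low) / spread if spread > 0 else 1.0
        weights[model_id] = rng.triangular(0.0, 1.0, mode)
    return weights


rng = random.Random(7)
scores = {"tree": 0.70, "linear": 0.90}
draws = [sketch_weights(scores, rng) for _ in range(2000)]
mean_tree = sum(d["tree"] for d in draws) / len(draws)
mean_linear = sum(d["linear"] for d in draws) / len(draws)
```

Any single draw can still give the weaker model the larger weight; the bias only shows up on average.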

ensemble(weights)

Performs the ensemble of the current predictions of each base model.

Parameters:	weights (dict) – A dictionary containing the weights related to the id of each base model.
Return type:	tuple
Returns:	`(train_predictions, test_predictions, score)`

`train_predictions`: The ensemble predictions for the training dataset.
`test_predictions`: The ensemble predictions for the testing dataset.
`score`: The score of the ensemble on the training dataset.
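
Assuming the ensemble is a weighted average of the base models' predictions (an assumption; the combination rule is not spelled out here), the core computation could be sketched like this. `weighted_average` is a hypothetical helper, and plain lists stand in for dataframe columns.

```python
def weighted_average(predictions, weights):
    """Combine per-model prediction lists into one list, row by row."""
    total = sum(weights[model_id] for model_id in predictions)
    n_rows = len(next(iter(predictions.values())))
    return [
        sum(weights[model_id] * preds[i] for model_id, preds in predictions.items())
        / total
        for i in range(n_rows)
    ]


predictions = {"a": [1.0, 3.0], "b": [3.0, 5.0]}
weights = {"a": 1.0, "b": 3.0}
combined = weighted_average(predictions, weights)
```

Dividing by the total weight keeps the combined predictions on the same scale as the inputs.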

optimize(max_duration)

Performs ensembling cycles for `max_duration` seconds.

Parameters:	max_duration (float) – The maximum duration, in seconds, allowed for the optimization process.
Return type:	bool
Returns:	`True` if a better set of weights was found and `False` otherwise.
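
A time-boxed search loop of this kind can be sketched as below. `timed_search`, `propose`, and `evaluate` are hypothetical names; unlike the documented method, the sketch also returns the best score so that the outcome is easy to inspect.

```python
import random
import time


def timed_search(propose, evaluate, best_score, max_duration):
    """Try new candidates until the time budget runs out."""
    improved = False
    deadline = time.monotonic() + max_duration
    while time.monotonic() < deadline:
        score = evaluate(propose())
        if score > best_score:
            best_score, improved = score, True
    return improved, best_score


rng = random.Random(1)
# A candidate is a single number here and its score is the number itself.
improved, best = timed_search(rng.random, lambda w: w, best_score=-1.0, max_duration=0.05)
```

`time.monotonic()` is used instead of `time.time()` so that system clock adjustments cannot shorten or extend the budget.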

class miraiml.core.BasePipelineClass(**params)

This is the base class for your custom pipeline classes.

Warning

Instantiating this class directly does not work.

get_params()

Gets the list of parameters that can be set.

Return type:	list
Returns:	The list of allowed parameters

set_params(**params)

Sets the parameters for the pipeline. You can check the parameters that are allowed to be set by calling `get_params()`.

Return type:	miraiml.core.BasePipelineClass
Returns:	self

fit(X, y)

Fits the pipeline to X using y as the target.

Parameters:

X (iterable) – The training data.
y (iterable) – The target.

Return type:	miraiml.core.BasePipelineClass
Returns:	self

predict(X)

Predicts the class for each element of X in case of classification problems or the estimated target value in case of regression problems.

Parameters:	X (iterable) – Data to predict on.
Return type:	numpy.ndarray
Returns:	The set of predictions

predict_proba(X)

Returns the probabilities for each class. Available only if your end estimator implements it.

Parameters:	X (iterable) – Data to predict on.
Return type:	numpy.ndarray
Returns:	The probabilities for each class
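
Because `set_params` and `fit` both return `self`, calls can be chained. The stand-in below mirrors the documented method contract with a toy estimator; `TinyPipeline`, its `scale` and `offset` parameters, and the `ValueError` raised for unknown names are assumptions for illustration and do not describe how real pipeline classes are built.

```python
class TinyPipeline:
    """Hypothetical stand-in that mirrors the documented method contract."""

    _allowed = ["scale", "offset"]

    def __init__(self, **params):
        self.scale, self.offset = 1.0, 0.0
        self.mean_ = None
        self.set_params(**params)

    def get_params(self):
        # Names that set_params accepts.
        return list(self._allowed)

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._allowed:
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Toy "training": remember the mean target.
        y = list(y)
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.scale * self.mean_ + self.offset for _ in X]


pipeline = TinyPipeline()
result = pipeline.set_params(scale=2.0).fit([[0], [1]], [1.0, 3.0]).predict([[5]])
```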

miraiml.util

miraiml.util provides utility functions that are used by higher-level modules.

miraiml.util.load(path)

A clean pickle.load wrapper for binary files.

Parameters:	path (string) – The path of the binary file to be loaded.
Return type:	object
Returns:	The loaded object.

miraiml.util.dump(obj, path)

Optimizes the process of writing objects to disk by triggering a thread.

Parameters:

obj (object) – The object to be dumped to the binary file.
path (string) – The path of the binary file to be written.
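
The pair of helpers can be approximated with the standard library as follows. This is a sketch rather than the module's code: in particular, returning the `Thread` object from `dump` is an assumption added here so that a caller can wait for the write to finish.

```python
import os
import pickle
import tempfile
import threading


def load(path):
    """Read one pickled object from a binary file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def dump(obj, path):
    """Write `obj` to `path` in a background thread and return that thread."""

    def write():
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    thread = threading.Thread(target=write)
    thread.start()
    return thread


path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
dump({"score": 0.9}, path).join()  # wait for the write before reading back
restored = load(path)
```

Reading the file before the writer thread has finished may fail or return partial data, hence the `join()` in the usage lines.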

miraiml.util.sample_random_len(lst)

Returns a sample of random size from the list lst. The minimum length of the returned list is 1.

Parameters:	lst (list) – A list containing the elements to be sampled.
Return type:	list
Returns:	The randomly sampled elements from `lst`.
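
A compact way to obtain this behavior with the standard library is sketched below; it is an illustration of the described contract, not necessarily how the function is written, and the `rng` argument is an addition made for repeatable examples.

```python
import random


def sample_random_len(lst, rng=random):
    """Return between 1 and len(lst) distinct elements of `lst`."""
    size = rng.randint(1, len(lst))  # both bounds inclusive
    return rng.sample(lst, size)


rng = random.Random(5)
items = ["a", "b", "c", "d"]
samples = [sample_random_len(items, rng) for _ in range(200)]
```

In this sketch an empty list raises `ValueError`, because `randint(1, 0)` has an empty range.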

miraiml.util.is_valid_filename(filename)

Tells whether a string can be used as a safe file name or not.

Parameters:	filename (str) – The file name.
Return type:	bool
Returns:	Whether `filename` is a valid file name or not.

miraiml.util.is_valid_pipeline_name(pipeline_name)

Tells whether a string can be used to compose pipelines or not.

Parameters:	pipeline_name (str) – The pipeline name.
Return type:	bool
Returns:	Whether `pipeline_name` is a valid name or not.