Internal modules
The documentation related to these modules is meant for developers.
miraiml.core
miraiml.core contains internal classes responsible for the optimization process.
- miraiml.core.BaseModel represents a solution.
- miraiml.core.MiraiSeeker implements the strategies to search for good solutions.
- miraiml.core.Ensembler searches for smart ways of combining the current solutions to generate a better one.
class miraiml.core.BaseModel(model_class, parameters, features)
Bases: object
Represents an element from the search space, defined by an instance of miraiml.HyperSearchSpace and a set of features. Read more in the User Guide.
Parameters:
- model_class (type) – A statistical model class that must implement the methods fit and predict for regression problems, or fit and predict_proba for classification problems.
- parameters (dict) – The parameters that will be used to instantiate objects of model_class.
- features (list) – The list of features that will be used to train the statistical model.
predict(X_train, y_train, X_test, config)
Performs the predictions for the training and testing datasets and also computes the score of the model.
Parameters:
- X_train (pandas.DataFrame) – The dataframe that contains the training inputs for the model.
- y_train (pandas.Series or numpy.ndarray) – The training targets for the model.
- X_test (pandas.DataFrame) – The dataframe that contains the testing inputs for the model.
- config (miraiml.Config) – The configuration of the engine.
Return type: tuple
Returns: (train_predictions, test_predictions, score)
- train_predictions: The predictions for the training dataset
- test_predictions: The predictions for the testing dataset
- score: The score of the model on the training dataset
Raises: RuntimeError
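The contract of predict can be illustrated with a minimal, pure-Python sketch. It assumes, hypothetically, that the training predictions are produced out-of-fold with k-fold cross-validation, that the test predictions come from a model refit on the whole training set, and that the score is computed by a user-supplied function on the out-of-fold predictions. Rows are plain dicts instead of pandas.DataFrame rows, and `predict_and_score`, `MeanRegressor`, `n_folds` and `score_function` are illustrative names, not MiraiML's API (the real method reads its settings from config).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


class MeanRegressor:
    """Toy model_class implementing the fit/predict contract."""

    def __init__(self, shrink: float = 0.0):
        self.shrink = shrink
        self.mean_ = 0.0

    def fit(self, X: List[List[float]], y: Sequence[float]) -> "MeanRegressor":
        # Ignores the inputs and learns a (shrunk) mean of the targets.
        self.mean_ = (1.0 - self.shrink) * sum(y) / len(y)
        return self

    def predict(self, X: List[List[float]]) -> List[float]:
        return [self.mean_ for _ in X]


def predict_and_score(
    model_class: type,
    parameters: Dict[str, float],
    features: List[str],
    X_train: List[Row],
    y_train: Sequence[float],
    X_test: List[Row],
    score_function: Callable[[Sequence[float], Sequence[float]], float],
    n_folds: int = 3,
) -> Tuple[List[float], List[float], float]:
    """Hypothetical sketch: out-of-fold train predictions, refit for test."""

    def select(rows: List[Row]) -> List[List[float]]:
        # Keep only the selected features, in a fixed order.
        return [[row[f] for f in features] for row in rows]

    X_tr, X_te = select(X_train), select(X_test)
    n = len(X_tr)
    train_predictions = [0.0] * n
    for fold in range(n_folds):
        # Rows whose index falls in this fold are held out of training.
        held_out = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        model = model_class(**parameters)
        model.fit([X_tr[i] for i in kept], [y_train[i] for i in kept])
        for i, p in zip(held_out, model.predict([X_tr[i] for i in held_out])):
            train_predictions[i] = p

    # Refit on the whole training set to predict the test rows.
    final_model = model_class(**parameters)
    final_model.fit(X_tr, list(y_train))
    test_predictions = list(final_model.predict(X_te))

    score = score_function(y_train, train_predictions)
    return train_predictions, test_predictions, score


def neg_mae(y_true, y_pred):
    # Negative mean absolute error, so that higher is better.
    return -sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


X_train = [{"a": float(i), "b": 1.0} for i in range(6)]
y_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X_test = [{"a": 10.0, "b": 0.0}]
train_p, test_p, score = predict_and_score(
    MeanRegressor, {"shrink": 0.0}, ["a"], X_train, y_train, X_test, neg_mae
)
print(test_p)  # [3.5]
```

Scoring out-of-fold predictions rather than in-sample fits keeps the training score from rewarding models that simply memorize the training rows.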
class miraiml.core.MiraiSeeker(hyper_search_spaces, all_features, config)
Bases: object
This class implements a smarter way of searching for good parameters and sets of features.
Read more in the User Guide.
Parameters:
- base_models_ids (list) – The list of base models’ ids to keep track of.
- all_features (list) – A list containing all available features.
- config (miraiml.Config) – The configuration of the engine.
reset()
Deletes all base model registries.
parameters_features_to_dataframe(parameters, features, score)
Creates a history entry.
Parameters:
- parameters (dict) – The set of parameters to transform.
- features (list) – The set of features to transform.
- score (float) – The score to transform.
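One way to picture a history entry is a single flat row: one column per hyperparameter, one 0/1 indicator column per available feature, and a score column. The sketch below builds such a row as a plain dict; the indicator encoding and the name `history_entry` are assumptions for illustration, not necessarily the layout MiraiML uses.

```python
from typing import Dict, List


def history_entry(
    parameters: Dict[str, object],
    features: List[str],
    all_features: List[str],
    score: float,
) -> Dict[str, object]:
    """Hypothetical: flatten one (parameters, features, score) triple into a row."""
    row: Dict[str, object] = dict(parameters)
    # One indicator per available feature: 1 if it was used, 0 otherwise.
    used = set(features)
    for feature in all_features:
        row[feature] = 1 if feature in used else 0
    row["score"] = score
    return row


entry = history_entry({"max_depth": 3}, ["age"], ["age", "income"], 0.81)
print(entry)  # {'max_depth': 3, 'age': 1, 'income': 0, 'score': 0.81}
```

With pandas installed, `pandas.DataFrame([entry])` turns such a row into a one-row dataframe that can be concatenated onto a history.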
register_base_model(id, base_model, score)
Registers the performance of a base model and its characteristics.
Parameters:
- id (str) – The id associated with the base model.
- base_model (miraiml.core.BaseModel) – The base model being registered.
- score (float) – The score of base_model.
is_ready(id)
Tells whether the history of an id is large enough for the more advanced strategies.
Parameters: id (str) – The id to be inspected.
Return type: bool
Returns: Whether id can be used to generate lists of parameters and features.
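reset, register_base_model and is_ready together suggest a per-id history that grows with every evaluation and unlocks the advanced strategies once it is long enough. A minimal sketch, assuming a fixed minimum history length; the `SeekerHistory` class, its `min_history` threshold and the `model_id` argument (playing the role of id) are hypothetical, since the actual readiness criterion is not stated here.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Entry = Tuple[Dict[str, object], List[str], float]


class SeekerHistory:
    """Hypothetical per-id registry of (parameters, features, score) entries."""

    def __init__(self, min_history: int = 10):
        self.min_history = min_history
        self.histories: DefaultDict[str, List[Entry]] = defaultdict(list)

    def reset(self) -> None:
        # Forget every registered base model, for every id.
        self.histories.clear()

    def register(self, model_id: str, parameters: Dict[str, object],
                 features: List[str], score: float) -> None:
        self.histories[model_id].append((dict(parameters), list(features), score))

    def is_ready(self, model_id: str) -> bool:
        # Advanced strategies need enough samples to learn from.
        return len(self.histories.get(model_id, [])) >= self.min_history


history = SeekerHistory(min_history=2)
history.register("tree", {"max_depth": 3}, ["age"], 0.7)
print(history.is_ready("tree"))  # False
history.register("tree", {"max_depth": 5}, ["age", "income"], 0.8)
print(history.is_ready("tree"))  # True
```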
seek(id)
Manages the search strategy for better solutions.
With a probability of 0.5, the random strategy is chosen. Otherwise, one of the other strategies is chosen, each with equal probability.
Parameters: id (str) – The id for which a new base model is required.
Return type: miraiml.core.BaseModel
Returns: The next base model for exploration.
Raises: KeyError
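The selection rule stated for seek (random strategy with probability 0.5, otherwise one of the remaining strategies with equal probability) can be sketched as follows. Falling back to random search when the history is not ready is an assumption inferred from is_ready, and `choose_strategy` is a hypothetical helper that returns a strategy name instead of a base model.

```python
import random
from typing import Optional

OTHER_STRATEGIES = ("naive_search", "linear_regression_search")


def choose_strategy(ready: bool, rng: Optional[random.Random] = None) -> str:
    """Return the name of the search strategy to use for the next base model."""
    rng = rng or random.Random()
    # Assumption: without enough history, only random search is possible.
    if not ready or rng.random() < 0.5:
        return "random_search"
    # Otherwise pick uniformly among the remaining strategies.
    return rng.choice(OTHER_STRATEGIES)


rng = random.Random(0)
counts = {}
for _ in range(10_000):
    name = choose_strategy(ready=True, rng=rng)
    counts[name] = counts.get(name, 0) + 1
print(counts)  # about half random_search, a quarter each for the others
```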
random_search(id)
Generates completely random sets of parameters and features.
Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features)
Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
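A random search over a discrete search space amounts to drawing one value per hyperparameter and a random, non-empty subset of the features. Representing the search space as a dict that maps each parameter name to a list of candidate values is an assumption (miraiml.HyperSearchSpace may describe it differently), and `random_search` here is a standalone, hypothetical function rather than the method itself.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def random_search(
    space: Dict[str, Sequence[object]],
    all_features: List[str],
    rng: Optional[random.Random] = None,
) -> Tuple[Dict[str, object], List[str]]:
    rng = rng or random.Random()
    # One uniformly drawn candidate value per hyperparameter.
    parameters = {name: rng.choice(list(values)) for name, values in space.items()}
    # A random subset of the features with at least one element.
    size = rng.randint(1, len(all_features))
    features = rng.sample(all_features, size)
    return parameters, features


space = {"max_depth": [2, 4, 8], "criterion": ["gini", "entropy"]}
params, feats = random_search(space, ["age", "income", "city"], random.Random(42))
print(params, feats)
```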
naive_search(id)
Characteristics that achieved higher scores have higher chances of being chosen again, with each characteristic chosen independently of the others.
Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features)
Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
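The description leaves the exact weighting open. One plausible reading, sketched below as an assumption rather than MiraiML's code: for every characteristic independently (each hyperparameter, and the in/out decision for each feature), draw a past entry with probability proportional to its score and copy that entry's choice. Scores are shifted to be positive so they can serve as weights; `naive_search` as a standalone function and the history layout are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Entry = Tuple[Dict[str, object], List[str], float]


def naive_search(
    history: List[Entry],
    all_features: List[str],
    rng: Optional[random.Random] = None,
) -> Tuple[Dict[str, object], List[str]]:
    rng = rng or random.Random()
    scores = [score for _, _, score in history]
    # Shift scores so the worst entry still has a small positive weight.
    low = min(scores)
    weights = [s - low + 1e-6 for s in scores]

    def pick() -> Entry:
        return rng.choices(history, weights=weights, k=1)[0]

    # Each hyperparameter is inherited from an independently drawn entry.
    parameters = {name: pick()[0][name] for name in history[0][0]}
    # Each feature's presence is also decided by an independent draw.
    features = [f for f in all_features if f in pick()[1]]
    if not features:
        # Guarantee at least one feature.
        features = [rng.choice(all_features)]
    return parameters, features


history = [
    ({"max_depth": 2}, ["age"], 0.0),
    ({"max_depth": 8}, ["income"], 1.0),
]
rng = random.Random(7)
draws = [naive_search(history, ["age", "income"], rng) for _ in range(1000)]
share = sum(p["max_depth"] == 8 for p, _ in draws) / len(draws)
print(share)  # close to 1.0: the better entry dominates
```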
linear_regression_search(id)
Uses the history to model the score with a linear regression. Then generates n/2 random sets of parameters and features, where n is the size of the history, predicts their scores with the regression, and chooses the one with the highest predicted score.
Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features)
Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
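A sketch of this surrogate-model idea, using NumPy least squares as the linear regression. The encoding (an intercept, numeric hyperparameters as-is, and one 0/1 column per feature) and the helpers `encode`, `sample_candidate` and `linear_regression_search` are assumptions for illustration; only the n/2 candidate count and the pick-the-best rule come from the description above.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

Candidate = Tuple[Dict[str, float], List[str]]


def encode(candidate: Candidate, param_names: List[str], all_features: List[str]) -> List[float]:
    parameters, features = candidate
    used = set(features)
    # Intercept, numeric hyperparameters, then one indicator per feature.
    return [1.0] + [float(parameters[p]) for p in param_names] + [
        1.0 if f in used else 0.0 for f in all_features
    ]


def linear_regression_search(
    history: List[Tuple[Dict[str, float], List[str], float]],
    all_features: List[str],
    sample_candidate: Callable[[], Candidate],
) -> Candidate:
    param_names = sorted(history[0][0])
    X = np.array([encode((p, f), param_names, all_features) for p, f, _ in history])
    y = np.array([score for _, _, score in history])
    # Least-squares fit of the score as a linear function of the encoding.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    n_candidates = max(1, len(history) // 2)
    candidates = [sample_candidate() for _ in range(n_candidates)]
    predicted = [float(np.dot(encode(c, param_names, all_features), coef)) for c in candidates]
    # Keep the candidate with the highest predicted score.
    return candidates[int(np.argmax(predicted))]


# Toy history whose score is exactly 2 * x + (1 if "a" is used).
history = [
    ({"x": 0.0}, ["a"], 1.0), ({"x": 1.0}, ["b"], 2.0),
    ({"x": 2.0}, ["a", "b"], 5.0), ({"x": 3.0}, ["b"], 6.0),
    ({"x": 1.0}, ["a"], 3.0), ({"x": 0.0}, ["b"], 0.0),
    ({"x": 2.0}, ["b"], 4.0), ({"x": 3.0}, ["a"], 7.0),
]
pool = iter([({"x": 0.0}, ["a"]), ({"x": 3.0}, ["a", "b"]),
             ({"x": 1.0}, ["b"]), ({"x": 2.0}, ["a"])])
best = linear_regression_search(history, ["a", "b"], pool.__next__)
print(best)  # ({'x': 3.0}, ['a', 'b'])
```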
class miraiml.core.Ensembler(base_models_ids, y_train, train_predictions_df, test_predictions_df, scores, config)
Bases: object
Combines the base models into an ensemble.
Read more in the User Guide.
Parameters:
- base_models_ids (list) – The list of base models’ ids to keep track of.
- y_train (pandas.Series or numpy.ndarray) – The target column.
- train_predictions_df (pandas.DataFrame) – The dataframe of predictions for the training dataset.
- test_predictions_df (pandas.DataFrame) – The dataframe of predictions for the testing dataset.
- scores (dict) – The dictionary of scores.
- config (miraiml.Config) – The configuration of the engine.
interrupt()
Sets an internal flag to interrupt the optimization process at the first opportunity.
update()
Updates the ensemble with the newest predictions from the base models.
gen_weights()
Generates the ensemble weights according to the score of each base model. Higher scores have higher chances of generating higher weights.
Return type: dict
Returns: A dictionary containing the weights for each base model id.
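A minimal sketch of one weighting scheme that matches this description: each weight is a uniform random draw scaled by the model's min-max normalized score, so better models tend to receive larger weights without any model being guaranteed a fixed share. The scheme, the 0.1 floor and `gen_weights` as a standalone function taking a score dict are assumptions, not the documented implementation.

```python
import random
from typing import Dict, Optional


def gen_weights(scores: Dict[str, float], rng: Optional[random.Random] = None) -> Dict[str, float]:
    rng = rng or random.Random()
    low, high = min(scores.values()), max(scores.values())
    span = (high - low) or 1.0  # avoid division by zero when all scores match
    weights = {}
    for model_id, score in scores.items():
        # Normalized score in [0.1, 1], floored so every model can still contribute.
        normalized = 0.1 + 0.9 * (score - low) / span
        weights[model_id] = rng.random() * normalized
    return weights


scores = {"tree": 0.9, "linear": 0.6, "knn": 0.7}
rng = random.Random(3)
samples = [gen_weights(scores, rng) for _ in range(2000)]
mean_tree = sum(w["tree"] for w in samples) / len(samples)
mean_linear = sum(w["linear"] for w in samples) / len(samples)
print(mean_tree > mean_linear)  # True: the best model tends to get larger weights
```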
ensemble(weights)
Combines the current predictions of the base models into an ensemble.
Parameters: weights (dict) – A dictionary mapping the id of each base model to its weight.
Return type: tuple
Returns: (train_predictions, test_predictions, score)
- train_predictions: The ensemble predictions for the training dataset
- test_predictions: The ensemble predictions for the testing dataset
- score: The score of the ensemble on the training dataset
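The combination step can be sketched as a weighted average of each base model's prediction column, normalized by the sum of the weights, followed by scoring the training ensemble against y_train. Plain lists stand in for the prediction dataframes, and `score_function` stands in for whatever scorer the engine's configuration provides; both are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def ensemble(
    weights: Dict[str, float],
    train_predictions: Dict[str, List[float]],
    test_predictions: Dict[str, List[float]],
    y_train: Sequence[float],
    score_function: Callable[[Sequence[float], Sequence[float]], float],
) -> Tuple[List[float], List[float], float]:
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def combine(predictions: Dict[str, List[float]]) -> List[float]:
        # Row-wise weighted average across the base models.
        n_rows = len(next(iter(predictions.values())))
        return [
            sum(weights[m] * predictions[m][i] for m in weights) / total
            for i in range(n_rows)
        ]

    train_ensemble = combine(train_predictions)
    test_ensemble = combine(test_predictions)
    return train_ensemble, test_ensemble, score_function(y_train, train_ensemble)


def neg_mae(y_true, y_pred):
    return -sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


train_e, test_e, score = ensemble(
    {"a": 1.0, "b": 3.0},
    {"a": [0.0, 4.0], "b": [4.0, 0.0]},
    {"a": [2.0], "b": [2.0]},
    [3.0, 2.0],
    neg_mae,
)
print(train_e, test_e, score)  # [3.0, 1.0] [2.0] -0.5
```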
optimize(max_duration)
Performs ensembling cycles for max_duration seconds.
Parameters: max_duration (float) – The maximum duration, in seconds, allowed for the optimization process.
Return type: bool
Returns: True if a better set of weights was found and False otherwise.
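optimize and interrupt fit together as a time-budgeted random search that can be stopped early. The sketch below assumes a threading.Event as the internal flag and takes the weight generator and the ensemble scorer as callables; `optimize_weights` and its arguments are hypothetical, and it returns the best weights and score alongside the boolean for inspection.

```python
import random
import threading
import time
from typing import Callable, Dict, Optional, Tuple

Weights = Dict[str, float]


def optimize_weights(
    max_duration: float,
    gen_weights: Callable[[], Weights],
    evaluate: Callable[[Weights], float],
    best_score: float,
    stop: Optional[threading.Event] = None,
) -> Tuple[bool, Optional[Weights], float]:
    """Return (improved, best_weights, best_score) after at most max_duration seconds."""
    stop = stop or threading.Event()
    deadline = time.monotonic() + max_duration
    improved, best_weights = False, None
    # Each cycle proposes new weights and keeps them only if the score improves.
    while time.monotonic() < deadline and not stop.is_set():
        weights = gen_weights()
        score = evaluate(weights)
        if score > best_score:
            improved, best_weights, best_score = True, weights, score
    return improved, best_weights, best_score


rng = random.Random(1)
improved, weights, best = optimize_weights(
    0.05,
    lambda: {"tree": rng.random(), "knn": rng.random()},
    lambda w: -abs(w["tree"] - 0.7),  # hypothetical objective
    best_score=-1.0,
)

stop = threading.Event()
stop.set()  # what interrupt() would do, e.g. from another thread
print(optimize_weights(10.0, lambda: {}, lambda w: 1.0, 0.0, stop))  # (False, None, 0.0)
```

Checking the flag once per cycle means an interrupt takes effect only after the current cycle finishes, which matches "at the first opportunity".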
class miraiml.core.MiraiModel(base_models, weights, problem_type)
Bases: object
Represents a unified model optimized by MiraiML.
fit(X, y)
Fits all base models.
Parameters:
- X (pandas.DataFrame) – The training data.
- y (pandas.Series or numpy.ndarray) – The target.
predict(X)
Predicts the classes for classification problems and the outputs for regression problems.
Parameters: X (pandas.DataFrame) – The input for new predictions.
predict_proba(X)
Predicts the probabilities for each class. Only available for classification problems.
Parameters: X (pandas.DataFrame) – The input for new predictions.
Raises: RuntimeError
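How a unified model might combine its base models: each base model's outputs are averaged with the ensemble weights, predict returns the most probable class for classification, and predict_proba raises RuntimeError for regression, as documented. The class `WeightedModel`, the `problem_type` values "classification"/"regression", the `classes` argument and the toy `ConstantProba` base model are assumptions for illustration, not MiraiML's implementation.

```python
from typing import Any, Dict, List, Sequence


class ConstantProba:
    """Toy classifier that returns the same class probabilities for every row."""

    def __init__(self, proba: Sequence[float]):
        self.proba = list(proba)

    def fit(self, X: Any, y: Any) -> "ConstantProba":
        return self

    def predict_proba(self, X: Sequence[Any]) -> List[List[float]]:
        return [list(self.proba) for _ in X]


class WeightedModel:
    """Hypothetical stand-in for a unified model built from weighted base models."""

    def __init__(self, base_models: Dict[str, Any], weights: Dict[str, float],
                 problem_type: str, classes: Sequence[Any] = (0, 1)):
        self.base_models = base_models
        self.weights = weights
        self.problem_type = problem_type
        self.classes = list(classes)

    def fit(self, X: Any, y: Any) -> "WeightedModel":
        for model in self.base_models.values():
            model.fit(X, y)
        return self

    def predict_proba(self, X: Sequence[Any]) -> List[List[float]]:
        if self.problem_type != "classification":
            raise RuntimeError("predict_proba is only available for classification")
        total = sum(self.weights.values())
        outputs = {k: m.predict_proba(X) for k, m in self.base_models.items()}
        # Weighted average of the class probabilities, row by row.
        return [
            [sum(self.weights[k] * outputs[k][i][c] for k in outputs) / total
             for c in range(len(self.classes))]
            for i in range(len(X))
        ]

    def predict(self, X: Sequence[Any]) -> List[Any]:
        if self.problem_type == "classification":
            # Pick the class with the highest averaged probability.
            return [self.classes[row.index(max(row))] for row in self.predict_proba(X)]
        total = sum(self.weights.values())
        outputs = {k: m.predict(X) for k, m in self.base_models.items()}
        return [sum(self.weights[k] * outputs[k][i] for k in outputs) / total
                for i in range(len(X))]


model = WeightedModel(
    {"a": ConstantProba([0.8, 0.2]), "b": ConstantProba([0.2, 0.8])},
    {"a": 1.0, "b": 3.0},
    problem_type="classification",
).fit([[0.0], [1.0]], [0, 1])
print(model.predict([[5.0]]))  # [1]

try:
    WeightedModel({}, {"a": 1.0}, "regression").predict_proba([[0.0]])
except RuntimeError as error:
    print("RuntimeError:", error)
```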
miraiml.util
miraiml.util provides utility functions that are used by higher-level modules.
miraiml.util.load(path)
A clean wrapper around pickle.load for binary files.
Parameters: path (string) – The path of the binary file to be loaded.
Return type: object
Returns: The loaded object.
miraiml.util.dump(obj, path)
Optimizes the process of writing objects to disk by performing the write in a separate thread.
Parameters:
- obj (object) – The object to be dumped to the binary file.
- path (string) – The path of the binary file to be written.
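A minimal sketch of the two helpers, assuming dump starts a plain threading.Thread that pickles the object. Unlike the documented function, this version returns the thread so that callers (and the example) can join() it before reading the file back; that return value is an addition for illustration.

```python
import os
import pickle
import tempfile
import threading
from typing import Any


def load(path: str) -> Any:
    # Open in binary mode and let the context manager close the file.
    with open(path, "rb") as f:
        return pickle.load(f)


def dump(obj: Any, path: str) -> threading.Thread:
    def write() -> None:
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    # The write happens in the background so the caller is not blocked.
    thread = threading.Thread(target=write)
    thread.start()
    return thread


path = os.path.join(tempfile.mkdtemp(), "model.pkl")
dump({"weights": [0.25, 0.75]}, path).join()
print(load(path))  # {'weights': [0.25, 0.75]}
```

Because the write runs concurrently, mutating obj before the thread finishes can save a partially updated state, and reading the file before the thread finishes can fail.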
miraiml.util.sample_random_len(lst)
Returns a sample of random size from the list lst. The minimum length of the returned list is 1.
Parameters: lst (list) – A list containing the elements to be sampled.
Return type: list
Returns: sampled_lst, the randomly sampled elements from lst.
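A sketch with the behavior described above: the size is drawn between 1 and len(lst), and the elements are drawn without replacement. The uniform size distribution and the optional `rng` argument are assumptions; the documented function takes only lst.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_random_len(lst: Sequence[T], rng: Optional[random.Random] = None) -> List[T]:
    rng = rng or random.Random()
    # Length drawn uniformly from 1..len(lst); elements drawn without replacement.
    size = rng.randint(1, len(lst))
    return rng.sample(list(lst), size)


rng = random.Random(5)
print(sample_random_len(["age", "income", "city"], rng))
```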
miraiml.util.is_valid_filename(filename)
Tells whether a string can be used as a safe file name or not.
Parameters: filename (str) – The file name.
Return type: bool
Returns: Whether filename is a valid file name or not.
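The exact rules behind "safe" are not spelled out here. A conservative, hypothetical check accepts only non-empty names made of ASCII letters, digits, dots, hyphens and underscores, and rejects the special names "." and "..". That excludes path separators and the characters reserved on common file systems; the allowed character set is an assumption, not MiraiML's rule.

```python
import re

# Hypothetical whitelist of characters considered safe in a file name.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_filename(filename: str) -> bool:
    # Whole-string match: every character must be in the allowed set.
    if not _SAFE_NAME.fullmatch(filename):
        return False
    # "." and ".." refer to directories, not files.
    return filename not in {".", ".."}


print(is_valid_filename("model_v2.pkl"), is_valid_filename("a/b"))  # True False
```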