Internal modules’ API
The documentation related to these modules is meant for developers.
miraiml.core
miraiml.core contains internal classes responsible for the optimization process.
-
class miraiml.core.BaseModel(model_class, parameters, features)
Represents an element from the search space, defined by an instance of miraiml.SearchSpace and a set of features.
Read more in the User Guide.
Parameters:
- model_class (type) – A statistical model class that must implement the methods fit and predict for regression or predict_proba for classification problems.
- parameters (dict) – The parameters that will be used to instantiate objects of model_class.
- features (list) – The list of features that will be used to train the statistical model.
-
predict(X_train, y_train, X_test, config)
Performs the predictions for the training and testing datasets and also computes the score of the model.
Parameters:
- X_train (pandas.DataFrame) – The dataframe that contains the training inputs for the model.
- y_train (pandas.Series or numpy.ndarray) – The training targets for the model.
- X_test (pandas.DataFrame) – The dataframe that contains the testing inputs for the model.
- config (miraiml.Config) – The configuration of the engine.
Return type: tuple
Returns: (train_predictions, test_predictions, score)
- train_predictions: The predictions for the training dataset
- test_predictions: The predictions for the testing dataset
- score: The score of the model on the training dataset
Raises: RuntimeError when fitting or predicting doesn’t work.
-
miraiml.core.dump_base_model(base_model, path)
Saves the characteristics of a base model as a checkpoint.
Parameters:
- base_model (miraiml.core.BaseModel) – The base model to be saved
- path (str) – The path to save the base model
Return type: tuple
Returns: (train_predictions, test_predictions, score)
-
miraiml.core.load_base_model(model_class, path)
Loads the characteristics of a base model from disk and returns its respective instance of miraiml.core.BaseModel.
Parameters:
- model_class (type) – The model class related to the base model
- path (str) – The path to load the base model from
Return type: miraiml.core.BaseModel
Returns: The base model loaded from disk
-
class miraiml.core.MiraiSeeker(search_spaces, all_features, config)
This class implements a smarter way of searching for good parameters and sets of features.
Read more in the User Guide.
Parameters:
- base_models_ids (list) – The list of base models’ ids to keep track of.
- all_features (list) – A list containing all available features.
- config (miraiml.Config) – The configuration of the engine.
-
parameters_features_to_dataframe(parameters, features, score)
Creates an entry for a history.
Parameters:
- parameters (list) – The set of parameters to transform.
- features (list) – The set of features to transform.
- score (float) – The score to transform.
-
register_base_model(id, base_model, score)
Registers the performance of a base model and its characteristics.
Parameters:
- id (str) – The id associated with the base model.
- base_model (miraiml.core.BaseModel) – The base model being registered.
- score (float) – The score of base_model.
-
is_ready(id)
Tells whether the history of an id is large enough for more advanced strategies or not.
Parameters: id (str) – The id to be inspected.
Return type: bool
Returns: Whether id can be used to generate parameters and features lists or not.
-
seek(id)
Manages the search strategy for better solutions.
The random strategy is chosen with a probability of 0.5. Otherwise, one of the other strategies is chosen, each with equal probability.
Parameters: id (str) – The id for which a new base model is required.
Return type: miraiml.core.BaseModel
Returns: The next base model for exploration.
Raises: KeyError if parameters_rules tries to access an invalid key.
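The strategy selection described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `choose_strategy` is hypothetical, and it assumes that the "other strategies" are exactly `naive_search` and `linear_regression_search`. Only the 0.5 split comes from the description.

```python
import random

# Assumption: the non-random strategies are the two other search methods
# documented for MiraiSeeker.
OTHER_STRATEGIES = ["naive_search", "linear_regression_search"]


def choose_strategy(rng):
    """Return the name of the strategy to use for the next base model."""
    # Half of the time, fall back to the purely random strategy.
    if rng.random() < 0.5:
        return "random_search"
    # Otherwise, every remaining strategy is equally likely.
    return rng.choice(OTHER_STRATEGIES)


rng = random.Random(0)
counts = {}
for _ in range(10_000):
    name = choose_strategy(rng)
    counts[name] = counts.get(name, 0) + 1
print(counts)
```

Over many draws, `random_search` should account for roughly half of the choices and the other two for roughly a quarter each.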
-
random_search(id)
Generates completely random sets of parameters and features.
Parameters: all_features (list) – The list of available features.
Return type: tuple
Returns: (parameters, features)
Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
-
naive_search(id)
Characteristics that achieved higher scores have independently higher chances of being chosen again.
Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features)
Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
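One possible reading of "independently higher chances" is that each characteristic is drawn on its own, with a probability proportional to the mean score it achieved in the history. The sketch below illustrates that reading only. The helper `pick_by_mean_score`, the history layout, and the `uses_age` flag are all hypothetical, and the weighting actually used by the library may differ.

```python
import random
from collections import defaultdict


def pick_by_mean_score(history, key, rng):
    """Pick one observed value of `key`, weighted by its mean score."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for characteristics, score in history:
        totals[characteristics[key]] += score
        counts[characteristics[key]] += 1
    values = list(totals)
    weights = [totals[value] / counts[value] for value in values]
    # random.choices needs non-negative weights with a positive sum,
    # so this sketch assumes positive scores.
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical history of (characteristics, score) pairs. "uses_age" marks
# whether a hypothetical feature named "age" was in the feature list.
history = [
    ({"max_depth": 3, "uses_age": True}, 0.70),
    ({"max_depth": 5, "uses_age": True}, 0.80),
    ({"max_depth": 5, "uses_age": False}, 0.60),
]

rng = random.Random(0)
# Each characteristic is drawn without looking at the others.
suggestion = {key: pick_by_mean_score(history, key, rng) for key in history[0][0]}
print(suggestion)
```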
-
linear_regression_search(id)
Uses the history to model the score with a linear regression. Guesses the scores of n/2 random sets of parameters and features, where n is the size of the history. The one with the highest score is chosen.
Parameters: id (str) – The id for which we want a new set of parameters and features.
Return type: tuple
Returns: (parameters, features)
Respectively, the dictionary of parameters and the list of features that can be used to generate a new base model.
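The idea can be illustrated with a deliberately reduced sketch: a single numeric parameter, an ordinary least squares line fitted to the history, and n/2 random candidates ranked by predicted score. The function names, the history layout, and the candidate pool are assumptions made for illustration. The library works on full sets of parameters and features, which this sketch does not attempt to reproduce.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one variable; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the history holds at least two distinct x values
    return slope, mean_y - slope * mean_x


def linear_regression_pick(history, pool, rng):
    """Score n/2 random candidates with the fitted line and keep the best."""
    xs = [value for value, _ in history]
    ys = [score for _, score in history]
    slope, intercept = fit_line(xs, ys)
    n_candidates = max(1, len(history) // 2)
    candidates = [rng.choice(pool) for _ in range(n_candidates)]
    return max(candidates, key=lambda value: slope * value + intercept)


# Hypothetical history of (parameter value, score) pairs.
history = [(1, 0.60), (2, 0.65), (3, 0.70), (4, 0.75)]
pool = [1, 2, 3, 4, 5, 6, 7, 8]
best = linear_regression_pick(history, pool, random.Random(0))
print(best)
```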
-
class miraiml.core.Ensembler(base_models_ids, y_train, train_predictions_df, test_predictions_df, scores, config)
Performs the ensemble of the base models and optimizes its weights.
Read more in the User Guide.
Parameters:
- y_train (pandas.Series or numpy.ndarray) – The target column.
- base_models_ids (list) – The list of base models’ ids to keep track of.
- train_predictions_df (pandas.DataFrame) – The dataframe of predictions for the training dataset.
- test_predictions_df (pandas.DataFrame) – The dataframe of predictions for the testing dataset.
- scores (dict) – The dictionary of scores.
- config (miraiml.Config) – The configuration of the engine.
-
interrupt()
Sets an internal flag to interrupt the optimization process on the first opportunity.
-
gen_weights()
Generates the ensemble weights according to the score of each base model. Higher scores have higher chances of generating higher weights.
Return type: dict
Returns: A dictionary containing the weights for each base model id.
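A score-biased random draw is one way to obtain weights with this property. The sketch below uses a triangular distribution whose mode moves towards 1 as a model's score approaches the best score. The function name `gen_weights_sketch`, the choice of distribution, and the model ids are assumptions, not the library's actual method.

```python
import random


def gen_weights_sketch(scores, rng):
    """Draw one weight in [0, 1] per model, biased upwards for better scores."""
    best = max(scores.values())  # assumes strictly positive scores
    weights = {}
    for model_id, score in scores.items():
        # The best model gets mode 1.0; weaker models get a lower mode,
        # so their draws tend to be smaller.
        mode = score / best
        weights[model_id] = rng.triangular(0.0, 1.0, mode)
    return weights


# Hypothetical scores keyed by base model id.
scores = {"model_a": 0.9, "model_b": 0.6}
weights = gen_weights_sketch(scores, random.Random(0))
print(weights)
```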
-
ensemble(weights)
Performs the ensemble of the current predictions of each base model.
Parameters: weights (dict) – A dictionary containing the weights related to the id of each base model.
Return type: tuple
Returns: (train_predictions, test_predictions, score)
- train_predictions: The ensemble predictions for the training dataset
- test_predictions: The ensemble predictions for the testing dataset
- score: The score of the ensemble on the training dataset
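A weighted average is a common way to combine predictions with per-model weights. The sketch below shows that technique on plain lists. It assumes the weights are normalized by their sum, which the description does not state, and it leaves out the score computation. The function name and the sample data are hypothetical.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model predictions into one list using a weighted average."""
    total = sum(weights[model_id] for model_id in predictions)
    n_rows = len(next(iter(predictions.values())))
    return [
        sum(weights[model_id] * predictions[model_id][row] for model_id in predictions) / total
        for row in range(n_rows)
    ]


# Hypothetical predictions keyed by base model id, one value per row.
predictions = {"model_a": [0.0, 1.0], "model_b": [1.0, 1.0]}
weights = {"model_a": 1.0, "model_b": 3.0}
combined = weighted_ensemble(predictions, weights)
print(combined)  # [0.75, 1.0]
```

The same function would be applied once to the training predictions and once to the testing predictions.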
-
class miraiml.core.BasePipelineClass(**params)
This is the base class for your custom pipeline classes.
Warning: Instantiating this class directly does not work.
-
get_params()
Gets the list of parameters that can be set.
Parameters: X (iterable) – Data to predict on.
Return type: list
Returns: The list of allowed parameters
-
set_params(**params)
Sets the parameters for the pipeline. You can check the parameters that are allowed to be set by calling get_params().
Return type: miraiml.core.BasePipelineClass
Returns: self
-
fit(X, y)
Fits the pipeline to X using y as the target.
Parameters:
- X (iterable) – The training data.
- y (iterable) – The target.
Return type: miraiml.core.BasePipelineClass
Returns: self
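Because set_params and fit both return self, calls can be chained. The stand-in class below only illustrates that calling convention. `SketchPipeline`, its parameter names, and its validation behaviour are hypothetical and are not taken from miraiml.

```python
class SketchPipeline:
    """Hypothetical stand-in that mimics the documented method contract."""

    # Hypothetical parameter names, used only for this illustration.
    allowed_params = ["scaler__with_mean", "model__alpha"]

    def __init__(self, **params):
        self.params = {}
        self.fitted = False
        self.set_params(**params)

    def get_params(self):
        # Return the list of parameter names that may be set.
        return list(self.allowed_params)

    def set_params(self, **params):
        for name in params:
            if name not in self.allowed_params:
                raise ValueError(f"Unknown parameter: {name}")
        self.params.update(params)
        return self

    def fit(self, X, y):
        # A real pipeline would fit each of its steps here.
        self.fitted = True
        return self


pipeline = SketchPipeline().set_params(model__alpha=0.1).fit([[0.0], [1.0]], [0, 1])
print(pipeline.get_params(), pipeline.params)
```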
-
miraiml.util
miraiml.util provides utility functions that are used by higher level modules.
-
miraiml.util.load(path)
A clean pickle.load wrapper for binary files.
Parameters: path (string) – The path of the binary file to be loaded.
Return type: object
Returns: The loaded object.
-
miraiml.util.dump(obj, path)
Optimizes the process of writing objects to disk by triggering a thread.
Parameters:
- obj (object) – The object to be dumped to the binary file.
- path (string) – The path of the binary file to be written.
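The pair of helpers can be approximated with the standard library as shown below. This is a sketch under assumptions, not the module's source: the names `load_sketch` and `dump_in_thread` are hypothetical, and returning the thread so the caller can wait for it is a choice made for this example.

```python
import os
import pickle
import tempfile
import threading


def load_sketch(path):
    """Read one pickled object from a binary file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def dump_in_thread(obj, path):
    """Write `obj` to `path` in a background thread and return that thread."""

    def _write():
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)

    thread = threading.Thread(target=_write)
    thread.start()
    return thread


path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
writer = dump_in_thread({"score": 0.9}, path)
writer.join()  # wait for the write to finish before reading the file back
restored = load_sketch(path)
print(restored)
```

Writing in a thread lets the caller continue immediately, but a reader must wait until the write has completed, which is why the example joins the thread before loading.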
-
miraiml.util.sample_random_len(lst)
Returns a sample of random size from the list lst. The minimum length of the returned list is 1.
Parameters: lst (list) – A list containing the elements to be sampled.
Return type: list
Returns: The randomly sampled elements from lst.
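The behaviour described above can be reproduced with the standard random module. The function below is a sketch with a hypothetical name, and it assumes sampling without replacement. It also assumes a non-empty input, since a minimum length of 1 cannot be met otherwise.

```python
import random


def sample_random_len_sketch(lst, rng):
    """Return between 1 and len(lst) distinct positions' elements from lst."""
    # randint is inclusive on both ends; it raises ValueError for an empty list.
    size = rng.randint(1, len(lst))
    return rng.sample(lst, size)


features = ["age", "income", "height", "weight"]  # hypothetical feature names
sampled = sample_random_len_sketch(features, random.Random(0))
print(sampled)
```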