imputr.imputers._base
Module Contents
Classes
_BaseImputer | Abstract base class for imputer classes.
- class imputr.imputers._base._BaseImputer(data: pandas.DataFrame, predefined_datatypes: Dict[str, Union[str, imputr.domain.DataType]] = None)
Bases: abc.ABC
Abstract base class for imputer classes.
This class contains generic implementations that are relevant to all Imputer subclasses. It also provides default implementations of methods that subclasses can use as-is or override.
- Parameters
- table: imputr.domain.Table
- predefined_order: Dict[str, int]
- predefined_strategies: Dict[str, Dict]
- strategies: Dict[str, imputr.strategy._base._BaseStrategy]
- ordered_columns: List[imputr.domain.Column]
- include_non_missing: bool
- abstract impute() → pandas.DataFrame
Imputes dataset as configured in the framework.
- Returns
pd.DataFrame (Imputed dataset.)
- _determine_order(columns: List[imputr.domain.Column], predefined_strategies: Dict[str, imputr.strategy._base._BaseStrategy], predefined_order: Dict[str, int] = None) → List[imputr.domain.Column]
Determines the imputation order from the predefined order, the imputation strategy type, and the number of missing values. The algorithm considers the predefined order first, then whether the column has a univariate or multivariate strategy, and finally the number of missing values. Univariate strategies rank before multivariate strategies, and columns with fewer missing values rank before columns with more.
- Parameters
columns (List[Column]) – The columns that will undergo sequential imputation.
predefined_strategies (Dict[str, _BaseStrategy]) – Dictionary of column names and the respective strategy that the imputer will use for each.
predefined_order (Dict[str, int] (optional)) – Dictionary of predefined order in which the imputation must be done.
- Returns
List[Column] (returns List of Column references in imputation order.)
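The ranking described above can be sketched as a single sort with a composite key. This is a minimal illustration and not the library's implementation: `ColumnSketch` and `determine_order_sketch` are hypothetical stand-ins, the strategy type is reduced to a boolean flag, and placing columns with a predefined rank ahead of all unranked columns is an assumption about how "predefined order first" is resolved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnSketch:
    """Hypothetical stand-in for imputr.domain.Column."""
    name: str
    missing_count: int
    multivariate: bool  # True if the column's strategy is multivariate


def determine_order_sketch(
    columns: List[ColumnSketch],
    predefined_order: Optional[Dict[str, int]] = None,
) -> List[ColumnSketch]:
    """Sort columns by predefined rank, then strategy type, then missing count."""
    predefined_order = predefined_order or {}

    def sort_key(col: ColumnSketch):
        # Assumption: columns with an explicit rank come before unranked ones.
        unranked = col.name not in predefined_order
        rank = predefined_order.get(col.name, 0)
        # False sorts before True, so univariate precedes multivariate.
        return (unranked, rank, col.multivariate, col.missing_count)

    return sorted(columns, key=sort_key)


columns = [
    ColumnSketch("a", missing_count=5, multivariate=True),
    ColumnSketch("b", missing_count=10, multivariate=False),
    ColumnSketch("c", missing_count=2, multivariate=False),
    ColumnSketch("d", missing_count=1, multivariate=True),
]
ordered = determine_order_sketch(columns, predefined_order={"d": 0})
print([col.name for col in ordered])
```

Encoding the three criteria as one tuple keeps the tie-breaking explicit: each later element only matters when all earlier elements are equal.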
- _determine_list_of_included_columns(predefined_strategies: Dict[str, Dict] = None, predefined_order: Dict[str, int] = None, include_non_missing: bool = False) → List[imputr.domain.Column]
Determines the list of columns that need imputation strategies fitted.
By default, includes every column that has missing values, a defined strategy, or a defined order.
- Parameters
predefined_datatypes (Dict[str, Union[str, DataType]] (optional)) – Dictionary that has column names as key and the data type as specified in the Column constructor as value.
predefined_order (Dict[str, int] (optional)) – Contains the predefined order as defined in the public API. Defaults to None.
include_non_missing (bool) – Boolean flag that describes whether all columns need fitting. Defaults to False.
- Returns
List[Column] (List of columns that need strategy fitting.)
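The default inclusion rule can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the library's code: `included_columns_sketch` is a hypothetical name, the data is a plain dict of lists with `None` marking a missing value (instead of the `Table` and `Column` objects the real method works on), and `include_non_missing=True` is assumed to select every column.

```python
from typing import Dict, List, Optional


def included_columns_sketch(
    data: Dict[str, List[Optional[float]]],
    predefined_strategies: Optional[Dict[str, dict]] = None,
    predefined_order: Optional[Dict[str, int]] = None,
    include_non_missing: bool = False,
) -> List[str]:
    """Return the names of columns that would need a fitted strategy."""
    predefined_strategies = predefined_strategies or {}
    predefined_order = predefined_order or {}

    selected = []
    for name, values in data.items():
        has_missing = any(value is None for value in values)
        # A column qualifies if any one of the conditions holds.
        if (
            include_non_missing
            or has_missing
            or name in predefined_strategies
            or name in predefined_order
        ):
            selected.append(name)
    return selected


data = {
    "age": [31.0, None, 45.0],
    "height": [1.7, 1.8, 1.6],
    "weight": [70.0, 82.0, 64.0],
}
print(included_columns_sketch(data))
print(included_columns_sketch(data, predefined_order={"weight": 0}))
```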
- _construct_strategies(default_strategy: imputr.strategy._base._BaseStrategy, predefined_strategies: Dict[str, Dict] = None) → Dict[str, imputr.strategy._base._BaseStrategy]
Constructs strategies to prepare for fitting and imputation.
- Parameters
predefined_strategies (Dict[str, Dict] (optional)) – Maps each column name to a strategy Dict as defined in the public API. Defaults to None.
- Returns
Dict[str, _BaseStrategy] (Contains the strategy for each column.)
- str_to_strategy(string_name: str) → imputr.strategy._base._BaseStrategy
Returns the strategy class type for given string abbreviation.
- Parameters
string_name (str) – The string abbreviation of the imputation strategy.
- Returns
_BaseStrategy (the imputation strategy class type.)
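A lookup of this kind is commonly written as a dictionary from abbreviation to class. The sketch below shows that pattern only: the class names, the abbreviations `"mean"` and `"mode"`, and the `ValueError` on an unknown name are all hypothetical and are not taken from imputr.

```python
from typing import Dict, Type


class BaseStrategySketch:
    """Hypothetical stand-in for imputr.strategy._base._BaseStrategy."""


class MeanStrategySketch(BaseStrategySketch):
    """Hypothetical strategy class used only for this illustration."""


class ModeStrategySketch(BaseStrategySketch):
    """Hypothetical strategy class used only for this illustration."""


# Hypothetical abbreviations; the real library defines its own set.
_STRATEGY_REGISTRY: Dict[str, Type[BaseStrategySketch]] = {
    "mean": MeanStrategySketch,
    "mode": ModeStrategySketch,
}


def str_to_strategy_sketch(string_name: str) -> Type[BaseStrategySketch]:
    """Return the strategy class (not an instance) for an abbreviation."""
    try:
        return _STRATEGY_REGISTRY[string_name]
    except KeyError:
        raise ValueError(
            f"Unknown strategy abbreviation: {string_name!r}"
        ) from None


strategy_cls = str_to_strategy_sketch("mean")
strategy = strategy_cls()  # The caller instantiates the returned class.
```

Returning the class rather than an instance matches the documented return value, "the imputation strategy class type", and leaves construction arguments to the caller.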
- impute() → pandas.DataFrame
Imputes the dataframe with the specified strategies.
Override this method if you wish to implement different imputation behavior.
- Returns
pd.DataFrame – imputed dataset.