imputr.imputers._base#

Module Contents#

Classes#

_BaseImputer

Abstract base class for imputer classes.

class imputr.imputers._base._BaseImputer(data: pandas.DataFrame, predefined_datatypes: Dict[str, Union[str, imputr.domain.DataType]] = None)#

Bases: abc.ABC

Abstract base class for imputer classes.

This class contains generic implementations that are relevant to all imputer subclasses. Subclasses can use these methods as they are or override them.

Parameters
  • data (pd.DataFrame) – The dataframe which undergoes imputation.

  • predefined_datatypes (Dict[str, Union[str, DataType]] (optional)) – Dictionary that has column names as key and the data type as specified in the Column constructor as value.

table :imputr.domain.Table#
predefined_order :Dict[str, int]#
predefined_strategies :Dict[str, Dict]#
strategies :Dict[str, imputr.strategy._base._BaseStrategy]#
ordered_columns :List[imputr.domain.Column]#
include_non_missing :bool#
abstract impute() → pandas.DataFrame#

Imputes dataset as configured in the framework.

Returns

pd.DataFrame (Imputed dataset.)

_determine_order(columns: List[imputr.domain.Column], predefined_strategies: Dict[str, imputr.strategy._base._BaseStrategy], predefined_order: Dict[str, int] = None) → List[imputr.domain.Column]#

Determines the imputation order from the predefined order, the type of imputation strategy, and the number of missing values. The algorithm considers the predefined order first, then whether the column has a univariate or multivariate strategy, and finally the number of missing values. Univariate strategies rank before multivariate strategies, and columns with fewer missing values rank before columns with more.

Parameters
  • columns (List[Column]) – The columns that will undergo sequential imputation.

  • predefined_strategies (Dict[str, _BaseStrategy]) – Dictionary of column names and their respective strategies that the imputer will use.

  • predefined_order (Dict[str, int] (optional)) – Dictionary of predefined order in which the imputation must be done.

Returns

List[Column] (returns List of Column references in imputation order.)
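
The ranking described above can be pictured as a sort on a multi-part key. The following is a minimal sketch, not imputr's implementation: `ColumnInfo`, its fields, and `determine_order` are hypothetical stand-ins for `Column` and the strategy objects, and the sketch assumes that columns with a predefined order come before all other columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnInfo:
    """Hypothetical stand-in for imputr's Column plus its strategy type."""
    name: str
    multivariate: bool   # True if the column's strategy uses other columns
    missing_count: int   # number of missing values in the column


def determine_order(
    columns: List[ColumnInfo],
    predefined_order: Optional[Dict[str, int]] = None,
) -> List[ColumnInfo]:
    """Sort by predefined order, then strategy type, then missing count."""
    predefined_order = predefined_order or {}

    def sort_key(col: ColumnInfo):
        has_no_rank = col.name not in predefined_order
        rank = predefined_order.get(col.name, 0)
        # False sorts before True, so ranked columns and univariate
        # strategies come first; fewer missing values break remaining ties.
        return (has_no_rank, rank, col.multivariate, col.missing_count)

    return sorted(columns, key=sort_key)


cols = [
    ColumnInfo("income", multivariate=True, missing_count=5),
    ColumnInfo("age", multivariate=False, missing_count=10),
    ColumnInfo("city", multivariate=False, missing_count=2),
    ColumnInfo("score", multivariate=True, missing_count=1),
]
ordered_names = [c.name for c in determine_order(cols, {"income": 0})]
print(ordered_names)
```

In this sketch `income` comes first because it has a predefined position, the univariate columns follow in order of missing values, and the remaining multivariate column comes last.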

_determine_list_of_included_columns(predefined_strategies: Dict[str, Dict] = None, predefined_order: Dict[str, int] = None, include_non_missing: bool = False) → List[imputr.domain.Column]#

Determines the list of columns that need fitting of imputation strategies.

By default, includes all columns that have a missing value, a defined strategy, or a defined order.

Parameters
  • predefined_strategies (Dict[str, Dict] (optional)) – Dictionary that has column names as keys and strategy configuration dictionaries as values, as defined in the public API. Defaults to None.

  • predefined_order (Dict[str, int] (optional)) – Contains the predefined order as defined in the public API. Defaults to None.

  • include_non_missing (bool) – Boolean flag that indicates whether all columns need fitting. Defaults to False.

Returns

List[Column] (list of columns that need strategy fitting.)
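
The selection rule can be illustrated with a short sketch. It is an assumption-laden illustration rather than the library's code: `included_columns` is a hypothetical helper, the data is a plain dictionary of lists instead of imputr's `Table`, and `None` is assumed to mark a missing value.

```python
from typing import Dict, List, Optional, Sequence


def included_columns(
    data: Dict[str, Sequence[object]],
    predefined_strategies: Optional[Dict[str, dict]] = None,
    predefined_order: Optional[Dict[str, int]] = None,
    include_non_missing: bool = False,
) -> List[str]:
    """Return the names of columns that would need a fitted strategy."""
    predefined_strategies = predefined_strategies or {}
    predefined_order = predefined_order or {}

    selected = []
    for name, values in data.items():
        has_missing = any(v is None for v in values)  # assumed missing marker
        if (
            include_non_missing
            or has_missing
            or name in predefined_strategies
            or name in predefined_order
        ):
            selected.append(name)
    return selected


table = {
    "age": [31, None, 45],
    "city": ["Delft", "Leiden", "Utrecht"],
    "income": [50_000, 62_000, 48_000],
}
default_selection = included_columns(table, predefined_order={"income": 0})
full_selection = included_columns(table, include_non_missing=True)
```

Here `age` is selected because it has a missing value and `income` because it has a defined order, while `city` is only selected when `include_non_missing` is set.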

_construct_strategies(default_strategy: imputr.strategy._base._BaseStrategy, predefined_strategies: Dict[str, Dict] = None) → Dict[str, imputr.strategy._base._BaseStrategy]#

Constructs strategies to prepare for fitting and imputation.

Parameters
  • predefined_strategies (Dict[str, Dict] (optional)) – Maps column names to strategy configuration dictionaries as defined in the public API. Defaults to None.

Returns

Dict[str, _BaseStrategy] (contains the strategy for each column.)
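
One way to picture strategy construction is a per-column lookup with a fallback to the default. The sketch below is hypothetical throughout: `BaseStrategy`, `MeanStrategy`, `KnnStrategy`, the `column_names` argument, and the `{"strategy": ..., "params": ...}` configuration layout are assumptions made for illustration and are not taken from imputr's public API.

```python
from typing import Dict, List, Optional, Type


class BaseStrategy:
    """Hypothetical stand-in for imputr's _BaseStrategy."""

    def __init__(self, **params):
        self.params = params


class MeanStrategy(BaseStrategy):
    """Hypothetical univariate strategy."""


class KnnStrategy(BaseStrategy):
    """Hypothetical multivariate strategy."""


def construct_strategies(
    column_names: List[str],
    default_strategy: Type[BaseStrategy],
    predefined_strategies: Optional[Dict[str, dict]] = None,
) -> Dict[str, BaseStrategy]:
    """Build one strategy object per column, falling back to the default."""
    predefined_strategies = predefined_strategies or {}
    strategies: Dict[str, BaseStrategy] = {}
    for name in column_names:
        config = predefined_strategies.get(name)
        if config is None:
            strategies[name] = default_strategy()
        else:
            # Assumed config layout: {"strategy": <class>, "params": {...}}
            strategies[name] = config["strategy"](**config.get("params", {}))
    return strategies


built = construct_strategies(
    ["age", "city"],
    default_strategy=MeanStrategy,
    predefined_strategies={"age": {"strategy": KnnStrategy, "params": {"k": 3}}},
)
```

Columns without an entry in the predefined dictionary receive the default strategy, so the result always has one strategy per column.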

str_to_strategy(string_name: str) → imputr.strategy._base._BaseStrategy#

Returns the strategy class type for the given string abbreviation.

Parameters

string_name (str) – The string abbreviation of the imputation strategy.

Returns

_BaseStrategy (the imputation strategy class type.)
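
A lookup from abbreviation to class is commonly written as a dictionary registry. The sketch below shows that pattern only: the class names, the abbreviations `"mean"` and `"mode"`, and the function name `str_to_strategy_sketch` are hypothetical and do not reflect the abbreviations imputr actually accepts.

```python
from typing import Dict, Type


class BaseStrategy:
    """Hypothetical stand-in for imputr's _BaseStrategy."""


class MeanStrategy(BaseStrategy):
    """Hypothetical strategy class."""


class ModeStrategy(BaseStrategy):
    """Hypothetical strategy class."""


# Hypothetical abbreviations; the real ones are defined by the library.
_STRATEGY_REGISTRY: Dict[str, Type[BaseStrategy]] = {
    "mean": MeanStrategy,
    "mode": ModeStrategy,
}


def str_to_strategy_sketch(string_name: str) -> Type[BaseStrategy]:
    """Return the strategy class registered under the given abbreviation."""
    try:
        return _STRATEGY_REGISTRY[string_name]
    except KeyError:
        known = ", ".join(sorted(_STRATEGY_REGISTRY))
        raise ValueError(
            f"Unknown strategy {string_name!r}; expected one of: {known}"
        ) from None


strategy_cls = str_to_strategy_sketch("mean")
strategy = strategy_cls()  # the lookup returns a class, not an instance
```

Returning the class rather than an instance leaves it to the caller to decide which constructor arguments to pass.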

impute() → pandas.DataFrame#

Imputes dataframe with specified strategies.

Override this method if you wish to implement different imputation behavior.

Returns

pd.DataFrame – imputed dataset.
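
Overriding `impute()` in a subclass might look like the following sketch. The classes here are hypothetical and deliberately simplified: they operate on a dictionary of lists rather than a `pandas.DataFrame`, `None` is assumed to mark a missing value, and the mean-fill default is an assumption chosen only to give the base class some behavior to replace.

```python
from typing import Dict, List, Optional

Data = Dict[str, List[Optional[float]]]


class MeanFillImputerSketch:
    """Hypothetical imputer whose default impute() fills with the column mean."""

    def __init__(self, data: Data):
        self.data = data

    def impute(self) -> Data:
        imputed: Data = {}
        for name, values in self.data.items():
            observed = [v for v in values if v is not None]
            if not observed:
                # Nothing to compute a mean from; leave the column unchanged.
                imputed[name] = list(values)
                continue
            mean = sum(observed) / len(observed)
            imputed[name] = [mean if v is None else v for v in values]
        return imputed


class ZeroFillImputerSketch(MeanFillImputerSketch):
    """Subclass that overrides impute() with different behavior."""

    def impute(self) -> Data:
        return {
            name: [0.0 if v is None else v for v in values]
            for name, values in self.data.items()
        }


sample = {"x": [1.0, None, 3.0]}
mean_filled = MeanFillImputerSketch(sample).impute()
zero_filled = ZeroFillImputerSketch(sample).impute()
```

Both classes return a new structure and leave the input data untouched, which keeps the original available for comparison.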