imputr#

Subpackages#

Submodules#

Package Contents#

Classes#

AutoImputer

Automatic imputation class that implements the RandomForest strategy

MeanImputer

Simple imputation class that uses average imputation

class imputr.AutoImputer(data: pandas.DataFrame, predefined_order: Dict[str, int] = None, predefined_strategies: Dict[str, Dict] = None, predefined_datatypes: Dict[str, Union[str, imputr.domain.DataType]] = None, include_non_missing: bool = False)#

Bases: imputr.imputers._base._BaseImputer

Automatic imputation class that uses the RandomForest strategy as its main imputation method. It can be configured to use other strategies for specific columns and to follow a custom imputation order.

Variables
  • predefined_order (Dict[int, str] (optional)) – Dictionary of column names and their imputation order. Keys must be consecutive integers starting from zero: 0, 1, 2, ...

  • strategies (Dict[str, Dict] (optional)) – Dictionary mapping column names to strategy kwargs.

  • predefined_datatypes (Dict[str, Union[str, DataType]] (optional)) – Dictionary mapping column names to their data types, as specified in the Column constructor.

Parameters
  • data (pd.DataFrame) – The DataFrame to impute.

  • predefined_order (Dict[int, str] (optional)) – Dictionary of column names and their imputation order. Keys must be consecutive integers starting from zero: 0, 1, 2, ...

  • predefined_strategies (Dict[str, Dict] (optional)) – Dictionary mapping column names to strategy kwargs.

  • predefined_datatypes (Dict[str, Union[str, DataType]] (optional)) – Dictionary mapping column names to their data types, as specified in the Column constructor.

  • include_non_missing (bool (optional)) – Whether strategies should also be fitted for columns without missing values. Defaults to False.
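The rule that predefined_order keys run 0, 1, 2, ... can be checked before an imputer is built. The following is a minimal sketch, not part of imputr: the helper name check_predefined_order and the sample column names are hypothetical, and it assumes the position-to-column-name layout given in the parameter description (the constructor signature annotates the reverse, Dict[str, int], so verify against the installed version).

```python
from typing import Dict, List


def check_predefined_order(order: Dict[int, str], columns: List[str]) -> List[str]:
    """Return the column names in imputation order, or raise ValueError.

    Hypothetical helper for illustration only; it is not an imputr function.
    Assumes keys are positions and values are column names.
    """
    expected = list(range(len(order)))
    # Keys must be exactly 0, 1, ..., n-1 with no gaps.
    if sorted(order) != expected:
        raise ValueError(f"keys must be consecutive from 0, got {sorted(order)}")
    # Every value must name an existing column.
    unknown = [name for name in order.values() if name not in columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    return [order[position] for position in expected]


# Sample column names (assumed for this example).
columns = ["age", "city", "income"]
order = {0: "income", 1: "age", 2: "city"}
ordered = check_predefined_order(order, columns)
```

With a real DataFrame, the column list would typically come from list(df.columns).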

strategies :Dict[str, imputr.strategy._base._BaseStrategy]#
ordered_columns :List[imputr.domain.Column]#
included_columns :List[imputr.domain.Column]#
class imputr.MeanImputer(data: pandas.DataFrame, predefined_order: Dict[str, int] = None, predefined_strategies: Dict[str, Dict] = None, predefined_datatypes: Dict[str, Union[str, imputr.domain.DataType]] = None, include_non_missing: bool = False)#

Bases: imputr.imputers._base._BaseImputer

Simple imputation class that uses average imputation as its main imputation method: the mode for categorical columns and the mean for continuous columns. It can be configured to use other strategies for specific columns and to follow a custom imputation order.

Parameters
  • data (pd.DataFrame) – The DataFrame to impute.

  • predefined_order (Dict[int, str] (optional)) – Dictionary of column names and their imputation order. Keys must be consecutive integers starting from zero: 0, 1, 2, ...

  • predefined_strategies (Dict[str, Dict] (optional)) – Dictionary mapping column names to strategy kwargs.

  • predefined_datatypes (Dict[str, Union[str, DataType]] (optional)) – Dictionary mapping column names to their data types, as specified in the Column constructor.

  • include_non_missing (bool (optional)) – Whether strategies should also be fitted for columns without missing values. Defaults to False.
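The mean-for-continuous, mode-for-categorical idea described for MeanImputer can be illustrated with plain pandas. This is a rough sketch of the general technique only, not imputr's implementation: the function mean_mode_impute and the sample data are hypothetical, and numeric dtype is used here as a stand-in for "continuous".

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def mean_mode_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps with the column mean (numeric) or mode (everything else).

    Hypothetical illustration; returns a copy and leaves the input untouched.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not col.isna().any():
            continue  # nothing to fill in this column
        if is_numeric_dtype(col):
            fill = col.mean()  # mean of the observed values
        else:
            modes = col.mode(dropna=True)
            if modes.empty:
                continue  # column is entirely missing; leave it as-is
            fill = modes.iloc[0]  # first of the most frequent values
        out[name] = col.fillna(fill)
    return out


# Sample data (assumed for this example).
df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Delft", "Delft", None],
})
imputed = mean_mode_impute(df)
```

Here the missing age is filled with the mean of the observed ages, and the missing city with the most frequent observed city.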

predefined_order :Dict[str, int]#
predefined_strategies :Dict[str, Dict]#
strategies :Dict[str, imputr.strategy._base._BaseStrategy]#
ordered_columns :List[imputr.domain.Column]#
include_non_missing :bool#