`imputr.strategy`

Submodules

Package Contents

Classes

`RandomForestStrategy`	Strategy implementation for RandomForest-based imputation.
`MeanStrategy`	Mean imputation strategy. Imputes the calculated mean for numeric columns and the median for categorical columns.

class imputr.strategy.RandomForestStrategy(target_column: imputr.domain.Column, feature_columns: List[imputr.domain.Column], n_estimators: int = 64, max_depth: int = 8, min_sample_split: int = 512, min_samples_leaf: int = 128, min_weight_fraction_leaf: float = 0.35, max_features: Union[str, float] = 'sqrt', max_leaf_nodes: int = 32)

Bases: imputr.strategy._base._MultivariateStrategy

Strategy implementation for RandomForest-based imputation.

Parameters

target_column (Column) – The column that needs imputation.
feature_columns (List[Column]) – The predictor columns the Random Forest is trained on.
data_type (Union[str, DataType] (optional)) – The string or enum representation of the data_type.
n_estimators (int (optional)) – Number of decision trees in the forest. Please refer …
max_depth (int (optional)) – Maximum depth of each decision tree in the forest. Please refer …
min_sample_split (int (optional)) – Minimum number of samples required to split a node in a decision tree. Please refer …
min_samples_leaf (int (optional)) – Minimum number of samples required at a leaf of a decision tree. Please refer …
min_weight_fraction_leaf (float (optional)) – Minimum weighted fraction of samples required at a leaf of a decision tree. Please refer…
max_features (Union[str, float] (optional)) – Number of features considered when splitting a node. Can be a fraction or an identifier such as sqrt. Please refer…
max_leaf_nodes (int (optional)) – Maximum number of leaf nodes per decision tree. Please refer…
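
The tree hyperparameters above closely resemble those of scikit-learn's random forest estimators. The sketch below shows how the documented defaults might translate into a scikit-learn estimator. It assumes the strategy forwards the values largely unchanged, which this page does not confirm. scikit-learn spells the split parameter `min_samples_split`, so the renaming from `min_sample_split` is an assumption as well.

```python
from sklearn.ensemble import RandomForestRegressor

# Defaults copied from the RandomForestStrategy signature above.
strategy_defaults = {
    "n_estimators": 64,
    "max_depth": 8,
    "min_sample_split": 512,
    "min_samples_leaf": 128,
    "min_weight_fraction_leaf": 0.35,
    "max_features": "sqrt",
    "max_leaf_nodes": 32,
}

# Assumption: imputr's `min_sample_split` corresponds to
# scikit-learn's `min_samples_split`.
sklearn_params = dict(strategy_defaults)
sklearn_params["min_samples_split"] = sklearn_params.pop("min_sample_split")

forest = RandomForestRegressor(**sklearn_params)
print(forest.get_params()["min_samples_split"])
```

With these defaults each leaf must hold at least 35% of the total sample weight, so individual trees stay very shallow regardless of `max_depth`.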

supported_data_types: List

classmethod from_dict(target_column: imputr.domain.Column, feature_columns: List[imputr.domain.Column], **kwargs: Dict)

Alternative class constructor that builds the strategy from a dictionary.

Uses the part of the dictionary that was passed to the imputer constructor.

Parameters: target_column (Column) – Column to be imputed by the strategy.
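
The alternative-constructor pattern described here can be illustrated with a small, self-contained sketch. `SketchStrategy` is a hypothetical stand-in, not an imputr class, and the choice to silently drop unknown keys is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SketchStrategy:
    """Hypothetical stand-in for a strategy class; not part of imputr."""

    target_column: str
    n_estimators: int = 64
    max_depth: int = 8

    @classmethod
    def from_dict(cls, target_column: str, **kwargs: Any) -> "SketchStrategy":
        # Keep only the keys that match constructor parameters.
        known = {"n_estimators", "max_depth"}
        params = {key: value for key, value in kwargs.items() if key in known}
        return cls(target_column, **params)


# A slice of a larger configuration dictionary (contents are made up).
config: Dict[str, Any] = {"n_estimators": 16, "unrelated_option": True}
strategy = SketchStrategy.from_dict("age", **config)
print(strategy)
# SketchStrategy(target_column='age', n_estimators=16, max_depth=8)
```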

fit() → None

Fits the Random Forest so that it is ready for imputation.

Checks the DataType to determine whether a regressor or a classifier is needed. The scikit-learn APIs are the same for both models, which is why the estimator_cls variable is used.
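
A minimal sketch of that selection step follows. `SketchDataType` and `pick_estimator_cls` are hypothetical names, since the members of imputr's `DataType` are not listed on this page. Only the scikit-learn classes are real.

```python
from enum import Enum

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


class SketchDataType(Enum):
    """Hypothetical stand-in for imputr's DataType enum."""

    CONTINUOUS = "continuous"
    CATEGORICAL = "categorical"


def pick_estimator_cls(data_type: SketchDataType):
    """Return the forest class that matches the target column's data type."""
    if data_type is SketchDataType.CONTINUOUS:
        return RandomForestRegressor
    return RandomForestClassifier


estimator_cls = pick_estimator_cls(SketchDataType.CATEGORICAL)

# Both classes accept the same constructor keywords and expose fit/predict,
# so the code after the selection does not need to branch again.
model = estimator_cls(n_estimators=8, max_depth=3, random_state=0)
model.fit([[0.0], [1.0], [2.0], [3.0]], ["a", "a", "b", "b"])
predictions = model.predict([[0.5], [2.5]])
print(predictions)
```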

impute_column() → pandas.Series

Imputes all null values with the Random Forest and combines them with the non-null values.

TODO: Refactor this into a general method for better reuse.

Returns: pd.Series – The fully imputed data column.
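
The predict-then-combine step can be sketched with pandas and scikit-learn directly. This is an illustration of the idea under assumed column names (`x`, `y`) and made-up data, not imputr's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data: "x" is the feature column, "y" the target with missing entries.
df = pd.DataFrame({
    "x": np.arange(10, dtype=float),
    "y": [0.0, 2.0, np.nan, 6.0, 8.0, np.nan, 12.0, 14.0, 16.0, np.nan],
})

is_null = df["y"].isna()

# Train only on rows where the target is observed.
model = RandomForestRegressor(n_estimators=16, random_state=0)
model.fit(df.loc[~is_null, ["x"]], df.loc[~is_null, "y"])

# Predict the missing rows, keeping their original index labels.
predicted = pd.Series(
    model.predict(df.loc[is_null, ["x"]]),
    index=df.index[is_null],
    name="y",
)

# Combine observed and predicted values and restore the original row order.
imputed = pd.concat([df.loc[~is_null, "y"], predicted]).sort_index()
print(imputed.isna().sum())
```

Keeping the original index on the predictions is what lets the two parts be recombined without disturbing the observed values.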

class imputr.strategy.MeanStrategy(target_column: imputr.domain.Column)

Bases: imputr.strategy._base._UnivariateStrategy

Mean imputation strategy. Imputes the calculated mean for numeric columns and the median for categorical columns.
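
For the numeric case, mean imputation amounts to the following pandas sketch. The column name and values are made up, and the code shows the general technique rather than imputr's internals.

```python
import pandas as pd

# A numeric column with two missing entries.
column = pd.Series([1.0, None, 3.0, None, 8.0], name="age")

# Series.mean() skips missing values by default: (1 + 3 + 8) / 3 = 4.0
mean_value = column.mean()

imputed = column.fillna(mean_value)
print(imputed.tolist())  # [1.0, 4.0, 3.0, 4.0, 8.0]
```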

supported_data_types: List

classmethod from_dict(target_column: imputr.domain.Column, **kwargs: Dict)