imputr.strategy
Submodules
Package Contents
Classes
RandomForestStrategy | Strategy implementation for RandomForest-based imputation.
MeanStrategy | Mean imputation strategy. Imputes calculated mean for numeric columns and median for categoric columns.
- class imputr.strategy.RandomForestStrategy(target_column: imputr.domain.Column, feature_columns: List[imputr.domain.Column], n_estimators: int = 64, max_depth: int = 8, min_sample_split: int = 512, min_samples_leaf: int = 128, min_weight_fraction_leaf: float = 0.35, max_features: Union[str, float] = 'sqrt', max_leaf_nodes: int = 32)
Bases: imputr.strategy._base._MultivariateStrategy
Strategy implementation for RandomForest-based imputation.
- Parameters
target_column (Column) – The column which needs imputation.
feature_columns (List[Column]) – The predictor columns for the Random Forest to train on.
data_type (Union[str, DataType] (optional)) – The string or enum representation of the data_type.
n_estimators (int (optional)) – Number of decision trees used in the forest. Please refer …
max_depth (int (optional)) – Maximum depth of decision trees used in the forest. Please refer …
min_sample_split (int (optional)) – Minimum number of samples required to split a node in the decision trees. Please refer …
min_samples_leaf (int (optional)) – Minimum number of samples at the leaves of the decision trees. Please refer …
min_weight_fraction_leaf (float (optional)) – Minimum weight fraction at the leaves of the decision trees. Please refer…
max_features (Union[str, float] (optional)) – Maximum number of features used per decision tree. Can be a fraction or an identifier such as sqrt. Please refer…
max_leaf_nodes (int (optional)) – Maximum number of leaf nodes per decision tree. Please refer…
- supported_data_types :List
- classmethod from_dict(target_column: imputr.domain.Column, feature_columns: List[imputr.domain.Column], **kwargs: Dict)
Class constructor that builds the strategy from a dictionary.
Uses part of the dictionary passed to the imputer constructor.
- Parameters
target_column (Column) – Column that needs imputation by strategy.
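The dictionary-based constructor pattern described above can be sketched as follows. This is a minimal illustration, not the imputr implementation: `SketchForestStrategy` is a hypothetical stand-in class, columns are represented by plain strings instead of `Column` objects, and the choice to silently ignore unknown keys is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SketchForestStrategy:
    """Hypothetical stand-in for a strategy class (not the imputr class)."""

    target_column: str
    feature_columns: List[str]
    n_estimators: int = 64
    max_depth: int = 8

    @classmethod
    def from_dict(
        cls, target_column: str, feature_columns: List[str], **kwargs: Any
    ) -> "SketchForestStrategy":
        # Assumption: keep only the keys this strategy understands and
        # ignore everything else in the dictionary.
        known = {"n_estimators", "max_depth"}
        params = {key: value for key, value in kwargs.items() if key in known}
        return cls(target_column, feature_columns, **params)


# Part of a larger options dictionary, as it might be handed to an imputer.
options = {"n_estimators": 100, "unrelated_key": True}
strategy = SketchForestStrategy.from_dict("age", ["height", "weight"], **options)
```

Keys that are absent from the dictionary fall back to the defaults declared on the class.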
- fit() → None
Fits the Random Forest so that it is ready for imputation.
Looks at the DataType to determine whether a Regressor or a Classifier is needed. The scikit-learn APIs are the same for both models, which is why the estimator_cls variable is used.
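The estimator selection described above might look roughly like the sketch below. It uses scikit-learn directly; the helper `fit_forest` and its `is_continuous` flag are hypothetical names standing in for the DataType check, and the hyperparameter values are chosen only to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def fit_forest(X, y, is_continuous: bool, **hyperparams):
    """Fit a forest, choosing regressor or classifier by target type."""
    # Both classes share the same constructor arguments used here and the
    # same fit/predict API, so one variable can hold either class.
    estimator_cls = RandomForestRegressor if is_continuous else RandomForestClassifier
    model = estimator_cls(**hyperparams)
    model.fit(X, y)
    return model


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_numeric = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # continuous target
y_label = np.where(X[:, 1] > 0, "a", "b")                    # categorical target

reg = fit_forest(X, y_numeric, is_continuous=True,
                 n_estimators=16, max_depth=4, random_state=0)
clf = fit_forest(X, y_label, is_continuous=False,
                 n_estimators=16, max_depth=4, random_state=0)
```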
- impute_column() → pandas.Series
Imputes all null values with the Random Forest and unions them with the non-null values.
TODO: Refactor this into a general method for better reuse.
- Returns
pd.Series (fully imputed data column.)
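As an illustration of the predict-then-union step, here is a minimal sketch using pandas and scikit-learn. It is not the imputr code: `impute_with_forest` is a hypothetical helper, it handles only a numeric target, and it assumes the feature columns contain no missing values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def impute_with_forest(df: pd.DataFrame, target: str, features: list) -> pd.Series:
    """Predict missing target values and merge them with the observed ones."""
    is_null = df[target].isna()
    if not is_null.any():
        return df[target].copy()  # nothing to impute

    # Train only on rows where the target is observed.
    model = RandomForestRegressor(n_estimators=16, random_state=0)
    model.fit(df.loc[~is_null, features], df.loc[~is_null, target])

    # Predict only the rows whose target is missing.
    predicted = pd.Series(
        model.predict(df.loc[is_null, features]),
        index=df.index[is_null],
        name=target,
    )
    # Union of observed and predicted values, restored to the original row order.
    return pd.concat([df.loc[~is_null, target], predicted]).reindex(df.index)


df = pd.DataFrame({
    "x1": np.arange(10.0),
    "x2": np.arange(10.0) % 3,
    "y": np.arange(10.0) * 2,
})
df.loc[[2, 7], "y"] = np.nan
imputed = impute_with_forest(df, "y", ["x1", "x2"])
```

Observed values pass through unchanged; only the two rows with a missing `y` receive model predictions.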
- class imputr.strategy.MeanStrategy(target_column: imputr.domain.Column)
Bases: imputr.strategy._base._UnivariateStrategy
Mean imputation strategy. Imputes calculated mean for numeric columns and median for categoric columns.
- supported_data_types :List
- classmethod from_dict(target_column: imputr.domain.Column, **kwargs: Dict)
Class constructor that builds the strategy from a dictionary.
Uses part of the dictionary passed to the imputer constructor.
- Parameters
target_column (Column) – Column that needs imputation by strategy.
- impute_column() → pandas.Series
Imputes the column with the mean value.
- Returns
pd.Series (fully imputed data column.)
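For the numeric case, mean imputation can be sketched in a few lines of pandas. The helper `impute_mean` is a hypothetical name used for illustration and is not part of imputr; the categoric case mentioned in the class description is not covered here.

```python
import numpy as np
import pandas as pd


def impute_mean(column: pd.Series) -> pd.Series:
    """Replace missing values with the mean of the observed values."""
    # Series.mean() skips NaN by default, so only observed values contribute.
    return column.fillna(column.mean())


ages = pd.Series([20.0, np.nan, 40.0, np.nan], name="age")
filled = impute_mean(ages)  # missing entries become 30.0
```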