imputr.domain.column
Module Contents
Classes
Column: Data class that encapsulates the data and imputr-specific metadata of a column.
- class imputr.domain.column.Column(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None)
Data class that encapsulates the data and imputr-specific metadata of a column.
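- Parameters
data (pd.Series) – The Pandas Series that contains the column data.
data_type (Union[str, DataType], optional) – The data type as modeled by the imputr library. Defaults to None, in which case the type is inferred from the data.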
- property imputed_data: pandas.Series
Gets the imputed data.
If the data has not yet been imputed by any strategy, this getter internally sets _imputed_data to an average-imputed pd.Series (missing values filled with the mode for discrete data and with the mean for continuous data) and returns it.
- Returns
pd.Series (imputed data of the Column object.)
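The lazy fallback described above can be sketched as a cached property. This is a minimal illustration, not imputr's implementation: the class `LazyImputedColumn` and its boolean `continuous` flag are hypothetical stand-ins for Column and its DataType.

```python
import pandas as pd


class LazyImputedColumn:
    """Hypothetical stand-in for Column, reduced to the lazy imputation logic."""

    def __init__(self, data: pd.Series, continuous: bool):
        self.data = data
        self.continuous = continuous  # assumption: a plain flag instead of DataType
        self._imputed_data = None  # no strategy has imputed the data yet

    @property
    def imputed_data(self) -> pd.Series:
        if self._imputed_data is None:
            # Fall back to average-based imputation: mean or mode.
            if self.continuous:
                average = self.data.mean()  # mean() skips NaN by default
            else:
                average = self.data.mode().iloc[0]  # first (smallest) mode
            self._imputed_data = self.data.fillna(average)
        # Later calls return the cached series without recomputing it.
        return self._imputed_data


col = LazyImputedColumn(pd.Series([1.0, None, 3.0]), continuous=True)
print(col.imputed_data.tolist())  # [1.0, 2.0, 3.0]
```

Caching the result means a strategy that later assigns _imputed_data overrides the fallback, while repeated reads stay cheap.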
- property numeric_encoded_imputed_data: pandas.Series
Gets the imputed-then-numerically-encoded data.
Transforms categorical data into consecutively labeled integers (0, 1, 2, ...). Calls the imputed_data property getter, so average-based imputation is applied first if the data has not been imputed yet.
- Returns
pd.Series (series containing the imputed data in numerically encoded form.)
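Label encoding of already-imputed data can be sketched as follows. The helper `label_encode` is hypothetical; it uses pandas' Categorical codes, which, like sklearn's LabelEncoder, number the sorted unique values from 0 upward.

```python
import pandas as pd


def label_encode(imputed: pd.Series) -> pd.Series:
    """Map each distinct value to an integer label 0..k-1 in sorted order.

    Assumes the input has no missing values (pandas would code NaN as -1).
    """
    categorical = pd.Categorical(imputed)  # categories are the sorted unique values
    return pd.Series(categorical.codes, index=imputed.index, name=imputed.name)


encoded = label_encode(pd.Series(["red", "blue", "red", "green"]))
print(encoded.tolist())  # [2, 0, 2, 1]
```

Encoding after imputation matters: a label encoder fitted on data containing NaN would either fail or treat the gap as its own category.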
- property null_indices: numpy.ndarray
Returns an np.ndarray of the indices at which a null value is found.
Disjoint from the non_null_indices property: no index appears in both, and together they cover every entry of the column.
- Returns
np.ndarray (indexes where a null value is found)
- property non_null_indices: numpy.ndarray
Returns an np.ndarray of the indices at which a non-null value is found.
Disjoint from the null_indices property: no index appears in both, and together they cover every entry of the column.
- Returns
np.ndarray (indexes where a non-null value is found)
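Both index arrays can be derived from pandas' null mask, as in this sketch. It assumes positional indices; the reference above does not say whether imputr returns positions or index labels.

```python
import numpy as np
import pandas as pd

column = pd.Series([4.0, np.nan, 7.0, None])

# Positions where the value is missing / present (positional, not index labels).
null_indices = np.flatnonzero(column.isna().to_numpy())
non_null_indices = np.flatnonzero(column.notna().to_numpy())

print(null_indices)      # [1 3]
print(non_null_indices)  # [0 2]

# The two arrays are disjoint and together cover every position.
assert np.intersect1d(null_indices, non_null_indices).size == 0
assert np.union1d(null_indices, non_null_indices).tolist() == list(range(len(column)))
```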
- data: pandas.Series
- name: str
- type: imputr.domain.types.DataType
- missing_value_count: int
- unique_value_count: int
- average: Union[bool, str, float]
- _imputed_data: pandas.Series
- _label_encoder: sklearn.preprocessing.LabelEncoder
- _cast_data_if_necessary(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) → pandas.Series
If the given data is numeric, as defined by pandas' isnumeric function, and the given data type is categorical, casts the data to the pandas object dtype.
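A sketch of the casting rule, assuming plain strings in place of DataType. The function name `cast_if_necessary` is hypothetical, and `pandas.api.types.is_numeric_dtype` is used as a stand-in for the numeric check; imputr's exact test may differ.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def cast_if_necessary(data: pd.Series, data_type: str) -> pd.Series:
    """Treat numeric codes (e.g. 1, 2, 3 meaning classes) as categories."""
    if data_type == "categorical" and is_numeric_dtype(data):
        # Object dtype stops numeric operations such as mean() from applying.
        return data.astype(object)
    return data  # nothing to do: already non-numeric or not categorical


print(cast_if_necessary(pd.Series([1, 2, 1]), "categorical").dtype)  # object
```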
- _infer_data_type(column_data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) → imputr.domain.types.DataType
Helper method to infer the imputr-defined data type of a given column.
- _count_number_of_unique_values(column: pandas.Series) → int
Counts the number of unique values in a column. Includes NaN in the count.
- Returns
int (the number of unique values in a column.)
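In pandas, counting NaN as a value is a matter of the `dropna` argument of `Series.nunique`. This sketch shows the difference; it illustrates the behavior described above rather than imputr's code.

```python
import numpy as np
import pandas as pd

column = pd.Series(["a", "b", np.nan, "a", np.nan])
# By default nunique() drops NaN; dropna=False counts all NaNs as one extra value.
unique_without_nan = column.nunique()
unique_with_nan = column.nunique(dropna=False)
print(unique_without_nan, unique_with_nan)  # 2 3
```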
- _count_number_of_missing_values(column: pandas.Series) → int
Counts the number of missing values in a column.
- Returns
int (the number of missing values in a column.)
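A minimal pandas sketch of the missing-value count: `isna()` yields a boolean mask, and summing it counts the missing entries. It is illustrative only, not imputr's implementation.

```python
import numpy as np
import pandas as pd

column = pd.Series([1.5, None, 2.5, np.nan])
# isna() flags both None and NaN; summing the boolean mask counts the True values.
missing_value_count = int(column.isna().sum())
print(missing_value_count)  # 2
```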
- _compute_average(column: pandas.Series, type: imputr.domain.types.DataType) → Union[str, float]
Calculates the mode or mean of the given pd.Series.
If the column has a categorical type, this method computes the mode of the column; if it has a continuous type, it computes the mean.
- Parameters
column (pd.Series) – The Pandas Series that contains the column data.
type (DataType) – The data type as modeled by the imputr library.
- Returns
Union[str, float] (Either the mode or the mean of the column.)
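The mode/mean selection can be sketched with plain strings standing in for DataType. The function `compute_average` is hypothetical, and raising ValueError for other types is a choice made for this sketch; the reference does not say how imputr handles them.

```python
import pandas as pd


def compute_average(column: pd.Series, data_type: str):
    """Return the mode for categorical data, the mean for continuous data."""
    if data_type == "categorical":
        # mode() ignores NaN and returns tied modes in sorted order; take the first.
        # Assumes at least one non-null value.
        return column.mode().iloc[0]
    if data_type == "continuous":
        return float(column.mean())  # mean() skips NaN by default
    raise ValueError(f"unsupported data type: {data_type!r}")


print(compute_average(pd.Series(["x", "y", "y", None]), "categorical"))  # y
print(compute_average(pd.Series([1.0, 2.0, None, 6.0]), "continuous"))  # 3.0
```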