imputr.domain#
Submodules#
Package Contents#
Classes#
Column | Data class that encapsulates the data and imputr-specific metadata of a column.
DataType | Enum class that represents the various data types that the library is able to impute for and with.
Table | Data class that encapsulates the data and imputr-specific metadata of a table.
- class imputr.domain.Column(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None)#
Data class that encapsulates the data and imputr-specific metadata of a column.
- Parameters
- property imputed_data: pandas.Series#
Gets imputed data.
If the data has not been imputed by any strategy yet, it internally sets the _imputed_data value to an average-based imputed pd.Series (mode for discrete and mean for continuous data) and returns it.
- Returns
pd.Series (imputed data of the Column object.)
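The lazy fallback described for imputed_data can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `LazyImputedColumn` is a hypothetical stand-in class, and it assumes the average (mode or mean) has already been computed and stored on the object.

```python
from typing import Optional

import pandas as pd


class LazyImputedColumn:
    """Hypothetical stand-in for a column with a lazily filled imputation cache."""

    def __init__(self, data: pd.Series, average) -> None:
        self.data = data
        self.average = average  # precomputed mode or mean of the column
        self._imputed_data: Optional[pd.Series] = None

    @property
    def imputed_data(self) -> pd.Series:
        # No strategy has stored a result yet: fall back to the average.
        if self._imputed_data is None:
            self._imputed_data = self.data.fillna(self.average)
        return self._imputed_data


ages = pd.Series([20.0, None, 40.0])
column = LazyImputedColumn(ages, average=ages.mean())
first = column.imputed_data   # computed on first access
second = column.imputed_data  # served from the cached series
```

The original series is left untouched. Only the cached copy holds the filled values, so a later imputation strategy could overwrite the cache without losing the raw data.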
- property numeric_encoded_imputed_data: pandas.Series#
Gets the imputed-then-numerically-encoded data.
Transforms categorical data types to incrementally labeled integer data. Calls the property getter of self._imputed_data.
- Returns
pd.Series (series containing the imputed data in numerically encoded form.)
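To illustrate what "incrementally labeled integer data" looks like, the sketch below encodes an already-imputed categorical series. A plain dictionary stands in for the sklearn.preprocessing.LabelEncoder that the class holds in _label_encoder. The helper name `encode_categorical` is hypothetical, and assigning codes in sorted order is chosen to mirror how LabelEncoder orders its classes.

```python
import pandas as pd


def encode_categorical(imputed: pd.Series) -> pd.Series:
    """Map each distinct value to an integer code, assigned in sorted order."""
    categories = sorted(imputed.unique())
    mapping = {value: code for code, value in enumerate(categories)}
    return imputed.map(mapping)


# The input must already be free of missing values.
imputed_colors = pd.Series(["red", "blue", "red", "green"])
encoded = encode_categorical(imputed_colors)
```

Encoding after imputation matters here: a series that still contained nulls could not be sorted or mapped cleanly.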
- property null_indices: numpy.ndarray#
Returns np.ndarray of indexes where a null value is found.
Mutually exclusive with the non_null_indices property.
- Returns
np.ndarray (indexes where a null value is found)
- property non_null_indices: numpy.ndarray#
Returns np.ndarray of indexes where a non-null value is found.
Mutually exclusive with the null_indices property.
- Returns
np.ndarray (indexes where a non-null value is found)
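The relationship between the two index properties can be sketched with a boolean mask. This example assumes positional indices; whether the library returns positions or index labels is not stated in this page.

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, None, 3.0, None])

# One mask, read two ways: positions that are null and positions that are not.
null_mask = data.isna().to_numpy()
null_indices = np.flatnonzero(null_mask)
non_null_indices = np.flatnonzero(~null_mask)
```

Because both arrays come from the same mask and its negation, they share no element and together cover every position in the series.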
- data :pandas.Series#
- name :str#
- type :imputr.domain.types.DataType#
- missing_value_count :int#
- unique_value_count :int#
- average :Union[bool, str, float]#
- _imputed_data :pandas.Series#
- _label_encoder :sklearn.preprocessing.LabelEncoder#
- _cast_data_if_necessary(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) pandas.Series#
If the given data is numeric, as defined in pandas’ isnumeric function, and the given data type is categorical, maps the data to the pandas object dtype.
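A minimal sketch of this cast is shown below. It assumes the numeric check corresponds to `pandas.api.types.is_numeric_dtype`, which may differ from what the library actually calls, and `cast_if_categorical` is a hypothetical helper name.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def cast_if_categorical(data: pd.Series, categorical: bool) -> pd.Series:
    """Cast numeric data to object dtype when it should be treated as categorical."""
    if categorical and is_numeric_dtype(data):
        return data.astype(object)
    return data


# Zip codes are stored as integers but carry no numeric meaning.
zip_codes = pd.Series([1011, 2022, 1011])
as_category = cast_if_categorical(zip_codes, categorical=True)
unchanged = cast_if_categorical(zip_codes, categorical=False)
```

The cast keeps integer-coded categories from being averaged or interpolated as if they were measurements.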
- _infer_data_type(column_data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) imputr.domain.types.DataType#
Helper method to infer the imputr-defined data type of a given column.
- _count_number_of_unique_values(column: pandas.Series) int#
Counts the number of unique values in a column. Includes NaN in the count.
- Returns
int (the number of unique values in a column.)
- _count_number_of_missing_values(column: pandas.Series) int#
Counts the number of missing values in a column.
- Returns
int (the number of missing values in a column.)
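Both counts map onto standard pandas calls. The sketch below shows one way to obtain them, not necessarily the calls the library uses.

```python
import pandas as pd

column = pd.Series(["a", "b", None, "a", None])

# dropna=False makes the missing value count as one distinct value.
unique_value_count = column.nunique(dropna=False)

# isna() marks None and NaN alike; summing the mask counts them.
missing_value_count = int(column.isna().sum())
```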
- _compute_average(column: pandas.Series, type: imputr.domain.types.DataType) Union[str, float]#
Calculates mode or mean of the given pd.Series.
If the column has a categorical type, this method computes the mode of the column. If it has a continuous type, it calculates the mean.
- Parameters
column (pd.Series) – The Pandas Series that contains the column data.
type (DataType) – The data type as modeled by the imputr library.
- Returns
Union[str, float] (Either the mode or the mean of the column.)
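The mode-or-mean rule can be sketched as below. A boolean flag stands in for the DataType argument, `compute_average` is a hypothetical name, and taking the first of several tied modes is an assumption, since this page does not say how ties are resolved.

```python
from typing import Union

import pandas as pd


def compute_average(column: pd.Series, categorical: bool) -> Union[str, float]:
    """Return the mode for categorical data and the mean for continuous data."""
    if categorical:
        # mode() ignores missing values and returns tied modes in sorted order.
        return column.mode().iloc[0]
    # mean() skips missing values by default.
    return float(column.mean())


colors = pd.Series(["red", None, "red", "blue"])
ages = pd.Series([20.0, None, 40.0])

color_average = compute_average(colors, categorical=True)
age_average = compute_average(ages, categorical=False)
```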
- class imputr.domain.DataType#
Bases: enum.Enum
Enum class that represents the various data types that the library is able to impute for and with.
Currently contains only categorical and continuous. Future releases may contain specific enumerations for discrete, discrete-ordinal and a separate datetime type.
- CATEGORICAL = [1]#
- CONTINUOUS = 2#
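Several signatures on this page accept Union[str, DataType]. The sketch below shows one way such an argument could be normalized to an enum member. `SketchDataType` and `resolve_data_type` are hypothetical, the member values are illustrative, and matching strings by member name regardless of case is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Union


class SketchDataType(Enum):
    """Illustrative stand-in for the documented enum."""

    CATEGORICAL = 1
    CONTINUOUS = 2


def resolve_data_type(value: Union[str, SketchDataType]) -> SketchDataType:
    """Accept either an enum member or its name as a string."""
    if isinstance(value, SketchDataType):
        return value
    # Look the member up by name, ignoring case.
    return SketchDataType[value.upper()]


resolved = resolve_data_type("categorical")
```

An unknown name raises KeyError in this sketch, which surfaces a misspelled type early instead of silently falling back to inference.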
- class imputr.domain.Table(data: pandas.DataFrame, predefined_datatypes: Dict[str, Union[str, imputr.domain.DataType]] = None)#
Data class that encapsulates the data and imputr-specific metadata of a table.
- Variables
- Parameters
- data :pandas.DataFrame#
- columns :List[imputr.domain.Column]#
- _construct_columns(data: pandas.DataFrame, predefined_datatypes) List[imputr.domain.Column]#
Loops over dataframe columns to construct Column objects.
- Parameters
- Returns
List[Column] (the List of constructed Column objects.)
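The column-construction loop can be sketched as follows. `SketchColumn` and `construct_columns` are hypothetical stand-ins. The sketch assumes that predefined_datatypes is keyed by column name and that columns without an entry are left for type inference, represented here by None.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

import pandas as pd


@dataclass
class SketchColumn:
    """Illustrative stand-in for a constructed column object."""

    name: str
    data: pd.Series
    data_type: Optional[str]  # None means "infer the type later"


def construct_columns(
    data: pd.DataFrame,
    predefined_datatypes: Optional[Dict[str, str]] = None,
) -> List[SketchColumn]:
    """Build one column object per DataFrame column, in DataFrame order."""
    predefined = predefined_datatypes or {}
    return [
        SketchColumn(name=name, data=data[name], data_type=predefined.get(name))
        for name in data.columns
    ]


frame = pd.DataFrame({"age": [20.0, None], "color": ["red", "blue"]})
columns = construct_columns(frame, {"color": "categorical"})
```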