imputr.domain#

Submodules#

Package Contents#

Classes#

  • Column – Data class that encapsulates the data and imputr-specific metadata of a column.

  • DataType – Enum class that represents the various data types that the library is able to impute for and with.

  • Table – Data class that encapsulates the data and imputr-specific metadata of a table.

class imputr.domain.Column(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None)#

Data class that encapsulates the data and imputr-specific metadata of a column.

Parameters
  • data (pd.Series) – The Pandas Series that contains the column data.

  • data_type (Union[str, DataType] (optional)) – The imputr DataType, specified as a string or as a DataType enum member.

property imputed_data: pandas.Series#

Gets imputed data.

If no strategy has imputed the data yet, the getter internally sets _imputed_data to an average-imputed pd.Series (mode for discrete data, mean for continuous data) and returns it.

Returns

pd.Series (imputed data of the Column object.)
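
The lazy fallback described for this property can be sketched as follows. This is an illustration only, not imputr's implementation: the class name LazyColumn, the use of None as the "not imputed yet" marker, and the precomputed average argument are assumptions made for the example.

```python
from typing import Optional

import pandas as pd


class LazyColumn:
    """Hypothetical stand-in for Column; only shows the lazy fallback."""

    def __init__(self, data: pd.Series, average) -> None:
        self.data = data
        self.average = average  # mode or mean, assumed to be computed elsewhere
        self._imputed_data: Optional[pd.Series] = None  # assumed sentinel

    @property
    def imputed_data(self) -> pd.Series:
        # No strategy has imputed the column yet: fall back to the average.
        if self._imputed_data is None:
            self._imputed_data = self.data.fillna(self.average)
        return self._imputed_data


column = LazyColumn(pd.Series([1.0, None, 3.0]), average=2.0)
first = column.imputed_data   # triggers the average-based fallback
second = column.imputed_data  # returns the stored Series again
```

In this sketch the fallback runs once; later reads return the stored Series until something else overwrites _imputed_data.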

property numeric_encoded_imputed_data: pandas.Series#

Gets the imputed-then-numerically-encoded data.

Transforms categorical data to incrementally labeled integer data. Reads the imputed data through the imputed_data property getter.

Returns

pd.Series (series containing the imputed data in numerically encoded form.)
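
A rough sketch of what "incrementally labeled integer data" means. The class stores a sklearn LabelEncoder in _label_encoder; to keep the example dependency-light, this sketch substitutes pandas.factorize with sort=True, which also assigns codes in sorted label order. The variable names are made up for the illustration.

```python
import pandas as pd

# Already-imputed categorical data (no missing values left).
imputed = pd.Series(["red", "red", "blue", "red"], name="color")

# sort=True assigns integer codes in sorted label order: blue -> 0, red -> 1.
codes, labels = pd.factorize(imputed, sort=True)
encoded = pd.Series(codes, index=imputed.index, name=imputed.name)
```

Keeping labels around makes it possible to map the integer codes back to the original categories.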

property null_indices: numpy.ndarray#

Returns np.ndarray of indexes where a null value is found.

Mutually exclusive with the non_null_indices property.

Returns

np.ndarray (indexes where a null value is found)

property non_null_indices: numpy.ndarray#

Returns np.ndarray of indexes where a non-null value is found.

Mutually exclusive with the null_indices property.

Returns

np.ndarray (indexes where a non-null value is found)
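
The relationship between null_indices and non_null_indices can be illustrated with plain pandas and NumPy. Whether imputr returns positional indexes or index labels is not stated here; this sketch assumes positional indexes.

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, None, 3.0, None])

# Positions of missing and of present values (assumption: positional indexes).
null_indices = np.flatnonzero(data.isnull().to_numpy())
non_null_indices = np.flatnonzero(data.notnull().to_numpy())

# The two arrays never overlap and together cover every row.
overlap = np.intersect1d(null_indices, non_null_indices)
covered = np.union1d(null_indices, non_null_indices)
```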

data :pandas.Series#
name :str#
type :imputr.domain.types.DataType#
missing_value_count :int#
unique_value_count :int#
average :Union[bool, str, float]#
_imputed_data :pandas.Series#
_label_encoder :sklearn.preprocessing.LabelEncoder#
_cast_data_if_necessary(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) pandas.Series#

If the given data is numeric, as defined by pandas’ isnumeric function, and the given data type is categorical, casts the data to the pandas object dtype.
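
A minimal sketch of such a cast, under two assumptions: the numeric check is done with pandas.api.types.is_numeric_dtype, and the categorical request is passed as a plain boolean. The function name cast_if_necessary is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def cast_if_necessary(data: pd.Series, is_categorical: bool) -> pd.Series:
    """Cast numeric data to object dtype when it should be treated as categorical."""
    if is_categorical and is_numeric_dtype(data):
        return data.astype(object)
    return data


# Zip codes are stored as integers but are labels, not quantities.
zip_codes = pd.Series([1011, 2022, 1011])
as_labels = cast_if_necessary(zip_codes, is_categorical=True)
unchanged = cast_if_necessary(zip_codes, is_categorical=False)
```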

_infer_data_type(column_data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) imputr.domain.types.DataType#

Helper method to infer the imputr-defined data type of a given column.

Parameters
  • column_data (pd.Series) – The column for which the data type must be determined.

  • data_type (Union[str, DataType]) – String or DataType enum representing imputr data type.

Returns

DataType (The data type as modeled by the imputr library.)

_count_number_of_unique_values(column: pandas.Series) int#

Counts the number of unique values in a column. Includes NaN in the count.

Returns

int (the number of unique values in a column.)

_count_number_of_missing_values(column: pandas.Series) int#

Counts the number of missing values in a column.

Returns

int (the number of missing values in a column.)
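
Both counts can be reproduced with standard pandas calls. The snippet below is a sketch of the documented behavior (NaN included in the unique count), not necessarily the calls imputr makes internally.

```python
import pandas as pd

column = pd.Series(["a", "b", None, "a", None])

# dropna=False makes the missing value count as one distinct value.
unique_value_count = column.nunique(dropna=False)

# isnull() marks every missing entry; summing the booleans counts them.
missing_value_count = int(column.isnull().sum())
```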

_compute_average(column: pandas.Series, type: imputr.domain.types.DataType) Union[str, float]#

Calculates mode or mean of the given pd.Series.

If the column has a categorical type, this method computes the mode of the column. If it has a continuous type, it computes the mean.

Parameters
  • column (pd.Series) – The Pandas Series that contains the column data.

  • type (DataType) – The data type as modeled by the imputr library.

Returns

Union[str, float] (Either the mode or the mean of the column.)
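
The mode-or-mean rule can be sketched with pandas alone. The function name compute_average and the boolean is_categorical flag (standing in for the DataType argument) are assumptions for this example. When several values tie for the mode, this sketch returns the first one pandas reports; how imputr breaks ties is not stated here.

```python
import pandas as pd


def compute_average(column: pd.Series, is_categorical: bool):
    """Return the mode for categorical data and the mean for continuous data."""
    if is_categorical:
        # mode() ignores missing values and returns a Series of modes.
        return column.mode().iloc[0]
    # mean() skips missing values by default.
    return float(column.mean())


mode_value = compute_average(pd.Series(["x", "y", "y", None]), is_categorical=True)
mean_value = compute_average(pd.Series([2.0, 4.0, None]), is_categorical=False)
```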

class imputr.domain.DataType#

Bases: enum.Enum

Enum class that represents the various data types that the library is able to impute for and with.

Currently only contains categorical and continuous. Future releases may contain specific enumerations for discrete and discrete-ordinal data, and a separate datetime type.

CATEGORICAL = [1]#
CONTINUOUS = 2#
classmethod str_to_data_type(string_name: str)#

Maps string to imputr DataType.
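
A string-to-enum mapping of this kind could look like the sketch below. It is not the library's code: the class name SketchDataType, the placeholder member values, the case-insensitive lookup, and the ValueError for unknown names are all assumptions.

```python
from enum import Enum


class SketchDataType(Enum):
    """Hypothetical stand-in for imputr.domain.DataType."""

    CATEGORICAL = 1  # placeholder value
    CONTINUOUS = 2   # placeholder value

    @classmethod
    def str_to_data_type(cls, string_name: str) -> "SketchDataType":
        # Look the member up by name, ignoring case and surrounding spaces.
        try:
            return cls[string_name.strip().upper()]
        except KeyError:
            raise ValueError(f"Unknown data type: {string_name!r}") from None


categorical = SketchDataType.str_to_data_type("categorical")
continuous = SketchDataType.str_to_data_type("Continuous")
```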

class imputr.domain.Table(data: pandas.DataFrame, predefined_datatypes: Dict[str, Union[str, imputr.domain.DataType]] = None)#

Data class that encapsulates the data and imputr-specific metadata of a table.

Variables
  • data (pd.DataFrame) – The Pandas DataFrame that contains the table data.

  • columns (List[Column]) – The list of Column objects constructed from the columns of the DataFrame.

Parameters
  • data (pd.DataFrame) – The Pandas DataFrame that contains the table data.

  • predefined_datatypes (Dict[str, Union[str, DataType]] (optional)) – Dictionary that has column names as key and the data type as specified in the Column constructor as value.

data :pandas.DataFrame#
columns :List[imputr.domain.Column]#
_construct_columns(data: pandas.DataFrame, predefined_datatypes) List[imputr.domain.Column]#

Loops over dataframe columns to construct Column objects.

Parameters
  • data (pd.DataFrame) – The Pandas DataFrame that contains the columns.

  • predefined_datatypes (Dict[str, Union[str, DataType]] (optional)) – Dictionary that has column names as key and the data type as specified in the Column constructor as value.

Returns

List[Column] (the List of constructed Column objects.)
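
The loop over DataFrame columns can be sketched as below. SketchColumn is a hypothetical stand-in for imputr.domain.Column, and the lookup of each column name in predefined_datatypes (with None meaning "infer the type") is an assumption about how the optional dictionary is applied.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

import pandas as pd


@dataclass
class SketchColumn:
    """Hypothetical stand-in for imputr.domain.Column."""

    name: str
    data: pd.Series
    data_type: Optional[str] = None  # None: no predefined type was given


def construct_columns(
    data: pd.DataFrame,
    predefined_datatypes: Optional[Dict[str, str]] = None,
) -> List[SketchColumn]:
    """Build one column object per DataFrame column, in DataFrame order."""
    predefined = predefined_datatypes or {}
    return [
        SketchColumn(name=name, data=data[name], data_type=predefined.get(name))
        for name in data.columns
    ]


frame = pd.DataFrame({"age": [31.0, None], "city": ["Delft", "Utrecht"]})
columns = construct_columns(frame, {"city": "categorical"})
```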