imputr.domain.column

Module Contents

Classes

Column

Data class that encapsulates the data and imputr-specific metadata of a column.

class imputr.domain.column.Column(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None)

Data class that encapsulates the data and imputr-specific metadata of a column.

Parameters
  • data (pd.Series) – The Pandas Series that contains the column data.

  • data_type (Union[str, DataType] (optional)) – The imputr DataType, given either as a string or as a DataType enum member. If omitted, the type is inferred from the data.

property imputed_data: pandas.Series

Gets the imputed data.

If the data has not yet been imputed by any strategy, the getter internally sets _imputed_data to an average-imputed pd.Series and returns it. Missing values are filled with the mode for discrete columns and with the mean for continuous columns.
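The lazy fallback described above can be sketched as follows. LazyImputedColumn is a hypothetical stand-in, not imputr's Column. A plain discrete flag replaces the DataType enum. Breaking ties in the mode by taking the first value pandas returns is also an assumption.

```python
from typing import Optional

import numpy as np
import pandas as pd


class LazyImputedColumn:
    """Hypothetical stand-in for Column, illustrating the lazy average-based fallback."""

    def __init__(self, data: pd.Series, discrete: bool):
        self.data = data
        self.discrete = discrete  # assumption: a boolean flag instead of the DataType enum
        self._imputed_data: Optional[pd.Series] = None  # nothing imputed yet

    @property
    def imputed_data(self) -> pd.Series:
        if self._imputed_data is None:
            # Fallback: mode for discrete columns, mean for continuous ones (NaN skipped).
            # Assumption: on ties, the first value returned by Series.mode is used.
            average = self.data.mode().iloc[0] if self.discrete else self.data.mean()
            self._imputed_data = self.data.fillna(average)
        # Later calls reuse the stored result instead of recomputing it.
        return self._imputed_data


col = LazyImputedColumn(pd.Series([1.0, np.nan, 3.0]), discrete=False)
print(col.imputed_data.tolist())  # [1.0, 2.0, 3.0]
```

Storing the result on first access means a strategy that later writes _imputed_data takes precedence. The average-based fill runs only when nothing else has imputed the column.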

Returns

pd.Series (imputed data of the Column object.)

property numeric_encoded_imputed_data: pandas.Series

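Gets the imputed-then-numerically-encoded data.

Transforms categorical data into consecutive integer labels starting at 0. The imputed values are obtained through the imputed_data property getter. If no strategy has imputed the column yet, the average-based fallback is therefore applied before encoding.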

Returns

pd.Series (the imputed data in numerically encoded form.)

property null_indices: numpy.ndarray

Returns np.ndarray of indexes where a null value is found.

Complementary to the non_null_indices property: the two arrays share no index, and together they cover every index of the column.

Returns

np.ndarray (indexes where a null value is found)

property non_null_indices: numpy.ndarray

Returns np.ndarray of indexes where a non-null value is found.

Complementary to the null_indices property: the two arrays share no index, and together they cover every index of the column.

Returns

np.ndarray (indexes where a non-null value is found)
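Both index arrays can be computed with NumPy and pandas, as the minimal sketch below shows. This page does not say whether imputr returns positional indices or index labels. The sketch assumes 0-based positions.

```python
import numpy as np
import pandas as pd

data = pd.Series([4.0, np.nan, 7.0, None, 1.0])

# 0-based positions of missing and present values; isna() treats None and NaN alike.
null_indices = np.flatnonzero(data.isna().to_numpy())
non_null_indices = np.flatnonzero(data.notna().to_numpy())

print(null_indices)      # [1 3]
print(non_null_indices)  # [0 2 4]
```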

data: pandas.Series
name: str
type: imputr.domain.types.DataType
missing_value_count: int
unique_value_count: int
average: Union[bool, str, float]
_imputed_data: pandas.Series
_label_encoder: sklearn.preprocessing.LabelEncoder
_cast_data_if_necessary(data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) → pandas.Series

If the given data is numeric (as determined by pandas’ isnumeric check) and the given data type is categorical, casts the data to the pandas object dtype. Otherwise, the data is returned unchanged.
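The cast can be sketched under two assumptions. First, a plain is_categorical flag replaces the DataType argument. Second, pandas.api.types.is_numeric_dtype stands in for the numeric check, because the exact check imputr uses is not specified here. cast_data_if_necessary is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def cast_data_if_necessary(data: pd.Series, is_categorical: bool) -> pd.Series:
    """Hypothetical sketch: numeric data declared categorical becomes object dtype."""
    # Assumption: is_numeric_dtype approximates imputr's numeric check.
    if is_categorical and is_numeric_dtype(data):
        return data.astype(object)
    # Otherwise the data is returned unchanged.
    return data


codes = pd.Series([1011, 2500, 1011])
print(cast_data_if_necessary(codes, is_categorical=True).dtype)  # object
```

Casting to object makes integer codes behave as labels rather than as quantities.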

_infer_data_type(column_data: pandas.Series, data_type: Union[str, imputr.domain.types.DataType] = None) → imputr.domain.types.DataType

Helper method to infer the imputr-defined data type of a given column.

Parameters
  • column_data (pd.Series) – The column for which the data type must be determined.

  • data_type (Union[str, DataType] (optional)) – String or DataType enum member representing the imputr data type.

Returns

DataType (The data type as modeled by the imputr library.)
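The following sketch shows one plausible shape for this helper. Everything in it is an assumption. The DataType enum below is a hypothetical stand-in with two members. The inference rule (numeric, non-boolean columns are continuous; all others are categorical) is an illustrative heuristic, not necessarily the rule imputr applies.

```python
from enum import Enum
from typing import Optional, Union

import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


class DataType(Enum):
    # Hypothetical stand-in; the real imputr DataType members are not listed on this page.
    CATEGORICAL = "categorical"
    CONTINUOUS = "continuous"


def infer_data_type(column_data: pd.Series,
                    data_type: Optional[Union[str, DataType]] = None) -> DataType:
    # An explicit choice wins: accept an enum member or its (case-insensitive) string value.
    if isinstance(data_type, DataType):
        return data_type
    if isinstance(data_type, str):
        return DataType(data_type.lower())  # raises ValueError for unknown strings
    # Assumed heuristic: numeric, non-boolean columns are continuous.
    if is_numeric_dtype(column_data) and not is_bool_dtype(column_data):
        return DataType.CONTINUOUS
    return DataType.CATEGORICAL


print(infer_data_type(pd.Series([1.5, 2.0])))             # DataType.CONTINUOUS
print(infer_data_type(pd.Series([1, 2]), "categorical"))  # DataType.CATEGORICAL
```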

_count_number_of_unique_values(column: pandas.Series) → int

Counts the number of unique values in a column. Includes NaN in the count.

Returns

int (the number of unique values in a column.)

_count_number_of_missing_values(column: pandas.Series) → int

Counts the number of missing values in a column.

Returns

int (the number of missing values in a column.)
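Both counts map onto one-line pandas expressions, as sketched below. These are equivalent computations, not necessarily imputr's own code. Passing dropna=False to nunique is what makes NaN count as a distinct value.

```python
import numpy as np
import pandas as pd

column = pd.Series(["a", "b", np.nan, "a", np.nan])

# dropna=False counts NaN as one additional distinct value: {"a", "b", NaN}.
unique_value_count = column.nunique(dropna=False)
# isna() flags each missing entry; summing the booleans counts them.
missing_value_count = int(column.isna().sum())

print(unique_value_count)   # 3
print(missing_value_count)  # 2
```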

_compute_average(column: pandas.Series, type: imputr.domain.types.DataType) → Union[str, float]

Calculates the mode or the mean of the given pd.Series.

If the column has a categorical type, this method computes the mode of the column. If it has a continuous type, it calculates the mean.

Parameters
  • column (pd.Series) – The Pandas Series that contains the column data.

  • type (DataType) – The data type as modeled by the imputr library.

Returns

Union[str, float] (Either the mode or the mean of the column.)
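A minimal sketch of the mode/mean choice follows. compute_average is a hypothetical function that takes a boolean in place of the DataType argument. This page does not document how imputr breaks ties between equally frequent values. Taking the first (smallest) value returned by Series.mode is therefore an assumption.

```python
import numpy as np
import pandas as pd


def compute_average(column: pd.Series, categorical: bool):
    """Hypothetical sketch: mode for categorical columns, mean for continuous ones."""
    if categorical:
        # mode() ignores NaN and returns all tied values sorted; assume the first is used.
        return column.mode().iloc[0]
    # mean() skips NaN by default.
    return float(column.mean())


print(compute_average(pd.Series(["b", "a", "b", np.nan]), categorical=True))  # b
print(compute_average(pd.Series([2.0, np.nan, 4.0]), categorical=False))      # 3.0
```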