clarite.modify.colfilter_percent_zero

clarite.modify.colfilter_percent_zero(data: pandas.core.frame.DataFrame, filter_percent: float = 90.0, skip: Union[str, List[str], NoneType] = None, only: Union[str, List[str], NoneType] = None)

Remove continuous variables in which filter_percent percent or more of the non-NA values are zero

Parameters
data: pd.DataFrame

The DataFrame to be processed and returned

filter_percent: float, default 90.0

If the percentage of non-NA observations of a variable that are equal to zero is greater than or equal to this value, the variable is filtered out.

skip: str, list or None (default is None)

Variable name or list of variable names that the filter should not be applied to

only: str, list or None (default is None)

Variable name or list of variable names that the filter should be applied to; all other variables are left untested
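
The way skip and only narrow the set of tested variables can be sketched in plain Python. This is an illustration only, not CLARITE's code: the helper name resolve_columns_sketch is hypothetical, and raising an error when both arguments are given is an assumption of this sketch rather than documented behavior.

```python
def resolve_columns_sketch(columns, skip=None, only=None):
    """Hypothetical helper: pick which columns a filter would test."""
    # Accept a single name as shorthand for a one-element list
    if isinstance(skip, str):
        skip = [skip]
    if isinstance(only, str):
        only = [only]
    # Assumption of this sketch: the two options are mutually exclusive
    if skip is not None and only is not None:
        raise ValueError("Specify either 'skip' or 'only', not both")
    if only is not None:
        return [c for c in columns if c in only]
    if skip is not None:
        return [c for c in columns if c not in skip]
    return list(columns)


columns = ["age", "bmi", "cotinine"]
tested_with_skip = resolve_columns_sketch(columns, skip="age")
tested_with_only = resolve_columns_sketch(columns, only=["bmi"])
tested_default = resolve_columns_sketch(columns)
```

Columns excluded by either argument are never tested and therefore stay in the returned DataFrame regardless of how many zeros they contain.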

Returns
data: pd.DataFrame

The filtered DataFrame

Examples

>>> import clarite
>>> nhanes_filtered = clarite.modify.colfilter_percent_zero(nhanes_filtered)
================================================================================
Running colfilter_percent_zero
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Testing 483 of 483 continuous variables
        Removed 30 (6.21%) tested continuous variables which were equal to zero in at least 90.00% of non-NA observations.
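
The filtering rule itself can be approximated with plain pandas. The following is a minimal sketch under stated assumptions, not CLARITE's implementation: the function name percent_zero_filter_sketch and the toy column names are hypothetical, every numeric column is treated as continuous, and skip/only handling and logging are omitted.

```python
import numpy as np
import pandas as pd


def percent_zero_filter_sketch(data, filter_percent=90.0):
    """Illustrative stand-in: drop numeric columns that are mostly zero."""
    # Assumption: all numeric columns count as continuous variables
    numeric = data.select_dtypes(include="number")
    # Zeros are counted against non-NA observations only
    n_non_na = numeric.notna().sum()
    n_zero = (numeric == 0).sum()
    percent_zero = 100 * n_zero / n_non_na
    # A column is removed when its zero percentage meets the threshold
    to_drop = percent_zero[percent_zero >= filter_percent].index
    return data.drop(columns=to_drop)


df = pd.DataFrame({
    # 9 of 10 non-NA values are zero (90%): removed at the default threshold
    "mostly_zero": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    # 9 of 9 non-NA values are zero (100%): removed, the NA is ignored
    "zero_with_na": [0, 0, 0, 0, 0, 0, 0, 0, 0, np.nan],
    # 1 of 10 values is zero (10%): kept
    "varied": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})
filtered = percent_zero_filter_sketch(df)
print(list(filtered.columns))
```

Because the comparison is greater than or equal to, a column sitting exactly at the threshold (mostly_zero above) is removed.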