clarite.modify.make_binary(data:pandas.core.frame.DataFrame, skip:Union[str, List[str], NoneType]=None, only:Union[str, List[str], NoneType]=None)

Set variable types as Binary

Checks that each selected variable has at most two unique values and converts its type to pd.Categorical.
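The check-then-convert step can be illustrated with plain pandas. This is a minimal sketch, not clarite's implementation. `make_binary_sketch` is a hypothetical helper. It assumes missing values are ignored when counting distinct values.

```python
import pandas as pd


def make_binary_sketch(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: validate columns as binary and convert them to categorical.

    Pandas-only illustration; not the clarite implementation.
    """
    result = df.copy()
    for col in columns:
        # Count distinct non-missing values (assumption: NaN is ignored)
        n_unique = result[col].nunique(dropna=True)
        if n_unique > 2:
            raise ValueError(f"{col!r} has {n_unique} unique values, expected at most 2")
        # Store the column with a pandas categorical dtype
        result[col] = result[col].astype("category")
    return result


df = pd.DataFrame({"female": [0, 1, 1, 0], "age": [20, 35, 41, 52]})
out = make_binary_sketch(df, ["female"])
print(out.dtypes["female"])  # category
```

A column with more than two values, such as `age` here, raises a `ValueError` instead of being converted.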

Note: When these variables are used in regression, their values are ordered by value. For example, a Sex variable coded Male=1, Female=2 is encoded with “Male” as 0 and “Female” as 1 during the EWAS regression step.
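The value ordering in the note can be illustrated with pandas category codes. Converting an integer column to a categorical dtype sorts the categories by value. The integer codes then give a 0/1 encoding. This is a sketch under the assumption that the regression step follows this category order. It is an illustration with pandas, not clarite's regression code. The `Sex` data below is made up.

```python
import pandas as pd

# Made-up example data: Sex coded as Male=1, Female=2
sex = pd.Series([2, 1, 1, 2, 1], name="Sex")
cat = sex.astype("category")

# Categories are sorted by value, so 1 (Male) comes first
print(cat.cat.categories.tolist())  # [1, 2]

# The category codes give the 0/1 encoding: Male -> 0, Female -> 1
print(cat.cat.codes.tolist())  # [1, 0, 0, 1, 0]
```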

Parameters

data: pd.DataFrame or pd.Series

Data to be processed

skip: str, list or None (default is None)

Variable name or list of variable names that should not be made binary

only: str, list or None (default is None)

Variable name or list of variable names; only these variables are made binary
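One plausible reading of how `skip` and `only` narrow the set of processed columns is sketched below. `select_columns` is a hypothetical helper, not part of clarite. Two of its behaviors are assumptions: it raises when both options are given, and it preserves the DataFrame's column order.

```python
from typing import List, Optional, Union

import pandas as pd


def select_columns(
    data: pd.DataFrame,
    skip: Optional[Union[str, List[str]]] = None,
    only: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Hypothetical helper showing one reading of `skip` / `only`."""
    if skip is not None and only is not None:
        # Assumption: the two options are treated as mutually exclusive
        raise ValueError("Specify 'skip' or 'only', not both")
    # A single string is treated as a one-element list
    if isinstance(skip, str):
        skip = [skip]
    if isinstance(only, str):
        only = [only]
    if only is not None:
        return [c for c in data.columns if c in only]
    if skip is not None:
        return [c for c in data.columns if c not in skip]
    # Neither option given: every column is processed
    return list(data.columns)


df = pd.DataFrame(columns=["female", "black", "age"])
print(select_columns(df, only="female"))  # ['female']
print(select_columns(df, skip=["age"]))  # ['female', 'black']
```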

Returns

data: pd.DataFrame

DataFrame with the same data but validated and converted to binary types


>>> import clarite
>>> nhanes = clarite.modify.make_binary(nhanes, only=['female', 'black', 'mexican', 'other_hispanic'])
Running make_binary
Set 4 of 970 variable(s) as binary, each with 22,624 observations