CLARITE functions are organized into several modules:
EWAS and associated calculations
Functions that are used to gather information about some data
Return variables with a Pearson correlation above the threshold
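The sketch below shows what this correlation check involves. It is a minimal pandas example, not CLARITE's implementation. The helper name `high_correlations`, the 0.75 default, and the use of the absolute value of r are assumptions. The helper expects numeric columns and reports each pair of variables once.

```python
import pandas as pd


def high_correlations(data: pd.DataFrame, threshold: float = 0.75) -> pd.DataFrame:
    """Hypothetical helper: list variable pairs whose |Pearson r| exceeds threshold."""
    corr = data.corr(method="pearson")
    cols = list(corr.columns)
    rows = []
    # Walk the upper triangle so each pair is reported once and the diagonal is skipped
    for i, var1 in enumerate(cols):
        for var2 in cols[i + 1:]:
            r = corr.loc[var1, var2]
            # NaN correlations (e.g. from constant columns) compare False and are skipped
            if abs(r) > threshold:
                rows.append((var1, var2, r))
    return pd.DataFrame(rows, columns=["var1", "var2", "correlation"])
```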
Return the count of each unique value for all binary and categorical variables.
Return the type of each variable
Return the percent of observations that are NA for each variable
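The missing-value percentage comes down to a single pandas expression. This is a minimal sketch, not CLARITE's code, and `percent_na_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def percent_na_sketch(data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: percent of observations that are NA, per column."""
    # isna() yields booleans; their column-wise mean is the fraction missing
    return data.isna().mean() * 100


df = pd.DataFrame({"bmi": [22.1, np.nan, 27.4, np.nan], "age": [40, 51, 63, 38]})
result = percent_na_sketch(df)  # bmi: 50.0, age: 0.0
```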
Print the number of each type of variable and the number of observations
Load data from different formats or sources
Functions used to filter and/or change some data, always taking in one set of data and returning one set of data.
categorize(data, cat_min, cat_max, cont_min)
Classify variables into binary, categorical, continuous, and ‘unknown’.
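The next example illustrates one plausible classification rule based on the number of unique non-NA values. The rule and the defaults (`cat_min=3`, `cat_max=6`, `cont_min=15`) are assumptions for illustration, and CLARITE may treat edge cases such as constant columns differently. Under this rule, a variable with exactly two levels is binary, one with `cat_min` to `cat_max` levels is categorical, one with at least `cont_min` levels is continuous, and anything else is unknown.

```python
import pandas as pd


def categorize_sketch(data: pd.DataFrame, cat_min: int = 3, cat_max: int = 6,
                      cont_min: int = 15) -> pd.Series:
    """Hypothetical re-implementation: classify columns by unique non-NA value count."""
    types = {}
    for col in data.columns:
        n_unique = data[col].nunique(dropna=True)
        if n_unique == 2:
            types[col] = "binary"
        elif cat_min <= n_unique <= cat_max:
            types[col] = "categorical"
        elif n_unique >= cont_min:
            types[col] = "continuous"
        else:
            # Too few levels, or a count between cat_max and cont_min
            types[col] = "unknown"
    return pd.Series(types)
```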
colfilter(data, skip, …)
Remove some variables (skip) or keep only certain variables (only)
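The `skip`/`only` pattern used by many of these functions can be sketched in plain pandas. This is not CLARITE's code. The name `colfilter_sketch` is hypothetical, and rejecting calls that pass both arguments is a choice made for this sketch.

```python
import pandas as pd
from typing import List, Optional, Union


def colfilter_sketch(
    data: pd.DataFrame,
    skip: Optional[Union[str, List[str]]] = None,
    only: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: drop the `skip` columns, or keep just the `only` columns."""
    if skip is not None and only is not None:
        raise ValueError("Pass either 'skip' or 'only', not both")
    # Accept a single column name as shorthand for a one-item list
    if isinstance(skip, str):
        skip = [skip]
    if isinstance(only, str):
        only = [only]
    if only is not None:
        # Preserve the original column order rather than the order of `only`
        return data[[c for c in data.columns if c in only]]
    if skip is not None:
        return data.drop(columns=skip)
    return data
```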
colfilter_percent_zero(data, filter_percent, …)
Remove continuous variables in which <filter_percent> percent or more of the non-NA values are zero
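This zero-value filter can be sketched in pandas as follows. The sketch assumes that `filter_percent` is a percentage from 0 to 100 and that every column passed in is continuous. `colfilter_percent_zero_sketch` is a hypothetical name, not CLARITE's implementation.

```python
import numpy as np
import pandas as pd


def colfilter_percent_zero_sketch(data: pd.DataFrame,
                                  filter_percent: float = 90.0) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose non-NA values are mostly zero."""
    to_drop = []
    for col in data.columns:
        values = data[col].dropna()  # NA values are excluded from the percentage
        if len(values) == 0:
            continue
        pct_zero = (values == 0).mean() * 100
        if pct_zero >= filter_percent:
            to_drop.append(col)
    return data.drop(columns=to_drop)
```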
colfilter_min_n(data, n, skip, …)
Remove variables which have fewer than <n> non-NA values
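A minimal pandas sketch of the non-NA count filter follows. The name `colfilter_min_n_sketch` is hypothetical, and the sketch ignores `skip`/`only`.

```python
import numpy as np
import pandas as pd


def colfilter_min_n_sketch(data: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep columns with at least n non-NA values."""
    non_na_counts = data.notna().sum()
    # A boolean Series aligned to the columns selects the ones to keep
    return data.loc[:, non_na_counts >= n]
```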
colfilter_min_cat_n(data, n, skip, …)
Remove binary and categorical variables in which any unique value occurs fewer than <n> times
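This per-level count filter can be sketched as below. The sketch assumes that every column passed in is binary or categorical, and that a variable fails if its rarest non-NA value appears fewer than `n` times. `colfilter_min_cat_n_sketch` is a hypothetical name.

```python
import pandas as pd


def colfilter_min_cat_n_sketch(data: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep columns whose rarest value occurs at least n times."""
    keep = []
    for col in data.columns:
        counts = data[col].value_counts(dropna=True)
        # Compare the least frequent level against the threshold
        if len(counts) > 0 and counts.min() >= n:
            keep.append(col)
    return data[keep]
```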
make_binary(data, skip, …)
Set variable types as Binary
make_categorical(data, skip, …)
Set variable types as Categorical
make_continuous(data, skip, …)
Set variable types as Continuous
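The next example shows one way to represent these types in pandas. It assumes that categorical (and binary) variables are stored with the pandas `category` dtype and continuous ones as numeric values. How CLARITE records types internally is not stated here, so treat this as an illustration only. Both function names are hypothetical.

```python
import pandas as pd


def make_categorical_sketch(data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: store the given columns with the pandas category dtype."""
    out = data.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


def make_continuous_sketch(data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: convert the given columns to numeric values."""
    out = data.copy()
    for col in columns:
        # errors="coerce" turns unparseable entries into NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```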
Merge two datasets, keeping only the columns present in both.
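Stacking two datasets row-wise while keeping only their shared columns maps onto an inner `pd.concat`. This is a sketch with a hypothetical name, not CLARITE's function.

```python
import pandas as pd


def merge_observations_sketch(top: pd.DataFrame, bottom: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack rows of two datasets, keeping shared columns only."""
    # join="inner" drops any column that is not present in both inputs
    return pd.concat([top, bottom], join="inner")
```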
merge_variables(left, right, how)
Merge two datasets (left and right) with different variables side-by-side, using the join type given by how.
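A side-by-side merge can be sketched with `pd.merge` on the index. The sketch assumes that rows are identified by the index (for example, a sample ID) and that `how` defaults to an outer join. The function name is hypothetical.

```python
import pandas as pd


def merge_variables_sketch(left: pd.DataFrame, right: pd.DataFrame,
                           how: str = "outer") -> pd.DataFrame:
    """Hypothetical helper: combine columns from two datasets by row index."""
    # Align rows on the index rather than on a column
    return pd.merge(left, right, left_index=True, right_index=True, how=how)
```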
move_variables(left, right, skip, …)
Move one or more variables from one DataFrame to another
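Moving variables between two datasets can be sketched as a drop plus an index-aligned join. The name `move_variables_sketch` and the explicit `columns` list are hypothetical. The sketch keeps every row of `right`.

```python
import pandas as pd


def move_variables_sketch(left: pd.DataFrame, right: pd.DataFrame, columns: list):
    """Hypothetical helper: return (left without `columns`, right with them added)."""
    moved = left[columns]
    new_left = left.drop(columns=columns)
    # join() aligns on the index and keeps every row of `right`
    new_right = right.join(moved, how="left")
    return new_left, new_right
```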
recode_values(data, replacement_dict, skip, …)
Replace values in a DataFrame according to replacement_dict.
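Value recoding maps naturally onto `Series.replace` with a dictionary. This sketch uses a hypothetical name and an optional `columns` list in place of `skip`/`only`.

```python
import pandas as pd


def recode_values_sketch(data: pd.DataFrame, replacement_dict: dict,
                         columns: list = None) -> pd.DataFrame:
    """Hypothetical helper: replace values (old -> new) in the chosen columns."""
    out = data.copy()
    targets = columns if columns is not None else list(out.columns)
    for col in targets:
        out[col] = out[col].replace(replacement_dict)
    return out
```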
remove_outliers(data, method[, cutoff])
Remove outliers from continuous variables by replacing them with np.nan
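Two common outlier rules are sketched below: a Gaussian rule (more than `cutoff` standard deviations from the mean) and an IQR rule (more than `cutoff` IQRs beyond the quartiles). The method names mirror the `method[, cutoff]` signature, but the exact rules and defaults CLARITE uses are assumptions here. Outliers become NaN, matching the description above.

```python
import numpy as np
import pandas as pd


def remove_outliers_sketch(data: pd.DataFrame, method: str = "gaussian",
                           cutoff: float = 3.0) -> pd.DataFrame:
    """Hypothetical helper: replace outliers in each (continuous) column with NaN."""
    out = data.copy()
    for col in out.columns:
        s = out[col].astype(float)  # float so NaN can be stored
        if method == "gaussian":
            outlier = (s - s.mean()).abs() > cutoff * s.std()
        elif method == "iqr":
            q1, q3 = s.quantile(0.25), s.quantile(0.75)
            iqr = q3 - q1
            outlier = (s < q1 - cutoff * iqr) | (s > q3 + cutoff * iqr)
        else:
            raise ValueError(f"Unknown method: {method!r}")
        # mask() sets entries where the condition is True to NaN
        out[col] = s.mask(outlier)
    return out
```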
rowfilter_incomplete_obs(data, skip, …)
Remove rows containing null values
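Dropping incomplete observations corresponds to `DataFrame.dropna`. In this sketch, `rowfilter_incomplete_obs_sketch` and its optional `columns` argument are hypothetical.

```python
import numpy as np
import pandas as pd


def rowfilter_incomplete_obs_sketch(data: pd.DataFrame, columns: list = None) -> pd.DataFrame:
    """Hypothetical helper: drop rows with NA in any column (or only in `columns`)."""
    # subset=None checks every column; otherwise only the listed ones
    return data.dropna(subset=columns)
```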
transform(data, variable, transform, …)
Apply a transformation function to a variable
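The next sketch applies a transformation named by a string. It assumes the string names a NumPy function such as `"log"` or `"sqrt"`. The lookup via `getattr(np, ...)` and the name `transform_sketch` are illustrative choices, not CLARITE's documented behavior.

```python
import numpy as np
import pandas as pd


def transform_sketch(data: pd.DataFrame, variable: str,
                     transform_method: str = "log") -> pd.DataFrame:
    """Hypothetical helper: apply the NumPy function named transform_method to one column."""
    func = getattr(np, transform_method)  # e.g. "log" -> np.log, "sqrt" -> np.sqrt
    out = data.copy()  # leave the caller's DataFrame unchanged
    out[variable] = func(out[variable])
    return out
```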
Functions that generate plots
histogram(data, column, figsize, …)
Plot a histogram of the values in the given column.
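A single-column histogram can be drawn with Matplotlib's `Axes.hist`. This sketch assumes Matplotlib is available. The function name, the `bins` parameter, and the default figure size are hypothetical, and the non-interactive Agg backend lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def histogram_sketch(data: pd.DataFrame, column: str, figsize=(12, 5), bins=100):
    """Hypothetical helper: plot a histogram of one column's non-NA values."""
    fig, ax = plt.subplots(figsize=figsize)
    ax.hist(data[column].dropna(), bins=bins)
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    ax.set_title(f"Histogram for {column}")
    return ax
```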
distributions(data, filename, …)
Create a PDF containing histograms for each binary or categorical variable, and one of several types of plots for each continuous variable.
manhattan(dfs, …)
Create a Manhattan-like plot for a list of EWAS results