ewas(phenotype:str, covariates:List[str], data:pandas.core.frame.DataFrame, survey_design_spec:Union[clarite.modules.survey.survey_design.SurveyDesignSpec, NoneType]=None, cov_method:Union[str, NoneType]='stata', min_n:Union[int, NoneType]=200)
Run an EWAS on a phenotype.
Binary variables are treated as continuous features, with values of 0 and 1.
The results of a likelihood ratio test are used for categorical variables, so no Beta values or SE are reported.
The regression family is automatically selected based on the type of the phenotype:
* Continuous phenotypes use gaussian regression
* Binary phenotypes use binomial regression (the larger of the two values is counted as “success”)
Categorical variables run with a survey design will not report Diff_AIC.
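The family selection and the “larger value is success” rule described above can be illustrated with a minimal sketch. The helpers `pick_family` and `encode_binary` are hypothetical names introduced here for illustration. They are not part of clarite, and the library's own type detection may work differently (this sketch simply assumes a phenotype with exactly two distinct values is binary).

```python
import pandas as pd


def pick_family(phenotype: pd.Series) -> str:
    """Hypothetical helper: choose a regression family from the phenotype values.

    Assumption: exactly two distinct non-missing values means "binary".
    """
    values = phenotype.dropna().unique()
    if len(values) == 2:
        return "binomial"
    return "gaussian"


def encode_binary(phenotype: pd.Series) -> pd.Series:
    """Hypothetical helper: map the larger of two values to 1 ("success"), the smaller to 0."""
    values = sorted(phenotype.dropna().unique())
    if len(values) != 2:
        raise ValueError("expected exactly two distinct values")
    return phenotype.map({values[0]: 0, values[1]: 1})


continuous = pd.Series([1.2, 3.4, 5.6])
binary = pd.Series([1, 2, 2, 1])

print(pick_family(continuous))
print(pick_family(binary))
print(encode_binary(binary).tolist())
```

In this sketch a phenotype coded as 1/2 would be modeled with 2 as the “success” outcome, which matches the rule stated above.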
- phenotype: string
The variable to be used as the output of the regressions
- covariates: list (strings)
The variables to be used as covariates. Any variable in the DataFrame other than the phenotype and the covariates is regressed.
- data: pd.DataFrame
The data to be analyzed, including the phenotype, covariates, and any variables to be regressed.
- survey_design_spec: SurveyDesignSpec or None
A SurveyDesignSpec object is used to create SurveyDesign objects for each regression.
- cov_method: str or None
Covariance calculation method, used only if survey_design_spec is passed in: ‘stata’ or ‘jackknife’. Defaults to ‘stata’.
- min_n: int or None
Minimum number of complete-case observations (no NA values for phenotype, covariates, variable, or weight). Defaults to 200.
- df: pd.DataFrame
EWAS results DataFrame with these columns: [‘variable_type’, ‘N’, ‘beta’, ‘SE’, ‘var_pvalue’, ‘LRT_pvalue’, ‘diff_AIC’, ‘pvalue’]
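To make the `covariates` and `min_n` descriptions concrete, the following is a rough sketch using only pandas. The helper names (`variables_to_regress`, `complete_case_count`) and the toy column names are assumptions made for illustration. This is not clarite's internal code, only one plausible reading of the parameter descriptions.

```python
import pandas as pd


def variables_to_regress(data: pd.DataFrame, phenotype: str, covariates: list) -> list:
    """Hypothetical helper: every column that is neither the phenotype nor a covariate."""
    excluded = {phenotype, *covariates}
    return [col for col in data.columns if col not in excluded]


def complete_case_count(data, phenotype, covariates, variable, weight=None) -> int:
    """Hypothetical helper: rows with no NA in phenotype, covariates, variable, or weight."""
    columns = [phenotype, *covariates, variable]
    if weight is not None:
        columns.append(weight)
    return len(data.dropna(subset=columns))


# Toy data with made-up values, for illustration only
toy = pd.DataFrame({
    "logBMI": [1.0, 2.0, None, 4.0],
    "age": [30, 40, 50, 60],
    "lead": [0.1, None, 0.3, 0.4],
    "cadmium": [1.0, 2.0, 3.0, 4.0],
})

min_n = 3  # tiny threshold for the toy data; the documented default is 200
for variable in variables_to_regress(toy, "logBMI", ["age"]):
    n = complete_case_count(toy, "logBMI", ["age"], variable)
    print(variable, n, "tested" if n >= min_n else "below min_n")
```

Under these assumptions, a variable with too many missing values falls below `min_n` even when the DataFrame as a whole has enough rows, because the count is taken per regression.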
>>> ewas_discovery = clarite.analyze.ewas("logBMI", covariates, nhanes_discovery)
Running EWAS on a continuous variable