The overarching goal of our research program is to develop computationally efficient and statistically sound methods for sifting through vast amounts of genomic and other high-dimensional data to make discoveries that can be translated into improvements in human health.

To achieve this goal, we build and maintain an analytic infrastructure, i.e., a dry lab. This includes statistical and computational methods, pipelines and workflows, streamlined access to big data, and a research software engineering, bioinformatics, and analysis team. This infrastructure constitutes a powerful instrument that gives us a unique vantage point from which discoveries can be made.

The cards below describe some of the highlights of our research.

Selected Published Research

Although investments in genomic studies of complex diseases have made possible the discovery of thousands of variants robustly associated with these traits, the translation of these discoveries into actionable targets has been hampered by the lack of a mechanistic understanding of how genome variation relates to phenotype. Moreover, it has been widely shown that a substantial portion of the genetic control of complex traits is exerted through the regulation of gene expression, but effective methods to fully harness this mechanism were not available. To address these challenges, we developed a gene-based test, PrediXcan, that directly tests this regulatory mechanism and substantially improves power relative to single-variant tests and other gene-based tests. PrediXcan is inherently mechanistic and provides directionality, highlighting its potential utility in identifying novel targets for therapy. PrediXcan was the first of the class of methods now known as TWAS … Read more →

Ongoing Research

The PrediXcan method has proven successful and has been widely adopted. However, several aspects can be improved to increase detection and reduce false discoveries. A substantial portion of my effort will be devoted to developing methods to achieve this goal. […] Linkage disequilibrium (LD, the correlation of genetic variation within proximal genomic regions) can lead to associations between genes and phenotypes even when the underlying causal variants are distinct. This is a difficult problem given the widespread LD in the population. We are examining several approaches to mitigate it. One is to use fine-mapping methods, which seek to increase the resolution of the association and narrow it down to a single causal variant. Another is to use orthogonal approaches that seek to cancel out the confounding by averaging across multiple independent genes. […] The goal is to quantify the proportion of phenotypic variability that can be explained by the … Read more →

Other Published Research

Genomic privacy and data sharing are issues of relevance for the whole scientific community. Protecting the privacy of individuals who participate in a study has always been a top priority, and it had been widely assumed that publishing summary results did not jeopardize privacy. In 2008, Homer et al. found that in the case of genome-wide association studies (GWAS), summary results such as allele frequencies for a large number of genetic variants can reveal whether a person participated in a study and the disease status of that individual. These results forced the NIH to withdraw most of the public access to GWAS results. We were interested in sharing results from quantitative traits such as gene expression phenotypes, which provide critical information on the regulatory role of genetic variants. The question here was whether publishing regression coefficients from GWAS would also allow re-identification. Dr. Im proved mathematically that re-identification based on regression … Read more →