Enhancing PrediXcan to improve power and specificity
The PrediXcan method has proven successful and has been widely adopted. However, several aspects can be improved to increase detection power and reduce false discoveries. A substantial portion of my effort will be devoted to developing methods to achieve this goal.
Reducing spurious results due to LD contamination
Linkage disequilibrium (LD, the correlation of genetic variation within proximal genomic regions) can produce associations between genes and phenotypes even when the underlying causal variants are distinct. This is a difficult problem given the widespread LD in the population. We are examining several approaches to mitigate it. One uses fine-mapping methods, which seek to increase the resolution of the association and narrow the signal down to a single causal variant. Another uses orthogonal approaches that seek to cancel out the confounding by averaging across multiple independent genes.
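The toy simulation below illustrates the LD contamination problem. It is a minimal sketch with hypothetical effect sizes and a simplified genotype model that uses correlated Gaussians instead of allele counts. One variant drives a gene's expression, and a different variant drives the phenotype. The gene has no causal role in the phenotype, yet its genetically predicted expression still correlates with it whenever the two variants are in LD.

```python
import numpy as np


def simulate_ld_contamination(n=20000, r=0.9, seed=0):
    """Return corr(predicted expression, phenotype) when the causal variants differ.

    Hypothetical toy model: genotypes are standardized Gaussians with
    correlation r between the two variants (a stand-in for LD).
    """
    rng = np.random.default_rng(seed)
    snp_expr = rng.standard_normal(n)
    snp_pheno = r * snp_expr + np.sqrt(1 - r**2) * rng.standard_normal(n)
    # Genetically regulated expression depends only on snp_expr.
    predicted_expr = 0.5 * snp_expr
    # The phenotype depends only on snp_pheno, not on the gene's expression.
    phenotype = 0.3 * snp_pheno + rng.standard_normal(n)
    return np.corrcoef(predicted_expr, phenotype)[0, 1]
```

With strong LD (`r=0.9`), the correlation is clearly nonzero (about 0.26 in expectation under these settings). With `r=0.0` it vanishes. This is the spurious gene-level signal that fine-mapping and averaging-based approaches aim to remove.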
Methods to quantify mediating role of molecular traits that are robust to LD contamination
The goal is to quantify the proportion of phenotypic variability that can be explained by the regulation of transcript levels or other intermediate molecular phenotypes. We call this concept regulability, by analogy with heritability. This quantity may be tissue dependent, and characterizing the relative importance of different tissues and brain regions will provide insight into the etiology of complex diseases. Moreover, the method can be applied to quantify the contribution of a gene set or pathway, yielding not only the significance but also the magnitude of the effect.
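The sketch below gives a rough sense of the quantity being targeted. It is a naive estimate of the share of phenotypic variance explained jointly by the predicted expression of a gene set, computed as adjusted R² from an ordinary least squares fit. This is not the LD-robust regulability estimator proposed here. The simulation assumes independent genes and hypothetical effect sizes, and the helper names are illustrative only.

```python
import numpy as np


def variance_explained(pred_expr, phenotype):
    """Naive adjusted-R^2 estimate of the phenotypic variance explained
    jointly by a set of predicted expression traits (n x genes)."""
    n, p = pred_expr.shape
    X = np.column_stack([np.ones(n), pred_expr])  # intercept + genes
    coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ coef
    r2 = 1.0 - resid.var() / phenotype.var()
    # Penalize for the number of genes to reduce upward bias from overfitting.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def simulate(n=5000, n_genes=20, true_share=0.2, seed=1):
    """Hypothetical gene set whose predicted expression explains true_share."""
    rng = np.random.default_rng(seed)
    pred_expr = rng.standard_normal((n, n_genes))
    beta = np.full(n_genes, np.sqrt(true_share / n_genes))
    noise = np.sqrt(1 - true_share) * rng.standard_normal(n)
    return pred_expr, pred_expr @ beta + noise
```

Running the estimator separately on predicted expression from different tissues would give the kind of tissue-by-tissue comparison described above. Doing so robustly would require handling LD and correlated predictors, which this sketch ignores.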
Open sharing of software, methods, and public databases of resources and results
Users across the globe are using PrediXcan and related methods and resources. We will continue hosting the resources and providing support to users.
Build a catalog of the function of every human gene (PhenomeXcan)
One of the important features of PrediXcan is its ability to identify target genes. My team and I will apply state-of-the-art methods from my lab to large-scale data from UK Biobank and other public repositories, such as dbGaP, to build a catalog of the function of every human gene.
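At its simplest, such a catalog is a gene-by-phenotype matrix of association statistics. The sketch below builds that matrix from marginal linear regressions of each phenotype on each gene's predicted expression, expressed as t-statistics. It is an illustrative assumption, not the actual PhenomeXcan pipeline. It ignores covariates, LD contamination, and multiple-testing correction, and the simulated data are hypothetical.

```python
import numpy as np


def association_matrix(pred_expr, phenotypes):
    """t-statistics for every gene x phenotype pair (marginal regressions)."""
    n = pred_expr.shape[0]
    # Standardize columns so each regression slope equals a Pearson correlation.
    X = (pred_expr - pred_expr.mean(0)) / pred_expr.std(0)
    Y = (phenotypes - phenotypes.mean(0)) / phenotypes.std(0)
    r = X.T @ Y / n  # genes x phenotypes correlation matrix
    # t-statistic of a correlation coefficient with n - 2 degrees of freedom.
    return r * np.sqrt((n - 2) / (1 - r**2))


def simulate(n=2000, n_genes=100, n_phenos=3, seed=3):
    """Hypothetical data: only gene 7 affects phenotype 0."""
    rng = np.random.default_rng(seed)
    pred_expr = rng.standard_normal((n, n_genes))
    phenotypes = rng.standard_normal((n, n_phenos))
    phenotypes[:, 0] += 0.3 * pred_expr[:, 7]
    return pred_expr, phenotypes
```

Scanning each phenotype's column for the largest statistics then lists the candidate target genes for that trait.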
Extend PrediXcan to imaging (ImageXcan)
Environmental perturbations have a large impact on phenotypes, both in concert with and independently of genetic factors. We will investigate the effect of the environment and its interaction with genetic factors on complex phenotypes. Large-scale biobanks, such as UK Biobank and the recently launched All of Us, offer an unprecedented opportunity to study gene-environment interaction with high-dimensional data on millions of individuals. We will begin extending the ideas of PrediXcan from the transcriptome to the “radiome”, the high-dimensional data available in MRIs and other images of the human body. We expect this line of research to provide important insights into the genetic and gene-environment contributions to complex diseases. It will be a focus of my R01 application, anticipated to be submitted in October 2019.
Multi-ethnic and multi-omic extension of PrediXcan
Until recently, large reference transcriptome datasets were composed mostly of individuals of European descent. In response to the need for multi-ethnic representation, the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program has selected samples from the Multi-Ethnic Study of Atherosclerosis (MESA) for the pilot phase of trans-omic data generation. This provides an opportunity to address a limitation of PrediXcan by training prediction models in multiple ethnic backgrounds. We therefore propose to adapt my prediction pipelines to the multi-ethnic RNA-seq, methylomic, and proteomic data from MESA. These prediction models will allow molecular information to be propagated from the subset of assayed individuals to the much larger set for whom only genetic data are available. Leveraging these higher-quality models, we anticipate improved functional interpretation of the data and better multi-ethnic performance.
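The propagation step can be sketched as training a genotype-to-expression model in the assayed reference subset and applying it to genotype-only individuals. This is a minimal illustration under stated assumptions. It uses ridge regression for a self-contained closed-form solution, whereas PrediXcan models are typically fit with elastic net. The simulated allele frequencies, effect sizes, and the `fit_ridge`/`predict` helpers are hypothetical.

```python
import numpy as np


def fit_ridge(geno, expr, lam=10.0):
    """Fit ridge weights on genotypes standardized in the reference set."""
    mu, sd = geno.mean(axis=0), geno.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    Z = (geno - mu) / sd
    y = expr - expr.mean()
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
    return mu, sd, w, expr.mean()


def predict(model, geno):
    """Impute expression for individuals with genotypes only."""
    mu, sd, w, intercept = model
    return ((geno - mu) / sd) @ w + intercept


def simulate(n_ref=500, n_target=5000, n_snps=50, n_causal=5, seed=2):
    """Hypothetical cis-region: allele counts 0/1/2 and a few causal SNPs."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, n_snps)
    geno = rng.binomial(2, freqs, size=(n_ref + n_target, n_snps)).astype(float)
    std_geno = (geno - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))
    effects = np.zeros(n_snps)
    effects[:n_causal] = 0.4
    expr = std_geno @ effects + rng.standard_normal(n_ref + n_target)
    # Expression is "assayed" only in the first n_ref individuals.
    return geno[:n_ref], expr[:n_ref], geno[n_ref:], expr[n_ref:]
```

Usage: fit on the reference subset with `model = fit_ridge(ref_geno, ref_expr)`, then call `predict(model, target_geno)` for the genotype-only cohort. In a multi-ethnic setting, one option is to fit such a model separately within each ancestry group of the reference panel. The hidden target expression is returned only so the simulation can check accuracy.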