Selected Published Research

PrediXcan, a regulatory mechanism driven gene discovery tool

Although investments in genomic studies of complex diseases have made possible the discovery of thousands of variants robustly associated with these traits, the translation of these discoveries into actionable targets has been hampered by the lack of a mechanistic understanding of how genome variation relates to phenotype. Moreover, it has been widely shown that a substantial portion of the genetic control of complex traits is exerted through the regulation of gene expression, but effective methods to fully harness this mechanism were not available. To address these challenges, we developed a gene-based test, PrediXcan, that directly tests this regulatory mechanism and substantially improves power relative to single-variant tests and other gene-based tests. PrediXcan is inherently mechanistic and provides directionality, highlighting its potential utility in identifying novel targets for therapy. PrediXcan was the first of the class of methods now known as TWAS (transcriptome-wide association studies). More details can be found in this paper. The following figure shows a schematic description of the method.

PrediXcan tests the mediating effect of a target gene on disease risk. The left panel shows the components of the expression of a gene. PrediXcan tests the association between the genetically regulated component and the complex trait, and is not affected by the reverse effects of the disease on expression. The mechanistic model behind PrediXcan is shown in the right panel. The genetic profile of an individual determines a baseline gene expression level (genetically determined expression), which is further modified by environment and other factors. For deleterious causal genes, this final level determines the liability to disease, which can lead to disease if it surpasses a certain threshold.
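The two steps described above (predict the genetically regulated component of expression from genotype, then test its association with the trait) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PrediXcan software: the function names, the prediction weights, and the toy genotype and trait values are all hypothetical, and a plain least-squares slope stands in for the association test.

```python
from statistics import mean


def predict_expression(dosages, weights):
    """Genetically regulated expression as a weighted sum of allele dosages.

    dosages: one list of allele counts (0, 1, or 2) per individual
    weights: one prediction weight per variant (hypothetical values here)
    """
    return [sum(w * d for w, d in zip(weights, person)) for person in dosages]


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data: four individuals, three variants near the gene (all assumed).
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
weights = [0.5, -0.25, 0.1]
trait = [1.0, 2.0, 4.0, 0.5]

predicted = predict_expression(dosages, weights)
effect = slope(predicted, trait)  # the sign gives the direction of effect
```

Because the predictor is built from genotype alone, the estimated slope cannot be driven by the disease altering expression, and its sign indicates whether higher or lower expression goes with higher trait values.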

Scaling up PrediXcan to integrate large-scale data

Genetic studies have discovered many thousands of locations on chromosomes that are associated with health or disease, but the majority of these fall outside of protein-encoding regions, so it is difficult to understand how they exert their influence. PrediXcan leverages large-scale DNA, observable trait, and gene expression datasets, as well as machine learning, to infer from these discoveries which genes are actually influenced and whether up- or downregulation of the gene is beneficial. Despite the appeal and success of this approach, which directly identifies candidate drug targets, accumulating experience revealed that data from over a million individuals would be required in many cases. We therefore developed an approach that bypasses the need for individual-level data and takes advantage of summary results, vastly expanding the applicability and power of the original PrediXcan method. The new approach, called S-PrediXcan, makes it possible to find potential target genes for modulating thousands of traits across a broad set of human tissues. These results are now publicly available, have gathered users across the globe, and have led to funded collaborations. Most of the associations proved to be tissue-specific, suggesting context specificity of the trait etiology. Significant associations in unexpected tissues also underscored the need for an agnostic scan of multiple contexts to improve the ability to detect causal regulatory mechanisms. Monogenic disease genes were enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes. More details can be found in this paper.
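To make the summary-statistics idea concrete, the sketch below combines variant-level GWAS Z-scores into a gene-level Z-score, using prediction weights and a variant covariance matrix estimated from a reference panel. It follows the approximation reported for S-PrediXcan as we understand it (each variant contributes its weight times the ratio of its standard deviation to that of predicted expression, times its Z-score); the function name and all numbers are hypothetical, and this is an illustration rather than the released implementation.

```python
import math


def spredixcan_z(weights, gwas_z, ld_cov):
    """Approximate gene-level Z-score from variant-level summary statistics.

    weights: prediction weight per variant in the gene's expression model
    gwas_z:  GWAS Z-score (effect size / standard error) per variant
    ld_cov:  variant covariance matrix from a reference panel (list of lists)
    """
    n = len(weights)
    # Variance of predicted expression: w' * Cov * w.
    var_gene = sum(weights[i] * ld_cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    sd_gene = math.sqrt(var_gene)
    # Each variant contributes w_l * (sd_l / sd_gene) * Z_l.
    return sum(weights[l] * math.sqrt(ld_cov[l][l]) / sd_gene * gwas_z[l]
               for l in range(n))


# Toy example with two variants (all values assumed for illustration).
z_gene = spredixcan_z(
    weights=[0.5, -0.25],
    gwas_z=[3.0, -2.0],
    ld_cov=[[0.5, 0.1], [0.1, 0.4]],
)
```

A quick sanity check on the formula: with a single variant and a positive weight, the gene-level Z-score reduces to that variant's GWAS Z-score. No individual-level genotypes or phenotypes appear anywhere in the computation.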

Genetic architecture of expression traits across tissues

We investigated the genetic architecture of gene expression traits, which are important intermediate processes for understanding the mechanisms by which genetic variation affects complex diseases and traits. We found that, in contrast to the polygenic nature of complex diseases, most of the variation in gene expression traits is driven by a small number of variants (eQTLs). In addition to providing insight into the biology of the transcriptome, this finding has practical implications for the best approaches to predicting gene expression traits. More details can be found in this paper.

Boosting the power to identify target genes by exploiting the tissue sharing of gene expression regulation

An unexpected finding from the GTEx consortium (a large-scale transcriptome study of 900 organ donors, with whole-body sampling across 49 tissues) was the widespread sharing of genetic regulation of expression traits across tissues. We estimated that more than half of the eQTLs (variants associated with expression levels) are active across all tissues. One implication of this is that eQTL-based studies will have little specificity to narrow down the tissue where disease initiates. However, one can take advantage of this lack of specificity by treating the different tissue panels as independent experiments and aggregating the information across all of them. We developed MultiXcan to implement this idea and showed its effectiveness. More details can be found in this paper.
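One simple way to aggregate evidence across tissues is to regress the trait jointly on a gene's predicted expression in every tissue and summarize the fit with a single F statistic. The sketch below does this with plain least squares. It is an assumption-laden illustration, not the MultiXcan implementation: the helper names and toy values are hypothetical, and it does nothing about strongly correlated tissues, which make the normal equations ill-conditioned and which a practical method has to handle.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def center(values):
    """Subtract the mean so the regression needs no intercept term."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]


def joint_f_statistic(expr_by_tissue, trait):
    """F statistic for regressing the trait on all tissues' predicted expression.

    expr_by_tissue: one list of predicted expression values per tissue
    trait:          one trait value per individual
    """
    n, k = len(trait), len(expr_by_tissue)
    cols = [center(col) for col in expr_by_tissue]
    y = center(trait)
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * cols[j][i] for j in range(k)) for i in range(n)]
    ss_model = sum(f * f for f in fitted)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return (ss_model / k) / (ss_resid / (n - k - 1))


# Toy data: six individuals, one gene predicted in two tissues (all assumed).
tissue_a = [0.1, 0.4, -0.2, 0.9, 0.3, -0.5]
tissue_b = [0.2, 0.1, -0.4, 0.7, 0.5, -0.3]
trait = [1.0, 1.5, 0.2, 2.4, 1.1, 0.1]
f_stat = joint_f_statistic([tissue_a, tissue_b], trait)
```

The F statistic has k and n - k - 1 degrees of freedom, where k is the number of tissues and n the number of individuals, so a single test per gene replaces one test per tissue.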

Application to childhood and adult onset asthma data in the UK Biobank

Regardless of how sophisticated they may be, analytical methods only matter if they can answer relevant scientific questions. We teamed up with Carole Ober and Dan Nicolae, experts in asthma genetics, to examine the shared and distinct genetic factors of childhood-onset and adult-onset asthma risk, applying state-of-the-art methods implemented in our lab's analytical pipeline. For the first time, data on age of onset of asthma became available through the UK Biobank for a sufficiently large number of individuals. We seized the opportunity to identify novel genetic factors that are specific to childhood-onset cases as well as those shared with adult-onset cases. We found that the genetic factors of adult-onset asthma are a subset of those of childhood-onset asthma, with overall smaller effects, suggesting a larger environmental component in adult cases. Tissue enrichment analysis suggested a role for allergy and epithelial barrier dysfunction in childhood-onset asthma and lung involvement in adult-onset asthma. Both types shared immune components. More details can be found in this paper.