PredictDB: Transcriptome Prediction Model Repository

Here you can find transcriptome prediction models for the PrediXcan family of methods: S-PrediXcan, MultiXcan and S-MultiXcan. .db files are prediction models, usable by all methods. .txt.gz files are compilations of LD reference for summary-based methods (S- prefix). S-PrediXcan is meant to use the single-tissue LD reference files (“covariances”) appropriate to each model. S-MultiXcan uses single-tissue prediction models and a cross-tissue LD reference.

GTEx v8 models on eQTL and sQTL

We have produced different families of prediction models for sQTL and eQTL, using several prediction strategies, on GTEx v8 release data.
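As a quick way to see what one of the .db prediction models contains, the sketch below lists its tables and columns. It assumes the .db files are SQLite databases, which this page does not state explicitly; the function name `describe_sqlite` and the file name in the usage line are hypothetical.

```python
import sqlite3


def describe_sqlite(path):
    """Return {table_name: [column names]} for an SQLite file.

    Assumption: the model file at `path` is an SQLite database.
    """
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            quoted = table.replace('"', '""')
            columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[table] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python describe_model.py some_model.db
    if len(sys.argv) > 1:
        for table, columns in describe_sqlite(sys.argv[1]).items():
            print(table, columns)
```

Inspecting the schema first avoids hard-coding table or column names, which may differ between model releases.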

GTEx V8 Model Release

We have recently published a new set of prediction models trained on GTEx v8 data (as part of efforts detailed in this preprint). We have overhauled the model construction, incorporating posterior inclusion probabilities and global patterns of tissue sharing, while also benefiting from larger sample sizes. We cover both expression and alternative splicing mechanisms. We are very excited about these new models and the potential for new discoveries. However, these models require additional GWAS preprocessing for some public GWAS studies, which we describe here.

Qqplot Calibration Rare Variants

In this report I calculate the lower bounds of p-values when using very rare variants, for which minor allele counts are in the single digits. This report was prepared back in 2012 for the T2D-GENES consortium, which had just generated 10K whole-exome sequences.
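To illustrate why such a lower bound exists, here is a minimal sketch under assumptions that are not taken from the report itself: a case-control design, a one-sided Fisher's exact test, and a minor allele count equal to the number of carriers (each carrier has one copy). The most extreme possible table puts every carrier among the cases, so its hypergeometric probability is the smallest p-value the test can return. The function name `min_one_sided_p` is hypothetical.

```python
from math import comb


def min_one_sided_p(n_cases, n_controls, carriers):
    """Smallest one-sided Fisher's exact p-value for a given carrier count.

    Assumes the most extreme outcome: all carriers are cases.
    Its probability under the null is C(n_cases, k) / C(n_cases + n_controls, k).
    """
    if carriers < 0 or carriers > n_cases:
        raise ValueError("carriers must be between 0 and n_cases")
    total = n_cases + n_controls
    return comb(n_cases, carriers) / comb(total, carriers)


if __name__ == "__main__":
    # Balanced design with 5,000 cases and 5,000 controls (illustrative numbers).
    for k in range(1, 10):
        print(k, min_one_sided_p(5000, 5000, k))
```

With a single carrier in a balanced design the bound is 0.5, so no sample size can make a singleton significant on its own; the bound only shrinks as the carrier count grows.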

How to query our gene2pheno database directly

We have opened direct access to the gene2pheno database, where we are hosting the PrediXcan results of close to 3000 phenotypes (from public GWAS meta-analysis results and UK Biobank results from Ben Neale/HAIL team). Below are R functions that will allow you to access and query the database. These results are based on GTEx V6p models, and details of the analysis can be found in our preprint (link to preprint), now in press in Nature Communications.

GTEx V7 Prediction Model Release Announcement

We are releasing prediction models trained on GTEx Version 7 data. Download from here. We have updated our processing pipeline, and restricted to individuals of European ancestry to obtain more reliable LD data. This reduces false positive associations in the Summary Version of PrediXcan. Because of this choice, the gain in sample size relative to V6p is modest (ranging from -18 to 89), with whole blood, LCLs and fibroblasts experiencing reduced sample size.

Bidding Farewell to Scott

After two years in the Lab, Scott has decided to join the well-paid workforce. Scott is a wonderful colleague to all of us and has made important contributions to our team. Thank you and best of luck, Scott 🍀🍀🍀

Also thank you, Wenndy, for organizing and buying the present for Scott.

Limitations of PrediXcan association results

Keep in mind that significant associations shown here do not imply causality. That said, given that PrediXcan is seeking to test the role of gene expression variation on traits and we and others have shown that significant PrediXcan genes are enriched in causal genes, these results should be useful to delve into the mechanisms underlying gene to phenotype associations. False positives can arise because of several factors.

LD contamination

By computing the probability of LD contamination, we try to reduce false positives due to LD rather than genuine colocalization of trait and expression causal variants.

Vulnerabilities of the 'Vulnerabilities of transcriptome wide association studies' argument

PrediXcan and other transcriptome wide association study (TWAS) methods discover and prioritize genes based on a functional mechanism: regulation of gene expression. We agree that we have to temper over-enthusiasm, but Wainberg et al’s paper could represent a backlash to the enthusiasm that the community has for this approach, which we believe is well placed. Below are our responses to some of the statements of the paper.

PrediXcan/TWAS associations do not imply causality

Transcriptome prediction models are robust across populations

Most GWAS and eQTL studies have been performed in European samples. So how well do models trained in Europeans translate to other populations? Segal et al have shown that predictions of gene expression levels are robust across populations (link). The following figure shows the p-value of the correlation between predicted and observed expression levels in European and African samples from the 1000 Genomes set (GEUVADIS RNA-seq), using models trained in GTEx with a majority of European individuals.

We are releasing all PrediXcan associations based on Neale Lab's UKB rapid GWAS results

Update (11/1/2017): Gene-level results “meta analyzed” across tissues are now available. A Shiny app to directly query by gene and phenotype is here, but it can be slow. Stay tuned for a faster version coming soon.

As many of you may know, Neale Lab made a big splash in the GWAS community by releasing the summary results of 2400+ phenotypes from the UK Biobank. Following their lead on open science and open data sharing, we are releasing the bulk runs of PrediXcan associations based on Neale Lab’s UKB results and 44 tissue models from the GTEx V6p release.