In this report I calculate lower bounds on the p-values attainable with very rare variants, for which minor allele counts are in the single digits. The report was prepared back in 2012 for the T2D-GENES consortium, which had just generated 10K whole-exome sequences.
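To see why such a lower bound exists, here is a minimal sketch. It is not necessarily the calculation used in the report. It assumes a one-sided Fisher exact test on carrier status, and the 5,000 cases and 5,000 controls are hypothetical numbers chosen for illustration. The most extreme table places every carrier among the cases, so no smaller p-value can be reached for a given carrier count.

```python
from math import comb


def min_fisher_p(n_cases, n_controls, n_carriers):
    """Smallest one-sided Fisher exact p-value for a given carrier count.

    Assumption (not from the report): the test is a one-sided Fisher exact
    test on carrier status, and the most extreme outcome is that every
    carrier is a case. Its probability under the null is hypergeometric:
    C(n_cases, k) / C(n_cases + n_controls, k).
    """
    if n_carriers < 0 or n_carriers > n_cases:
        raise ValueError("n_carriers must be between 0 and n_cases")
    total = n_cases + n_controls
    return comb(n_cases, n_carriers) / comb(total, n_carriers)


if __name__ == "__main__":
    # Hypothetical balanced design: 5,000 cases and 5,000 controls.
    for k in range(1, 10):
        print(k, min_fisher_p(5000, 5000, k))
```

In a balanced design a single carrier can never give a one-sided p-value below 0.5, and each additional carrier roughly halves the bound, so single-digit allele counts cannot reach genome-wide significance thresholds under these assumptions.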
We have opened direct access to the gene2pheno database, where we are hosting the PrediXcan results for close to 3000 phenotypes (from public GWAS meta-analysis results and UK Biobank results from Ben Neale/HAIL team). Below are R functions that will allow you to access and query the database. These results are based on GTEx V6p models, and details of the analysis can be found in our preprint (link to preprint), now in press in Nature Communications.
We are releasing prediction models trained on GTEx Version 7 data. Download from here. We have updated our processing pipeline and restricted the analysis to individuals of European ancestry to obtain more reliable LD data. This reduces false positive associations in the Summary Version of PrediXcan. Because of this choice, the gain in sample size relative to V6p is modest (ranging from -18 to 89), with whole blood, LCLs and fibroblasts ending up with smaller sample sizes than before.
After two years in the Lab, Scott has decided to join the well-paid workforce. Scott is a wonderful colleague to all of us and has made important contributions to our team. Thank you and best of luck, Scott 🍀🍀🍀
Also thank you, Wenndy, for organizing and buying the present for Scott.
Keep in mind that the significant associations shown here do not imply causality. That said, PrediXcan tests the role of gene expression variation in traits, and we and others have shown that significant PrediXcan genes are enriched in causal genes, so these results should be useful for delving into the mechanisms underlying gene-to-phenotype associations. False positives can arise because of several factors. LD contamination: by computing the probability of LD contamination, we try to reduce false positives that are due to LD rather than genuine colocalization of trait and expression causal variants.
PrediXcan and other transcriptome-wide association study (TWAS) methods discover and prioritize genes based on a functional mechanism: regulation of gene expression. We agree that over-enthusiasm has to be tempered, but Wainberg et al.’s paper could represent a backlash against the enthusiasm that the community has for this approach, which we believe is well placed. Below are our responses to some of the statements in the paper.
PrediXcan/TWAS associations do not imply causality
Most GWAS and eQTL studies have been performed in European samples. So how well do models trained in Europeans translate to other populations? Segal et al. have shown that predictions of gene expression levels are robust across populations (link). The following figure shows the p-value of the correlation between predicted and observed expression levels in European and African samples from the 1000 Genomes set (GEUVADIS RNA-seq), using models trained in GTEx, where the majority of individuals are European.
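For readers who want to reproduce this kind of comparison on their own data, here is a minimal sketch of a per-gene correlation p-value. The post does not say which test produced the figure, so this assumes a Pearson correlation with a two-sided p-value from the Fisher z approximation. The function names and the example expression values are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_p_value(predicted, observed):
    """Return (r, p) for one gene across samples.

    Assumption: p is a two-sided p-value from the Fisher z approximation,
    z = atanh(r) * sqrt(n - 3), compared against a standard normal.
    """
    n = len(predicted)
    if n != len(observed) or n < 4:
        raise ValueError("need two sequences of equal length with n >= 4")
    r = pearson_r(predicted, observed)
    # Clamp to keep atanh finite when the correlation is numerically +/-1.
    r_clamped = max(min(r, 1 - 1e-12), -1 + 1e-12)
    z = atanh(r_clamped) * sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p


if __name__ == "__main__":
    # Hypothetical predicted and observed expression for one gene.
    predicted = [0.1, 0.4, 0.35, 0.8, 0.9, 1.2, 1.1, 1.5]
    observed = [0.2, 0.3, 0.5, 0.7, 1.0, 1.1, 1.3, 1.4]
    print(correlation_p_value(predicted, observed))
```

Running this separately on the European and the African samples for each gene would give the two sets of p-values that such a figure compares. An exact t-based p-value could be substituted where a statistics library is available.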
Update (11/1/2017): Gene-level results “meta-analyzed” across tissues are now available. A Shiny app to directly query by gene and phenotype is here, but it can be slow. Stay tuned for a faster version coming soon. As many of you may know, the Neale Lab made a big splash in the GWAS community by releasing the summary results of 2400+ phenotypes from the UK Biobank. Following their lead on open science and open data sharing, we are releasing the bulk runs of PrediXcan associations based on the Neale Lab’s UKB results and the 44 tissue models from the GTEx V6p release.
We are delighted to have been awarded NIH Cloud Credits, which will help us fund our cloud-based web applications and databases and share our tools and resources broadly.
Thank you, Jiamao, for pushing this through 👍😃
Information about Cloud Credits here