Posts

Here is some recent news.

Alex Pearson's Visit

2021-02-18 Haky Im
Alex Pearson shared with us his research on deep learning applied to medical outcomes. Read more →

IGES journal club reads the original PrediXcan paper

2021-01-13 Haky Im
UPDATE: The IGES Journal Club has been rescheduled to 27 Jan, 1 pm EDT. We will read Gamazon (2015) Nat Genet (PMID 26258848). Want to know how "transcriptome-wide association studies" work? We've got you covered. @FallinDani will lead the discussion. https://t.co/zNVvv439WM Read more →

PredictDB: Transcriptome Prediction Model Repository

2020-01-07
Here you can find transcriptome prediction models for the PrediXcan family of methods: S-PrediXcan, MultiXcan and S-MultiXcan. .db files are prediction models, usable by all methods. .txt.gz files are compilations of LD reference for summary-based methods (S- prefix). S-PrediXcan is meant to use the single-tissue LD reference files (“covariances”) appropriate to each model. S-MultiXcan uses single-tissue prediction models and a cross-tissue LD reference. […] We have produced different families of prediction models for sQTL and eQTL, using several prediction strategies, on GTEx v8 release data. We recommend the MASHR-based models below. Elastic Net-based models are a safe, robust alternative with decreased power. […] Expression and splicing prediction models with LD reference data are available in this Zenodo repository. Files: […] Warning: these models are based on fine-mapped variants that may occasionally be absent in a typical GWAS, and frequently absent in … Read more →

GTEx V8 Model Release

2019-12-11
We have recently published a new set of prediction models trained on GTEx v8 data (as part of efforts detailed in this preprint). We have overhauled the model construction, incorporating posterior inclusion probabilities and global patterns of tissue sharing, while also benefiting from larger sample sizes. We cover both expression and alternative splicing mechanisms. We are very excited about these new models and the potential for new discoveries. However, these models require additional preprocessing for some public GWAS, which we describe here. We are happy to announce the user-friendly tutorial and detailed documentation. Our new recommended models use fine-mapped variants (as computed by DAP-G). These variants have a high probability of being causal for QTL. The model effect sizes are computed leveraging cross-tissue patterns with MASHR. These new models are parsimonious, efficient, available for more genes, and have many benefits like improved rate of colocalized … Read more →

Qqplot Calibration Rare Variants

2018-04-13
[In this report](https://s3.amazonaws.com/imlab-open/Webdata/Files/2018/qqplot-calibration.pdf) I calculate the lower bounds of p-values when using very rare variants, for which minor allele counts are in the single digits. This report was prepared back in 2012 for the T2D-GENES consortium, which had just generated whole-exome sequence data for 10K samples. Read more →
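To make the idea of a p-value lower bound concrete, here is a minimal sketch. It is not the method used in the report, which this excerpt does not show. It assumes a one-sided Fisher exact test on allele counts and a hypothetical balanced design of 10,000 case and 10,000 control chromosomes; the function name and the sample sizes are illustrative assumptions.

```python
from math import comb


def min_fisher_pvalue(minor_allele_count, n_case_alleles, n_control_alleles):
    """Smallest one-sided Fisher exact p-value attainable for a given
    minor allele count: the probability that every minor allele falls
    in cases under the hypergeometric null (assumed test, see lead-in)."""
    total_alleles = n_case_alleles + n_control_alleles
    return comb(n_case_alleles, minor_allele_count) / comb(total_alleles, minor_allele_count)


# Hypothetical design: 10,000 case and 10,000 control chromosomes.
for mac in range(1, 10):
    bound = min_fisher_pvalue(mac, 10_000, 10_000)
    print(f"minor allele count {mac}: p-value cannot go below {bound:.3g}")
```

Under these assumptions, a variant seen once can never reach a p-value below 0.5, and the bound shrinks roughly by half with each additional minor allele, so single-digit counts cannot reach genome-wide significance regardless of the true effect.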

How to query our gene2pheno database directly

2018-01-24
We have opened direct access to the gene2pheno database, where we are hosting the PrediXcan results of close to 3000 phenotypes (from public GWAS meta-analysis results and UK Biobank results from the Ben Neale/HAIL team). Below are R functions that will allow you to access and query the database. These results are based on GTEx V6p models, and details of the analysis can be found in our preprint (now in press at Nature Communications). Usual disclaimers apply. This is provided for your convenience and with the hope of being useful. We provide no guarantee of accuracy etc… […] The code below may not be completely self-explanatory. We are working on the documentation.

## First run the preliminary command further down
## this will fetch all associations for Schizophrenia
data = query.pheno("pgc_scz")
## this will fetch all associations for FTO
data = query.gene("FTO")
## these rather small tables will be handy
phenotb = dbGetQuery(db,"select * … Read more →

GTEx V7 Prediction Model Release Announcement

2018-01-08
We are releasing prediction models trained on GTEx Version 7 data. Download from here. We have updated our processing pipeline, and restricted to individuals of European ancestry to obtain more reliable LD data. This reduces false positive associations in the Summary Version of PrediXcan. Because of this choice, the gain in sample size relative to V6p is modest (ranging from -18 to 89), with whole blood, LCLs and fibroblasts experiencing reduced sample size. We developed new criteria to assess model performance. We have also decided to include prediction models for both pseudogenes and lincRNAs. While preparing GTEx V7 prediction models, we identified a few issues in the way prediction performance was estimated in the previous release (2016-09-08 release). In aggregate, these caused the prediction model performance to be overestimated. Reassuringly, predicted expression levels and the downstream associations with phenotypes remain mostly unchanged, even though prediction weights vary … Read more →

Bidding Farewell to Scott

2017-11-30
After two years in the Lab, Scott has decided to join the well-paid workforce. Scott is a wonderful colleague to all of us and has made important contributions to our team. Thank you and best of luck, Scott 🍀🍀🍀 Also thank you, Wenndy, for organizing and buying the present for Scott. Read more →

Limitations of PrediXcan association results

2017-11-07
Keep in mind that significant associations shown here do not imply causality. That said, given that PrediXcan is seeking to test the role of gene expression variation on traits and we and others have shown that significant PrediXcan genes are enriched in causal genes, these results should be useful to delve into the mechanisms underlying gene-to-phenotype associations. False positives can arise because of several factors […] By computing the probability of LD contamination, we try to reduce false positives due to LD rather than genuine colocalization of trait and expression causal variants. Filtering out genes with high probability of independent signals (P3>0.5) would reduce the number of hits due to LD contamination. […] Many loci show regulation of multiple genes. PrediXcan is not able to distinguish between truly causal genes and co-regulated but non-causal genes. […] Misspecified LD can lead to false positives (for example if LD in the study population is much … Read more →
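The P3 filter mentioned above can be sketched in a few lines. This is only an illustration: the record layout, the field names `gene` and `P3`, and the toy values are assumptions made for the example, not the format of the actual results.

```python
def filter_ld_contamination(associations, p3_threshold=0.5):
    """Keep associations whose probability of independent signals (P3)
    does not exceed the threshold; the rest are treated as likely LD contamination."""
    return [row for row in associations if row["P3"] <= p3_threshold]


# Toy records with made-up gene labels and P3 values (hypothetical, for illustration only).
toy_associations = [
    {"gene": "GENE_A", "P3": 0.05},
    {"gene": "GENE_B", "P3": 0.80},
    {"gene": "GENE_C", "P3": 0.50},
]

kept = filter_ld_contamination(toy_associations)
print([row["gene"] for row in kept])  # ['GENE_A', 'GENE_C']
```

Only GENE_B is dropped here, because the rule as stated removes genes with P3 strictly greater than 0.5.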

Vulnerabilities of the 'Vulnerabilities of transcriptome wide association studies' argument

2017-11-07
PrediXcan and other transcriptome-wide association study (TWAS) methods discover and prioritize genes based on a functional mechanism: regulation of gene expression. We agree that we have to temper over-enthusiasm, but Wainberg et al’s paper could represent a backlash to the enthusiasm that the community has for this approach, which we believe is well placed. Below are our responses to some of the statements of the paper. […] PrediXcan/TWAS associations do not imply causality […] Yes, we have explicitly stated that we do not claim causality but agree that the message needs to be emphasized. However, the over-interpretation of TWAS results as causal (which itself seems overstated by Wainberg et al) arises precisely because of the exciting aspect of TWAS as association testing ‘through a lens’ of a particular functional mechanism that we believe is very important, namely, regulation of gene expression. […] TWAS is vulnerable to finding multiple associated genes … Read more →