PredictDB: Transcriptome Prediction Model Repository

Published

January 7, 2020


Here you can find transcriptome prediction models for the PrediXcan family of methods: S-PrediXcan, MultiXcan and S-MultiXcan. .db files are prediction models, usable by all methods. .txt.gz files are compilations of LD reference for summary-based methods (S- prefix). S-PrediXcan is meant to use the single-tissue LD reference files (“covariances”) appropriate to each model. S-MultiXcan uses single-tissue prediction models and a cross-tissue LD reference.
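Before running any method, it can help to look inside a model file. The following is a minimal sketch, assuming the .db files are SQLite databases and that they contain a table named `weights` with a `gene` column; those names are assumptions not stated on this page, so confirm them with `list_tables` first. The toy database and its values are made up for illustration.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def variants_per_gene(conn, table="weights", gene_column="gene"):
    """Count rows (variants) per gene in the assumed weights table.

    The table and column names are assumptions; check them with
    list_tables() before relying on them for a real model file.
    """
    query = (
        f'SELECT "{gene_column}", COUNT(*) FROM "{table}" '
        f'GROUP BY "{gene_column}" ORDER BY "{gene_column}"'
    )
    return dict(conn.execute(query).fetchall())


def build_toy_model():
    """Build a tiny in-memory stand-in for a model file (made-up values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weights (gene TEXT, rsid TEXT, weight REAL)")
    conn.executemany(
        "INSERT INTO weights VALUES (?, ?, ?)",
        [
            ("GENE_A", "rs1", 0.12),
            ("GENE_A", "rs2", -0.05),
            ("GENE_B", "rs3", 0.30),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    toy = build_toy_model()
    print(list_tables(toy))
    print(variants_per_gene(toy))
```

To try this on a downloaded model, replace `build_toy_model()` with `sqlite3.connect(path)`, where `path` points to one of the .db files.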

GTEx v8 models on eQTL and sQTL

We have produced different families of prediction models for sQTL and eQTL, using several prediction strategies, on GTEx v8 release data.

We recommend the MASHR-based models below. The Elastic Net-based models are a safe, robust alternative with decreased power.

MASHR-based models

Expression and splicing prediction models with LD reference data are available in this Zenodo repository.

Files:

* mashr_eqtl.tar: PrediXcan’s and S-PrediXcan’s support on expression
* mashr_sqtl.tar: PrediXcan’s and S-PrediXcan’s support on splicing
* gtex_v8_expression_mashr_snp_smultixcan_covariance.txt.gz: S-MultiXcan expression’s LD reference
* gtex_v8_splicing_mashr_snp_smultixcan_covariance.txt.gz: S-MultiXcan splicing’s LD reference
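Since S-PrediXcan needs the LD reference that matches each model, a quick consistency check on an archive's contents can be useful. This is a rough sketch, assuming that a model `X.db` and its covariance file `X.txt.gz` share the same stem `X`; that naming convention and the example file names are assumptions, not something this page guarantees.

```python
from pathlib import PurePosixPath


def pair_models_with_covariances(names):
    """Group archive member names by shared stem.

    Returns {stem: {"model": name or None, "covariance": name or None}}.
    Assumes a model 'X.db' and its LD reference 'X.txt.gz' share the stem X.
    """
    pairs = {}
    for name in names:
        filename = PurePosixPath(name).name
        if filename.endswith(".db"):
            stem, kind = filename[: -len(".db")], "model"
        elif filename.endswith(".txt.gz"):
            stem, kind = filename[: -len(".txt.gz")], "covariance"
        else:
            continue  # neither a model nor an LD reference
        entry = pairs.setdefault(stem, {"model": None, "covariance": None})
        entry[kind] = name
    return pairs


def unpaired(pairs):
    """Return the stems that lack either a model or a covariance file."""
    return sorted(stem for stem, entry in pairs.items() if None in entry.values())


if __name__ == "__main__":
    # Hypothetical member names, for illustration only.
    example = [
        "models/tissue_a.db",
        "models/tissue_a.txt.gz",
        "models/tissue_b.db",
    ]
    print(unpaired(pair_models_with_covariances(example)))
```

For a real archive, the list of names could come from `tarfile.open("mashr_eqtl.tar").getnames()` in the Python standard library.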

Warning: these models are based on fine-mapped variants that may occasionally be absent in a typical GWAS, and are frequently absent in older GWAS. We have tools to address this, presented here. A tutorial is available here.
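To gauge how much this warning applies to a given study, one can measure what fraction of a model's variants appear in the GWAS. The sketch below is only a diagnostic, assuming both sides use the same variant identifiers; it is not the tooling referred to above and does nothing to recover missing variants. The function name and example identifiers are hypothetical.

```python
def model_variant_coverage(model_variants, gwas_variants):
    """Return (fraction of model variants found in the GWAS, sorted missing IDs).

    Both arguments are iterables of variant identifiers that are assumed
    to follow the same naming scheme (for example, all rsIDs).
    """
    model = set(model_variants)
    if not model:
        raise ValueError("model_variants is empty")
    present = model & set(gwas_variants)
    missing = sorted(model - present)
    return len(present) / len(model), missing


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    coverage, missing = model_variant_coverage(
        ["rs1", "rs2", "rs3", "rs4"],
        ["rs1", "rs3", "rs9"],
    )
    print(f"coverage={coverage:.2f}, missing={missing}")
```

A low fraction suggests that the GWAS should be harmonized or imputed before applying these models.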

Acknowledging these models: If you use these models in your research, please cite:

* “Widespread dose-dependent effects of RNA expression and splicing on complex diseases and traits”, Barbeira et al, 2019, preprint
* “A gene-based association method for mapping traits using reference transcriptome data”, Gamazon et al, 2015, Nature Genetics
* “Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics”, Barbeira et al, 2018, Nature Communications

If you use S-MultiXcan, we ask you to cite:

* “Integrating predicted transcriptome from multiple tissues improves association detection”, Barbeira et al, 2019, PLOS Genetics

Elastic Net

Expression and splicing prediction models with LD reference data are available in this Zenodo repository.

Files:

Acknowledging these models: If you use these models in your research, we ask you to cite:

* “The GTEx Consortium atlas of genetic regulatory effects across human tissues”, Aguet et al, 2019, preprint
* “A gene-based association method for mapping traits using reference transcriptome data”, Gamazon et al, 2015, Nature Genetics
* “Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics”, Barbeira et al, 2018, Nature Communications

If you use S-MultiXcan, we ask you to cite:

* “Integrating predicted transcriptome from multiple tissues improves association detection”, Barbeira et al, 2019, PLOS Genetics

GTEx v7 Expression models

Expression prediction models with LD reference data are available in this Zenodo repository. The underlying algorithm is Elastic Net.

Additional support information and details are available in the Zenodo repository.

Acknowledging these models: If you use these models in your research, we ask you to cite:

* “A gene-based association method for mapping traits using reference transcriptome data”, Gamazon et al, 2015, Nature Genetics
* “Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics”, Barbeira et al, 2018, Nature Communications

If you use S-MultiXcan, we ask you to cite:

* “Integrating predicted transcriptome from multiple tissues improves association detection”, Barbeira et al, 2019, PLOS Genetics

GTEx v6 Expression models

Expression prediction models with LD reference data are available in this Zenodo repository.

Additional support information and details are available in the Zenodo repository.

Acknowledging these models: If you use these models in your research, we ask you to cite:

* “A gene-based association method for mapping traits using reference transcriptome data”, Gamazon et al, 2015, Nature Genetics
* “Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics”, Barbeira et al, 2018, Nature Communications

If you use S-MultiXcan, we ask you to cite:

* “Integrating predicted transcriptome from multiple tissues improves association detection”, Barbeira et al, 2019, PLOS Genetics

Models from collaborators and other sources:

MESA models

Single-tissue expression prediction models with LD reference data are available in this Zenodo repository. The underlying algorithm is Elastic Net, trained on the MESA multi-ethnic cohort.

These models were presented in “Genetic architecture of gene expression traits across diverse populations”, Mogil et al, 2018, PLOS Genetics. Please cite it if you find these models useful.

CommonMind consortium

Single-tissue expression prediction models with LD reference data are available in this GitHub repository. The underlying algorithm is Elastic Net.

These models were presented in “Gene expression imputation across multiple brain regions provides insights into schizophrenia risk”, Huckins et al, 2019, Nature Genetics. Please cite it if you find these models useful.

EpiXcan Models

Expression prediction models with LD reference data are available on this website. The models were trained on data from the CommonMind Consortium, GTEx, and STARNET. The underlying algorithm is Elastic Net, informed by epigenetic data.

These models were presented in “Integrative transcriptome imputation reveals tissue-specific and shared biological mechanisms mediating susceptibility to complex traits”, Zhang et al, 2019, Nature Communications. Please cite it if you find these models useful.