Welcome to the IM Lab’s home page

We develop quantitative and computational methods and tools to sift through the vast amounts of genomic and other high-dimensional data, with the goal of making discoveries that can be translated to improve human health.

Guimin Gao is a collaborator who likes to join our lab meetings.


The overarching goal of our research program is to develop computationally efficient and statistically sound methods for sifting through the vast amounts of genomic and other high-dimensional data, so that the resulting discoveries can be translated to improve human health.

To achieve this goal, we build and maintain an analytic infrastructure, i.e. a dry lab. This includes statistical and computational methods, pipelines and workflows, streamlined access to big data, and a team of research software engineers, bioinformaticians and analysts. This infrastructure constitutes a powerful instrument that gives us a unique vantage point from which discoveries can be made.

Check this link for some highlights of the methods we have developed and future directions of the lab.


The full list of our publications is available on Google Scholar via this link.

Alex Pearson's Visit

2021-02-18 Haky Im
Alex Pearson shared with us his research on deep learning applied to medical outcomes. Read more →

IGES journal club reads the original PrediXcan paper

2021-01-13 Haky Im
UPDATE: The IGES Journal Club has been rescheduled to 27 Jan, 1 pm EDT. We will read Gamazon (2015) Nat Genet (PMID 26258848). Want to know how "transcriptome-wide association studies" work? We've got you covered. @FallinDani will lead the discussion. https://t.co/zNVvv439WM Read more →

Genomic Data Scientist Opening

2020-12-17 Haky Im
[…] We are looking to hire a life-long learner interested in developing statistical and computational tools to sift through large amounts of data with the ultimate goal of making discoveries that can make a difference in the health of people. Methods and resources created in our lab are being used by researchers around the globe. We continue innovating and providing user-friendly and statistically efficient tools to get the most out of the ever-increasing amount of data. We partner with large consortia and biomedical researchers to find the most pressing questions in the field and apply our statistical expertise to develop reliable and efficient approaches to answer them. Read more →

PredictDB: Transcriptome Prediction Model Repository

Here you can find transcriptome prediction models for the PrediXcan family of methods: S-PrediXcan, MultiXcan and S-MultiXcan. The .db files are prediction models, usable by all methods. The .txt.gz files are compilations of LD reference data for the summary-based methods (S- prefix). S-PrediXcan is meant to use the single-tissue LD reference files (“covariances”) appropriate to each model. S-MultiXcan uses single-tissue prediction models and a cross-tissue LD reference. […] We have produced different families of prediction models for sQTL and eQTL, using several prediction strategies, on GTEx v8 release data. We recommend the MASHR-based models below. Elastic Net-based models are a safe, robust alternative with decreased power. […] Expression and splicing prediction models with LD reference data are available in this Zenodo repository. Files: […] Warning: these models are based on fine-mapped variants that may occasionally be absent in a typical GWAS, and frequently absent in … Read more →
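As a quick way to look inside one of the .db model files before passing it to a PrediXcan-family tool, the sketch below lists the tables and columns it contains. It assumes the .db files are SQLite databases, which the text above does not state; the function name `describe_model_db` and any file path passed to it are hypothetical, and no table or column names are assumed.

```python
import sqlite3
import sys
from contextlib import closing


def describe_model_db(db_path):
    """Return {table_name: [column names]} for a SQLite file.

    Assumption: the prediction model file is a SQLite database.
    Nothing is assumed about which tables or columns it holds.
    """
    schema = {}
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [column[1] for column in columns]
    return schema


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_model_db.py path/to/some_model.db
    for table, columns in describe_model_db(sys.argv[1]).items():
        print(table, columns)
```

Inspecting the schema first avoids hard-coding table names that may differ between model families.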

GTEx V8 Model Release

We have recently published a new set of prediction models trained on GTEx v8 data (as part of efforts detailed in this preprint). We have overhauled the model construction, incorporating posterior inclusion probabilities and global patterns of tissue sharing, while also benefiting from larger sample sizes. We cover both expression and alternative splicing mechanisms. We are very excited about these new models and the potential for new discoveries. However, these models require additional preprocessing for some public GWAS, which we describe here. We are happy to announce the user-friendly tutorial and detailed documentation. Our new recommended models use fine-mapped variants (as computed by DAP-G). These variants have a high probability of being causal for QTL. The model effect sizes are computed leveraging cross-tissue patterns with MASHR. These new models are parsimonious, efficient, available for more genes, and have many benefits like improved rate of colocalized … Read more →

Qqplot Calibration Rare Variants

[In this report](https://s3.amazonaws.com/imlab-open/Webdata/Files/2018/qqplot-calibration.pdf) I calculate the lower bounds of p-values when using very rare variants, for which minor allele counts are in the single digits. This report was prepared back in 2012 for the T2D-GENES consortium, which had just generated 10K whole-exome sequences. Read more →
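To illustrate why very rare variants put a floor under achievable p-values, here is a minimal sketch. It is not the calculation from the report: it assumes a one-sided Fisher exact test on carrier status in a case-control study, with each minor allele carried by a different individual. Under those assumptions the most extreme possible table has every carrier among the cases, and its hypergeometric probability is the smallest attainable p-value. The function name and the sample sizes are hypothetical.

```python
from math import comb


def min_fisher_pvalue(minor_allele_count, n_cases, n_controls):
    """Smallest one-sided Fisher exact p-value for a given carrier count.

    Assumptions (illustrative only): each minor allele sits in a distinct
    individual, and the most extreme outcome is all carriers being cases.
    """
    if minor_allele_count > n_cases:
        raise ValueError("carriers cannot all be cases if there are more carriers than cases")
    total = n_cases + n_controls
    # Probability that all carriers fall among the cases when carrier
    # labels are assigned at random (hypergeometric point probability).
    return comb(n_cases, minor_allele_count) / comb(total, minor_allele_count)


if __name__ == "__main__":
    # Hypothetical balanced design: 5,000 cases and 5,000 controls.
    for mac in range(1, 10):
        print(mac, min_fisher_pvalue(mac, 5000, 5000))
```

In a balanced design the bound is roughly 0.5 raised to the minor allele count, so single-digit counts cannot reach genome-wide significance thresholds under this test however strong the true effect.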