Boston, Massachusetts

Joy Shi

Epidemiologist. Investigator. Methodologist.

I am an Assistant Professor of Medicine in the Mongan Institute at Massachusetts General Hospital and Harvard Medical School. I am also an Instructor in CAUSALab at Harvard T.H. Chan School of Public Health. My research focuses on applying and advancing causal inference methods to improve clinical and policy decision-making for chronic diseases. My work leverages large administrative health databases to evaluate the comparative effectiveness of interventions for cancer and cardiovascular diseases.

About Me

Research Interests

My current research focuses on applying causal inference methods in administrative health databases and other observational data to conduct target trial emulations that inform clinical and policy decision-making. This work has largely focused on evaluating screening strategies for colorectal cancer and prostate cancer using data from the Nordic-European Initiative on Colorectal Cancer (NordICC) trial and electronic health record data from the Veterans Health Administration, respectively.

Methodologically, my work addresses some of the limitations of instrumental variable methods, statistical techniques that can be used to estimate causal effects even in the presence of unmeasured confounding. This includes (1) developing instrumental variable methods for time-varying treatments and outcomes (via g-estimation of structural models) and applying these methods to Mendelian randomization (MR) studies; (2) evaluating and quantifying the impact of different study designs for Mendelian randomization; and (3) mitigating common sources of selection bias in Mendelian randomization studies.

Previous research (during my MSc in Epidemiology at Queen’s University and as a Data Analyst at the Center for Global Child Health at the Hospital for Sick Children) includes work in cancer epidemiology, genetics, and global child health and nutrition.

Teaching

I teach methods for causal inference at the Harvard T.H. Chan School of Public Health (as part of the Department of Epidemiology and CAUSALab).

You can find information about some of the courses I teach here, as well as access to some of my teaching and course-related materials here.

Current Roles

  • Assistant Professor of Medicine
    Health Policy Research Center, Massachusetts General Hospital
    Department of Medicine, Harvard Medical School
  • Instructor
    CAUSALab and Department of Epidemiology, Harvard T.H. Chan School of Public Health

Education

  • PhD, Population Health Sciences
    Harvard T.H. Chan School of Public Health, USA
  • SM, Biostatistics
    Harvard T.H. Chan School of Public Health, USA
  • MSc, Epidemiology
    Queen's University, Canada
  • BScH, Life Science
    Queen's University, Canada

Currently Teaching

  • Fundamentals of Confounding Adjustment, CAUSALab
  • Advanced Confounding Adjustment, CAUSALab
  • Target Trial Emulation, CAUSALab
  • Models for Causal Inference (EPI289), Harvard T.H. Chan School of Public Health

Interests

Causal inference, Methods, Target trial emulation, Cancer screening, Comparative effectiveness, Health policy

Publications

You can find a full list of my publications here.

* denotes shared first authorship

December 2025

Evidence for clinical treatment decisions without randomized data

Shi J, Wasfy JH

JAMA Internal Medicine DOI

August 2025

Mendelian randomization, lipids and coronary artery disease: trade-offs between study designs and assumptions

Shi J, Swanson SA, Diemer EW, Gerlovin H, Posner DC, Wilson PW, Gaziano JM, Cho K, Hernán MA on behalf of the VA Million Veteran Program

American Journal of Epidemiology DOI

April 2025

Effect of colonoscopy screening on risks of colorectal cancer and related death: instrumental variable estimation of per-protocol effects

Shi J, Løberg M, Kalager M, Wieszczy P, Pilonis ND, Adami HO, Kaminski MF, Bretthauer M, Hernán MA, for the NordICC study group

European Journal of Epidemiology DOI

November 2024

Reparations for African enslavement in the U.S. and Black survival using the Panel Study of Income Dynamics

Lawrence JA*, Jahn JL*, Shi J*, Himmelstein KEW, Feldman JM, Linos N, Bassett MT

American Journal of Epidemiology DOI

September 2024

Methodological approaches to structural change: epidemiology and the case for reparations

Lawrence JA*, Shi J*, Jahn JL*, Himmelstein KEW, Feldman JM, Bassett MT

American Journal of Epidemiology DOI

August 2024

PSA screening and prostate cancer mortality: an emulation of target trials in U.S. Medicare

García-Albéniz X, Hsu J, Etzioni R, Chan JM, Shi J, Dickerman BA, Hernán MA

JCO Clinical Cancer Informatics DOI

January 2023

Risk prediction models for endometrial cancer: development and validation in an international consortium

Shi J, Kraft P, Rosner BA, Benavente Y, Black A, Brinton LA, Chen C, Clarke MA, Cook LS, Costas L, Dal Maso L, Freudenheim JL, Frias-Gomez J, Friedenreich CM, Garcia-Closas M, Goodman MT, Johnson L, La Vecchia C, Levi F, Lissowska J, Lu L, McCann SE, Moysich KB, Negri E, O'Connell K, Parazzini F, Petruzella S, Polesel J, Ponte J, Rebbeck TR, Reynolds P, Ricceri F, Risch HA, Sacerdote C, Setiawan VW, Shu XO, Spurdle AB, Trabert B, Webb PM, Wentzensen N, Wilkens LR, Xu WH, Yang HP, Yu H, Du M, De Vivo I

Journal of the National Cancer Institute DOI

October 2022

Effect of colonoscopy screening on risks of colorectal cancer and related death

Bretthauer M, Løberg M, Wieszczy P, Kalager M, Emilsson L, Garborg K, Rupinski M, Dekker E, Spaander M, Bugajski M, Holme Ø, Zauber AG, Pilonis ND, Mroz A, Kuipers EJ, Shi J, Hernán MA, Adami HO, Regula J, Hoff G, Kaminski MF

New England Journal of Medicine DOI

January 2022

Mendelian randomization with repeated measures of a time-varying exposure: an application of structural mean models

Shi J, Swanson SA, Kraft P, Rosner B, De Vivo I, Hernán MA

Epidemiology DOI

November 2021

Instrumental variable estimation for a time-varying treatment and a time-to-event outcome via structural nested cumulative failure time models

Shi J, Swanson SA, Kraft P, Rosner B, De Vivo I, Hernán MA

BMC Medical Research Methodology DOI

Teaching

I teach courses in epidemiologic and causal inference methods. You can find a complete description of my teaching experience here.

▸ Below are some of the graduate-level courses I have taught or currently teach at Harvard.

Epidemiologic Methods III - Models for Causal Inference (EPI289)

2023 - Present

Harvard T.H. Chan School of Public Health

Causal inference is a fundamental component of epidemiologic research. This course describes models for causal inference, their application to epidemiologic data, and the assumptions required to endow the parameter estimates with a causal interpretation. More information available here.

Comparative Effectiveness Research I (CI722)

2021 - 2023

Harvard Medical School

This course introduces causal inference methodology for settings in which randomized trials are not available. The course focuses on the use of epidemiologic studies, electronic health records, and other sources of observational data for comparative effectiveness and safety research. More information available here.

Comparative Effectiveness Research II (CI732)

2022 - 2023

Harvard Medical School

This course builds on the foundational concepts from CI722 and applies them to real-world comparative effectiveness research. It covers advanced topics including the target trial framework, time-varying exposure and confounding, analysis of longitudinal data, and sensitivity analysis. More information available here.

▸ I also teach as part of the CAUSALab Summer Courses on causal inference.

Fundamentals of Confounding Adjustment

2025 - Present

CAUSALab

Causal inference from observational data often relies on appropriate adjustment for confounders. This online course uses a combination of video lectures and hands-on exercises to introduce different methods to adjust for confounding in the context of time-fixed treatments. More information available here.

Advanced Confounding Adjustment

2023 - Present

CAUSALab

In time-varying settings, advanced g-methods for confounding adjustment—inverse probability weighting and the parametric g-formula—are needed. This course focuses on the implementation of these methods in increasingly complex analytical settings using a combination of lectures and hands-on sessions. More information available here.

Target Trial Emulation

2022 - Present

CAUSALab

Causal inference from observational data can be conceptualized as an attempt to emulate a pragmatic randomized trial—the target trial. Through a combination of lectures and hands-on sessions, the course introduces the target trial emulation framework in increasingly complex settings and dissects examples of emulations in the health sciences and related fields. More information available here.

▸ You can find some of my teaching materials below.

Directed acyclic graphs

An introduction to the basic components of directed acyclic graphs (DAGs), how to identify structural sources of bias (i.e., confounding, selection bias, information bias) using a DAG, and extensions to time-varying treatments.

Slide deck 1
Slide deck 2
Exercises

Standardization for time-fixed treatments

An introduction to standardization for time-fixed treatments, with a focus on (1) how to interpret standardized estimates, (2) how to use models to obtain standardized estimates, and (3) how to use bootstrapping to obtain 95% confidence intervals.

Slide deck
Exercises
R code

Instrumental variable estimation for time-fixed treatments

Instrumental variable (IV) estimation for time-fixed treatments, with an emphasis on the underlying assumptions of IV, common estimators for IV, and implementation of the methods in R.

Slide deck
R code

Measurement bias

Structures and mechanisms of different types of measurement bias.

Slide deck
Exercises

Introduction to time-varying treatments

An introduction to formulating causal questions for time-varying treatments, depicting time-varying treatments on a directed acyclic graph, and understanding why conventional methods fail.

Slide deck

Inverse probability weighting for time-varying treatments

Extensions of inverse probability weighting to time-varying treatment strategies.

Slide deck
R code
Dataset

Software

Below, you’ll find some of my statistical code from various projects and publications.

A full list of my programs is available on my GitHub page.

MR data challenge

Data challenge developed for “The Future of Mendelian Randomization Studies 2021” workshop.

Files

Structural mean models with instrumental variables

R code for g-estimation of structural mean models for estimating the causal effect of a time-varying treatment using instrumental variables.

Software
Publication

Structural nested cumulative failure time models

R code for g-estimation of structural nested cumulative failure time models via confounding adjustment or using an instrumental variable.

Software
Publication

Risk prediction model for endometrial cancer

R code for developing and validating an absolute risk prediction model for endometrial cancer.

Software
Publication

Let's Connect

Contact Information

Feel free to reach out to connect! I'm always interested in hearing about new projects and opportunities.

Location

Boston, Massachusetts
