From theory to practice: implementing targeted learning for causal inference using real-world data

Written by The Evidence Base


As efforts to enhance the robustness and reliability of real-world evidence (RWE) in decision-making progress, deriving accurate causal inferences from real-world data (RWD) becomes increasingly crucial. Advanced methodologies, such as targeted maximum likelihood estimation (TMLE), a doubly robust, machine learning-based method, have been proposed to address the limitations of current approaches. What are the benefits of these advanced methodologies, and how are they perceived by decision-makers?

During a workshop at ISPOR 2024, ‘Targeted learning for causal inference using real-world data,’ experts shared their insights on applying novel causal inference methods within their fields. The workshop was led by Suzanne McMullen (Medlior Health Outcomes Research, Canada), who was joined by her colleague John Paul Ekwaru, as well as Mark van der Laan (University of California Berkeley, USA) and Stephen Duffield (National Institute for Health and Care Excellence [NICE], UK). In this Deep Dive, we recap the key takeaways and discussions from the presentations.


Introduction to targeted learning to generate RWE

Mark van der Laan set the scene, introducing the audience to the concept of targeted learning – a sub-field of statistics that integrates state-of-the-art causal modeling, machine learning (ML) and statistical inference [1,2], as depicted below.

Van der Laan acknowledged several issues associated with generating RWE from RWD in healthcare decision-making, including:

  • Selection bias
  • Intercurrent events
  • Informative missingness
  • Treatment by indication
  • High dimensional covariates
  • Outcome measurement error
  • Statistical model misspecification
  • Differences between external controls and single trial arms in randomized controlled trials

To address these challenges, the targeted learning roadmap provides a systematic, principled approach, offering a step-by-step guide to generating valid RWE and assessing its reliability.

Van der Laan discussed the intricacies of constructing estimators (step 4 of the roadmap). An estimator must be precisely defined and reproducible, which presents a significant challenge: constructing estimators that are not only effective but also reliable. He emphasized the need to ensure that estimators function robustly in the real-world applications where they truly matter. Furthermore, he stressed the importance of valid inference while utilizing all available data, even in complex settings with high-dimensional covariates, rare outcomes, and significant dropout rates. These challenges can be addressed through the general TMLE framework, whose implementation can be tailored to specific contexts.

Van der Laan provided an in-depth overview of the two-step TMLE methodology, referring readers to the books Targeted Learning and Targeted Learning in Data Science for more details. Of note, the super learner, an ensemble machine learning algorithm, can be employed with TMLE to enhance the flexibility and robustness of the estimation process: it combines multiple candidate estimators, such as linear regression, into an optimal weighted combination that maximizes predictive accuracy.
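The cross-validated weighting idea behind the super learner can be sketched in a few lines. The following is a toy illustration only, assuming a simulated one-covariate regression problem and just two candidate learners, with the convex weight chosen by grid search; real implementations (e.g. the SuperLearner and sl3 R packages) use larger candidate libraries and solve for the optimal weights directly.

```python
# Toy super learner: pick the convex combination of candidate learners
# that minimizes cross-validated mean squared error.
import random

random.seed(0)

# Simulated data: y depends linearly on x, plus noise (an assumption)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0, 2) for x in xs]

def fit_mean(x_tr, y_tr):
    """Candidate 1: predict the training mean (ignores x)."""
    m = sum(y_tr) / len(y_tr)
    return lambda x: m

def fit_linear(x_tr, y_tr):
    """Candidate 2: ordinary least squares on the single covariate."""
    n = len(x_tr)
    mx, my = sum(x_tr) / n, sum(y_tr) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_tr, y_tr))
    sxx = sum((a - mx) ** 2 for a in x_tr)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

candidates = [fit_mean, fit_linear]

# V-fold cross-validation: collect out-of-fold predictions per candidate
V = 5
cv_preds = [[None] * len(xs) for _ in candidates]
for v in range(V):
    test_idx = [i for i in range(len(xs)) if i % V == v]
    train_idx = [i for i in range(len(xs)) if i % V != v]
    x_tr = [xs[i] for i in train_idx]
    y_tr = [ys[i] for i in train_idx]
    for c, fit in enumerate(candidates):
        model = fit(x_tr, y_tr)
        for i in test_idx:
            cv_preds[c][i] = model(xs[i])

def cv_mse(w):
    """Cross-validated MSE of the weighted combination (w on the linear fit)."""
    return sum((w * cv_preds[1][i] + (1 - w) * cv_preds[0][i] - ys[i]) ** 2
               for i in range(len(ys))) / len(ys)

# Grid search over convex weights; the better candidate earns more weight
best_w = min((k / 20 for k in range(21)), key=cv_mse)
print(f"weight on linear candidate: {best_w:.2f}")
```

Because the linear candidate matches the simulated data-generating process, the cross-validated weight lands at or near 1; with a misspecified candidate library, the weights spread across learners instead.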

To summarize, audience polling showed agreement on the key features of doubly robust, machine learning-based methods:

  • Only one model needs to be correct
  • Super learning results in a better algorithm for fitting these regressions than any one algorithm
  • Doubly robust machine learning-based methods have a higher likelihood of producing a correct effect estimate
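The "only one model needs to be correct" property can be illustrated with augmented inverse probability weighting (AIPW), a doubly robust estimator closely related to TMLE. The sketch below is a minimal illustration on simulated data (an assumption, not the workshop's example), with a true average treatment effect of 3; to mimic "one model right, one model wrong," the known truth is plugged in for one nuisance model and a deliberately wrong constant for the other.

```python
# Double robustness via AIPW: the estimate stays near the truth (3) when
# either the outcome model OR the propensity model is correct.
import math
import random

random.seed(1)

def expit(x):
    return 1 / (1 + math.exp(-x))

# Simulated confounded data: W drives both treatment A and outcome Y
n = 5000
W = [random.uniform(0, 1) for _ in range(n)]                    # confounder
A = [1 if random.random() < expit(2 * w - 1) else 0 for w in W]  # treatment
Y = [3 * a + 2 * w + random.gauss(0, 1) for a, w in zip(A, W)]   # outcome

def aipw(q1, q0, g):
    """Augmented IPW estimate of the average treatment effect."""
    terms = []
    for w, a, y in zip(W, A, Y):
        gw = g(w)
        terms.append(q1(w) - q0(w)
                     + a / gw * (y - q1(w))
                     - (1 - a) / (1 - gw) * (y - q0(w)))
    return sum(terms) / n

# Naive difference in means is confounded (biased upward here)
naive = (sum(y for a, y in zip(A, Y) if a) / sum(A)
         - sum(y for a, y in zip(A, Y) if not a) / (n - sum(A)))

ybar = sum(Y) / n
# Correct outcome model, wrong (constant) propensity model
est_wrong_g = aipw(lambda w: 3 + 2 * w, lambda w: 2 * w, lambda w: 0.5)
# Wrong (constant) outcome model, correct propensity model
est_wrong_q = aipw(lambda w: ybar, lambda w: ybar,
                   lambda w: expit(2 * w - 1))

print(f"naive: {naive:.2f}, AIPW wrong g: {est_wrong_g:.2f}, "
      f"AIPW wrong Q: {est_wrong_q:.2f}")
```

Both AIPW estimates land close to 3 while the naive comparison overshoots, which is the doubly robust guarantee in miniature; TMLE shares this property while additionally keeping the estimate within the bounds of the statistical model.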

TMLE in practice: Case study generating robust RWE in chronic obstructive pulmonary disease

John Paul Ekwaru took the audience through a practical application of TMLE within the context of RWE generation in chronic obstructive pulmonary disease (COPD). In current clinical practice, patients with symptomatic COPD and frequent exacerbations typically receive combination therapy involving inhaled corticosteroids (ICS) and long-acting bronchodilators (LABA+LAMA). While pivotal randomized controlled trials (RCTs) have demonstrated lower exacerbation rates with ICS therapy compared to non-ICS treatments, recent RWE studies have failed to replicate these findings, often due to insufficient confounder data and less robust analysis methods. Moreover, evidence across various patient demographics is lacking. Leveraging RWD from Alberta, Canada, covering over 4.5 million individuals, alongside doubly robust methodologies, researchers at Medlior aimed to estimate the impact of ICS-containing COPD maintenance therapy on exacerbation rates over the 1-year period following initiation of maintenance therapy. Their objective was twofold: to gauge the practicality of using readily accessible administrative data, and to determine whether doubly robust causal inference methods could yield dependable RWE.

Ekwaru explained that the study analyzed longitudinal data from Alberta Health’s COPD chronic disease cohort, tracking medications dispensed, including ICS and LABA+LAMA combinations. The longitudinal cohort analysis examined moderate and severe COPD exacerbations, adjusting for potential confounders identified in consultation with clinical advisors, including demographic characteristics, comorbidities, and additional variables from ICD-10 and intervention procedure codes. A longitudinal model accounted for therapy switches over a 1-year follow-up, divided into 15-day intervals, using an intention-to-treat analysis where the first long-acting therapy combination in each interval was considered the treatment. A graphical representation of the statistical analysis model is shown below.

The analysis aimed to estimate the mean difference in exacerbation rates if all patients received a combination therapy including ICS compared to any treatment without ICS. This was achieved using an extension of TMLE for longitudinal data (L-TMLE), available in the R package ‘ltmle’. The L-TMLE implementation involved estimating the treatment assignment mechanism, the censoring mechanism (to account for informative censoring), and sequential regressions for the outcome, with a targeting step after each regression using a clever covariate or weight. All three components were estimated using the super learner algorithm, which included four machine learning algorithms as candidates in its library. Each algorithm was paired with a correlation test screener to filter out irrelevant covariates.
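The targeting step can be made concrete with a deliberately simplified, single-time-point sketch. The simulated data, the constant initial regression, the assumed-known treatment mechanism, and the linear (rather than the more common logistic) fluctuation are all simplifying assumptions for illustration; the study itself used the longitudinal implementation in the R package ‘ltmle’ with super learner fits of each nuisance component.

```python
# Single-time-point TMLE sketch: initial regression, treatment mechanism,
# then a targeting step via the 'clever covariate' before plugging in.
import math
import random

random.seed(2)

def expit(x):
    return 1 / (1 + math.exp(-x))

# Simulated data with a true treatment effect of 3 (an assumption)
n = 5000
W = [random.uniform(0, 1) for _ in range(n)]                    # confounder
A = [1 if random.random() < expit(2 * w - 1) else 0 for w in W]  # treatment
Y = [3 * a + 2 * w + random.gauss(0, 1) for a, w in zip(A, W)]   # outcome

# Step 1: initial outcome regression (here deliberately crude: a constant)
q_init = sum(Y) / n

# Step 2: treatment mechanism g(W) = P(A=1 | W), assumed known here
g = [expit(2 * w - 1) for w in W]

# Step 3: targeting step -- regress residuals on the clever covariate
# H(A, W) = A/g(W) - (1-A)/(1-g(W)) to obtain the fluctuation epsilon
H = [a / gw - (1 - a) / (1 - gw) for a, gw in zip(A, g)]
eps = (sum(h * (y - q_init) for h, y in zip(H, Y))
       / sum(h * h for h in H))

# Step 4: plug the updated (targeted) regression into the effect estimate;
# with a constant initial Q, Q*(1,W) - Q*(0,W) = eps * (1/g(W) + 1/(1-g(W)))
ate = sum(eps * (1 / gw + 1 / (1 - gw)) for gw in g) / n
print(f"targeted estimate of the effect: {ate:.2f}")
```

Even though the initial regression here is badly misspecified, the targeting step pulls the plug-in estimate back toward the true effect because the treatment mechanism is correct; L-TMLE repeats this update after each sequential regression across the 15-day intervals.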

Preliminary results from the analysis demonstrated a benefit of ICS+LABA+LAMA therapy over LABA+LAMA alone for both high-risk COPD patients, as confirmed by RCTs, and low-risk COPD patients, who were not included in RCTs. Ekwaru highlighted some limitations of the study: the treatment definition was based on medication dispense dates and intended supply duration, not actual medication use. Patients may not have used the dispensed medication within the supply period, potentially affecting the results. Future steps include adding mortality analysis, conducting sensitivity analysis using outcome-blind simulations, and exploring additional machine learning algorithms and screeners, along with parameter tuning.


Novel analytical techniques – what matters most in decision-making: The NICE perspective

Audience polling revealed that a primary challenge in submitting findings based on advanced novel methods to regulatory and HTA bodies is the proper implementation of these methods and the uncertainty regarding sufficient expertise. This issue served as an excellent segue into Stephen Duffield’s presentation. He highlighted that, despite being available for over 15 years, these methods are still viewed as novel by regulatory and HTA bodies.

In 2022, NICE introduced their ‘Real-World Evidence Framework’, offering guidance on integrating RWE into decision-making and establishing best practice guidelines for planning and reporting studies. Although not explicitly mentioning doubly robust methods like TMLE, the framework discusses combined weighting and regression methods, including machine learning approaches. It specifically addresses the use of machine learning for covariate selection in high-dimensional datasets, emphasizing the importance of justification. Duffield highlighted the availability of methodological guidance in NICE’s decision support unit (DSU) technical support documents, notably DSU technical support document 17, which outlines key principles and methods for using observational data in technology appraisals. He noted that the RWE framework is a living document that may evolve to incorporate guidance on novel analytical techniques in the future.

Duffield provided an overview of the areas where doubly robust methods have been utilized in evidence submissions to NICE, primarily focusing on matching and regression techniques. He noted that evidence review groups frequently request the use of doubly robust methods, indicating their established status and regular inclusion in evidence submissions to NICE. TMLE was employed in one appraisal but was not endorsed by the committee due to methodological concerns, particularly regarding whether the TMLE approach offered benefits for the data at hand, reiterating the need to justify the methods used.

In contrast, machine learning has seen limited use in evidence submissions to NICE, as highlighted by Duffield. He presented an instance where the LASSO technique was employed in a technology appraisal to address uncertainties. The company adjusted for baseline characteristic imbalances, yet the choice of covariates lacked clear justification. Of the various methods presented, LASSO was favored for its ability to mitigate the impact of large coefficients with wide confidence intervals on the adjusted treatment effect. Given the uncertainty surrounding the analysis, LASSO was deemed to provide the most conservative estimate and potentially the most suitable approach in the absence of compelling alternative evidence.

Duffield reiterated the strengths of TMLE in combining robust and machine learning approaches:

  • Highly effective in high-dimensional settings, incorporating machine learning methods that handle complex, non-linear relationships and interactions among variables
  • Especially effective in moderate to large sample sizes with sufficient ‘compute’
  • Cross-validation (super learner) is optimal for selection among estimators (‘transparent’ process of estimator selection)
  • Achieves the lowest possible variance among unbiased estimators
  • Estimator of average treatment effects can produce valid estimates of uncertainty (standard errors, confidence intervals), and these estimates of uncertainty can be propagated in decision models used for HTA

For stakeholders to fully embrace methods like TMLE, Duffield suggested a stepwise approach to assess the suitability of causal machine learning methods, as shown below, as well as clarity on the methods employed and the justification for the methods.


Q&A – what collaborations have taken place with regulatory bodies such as the FDA regarding the consideration of causal inference analyses?

Mark van der Laan mentioned ongoing interactions with the FDA involving multiple groups, emphasizing the FDA’s open-mindedness and meticulous approach to evaluating new methods rather than simply endorsing them. The FDA’s targeted learning demonstration projects and various projects within the Sentinel System are examples. He stressed that the current focus is less on why these methods should be used and more on how to implement them effectively. Van der Laan underscored the need for infrastructure development and the importance of making a compelling case for the adoption of these methods.


Q&A – how can TMLE be used in learning health systems?

Both Mark van der Laan and John Paul Ekwaru explained that by continuously analyzing RWD and updating estimates over time, TMLE supports ongoing learning and improvement within health systems. TMLE enables precise adjustment for confounding factors, yielding more accurate estimates of treatment effects. Moreover, TMLE’s flexibility makes it suitable for assessing the effects of dynamic treatment assignment strategies, accommodating complex treatment rules and time-varying factors. This capability allows researchers to accurately estimate the causal effects of different treatment strategies as patient characteristics evolve, potentially informing future policies and guidelines.

“That’s really how the future presumably is going to look. We’re going to have healthcare systems and we’re going to have pragmatic randomized trials embedded within them. Then we have continuous observational data so we can continuously learn from the past and adapt, in particular, our randomization probabilities. And all that can be handled through the TMLE framework.” Mark van der Laan, ISPOR 2024


About the speakers

Suzanne McMullen, Vice President of Research and Innovation, Medlior Health Outcomes Research

As Vice President of Research and Innovation at Medlior Health Outcomes Research Ltd, Suzanne oversees research activities and pursues innovative solutions to advance the methodology and applications of HEOR. Suzanne has close to 20 years of research experience in both commercial and public sector organizations, with previous roles at ICON plc, Oxford Outcomes, Vancouver Coastal Health, and the Centre for High-Throughput Biology. She is a member of the ISPOR RWE special interest group. She obtained her Master of Health Administration from the University of British Columbia.

Mark van der Laan, University of California Berkeley, USA

Dr van der Laan is Distinguished Professor of Biostatistics and Statistics and holder of the Jiann-Ping Hsu/Karl E. Peace Endowed Chair in Biostatistics at UC Berkeley. His research interests include censored data, causal inference, genomics, observational studies and adaptive designs. Mark has led the development of targeted learning, including super learning and targeted maximum likelihood estimation (TMLE). In 2005, he was awarded the Committee of Presidents of Statistical Societies (COPSS) Presidential Award in recognition of outstanding contributions to the statistics profession; he also received the 2004 Spiegelman Award and the 2005 van Dantzig Award. He is a co-founder of the International Journal of Biostatistics and the Journal of Causal Inference. Mark has authored several books on targeted learning, censored data and multiple testing, published over 400 articles, and mentored 60 PhD students and 30 postdoctoral fellows.

John Paul Ekwaru, Lead Biostatistician, Medlior Health Outcomes Research, Canada

Paul is the Lead Biostatistician at Medlior Health Outcomes Research. His previous roles include positions as a Biostatistician at the University of Alberta, a Statistician at the US Centers for Disease Control and Prevention, a Lecturer/Biostatistician at Makerere University, and a consultant on a number of research projects for both academic and non-academic institutions. Paul has co-authored over 80 publications in peer-reviewed scientific journals, including statistical methods publications, and was the recipient of the International Statistical Institute Jan Tinbergen Award for his work in deriving an approximation for the Rank Adjacency Statistic (D) for analyzing spatial clustering with sparse data. Paul obtained his Bachelor of Statistics degree from Makerere University, his MSc from McMaster University, and his PhD in Epidemiology from the University of California, Berkeley.

Stephen Duffield, Associate Director of Real-world methods, NICE, UK

Stephen Duffield’s role involves the continuing development of NICE’s RWE framework, collaboration on RWE demonstration projects, and helping to transform NICE’s use of RWD across guidance products. He is also involved in upskilling individuals both within and outside the organization, contributing to training workshops and technical forums. Stephen has a degree in medicine and a PhD in public health. Previously, he worked as a clinical doctor and as a guideline developer in the NICE Centre for Guidelines.


Sponsorship for this Deep Dive was provided by Medlior Health Outcomes Research Ltd