- Gee, I can get the wrong answer.
So if I go do these studies, in warfarin, for example,
we know the clinical trial result is two.
In observational data,
the hazard ratio that I get for warfarin on bleeding is one,
and I'm using standard adjustment methods
that would be publishable in a typical journal:
IPW, I adjusted for all the variables that are available.
So I didn't do anything to make this bad.
I didn't purposely make it bad.
I actually did exactly what we would normally do.
And statins as well.
So we know that there should be some benefit,
and I go study that question
using standard adjustment methods,
classic analyses that I think would be publishable,
and I get the wrong answer.
So the question then is,
is the reason that the analysis is too confounded?
Is Chris Granger right?
Is it hopeless?
So I can do the same thing, though.
I can get the right answer,
and you'll find out that it doesn't take that much.
So in the same two data sets,
using the same outcome,
same adjustment variables,
very similar data sources,
and same definition of treatment,
I can get results
that agree pretty darn well with clinical trials.
So on the right-hand side now
I'm seeing that warfarin elevates bleeding,
and statins are beneficial with respect to CV events,
and so a question is, what changed?
So to consider what changed,
I'm just gonna go through
five of the primary principles of causal inference.
The first question that we all have been talking about,
that we immediately talk about with observational data is,
no unmeasured confounding.
We don't know that here.
It's observational data.
I don't know whether that holds,
but I do know that I was able to achieve successful results
with the same set of adjustment variables
that matched clinical trials.
So it might be plausible
that we have no unmeasured confounding,
and it's important to note that in the analysis that failed
and the analysis that succeeded,
the exact same adjustment variables were used.
So it's not the acquisition of new variables.
Then, another primary principle
that is done well in clinical trials
is that the interventions and endpoints are well-defined,
and in the interest of time, I'll skip over that,
and claim that they are,
but we could talk later.
Then, another topic is
whether the measured confounders have been balanced well.
So did the statistician do their job?
In both cases, the result was obtained
using standard adjustment methods with checks on balance;
we would check standardized differences,
and things are looking good.
So propensity adjustment appears to work.
We've balanced the measured covariates,
so confounding by measured covariates
does not explain the difference in results.
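To make that balance check concrete, here is a minimal sketch in Python of a weighted standardized-difference table; the data frame, the 0/1 treatment column, the covariate names, and the 0.1 rule of thumb are illustrative assumptions, not the actual analysis behind these studies.

```python
import numpy as np
import pandas as pd

def standardized_differences(df, treatment_col, covariates, weights=None, threshold=0.1):
    """Weighted standardized mean differences; < 0.1 is a common rule of thumb.

    Assumes a 0/1 treatment column and numeric covariates. `weights` would be,
    for example, inverse-probability-of-treatment weights from a propensity model.
    """
    if weights is None:
        weights = np.ones(len(df))
    weights = np.asarray(weights, dtype=float)
    rows = []
    for cov in covariates:
        means, variances = {}, {}
        for arm in (1, 0):
            mask = (df[treatment_col] == arm).to_numpy()
            x = df.loc[mask, cov].to_numpy(dtype=float)
            w = weights[mask]
            means[arm] = np.average(x, weights=w)
            variances[arm] = np.average((x - means[arm]) ** 2, weights=w)
        pooled_sd = np.sqrt((variances[1] + variances[0]) / 2.0)
        smd = abs(means[1] - means[0]) / pooled_sd
        rows.append({"covariate": cov, "std_diff": smd, "balanced": smd < threshold})
    return pd.DataFrame(rows)
```

In practice you would pass the weights from your propensity model and look for any covariate whose standardized difference stays above the threshold after weighting.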
A big difference in where the analyses depart, though,
is the new user design.
So in the analysis that doesn't work out well,
follow-up begins at the beginning of the data set,
at a cross-section in people's lives,
and they either are or are not on the drug.
So they are prevalent users,
and so, in the analysis that is successful,
we used a new user design,
where we're making sure that follow-up begins
immediately after treatment initiation.
So that's like a clinical trial.
In a clinical trial, when they give you one of the two drugs,
they definitely start following you up immediately,
whereas the prevalent user design,
the one that failed,
would be like a clinical trial
where you waited two years and then started the analysis,
based on whatever they were currently taking.
So you can imagine why that might have problems.
The other question is to consider the topic of equipoise,
which doesn't necessarily get considered
in observational research,
but has to get considered in randomized studies.
In the setting, at least, of the warfarin example,
I would say, and I'll talk about it later,
that there probably was uncertainty,
and the patients were reasonably eligible for warfarin,
but the analysis in Framingham,
pretty much purposely,
took all comers,
and there were a lot of patients
that were not reasonable candidates for statins.
They were just available in the data.
So as we have large clinical databases,
it's very common at this point
for me to see patients that look really different.
So we're creating these cohorts
out of observational data sets,
and we might do well to consider clinical equipoise.
Not that it has to be the same,
and defined in the same way as a trial,
but that we can't totally neglect it.
If we do analyses in patients
that are not really reasonable candidates for treatment,
no adjustment, no IPW, no balancing measured covariates
is gonna make the difference.
So clinical trials have to meet all of these criteria,
not just randomization,
and I think that's really important.
When we're arguing about observational research,
whether it has value,
or whether clinical trials are better,
we need to think more about
not just randomization as the key difference,
but of all of the ways in which a trial is rigorous,
and carefully done,
and then question
whether we can actually meet those same standards
in observational research.
And I would claim, will claim, that often we can,
and the future is bright,
but we'll pause for discussion.
- So we thought we'd break it up
and have some question/answers in the middle here,
rather than waiting till the end,
and I think one of the things that's always interesting
in observational data sets is
you look at every person who's in the data set, typically,
and how the treatment's being used,
whereas in a clinical trial, obviously,
you define that cohort by who's randomized.
So you have this group in the observational data set
who's not treated,
and patients like that aren't even included in the clinical trial,
other than as the placebo group, for example,
and the placebo group is pretty much the same patients
as those who get the active treatment.
So I think that's one of the biggest challenges,
and it's always an issue of confounding,
and that is,
most therapies used in the observational setting
are used in healthier patients
who don't have as many comorbidities,
and so I think that's always a challenge
of how you adjust for that, or how you can control for that.
- So Matt, I wanna start with a question for you.
One of the things you went through was
you talked about the different principles
that were important for trials.
One of them was issues around sort of blinding,
and sort of who on the study team
knew about exposures, et cetera.
I guess the question related to that is,
do you think those standards exist in observational studies,
or should exist there?
- Well I mean, you know who got the treatment, basically.
They're getting the treatment in routine practice.
So I think it's not possible to blind that in that setting,
but we know that one treatment
is related to many other treatments that a patient gets.
So if they're in a clinical trial,
and they're getting a blinded treatment,
their concomitant medications
that may have an influence
on their outcomes and their results
should be relatively balanced,
because the investigator or the treating physician
doesn't know what other therapies they're getting.
But I don't see how it's possible to really blind,
in an observational analysis,
especially at the level of the patient
or the treating physician,
so I'm not sure if that's what you were specifically asking,
or whether the researchers who are doing the analysis
should be blinded.
- So that's exactly where I wanted to end up.
So I'll switch to Laine here.
So Laine's quite famous for a number of things.
One of the things she's famous for is
developing sort of a proposal
for how to do statistical analysis plans,
and what sort of work
can and should be done in the planning phase,
sort of assessing,
figuring out whether we should do the study or not.
Do you think we should apply, or do you think we do apply
those measures in observational studies across the board?
- It's an interesting question, yeah,
'cause I do still probably adhere to that as an ideal,
but I've certainly encountered
why we don't always adhere to that.
So the principle is,
especially if you're using propensity methods,
you can do the design phase first.
You don't have to look at the outcomes.
You can do the balancing.
You can check the balance.
You can look at who's at equipoise.
All the things I just talked about,
you can check first, before you run outcomes,
and you could just fix it,
and fix your design, and keep that separate,
and then once you run the analysis of outcomes
you're not allowed to change it,
because that would be fishy,
and so that's sort of the principle,
if that's kind of along the lines
of what you're thinking, right?
And so one of the challenges I've encountered with that is
that oftentimes we are hopeful
that we've got a good analysis,
and then when we run the outcome model
and it looks totally wrong,
we come to Jesus and remember that there were confounders we
could have, should have measured,
and then we're gonna pay to go get them,
and sometimes they're not all just sitting in your data set,
available, ready, easy,
and so sometimes you really do improve your analysis
in that second round.
Usually when I see us going back to redo something,
it's because we've realized we can do it better,
and so to stop that,
or to be rigid about that has a downside too.
So I'm still a fan of the principle,
but I don't always execute it.
- Excellent.
No, I appreciate that.
So during your talk I got an anonymous text from a DCRI MD,
and they asked me, is 1.49 the same as two?
(audience laughing)
- Oh, look at the confidence interval.
(audience laughing)
- That's my response.
(audience laughing)
So Matt, I know you're anxious to get to the second session,
and maybe we should start with that, and then
but at the end I do wanna ask,
it seems like some of your work is actually
crossing over trials, observational studies, et cetera.
Do you view that as the future
of what we would be doing at DCRI?
- I think we can generate evidence
from a variety of different sources,
and many people have done that,
and been fortunate to work in all those different areas,
and I think when you look at it, for a given treatment,
for example, taking the clinical trial data,
looking at that in different populations,
not only from the United States,
but from other countries as well,
it's important to understand what the treatment effect is,
and how it's used in practice,
and what the outcome may ultimately be.
So I think they're iterative,
and evidence can be generated
from multiple different things, as Laine said,
even with the series in the New England Journal this year.
So I think there's room for both,
but you have to understand
the strengths and limitations of both,
and we do numerous secondary analyses
from clinical trials that are fairly well-known,
most of which are often subgroup analyses
looking at the randomized treatment in different subgroups,
and many would argue those are inappropriate,
but I think they're hypothesis-generating for future work
and trying to understand that.
So I think understanding the goals and objectives
of what you're doing,
and defining it ahead of time can limit some of the bias,
and actually help you then not overinterpret your findings.
- Okay, thank you Matt.
One other comment.
So Laine, you obviously do a ton of work,
both on the methodology side as well as the applied work.
So in terms of the trials going on, in outcomes,
if you're doing a cluster randomized trial,
where do you see that fitting in the spectrum of
sort of the pure randomized trial
versus more observational study?
- You'll have to-- - At least in
appropriateness for the methods?
- You'll have to answer that question.
I haven't done a single cluster randomized trial (laughs).
- At least my impression is,
you end up doing a randomized trial
that has almost exactly the same issues, data-wise,
that an observational study has, so.
- Okay, we could talk about that in the second part.
Okay, any questions at this point?
- [Dr. Granger] So of course I have to comment, right?
I mean,
but Laine, I love the idea,
I really do like this idea of taking treatments
where we have a very kind of narrow confidence interval
about the known effect,
and then using those to see how well,
or how well we may or may not be able to replicate those
in observational data sets.
I really like that.
Of course, we have lots of examples.
I agree with what you've been saying, but the challenge is this:
the fact that you can sometimes come up
with an observational result
that approximates what we know to be the truth,
based on the only way to really account for
measured and unmeasured confounders,
doesn't mean that that's the right thing to do,
because we have lots of examples
where that's not been the case.
So many examples,
back from hormone replacement therapy
to erythropoietin for--
- [Laine] I have a slide on that.
- We have so many examples
where we know we can't depend on these types of analyses,
and in fact they result in substantial public health harm
that I think we need to...
Because we can be successful every once in a while
doesn't mean that we should be doing it.
- Yeah, so you bring up a great point,
which is that examples don't make a rule,
and so no way should you say,
these examples mean if you copy this analysis
it will always work.
What does exist is that
each of the five principles has its own theoretical justification.
They're known to be necessary:
without them you can't have causation,
and so what is interesting to see is that
if you adhere to the theoretical principles,
indeed we can replicate reasonable results.
One could argue whether they should be identical,
because the populations are different,
and that we don't necessarily have to be so discouraged
that observational research has no future,
or is always wrong,
because indeed when we can adhere to the principles,
it is actually possible to see results
that are exactly what we should see,
and so it's an example of why the principles matter,
but I would agree that the examples are not the rule.
- [Audience Member] I would also point out, Chris,
that those examples you used
were examples that used traditional adjustment at baseline--
- I have a slide coming.
- [Audience Member] Okay, great.
(audience laughing)
- [Audience Member] But Laine, if I can just--
- [Audience Member] No, you can't, it's mine, sorry.
(audience laughing)
- Sorry, but the key is that
you can't know whether there are unmeasured confounders
until you have another gold standard to test that against,
you can't--
- So I think we're using the word truth
a little fast and loose here.
If you are recruiting half a patient per month,
and you see 100 patients a month,
and you are one of hundreds of providers
who see those types of patients,
that is a huge selection bias,
which can also have public health consequences
when you apply the answer, not the truth,
but the answer that you got in this incredibly select,
highly-controlled situation,
to the rest of life, which doesn't work that way.
So this is why
Matt said it's iterative.
You have to look.
So we've had efficacy studies
whose treatments weren't effective in practice,
and so you can't really say truth.
- Well that's fine, and Matt's gonna address this,
but then the answer there is
not to forego randomization
as the only way we can come up with
equal groups at baseline,
it's to do trials that are more generalizable
because we are less selective.
I think that's where we wanna get to.
- But that assumes, then, that you can always do a trial.
I mean, given the number of conditions,
situations for which you're not gonna do a trial,
we do also wanna be able to do observational research,
in a way,
that mirrors as many advantages of a trial as we can.
I mean, there's a lot of good things about a trial,
and we can emulate more of them.
- That's a good segue, I think, into part two.
- Cool.
- Yeah, let's do that, okay?
- Thank you, Matt.
- All right, well that's good.
We wanted to get a little bit of a discussion in the middle,
which was exactly what we had hoped for.
So real-world evidence,
I think everybody here knows what that is.
There are huge sources of data available now
that we can get from routine practice,
both from the electronic health record,
from claims, and other registries,
and the key now is we have common standards
for how those data are developed,
and how they're archived,
and how they're analyzed,
so that this is a data source now
to do a new type of clinical trial,
pragmatic trials,
and so I think we know,
and we've talked about traditional trials on the left here,
that is it's a very select population,
it's a narrow investigative pool,
it's controlled with blinding and placebo,
and you have a standalone data collection,
whereas with pragmatic trials,
you look at a routine population, we hope,
you have a larger pool of investigators,
or just routine clinicians taking care of patients,
typically done in just a single or few countries,
typically done in an unblinded fashion
because it's often too difficult
to do blinding and storage of placebo or blinded drug
in a pragmatic setting,
and so we often then would have an active control treatment,
and we have centralized data collection,
but the thing that links this is randomization.
I think that's the point Chris was making.
Even in a pragmatic trial,
there's still randomization
such that the cohorts of patients who are studied
should be relatively equal,
and these are a series of principles proposed
for pragmatic trials that are different,
but similar to those that I talked about earlier
for traditional trials,
and that is that patients are randomized
and stay on the randomized treatment as much as possible.
We have reasonable ascertainment
of outcomes and other things,
and I'm not gonna go through the whole list,
but as we think about the evolution of clinical trials,
we need to think about quality and how that is defined,
and then determined when the pragmatic trial is done.
So ADAPTABLE, I think all of you have heard about this.
We've talked about it at previous research forums.
It's the first intervention trial done out of PCORnet
where we're studying low versus high dose of aspirin
in patients with chronic cardiovascular disease.
It's an over-the-counter therapy.
It's an open-label study
where participants are self-randomized through a web portal,
and then followed-up through phone and web portal contact
as well as queries through the common data model
with PCORnet and other data sources,
and so this trial is trying to answer the question of
which dose of aspirin is more effective
and safer for patients with chronic cardiovascular disease.
And we have enrollment rates from ADAPTABLE
that are quite different
than what you see with traditional trials.
We have some sites, including our own at Duke,
that are enrolling up to 60 patients per month at a site.
So you would think this is still
a much more representative population
than the traditional trial
that enrolls half a patient per site per month,
but only 4% of the approached participants
actually agreed to enroll and be randomized in the trial.
So we're approaching hundreds of thousands of patients
to get the patients enrolled in the study.
So there still is a selection bias
that's unavoidable, I think, even in a pragmatic trial,
but informed consent is necessary,
and you have to recruit patients,
and they have to consent voluntarily
to participate in the study.
This is a trial
that was just presented a few months ago from Sweden,
conducted in the country of Sweden,
where they were looking at patients
who were undergoing percutaneous intervention,
and studying unfractionated heparin and bivalirudin
intravenous anticoagulants,
and the key here is that almost all of the centers
that did this procedure in Sweden participated in the study.
Almost half of the patients with the disease of interest
were actually enrolled in the trial
during the time period that it was conducted, as shown here,
and if you look more narrowly
at the pool that was eligible for enrollment,
70% were randomized.
So that's a much more representative population,
but when we look at the groups who were randomized,
shown in red and blue,
compared to the group that was screened but not randomized,
there are clear differences,
and so even though the randomized groups
were similar and comparable,
the group that's not randomized
does have high-risk features,
and so they're not the ones
who the therapy's being tested in,
and so I think we need to recognize that in pragmatic trials
we still don't look at the whole population
and understand fully how these treatments may work
in patients who are not eligible for the study,
or who cannot be consented for a variety of reasons.
And so randomization is the critical factor
that's still present in pragmatic trials.
When you don't have blinding of the treatment,
and other things that go with a traditional trial,
you may introduce bias,
and that remains to be seen,
and that may influence how you ascertain outcomes,
how the patients are retained in the trial,
and stay on therapy if it's a longer-term therapy
like we're seeing in ADAPTABLE,
but we can recruit larger groups of patients
that are more diverse,
from a larger pool of investigators and sites,
however you wanna call a site,
and so there likely will be this synergism
between traditional trials and pragmatic trials
when you're looking at a given therapy,
and its development cycle of,
how do you assess the treatment effect of a therapy
early on in its development
when it's going for its initial approved indication,
and then later when it's being used more commonly
in routine practice,
and so I think the pragmatic trial
is the bridge between the traditional trial
and the observational treatment effects,
and next Laine will talk about some newer approaches
and new methodology that's being used
for comparative effectiveness of treatments.
- I think that's a nice bridge,
and I wanna pause too, because people are interested.
So my intention was to skip the top three principles:
unmeasured confounding,
definition of endpoints,
and adjustment techniques like propensity score methods,
not because they're not important,
because it seems like they've been widely talked about,
like if you came to a talk on observational studies
you expected to hear that,
and so I wanted to talk about some other aspects,
but I will pause for a minute,
particularly because of Chris's interest
in the no-unmeasured-confounding assumption.
I think an interesting case in point
is the study I'm working on on uterine fibroids,
where there's no way we're gonna randomize hysterectomy.
In order to make it actually plausible
that we might have no unmeasured confounding,
we're doing a lot that's never been done before
in any other registry,
in terms of going and getting the imaging,
and getting the detailed fibroid features,
the weight, the dimensions of every fibroid.
It's a pain, it's expensive,
and it's one of the burdens of the study,
but the study's bothering to do that
because they wanna do causal inference,
and so we can also, sometimes,
when we know we're not gonna be able to randomize,
do a better job getting those confounders so it's plausible.
So I would in no way minimize the point you're making
as to how important measuring confounders is,
if you have to do observational research.
It's like the main thing I talked about
for a whole two years on that study,
but I'm skipping it today
only because I feel like that's something
people really have a lot of knowledge about.
So I wanted to focus on the two things
that differed in the two analyses that I saw,
which was the concept of new user designs and equipoise,
in particular too,
because that's an area where there's some new methodology
that we can be considering.
So the prevalent user design,
just to clarify what we're talking about,
I'm picturing the anticoagulation example,
and so calendar time is my scale across the bottom,
and this might be kind of the profiles of patients,
and so treatment might be the time
that somebody starts anticoagulation
if they have atrial fibrillation, the Tx,
and they go along,
and at some point they might have a bleed,
and if they have a bleed on anticoagulation,
for the purpose of exaggeration in the example,
I have said they are taken off anticoagulation.
So as we see them later,
the line that's vertical would be the start of a registry,
or the start of a clinical trial.
This happens in ARISTOTLE, the clinical trial, and in ORBIT.
Any time we have a start point,
and our data's there at baseline,
we're gonna see their treatment status,
and in my exaggerated example,
what kind of patients are left?
The kind of patients who are left,
still treated at our registry start time,
are the type who don't bleed.
That's the reason why the prevalent user design
tends to give the wrong answer.
So we could assume that there should be selection bias,
and we should expect to see
an attenuated-toward-the-null result
for the risk of warfarin on bleeding,
due to the fact that many of the patients who may have bled
would be selected off the treatment or out of the sample
by the time we see them,
and so selection bias,
this bias of selecting out of our sample,
is not addressed by our typical confounders,
and that's the most important thing,
because what I wanted to say today is that
so many people think statisticians do adjustment.
We have selection bias;
Laine, go do adjustment.
And I have some variables,
and I'm gonna run a propensity model,
and it's gonna be balanced,
and you'll be like, good, it worked.
The methods we use, and the adjustment variables we have,
do not adjust for this kind of bias,
so it is not getting addressed.
The kind of variables
we would need to adjust for selection bias
would not be the usual confounders.
It would be all unobserved causes of bleeding.
Everything in the biology.
Not just the stuff we see.
We make treatment decisions based on confounders,
but the biology involved in selection bias
is much more complex,
and we don't have the adjustment for this problem.
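To see why baseline adjustment cannot rescue the prevalent user design, here is a toy simulation, with made-up numbers in the spirit of the warfarin example: treatment truly doubles bleeding risk, patients who bleed on treatment stop it before the registry opens, and the comparison by treatment status at registry start comes out attenuated toward the null while the comparison from initiation sits near the true value. The frailty distribution, risks, and sample size are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Unmeasured, between-patient variation in bleeding tendency ("frailty"):
# the biology that baseline confounder adjustment never sees.
frailty = rng.lognormal(mean=0.0, sigma=1.0, size=n)
initiated = rng.random(n) < 0.5        # who starts anticoagulation at time zero
true_rr = 2.0                          # treatment truly doubles bleeding risk
base_risk = 0.04

def bleed(on_treatment):
    """One period of follow-up: bleeding probability depends on current treatment."""
    p = np.clip(base_risk * frailty * np.where(on_treatment, true_rr, 1.0), 0.0, 1.0)
    return rng.random(n) < p

# Period 1 happens before the registry opens; anyone who bleeds on treatment stops it.
bleed_before_registry = bleed(initiated)
still_treated_at_registry = initiated & ~bleed_before_registry

# Period 2 happens after the registry opens; this is the outcome window we analyze.
bleed_after_registry = bleed(still_treated_at_registry)

def risk_ratio(exposed, outcome):
    return outcome[exposed].mean() / outcome[~exposed].mean()

# New-user comparison: initiators vs non-initiators, followed from initiation;
# this lands near the true ratio of 2.
print(f"new-user RR       ~ {risk_ratio(initiated, bleed_before_registry):.2f}")
# Prevalent-user comparison: by treatment status observed at registry start;
# the surviving treated group is selected for low frailty, so this is attenuated.
print(f"prevalent-user RR ~ {risk_ratio(still_treated_at_registry, bleed_after_registry):.2f}")
```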
So this is addressed by design,
and I have the famous Nurses' study,
because I'm not the only person saying this.
Miguel Hernan's going around giving a similar talk, and he--
The famous result with the Nurses' Health Study is that
hormone replacement therapy appeared to be beneficial
in the old analyses.
The observational data sets had been showing
that it was beneficial to be on hormone replacement therapy
with respect to coronary heart disease,
and then the Women's Health Initiative randomized trial
showed the exact opposite:
that it was harmful, with a hazard ratio of 1.6.
Then, Miguel Hernan did the analysis in the same data set
with the same adjustment variables,
but just restructured to a new user design,
which is not a trivial analysis to do,
for those few that have supported me in doing it,
it's a bit of work,
but using a principled approach to identifying new users,
the same observational data
that had been giving the hazard ratio of 0.67,
gave a hazard ratio of 1.2,
in line with the clinical trial result.
So it wasn't inherent in the data
that you couldn't get a good result,
or that the analysis was biased.
In this particular case it looks largely attributable
to the prevalent user problem,
and he gives a much longer talk on the details around that,
including a good paper.
So if we know we should use the new user design,
why are we not always doing it?
The new user design sometimes has too few patients.
So we've certainly encountered that the number of patients
using things at baseline
is much more than the number of people
that can identify during followup,
and that, in order for it to be unbiased,
adjustment variables need to be collected longitudinally.
So we need to know why somebody's starting longitudinally,
and that's sometimes not feasible.
So depending on the data set,
it can be done well or it can be done poorly,
but the reason I think it's more likely to become possible is that,
as we have larger quality improvement registries
and larger clinical databases,
which I'm certainly seeing in outcomes,
we have much more opportunity
to get a decent sample size, so we don't--
The sample size restriction
doesn't have to be as big of a deal as it used to be
in, say, a smaller registry,
and then, a lot of my projects
have started to collect a lot more information
on longitudinal followup,
so the CHAMP-HF registry has detailed information
about symptoms and treatment changes with dates,
and all that kind of stuff
that I would need to be able to adjust.
So as we're improving our ability,
pulling this information in,
I think in the future as well,
from EHR and claims,
so that we don't have to get somebody
to fill out a CRF with all of it,
the possibilities in the future to have enough sample size,
and enough information to do new user designs,
I think are increasing.
And so then we also have to have methodological improvements
to help promote these methods and help do this work,
and so there's a couple methods
called sequential stratification,
risk-set matching,
dynamic matching,
everybody who does a method writes their own name for it.
So it's really confusing in terms of jargon,
but they all really come down to
something along these lines,
just to give you the intuition,
that when a person starts treatment,
you have to have a relevant time scale
for this matching to occur,
but a great time scale is time since eligibility
for that treatment.
When somebody starts treatment,
you can go acquire another person who looked just like them,
who did not start treatment,
and follow them from that time forward.
If you create a bunch of these pairs,
pull them together,
they're a new user design,
and one can pool the results over that.
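Here is a deliberately simplistic sketch of that matching intuition, assuming a long-format table with one row per patient per period, a boolean column marking treatment initiation, and a single crude matching score with an arbitrary caliper; real risk-set or sequential-stratification implementations match on much more, handle ties and control re-use carefully, and analyze the pooled risk sets with appropriate weighting.

```python
import pandas as pd

def risk_set_match(df, id_col="patient_id", time_col="period",
                   start_col="starts_treatment", score_col="risk_score",
                   caliper=0.05):
    """For each patient who starts treatment in a period, grab one not-yet-treated
    patient from the same period with a similar score, and follow both forward
    from that time. Illustrative only; assumes `start_col` is boolean."""
    pairs = []
    used_as_control, treated_earlier = set(), set()
    for t, frame in df.sort_values(time_col).groupby(time_col):
        initiators = frame[frame[start_col]]
        # eligible controls: not starting now, not treated in an earlier period,
        # and not already spent as someone else's match
        controls = frame[~frame[start_col]
                         & ~frame[id_col].isin(treated_earlier)
                         & ~frame[id_col].isin(used_as_control)].copy()
        for _, new_user in initiators.iterrows():
            if controls.empty:
                break
            gaps = (controls[score_col] - new_user[score_col]).abs()
            best = gaps.idxmin()
            if gaps.loc[best] <= caliper:
                pairs.append({"match_time": t,
                              "treated_id": new_user[id_col],
                              "control_id": controls.loc[best, id_col]})
                used_as_control.add(controls.loc[best, id_col])
                controls = controls.drop(index=best)
        treated_earlier.update(initiators[id_col].tolist())
    return pd.DataFrame(pairs)
```

Each matched pair then contributes follow-up from the match time forward, which is what turns the pooled pairs into a new user design.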
So there's hundreds of papers in this area,
and there's a lot of development still going forward.
There's a lot of open questions, methodologically,
but these are the types of methods
that were used by Miguel Hernan in the Nurses' Health Study,
and that I used to get new users.
So also I've done a couple papers,
and I just wanted to thank ORBIT and ARISTOTLE
for supporting these methods,
'cause they're slow and time-consuming,
and they shouldn't be forever,
but they are now, because I'm still learning.
So another topic is to consider,
do we care about equipoise in observational research?
We know we care about it in a randomized trial
because we're not allowed to randomize if we don't have it,
but maybe we shouldn't totally throw it out the window
in observational research,
kind of at least keep an eye.
So these are distributions of propensity scores,
or the probability of being treated,
and this is something that's really facilitated
by taking a propensity approach to your analysis.
So if you take a propensity approach,
you can look at the propensity score distribution
among the treated and untreated.
This is simulated data,
but both examples come from real things I've seen.
On the left, it's a TAVR/SAVR intervention for heart valves,
and on the right it's flipped, but OAC.
So these are examples that I really see,
where you can see on the left, among the treated patients,
that's the dark line,
tons of people have a propensity score of almost one,
which means that, at least in the current practice in that data set,
physicians are in agreement that that type of person
absolutely must get treated: a 100% chance, or nearly.
And on the other side, among the untreated,
there is tremendous agreement,
among a lot of physicians or among everyone,
that certain types of patients absolutely will not get treated,
and so the question is, where is equipoise?
Hard to say, but maybe 100% agreement
among practicing physicians
that nobody of that type should get treated
might suggest we're not in a position of equipoise,
and one of the reasons--
well, a couple of things.
So who would we consider putting in a clinical trial?
In these pictures,
I don't know exactly what threshold we would pick,
but maybe somewhere in the middle,
and so I drew some lines to suggest
that might be who you'd be willing to randomize.
Now, the propensity score is not the thing
on which we define equipoise for a trial.
So this is just connecting to the concept,
but it's not literal:
we think about patient characteristics
in defining equipoise, not propensity scores,
but propensity scores could give us a warning
that we're way out of the ballpark.
If you have ones and zeros,
that person is somebody
for whom everyone in current practice
thinks they know the answer.
So maybe they do.
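As one concrete version of that red-flag check, here is a minimal sketch that fits a simple propensity model and reports how much of each arm sits in the tails; the column names, the use of scikit-learn's logistic regression, and the 0.05/0.95 cutoffs are illustrative assumptions rather than a rule.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def propensity_overlap_report(df, treatment_col, covariates, lo=0.05, hi=0.95):
    """Fit a simple propensity model and flag how much of each arm sits near 0 or 1,
    where essentially everyone agrees on treatment and there is little equipoise.
    Returns the summary table and the estimated scores."""
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df[treatment_col])
    ps = model.predict_proba(df[covariates])[:, 1]
    out = []
    for arm, label in [(1, "treated"), (0, "untreated")]:
        mask = (df[treatment_col] == arm).to_numpy()
        out.append({
            "group": label,
            "n": int(mask.sum()),
            "median_ps": float(np.median(ps[mask])),
            f"share_ps<{lo}": float((ps[mask] < lo).mean()),
            f"share_ps>{hi}": float((ps[mask] > hi).mean()),
        })
    return pd.DataFrame(out), ps
```

A large share in the tails is the warning sign described above: a population where current practice has essentially already decided.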
So the problem is,
we of course want to use observational data
to extend generalizability,
and that goal is in tension with the concept of equipoise,
but we can think about
over what dimension we are extending generalizability.
Maybe we wanna be more geographically representative,
or representative in socioeconomic status,
a whole lot of things.
And of course even clinical risk factors.
Not all of that is what we have here.
How broad do we really wanna go?
Do we really wanna try to answer causal questions
in people for whom practitioners apparently know the answer?
Maybe.
Maybe that's the purpose of a study.
Maybe we don't trust current practice,
and we actually do wanna test causal questions in patients
for whom everybody already thinks they know the answer,
but it's one thing to want that.
It's another thing to realize
that there's very little evidence in those tails.
So regardless of whether we're interested in those patients,
they have high variability, and the potential for high bias.
It's kind of like a randomized study:
if we were randomizing them,
we would be using a coin with a 1% chance
of getting treatment, 0.01,
so imagine a randomized study
that gave 99% of patients treatment A,
and 1% of patients treatment B.
That would be a really inefficient randomized study.
That's sort of what's happening here,
in the best-case scenario,
given I can adjust for everything,
no unmeasured confounding,
oh, and my randomization coin is terrible.
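One hedged way to put a number on how much those tails cost, if you were using inverse-probability-of-treatment weights, is the Kish effective sample size, the squared sum of the weights divided by the sum of the squared weights. The toy numbers below are invented, but a handful of propensity scores near 0.01 is enough to collapse a nominal sample of 1,000 to something on the order of a hundred.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size: how many 'equal-weight' patients
    the weighted sample is really worth."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Illustrative only: 1,000 treated patients, most with moderate propensity
# scores, plus a handful whose propensity of treatment is about 0.01.
rng = np.random.default_rng(1)
ps = np.concatenate([rng.uniform(0.3, 0.7, size=990), np.full(10, 0.01)])
iptw = 1.0 / ps        # inverse-probability weights for treated patients

print(f"nominal n = {ps.size}, effective n = {effective_sample_size(iptw):.0f}")
```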
So if we consider the propensity approach
as a possible way to get red flags,
we might wanna reconsider.
Maybe there are other things.
Maybe there are people
for whom we don't wanna stretch that far.
So there's a trade-off when we ask observational data
to answer harder questions.
When we ask it to, say, be more generalizable,
to tell me about everybody I have in this data set,
including where the data are sparse and the biases can be strong.
So I wouldn't say that we don't ask stretch questions
when they're needed.
I think we still should.
That's one of the advantages of observational data,
but we should possibly check how far we're stretching,
to answer something that is poorly informed
by the available data set,
and the way we can do that methodologically,
one approach is to check the propensity distribution,
take a step back when you see stuff like this,
and think about who, possibly, is not--
Maybe somebody's in your data set
you didn't mean to have there.
Why are you studying people who everyone knows how to treat?
Possibly refine the population scientifically,
but it actually does happen a lot,
I've noticed, that we can't.
So we think we've got the population,
and we still get a distribution like that,
in which case there are other methods that are coming.
For example, overlap weighting
is a method that was developed by Fan Li
in the Department of Statistical Sciences here,
and she and I both really like it,
and that weighting method reweights the population
towards the center.
So the original population has these two tails
that we can see on the left,
and the green line, I don't know how well you can see it,
is the distribution of patients after overlap weighting.
They get pulled towards equipoise automatically.
So if you can't decide,
you don't know what it is about these patients
that is making them so extreme,
and you don't wanna exclude anyone,
the advantage of this is that it adds efficiency.
It stops the problem
where the tails add all that inefficiency,
pulls the population toward the center,
and can be a methodological approach
if there's not a scientific one.
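For reference, the overlap-weight computation itself is tiny once propensity scores are estimated: treated patients get weight one minus the propensity score and untreated patients get the propensity score, which is what pulls the weighted population toward the region of equipoise. The sketch below assumes scores are already in hand; the weighted mean difference is only there to show where the weights plug in, not a full analysis.

```python
import numpy as np

def overlap_weights(propensity, treated):
    """Overlap weights: treated get 1 - e(x), untreated get e(x).
    The weighted population is proportional to e(x) * (1 - e(x)), so it
    concentrates where treated and untreated overlap (near e = 0.5) and
    smoothly down-weights the extremes."""
    e = np.asarray(propensity, dtype=float)
    z = np.asarray(treated, dtype=bool)
    return np.where(z, 1.0 - e, e)

def weighted_mean_difference(outcome, treated, weights):
    """Difference in weighted outcome means, treated minus untreated."""
    y = np.asarray(outcome, dtype=float)
    z = np.asarray(treated, dtype=bool)
    w = np.asarray(weights, dtype=float)
    return np.average(y[z], weights=w[z]) - np.average(y[~z], weights=w[~z])
```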
So my feeling is there's so many reasons
to do observational research,
but we can design,
rather than simply rely on the disclaimer.
This disclaimer that I write on every paper,
"As with all observational treatment comparisons,
"we cannot rule out the possibility
"that associations are biased by unmeasured confounding."
I think that's an appropriately cautious statement,
but it would be a shame
if that statement eclipses the potential
to do good causal inference in observational data,
and I think we can do even better observational research
by appreciating the strength of clinical trials.
That's my end.
(audience applauding)
- Well thank you for round two.
So do you have any comments on Laine's presentation?
Seems like the two of you
were actually coming together at the end.
She's trying to emulate your methods
and you're talking about (speaking drowned out by laughing).
- I think that's the case,
and I think that part of it is also
you're trying to apply this ahead of time.
Again, not getting into the analysis
and then trying to take a different direction,
but looking at the strength of the data ahead of time
and then applying these new methods
which, again, are worthwhile talking about.
I think in this setting,
there's value to different types of analyses,
and what we wanna do is
show the comparison and contrast here.
There always will be unmeasured confounding,
and we can't get that,
because we never really know
how physicians are making decisions
to use treatments in their patients,
but assessing that in the larger population,
and with the methods that you use,
I think adds value,
and adds strength to the evidence
of what the treatment effects are.
- Yeah, no, I agree that it was very much by design
that we converged,
because I think that some of the strengths
of observational research
are what is being emulated in pragmatic trials,
and whenever you can randomize,
nobody would dispute the value of it,
but I have enough projects where that's not gonna happen,
and so we're trying to see
how can we become even more like a pragmatic trial,
basically.
We might be able to do pragmatic observational studies
that converge to that same goal.
- Yeah, and I think these data sources
allow for that to happen.
So you can do the trial and the observational analyses
in the same data source as the foundation,
and that's, I think, one of the strengths going forward is
that you have a common data source,
and then that's used for both purposes,
and you can even do them in parallel,
for example, in that regard,
if you're doing a randomized trial,
and then looking at a treatment effect
in patients who are not included in the trial,
that also may be an opportunity going forward
with millions and millions of patients
in some of these data sets.
- Excellent.
So I did wanna make two or three comments,
and then maybe take questions from the group.
So one of the things that's come up
with observational studies,
so we talk about how it's more generalizable
than a randomized trial,
and we sort of take that for granted.
I do wanna make a specific comment about that.
So many of my projects are in the heart failure network,
and in the heart failure network
we've done a dozen or so trials,
and then in addition
we do a bunch of secondary manuscripts off that.
One of the cases I had, and I put together a few slides,
I don't necessarily wanna go through them now,
but there's a statement,
and Rob Mentz will laugh at this statement,
but we basically have looked at acute heart failure patients
from this trial, this trial, and this trial,
and the trials were DOSE, CARRESS, and ROSE,
and that means something to you,
probably not to the rest of the group,
but what I think we're trying to say is,
as investigators, we studied acute heart failure.
It seems like a general statement,
but when you put together
the inclusion and exclusion criteria from those trials,
I think there were 18 inclusion criteria to get into those trials,
and something like 65 exclusion criteria to keep you out of those trials.
Then you end up making statements later in the paper
where you say something like,
the unadjusted rate of whatever it is, is some number,
and I actually put together some SAS code.
I guess I'm interviewing for a SAS programming job later,
but one of the things I laid out is
what you did versus what you meant.
So what you did was you took a hyperdistorted population,
like with a massive astigmatism.
You estimated something.
Your intention was actually to do something else,
and that is basically to estimate something
in acute heart failure.
When people actually take your results,
they're gonna take your results
as if you took it from a general registry
of acute heart failure patients.
So I think that's a huge potential problem.
That's sort of one problem.
I don't know if that's for or against observational studies.
Another problem,
and this kind of gets at what Laine was talking about
with the sample size things,
many of the areas I'm involved with,
we care about things like time to event,
like clinical endpoints,
and we have sort of a relative question,
like, would you believe an effect of this size?
And oftentimes
it becomes almost implausible to believe something
where you're gonna have a hazard ratio
below, say, something like 0.75.
So if I say I have a new treatment,
or this treatment exists,
and it's gonna make your hazard ratio,
instead of where it is, 0.5.
That's almost implausible,
but when you sort of take those sort of beliefs,
and you sort of say, how many events do I need?
Oftentimes you become sort of
out of the clinical trial business.
You're sort of priced out.
So that's sort of another problem.
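That "priced out" arithmetic can be made concrete with Schoenfeld's approximation for the number of events needed to detect a given hazard ratio; the sketch below, assuming scipy for the normal quantiles and 1:1 allocation, shows how modest, believable hazard ratios demand far more events than optimistic ones.

```python
from math import ceil, log
from scipy.stats import norm

def events_needed(hazard_ratio, power=0.80, alpha=0.05, p_treat=0.5):
    """Schoenfeld's approximation: total events needed for a two-sided
    log-rank / Cox test of a given hazard ratio."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    p_control = 1 - p_treat
    return ceil((z_alpha + z_power) ** 2
                / (p_treat * p_control * log(hazard_ratio) ** 2))

for hr in (0.90, 0.75, 0.50):
    print(f"HR {hr:.2f}: about {events_needed(hr)} events")
# Roughly: HR 0.90 needs ~2,800 events, HR 0.75 ~380, HR 0.50 ~66,
# which is why modest, believable effect sizes can price you out of a trial.
```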
Another thing related to that is
the cycle of how long things take.
Many, many years ago,
a lot of people here were involved in a study
using the Duke Data Bank,
which I think
was one of the beautiful observational studies,
and one of the things that was so nice about it
is that so much of what happened,
happened sort of at a very specific point in time,
and so a lot of the biases you're referring to,
we sort of didn't have to pay attention to them,
because they were kind of automatically washed away,
but we basically published a paper,
and in that paper,
at the end we said something like,
we would recommend that you do a study of 10,000 patients.
We sort of described, roughly speaking,
how you'd power it, et cetera.
That study was done by a different group
that was funded by something like eight different companies.
That result didn't come out for like nine more years,
and so the cycle of how quick we can do things is actually--
- Yeah.
- I would say it's incredibly slow,
so I just wanna make those brief comments.
- Good summary.
I mean, there clearly are,
despite all your best efforts,
there clearly are limits with the randomized trials,
in time, and biases, and so forth.
- Can I ask a question with ADAPTABLE?
Why do we think the sort of approached participation rate
is as low as it is?
I was surprised by that number.
- Well, I think the key is
that we don't really know the right approach now
to try to engage participants in the study.
Many of them are being recruited remotely
by email or by letter,
and they're not having a direct conversation
with the provider.
So they have to be contacted multiple times in a row
in order for them to be randomized in the study,
and we're still learning
why they would choose to participate
and why they would not.
Most people are already on aspirin,
and they may choose not to,
if they don't wanna have the chance of changing their dose,
and so that's one of the things that's an unknown right now:
how many patients you have to try to approach or screen,
to then randomize and do pragmatic studies,
and in this case I think you have the bias
that most of them are already on the treatment,
and they have to agree that
there might be a chance that their dose could be changed.
- Okay, so I'll take a question if there is one.
I do wanna make one comment.
I think we are working at the right place at the right time,
in that we have incredibly detailed data sets
to be able to do some of the stuff
that Laine was talking about.
So Eric?
- (audio cuts out) to address your last one.
The issue comes in in the study
that many of those patients weren't at equipoise, right?
The clinician and the patient weren't at equipoise
because the patient, for whatever reason,
really wasn't a candidate for ADAPTABLE.
We see them all the time in our clinic.
You have the ability to actually collect, if you would,
just a simple question asking us,
would this be somebody
you really think should have been randomized in ADAPTABLE?
If you had that, and we also have all the clinical data,
then actually a lot of what Laine is talking about
could be done.
You could look at the overall population.
You could look at the population
for whom the question was really relevant,
where the patient was at equipoise to actually be randomized,
and then see prospectively whether the answer was the same.
The other thing I might challenge Laine is,
she will have the answer
before you actually have the results from the trial,
so you should have Laine now do the results
and see how close she is,
because I really had this huge problem with the results
of observational studies trying to see
how close they can replicate randomized trials,
because in fact you know what answer
you're trying to hit for,
and lo and behold,
you can find methods that will get you there,
and you can find methods
that will take you farther away from there,
but you ignore those,
because you wouldn't report a result
that's farther away from the result
that you actually wanted.
The final thing is--
- [Laine] I don't have time to run that many analyses.
- No, I got it.
- [Matt] Yeah, I know they're great points.
- But the other final thing is,
it's kind of funny, Kevin, the example you chose.
Unless I'm mistaken,
the example we found from the data bank,
which actually did change practice,
actually wasn't defended by the trial itself.
When the trial was done, eight years later,
the result was neutral, that there wasn't harm seen.
So in essence, it's an interesting example to choose, right?
Should our study, although it was profound,
have changed practice?
- All great questions.
More work to do.
Think that's a nice summary
of the things we tried to talk about today.
So it's hard to summarize better than that,
in my perspective.
- Oh yeah, plus our time's up.
- Yeah.
- All right, thank you, everyone.
- Thank you all.
(audience applauding)
(whooshing)
(lively music)