Published Online: January 24, 2024
How can we leverage AI to transform health care into a more efficient model for delivering care? In this Q&A, JAMA Editor in Chief Kirsten Bibbins-Domingo, PhD, MD, MAS, interviews Atul Butte, MD, PhD, the director of the Bakar Computational Health Sciences Institute at UCSF, to discuss scalable privilege and the need for the broad distribution of AI-driven expertise.
Transcript
[This transcript is auto-generated and unedited.]
- Are randomized controlled trials the gold standard for evaluating AI tools? With the vast amount of clinical data available, are there alternative approaches to evaluation in real time? Is AI a democratizing force that enables privilege that can scale across all patient populations? I'm Dr. Kirsten Bibbins-Domingo, editor-in-chief of JAMA and the JAMA Network. This conversation is part of a series of videos and podcasts hosted by JAMA, in which we explore the issues surrounding the rapidly evolving intersection of AI and clinical practice. Today, I'm joined by Dr. Atul Butte, who is the distinguished professor and director of the Bakar Computational Health Sciences Institute at the University of California, San Francisco. He is also the chief data scientist over all six academic health systems in the University of California. He's received numerous acknowledgements and awards, including election to the National Academy of Medicine and being recognized by the Obama Administration as a White House champion of change in open science. Welcome, Dr. Butte.
- Thanks so much for having me today.
- Okay. I hope we can do this on a first name basis. We know each other pretty well.
- Of course, we do.
- Okay. Wonderful, wonderful. So, Atul, I love preparing for these interviews because it reminds me of all of the outstanding work our guests have done. You are somebody who has a long history as a research scientist in precision medicine, in genetics, and you have over the last number of years taken on this pretty big role of really overseeing the data across University of California campuses. And I'm wondering what makes you excited about this big task of unleashing data, which I've heard you say before. We know that data is the thing that fuels artificial intelligence, which is the focus of this series. What made you decide to continue to research but take on this role, and what makes you excited about what you're doing now?
- Yeah, so eight and a half years ago, I got this opportunity to move to UCSF and the University of California Health System as an umbrella entity to build and then eventually use this central data warehouse. When you have six academic health centers and more importantly, you have these natural experiments going on, right? Each clinician, each department takes care of patients slightly differently based on their training. When you have all these natural experiments, you can start to compare and contrast against each other, right? So it's comparative effectiveness to a new scale, right? We can compare every drug, every device, every biologic, because we use them all in University of California. That to me, is super appealing, and I think in general, we have now near 100% deployment of electronic health records in the United States, but in general, nobody's looking at any of that data. But to me, it's a tragedy now if we don't use all that data to improve the practice of medicine.
- So let's describe the... A few things, describe for us, our listeners then, how large the University of California is. How many patients are we talking about? What does it mean to think across many different health systems?
- In our repository right now, we have 9.1 million patients, so we see about a million and a half of them every year. We have 10 hospitals, but I think we're in talks to acquire four more in the next few months, 1,000 care delivery sites. And so, the rough number is we have a billion and a half drugs that have been prescribed or ordered, half a billion encounters, about 40,000 cancer genomes in the same database, about 50 million medical devices. And all that data is in one central repository that we can use for research.
- So then give us an idea. What you are talking about is doing research, but not necessarily setting up experiments, but actually using the way practice is actually happening to gain new insights about what the consequence of those variations in practice actually mean for patients.
- It's literally the care patterns and pathways of every single patient that we extract from Epic. So it's every dose of every drug, every vital sign, every pain score, every respiratory rate, every lab test result is all in the same database. So we can use that to ask and answer questions, but it takes a lot of magic, you know, that coding magic to make that work. We see a lot of transactions in a health system, right? You have to synthesize that from all the raw data. So this is almost like a cross between comparative effectiveness studies and what the FDA has labeled real world data, real world evidence. This is what happens when we actually use these devices, these drugs, these biologics in real patients. What actually happens, what are the outcomes? And I think, and I believe that those outcomes are worthy of study to ensure that all of our patients are benefiting from all these magical devices and drugs the same way they did in the trials. These are simple questions to ask, but incredibly hard to get answers for. It's not just the coding magic that's needed here. It's also the statistical magic. How do I know that these two doctors on different campuses had the same patients, the same prior medical history, right? So we have to do scoring and matching, propensity score matching, for example, to try to get to the same kind of subcohort so we can do a comparison. One great example of this just came out in JAMA Network Open where we looked at all secondary therapeutics past metformin for type 2 diabetes, and we were able to run a massive comparative effectiveness study across all of those pair by pair, looking at each category of second line therapeutics to figure out what are the advantages and disadvantages to each. And to me, this is gonna be the future. In fact, health systems are gonna want to do this to figure out the best universal way to practice medicine. All of this care variation might be unnecessary.
- Well, let me ask you, I wanna ask you a question. Now, from a patient perspective, I'm concerned that you're gonna take good care of my data. I wanna know what you're going to use my data for. I wanna know that the data doesn't go anywhere else, and it really is used for the things you're telling me.
- I think the first exposure to any of this comes from contracts or agreements that patients sign to get care. Patients might at that point realize and see that their health system has to follow federal rules and federal laws like HIPAA, but might be using that data for research, commercialization, and samples in the same way. So a lot of that first exposure to patients comes from the terms and conditions of service and then the HIPAA privacy policies. Now, if we fully de-identify that data, we're allowed to use it for research. That de-identification process needs to be blessed and certified, which we do. And so, it's also our responsibility to educate communities more, not just getting patient representatives, but for us to go to the community to explain what we do with that data and why it's in their interest, why it's in all of our interests to have all of our data out there for research so that future drugs and devices aren't just generated from the data of a few privileged individuals.
- What you are saying is that there is broad latitude when we're using de-identified data within the health system's protection of that data for purposes that are not ones that you describe. But within that, as long as the data is de-identified, a health system has broad latitude to use it within these intended purposes. When there's the need for identifiers that HIPAA usually protects, then there needs to be what we know as this pathway of getting institutional review board, or IRB, approval.
- That's exactly right. And I'll just further clarify a little bit. Health systems are allowed to use identifiable data for improving treatment, payment, and operations that they need. They have to contact patients. We have to send letters for those things, and that includes quality of care, improving quality of care. And then there's research. If we still need those identifiers for research, we need to get IRB approval. But in most cases, we don't need those identifiers. And then, we can do research with de-identified data, and then that's easier to do for health systems.
- And so, the reason that a patient, so the example you gave about metformin and the diabetes therapeutics that was published. The reason that I might want a health system to have access to my data would be so they take good care of me, but also so that they learn what drug, if I had diabetes, I should take as a second line therapy or a third line therapy by knowing the patterns across all patients like me might help to make, to give better guidance for a future clinical decision.
- That's exactly right. And it's not just on the positive side, it's also on the negative side. What drugs should be avoided now? Why are we not seeing benefit? Now, those drugs were approved, so at one point they passed some sort of trial and the FDA might've given their blessings. But now, since we have newer drugs, should we keep using the older ones? Even if they're cheaper, are they actually getting benefit? And so, another nice part of our central repository, it's not just 9.1 million patients, it's spanning over 11 years now. So we have incredible longitudinal data. So it's not just looking at one blood test, like the hemoglobin A1C, but we're able to look over years, more than a decade in some individuals. Are long-term consequences being avoided, like heart attacks and strokes?
- Well, one point that you made that I think is really important, and I think about this for what we publish in JAMA, is that the clinical trials really still represent the gold standard for new therapeutics certainly, and in many other instances for interventions we do in medicine. There are many clinical questions we have after a trial is conducted or as practice rolls out or as we move further away that we oftentimes either can't study in a trial or it would be too hard to study in a trial. It is really part of clinical practice and so therefore, we wouldn't randomize. But there's still questions we'd like to know the answer to. And I sort of think of that as where this real world evidence really has an important role to serve.
- That's absolutely right. We're not saying that this is gonna replace randomized controlled trials. That being said, one cannot run a randomized controlled trial across every pair of second line drugs for diabetes. It would be impossibly expensive. In fact, I think we calculated how onerous it would be. So, there are certain RCTs that are just not possible to run. We have issues in RCTs for pediatrics, for example, getting those done. Another challenge, RCTs, again, I'm not disputing that they're the gold standard evidence. At the same time, I think you also would agree perhaps that they're not the most diverse in terms of the patient participants that are recruited for trials, whereas real world evidence could be more diverse than the actual original populations.
- So tell me, what's your advice to me now as an editor? How should I evaluate studies like yours that come? How do I know that the thing you learned from studying, even a lot of patients in California really applies to anyone else? And what type of designs because you've taken me away from the trials, should I expect when saying, "Hmm, I can understand this finding as a true difference between this second line agent and this second line agent."
- Even the FDA's learning what to do with real world data. How do we generate real world evidence? I think the formula you're gonna see though, first of all, for a top quality journal and network like yours, you're going to see multicenter. I think you're gonna want large numbers because the minute you start to do propensity score matching, right? How do I get exactly the same patients that were chosen with this drug versus that drug? You can't put the whole cohort together that way because the patients might have had different cardiovascular risk, they might have different past medical histories. And so, you need to control for those, whether mathematically with propensity score matching or just looking through past medical history and coding for that. You're gonna need a large enough n, so each of these types of comparisons actually yields a good answer, and sometimes they don't. Now, another thing that we started to add to our studies is cross validation.
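The propensity score matching mentioned in this answer can be illustrated with a minimal sketch. It is not the method used in the studies discussed here; it assumes a plain logistic regression for the propensity score and greedy 1:1 nearest-neighbor matching within a caliper, one of several common variants. The covariates (scaled age and scaled baseline hemoglobin A1C), the eight toy patients, and all function names are hypothetical.

```python
import math

def predict(weights, covariates):
    """Estimated probability of receiving drug A given the covariates."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], covariates))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (bias is weights[0])."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, label in zip(X, y):
            err = predict(weights, row) - label
            grad[0] += err
            for j, value in enumerate(row):
                grad[j + 1] += err * value
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights

def match_pairs(scores, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not controls:
            continue
        nearest = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        # Keep the pair only if the two scores are close enough.
        if abs(scores[nearest] - scores[i]) <= caliper:
            pairs.append((i, nearest))
            controls.remove(nearest)
    return pairs

# Hypothetical toy cohort: [age / 100, baseline A1C / 10] per patient.
X = [[0.62, 0.81], [0.55, 0.78], [0.70, 0.90], [0.48, 0.72],
     [0.62, 0.81], [0.51, 0.75], [0.66, 0.88], [0.45, 0.70]]
treated = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = drug A, 0 = drug B

weights = fit_logistic(X, treated)
scores = [predict(weights, row) for row in X]
pairs = match_pairs(scores, treated)
print(pairs)  # list of (drug A patient index, drug B patient index)
```

Outcomes would then be compared only within the matched pairs. Patients with no sufficiently close counterpart are dropped, which is one reason large cohorts are needed for this kind of study.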
- So tell me this, a big treasure trove of data like this must be of interest not just to the academics, not just to the health systems seeking to improve care, but to anyone seeking to develop algorithms, developing any type of AI tools that really need lots of data to learn from. How do you think about the role here of data that is ultimately still collected from patients and what we want to do for these types of opportunities that really allow us to scale the information that we learn from any given patient because you have all this data available.
- Yeah, so certainly data feeds AI, as we said at the start, and certainly more data is gonna be much more useful, especially diverse data, representative data that I think we would cover in the University of California. So first of all, we are focused internally for all of our AI researchers to have access to this. Typically, what happens in University of California is faculty, postdocs, graduate students learn on their home campus first, and then when they're ready and approved, then they can scale to get five times the answers or five times the testing or the validation. So certainly, having a lot of data helps with the training. I think when I last looked, we have 100 million ambulatory encounters, for example. Imagine training on a dataset of a hundred million ambulatory encounters. We have something like 7 million telemedicine visits that have been documented. Imagine building an AI and just what telemedicine could do. I think there are a lot of companies working on that.
- So you wrote a wonderful editorial in JAMA Oncology and it was on a paper using artificial intelligence. And you closed by saying that the opportunities are to move beyond these smaller pilot studies of artificial intelligence to the real vision, which is in the term you use scalable privilege. Tell me what you mean by scalable privilege when it comes to AI.
- Yeah, so I've had some health issues over the past year and it strikes me when I go to the emergency department, I had to go once at UCSF, that I'm treated very differently than other patients. I know that's what we call privilege. I'm a faculty member here, I'm a donor here, I'm a professor here, so I'm gonna be treated very differently, as I would presume you would be too. But the solution then can't be, even though I know I'm getting better care, the solution can't be necessarily, and it just cannot happen, for 10,000 people to wait in our waiting room for our level of care. But what if the future is to learn from the data of those doctors, right? Well, we can extract that from the electronic health records. And then, what if we use that to train the AI and then deploy that AI to doctors, of course, through perhaps order sets in Epic or Cerner or decision support tools, you should be thinking about this or that, but also deploy to patients to make sure that a patient now increasingly empowered with their medical record, because we have to give them their medical records digitally, that patient who's empowered might get advice from, you know, a patient-facing decision support tool on their mobile device, for example. And so to me, that privilege, what I'm really working towards now is scaling that privilege that we get when we go to our own university's medical system and emergency room. I would love to scale that privilege to everyone else in the world. And I think that's the way I'm describing AI now. Of course, we know about the fairness, we know the transparency, we know it's gotta be trained with the right data sets and inclusive data sets, diverse data sets. But the positive side is that if we get this right, we can scale that privilege that many of us have to really treat more in the world.
- So meaning that in that rural hospital, that the types of decision supports, the decision aids, the types of ways in which we interact with technology in clinical practice, those will be, we will have the assistance of this type of intelligence that actually has learned from the places where care might have been developed for a rare condition, might have been optimized in some way. And that that would actually assist then all doctors or all clinicians, wherever they're practicing and then benefit all patients.
- Exactly right. I think we might not get one of these decision support tools. We might get different ones across the country, but the patients can't traverse to an NCI comprehensive cancer center or a heart center. So they're going to get care locally. Wouldn't it be the right thing to do for us to use our data to help those health systems and really help those patients that way?
- What types of things are you using AI tools in your daily work now, and what things would you stay away from right now?
- In my group, we are using it for a lot of different uses, for research purposes, where we're researching how emergency department notes like the history and physical or even just the chief complaint and history might triage a patient in the emergency department or can predict whether that patient's gonna get admitted or not. We're looking at progress notes to see, can we discern from a list of 15 reasons why this patient stopped the medication? We can see they stopped the medication in structured data. Can we figure out why from the doctor's note? Did they have a side effect from this medication? Because we can't see it in structured data. Did the doctor document a side effect? Why is AI so exciting right now? Because of the notes. As you know, so much clinical care is documented in text notes and now we've got these magical tools like GPT and large language models that understand notes. And so, that's why it's so exciting right now. It's because we can get to that final, that last mile of understanding patients digitally that we never could unless we hired people to read these notes. So that's why it's super exciting right now.
- That's a great example. What do you stay away from, or what are you still a little bit skeptical of, or think might be more hype than it's worth right now?
- It's hard to tell what's hype, but the warning I give everyone, right, is this warning about hallucinations, that these language models, any AI, if you ask it to extrapolate beyond what it knows, it's gonna start to make up things and we might call those hallucinations. I think we bucket a lot of things under hallucinations that aren't as pejorative as this phrase hallucination. Peter Lee of Microsoft Research has the same belief. Sometimes some good hallucinations are what I think he and I would call hypotheses, that there is no right answer to give, but maybe this is a right answer to test. Now, it would be useful for these large language models like OpenAI's to tell us when they were hallucinating. They don't often do that, so you gotta be careful. And I think all of that's gonna start to get better and better every six to 12 months. But for now, we have to watch out for hallucinations.
- Wonderful, wonderful. Great example. Well, Atul, it's been a pleasure talking with you. Thank you for taking the time today.
- As always, it is such a great pleasure talking to you, Kirsten.
- And to our audience, thank you for watching and listening. We welcome submissions in response to JAMA's AI and Medicine Call for Papers. Subscribe to the JAMA Network YouTube channel, and follow JAMA Network Podcasts wherever you get your podcasts. Until next time, stay informed and stay inspired.