What's the word on Health IT street?

Health IT and General
Rob Brisk
June 18, 2023

It’s been a busy few months at Eolas, and as usual the blog is the first thing to suffer! The good news, though, is that we’ve been busy attending some of the world’s leading health IT events. Here’s what we have to report:


At HIMSS23 in Chicago, three words seemed to crop up in almost every conversation: “large language models”.

HIMSS23 Chicago

The headline event on day one was a panel discussion including Peter Lee (corporate vice president of Microsoft Research), Andrew Moore (former dean of computer science at Carnegie Mellon, who serves on the US National Security Commission on AI), Kay Firth-Butterfield (internationally renowned human rights lawyer and co-founder of the Responsible AI Institute) and Reid Blackman (professor of philosophy and author of “Ethical Machines”). Eolas CEO Declan and I were in the audience, but with such a high-profile line-up we’d expected it to be a fairly bland, toe-the-institutional-line sort of affair. In fact, it was nothing of the sort.

One question that threw up a striking difference of opinion was how “intelligent” LLMs actually are. Peter Lee, who has been working extensively with GPT-4 following Microsoft’s $10BN investment in OpenAI, reported that his team have been “unable to disprove” the hypothesis that GPT-4 has elements of “true” intelligence. Reid Blackman, on the other hand, dismissed this with what has become a pet phrase of AI skeptics recently: “At the end of the day, it’s just a next word predictor”. (More on this in the next blog instalment.)

Another sticky issue was the future development of LLMs. Kay Firth-Butterfield is one of the highest-profile signatories of an open letter calling for a moratorium on the training of very large AI models. Proponents of this position argue that the speed of AI evolution has massively outstripped the relevant policy, regulation and ethical frameworks, making AI a dangerously unregulated space, and that we need to hit the pause button before it all goes too far. Some well-known figures, like Elon Musk, have even described AI as an existential threat to human civilisation. Needless to say, both Microsoft’s Peter Lee and Google’s Andrew Moore were a little less keen on this idea.

(NB - if you’re interested in this debate, there’s a great conversation between two “AI superstars” - Andrew Ng and Yann LeCun - who think a pause in AI development is a bad idea here.)

The theme of LLMs continued to pop up throughout the conference, and was thrown into even sharper relief when Microsoft and Epic announced a strategic partnership to integrate LLM technology with the EHR. For me, though, the real showstopper was (Microsoft-owned company) Nuance’s demo of their new ambient clinical intelligence tool. The tool (named “DAX”) uses a smartphone to place an ambient listener in a clinician-patient consultation. It then summarises the transcript of the consultation using GPT-4, and goes on to provide suggestions around differential diagnoses and management options. The AI-powered assistant even offers to arrange investigations and book future appointments. It’s early days - DAX is still being tested with a few key partner sites - but the potential efficiency gains from this kind of technology are staggering.

Nuance DAX System

The one major shortcoming of the demo was a reliance on GPT-4’s internal medical knowledge for making clinical suggestions. Or, put another way, the demo didn’t pull in curated clinical evidence sources in real time. This seemed like a pretty significant issue for two reasons: firstly, there’s the risk that GPT-4’s medical knowledge, acquired during its initial training, will become out of date as the medical evidence base evolves. Secondly, asking generative models to provide insights or answer questions “from memory” - as opposed to providing them with relevant reference materials and asking them to extract the answers from those documents - increases the chance of “hallucinations”. This term is a euphemism for when LLMs confabulate, which could be particularly dangerous in a frontline clinical setting.
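To make that distinction concrete, here’s a minimal sketch of the difference between asking an LLM to answer “from memory” and grounding it in a retrieved reference passage. Everything here is illustrative: the guideline snippets are invented, the retrieval is naive keyword matching, and a real system would pass the resulting prompt to an LLM API (deliberately omitted here).

```python
# Sketch of "grounded" prompting: instead of relying on the LLM's internal
# (potentially stale) medical knowledge, we retrieve a curated guideline
# passage and instruct the model to answer only from that text.

GUIDELINES = {
    "chest-pain": "Local guideline: risk-stratify chest pain with a validated score before ordering a D-dimer.",
    "sepsis": "Local guideline: take blood cultures before starting empirical antibiotics where feasible.",
}

def retrieve(question: str) -> str:
    """Naive keyword retrieval; a production system would use semantic search."""
    for topic, passage in GUIDELINES.items():
        if any(word in question.lower() for word in topic.split("-")):
            return passage
    return ""

def build_grounded_prompt(question: str) -> str:
    """Assemble a prompt that constrains the model to the retrieved passage."""
    passage = retrieve(question)
    return (
        "Answer using ONLY the reference material below. "
        "If the material does not cover the question, say so.\n\n"
        f"Reference material:\n{passage}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt("When should I order a D-dimer for chest pain?")
```

The key design point is the final instruction: by telling the model to answer only from supplied material (and to say when it can’t), you reduce the scope for confabulation and keep the evidence base under the clinician’s control.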

In future, we at Eolas are betting that clinicians will want this kind of system to make clinical suggestions based on reference materials that they can select and control. One reason is that this will allow a clinician to point the system to their institution’s local guidance, which is tailored to the local patient population and locally available resources. Incorporating local guidance will help to mitigate the bias that tends to creep into broad-scope guidelines (where privileged populations are almost always over-represented).

Another reason is that clinical suggestions derived from uncurated sources can strong-arm clinicians into a certain course of action. For example, you may see a patient with chest pain. Based on your clinical acumen, you might be very comfortable this is not a case of pulmonary embolism. But if an AI assistant tells you: International Guideline X states that all patients with chest pain and tachycardia should have a D-dimer checked...

Well, as one of our clinical colleagues put it: “You can’t unsee that”.

The challenge, of course, is that most institutions don’t store their clinical guidelines and protocols in a way that can be made accessible to AI systems like DAX. At Eolas, that’s exactly what we’re building: a “semantic search platform” that underpins all our users' clinical documents. All in all, our HIMSS experience reinforced our view that now is a great time for healthcare providers to be adopting platforms like ours to get AI-ready. Having seen DAX in action - as just one example of a whole new generation of LLM-powered clinical applications - we’re willing to bet that the clinical workforce will start voting with its feet as these technologies start hitting the market over the next couple of years.
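For the curious, the core idea behind semantic search can be sketched in a few lines: embed the query and each document as vectors, then rank documents by similarity. The sketch below uses a toy bag-of-words “embedding” and cosine similarity purely for illustration - a real platform would use learned vector embeddings, and the document snippets are invented:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use learned vector embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

DOCS = [
    "Adult sepsis protocol: antibiotics within one hour of recognition",
    "Paediatric asthma pathway: salbutamol dosing by weight",
    "Anticoagulation guidance: DOAC selection in renal impairment",
]

def search(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

best = search("which antibiotics for sepsis", DOCS)
```

The same ranking step is what lets a downstream LLM be fed the *right* local guideline rather than whatever it half-remembers from training.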

By the way, for more information on what we mean by a “semantic search platform” and how we’re approaching this space, check out our blog series on semantic search in a healthcare setting!

AIMed 23

Having reaffirmed our belief that AI - and LLMs in particular - have a huge role to play in the future of health IT, I was lucky enough to attend AIMed’s 2023 Global Summit. For anyone who hasn’t heard of AIMed, it’s the leading clinician-facing AI community, closely partnered with the American Board of Artificial Intelligence in Medicine (ABAIM). You can check out highlights from a couple of the sessions I had the pleasure of moderating here and here. But at a high level, a few themes really shone through over the course of the show.

AIMed23 San Diego

Firstly, just like at HIMSS, LLMs were the talk of the town. Secondly, the issue of equality came through loud and clear - discussed both in terms of ensuring equal access to LLM-powered technologies, and in terms of putting ethical principles into practice so that LLM-based applications do not amplify existing societal inequalities (as AI is wont to do). Lastly, there was a lot of talk about the growing knowledge gap between clinicians and technologists.

When it comes to the discussions around ethics, equality and bias, it strikes me that next-generation LLMs like GPT-4 raise a lot of questions that we’re yet to clearly answer. For example, data provenance (i.e. knowing where your training data came from) has been a huge issue in the field of AI to date, because datasets that under-represent key elements of your target population can cause your AI system to perform poorly when applied to those groups. But where do we stand with LLMs like GPT-4? Sure, there’s huge bias in their training data: they’re trained on giant swathes of the public internet, where a disproportionate amount of data is derived from privileged (often Caucasian) populations. But the “higher reasoning” abilities that sufficiently large and sophisticated models develop seem very akin to what we humans might call “insight” into these issues. For example, here’s an interaction we just had with GPT-4:

Me: Can you describe extra considerations that we might need to account for when applying guidance on the management of acute coronary syndrome to minority populations?

GPT-4: When applying guidelines for the management of acute coronary syndrome (ACS) to minority populations, it is essential to consider several factors to ensure equitable and effective care. Some of these considerations include:

  1. Socioeconomic Status: Minority populations often have lower socioeconomic status. This may affect their ability to access healthcare services, afford medications, or adhere to lifestyle modifications suggested for managing ACS. Tailored interventions that take into account financial constraints should be considered.
  2. Cultural Competence: Cultural beliefs and attitudes toward health and illness can greatly affect how patients perceive and manage ACS. For instance, some cultures might rely more on traditional or alternative medicines. Health professionals need to be culturally competent and sensitive to these differences.
  3. Language Barriers: Communication is crucial in managing ACS. Language barriers can hinder proper understanding of the condition, its severity, and the importance of follow-up care. Providing translators or materials in a patient’s native language can improve communication.
  4. Genetic Variability: Some minority populations may have different genetic predispositions to heart disease, which can influence the incidence, presentation, and response to treatment. This may necessitate adapting screening approaches or therapeutic regimens.
  5. Comorbidities: Minority populations might have higher prevalence rates of certain comorbidities such as diabetes and hypertension, which can complicate the management of ACS. Addressing these conditions alongside ACS is essential for a comprehensive approach.
  6. Health Literacy: Lower health literacy is common among minority populations. This can affect their ability to understand and manage ACS. Educational interventions tailored to varying literacy levels can be beneficial.
  7. Bias and Discrimination: Implicit biases among healthcare providers may lead to differences in the quality of care provided to minority populations. Healthcare professionals should be trained to recognize and address these biases.
  8. Access to Healthcare: In some cases, minority populations have limited access to healthcare facilities or specialized cardiac care. This can delay diagnosis and treatment. Strategies to increase access, such as community-based clinics or telemedicine, can be important.
  9. Trust in Healthcare System: Historical injustices and ongoing disparities in healthcare may contribute to a lack of trust in the healthcare system among minority populations. Building relationships and trust with patients is important for successful management.
  10. Pharmacogenomics: Variability in drug metabolism and response across different ethnic groups can impact the efficacy and safety of medications used in ACS management. Understanding these differences may help in optimizing drug therapy for individual patients.
  11. Social Support: The availability and role of family and community support can vary among different cultures and may affect adherence to management plans. Engaging family and community members in care planning may improve outcomes.

Incorporating these considerations into the management of acute coronary syndrome in minority populations can contribute to better outcomes and health equity. It is also important to actively involve patients from minority populations in decision-making processes regarding their care, to ensure that the management plans are aligned with their values, preferences, and circumstances.

So what does that mean for someone trying to apply today’s ethical AI principles to projects involving LLMs? If an LLM shows insight into sources of inequality and mitigating measures, and can incorporate these into clinical recommendations, is the question of data provenance rendered moot for this type of AI model? Or do we risk a more insidious type of bias that is masked by the illusion of human-like insight?

Honestly, I’ve no idea. But it was great to learn at AIMed that some very smart people are already tackling this sort of question! And also to hear that we’re moving towards industry collaboration and consensus building with organisations like the Coalition for Health AI (CHAI).

As for the last theme - the knowledge gap (or perhaps knowledge gulf) that has grown between an increasingly sub-specialised medical profession and the AI community over recent years - this really got me thinking.

Meeting of minds

Firstly, assuming that the way to bridge this gap is for clinicians and data scientists to find some middle point for a meeting of minds - where does that point lie? It’s true that most practising clinicians are going to struggle to commit the kind of time and headspace needed to gen up on modern AI methods. But beyond time and headspace, the only other requirements are a half-decent laptop and a reliable internet connection. Whereas coming from the other direction? Personally, it took six years at medical school and probably another ten thousand hours at the clinical coal face before I felt I understood the health system deeply enough to think about systems-level innovation. For my money, the onus is on the clinical community to learn to speak AI, not the other way around.

Which brings us to “secondly”: when it comes to LLMs, how much does a clinician or healthcare executive need to know to be dangerous? The more I’ve thought about this, the more I’ve realised it’s a really tough one. Interacting with ChatGPT is fantastic to get an intuitive understanding of what LLMs can do, and even to get a feel for what they can’t. But it’s also not nearly enough. Assuming you’ve managed to get GPT-3.5 (the model that powers the public-facing version of ChatGPT, at the time of writing) to “hallucinate” - or even if you’ve just read a story like this - you’ll understand why we can’t just signpost patients to ChatGPT and provide a first-line medical advice service to help reduce the burden on primary care. But how about if we provide GPT with robust reference information in real-time as we ask questions - does that solve the problem? And how would we even go about doing that? Or where else can we use LLMs to solve our biggest headaches at the point of care without putting patients at risk? And what fundamental AI concepts do I need to understand to start ideating around this for myself?

For my money, the advent of LLMs - and the speed with which they are set to disrupt healthcare - has made the question of clinician education in this field more important than ever. In fact, I think it’s among the most significant blockers to the progress of AI in healthcare today. Which is why I’m going to aim to be a bit more timely with the next instalment of the blog, and try to sketch out a roadmap for how a clinician or healthcare leader might go about educating themselves in this space (without needing a sabbatical for a degree in data science).

Thanks for reading, and if you’re not already an Eolas user - don’t forget it’s now completely free to use! Just create an account and get cracking at www.eolasmedical.com