ChatGPT is the bleeding edge of artificial intelligence. But why should healthcare professionals care?
What is ChatGPT?
In case you haven’t heard, OpenAI - the same organisation that brought us the groundbreaking GPT-3 AI that is able to engage in human-like creative discourse - has just announced a new generation of its natural language processing (NLP) algorithms. This one is called ChatGPT, and it’s already making big waves in the tech world.
Before we go any further, it’s worth pointing out that many people take OpenAI’s press releases with a pinch of salt. They have a track record of overhyping their work, famously claiming that an earlier generation of this type of AI, called GPT-2, was "too dangerous" to release to the public. However, this time around it seems they’ve learned from their mistakes: ChatGPT is currently available for anyone to interact with, so they’re effectively letting the tech speak for itself.
And what does it have to say? Well, here’s a short dialogue I had with ChatGPT this morning:
Me: What medications are usually prescribed for a patient with a suspected NSTEMI?
ChatGPT: A patient with a suspected non-ST-elevation myocardial infarction (NSTEMI) may be prescribed medications to help manage the condition and prevent further complications. These may include medications to help improve blood flow to the heart, such as aspirin and other antiplatelet drugs, as well as medications to help lower the heart rate and blood pressure, such as beta blockers and ACE inhibitors. The specific medications and treatment plan will depend on the individual patient and their specific needs. It is important for the patient to follow their healthcare provider's instructions and take all medications as prescribed.
Me: Which antiplatelets would be given in addition to aspirin?
ChatGPT: Aspirin is a commonly used antiplatelet medication, and it may be used alone or in combination with other antiplatelet drugs in the treatment of a suspected non-ST-elevation myocardial infarction (NSTEMI). Some examples of other antiplatelet drugs that may be used in combination with aspirin include clopidogrel, prasugrel, and ticagrelor. The specific antiplatelet medication or combination of medications that is best for a particular patient will depend on their individual situation, and it is important for them to follow the advice and treatment plan provided by their healthcare provider.
Me: If a patient complained of chest pain that was tearing in nature and radiated to the interscapular region, what might be on the list of differentials alongside an acute MI?
ChatGPT: Chest pain that is tearing in nature and radiates to the interscapular region may be a symptom of a number of different conditions. Some possible differentials alongside an acute myocardial infarction (MI) include:
- Angina: Chest pain caused by reduced blood flow to the heart.
- Pulmonary embolism: A blockage in the pulmonary artery or one of its branches.
- Aortic dissection: A tear in the inner layer of the aorta, the major artery that carries blood from the heart to the rest of the body.
- Gastroesophageal reflux disease (GERD): A condition in which stomach acid flows back into the esophagus, causing symptoms such as chest pain.
- Esophageal spasm: Uncoordinated contractions of the muscles in the esophagus, which can cause chest pain.
It is important for patients experiencing chest pain to seek medical attention as soon as possible so that the underlying cause can be identified and treated.
Not bad, right? And it’s worth noting that ChatGPT provided these responses as quickly as I might expect a fellow medical professional to provide them. So what’s going on under the hood?
How ChatGPT works
Despite the name, OpenAI are not actually very open. We know from the original GPT-3 research paper that GPT-3 (ChatGPT’s predecessor) is a behemoth of an AI. It’s based on the “transformer” neural network architecture that has become standard for NLP algorithms in the last few years, but it contains a staggering 175BN trainable parameters. However, OpenAI have pointed out that ChatGPT is not based on the original AI described in that paper. Rather, it’s based on a new and improved variant called GPT-3.5, about which we know very little.
To make matters even murkier, OpenAI's blog post about ChatGPT describes it as a “sibling model to InstructGPT”. We do know a fair amount about InstructGPT by virtue of another recent OpenAI research paper. The thing is, there’s a huge difference between GPT-3 and InstructGPT. Most importantly, the smallest InstructGPT model has a mere 1.3BN trainable parameters - fully two orders of magnitude fewer than GPT-3 - yet human evaluators preferred its outputs to those of the full-size GPT-3. And as a rule, the smaller an AI algorithm, the cheaper and easier it is to work with, which could have big implications for point-of-care applications. So understanding where ChatGPT sits on this scale is pretty important.
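To put that difference in perspective, here’s a back-of-envelope sketch of the memory needed just to hold each model’s weights (a rough estimate only, assuming 16-bit weights and ignoring activations, optimiser state, and everything else a real deployment needs):

import_note = None  # no imports needed for this back-of-envelope sketch

# Rough memory footprint of model weights alone, assuming 2 bytes
# (16-bit) per parameter. Real deployments need considerably more.
def weights_memory_gb(n_parameters, bytes_per_parameter=2):
    return n_parameters * bytes_per_parameter / 1e9

print(f"GPT-3 (175BN parameters): ~{weights_memory_gb(175e9):.0f} GB")        # ~350 GB
print(f"InstructGPT (1.3BN parameters): ~{weights_memory_gb(1.3e9):.1f} GB")  # ~2.6 GB

Hundreds of gigabytes versus a few: that’s the difference between a rack of specialist hardware and something that could plausibly run close to the point of care.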
One thing we do know sets ChatGPT apart from GPT-3 is that ChatGPT was trained using “reinforcement learning”. That’s an AI technique made most famous by DeepMind's AlphaGo AI. However, where the original AlphaGo learned largely from self-play, ChatGPT used human-in-the-loop reinforcement learning to help make it as user-friendly as possible. From a healthcare standpoint, that could mean that ChatGPT is better able to interact effectively with both patients and the medical workforce.
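OpenAI haven’t published ChatGPT’s training code, but the InstructGPT paper gives the general shape of the human-in-the-loop recipe. Here’s a deliberately simplified sketch; every name in it (language_model, reward_model, and their methods) is a hypothetical stand-in, not a real API:

# A simplified sketch of reinforcement learning from human feedback.
# All objects and methods here are hypothetical stand-ins.
def rlhf_training_loop(language_model, reward_model, prompts, n_steps):
    for _ in range(n_steps):
        for prompt in prompts:
            # 1. The language model drafts a response to the prompt.
            response = language_model.generate(prompt)
            # 2. A reward model, trained separately on human preference
            #    rankings, scores how helpful the response is.
            reward = reward_model.score(prompt, response)
            # 3. A policy-gradient update (PPO in the InstructGPT paper)
            #    nudges the model towards higher-reward responses.
            language_model.update_policy(prompt, response, reward)
    return language_model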
Retrieval-augmented language generation
Probably the most important aspect of ChatGPT’s design, though, is its use of a “retrieval-augmented” approach to language generation. This means that when it is asked a question, the first thing ChatGPT does is look for (“retrieve”) data sources that might contain information relevant to the question. A bit like Google, it uses a combination of old-fashioned and AI-based techniques to comb a huge bank of data and shortlist the most relevant sources. (You can find out more about these techniques in our blog series on intelligent search for healthcare.)
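OpenAI haven’t told us exactly how this retrieval step works, but the standard recipe for the AI-based half is semantic search over embeddings: encode the question and every candidate document as vectors, then rank by similarity. Here’s a minimal, runnable sketch; the bag-of-letters embed function is a toy stand-in for a real neural encoder:

import numpy as np

def embed(text):
    # Toy stand-in for a neural encoder: a bag-of-letters vector.
    # A real system would use a transformer encoder here.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec

def retrieve(question, documents, k=3):
    # Rank documents by cosine similarity to the question; keep the top k.
    q = embed(question)
    doc_matrix = np.stack([embed(d) for d in documents])
    scores = doc_matrix @ q / (np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]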
Once ChatGPT has marshalled some appropriate sources of information, it considers the user-provided question and the retrieved information together, taking account of the whole lot when it generates a response. This is quite different from the original GPT-3, which relied largely on “rote learning” information during training. Looked at through a healthcare lens, GPT-3 is like a medical student from the mid-20th century who was expected to rote learn and regurgitate whole textbooks. Retrieval-augmented models like ChatGPT are more like modern med students, for whom the ability to find the answers to their questions online in real time is much more important than the ability to rote learn (even if some med school curricula haven’t quite caught up with this yet…).
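In code terms, the generation step then boils down to careful prompt construction: put the retrieved evidence and the question in front of the model together. Continuing the sketch above (generate is a hypothetical stand-in for the language model itself):

def answer_with_retrieval(question, documents):
    # 1. Shortlist relevant sources (see the retrieve() sketch above).
    context = retrieve(question, documents, k=3)
    # 2. Ground the response in the retrieved text rather than whatever
    #    the model rote-learned during training.
    prompt = "Use only the sources below to answer the question.\n\n"
    prompt += "\n\n".join(f"Source {i + 1}: {doc}" for i, doc in enumerate(context))
    prompt += f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)  # hypothetical call to the language model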
To exercise even more control over how ChatGPT generates responses, we can provide our own information sources along with our questions. To get a feel for how this might work in the clinical setting, here’s another interaction I had with ChatGPT:
Me: Given the following guidance, would I test thyroid function in a patient feeling hot who has a chest infection? Please justify your answer.
[The following is a snippet from the Thyroid Disease: Assessment and Management guideline from the UK’s National Institute for Health and Care Excellence, or NICE:]
Indications for tests for thyroid dysfunction
- 1.2.1 Consider tests for thyroid dysfunction for adults, children and young people if there is a clinical suspicion of thyroid disease, but bear in mind that 1 symptom alone may not be indicative of thyroid disease.
- 1.2.2 Offer tests for thyroid dysfunction to adults, children and young people with:
- type 1 diabetes or other autoimmune diseases, or
- new-onset atrial fibrillation.
- 1.2.3 Consider tests for thyroid dysfunction for adults, children and young people with depression or unexplained anxiety.
- 1.2.4 Consider tests for thyroid dysfunction for children and young people with abnormal growth, or unexplained change in behaviour or school performance.
- 1.2.5 Be aware that in menopausal women symptoms of thyroid dysfunction may be mistaken for menopause.
- 1.2.6 Do not test for thyroid dysfunction during an acute illness unless you suspect the acute illness is due to thyroid dysfunction, because the acute illness may affect the test results.
ChatGPT: Based on the guidance provided, it would not be necessary to test thyroid function in a patient feeling hot who has a chest infection. The guidance indicates that tests for thyroid dysfunction should only be offered if there is a clinical suspicion of thyroid disease, and the patient's symptoms do not appear to be indicative of thyroid disease. Additionally, the guidance states that tests should not be performed during an acute illness unless the acute illness is suspected to be due to thyroid dysfunction. Since the patient in this scenario has a chest infection, it is unlikely that the illness is related to thyroid dysfunction, and therefore testing is not indicated.
As you can see, ChatGPT is reasoning over the NICE guidance in its answer. Older language models like GPT-3 could do that to an extent too. But retrieval-augmented models like ChatGPT are trained specifically to use this approach (generating responses based on provided context), and tend to be much better at it.
Examples of how we could use ChatGPT to build clinical applications
Bottom line, then: let’s take a look at how we could actually use an AI like ChatGPT to add value at the point of care.
Anyone who follows Eolas knows that we’re working hard on intelligent clinical search, which helps us connect healthcare professionals with high-quality information as quickly as possible. Although OpenAI have kept much of the detail of ChatGPT’s implementation secret, it clearly handles medical language very well. As we’ve described in a previous blog post on medical language processing, dealing with the unique vocabulary of clinical language is a major challenge for NLP applications. The part of ChatGPT that generates semantic embeddings - known as the “encoder” - could be very useful for indexing clinical documents as part of a semantic search application, particularly where other AIs struggle to make sense of the specialised language.
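As a sketch of how that indexing step might look, it follows the same pattern as the retrieval example earlier. Here clinical_encoder is a hypothetical stand-in for whatever embedding interface OpenAI eventually expose; the real API may look quite different:

import numpy as np

def build_index(documents, clinical_encoder):
    # Offline: embed every clinical document once and store the vectors.
    return np.stack([clinical_encoder(doc) for doc in documents])

def semantic_search(query, documents, index, clinical_encoder, k=5):
    # Online: embed the clinician's query and rank documents by cosine similarity.
    q = clinical_encoder(query)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]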
The advantage of using ChatGPT to index documents for clinical search is that this use-case could be deployed tomorrow. However, ChatGPT also gives us a glimpse into a future generation of clinical applications - probably still a few years away from being deployed at the point of care - that fully leverage its language-generating capabilities.
For example, we could use ChatGPT to deliver real-time decision support or automated audit using free-text EHR entries:
Me: Here is a patient note: "50yo male presenting with central CP, TnT 34, new TWI anterior chest leads. Likely diagnosis: NSTEMI. Given aspirin 300mg and clopidogrel 300mg, referred cardiology."
Here is part of a guideline: "In patients with suspected acute coronary syndrome, alongside a loading dose of aspirin, ticagrelor 180mg should be administered in preference to clopidogrel."
Has the patient been treated in accordance with the guidance shown here?
ChatGPT: It appears that the patient was treated with aspirin and clopidogrel, but the guideline recommends administering ticagrelor in preference to clopidogrel in this situation. Therefore, it does not appear that the patient was treated in accordance with the guideline. It is important for healthcare providers to carefully review and follow guidelines when treating patients to ensure they receive the most appropriate and effective care.
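ChatGPT has no public API at the time of writing, but if one arrives, wiring this kind of check into an automated audit job could look something like the sketch below. The ask_model client is entirely hypothetical:

def ask_model(prompt):
    # Hypothetical client for a future ChatGPT-style API.
    raise NotImplementedError

def audit_note_against_guideline(note, guideline):
    # Reproduce the prompt pattern from the dialogue above.
    prompt = (
        f'Here is a patient note: "{note}"\n\n'
        f'Here is part of a guideline: "{guideline}"\n\n'
        "Has the patient been treated in accordance with the guidance shown here?"
    )
    return ask_model(prompt)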
We could also use ChatGPT to make sense of medical jargon, either to help patients understand their medical records or to “tidy up” records so that they can be used in other downstream applications that handle medical abbreviations poorly:
Me: Here is an outpatient clinic note:
"50yo M. Presenting complaint: exertional CP, none at rest. PMH: HTN, T2DM. DH: Rampril 5mg OD, Metformin 500mg BD, Atorvastatin 40mg ON. SH: ex smoker, 20 pack years. Occasional alcohol. Office worker. FH: Mother MI at 67yo, Father unknown. O/E Normal HS, no signs of CCF. ECG: NSR, nil of note. Plan: CTCA, r/v with results."
Can you translate this into a plain English summary?
ChatGPT: This patient is a 50-year-old male who is presenting with exertional chest pain but no chest pain at rest. He has a medical history of hypertension and type 2 diabetes mellitus, and is currently taking Rampril (an angiotensin-converting enzyme inhibitor) once daily, metformin twice daily, and atorvastatin once nightly. He used to smoke and has a 20-pack-year history of smoking, and occasionally drinks alcohol. He is an office worker and has a family history of myocardial infarction in his mother at the age of 67. The physical examination is normal and there are no signs of congestive heart failure. The electrocardiogram shows normal sinus rhythm and nothing of note. The plan is to perform a computed tomography coronary angiogram and review the results with the patient.
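This jargon-translation use-case fits the same hypothetical ask_model wrapper sketched above:

def to_plain_english(clinic_note):
    # Reuse the hypothetical ask_model client from the audit sketch.
    prompt = (
        f'Here is an outpatient clinic note:\n"{clinic_note}"\n\n'
        "Can you translate this into a plain English summary?"
    )
    return ask_model(prompt)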
But here’s the example that really blew me away. With just three prompts from me, ChatGPT is able to create a computer programme that automatically applies clinical rules, translate a shorthand patient note into a computer-readable object, and execute the programme. If you’re either not a medic or not a programmer, you’ll have to take my word for it that ChatGPT navigates both of these highly specialised domains - and the interface between them - stunningly well:
Me: Here is an outpatient clinic note: "50yo M. Presenting complaint: exertional CP, none at rest. PMH: HTN, T2DM. DH: Rampril 5mg OD, Metformin 500mg BD, Atorvastatin 40mg ON. SH: ex smoker, 20 pack years. Occasional alcohol. Office worker. FH: Mother MI at 67yo, Father unknown. O/E Normal HS, no signs of CCF. ECG: NSR, nil of note."
Can you translate this into Python code, describing a class named "patient" with suitable properties and types, then instantiate an instance of this class to describe the patient from the clinic note above?
ChatGPT: Here is an example of how you could create a Python class named "Patient" to represent the information in the clinic note:
class Patient:
    def __init__(self, age, gender, presenting_complaint, past_medical_history,
                 current_medications, social_history, family_history,
                 examination, ecg):
        self.age = age
        self.gender = gender
        self.presenting_complaint = presenting_complaint
        self.past_medical_history = past_medical_history
        self.current_medications = current_medications
        self.social_history = social_history
        self.family_history = family_history
        self.examination = examination
        self.ecg = ecg
Here is an example of how you could create an instance of the Patient class to represent the patient from the clinic note:
patient = Patient(
    age=50,
    gender="M",
    presenting_complaint="exertional CP, none at rest",
    past_medical_history="HTN, T2DM",
    current_medications=["Rampril 5mg OD", "Metformin 500mg BD", "Atorvastatin 40mg ON"],
    social_history="ex smoker, 20 pack years. Occasional alcohol. Office worker.",
    family_history="Mother MI at 67yo, Father unknown",
    examination="normal HS, no signs of CCF",
    ecg="NSR, nil of note",
)
Me: Can you translate the following guidance into Python functions that take an instance of the class "patient" defined below as an input argument?
[The following is an extract from the NICE guideline: "Recent-onset chest pain of suspected cardiac origin: assessment and diagnosis"]
1.3 People presenting with stable chest pain
This section of the guideline addresses the assessment and diagnosis of intermittent stable chest pain in people with suspected stable angina.
1.3.1.1 Exclude a diagnosis of stable angina if clinical assessment indicates nonanginal chest pain (see recommendation 1.3.3.1) and there are no other aspects of the history or risk factors raising clinical suspicion. [2016]
1.3.1.2 If clinical assessment indicates typical or atypical angina (see recommendation 1.3.3.1), offer diagnostic testing (see the sections on diagnostic testing for people in whom stable angina cannot be excluded by clinical assessment alone, additional diagnostic investigations and use of non-invasive functional testing for myocardial ischaemia). [2016]
[Copy-paste of the patient class previously defined by ChatGPT]
ChatGPT: The following Python functions can be used to implement the guidance provided in the question:
def exclude_diagnosis_of_stable_angina(patient):
    if patient.presenting_complaint == "nonanginal chest pain" and not any(
        [patient.past_medical_history, patient.social_history, patient.family_history]
    ):
        return True
    else:
        return False

def offer_diagnostic_testing(patient):
    if patient.presenting_complaint == "typical angina" or patient.presenting_complaint == "atypical angina":
        return True
    else:
        return False
These functions can be called on an instance of the Patient class, and will return True or False depending on whether the guidance recommends excluding the diagnosis of stable angina or offering diagnostic testing. For example:
# create an instance of the Patient class
patient = Patient(age=56, gender="male", presenting_complaint="nonanginal chest pain",
                  past_medical_history=[], current_medications=[], social_history=[],
                  family_history=[], examination="", ecg="")

# check whether the diagnosis of stable angina should be excluded
exclude_diagnosis_of_stable_angina(patient)  # returns True

# check whether diagnostic testing should be offered
offer_diagnostic_testing(patient)  # returns False
It is important to note that these functions are only intended as a rough example of how the guidance could be implemented in Python. In a real-world application, these functions would likely need to be more complex and comprehensive in order to accurately implement the guidance.
Me: Can you (1) translate the presenting complaint of the patient shown below into terms compliant with the two Python functions shown; (2) evaluate the Python functions with respect to the patient given?
def exclude_diagnosis_of_stable_angina(patient):
    if patient.presenting_complaint == "nonanginal chest pain" and not any(
        [patient.past_medical_history, patient.social_history, patient.family_history]
    ):
        return True
    else:
        return False

def offer_diagnostic_testing(patient):
    if patient.presenting_complaint == "typical angina" or patient.presenting_complaint == "atypical angina":
        return True
    else:
        return False

patient = Patient(
    age=50,
    gender="M",
    presenting_complaint="exertional CP, none at rest",
    past_medical_history="HTN, T2DM",
    current_medications=["Rampril 5mg OD", "Metformin 500mg BD", "Atorvastatin 40mg ON"],
    social_history="ex smoker, 20 pack years. Occasional alcohol. Office worker.",
    family_history="Mother MI at 67yo, Father unknown",
    examination="normal HS, no signs of CCF",
    ecg="NSR, nil of note",
)
ChatGPT: To translate the patient's presenting complaint into terms compliant with the two Python functions, we can use the following mapping:
"exertional CP, none at rest" -> "typical angina" We can then evaluate the Python functions with respect to the patient as follows:
# create an instance of the Patient class
patient = Patient(
    age=50,
    gender="M",
    presenting_complaint="typical angina",
    past_medical_history="HTN, T2DM",
    current_medications=["Rampril 5mg OD", "Metformin 500mg BD", "Atorvastatin 40mg ON"],
    social_history="ex smoker, 20 pack years. Occasional alcohol. Office worker.",
    family_history="Mother MI at 67yo, Father unknown",
    examination="normal HS, no signs of CCF",
    ecg="NSR, nil of note",
)

# check whether the diagnosis of stable angina should be excluded
exclude_diagnosis_of_stable_angina(patient)  # returns False

# check whether diagnostic testing should be offered
offer_diagnostic_testing(patient)  # returns True
In this case, the exclude_diagnosis_of_stable_angina function returns False because the patient's presenting complaint (typical angina) does not match the criteria for excluding the diagnosis of stable angina. The offer_diagnostic_testing function returns True because the patient's presenting complaint (typical angina) matches the criteria for offering diagnostic testing.
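Condensed into code, the three-prompt workflow above amounts to a pipeline like the sketch below. This is conceptual only: ask_model is the same hypothetical client as before, and in the dialogue above a human relayed each step instead of code.

def guideline_check_pipeline(clinic_note, guideline_text):
    # Step 1: free-text note -> structured Patient code.
    patient_code = ask_model(
        f"Translate this clinic note into a Python Patient instance:\n{clinic_note}"
    )
    # Step 2: guideline prose -> executable rule functions.
    rules_code = ask_model(
        f"Translate this guidance into Python functions over Patient:\n{guideline_text}"
    )
    # Step 3: normalise the note's terminology and evaluate the rules.
    return ask_model(
        "Translate the presenting complaint into terms the functions accept, "
        f"then evaluate the functions for this patient:\n{rules_code}\n{patient_code}"
    )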
And there you have it: a quick tour of ChatGPT and an illustration of the kind of clinical applications that this technology might catalyse in the future. And one final note: all the images in this blog were generated by another OpenAI model named DALL-E! Welcome to the future...