
In the future, large language models (LLMs) may be able to automatically read clinical notes in medical records and reliably and efficiently extract relevant information to aid patient care and research. However, a recent study from Columbia University's Mailman School of Public Health, which used ChatGPT-4 to read emergency department notes and determine whether injured scooter riders and cyclists were wearing helmets, found that LLMs cannot yet do this reliably. The findings are published in JAMA Network Open.
In a study of 54,569 emergency department visits by patients injured while riding bicycles, scooters, and other micromobility vehicles from 2019 to 2022, the LLM struggled to replicate the results of a text-string-search-based approach to extracting helmet status from clinical notes.
The LLM performed well only when the prompt contained all of the text used in the text-string-search-based approach. It also struggled to reproduce its results across trials run on five consecutive days, and was more consistent in reproducing its hallucinations than its accurate extractions. The LLM struggled especially when a phrase was negated, such as reading "no helmet" and reporting that the patient was wearing one.
Electronic medical records contain a large amount of medically relevant data in the form of clinical notes, a type of unstructured data. An efficient way to read and extract information from these notes would be extremely useful for research.
Currently, information can be extracted from these clinical notes using simple string-matching text searches, or more advanced artificial intelligence (AI)-based approaches such as natural language processing. It was hoped that newer LLMs such as ChatGPT-4 would enable faster and more reliable information extraction.
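To make the comparison concrete, the string-matching baseline described above can be sketched roughly as follows. This is an illustrative sketch only, not the study's actual method: the phrase lists and function name here are hypothetical, and it shows why negation ("no helmet") must be checked before affirmation ("helmet").

```python
import re

# Hypothetical phrase lists for illustration; the study's actual
# text strings are not reproduced here.
NEGATED = [r"no helmet", r"not wearing a helmet", r"unhelmeted", r"without (a )?helmet"]
AFFIRMED = [r"wearing a helmet", r"helmeted", r"helmet on"]

def helmet_status(note: str) -> str:
    """Classify a clinical note as 'no_helmet', 'helmet', or 'unknown'."""
    text = note.lower()
    # Negated phrases are checked first: "no helmet" contains the
    # substring "helmet", so negation must take precedence.
    if any(re.search(p, text) for p in NEGATED):
        return "no_helmet"
    if any(re.search(p, text) for p in AFFIRMED):
        return "helmet"
    return "unknown"

print(helmet_status("Pt fell off e-scooter, no helmet, head lac."))  # no_helmet
print(helmet_status("Cyclist struck by car, wearing a helmet."))     # helmet
print(helmet_status("Rider injured on hoverboard."))                 # unknown
```

The precedence rule is the crux: a naive search that only looks for the word "helmet" would misclassify "no helmet" as helmet use, which is the same negation failure the study observed in the LLM's output.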
"Using generative AI LLMs for information extraction tasks has the potential to improve efficiency, but issues with reliability and hallucination currently limit their usefulness," said senior author Andrew Rundle, PhD, professor of epidemiology at Columbia University's Mailman School.
"When we used a highly detailed prompt that included all helmet-related text strings, ChatGPT-4 was able to extract accurate data from clinical records on some days. However, the time required to define and test all of the text strings that should be included in the prompt, and ChatGPT-4's inability to reproduce its results from day to day, indicate that ChatGPT-4 was not yet up to the task."
Using publicly available data from the U.S. Consumer Product Safety Commission’s National Electronic Injury Surveillance System for 2019 to 2022 (a sample of 96 U.S. hospitals), Rundle and his colleagues analyzed emergency department records of patients injured in e-bike, bicycle, hoverboard, and e-scooter accidents. They compared ChatGPT-4’s analysis of the records to data generated using more traditional text string-based searches, and for 400 records, they compared ChatGPT’s analysis to their own interpretation of the clinical notes within the records.
The study builds on research into ways to prevent injuries among micromobility users (riders of bicycles, e-bikes, and scooters). "Helmet use is an important factor in determining injury severity, yet in most emergency medical records and accident reports, information about helmet use is buried in clinical notes written by the physician or EMS responder. Reliable and efficient ways to access this information are needed for research," said Kathryn Burford, lead author of the paper and a postdoctoral researcher in the Mailman School's Department of Epidemiology.
"Our study explored the potential of LLMs to extract information from clinical records, which are a rich source of information for medical professionals and researchers," Rundle said, "but at the time we used ChatGPT-4, it was not able to provide reliable data."
Co-authors are Nicole G. Itzkowitz of Columbia University’s Mailman School of Public Health, Ashley G. Ortega of Columbia University’s Population Research Center, and Julianne O. Teitler of Columbia University’s Graduate School of Social Work.
More information:
Kathryn G. Burford et al., “Using Generative AI to Identify Helmet Wear Status in Patients with Micromobility-Related Injuries from Unstructured Clinical Records.” JAMA Network Open (2024). DOI: 10.1001/jamanetworkopen.2024.25981
Courtesy of Columbia University Mailman School of Public Health
Citation: Generative AI can’t reliably read and extract information from clinical notes in medical records, study finds (August 19, 2024) Retrieved August 19, 2024 from https://medicalxpress.com/news/2024-08-generative-ai-reliably-clinical-medical.html
This document is subject to copyright. It may not be reproduced without written permission, except for fair dealing for the purposes of personal study or research. The content is provided for informational purposes only.