[Skip to Navigation]
Sign In
Table 1.  Sample Characteristics
Sample Characteristics
Table 2.  Performance Metrics According to Injury Category
Performance Metrics According to Injury Category
1.
Lorenzoni  G, Bressan  S, Lanera  C, Azzolina  D, Da Dalt  L, Gregori  D.  Analysis of unstructured text-based data using machine learning techniques: the case of pediatric emergency department records in Nicaragua.   Med Care Res Rev. 2021;78(2):138-145. doi:10.1177/1077558719844123 PubMedGoogle ScholarCrossref
2.
Gianfrancesco  MA, Goldstein  ND.  A narrative review on the validity of electronic health record-based research in epidemiology.   BMC Med Res Methodol. 2021;21(1):234. doi:10.1186/s12874-021-01416-5 PubMedGoogle ScholarCrossref
3.
Azzolina  D, Bressan  S, Lorenzoni  G,  et al.  Pediatric injury surveillance from uncoded emergency department admission records in Italy: machine learning-based text-mining approach.   JMIR Public Health Surveill. 2023;9:e44467. doi:10.2196/44467 PubMedGoogle ScholarCrossref
4.
Peden  M, Oyebite  K, Ozanne-Smith  J,  et al, eds.  World Report on Child Injury Prevention. World Health Organization; 2008.
5.
Rudnytskyi  I. Openai: R wrapper for OpenAI API. R package version 0.4.1. Accessed April 15, 2024. https://github.com/irudnyts/openai
6.
Lancet Digital Health.  ChatGPT: friend or foe?   Lancet Digit Health. 2023;5(3):e102. doi:10.1016/S2589-7500(23)00023-7 PubMedGoogle ScholarCrossref
Research Letter
Public Health
May 28, 2024

Use of a Large Language Model to Identify and Classify Injuries With Free-Text Emergency Department Data

Author Affiliations
  • 1Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padova, Italy
  • 2Division of Pediatric Emergency Medicine, Department of Women’s and Children’s Health, University of Padova, Padova, Italy
  • 3Department of Environmental and Preventive Science, University of Ferrara, Ferrara, Italy
  • 4Centre for Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
JAMA Netw Open. 2024;7(5):e2413208. doi:10.1001/jamanetworkopen.2024.13208
Introduction

Timely and accurate identification of injury data from pediatric emergency department (ED) records is critical for injury prevention. However, free text is commonly used in the medical records of EDs in most countries.1 In such context, free text is a valuable, and sometimes the only, tool of documentation for epidemiologic surveillance, but its use is challenging.2 In this rapidly evolving artificial intelligence era, large language models offer an opportunity to exploit free-text information in medical records. This study evaluated the performance of a large language model in identifying and classifying injury data in Italian from pediatric ED records.

Methods

The study analyzed 283 468 medical records of the pediatric ED of Padova University Hospital in Padova, Italy, from January 1, 2007, to December 31, 2018.3 The Azienda Ospedaliera di Padova Ethics Committee approved this cross-sectional study. Patients signed a written consent form to allow the use of data for scientific purposes. We followed the STROBE reporting guideline.

A subset of the records (n = 40 031) was randomly extracted from the dataset, and the free-text discharge diagnoses in Italian were classified manually by an expert clinician according to the World Health Organization injury classification system.4 This manual classification served as the criterion standard for evaluating the Generative Pretrained Transformer 4 (OpenAI) performance of the classification task.

The software manufacturer’s application programming interface end points were used as a basis for the classification task. The large language model was accessed through the openai R package.5 The eTable in Supplement 1 presents the prompts used. A description of the classification task methods used is presented in the eMethods in Supplement 1.

The performance of the large language model in the classification task was evaluated by calculating the accuracy, sensitivity, and specificity, which were reported with bootstrap 95% CIs within 1000 iterations. Analyses were conducted using R 4.3.2 (R Project for Statistical Computing).

Results

The classification task was performed on 8194 records manually classified as unintentional injuries according to the World Health Organization injury classification system. Among the injuries, 520 (6%) were categorized as road traffic, 589 (7%) as falls, 194 (2%) as fires and burns, and 176 (2%) as poisoning. In 12 cases, the injury was drowning; the remaining injuries were categorized under other, which included insect, tick, and animal bites and trauma of undetermined nature (Table 1). Patients with injury included 4325 males (53%) and 3869 females (47%), with a mean (SD) age of 7.3 (4.7) years.

Performance of the classification task by the large language model was very good (Table 2). The sensitivity was equal to 1.000 points for all categories except for falls (0.997; 95% CI, 0.991-1.000 points). The specificity was at least 0.996. No classification errors were detected for fires and burns and drowning categories.

Discussion

The findings suggest that use of large language models is feasible for processing unstructured free-text information in languages other than English. From a public health perspective, analyzing unstructured information allows for early detection of emerging hazards, helps with identification of injury patterns, and provides data to policymakers for developing preventive measures.

Study limitations include its single-center design and the low prevalence of specific injury mechanisms, requiring assessment of the model’s performance on even larger, preferably multicenter datasets. Despite the potential of large language models in medical research and practice, their use is debated because they pose relevant issues, including ethical concerns, risk of misinformation, and misinformation spread.6 However, almost all new technologies come with risks and benefits; what makes the difference is how they are used. Results of this study suggest that large language models are a promising tool for classifying injuries documented in ED records, helping with surveillance.

Back to top
Article Information

Accepted for Publication: March 25, 2024.

Published: May 28, 2024. doi:10.1001/jamanetworkopen.2024.13208

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2024 Lorenzoni G et al. JAMA Network Open.

Corresponding Author: Dario Gregori, MA, PhD, Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Via Loredan, 18, 35131 Padova, Italy ([email protected]).

Author Contributions: Drs Gregori and Lorenzoni had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Lorenzoni and Gregori are joint first authors.

Concept and design: Gregori, Azzolina, Da Dalt, Berchialla.

Acquisition, analysis, or interpretation of data: Lorenzoni, Gregori, Bressan, Ocagli.

Drafting of the manuscript: Lorenzoni, Gregori, Azzolina.

Critical review of the manuscript for important intellectual content: Gregori, Bressan, Ocagli, Da Dalt, Berchialla.

Statistical analysis: Lorenzoni, Gregori, Berchialla.

Administrative, technical, or material support: Ocagli.

Supervision: Gregori, Bressan.

Conflict of Interest Disclosures: None reported.

Meeting Presentations: A preliminary version of this study was presented at the 44th Annual Conference of the International Society for Clinical Biostatistics; August 28, 2023; Milan, Italy; and at the American Public Health Association Annual Meeting; November 12, 2023; Atlanta, Georgia.

Data Sharing Statement: See Supplement 2.

Additional Contributions: Giulia Andrea Baldan, MA, University of Padova, assisted with gold standard development. This individual received no additional compensation, outside of her usual salary, for her contributions.

References
1.
Lorenzoni  G, Bressan  S, Lanera  C, Azzolina  D, Da Dalt  L, Gregori  D.  Analysis of unstructured text-based data using machine learning techniques: the case of pediatric emergency department records in Nicaragua.   Med Care Res Rev. 2021;78(2):138-145. doi:10.1177/1077558719844123 PubMedGoogle ScholarCrossref
2.
Gianfrancesco  MA, Goldstein  ND.  A narrative review on the validity of electronic health record-based research in epidemiology.   BMC Med Res Methodol. 2021;21(1):234. doi:10.1186/s12874-021-01416-5 PubMedGoogle ScholarCrossref
3.
Azzolina  D, Bressan  S, Lorenzoni  G,  et al.  Pediatric injury surveillance from uncoded emergency department admission records in Italy: machine learning-based text-mining approach.   JMIR Public Health Surveill. 2023;9:e44467. doi:10.2196/44467 PubMedGoogle ScholarCrossref
4.
Peden  M, Oyebite  K, Ozanne-Smith  J,  et al, eds.  World Report on Child Injury Prevention. World Health Organization; 2008.
5.
Rudnytskyi  I. Openai: R wrapper for OpenAI API. R package version 0.4.1. Accessed April 15, 2024. https://github.com/irudnyts/openai
6.
Lancet Digital Health.  ChatGPT: friend or foe?   Lancet Digit Health. 2023;5(3):e102. doi:10.1016/S2589-7500(23)00023-7 PubMedGoogle ScholarCrossref
×