
Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study
This study evaluates the effectiveness of OpenAI’s GPT and GPT-4 models in streamlining the systematic review process for clinical research papers. When automating title and abstract screening against human benchmarks, the models performed accurately and efficiently, achieving an accuracy of 0.91 and a macro F1-score of 0.60. Compared with human reviewers, the automated approach substantially reduced screening time and effort, highlighting the models’ potential to improve the quality and reliability of clinical reviews. The findings suggest that GPT models can serve as valuable aids in medical research, enhancing both the speed and consistency of literature screening.
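The gap between the reported accuracy (0.91) and macro F1-score (0.60) is typical of imbalanced screening tasks, where most papers are excluded and per-class performance diverges. The sketch below uses illustrative confusion counts (not the study’s actual data) to show how a 0.91 accuracy can coexist with a much lower macro F1; the 10/90 include/exclude split and the per-class counts are assumptions for demonstration only.

```python
def macro_f1(y_true, y_pred, labels=("include", "exclude")):
    """Macro F1: unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)

# Hypothetical screening outcome: 10 relevant and 90 irrelevant papers.
y_true = ["include"] * 10 + ["exclude"] * 90
y_pred = (["include"] * 3 + ["exclude"] * 7      # 3 of 10 relevant papers caught
          + ["include"] * 2 + ["exclude"] * 88)  # 2 false alarms among 90 irrelevant

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                    # 0.91
print(macro_f1(y_true, y_pred))    # ~0.68: dragged down by the minority "include" class
```

Because the "exclude" class dominates, overall accuracy stays high even when the minority "include" class is screened poorly, which is why macro F1 is the more informative metric for review screening.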