GenAI Healthcare Task Benchmarking Scoreboards

To provide reliable evidence for using GenAI in clinical learning and research, we are systematically evaluating leading LLMs, including open-source models, on key healthcare tasks such as diagnostic prediction, across a range of specialties and diseases. The scoreboards below report results from this ongoing effort, produced with our own benchmarking system, and will be updated continuously as LLMs evolve.

DxB: Diagnostic Prediction Benchmarking
Dataset      Diseases   OpenAI ChatGPT-4   Google Gemini-1.5   Baidu Ernie-4   Date
Neurology    63         93.22%             92.14%              90.56%          2024-05-09
Oncology     112        85.98%             86.22%              89.88%          2024-04-04
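The page does not define the metric behind these percentages. The sketch below shows one common way such a diagnostic-prediction score could be computed: the share of cases whose reference diagnosis appears among a model's top-k predictions. This is an assumption for illustration only, not the ELHS benchmarking method. The `Case` record, `normalize`, and `top_k_accuracy` names are hypothetical, and the sample cases are made up.

```python
from dataclasses import dataclass


# Hypothetical record for one benchmark case (illustrative field names):
# the reference diagnosis and the model's ranked list of predicted diagnoses.
@dataclass
class Case:
    reference: str
    predictions: list[str]


def normalize(label: str) -> str:
    """Lower-case a diagnosis label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def top_k_accuracy(cases: list[Case], k: int = 1) -> float:
    """Percentage of cases whose reference diagnosis is among the top-k predictions.

    Assumes exact label matching after normalization; a real benchmark might
    instead map labels to a coding system or use clinician review.
    """
    if not cases:
        raise ValueError("no cases to score")
    hits = sum(
        1
        for case in cases
        if normalize(case.reference) in {normalize(p) for p in case.predictions[:k]}
    )
    return 100.0 * hits / len(cases)


if __name__ == "__main__":
    # Made-up example cases, not taken from any ELHS dataset.
    cases = [
        Case("Migraine", ["migraine", "tension headache"]),
        Case("Epilepsy", ["syncope", "Epilepsy"]),
        Case("Parkinson disease", ["essential tremor"]),
    ]
    print(f"top-1: {top_k_accuracy(cases, k=1):.2f}%")  # top-1: 33.33%
    print(f"top-2: {top_k_accuracy(cases, k=2):.2f}%")  # top-2: 66.67%
```

Top-1 versus top-k matters: a model that lists the correct diagnosis second counts as a miss under top-1 but a hit under top-2, so the same model can score quite differently depending on which variant a scoreboard uses.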

ScB: Symptom Checking Benchmarking
Dataset         Diseases   OpenAI ChatGPT-4   Google Gemini-1.0   Baidu Ernie-4   Date
MCSC Diseases   181        90.5%              81.38%              82.38%          2024-04-04
MCSC Symptoms   194        78.71%             n/a                 n/a             2023-08-15



ELHS GenAI Copilot Platform alpha v1.1.8
Democratizing GenAI in Healthcare to Help Achieve Global Health Equity
© 2023-2024 ELHS Institute. All rights reserved.
elhsi.org
Disclaimer: The contents and tools on this website are for informational purposes only. This information does not constitute medical advice or diagnosis.