Scoreboards

GenAI Healthcare Task Benchmarking Scoreboards

To provide reliable evidence for using GenAI in clinical learning and research, we are systematically evaluating top LLMs, including open-source models, on key healthcare tasks, such as diagnostic prediction, across various specialties and diseases. The scoreboards below are produced with our unique benchmarking system as part of this ongoing effort and will be updated continuously as LLMs evolve.

DxB: Diagnostic Prediction Benchmarking

Dataset	Diseases	OpenAI ChatGPT-4	Google Gemini-1.5	Baidu Ernie-4	Date
Neurology	63	93.22%	92.14%	90.56%	20240509
Oncology	112	85.98%	86.22%	89.88%	20240404
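For readers who want to compare the models row by row, the following is a minimal sketch that parses the DxB rows shown above and reports the highest-scoring model for each dataset. It assumes the rows are available as tab-separated text, which is how they are copied here; the function name `top_model_per_dataset` is hypothetical and not part of the ELHS platform.

```python
import csv
import io

# Rows copied from the DxB table above, kept tab-separated (an assumption
# about the input format; the scoreboard page offers no documented export).
DXB_TSV = (
    "Dataset\tDiseases\tOpenAI ChatGPT-4\tGoogle Gemini-1.5\tBaidu Ernie-4\tDate\n"
    "Neurology\t63\t93.22%\t92.14%\t90.56%\t20240509\n"
    "Oncology\t112\t85.98%\t86.22%\t89.88%\t20240404\n"
)

# Columns that describe the dataset rather than a model's score.
NON_MODEL_COLUMNS = {"Dataset", "Diseases", "Date"}


def top_model_per_dataset(tsv_text):
    """Return {dataset: (model, score)} for the highest score in each row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        # Convert "93.22%" to 93.22 and skip cells with no reported score.
        scores = {
            name: float(value.rstrip("%"))
            for name, value in row.items()
            if name not in NON_MODEL_COLUMNS and value
        }
        if not scores:
            continue
        best = max(scores, key=scores.get)
        result[row["Dataset"]] = (best, scores[best])
    return result


if __name__ == "__main__":
    for dataset, (model, score) in top_model_per_dataset(DXB_TSV).items():
        print(f"{dataset}: {model} ({score:.2f}%)")
```

Because empty cells are skipped, the same function should also work on rows where only some models have a reported score.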

ScB: Symptom Checking Benchmarking

Dataset	Diseases	OpenAI ChatGPT-4	Google Gemini-1.0	Baidu Ernie-4	Date
MCSC Diseases	181	90.5%	81.38%	82.38%	20240404
MCSC Symptoms	194	78.71%			20230815

ELHS GenAI Copilot Platform alpha v1.1.12
Mission: Democratizing GenAI and LHS in Healthcare to Help Achieve Global Health Equity
© 2023-2025 ELHS Institute. All rights reserved.
elhsi.org