To provide reliable evidence for the use of generative AI (GenAI) in clinical learning and research, we are systematically evaluating leading large language models (LLMs), including open-source models, on key healthcare tasks such as diagnostic prediction, across a range of specialties and diseases. The scoreboards below are produced with our benchmarking system as part of this ongoing work and will be updated continuously as LLMs evolve.
Dataset | No. of diseases | OpenAI ChatGPT-4 | Google Gemini-1.5 | Baidu Ernie-4 | Date |
---|---|---|---|---|---|
Neurology | 63 | 93.22% | 92.14% | 90.56% | 2024-05-09 |
Oncology | 112 | 85.98% | 86.22% | 89.88% | 2024-04-04 |

Dataset | No. of diseases | OpenAI ChatGPT-4 | Google Gemini-1.0 | Baidu Ernie-4 | Date |
---|---|---|---|---|---|
MCSC Diseases | 181 | 90.50% | 81.38% | 82.38% | 2024-04-04 |
MCSC Symptoms | 194 | 78.71% | | | 2023-08-15 |
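As a minimal sketch of how the scoreboards can be read, the Python snippet below transcribes the published scores and reports, for each dataset, the top-scoring model and its lead over the runner-up in percentage points. It assumes the percentages are comparable within a dataset (same metric and cases for every model); they should not be compared across datasets, and the Gemini version differs between the two scoreboards. `SCOREBOARD` and `leader` are hypothetical names used only for illustration.

```python
from __future__ import annotations

# Scores transcribed from the scoreboards (percent). None marks a model with
# no reported score for that dataset. Model names are kept in full because
# the Gemini version differs between the two scoreboards.
SCOREBOARD: dict[str, dict[str, float | None]] = {
    "Neurology": {"OpenAI ChatGPT-4": 93.22, "Google Gemini-1.5": 92.14, "Baidu Ernie-4": 90.56},
    "Oncology": {"OpenAI ChatGPT-4": 85.98, "Google Gemini-1.5": 86.22, "Baidu Ernie-4": 89.88},
    "MCSC Diseases": {"OpenAI ChatGPT-4": 90.50, "Google Gemini-1.0": 81.38, "Baidu Ernie-4": 82.38},
    "MCSC Symptoms": {"OpenAI ChatGPT-4": 78.71, "Google Gemini-1.0": None, "Baidu Ernie-4": None},
}


def leader(scores: dict[str, float | None]) -> tuple[str, float, float | None]:
    """Return (top model, its score, lead over the runner-up in points).

    Only models with a reported score are ranked; the lead is None when
    fewer than two models were scored on the dataset.
    """
    reported = sorted(
        ((score, model) for model, score in scores.items() if score is not None),
        reverse=True,  # highest score first
    )
    best_score, best_model = reported[0]
    margin = round(best_score - reported[1][0], 2) if len(reported) > 1 else None
    return best_model, best_score, margin


if __name__ == "__main__":
    # Compare models within each dataset only, never across datasets.
    for dataset, scores in SCOREBOARD.items():
        model, score, margin = leader(scores)
        print(f"{dataset}: {model} {score:.2f}% (lead: {margin} pp)")
```

On these numbers, ChatGPT-4 leads Neurology by about 1.08 points and MCSC Diseases by about 8.12 points, while Ernie-4 leads Oncology by about 3.66 points; MCSC Symptoms has only one reported score, so no lead is computed.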