The ELHS Copilot is a cutting-edge online generative AI (GenAI) learning and research tool for medical students, doctors, and other healthcare professionals. Copilot takes a GenAI-first approach to streamline key tasks that are currently being transformed by GenAI based on large language models (LLMs). The goal is to make it easy for every doctor to start collaborating with a GenAI copilot in their medical education and healthcare career. Copilot is developed by the ELHS Institute and provided free of charge to accelerate the democratization of GenAI in healthcare, with the aim of helping achieve global health equity. It grew out of a series of our published work: the initial concept of a learning copilot published in JAMA, a foundational ChatGPT benchmarking study published in JAMIA, and a literature review on advancing the democratization of GenAI in healthcare published in JHMHP.
Medicine has just entered the new world of GenAI, where GenAI's unparalleled capabilities in natural-language interaction and reasoning make it feasible to analyze clinical cases and provide expert insights and predictions that can accelerate medical education and clinical training. In future healthcare, every doctor may have a GenAI copilot for learning, research, and practice. Copilot offers the following key functional modules to help doctors get started on this new journey:
Benchmarking GenAI for every applicable task in healthcare is crucial to the democratization of healthcare AI, setting the foundation for learning, research, and application of GenAI.
Our ELHS Institute is systematically benchmarking the top GenAI LLMs and gradually publishing benchmarking scoreboards in Copilot, including on the Copilot home page and in the Benchmarking module.
The current benchmarking results demonstrate that general-purpose LLMs such as ChatGPT-4 and Gemini-1.5 achieve high accuracy in diagnostic prediction across a wide range of diseases. Although the diagnostic accuracy of open-source LLMs (OS-LLMs) such as Llama-3 and Gemma-2 is much lower than ChatGPT-4's, it is high enough to warrant using OS-LLMs as baselines for fine-tuning to provide the desired intelligence for specific healthcare tasks.
Using the Learning module, users can find diseases in the benchmarking datasets and test diagnostic prediction by different LLMs. From such comparisons, you may identify target diseases that the best GenAI models predict accurately but OS-LLMs do not. These gaps are good opportunities to bring GenAI into your clinical research: you can fine-tune OS-LLMs to reach high accuracy and then deploy your own fine-tuned models to optimize specific tasks in your environment.
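The model-gap analysis described above can be sketched in a few lines: compute per-disease accuracy for each model, then flag diseases where the commercial model succeeds but the open-source baseline lags. The model names and results below are hypothetical illustrations, not actual ELHS benchmarking data.

```python
# Sketch: find per-disease accuracy gaps between a commercial LLM and
# an open-source LLM to identify candidate fine-tuning targets.
from collections import defaultdict

def accuracy_by_disease(results):
    """results: list of (disease, correct: bool) -> {disease: accuracy}"""
    tally = defaultdict(lambda: [0, 0])  # disease -> [correct, total]
    for disease, correct in results:
        tally[disease][0] += int(correct)
        tally[disease][1] += 1
    return {d: c / t for d, (c, t) in tally.items()}

def fine_tuning_targets(commercial, open_source, gap=0.3):
    """Diseases where the commercial model outperforms the open-source
    baseline by at least `gap` in accuracy."""
    acc_c = accuracy_by_disease(commercial)
    acc_o = accuracy_by_disease(open_source)
    return sorted(d for d in acc_c
                  if d in acc_o and acc_c[d] - acc_o[d] >= gap)

# Hypothetical benchmark outcomes, one tuple per test case:
gpt4_results = [("lung cancer", True), ("lung cancer", True),
                ("COPD", True), ("COPD", True)]
llama3_results = [("lung cancer", False), ("lung cancer", True),
                  ("COPD", True), ("COPD", True)]

print(fine_tuning_targets(gpt4_results, llama3_results))  # ['lung cancer']
```

In this toy data, lung cancer shows a large accuracy gap, so it would be a candidate disease for fine-tuning an OS-LLM.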
The Learning module integrates multiple top commercial and open-source LLMs through its AIChat interface. While AIChat lets users chat with GenAI about anything, it also provides predefined tasks for analyzing patient cases. When predicting on a patient case, AIChat links the predicted diseases to clinical learning guides (CLGs), enabling in-context medical learning.
Users can chat with any of the top models and learn how LLMs can assist with various educational or healthcare tasks. Additionally, by comparing the best commercial models to the top open-source models, users may identify interesting tasks for creating their own fine-tuned open-source models to apply GenAI in their healthcare practices.
For any task, the quality of the answer depends on the question and instructions, so you may adjust your prompt to explore better responses.
For the predefined healthcare tasks, GenAI prediction quality depends on whether the patient case contains enough information.
Task: Symptom checking
Model: OpenAI ChatGPT 4o
Predicted Possible Diseases:
- Lung Cancer: Pulmonology
- Chronic Obstructive Pulmonary Disease (COPD) Exacerbation: Pulmonology
Triage: Urgent
Specialty: Pulmonology
Task: Diagnostic prediction
Model: OpenAI ChatGPT 4o
Predicted Possible Diseases:
- Lung Cancer: Oncology
- Mesothelioma: Oncology
Triage: Urgent
Specialty: Oncology
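Under the hood, a predefined task amounts to pairing a task-specific instruction with the patient case text before sending it to the selected model. The sketch below shows one way such a prompt might be assembled; the task wording, task keys, and function names are illustrative assumptions, not Copilot's actual internal prompts.

```python
# Sketch: assembling a predefined-task prompt from a task instruction
# and a de-identified patient case (all wording is hypothetical).
TASK_INSTRUCTIONS = {
    "symptom_checking": (
        "List the possible diseases with their specialties, "
        "then give a triage level and the recommended specialty."
    ),
    "diagnostic_prediction": (
        "Predict the most likely diagnoses with their specialties, "
        "then give a triage level and the recommended specialty."
    ),
}

def build_prompt(task, case_text):
    """Combine the task instruction with the patient case text."""
    instruction = TASK_INSTRUCTIONS[task]
    return f"{instruction}\n\nPatient case:\n{case_text.strip()}"

prompt = build_prompt(
    "diagnostic_prediction",
    "68-year-old smoker with chronic cough, hemoptysis, and weight loss.",
)
print(prompt)
```

Editing the instruction text (rather than the case text) is the usual way to explore whether a different prompt yields better responses.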
The CLGs page lists the latest guides created from ChatGPT's vast medical knowledge. You can search CLGs by name or UMLS CUI. The CUI codes connect CLGs to international standards of medical ontologies, providing a bridge to traditional medical knowledge.
Each CLG for a disease currently focuses on disease diagnosis, which can be expanded to treatment, management, and prevention in the future. In general, disease CLGs cover the following basic questions and more. For example, the lung cancer CLG covers:
Bilingual Learning: CLGs make it easier for non-English-speaking users to learn medicine in both their native language and English. For automatically generated CLGs, questions are answered in both the user's native language (set in user settings) and English (set in the user's GenAI settings). For manually reviewed CLGs, translation is currently available only for Chinese.
Please note that AI answers may have errors and are only for learning purposes. Users are responsible for their learning outcomes.
GenAI is democratizing AI in healthcare, enabling all medical students, doctors, and health professionals to study how AI can help them make better-informed healthcare decisions. To help you get started now, Copilot's Research module streamlines the GenAI evaluation research process, from ideation and multi-LLM use through data collection to project team collaboration.
You can start using GenAI to analyze your clinical cases for diagnosis, treatment selection, or any other predictive tasks you would like to improve. By comparing your clinical decisions before and after using GenAI analysis information, you can evaluate whether GenAI can benefit clinical decision-making.
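The before/after comparison described above can be quantified simply: record the clinician's decision before and after seeing the GenAI analysis alongside the final confirmed diagnosis, then compare accuracies. The records and field names below are hypothetical.

```python
# Sketch: measuring whether GenAI input changed decision quality
# across a set of hypothetical reviewed cases.
cases = [
    {"before": "COPD exacerbation", "after": "lung cancer", "final": "lung cancer"},
    {"before": "pneumonia",         "after": "pneumonia",   "final": "pneumonia"},
    {"before": "asthma",            "after": "asthma",      "final": "COPD"},
]

def accuracy(cases, key):
    """Fraction of cases where the decision at `key` matched the final diagnosis."""
    return sum(c[key] == c["final"] for c in cases) / len(cases)

print(f"before GenAI: {accuracy(cases, 'before'):.2f}")
print(f"after GenAI:  {accuracy(cases, 'after'):.2f}")
```

A larger paired sample and appropriate statistical testing would be needed before drawing any conclusion about effectiveness.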
You may choose from the following top LLMs and compare their performance:
Your cases are tracked in Copilot so that you can use them in collaboration projects. Streamlined research projects make it easier to ensure data consistency and quality, which is crucial for peer-reviewed publication of any new evidence of GenAI effectiveness generated from your research.
Next Steps:
Copilot tracks your studied clinical cases, offering a platform where you can compare your decision-making before and after using GenAI predictions and analyses over time. This history-comparison feature will become instrumental in self-assessing your GenAI learning and research journey.
From the Research or Cases page, you can start to analyze a new patient's case. Follow these steps to conduct a case evaluation study:
For convenience, you may set your default task, model, bilingual language, specialty, etc., on the “User > GenAI” page.
All personal identification information MUST be removed from patient cases to protect patient privacy before entering them into Copilot.
Case ID: p1
Specialty: Family medicine
Select Model: Google Gemini 1.5
Select Task: Diagnostic prediction
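As a first pass at the PII removal required above, obvious machine-detectable identifiers can be stripped automatically. The sketch below is illustrative only and is NOT a complete de-identification pipeline: names, addresses, and other identifiers still require manual review before a case enters Copilot.

```python
import re

# Sketch: redact obvious identifiers (emails, phone numbers, dates,
# long ID numbers) from case text. Patterns are minimal illustrations.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{6,}\b"), "[ID]"),
]

def redact(text):
    """Replace each matched identifier with a placeholder token."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact("MRN 12345678, seen 03/14/2024, call 555-123-4567."))
# MRN [ID], seen [DATE], call [PHONE].
```

Automated redaction should always be followed by a human check, since free-text names and rare identifiers evade simple patterns.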
To publish your case study research in SCI journals, it is critical to organize your case studies as comparative effectiveness research (CER) projects. Copilot's project feature streamlines GenAI evaluation studies for you, ensuring data standardization, quality, and consistency. Conducting projects online makes it easy for you to collaborate among team members, standardize data collection, cross-review data, and efficiently manage data, thus helping produce high-quality data and evidence. Online collaboration can be within a hospital, across multiple hospitals, or even within a clinical research network (CRN) led by hospitals and involving community health centers, rural hospitals, and clinics.
Simple 3-level scoring:
Simple 3-level effectiveness categories:
Any individual user (doctor, student, or health professional) can independently conduct GenAI research in medical education and clinical training. For instance, a project evaluating GenAI in stroke risk prediction involves the following steps:
If you are part of a project team, you can add co-leads and invite team members to the project. Members accept your invitation and then tag their clinical cases to the project. You and co-leads review all cases from team members, score GenAI predictions, assign GenAI effectiveness categories, and conduct statistical analysis. If the results generate new evidence for GenAI effectiveness, you can write a manuscript for publication.
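Aggregating reviewer scores into effectiveness categories for the statistical analysis step can be sketched as follows. The 3-level scheme used here (0 = harmful, 1 = neutral, 2 = beneficial) is a hypothetical stand-in for the scoring and effectiveness categories your project actually defines.

```python
from collections import Counter

# Sketch: map hypothetical 3-level reviewer scores to effectiveness
# categories and summarize them for a project.
def effectiveness(score):
    """Hypothetical 3-level mapping; replace with your project's scheme."""
    return {0: "harmful", 1: "neutral", 2: "beneficial"}[score]

reviewed_scores = [2, 2, 1, 0, 2, 1]  # hypothetical team-reviewed cases
summary = Counter(effectiveness(s) for s in reviewed_scores)
print(dict(summary))  # {'beneficial': 3, 'neutral': 2, 'harmful': 1}
```

Such a summary table is typically the starting point for the statistical analysis reported in a comparative effectiveness manuscript.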
Here are some key considerations to incorporate into your GenAI study design for publishing your research in SCI journals:
For assistance and support in GenAI research and publication, feel free to Contact us.
The User Account module lets users manage their accounts and contact us for support.
User self-registration requires a unique email address for verification, login, and communication; one email address can register only one account. A user ID is generated automatically, but you may optionally provide your own unique ID for login convenience. After you click the “Register” button, you will receive an email from support@elhsi.org; click the verification link in that email to verify your registration.
Log in using your email address or user ID.
On the [User > Settings] page, you can update your user information and change your password. You may change your email address, but the new email address requires new email verification.
| Parameter | Options |
|---|---|
| GenAI LLM Model | Choose from top models. |
| Prediction Task | |
| Bilingual Language | Choose the second language for responses. Non-English speakers are recommended to select English so that GenAI responses are bilingual, with English available for comparison. |
| Specialty | Select your clinical specialty; the default is family medicine. |
| Case Type | |
On the [User > Contact] page, you may ask questions and request more GenAI credits. You are welcome to discuss technical support for GenAI learning and training, GenAI research, open-source LLM fine-tuning and deployment, GenAI publication, and more. The Contact page displays your contact history for your records.