User Help

The ELHS Copilot is an online learning tool for clinical case studies. It was developed by the ELHS Institute, which also uses it in clinical AI research, and it is intended for medical students, doctors, and other healthcare professionals. With Copilot, these users can work through clinical cases to accelerate their clinical learning and training and to make more informed decisions.

Copilot is built on generative AI (GenAI) technologies, including OpenAI ChatGPT, Google Bard, and Anthropic Claude. Because GenAI models can interact in natural language, they can analyze clinical cases and provide expert insights and predictions that support learning and training. We anticipate that GenAI copilots will bring transformative changes to both medical education and healthcare. To learn more about our findings, read our JAMA Network Open paper on the ChatGPT copilot.

The following sections explain how to use Copilot for learning, research, and benchmarking patient cases across various diseases. If you have further questions, please contact us after logging in.

User Account

Registered users will be able to learn, research, and benchmark clinical cases using Copilot.


Registration requires a unique email address, which is used for login, verification, and communication. Each email address can register only one account. Optionally, you can also provide a unique ID for login. After you click the “Register” button, you will receive a verification email. To complete your registration, click the verification link in that email.


Log in using your email address or ID.

User Settings

On the [Settings] page, you may update your user information, password, and default parameters for AI prediction.

Contact Us

You may ask questions on the [Settings > Contact] page and directly view replies from User Support.

Learning with AI Copilot

GenAI is transforming medical education. The GenAI Copilot is designed to help you use ChatGPT in your clinical learning.

Configure AI Settings

Before creating a patient case for AI analysis, set your default parameters on the [Settings > AI] page.

AI Model: ChatGPT (GPT-4 model)
Prediction Task:
  • [SC] Symptom Checking for Disease Causes: This is the default option that predicts possible diseases based on the provided patient case.
  • [SC+] Symptom Checking with Added Insights: Choose this for analysis that includes diseases, disease localization and qualitative characterization. It is especially useful for complex conditions, such as neurological disorders.
Language: Choose the language of your patient cases. Copilot can analyze cases in most languages; English is the default.
Specialty: Select your clinical specialty; the default is family medicine.
Case Type:
  • Real Patient Case (default): Opt for this when studying de-identified real patient cases. Ensure all personal identification information is removed.
  • Hypothetical Patient Case: Use this for patient cases constructed based on theoretical medical knowledge.
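
As an illustration only, the default AI parameters listed above can be thought of as a small configuration record. The class and field names below (`AISettings`, `PredictionTask`, `CaseType`) are hypothetical and not part of any Copilot API. Only the option values (GPT-4, [SC]/[SC+], English, family medicine, real case) come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class PredictionTask(Enum):
    SC = "Symptom checking for disease causes"          # default task
    SC_PLUS = "Symptom checking with added insights"    # adds localization and characterization


class CaseType(Enum):
    REAL = "Real patient case (de-identified)"          # default case type
    HYPOTHETICAL = "Hypothetical patient case"


@dataclass
class AISettings:
    """Hypothetical model of the [Settings > AI] defaults; not a Copilot API."""
    model: str = "GPT-4"
    task: PredictionTask = PredictionTask.SC
    language: str = "en"               # language code as shown in the UI, e.g. "en" or "zh"
    specialty: str = "Family medicine"
    case_type: CaseType = CaseType.REAL


# Defaults mirror the settings page; override only what differs for a given case.
defaults = AISettings()
neuro = AISettings(task=PredictionTask.SC_PLUS, specialty="Neurology")
```

In this sketch, a neurological case switches to the [SC+] task and a different specialty, and every other parameter keeps its default.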

New Clinical Case Study

After you've configured your default AI parameters, you can proceed to add a new patient case for study and learning using AI Copilot on the [Learning > Create New Case Study] page.

  1. Assign an arbitrary case ID to the new case, for instance, “c1”. Because user-provided case IDs need not be unique across the system, the system composes a unique case key from three parts: [user id]-[user sequential case number]-[user-provided case id]. IMPORTANT: To protect patient privacy, do not use real patient IDs or any other identifier that could reveal the patient, and make sure the patient case is de-identified.
  2. If the new case falls outside the default specialty, make sure to adjust the specialty accordingly to align with the case.
  3. If the displayed language is not the case language, make sure to set your default language to the case language on your AI settings page.
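
The case key described in step 1 can be sketched as simple string composition. This is a minimal sketch under assumptions: the function name `make_case_key` is hypothetical, Copilot's exact formatting (for example, any zero padding of the case number) is not documented here, and the rule that user IDs contain no "-" is an assumption added so the key can be split unambiguously. The user ID "aj" appears in Example Benchmark Project 1 below; "bo" and the case numbers are made up.

```python
def make_case_key(user_id: str, case_number: int, case_id: str) -> str:
    """Compose [user id]-[user sequential case number]-[user-provided case id].

    Illustrative sketch only; not Copilot's actual implementation.
    """
    # Assumption for this sketch: user IDs contain no "-", so the first field is unambiguous.
    if "-" in user_id:
        raise ValueError("user_id must not contain '-'")
    if not case_id:
        raise ValueError("case_id must be non-empty")
    return f"{user_id}-{case_number}-{case_id}"


# Two users can both choose "c1", yet their case keys differ.
key_a = make_case_key("aj", 12, "c1")   # "aj-12-c1"
key_b = make_case_key("bo", 3, "c1")    # "bo-3-c1"
```

Because the user ID and the per-user sequence number come first, two identical user-provided IDs such as "c1" never collide.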

Steps to Learn from a New Case:
  1. Input Case Details: Enter the symptoms and all health factors related to disease risk and diagnosis, organized by category, in your local language. You do not need information for every category; include only what is available and relevant. Categories may include:
    • Symptoms and health factors
    • Medical and family histories
    • Physical exams
    • Lab tests
    • Imaging and other diagnostics
    Also enter your initial clinical decision for later comparison. If you have not made a decision yet, enter “none”. Then save the case.
  2. Ask AI Copilot: Ask Copilot to analyze the case and suggest the top two possible causes. After submitting the request, wait a while, then click “Get AI Prediction” to retrieve the results. The waiting time depends on the current load on the AI service, because all user requests are processed in a queue on a “first-come, first-served” basis. If your default language is not English, Copilot responds in both your local language and English.
  3. Learn and Update Decision: Study the insights from the AI analysis carefully and adjust your clinical decision accordingly. Record the revised decision in the designated update box. Then self-evaluate whether the AI analysis helped you make a more informed decision.
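
The “first-come, first-served” processing in step 2 behaves like a FIFO queue. The toy class below is a hypothetical illustration of that policy, not Copilot's service code. `PredictionQueue`, its methods, and the case keys are all made up, and `predict` stands in for the AI service.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class PredictionQueue:
    """Toy first-come, first-served request queue; not Copilot's service code."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()   # case keys waiting for the AI service, oldest first
        self._results: Dict[str, str] = {}    # case key -> finished prediction

    def submit(self, case_key: str) -> None:
        # A new request joins the back of the queue.
        self._pending.append(case_key)

    def process_next(self, predict: Callable[[str], str]) -> None:
        # The oldest waiting request is always served first.
        if self._pending:
            key = self._pending.popleft()
            self._results[key] = predict(key)

    def get_prediction(self, case_key: str) -> Optional[str]:
        # Like clicking "Get AI Prediction": None means the request is still waiting.
        return self._results.get(case_key)


queue = PredictionQueue()
queue.submit("aj-12-c1")   # submitted first
queue.submit("bo-3-c1")    # submitted second
queue.process_next(lambda key: f"prediction for {key}")
print(queue.get_prediction("aj-12-c1"))  # prediction for aj-12-c1
print(queue.get_prediction("bo-3-c1"))   # None
```

This is why clicking “Get AI Prediction” too early can return nothing yet: your request may still be waiting behind earlier ones.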

Example Case 1 (English)

Make sure the display shows Language: English (en) English. Otherwise, set the language to English on your AI settings page.

Enter Symptoms and All Health Factors in Categories:
Record Initial Clinical Decision:
Ask AI Copilot to Analyze the Case for the Selected Task:
After Learning from AI Analysis, Record Your Updated Decision for Comparison:
Is your learning from AI analysis helpful to your decision?
[selected] Helpful to improve my decision

Example Case 2 (Chinese)

Make sure the display shows Language: 中文 (zh) Chinese. Otherwise, set the language to Chinese on your AI settings page.

Enter Symptoms and All Health Factors in Categories:
Record Initial Clinical Decision:
Ask AI Copilot to Analyze the Case for the Selected Task:
After Learning from AI Analysis, Record Your Updated Decision for Comparison:
Is your learning from AI analysis helpful to your decision?
[selected] Helpful to improve my decision

Track My Clinical Case Studies

The [Learning] page lists the clinical cases you have studied, so you can track your learning progress. It lets you compare the decisions you made before and after incorporating insights from AI analyses. Over time, this comparison helps you self-assess the impact and effectiveness of your learning with the AI Copilot.

Clinical GenAI Research and Publication

Clinical GenAI Benchmarking Research Approach

Generative AI can interpret clinical cases expressed in natural language. This capability makes detailed, case-by-case analysis possible in diverse clinical settings. Because the healthcare sector is at an early stage of adopting generative AI, many research opportunities have emerged, particularly for generating clinical evidence on whether and how GenAI improves the quality of clinical care.

We have developed a new GenAI benchmarking research approach: learning from clinical cases, developing benchmarks that represent the evidence found, and assessing GenAI against these benchmarks. This approach uses real-world data (RWD) from healthcare delivery to create real-world evidence (RWE) in a novel, predictable form. It also embraces the Learning Health System (LHS) concept introduced by the US National Academy of Medicine. We previously simulated an LHS unit using synthetic patient data, as described in our publication in Scientific Reports.

The ELHS Copilot's research component streamlines this GenAI benchmarking method, enabling medical students, doctors, and health professionals to conduct GenAI research in medical education and clinical training.

How to Get Started with Clinical GenAI Benchmarking Research

You may follow these simple steps to start your clinical GenAI benchmarking research in medical education and clinical training:

  1. Initiate a Research Project: Navigate to the research page and create a new project to focus on a disease of interest. Clearly define your objective and the task; for instance, utilizing ChatGPT to enhance early stroke detection.
  2. Study Clinical Cases: On the case study page, create new clinical cases derived from real de-identified patient cases. Make sure not to include any personal identification information (PII) in the cases.
  3. Utilize AI Copilot: Employ AI Copilot to analyze the cases and predict potential disease causes.
  4. Incorporate AI Insights: Learn from the AI analysis, refine your clinical decisions accordingly, and tag the case to the respective project.
  5. Evaluate AI Impact: After studying a sufficient number of cases, compare your decisions before and after AI learning to determine whether the AI insights were beneficial.
  6. Establish Benchmark: Develop a task-specific benchmark based on the analyzed cases and invite peers to validate your findings.
  7. Continuous Learning: Keep studying cases with AI Copilot to refine the benchmark and further improve your informed decision-making.
  8. Expand Research Scope: Based on your benchmarking evidence, gradually apply the above GenAI benchmarking approach to address additional diseases and clinical challenges.
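
The before/after comparison in step 5 can start as a simple tally over your studied cases. The sketch below is one possible way to summarize it, not a Copilot feature. The `CaseRecord` fields are hypothetical, and the example cases are made up for illustration. It reports how often the updated decision differs from the initial one and how often the AI analysis was self-rated as helpful.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaseRecord:
    """Hypothetical record of one studied case (field names are illustrative)."""
    initial_decision: str
    updated_decision: str
    ai_helpful: bool   # self-evaluation, e.g. "Helpful to improve my decision"


def summarize(cases: List[CaseRecord]) -> Dict[str, float]:
    n = len(cases)
    if n == 0:
        raise ValueError("no cases to summarize")
    # A decision counts as changed if the text differs, ignoring case and surrounding spaces.
    changed = sum(
        c.initial_decision.strip().lower() != c.updated_decision.strip().lower()
        for c in cases
    )
    helpful = sum(c.ai_helpful for c in cases)
    return {
        "cases": n,
        "decisions_changed_pct": 100 * changed / n,
        "rated_helpful_pct": 100 * helpful / n,
    }


# Made-up example cases for illustration only.
cases = [
    CaseRecord("none", "ischemic stroke", True),
    CaseRecord("migraine", "migraine", False),
    CaseRecord("TIA", "ischemic stroke", True),
    CaseRecord("vertigo", "Vertigo", False),
]
print(summarize(cases))
# {'cases': 4, 'decisions_changed_pct': 50.0, 'rated_helpful_pct': 50.0}
```

A changed decision is not necessarily a better one. Judging improvement needs a reference answer, which is what the scoring in the benchmark project steps below provides.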

Conducting a GenAI Benchmark Project as an Individual User

Any individual user (doctor, student, or health professional) can independently conduct GenAI research in medical education and clinical training. For instance, a benchmark project for stroke risk prediction involves the following steps:
  1. Create a Benchmark Project: Navigate to [Research > Create New Project], open a new benchmark project, set a benchmark name for the specific task “Predict stroke risk”, add a brief description “Use symptoms and available factors to predict stroke risk”, and then save the project. By default, the project creator becomes the project manager.
  2. Analyze Cases: On the [Learning > New Case Study] page, create a case using symptoms and health factors derived from a stroke patient case (no PII). Analyze the case with AI Copilot, self-evaluate whether the AI analysis aids your decision-making, and tag the case to this project. Repeat this for a series of cases derived from real stroke patient cases (no PII).
  3. Score GenAI: On the project's cases page, review each tagged case and assign a score to the AI prediction using a simple three-level scale: 1 for a correct prediction, 0.5 for a prediction very close to the expectation, and 0 for an incorrect or other prediction. After scoring all cases, calculate the benchmark score from all case scores and express the accuracy as a percentage.
  4. Create a Benchmark Case: Carefully generalize the real clinical cases to create a benchmark case representing a hypothetical case with a set of health factors and the expected stroke risk.
  5. Share the Benchmark: Set the benchmark status to “shared” so that it will be listed on the [Benchmarks] page, visible to all users in the community.
  6. External Validation: Invite collaborators to use the benchmark in their practices to externally validate the benchmark and GenAI assessment.
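
The scoring in step 3 can be computed as shown below. This sketch assumes that the accuracy percentage is the mean case score multiplied by 100. That formula is an assumption, since this page does not spell it out. `benchmark_accuracy` is a hypothetical name, and the scores in the usage line are made up.

```python
from typing import List

# Three-level scale from step 3: 1 = correct, 0.5 = very close, 0 = incorrect or other.
VALID_SCORES = {0.0, 0.5, 1.0}


def benchmark_accuracy(case_scores: List[float]) -> float:
    """Return accuracy in percent as mean case score x 100 (assumed formula)."""
    if not case_scores:
        raise ValueError("score at least one case first")
    for s in case_scores:
        if s not in VALID_SCORES:
            raise ValueError(f"invalid score {s}; use 0, 0.5, or 1")
    return 100 * sum(case_scores) / len(case_scores)


# Five made-up case scores: three correct, one very close, one incorrect.
print(benchmark_accuracy([1, 1, 0.5, 0, 1]))  # 70.0
```

Rejecting any value outside the three allowed levels keeps typing mistakes from silently skewing the benchmark score.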

Example Benchmark Project 1 (English)

Make sure your AI settings page has these parameters:

  • AI Model: OpenAI ChatGPT model GPT-4
  • Prediction Task: [SC] Symptom checking for disease causes
  • Specialty: Family Medicine
  • Language: English
  • Case Type: Real case

Project ID: pr3007622
Manager: aj
Created: 2023-11-09
Updated: 2023-11-10
Benchmark Task

Language: English (en) English
Specialty: Family medicine
Benchmark Status:
To share the benchmark with the community, set status to "shared".

Conducting a GenAI Benchmark Project as a Project Team

If you are part of a project team, you can add co-leads and invite team members to the project. These members must agree to use their designated cases for this project and accept your invitation on the project page. Once they become active members, they can tag their clinical cases to the project. Co-leads will be able to view all the cases from the project members and develop a benchmark by learning from these cases using GenAI.

Research Protocols for Various Clinical Settings

Generative AI can readily be integrated to enhance many clinical settings. Using AI Copilot, you can follow the initial protocols below, which we use in our own research, to assess the efficacy of GenAI in clinical care. Detailed steps are provided on the [Research Protocols] page.

  1. Protocol for Studying a Single Disease: This protocol is designed for individual doctors aiming to evaluate the improvements enabled by GenAI in their specific area of care delivery.
  2. Protocol for Examining Multiple Diseases within a Specialty: Departmental lead researchers can employ this protocol to assess the GenAI-enhanced improvements in clinical care for a variety of diseases, or all diseases encompassed within a particular medical specialty.
  3. Protocol for Clinical Research Network (CRN) Study: Principal Investigators (PIs) in teaching hospitals can utilize this protocol for collaborative studies. It facilitates the evaluation of GenAI's role in enhancing informed decision-making in care delivery across diverse hospitals and clinics, including those in community and rural areas.

Conduct Clinical AI Research for Publication

For your research to be eligible for publication in international journals, it must be meticulously designed to meet the standard publication criteria. Here are key considerations to incorporate into your study design:

  1. Employment of Rigorous Scientific Methods: Use established scientific methods, such as Comparative Effectiveness Research (CER) or a Pragmatic Clinical Trial (PCT). An appropriate comparative analysis is pivotal to delineating the impact of AI in clinical care: outcomes after AI analysis should be compared against a previous baseline or a parallel control group.
  2. Quantification and Statistical Analysis of Outcomes: It's essential to quantify the results of AI application and subject them to thorough statistical analysis to validate the findings.
  3. Result Reproducibility: Ensure that your findings can be consistently reproduced in subsequent experiments to affirm their validity and reliability.
  4. IRB Approval: Secure approval for your study plan from the Institutional Review Board (IRB) to ensure ethical compliance and integrity.
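
As one example of quantification and statistical analysis, when the same cases are judged correct or incorrect before and after AI analysis (for instance, against a confirmed diagnosis, which is an assumption here), the outcomes are paired. McNemar's exact test is one standard option for such paired binary data. It is offered here as an illustration, not as the method prescribed by our protocols. A comparison with a parallel control group would call for a different test. Please choose the final analysis with a statistician. The counts in the usage line are made up.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes.

    b: cases correct before AI but incorrect after
    c: cases incorrect before AI but correct after
    Cases with the same outcome before and after do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant case flips either way with probability 0.5,
    # so the smaller count follows Binomial(n, 0.5); double the lower tail for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up counts: 1 case got worse after AI analysis, 9 cases improved.
print(mcnemar_exact(1, 9))  # 0.021484375
```

Only the discordant cases carry information about a before/after difference, which is why cases that stayed correct (or stayed incorrect) drop out of the calculation.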

For assistance and support in AI research and publication, feel free to reach out to us.

Benchmarking Generative AI in Healthcare

GenAI Benchmarks Shared by Users

Using Copilot, users can learn from clinical cases in real-world healthcare delivery and develop benchmarks that represent the real-world evidence found. Because benchmark cases are hypothetical generalizations of real patient cases, they do not identify individual patients. These benchmarks contain no personal identification information and can be shared within this community, benefiting all users in their medical education and clinical training.

To share your benchmarks, simply change their status to “shared”, and they will appear on the shared benchmark list on the [Benchmarks] page. Your name and organization will be acknowledged alongside your benchmarks. Once your benchmarks have been used to assess all other relevant GenAI models, you may also showcase them on the scoreboards (see below).

Scoreboards of Systematic GenAI Benchmarking in Symptom Checking

The AI Copilot is powered by the most advanced generative AI models, including OpenAI's ChatGPT, Google Bard, and Anthropic's Claude. These models have occupied the top spots on leaderboards based on common ML benchmarks. In the medical field, ChatGPT passed US medical licensing exams for the first time in early 2023 [see ChatGPT exam paper]. Google has also reported medical benchmarks for its medical GenAI models [see Google paper]. However, these GenAI models have not yet been systematically benchmarked on disease prediction tasks.

To assess the accuracy of GenAI models underpinning the AI Copilot in predicting possible disease causes for patient cases across a full spectrum of diseases, we have developed a new comprehensive benchmarking system, consisting of both theoretical and real-world benchmarks. The benchmark accuracy metrics are reported on the scoreboards on the Benchmarks page. These scoreboards provide essential GenAI assessment information, helping users decide which GenAI model is more applicable for specific disease prediction tasks.

Scoreboard of Theoretical Benchmarks:

While benchmarking GenAI models for all diseases is an ongoing process, our initial results indicate that ChatGPT has achieved an unprecedented level of accuracy in symptom checking across a broad range of diseases in the most common specialties. Consequently, we have qualified the ChatGPT model GPT-4 as a “medical expert” for the role of copilot in medical learning and clinical training, forming the scientific foundation of this AI Copilot. Our ChatGPT benchmarking research results will be published soon.

  • Benchmarking Against Mayo Clinic Symptom Checker: ChatGPT was tested with hypothetical patient cases designed as ideal inputs for the Mayo Clinic Symptom Checker, covering nearly 200 diseases.
  • Benchmarking Against Medical Guidelines: ChatGPT was tested with hypothetical patient cases based on medical guidelines and knowledge.

Scoreboard of Real-World Benchmarks:

Real-world benchmarks are contributed by users in the community. After users create benchmarks using AI Copilot, we may review them and, if they meet our standards, help the users assess other GenAI models with these benchmarks. The resulting benchmarks will be listed on the scoreboard, acknowledging users' contributions to the medical community.

References

  1. Chen A, Chen DO. Accuracy of Chatbots in Citing Journal Articles. JAMA Netw Open. 2023;6(8):e2327647. [JAMA LHS copilot paper]
  2. National Academy of Medicine. The Learning Health System Series: Continuous improvement and innovation in health and health care. [NAM LHS website]
  3. Chen A, Chen DO. Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data. Sci Rep. 2022;12(1):17917. [Nature ML-LHS simulation paper]
  4. Singhal K, Azizi S, Tu T, et al. Large language models encode clinical knowledge. Nature. 2023;620:172–180. [Google MedPaLM paper]
  5. Chen A, et al. Manuscript submitted. 2023. [ChatGPT benchmark paper]
  6. Gilson A, Safranek CW, Huang T, et al. How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Med Educ 2023;9:e45312. [ChatGPT exam paper]

ELHS Copilot alpha v1.0.8 Democratizing GenAI to Achieve Health Equity Worldwide
© 2023-2024 ELHS Institute. All rights reserved.
Disclaimer: The contents and tools on this website are for informational purposes only. This information does not constitute medical advice or diagnosis.