ChatGPT has astounded the medical community with its performance, surpassing expectations across a multitude of informally examined clinical use cases. We fully agree with predictions from leaders at Harvard Medical School and Stanford Medical School regarding GenAI democratization in medical education, clinical training, clinical research, and clinical practice [1]. Recognizing in early 2023 that GenAI enables health professionals to interact with it seamlessly using natural language, we incorporated ChatGPT as a co-pilot for education and training, as highlighted in our JAMA Network Open publication [2]. Since then, we have been systematically benchmarking ChatGPT and other top GenAI models in symptom checking and diagnostic prediction across a wide range of diseases, with initial results published in JAMIA in collaboration with Prof. Tian at Stanford University [3]. It has become evident to us that publicly available general-purpose GenAI tools are ready for evaluation by doctors across diverse clinical scenarios, signaling the beginning of a major trend toward democratizing healthcare GenAI, driven by GenAI's unprecedented natural language capabilities [4]. Recently, we published the first review paper on advancing the democratization of GenAI in healthcare, revealing GenAI's intrinsic power for democratization and the initial evidence seen across a large number of touchpoints in healthcare [5].
To determine whether Generative AI (GenAI), powered by Large Language Models (LLMs), can help doctors make better-informed clinical decisions, physicians can use publicly accessible LLM tools such as OpenAI's ChatGPT and Google's Gemini and evaluate their effects in clinical research. These models can analyze Real-World Data (RWD) derived from patient cases in routine healthcare. The objective is for doctors to learn from GenAI analyses to improve the quality and precision of their clinical decision-making at every step of care delivery, such as diagnosis or precision-medicine selection. Comparative Effectiveness Research (CER) can then compare decisions made before and after GenAI analysis, producing Real-World Evidence (RWE) of GenAI's impact on healthcare delivery.
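The before/after CER comparison described above can be sketched as a paired analysis of the same physicians' decisions on the same cases, before and after reviewing a GenAI analysis. A minimal sketch follows; the case data, correctness labels, and the choice of an exact McNemar test for paired binary outcomes are illustrative assumptions, not a prescribed study protocol:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes.

    b: cases correct only BEFORE the GenAI review
    c: cases correct only AFTER the GenAI review
    Concordant pairs carry no information and are excluded.
    """
    n = b + c
    k = min(b, c)
    # exact binomial tail under H0: discordance is symmetric (p = 0.5)
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired decisions on 15 cases (1 = correct diagnosis)
before = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
after  = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]

b = sum(1 for x, y in zip(before, after) if x == 1 and y == 0)  # worsened
c = sum(1 for x, y in zip(before, after) if x == 0 and y == 1)  # improved
print(f"improved after GenAI: {c}, worsened: {b}, p = {mcnemar_exact(b, c):.4f}")
```

Because each case serves as its own control, this paired design isolates the effect of the GenAI analysis from between-case difficulty differences, which is the core idea behind comparing pre- and post-GenAI decisions on the same RWD.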
JAMA's AI focus calls for original research that rigorously examines the challenges of, and potential solutions to, optimizing clinical care with AI. Similarly, the new NEJM AI journal favors randomized controlled trials (RCTs) designed to test LLM-based tools against established standards of care. RCTs with LLMs will not be easy to conduct; other clinical trial designs, such as the pragmatic clinical trial (PCT), can therefore be used as well. To evaluate GenAI tools more effectively and efficiently, we expect new comparative evaluation methods using RWD to emerge in the near future.
To research AI solutions for reducing health disparities, a systems approach based on the Learning Health System (LHS) vision is the most promising. Introduced by the US National Academy of Medicine over a decade ago, the LHS concept calls for doctors to embed research within the care delivery workflow [6]. Embedded research enables continuous machine learning (ML), evaluation, and deployment more efficiently than traditional clinical trials; we refer to this embedded ML clinical research approach as the "ML-LHS approach." For an illustration of an ML-enabled LHS unit, see our first ML-LHS simulation paper published in Nature Scientific Reports in 2022 [7]. Working with clinical collaborators, we have developed inclusive and practical ML models from structured EHR data for new ML-LHS units designed to enable risk-based screening for several diseases [8,9]. Because GenAI can work directly with existing unstructured patient data, it will be much easier to build ML-LHS units using general-purpose as well as specialized fine-tuned LLMs.
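To make the risk-based screening idea concrete, the following minimal sketch applies a logistic risk model to binary structured-EHR features and flags patients above a screening threshold. The feature names, coefficients, and the 10% threshold are hypothetical illustrations, not the published models of refs [8,9]:

```python
from math import exp

# Illustrative coefficients -- NOT the weights of any published model
WEIGHTS = {"age_over_50": 1.1, "smoker": 0.9, "family_history": 1.4, "bmi_over_30": 0.6}
INTERCEPT = -4.0
SCREEN_THRESHOLD = 0.10  # flag patients above 10% predicted risk

def predicted_risk(patient: dict) -> float:
    """Logistic model over binary structured-EHR features (1 = present)."""
    z = INTERCEPT + sum(w * patient.get(f, 0) for f, w in WEIGHTS.items())
    return 1 / (1 + exp(-z))

def flag_for_screening(patients: list) -> list:
    """Return IDs of patients whose predicted risk warrants screening."""
    return [p["id"] for p in patients if predicted_risk(p) >= SCREEN_THRESHOLD]

patients = [
    {"id": "pt-001", "age_over_50": 1, "smoker": 1, "family_history": 1, "bmi_over_30": 0},
    {"id": "pt-002", "age_over_50": 0, "smoker": 0, "family_history": 0, "bmi_over_30": 1},
]
print(flag_for_screening(patients))  # high-risk patients routed to screening
```

In an ML-LHS unit, a model of this kind would run inside the care delivery workflow, with the flagged patients' screening outcomes fed back as new training data for continuous learning.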