A (brief) history and development of AI in medicine
From Aristotle's syllogisms to MYCIN's rules: the dream of artificial intelligence in medicine is gradually becoming a reality.
Can machines think?
The idea that human reasoning could be imitated by a machine is far older than the digital age. As early as ancient Greece, Aristotle proposed syllogistic logic, a method of drawing conclusions from premises that, in its own way, foreshadowed the reasoning rules later adopted by expert systems. Over the centuries, philosophers and inventors toyed with the idea of mechanical minds, and writers imagined automatons capable of responding intelligently to human needs. Yet it was only in the twentieth century that this dream began to take on the shape of scientific ambition.
In 1950, Alan Turing asked the fateful question, “Can machines think?”, and proposed the Turing Test as a benchmark. His challenge gave intellectual legitimacy to a new discipline that, by the time of the Dartmouth Conference in 1956, had acquired a name: artificial intelligence.
There are various definitions of AI. The one adopted by the European Commission is as follows: “Artificial intelligence (AI) refers to systems that display intelligent behavior by analyzing their environment and taking actions - with a certain degree of autonomy - to achieve specific goals”.
The dawn of AI in medicine
The first forays into medical computing in the 1950s and 1960s were primitive, but they revealed the excitement of trying to capture the diagnostic reasoning of physicians in lines of code. At Stanford University, researchers created one of the earliest true expert systems, not for medicine but for chemistry: DENDRAL. Developed from 1965 onwards by Edward Feigenbaum, Joshua Lederberg and colleagues, DENDRAL analyzed mass spectra to identify the structure of organic molecules. Its success was not only technical: it demonstrated that computers could outperform humans in narrow but highly specialized domains. The key was not a general algorithm but the codification of expert knowledge itself. This was the insight that would carry over directly into medical applications.
Inspired by this work, the University of Pittsburgh launched INTERNIST-I in the early 1970s. Conceived as a teaching and diagnostic aid in internal medicine, INTERNIST-I attempted to rank possible diagnoses by analyzing patients’ signs, symptoms and laboratory data. It was ambitious in scope, covering hundreds of diseases, but it quickly revealed the limitations of rule-based reasoning. The system struggled in complex cases involving comorbidities, where the neat hierarchies of rules could not replicate the messy ambiguity of real patients. Nevertheless, INTERNIST-I set an important precedent and evolved into the Quick Medical Reference (QMR), which was widely used by medical students and educators for years.
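The core idea behind INTERNIST-I's approach, ranking candidate diagnoses by how well a patient's findings match stored disease profiles, can be illustrated with a deliberately tiny sketch. The diseases, findings, and weights below are invented for illustration and bear no relation to INTERNIST-I's actual knowledge base, which used a far richer scheme of evoking strengths and frequencies across hundreds of diseases.

```python
# Toy disease profiles: finding -> weight (how strongly it suggests the disease).
# All entries are invented for illustration only.
DISEASE_PROFILES = {
    "pneumonia": {"fever": 2, "cough": 3, "dyspnea": 2},
    "heart_failure": {"dyspnea": 3, "edema": 3, "fatigue": 1},
    "influenza": {"fever": 3, "cough": 2, "myalgia": 2},
}

def rank_diagnoses(findings, profiles):
    """Score each disease by the summed weights of matched findings,
    minus a small penalty for expected findings that were not observed,
    then return diseases sorted from best to worst match."""
    scores = {}
    for disease, profile in profiles.items():
        matched = sum(w for f, w in profile.items() if f in findings)
        missed = sum(1 for f in profile if f not in findings)
        scores[disease] = matched - missed
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_diagnoses({"fever", "cough", "dyspnea"}, DISEASE_PROFILES)
print(ranking)  # pneumonia ranks first for this set of findings
```

Even this toy version hints at the problem the text describes: a patient with two concurrent diseases produces a finding set that matches neither profile cleanly, which is exactly where INTERNIST-I's rule hierarchies struggled.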
MYCIN: a system specializing in the diagnosis of infectious diseases
The most celebrated of these early medical expert systems was MYCIN, born at Stanford in the early 1970s from the doctoral work of Edward Shortliffe. MYCIN specialized in identifying bacterial infections and suggesting antibiotic regimens, even adjusting dosages for patient weight. Its reasoning engine was based on approximately 600 “if-then” rules, and one of its revolutionary features was the ability to explain its conclusions: when asked, MYCIN could describe the reasoning path that led to its decision. In evaluations, its performance was at least as accurate as that of infectious disease specialists. Yet for all its brilliance, MYCIN was never used in clinical practice. Concerns about legal liability, ethical implications, and the difficulty of integrating such a system into real hospital workflows proved insurmountable at the time.
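Two of MYCIN's hallmarks, "if-then" rules and the ability to explain a conclusion, can be sketched in a few lines. This is a minimal forward-chaining toy, not MYCIN's actual architecture: the real system used backward chaining with certainty factors over roughly 600 rules, and every rule name and fact below is invented for illustration.

```python
# Invented rules: (rule name, premises that must all hold, conclusion).
RULES = [
    ("R1", {"gram_negative", "rod_shaped"}, "enterobacteriaceae"),
    ("R2", {"enterobacteriaceae", "lactose_fermenter"}, "e_coli_suspected"),
]

def infer(observed, rules):
    """Fire rules until no new facts appear, recording each firing."""
    facts = set(observed)
    trace = []  # (rule name, premises, conclusion) in firing order
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace.append((name, sorted(premises), conclusion))
                changed = True
    return facts, trace

def explain(trace, goal):
    """Answer 'why goal?' by walking the recorded rule chain backwards,
    mimicking MYCIN's ability to describe its reasoning path."""
    for name, premises, conclusion in trace:
        if conclusion == goal:
            steps = [f"{goal} because rule {name}: "
                     f"IF {' AND '.join(premises)} THEN {goal}"]
            for premise in premises:
                steps.extend(explain(trace, premise))
            return steps
    return []  # goal was observed directly, not derived

facts, trace = infer({"gram_negative", "rod_shaped", "lactose_fermenter"}, RULES)
print("\n".join(explain(trace, "e_coli_suspected")))
```

The explanation trace is the point of the sketch: because every conclusion carries the rule that produced it, the system can answer "why?" after the fact, which is the feature the text singles out as revolutionary.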
This apparent failure became a lesson of its own. As Shortliffe himself later noted, MYCIN was less important for its clinical utility than for its role as a proof of concept. It showed that structured medical knowledge could be formalized, manipulated, and even interrogated by a computer. The spin-off EMYCIN, a shell for building similar expert systems, spread this model to other domains and inspired a generation of researchers. MYCIN also highlighted challenges that remain urgent today: how to integrate AI into clinical practice without undermining physician responsibility, how to guarantee transparency and accountability, and how to earn the trust of clinicians and patients alike.
The difficulty of translating medical knowledge into precise patterns
The 1980s brought a proliferation of expert systems in medicine, many of them encouraged by the enthusiasm surrounding MYCIN. CADUCEUS, for instance, attempted to improve on INTERNIST-I by offering more sophisticated causal reasoning in internal medicine diagnosis. At the same time, monitoring systems were developed for intensive care units, and diagnostic aids were created in fields ranging from hematology to endocrinology.
These projects benefited from the reusable architectures pioneered by systems like EMYCIN, which reduced the need to reinvent the wheel with each new application. Yet they also revealed that medicine’s complexity could rarely be captured in tidy sets of rules. Knowledge acquisition - the process of encoding human expertise into computer-readable form - proved to be an enormous bottleneck. Physicians were not always willing or able to translate their tacit knowledge into explicit rules, and as medical science expanded, rule bases became harder and harder to maintain.
In retrospect, these difficulties marked the end of the “heroic” era of rule-based medical AI and the beginning of a gradual shift toward statistical and machine learning approaches. By the 1990s, growing computing power and the availability of electronic health records enabled more data-driven models, from logistic regression to Bayesian networks. The explosion of genomic data in the 2000s further reinforced the trend. Yet the intellectual legacy of the early expert systems remains. The attention to transparency in MYCIN resonates in today’s emphasis on explainable AI; the limitations of INTERNIST-I remind us that comorbidities and uncertainty are central to real medicine; the success of DENDRAL shows that domain-specific expertise, carefully encoded, can unlock powerful new insights.
Old and new challenges
The history of AI in medicine, therefore, is not just a history of technology. It is a chronicle of ambitions and obstacles, of pioneers who dared to translate clinical reasoning into computational logic, and of lessons that continue to shape the debate on ethics, trust and responsibility. When contemporary physicians worry about whether a deep learning system should be allowed to make autonomous decisions, they echo the same concerns that stopped MYCIN fifty years ago. The questions have not changed as much as the algorithms.
It is tempting to see history as a straight line from Aristotle's syllogisms to today's neural networks, but the truth is more nuanced. The path has been littered with false starts, exciting prototypes, and the gradual realization that medicine is not just about data and logic, but also about values, context, and human judgment. AI in medicine does not replace that judgment; rather, it engages in an evolving dialogue with it.
While the early history of medical AI is characterized by enthusiasm, prototypes and lessons about the limitations of rule-based reasoning, today's debates have shifted toward broader social and clinical implications. Algorithmic bias, lack of transparency and insufficient external validation remain key obstacles to safe adoption in practice. Concerns about data privacy and patient consent have also emerged, as AI systems increasingly draw on electronic health records and image archives.
At the same time, systematic reviews highlight the paucity of randomized controlled trials and the frequent gap between impressive retrospective performance and real-world utility. These issues echo, in a modern guise, the same challenges that once limited MYCIN: accuracy alone is not enough if systems are not reliable, interpretable, and ethically integrated into clinical workflows.
AI has the potential to revolutionize medicine, improving diagnosis, treatment and disease management. It is essential to proactively address the ethical, social, privacy and data security issues related to the use of AI in medicine in order to ensure a positive and equitable impact for all patients and the medical community.
Sources and further readings
- Shortliffe EH. Computer-Based Medical Consultations: MYCIN. New York: Elsevier; 1976.
- Kulikowski CA. Beginnings of artificial intelligence in medicine (AIM): computational artifice assisting scientific inquiry and clinical art – with reflections on present AIM challenges. Yearb Med Inform. 2019 Aug;28(1):249-256. doi:10.1055/s-0039-1677903.
- Kaul V, Enslin S, Gross SA. History of artificial intelligence in medicine. Gastrointest Endosc. 2020 Oct;92(4):807-812. doi:10.1016/j.gie.2020.06.040.
- Buchanan BG, Feigenbaum EA, Lederberg J, Sutherland GE. Heuristic DENDRAL: a program for generating explanatory hypotheses in organic chemistry. In: Meltzer B, Michie D, editors. Machine Intelligence 4. Edinburgh: Edinburgh University Press; 1969. p. 209-254.
- Miller RA, Pople HE Jr, Myers JD. INTERNIST-I, an experimental computer-based diagnostic consultant for general internal medicine. N Engl J Med. 1982 Aug 19;307(8):468-76. doi:10.1056/NEJM198208193070803.
- Berner ES, editor. Clinical Decision Support Systems: Theory and Practice. 3rd ed. Cham: Springer; 2016.
- Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. BMJ. 2019 Mar 25;364:l689. doi:10.1136/bmj.l689.
- Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Lancet Digit Health. 2019 Oct;1(2):e83-e91. doi:10.1016/S2589-7500(19)30026-2.
- Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. Lancet Digit Health. 2020 Oct;2(10):e489-e498. doi:10.1016/S2589-7500(20)30100-2.