AI for Generating Hypotheses
Why Hypothesis Generation Matters
The bottleneck in many scientific fields is not the ability to test hypotheses but the ability to generate good ones. A well-formulated hypothesis focuses experimental effort, guides data collection, and provides a framework for interpreting results. A poorly formulated hypothesis can waste months of laboratory time and funding. Traditionally, hypothesis generation is treated as an intuitive, creative act: a scientist reads widely, attends conferences, talks to colleagues, and has an insight. AI does not replace this creative process, but it supplements it by systematically searching a much larger space of possible connections than any human mind can hold.
Consider the scale of the problem. PubMed alone contains over 37 million articles. A productive researcher might read 200 papers a year. Over a 30-year career, that is 6,000 papers, roughly 0.016% of the available literature. The connections that could generate breakthrough hypotheses often span disciplines: a finding in plant biology that explains a process in human disease, or a materials science technique that solves a problem in neuroscience. No human can read broadly enough to find all these connections, but AI systems can search a far larger share of the literature for them.
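The arithmetic behind that percentage is easy to reproduce. The short sketch below uses only the figures quoted above (37 million articles, 200 papers a year, a 30-year career); the function and constant names are illustrative choices, not part of any tool.

```python
# Figures quoted in the text; the reading rate is the text's own estimate.
PUBMED_ARTICLES = 37_000_000
PAPERS_PER_YEAR = 200
CAREER_YEARS = 30


def career_coverage(total_articles: int, papers_per_year: int, years: int) -> float:
    """Return the fraction of the literature one person reads over a career."""
    return (papers_per_year * years) / total_articles


papers_read = PAPERS_PER_YEAR * CAREER_YEARS
fraction = career_coverage(PUBMED_ARTICLES, PAPERS_PER_YEAR, CAREER_YEARS)

# Prints: 6,000 papers read, 0.016% of the literature
print(f"{papers_read:,} papers read, {fraction:.3%} of the literature")
```

Doubling the reading rate to 400 papers a year would still leave coverage at about 0.03%, so the conclusion does not depend on the exact estimate.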
Knowledge Graph Mining
Knowledge graphs represent scientific information as networks of entities and relationships. A biomedical knowledge graph might contain millions of nodes (genes, proteins, diseases, drugs, pathways, cell types) connected by edges that represent specific relationships (gene X is upregulated in disease Y, drug A inhibits protein B, pathway C is activated in cell type D). These graphs are constructed by extracting relationships from published papers using natural language processing.
The power of knowledge graphs for hypothesis generation lies in their ability to find indirect connections. If gene X causes disease Y, and drug A inhibits gene X, the graph suggests that drug A might treat disease Y, even if no paper has ever tested that specific combination. This is called "link prediction," and it has generated genuinely novel hypotheses that were later confirmed experimentally.
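A minimal sketch of this kind of inference, assuming the graph is stored as (subject, relation, object) triples. The entity names, the three relation labels, and the function name are placeholders for illustration. Real link-prediction systems typically score candidates with learned models rather than a fixed two-step rule like this one.

```python
from collections import defaultdict

# Toy triples (subject, relation, object). All entities are placeholders.
TRIPLES = [
    ("drug A", "inhibits", "gene X"),
    ("gene X", "causes", "disease Y"),
    ("drug B", "inhibits", "gene Z"),
    ("gene Z", "causes", "disease Y"),
    ("drug B", "treats", "disease Y"),  # already reported, so not novel
]


def candidate_treatments(triples):
    """Find drug -> gene -> disease paths with no direct 'treats' edge."""
    inhibits = defaultdict(set)  # drug -> genes it inhibits
    causes = defaultdict(set)    # gene -> diseases it causes
    known = set()                # (drug, disease) pairs already in the graph

    for subject, relation, obj in triples:
        if relation == "inhibits":
            inhibits[subject].add(obj)
        elif relation == "causes":
            causes[subject].add(obj)
        elif relation == "treats":
            known.add((subject, obj))

    candidates = []
    for drug, genes in inhibits.items():
        for gene in genes:
            for disease in causes.get(gene, ()):
                if (drug, disease) not in known:
                    candidates.append((drug, gene, disease))
    return sorted(candidates)


for drug, gene, disease in candidate_treatments(TRIPLES):
    print(f"Hypothesis: {drug} may treat {disease} (via {gene})")
```

Each candidate keeps the intermediate gene, so a researcher can see why the link was proposed before deciding whether it merits an experiment.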
A landmark example is the work of Don Swanson in the 1980s, who manually connected two separate literatures: one linking fish oil to reduced blood viscosity, and another linking high blood viscosity to Raynaud's disease. He hypothesized that fish oil might treat Raynaud's disease even though no published paper connected the two directly, and a later clinical study supported the idea. Modern AI systems perform Swanson-style discovery at scale, processing millions of papers to find thousands of indirect connections that merit investigation.
Knowledge graphs are also used for drug repurposing: identifying existing approved drugs that might treat diseases they were not originally designed for. By mapping the molecular targets of drugs and the molecular mechanisms of diseases, such a system identifies overlaps that suggest new therapeutic uses. Using this approach, researchers at BenevolentAI identified baricitinib as a potential COVID-19 treatment in early 2020, a prediction that was later supported by clinical trials. The drug was originally approved for rheumatoid arthritis and had not been widely considered as a treatment for viral infections.
Language Models as Hypothesis Engines
Large language models trained on scientific literature can generate hypotheses by combining knowledge from different domains in novel ways. When prompted with a description of an unsolved problem, the model draws on its training data to propose potential mechanisms, suggest experimental approaches, and identify relevant prior work that the researcher may not have encountered.
The quality of AI-generated hypotheses depends heavily on how you frame the prompt. A vague prompt like "generate hypotheses about Alzheimer's disease" produces generic suggestions. A precise prompt like "propose molecular mechanisms by which gut microbiome composition might influence amyloid-beta aggregation in the hippocampus, drawing on evidence from both gastroenterology and neuroscience" produces specific, testable, and often genuinely creative suggestions. The model is most useful when you provide enough context for it to combine knowledge from specific domains.
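One way to make precise framing a habit is to build prompts from explicit fields. The sketch below is a hypothetical helper, not part of any model's API: the function name, its parameters, and the wording of the template are all assumptions. It only produces a string, which you would then pass to whichever model you use.

```python
def build_hypothesis_prompt(upstream_factor, phenomenon, location, domains,
                            n_hypotheses=3):
    """Assemble a specific, mechanism-focused prompt from its components."""
    if not domains:
        raise ValueError("Name at least one domain to draw evidence from.")
    domain_text = " and ".join(domains)
    return (
        f"Propose {n_hypotheses} molecular mechanisms by which "
        f"{upstream_factor} might influence {phenomenon} in {location}, "
        f"drawing on evidence from {domain_text}. "
        "For each mechanism, state one experiment that could refute it."
    )


prompt = build_hypothesis_prompt(
    upstream_factor="gut microbiome composition",
    phenomenon="amyloid-beta aggregation",
    location="the hippocampus",
    domains=["gastroenterology", "neuroscience"],
)
print(prompt)
```

Forcing yourself to fill in each field makes it obvious when a prompt is still at the "generate hypotheses about Alzheimer's disease" level of vagueness.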
Recent studies have begun to evaluate the quality of AI-generated hypotheses systematically. A 2024 Nature study asked AI systems and human scientists to generate novel hypotheses about gene function, then had expert panels evaluate them blind. The AI hypotheses were rated as more novel than the human hypotheses on average, though slightly less feasible. This suggests that AI may be particularly valuable for expanding the space of ideas considered, even if not every idea is immediately actionable. The human scientists excelled at proposing hypotheses that were grounded in practical laboratory constraints, while the AI excelled at finding unexpected cross-domain connections.
Data-Driven Discovery
Sometimes the data itself suggests hypotheses that no theory predicts. AI systems can analyze large experimental datasets and flag surprising patterns, anomalous results, or unexpected correlations that merit further investigation. This is the computational equivalent of the "accidental discovery" that has driven many scientific breakthroughs, from penicillin to the cosmic microwave background.
In genomics, unsupervised learning algorithms applied to gene expression data routinely discover patient subtypes that correspond to different disease mechanisms or treatment responses. These subtypes were not predicted by any prior theory; they emerged from the data. Once discovered, they generate specific hypotheses about the molecular differences between subtypes and the drugs that might target each one.
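To illustrate the idea of subtypes emerging from the data, here is a small k-means clustering sketch in plain Python. The expression values are invented toy numbers for two hypothetical genes, and k-means is only one of many unsupervised methods. Real analyses work with thousands of genes and usually rely on established libraries.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points with k-means, using deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy data: (gene_1, gene_2) expression for six hypothetical patients.
expression = [
    (8.1, 1.2), (7.9, 0.9), (8.4, 1.1),
    (1.0, 6.8), (1.3, 7.2), (0.8, 7.0),
]

labels, centroids = kmeans(expression, k=2)
print(labels)
```

The algorithm is never told which patients belong together. The grouping comes from the expression values alone, and the resulting centroids show which genes differ between the groups, which is where subtype-specific hypotheses start.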
In materials science, AI analysis of synthesis conditions and material properties has revealed unexpected composition-property relationships. For example, high-entropy alloys, materials made from five or more elements in roughly equal proportions, exhibit properties that classical alloy-design rules did not anticipate. Because the space of possible compositions is far too large to explore by trial and error, machine learning models trained on existing measurements are used to flag promising compositions, which are then tested experimentally for strength, corrosion resistance, and thermal stability.
The danger of data-driven discovery is false positives. When you search millions of possible patterns, some will appear significant by chance. This is the multiple comparisons problem, and it applies with extra force to AI-driven exploration because the number of patterns examined is much larger than in traditional hypothesis testing. Statistical corrections are essential: the Bonferroni correction limits the chance of even one false positive, while false discovery rate procedures limit the expected share of false positives among the patterns flagged. Any AI-generated hypothesis must also be validated on independent data or through targeted experiments before it is taken seriously.
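The two corrections can be sketched in a few lines. The Benjamini-Hochberg procedure is used here as the false discovery rate method, which is one common choice among several. The p-values are invented for illustration and the function names are not from any library.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / number_of_tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True
    return rejected


# Invented p-values from ten hypothetical pattern tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

uncorrected = sum(p < 0.05 for p in p_values)
print("Uncorrected:", uncorrected)
print("Bonferroni:", sum(bonferroni(p_values)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```

With these numbers, five patterns pass the uncorrected 0.05 threshold, two survive Benjamini-Hochberg, and one survives Bonferroni. The gap widens as the number of patterns examined grows.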
Practical Approaches for Researchers
Start with what you know. The most productive use of AI for hypothesis generation is not asking it to generate ideas from scratch, but rather asking it to extend, combine, or challenge your existing ideas. Frame your prompts around specific problems in your research: "Given that we observe X in our data and the literature reports Y, what mechanisms could explain the connection?" This grounds the AI in real observations and produces more actionable hypotheses.
Use multiple approaches. Generate hypotheses from knowledge graph tools, language models, and data-driven analysis separately, then look for convergence. If a knowledge graph suggests that protein A is connected to disease B, and a language model independently proposes a mechanism linking them, and your data shows a correlation between protein A levels and disease severity, you have converging evidence from three sources. Keep in mind that the knowledge graph and the language model both draw on the published literature, so they are not fully independent of each other; support from your own data carries the most additional weight. Even so, convergence makes it considerably more likely that the hypothesis is worth testing.
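A convergence check can be as simple as counting how many sources propose the same link. The sketch below assumes each method's output has already been reduced to (entity, disease) pairs. The source labels, the toy pairs, and the function name are illustrative assumptions.

```python
def converging_hypotheses(knowledge_graph, language_model, data_analysis,
                          min_sources=2):
    """Rank candidate links by how many sources independently proposed them."""
    sources = {
        "knowledge graph": knowledge_graph,
        "language model": language_model,
        "data analysis": data_analysis,
    }
    support = {}
    for name, pairs in sources.items():
        for pair in pairs:
            support.setdefault(pair, set()).add(name)

    ranked = [
        (pair, sorted(names))
        for pair, names in support.items()
        if len(names) >= min_sources
    ]
    # Most widely supported links first; ties broken alphabetically.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


# Toy outputs from each method, as sets of (entity, disease) pairs.
kg_links = {("protein A", "disease B"), ("protein C", "disease B")}
llm_links = {("protein A", "disease B"), ("protein D", "disease B")}
data_links = {("protein A", "disease B"), ("protein C", "disease B")}

for pair, names in converging_hypotheses(kg_links, llm_links, data_links):
    print(pair, "supported by", ", ".join(names))
```

A simple count treats every source as equally informative. In practice you may want to weight support from your own data more heavily than agreement between two literature-derived sources.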
Evaluate AI-generated hypotheses with the same rigor you would apply to any hypothesis. Is it specific enough to be testable? Is it consistent with established knowledge, or does it require overturning well-established findings? Is there a plausible mechanism? What experiment would definitively confirm or refute it? The best AI-generated hypotheses are the ones that a domain expert reads and thinks "that could actually work, and I know exactly how to test it."
Document which hypotheses were AI-generated. Many journals now require authors to disclose how AI tools were used, and it is also good scientific practice. Readers can better evaluate a hypothesis if they know it came from a systematic computational search rather than a researcher's intuition. It also helps the field track how often AI-generated hypotheses turn out to be correct, providing data for improving these systems.
AI generates scientific hypotheses by finding connections across literature and data at a scale no human can match. The most productive approach combines AI-generated suggestions with domain expertise: let AI expand the space of ideas you consider, then apply your scientific judgment to evaluate which ones are worth pursuing experimentally.