Molecular Evolution: How DNA Changes Drive Evolution
Types of Molecular Change
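Mutations are the ultimate source of all genetic variation and therefore the raw material for molecular evolution. Point mutations change a single nucleotide in a DNA sequence and are the most common type of mutation. They are classified as transitions (changes within the same chemical class of base: purine to purine, such as A to G, or pyrimidine to pyrimidine, such as C to T) or transversions (changes between a purine and a pyrimidine, such as A to C or G to T). Transitions are generally observed more often than transversions, even though each base has two possible transversions and only one possible transition, in part because they exchange bases of similar size and shape and because common mutational processes such as cytosine deamination produce transitions.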
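The transition/transversion distinction can be made concrete with a short sketch. It relies on the standard grouping of bases into purines (A, G) and pyrimidines (C, T); the function names are illustrative only and do not come from any particular library.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as a 'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be A, C, G, or T")
    if ref == alt:
        raise ValueError("ref and alt are the same base")
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_transitions_transversions(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical positions and anything that is not a plain base (gaps, N).
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if classify_substitution(a, b) == "transition":
            ts += 1
        else:
            tv += 1
    return ts, tv
```

Applied to a pair of aligned sequences, the two counts give a rough transition/transversion ratio. Real analyses would usually also correct for multiple substitutions at the same site.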
Insertions and deletions (collectively called indels) add or remove nucleotides from a DNA sequence. In protein-coding genes, indels that are not multiples of three nucleotides cause frameshifts, altering the reading frame of the genetic code and usually producing a nonfunctional protein. Frameshift mutations are therefore usually eliminated by natural selection, while in-frame indels that add or remove whole codons are more likely to persist.
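A minimal sketch of why indel length matters: the reading frame is preserved only when the net change in length is a multiple of three. The helper names and the toy sequence are hypothetical, chosen purely for illustration.

```python
def indel_effect(length_change: int) -> str:
    """Return 'in-frame' or 'frameshift' for an indel in a coding sequence.

    length_change is the net number of nucleotides added (positive)
    or removed (negative).
    """
    if length_change == 0:
        raise ValueError("an indel must change the sequence length")
    return "in-frame" if length_change % 3 == 0 else "frameshift"


def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAATGA"               # toy sequence: ATG GCC AAA TGA
deleted = original[:3] + original[4:]   # remove one base after the start codon

print(codons(original))
print(codons(deleted))
```

After the single-base deletion, every codon downstream of the change is read differently, whereas removing a full codon would leave the downstream codons intact.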
Gene duplication is a major source of evolutionary innovation at the molecular level. When a gene is accidentally duplicated, one copy can continue performing its original function while the other is free to accumulate mutations and potentially evolve a new function. The globin gene family, which includes hemoglobin and myoglobin, evolved through a series of gene duplications from a single ancestral globin gene over hundreds of millions of years. Each duplication event produced a new gene copy that eventually specialized for a different function, such as oxygen transport at different developmental stages.
Whole-genome duplications, in which the entire genome is duplicated, have occurred multiple times in evolutionary history. Two rounds of whole-genome duplication occurred early in vertebrate evolution, and additional duplications have occurred in specific lineages such as teleost fish and flowering plants. These events provided a massive supply of duplicate genes that could be repurposed for new functions, potentially facilitating major evolutionary innovations.
The Neutral Theory of Molecular Evolution
The neutral theory, proposed by Motoo Kimura in 1968, fundamentally changed how biologists think about molecular evolution. Kimura argued that most evolutionary changes at the molecular level are caused by the random fixation of selectively neutral mutations through genetic drift, rather than by natural selection driving the spread of beneficial mutations.
The neutral theory does not deny the importance of natural selection for phenotypic evolution. Instead, it proposes that the vast majority of mutations at the DNA level are either harmful (and quickly removed by selection) or neutral (having no effect on fitness). Neutral mutations accumulate at a rate determined by the mutation rate alone, independent of population size. In a diploid population of N individuals, 2Nu new neutral mutations arise each generation (where u is the neutral mutation rate per gene copy per generation), and each has a probability of 1/(2N) of eventually being fixed by drift. The two factors of 2N cancel, so the rate of molecular evolution equals the neutral mutation rate u.
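The cancellation of population size in this argument can be checked numerically. The sketch below assumes a diploid population of N individuals and a neutral mutation rate u per gene copy per generation; the function name and the example values are illustrative.

```python
import math


def neutral_substitution_rate(N: int, u: float) -> float:
    """Expected neutral substitutions per generation under the argument above."""
    new_mutations = 2 * N * u            # new neutral mutations entering the population
    fixation_probability = 1 / (2 * N)   # chance that any one of them drifts to fixation
    return new_mutations * fixation_probability


u = 1e-8  # illustrative mutation rate, not an estimate for any real species
for N in (100, 10_000, 1_000_000):
    rate = neutral_substitution_rate(N, u)
    print(N, rate, math.isclose(rate, u))
```

Whatever value of N is supplied, the result matches u up to floating-point rounding, which is the sense in which the neutral substitution rate is independent of population size.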
The nearly neutral theory, developed by Tomoko Ohta, extended Kimura's framework by recognizing that many mutations are not strictly neutral but have very small selective effects. Whether these slightly deleterious or slightly beneficial mutations behave as neutral depends on population size: in small populations, drift overwhelms weak selection, and these mutations behave as if they were neutral. In large populations, natural selection is more effective at removing slightly deleterious mutations and fixing slightly beneficial ones.
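The dependence on population size can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. The formula is not given in the text above and is brought in here as an assumption, along with additive fitness effects, an effective population size equal to N, and an initial frequency of 1/(2N). The function names are hypothetical.

```python
import math


def fixation_probability(s: float, N: int) -> float:
    """Approximate fixation probability of a new mutation with selection coefficient s.

    Assumes a diploid population whose effective size equals N, additive
    fitness effects, and an initial frequency of 1/(2N).
    """
    if s == 0:
        return 1 / (2 * N)  # strictly neutral case
    # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for numerical accuracy.
    # Note: math.expm1 overflows when -4*N*s exceeds roughly 709.
    return math.expm1(-2 * s) / math.expm1(-4 * N * s)


def relative_to_neutral(s: float, N: int) -> float:
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(s, N) * 2 * N


s = -1e-4  # a slightly deleterious mutation (illustrative value)
for N in (100, 10_000, 1_000_000):
    print(N, relative_to_neutral(s, N))
```

In the small population the slightly deleterious mutation fixes almost as often as a neutral one, while in the large population its chance of fixation is vanishingly small. That is the qualitative pattern the nearly neutral theory describes.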
The neutral theory made specific predictions that have been largely confirmed. It predicted that functionally less constrained regions of the genome (such as pseudogenes and intergenic regions) should evolve faster than functionally important regions (such as protein-coding sequences), because a higher proportion of mutations in unconstrained regions are neutral. It also predicted that synonymous substitutions (which do not change the amino acid sequence) should accumulate at a higher rate per site than nonsynonymous substitutions (which do change the amino acid sequence), because synonymous changes are more likely to be neutral. Both predictions have been confirmed across many organisms.
Molecular Clocks and Divergence Dating
The observation that molecular sequences accumulate changes at relatively constant rates led to the concept of the molecular clock. If the rate of molecular change can be calibrated using fossil evidence or known geological events, molecular data can be used to estimate when two lineages diverged from their common ancestor.
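Under a strict clock, the calibration step and the dating step reduce to simple arithmetic. The sketch below assumes that pairwise distances are measured in substitutions per site and that both lineages accumulate changes at the same rate, which is why a factor of two appears. All numbers are invented for illustration and are not real calibrations.

```python
def calibrate_rate(distance: float, known_divergence_years: float) -> float:
    """Substitution rate per site per year per lineage, from a dated divergence.

    The pairwise distance is shared between two lineages, hence the factor of 2.
    """
    return distance / (2 * known_divergence_years)


def divergence_time(distance: float, rate: float) -> float:
    """Estimated years since two lineages split, assuming a strict clock."""
    return distance / (2 * rate)


# Hypothetical calibration: a fossil-dated split 10 million years ago
# between two species that differ at 2% of sites.
rate = calibrate_rate(distance=0.02, known_divergence_years=10e6)

# Hypothetical query pair differing at 1.2% of sites.
print(divergence_time(distance=0.012, rate=rate))
```

This is the simplest possible version of the calculation. It ignores multiple substitutions at the same site and uncertainty in both the calibration and the rate.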
Early molecular clock analyses assumed a strict clock with a constant rate across all lineages. However, research has shown that rates vary among lineages due to differences in generation time, metabolic rate, population size, and the strength of natural selection. Species with shorter generation times tend to accumulate mutations faster because they undergo more rounds of DNA replication per unit time. Species with higher metabolic rates may also have higher mutation rates due to increased production of DNA-damaging reactive oxygen species.
Modern molecular clock methods use relaxed clock models that allow the rate to vary across branches of a phylogenetic tree. These methods, combined with multiple fossil calibration points and sophisticated statistical frameworks, provide divergence time estimates with explicit confidence intervals. Molecular dating has been used to estimate the ages of virtually all major lineages of life, from the divergence of bacteria and archaea billions of years ago to the diversification of human populations within the last 100,000 years.
Natural Selection at the Molecular Level
While the neutral theory emphasizes the importance of drift, natural selection clearly acts on molecular variation as well. Positive selection, which drives the spread of beneficial mutations, can be detected by comparing the rates of synonymous and nonsynonymous substitutions. When the rate of nonsynonymous substitutions per nonsynonymous site exceeds the rate of synonymous substitutions per synonymous site (a ratio, often written dN/dS, greater than 1), it suggests that amino acid changes are being driven to fixation by positive selection faster than the background neutral rate.
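The comparison of the two rates is usually summarized as a ratio of nonsynonymous to synonymous substitutions per site. The sketch below is a deliberately simplified version: it uses uncorrected proportions of differences, with no correction for multiple substitutions, and the site and difference counts are assumed to have been obtained elsewhere. The function names and the tolerance value are illustrative choices, not an established method.

```python
def dn_ds_ratio(nonsyn_diffs: float, nonsyn_sites: float,
                syn_diffs: float, syn_sites: float) -> float:
    """Ratio of nonsynonymous to synonymous differences per site (uncorrected)."""
    dn = nonsyn_diffs / nonsyn_sites
    ds = syn_diffs / syn_sites
    if ds == 0:
        raise ValueError("no synonymous differences; the ratio is undefined")
    return dn / ds


def interpret(ratio: float, tolerance: float = 0.1) -> str:
    """Rough reading of the ratio; the tolerance is an arbitrary illustrative choice."""
    if ratio > 1 + tolerance:
        return "consistent with positive selection"
    if ratio < 1 - tolerance:
        return "consistent with purifying selection"
    return "consistent with neutral evolution"


# Hypothetical counts for two genes.
print(interpret(dn_ds_ratio(6, 300, 10, 100)))
print(interpret(dn_ds_ratio(30, 300, 5, 100)))
```

In practice such ratios are estimated with statistical models and tested for significance, since a gene-wide average can hide positive selection acting on only a few codons.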
Genes involved in immune defense, reproduction, and sensory perception frequently show signatures of positive selection, reflecting ongoing adaptation to pathogens, sexual selection, and environmental challenges. The major histocompatibility complex (MHC) genes, which encode proteins that present pathogen-derived peptides to the immune system, show some of the strongest signatures of positive selection in vertebrate genomes, driven by the constant arms race between hosts and parasites.
Purifying selection, which removes deleterious mutations, is the most common form of natural selection at the molecular level. Most protein-coding genes are under strong purifying selection, with the majority of amino acid-changing mutations being removed because they compromise protein function. The strength of purifying selection varies across genes: genes encoding essential cellular machinery (such as histones and ribosomal proteins) are among the most highly conserved sequences known, while genes with more specialized or redundant functions evolve more rapidly.
Balancing selection maintains multiple alleles in a population when heterozygotes have higher fitness than either homozygote, or when the fitness of an allele depends on its frequency in the population. The classic example is the sickle cell allele of hemoglobin: the heterozygous state provides resistance to malaria, maintaining the allele at high frequencies in malaria-endemic regions despite the severe disease it causes in homozygotes.
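Heterozygote advantage can be sketched with the standard one-locus selection model. Assume relative fitnesses of 1 - s for the normal homozygote, 1 for the heterozygote, and 1 - t for the sickle cell homozygote. The values of s and t below are invented for illustration, not measured. Under these assumptions the sickle allele frequency is expected to settle at s / (s + t).

```python
def next_frequency(q: float, s: float, t: float) -> float:
    """Frequency of the sickle allele after one generation of selection.

    q is the current sickle allele frequency; genotype fitnesses are
    1 - s (normal homozygote), 1 (heterozygote), 1 - t (sickle homozygote).
    """
    p = 1 - q
    w_normal, w_het, w_sickle = 1 - s, 1.0, 1 - t
    mean_fitness = p * p * w_normal + 2 * p * q * w_het + q * q * w_sickle
    # Sickle alleles come from sickle homozygotes and half of the heterozygotes' alleles.
    return (q * q * w_sickle + p * q * w_het) / mean_fitness


def equilibrium_frequency(s: float, t: float) -> float:
    """Predicted stable frequency of the sickle allele under heterozygote advantage."""
    return s / (s + t)


s, t = 0.1, 0.8  # illustrative selection coefficients
q = 0.01         # start with the allele rare
for _ in range(500):
    q = next_frequency(q, s, t)

print(q, equilibrium_frequency(s, t))
```

Starting from a low frequency, the simulated value converges on the predicted equilibrium, showing how an allele that is harmful in homozygotes can still be maintained at an appreciable frequency.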
Comparative Genomics
The sequencing of complete genomes from hundreds of species has opened new frontiers in molecular evolution. Comparative genomics reveals how genomes change in size, structure, and content over evolutionary time. Mammalian genomes, for example, are remarkably similar in gene content despite varying greatly in size, largely due to differences in the amount of repetitive DNA. The human genome contains approximately 20,000 protein-coding genes, a number similar to that of mice, dogs, and cows, despite the obvious differences between these species.
Comparisons between closely related species reveal the specific genetic changes responsible for phenotypic differences. The comparison of human and chimpanzee genomes, which differ by approximately 1.3 percent in their aligned sequences, has identified genes and regulatory elements that changed rapidly in the human lineage and may be responsible for uniquely human traits such as language capacity, brain size, and bipedal locomotion.
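A percent difference of this kind is computed over aligned positions. A minimal sketch, assuming the two sequences have already been aligned to equal length and that gap columns are excluded from the comparison (one of several possible conventions); the function name and toy sequences are illustrative.

```python
def percent_difference(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned, ungapped positions at which two sequences differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Toy sequences, not real genomic data.
print(percent_difference("ACGTACGTAC", "ACGTACGTAT"))
```

Because gap columns are skipped, a figure computed this way reflects single-nucleotide differences only and leaves out the sequence gained or lost through insertions and deletions.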
Regulatory evolution, changes in when and where genes are expressed rather than changes in the protein they encode, is increasingly recognized as a major driver of phenotypic evolution. The same toolkit of developmental genes is shared across animals as different as fruit flies and humans, suggesting that much of the morphological diversity in the animal kingdom results from changes in gene regulation rather than changes in gene structure. This insight has profound implications for understanding how complex traits evolve and how relatively small genetic changes can produce dramatic phenotypic differences.
Molecular evolution studies how DNA and protein sequences change over time through mutation, drift, and selection. The neutral theory, molecular clocks, and comparative genomics provide tools for understanding evolutionary relationships and the genetic basis of adaptation and diversification.