Phylogenetic Trees: Mapping Evolutionary Relationships
How to Read a Phylogenetic Tree
A phylogenetic tree consists of branches, nodes, and tips. The tips represent the organisms or groups being compared, which may be species, populations, genes, or any other units of interest. The internal nodes represent hypothetical common ancestors. The branching pattern shows which groups share more recent common ancestors and are therefore more closely related.
Two species that share a more recent common ancestor are more closely related to each other than either is to a species with a more distant common ancestor. For example, on a tree of vertebrates, humans and chimpanzees share a more recent common ancestor than either shares with dogs. This means humans are more closely related to chimpanzees than to dogs, a fact reflected in the branching pattern of the tree.
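The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library API: it assumes a rooted tree written as nested tuples with tip names as strings, and the names VERTEBRATES, tips, mrca_clade, and is_closer are hypothetical helpers introduced here. Two tips are treated as more closely related when the smallest clade containing both is nested inside the smallest clade containing the other pair.

```python
# Assumed representation: a tip is a string, an internal node is a tuple of children.
VERTEBRATES = ((("human", "chimpanzee"), "dog"), "chicken")


def tips(node):
    """Return the set of tip names found at or below a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= tips(child)
    return found


def mrca_clade(node, a, b):
    """Return the tips of the smallest clade containing both a and b, or None."""
    if not {a, b} <= tips(node):
        return None
    if not isinstance(node, str):
        for child in node:
            deeper = mrca_clade(child, a, b)
            if deeper is not None:
                return deeper
    return tips(node)


def is_closer(tree, a, b, c):
    """True if a shares a more recent common ancestor with b than with c."""
    return mrca_clade(tree, a, b) < mrca_clade(tree, a, c)  # proper subset
```

With this toy tree, is_closer(VERTEBRATES, "human", "chimpanzee", "dog") is true, because the human and chimpanzee clade sits inside the clade that also contains dogs.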
It is important to understand what phylogenetic trees do not show. The order in which tips are arranged along an axis does not represent a scale of advancement or complexity, and a living species at one tip is not more evolved than a living species at another. All living species have been evolving for exactly the same amount of time since their last common ancestor. The tree shows the pattern of ancestral relationships, not a ranking from primitive to advanced.
Trees can be drawn in many orientations, including left to right, bottom to top, or as circular diagrams, but the branching pattern conveys the same information regardless of how the tree is oriented. Branches can also be rotated around any node without changing the relationships depicted. Two trees that look visually different may actually show identical evolutionary relationships if their branching patterns are equivalent.
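One way to check whether two drawings show the same relationships is to reduce each tree to a form that ignores the order of branches at every node. The sketch below assumes rooted trees written as nested tuples of tip names; the function names canonical and same_relationships, and the three example trees, are illustrative and not part of any particular library.

```python
def canonical(node):
    """Return a rotation-independent copy of a nested-tuple tree.

    Children are sorted at every node, so two trees that differ only by
    rotating branches around nodes end up with the same canonical form.
    """
    if isinstance(node, str):
        return node
    children = [canonical(child) for child in node]
    # repr gives a common sort key for strings and tuples alike.
    return tuple(sorted(children, key=repr))


def same_relationships(tree_a, tree_b):
    """True if two rooted trees have equivalent branching patterns."""
    return canonical(tree_a) == canonical(tree_b)


# The first two trees differ only by rotations; the third groups humans with dogs.
TREE_A = ((("human", "chimpanzee"), "dog"), "chicken")
TREE_B = ("chicken", ("dog", ("chimpanzee", "human")))
TREE_C = ((("human", "dog"), "chimpanzee"), "chicken")
```

Here same_relationships(TREE_A, TREE_B) is true even though the tuples look different, while same_relationships(TREE_A, TREE_C) is false because the branching pattern itself has changed.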
Building Phylogenetic Trees
Scientists construct phylogenetic trees using data from morphology (physical characteristics), molecular sequences (DNA, RNA, or protein), and sometimes behavioral or ecological traits. The basic principle is that organisms sharing more characteristics inherited from a common ancestor are likely to be more closely related, having diverged more recently from that ancestor.
Morphological data has been used to classify organisms since before Darwin, and to infer phylogenies ever since evolutionary relationships became the basis of classification. Anatomists compare homologous structures, features inherited from a common ancestor, to determine relationships. The same arrangement of bones in the forelimbs of humans, whales, bats, and horses indicates that these species share a common ancestor with that limb structure. Shared derived characters, traits that arose in a common ancestor and were inherited by its descendants, are the most informative features for building phylogenies.
Molecular phylogenetics, which uses DNA or protein sequence comparisons, has revolutionized our understanding of evolutionary relationships since the 1960s. DNA sequences accumulate mutations over time, so more closely related species generally have more similar sequences because they have had less time to accumulate differences. By aligning sequences from different species and analyzing the pattern of similarities and differences, scientists can estimate the evolutionary tree, often with strong statistical support.
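As a rough sketch of how aligned sequences are turned into a measure of difference, the code below computes the proportion of differing sites between two aligned sequences and then applies the Jukes-Cantor correction for multiple substitutions at the same site. The Jukes-Cantor model is one simple choice assumed here for illustration and is not named in the text above; the function names and the short sequences are invented for the example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping sites with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Estimated substitutions per site under the Jukes-Cantor model."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Invented toy alignment, not real data.
SEQ_X = "ATGCTAGCTAGGCTAA"
SEQ_Y = "ATGCTAGCTAGGCTTA"
SEQ_Z = "ATGTTAGCAAGG-TTC"
```

In this toy alignment SEQ_X and SEQ_Y differ at fewer sites than SEQ_X and SEQ_Z, so a distance-based analysis would place the first pair closer together. The corrected distance is always at least as large as the raw proportion, because some sites have changed more than once.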
Several computational methods are used to build molecular phylogenies. Maximum parsimony selects the tree that requires the fewest evolutionary changes to explain the observed data. Maximum likelihood evaluates which tree has the highest probability of producing the observed sequences given a model of how DNA evolves. Bayesian methods calculate the posterior probability of different trees given the data, a model of sequence evolution, and a prior probability distribution. Each method has strengths and limitations, and researchers often compare results from multiple methods to assess the robustness of their conclusions.
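To make the parsimony criterion concrete, the sketch below counts the minimum number of changes a fixed tree requires, using Fitch's algorithm on rooted binary trees, and then picks the candidate tree with the lowest count. It assumes trees are nested two-element tuples and that all sequences are aligned and gap-free; the taxa A to D, their sequences, and the function names are invented for illustration. Real analyses search far more trees than the three listed here.

```python
def fitch(node, states):
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node  # assumes a strictly binary tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: at least one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over every site in the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Invented four-taxon alignment and the three possible groupings.
ALIGNMENT = {"A": "AACG", "B": "AACT", "C": "GGTT", "D": "GGTG"}
CANDIDATES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
BEST = min(CANDIDATES, key=lambda tree: parsimony_score(tree, ALIGNMENT))
```

For this alignment the three candidates need 5, 8, and 7 changes respectively, so the tree grouping A with B and C with D is the most parsimonious. The fourth site favors a different grouping, which shows how individual characters can conflict with the overall result.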
Molecular Clocks
The molecular clock hypothesis proposes that DNA sequences accumulate mutations at a roughly constant rate over time. If this rate can be calibrated using fossils or other independent evidence, molecular data can be used to estimate when lineages diverged. This approach has been used to date many evolutionary events, from the divergence of humans and chimpanzees (approximately six to seven million years ago) to the origin of major animal groups.
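The arithmetic behind a strict molecular clock can be sketched as follows. The code assumes a single constant rate and a pairwise distance measured in substitutions per site; because differences accumulate along both lineages after a split, the distance is divided by twice the rate. The function names are hypothetical, and all numbers are made-up placeholders chosen for easy arithmetic, not measured values.

```python
def calibrate_rate(distance, split_age_years):
    """Substitutions per site per year along one lineage, from a dated split."""
    if distance <= 0 or split_age_years <= 0:
        raise ValueError("distance and age must be positive")
    return distance / (2 * split_age_years)


def divergence_time(distance, rate):
    """Estimated years since two lineages split, under a strict clock."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


# Hypothetical calibration: a pair whose split is dated by fossils to 10 million years.
CALIBRATION_DISTANCE = 0.02
CALIBRATION_AGE_YEARS = 10_000_000
RATE = calibrate_rate(CALIBRATION_DISTANCE, CALIBRATION_AGE_YEARS)

# Hypothetical distance for a second pair with no usable fossil record.
ESTIMATE_YEARS = divergence_time(0.013, RATE)
```

With these placeholder inputs the calibrated rate is about one substitution per billion sites per year, and the second pair is estimated to have split about 6.5 million years ago. The estimate is only as good as the calibration and the assumption of a constant rate.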
In practice, molecular clocks are not perfectly constant. Different genes evolve at different rates, different lineages may have different mutation rates due to differences in generation time or metabolic rate, and natural selection can accelerate or slow the rate of molecular change in specific genes. Modern molecular clock methods account for these complications using relaxed clock models that allow the rate to vary across branches of the tree.
Despite these complications, molecular clocks remain valuable tools for estimating divergence times, especially for groups with poor fossil records. Molecular dating has been particularly useful for estimating the ages of microbial lineages, which rarely leave fossils, and for dating the diversification of groups like flowering plants and mammals that underwent rapid radiation.
Challenges in Building Phylogenies
Constructing accurate phylogenetic trees is not always straightforward. Several biological phenomena can make it difficult to determine the true pattern of evolutionary relationships.
Convergent evolution, where distantly related species independently evolve similar traits, can mislead phylogenetic analysis if it causes those species to appear closely related. Dolphins and sharks have similar body shapes, but molecular data clearly shows that dolphins are mammals more closely related to cows than to sharks. Molecular data has generally proven more reliable than morphological data for resolving such cases because convergent evolution at the molecular level is less common than at the morphological level.
Horizontal gene transfer, the movement of genes between organisms outside of parent-to-offspring inheritance, is common in bacteria and archaea and complicates phylogenetic reconstruction for these groups. Because different genes in the same organism may have different evolutionary histories, no single tree may accurately represent the relationships among prokaryotes. Some biologists have proposed replacing the tree metaphor with a web or network model for prokaryotic evolution.
Incomplete lineage sorting occurs when ancestral genetic variation persists through multiple speciation events, causing gene trees to differ from species trees. This phenomenon is particularly common when speciation events occur in rapid succession, leaving little time for ancestral genetic variation to sort into the new species. Incomplete lineage sorting has complicated the reconstruction of relationships among closely related species, including the great apes.
Long branch attraction is a statistical artifact that can cause distantly related lineages with high rates of molecular evolution to appear closely related. This occurs because independently evolved similarities accumulate more rapidly in fast-evolving lineages, causing them to cluster together in phylogenetic analysis. Methods to detect and correct for long branch attraction are an active area of research in computational phylogenetics.
Applications of Phylogenetics
Phylogenetic trees have applications far beyond classifying organisms. In medicine, phylogenetic analysis is used to track the spread and evolution of pathogens. During disease outbreaks, comparing viral genome sequences can reveal transmission chains, identify the geographic origin of an outbreak, and predict how viruses might evolve in the future. This approach has been applied extensively to HIV, influenza, Ebola, and SARS-CoV-2.
In conservation biology, phylogenetics helps prioritize species for protection. Species that represent unique evolutionary lineages, with few close relatives, may be given higher conservation priority because their extinction would result in a disproportionate loss of evolutionary history. The concept of phylogenetic diversity provides a framework for measuring how much evolutionary history would be lost if specific species went extinct.
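A simple version of this measurement can be sketched by summing branch lengths. The code below assumes one common definition, in which phylogenetic diversity is the total length of all branches on a rooted tree that lead to at least one surviving species. The tree layout (each node is a pair of content and branch length), the taxa A, B, and C, and the branch lengths are all invented for illustration.

```python
# Assumed layout: a node is (content, branch_length); content is either a
# tip name or a list of child nodes. Branch lengths are hypothetical.
TREE = ([([("A", 1.0), ("B", 1.0)], 4.0), ("C", 5.0)], 0.0)


def _retained(node, survivors):
    """Return (any survivor below?, branch length kept) for a subtree."""
    content, length = node
    if isinstance(content, str):
        alive = content in survivors
        return alive, (length if alive else 0.0)
    alive, total = False, 0.0
    for child in content:
        child_alive, child_total = _retained(child, survivors)
        alive = alive or child_alive
        total += child_total
    # A branch is kept only if it still leads to a surviving tip.
    return alive, (total + length if alive else 0.0)


def phylogenetic_diversity(tree, survivors):
    """Total branch length connecting the surviving tips to the root."""
    return _retained(tree, set(survivors))[1]


def diversity_lost(tree, all_tips, extinct):
    """Branch length that disappears if the given species go extinct."""
    remaining = set(all_tips) - set(extinct)
    return phylogenetic_diversity(tree, all_tips) - phylogenetic_diversity(tree, remaining)
```

On this toy tree, losing A removes 1.0 unit of branch length because its sister B preserves their shared branch, while losing C removes 5.0 units because C has no close relatives. This is the sense in which an isolated lineage carries a disproportionate share of evolutionary history.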
Forensic science uses phylogenetic methods to analyze evidence in criminal cases involving biological material. Phylogenetic analysis of HIV sequences has been used in court cases to determine whether one individual infected another. Similar approaches have been applied to cases involving anthrax, food contamination, and wildlife trafficking.
Agriculture and biotechnology benefit from phylogenetics, which helps identify wild relatives of crop plants that may carry useful genes for disease resistance, drought tolerance, or nutritional quality. Understanding the evolutionary relationships among crop plants and their wild relatives guides the search for beneficial genetic variation that can be introduced through breeding programs.
Phylogenetic trees are diagrams that map evolutionary relationships among organisms based on shared ancestry. Built from morphological and molecular data, phylogenies are essential tools in biology with applications ranging from disease tracking to conservation prioritization.