Protein Structure and Function: From Amino Acids to Complex Machines
Amino Acids: The Building Blocks
Every protein is built from a set of 20 standard amino acids, each sharing the same core structure: a central alpha carbon bonded to an amino group (NH2), a carboxyl group (COOH), a hydrogen atom, and a variable side chain called the R group. The R group is what makes each amino acid unique. Glycine has just a hydrogen atom as its R group, making it the smallest and most flexible amino acid. Tryptophan has a large, bulky indole ring. Cysteine has a sulfhydryl group that can form disulfide bonds with other cysteine residues.
The 20 amino acids are commonly grouped by the chemical properties of their R groups. Nonpolar amino acids, including alanine, valine, leucine, isoleucine, phenylalanine, tryptophan, methionine, and proline, have hydrophobic side chains that tend to cluster in the interior of folded proteins, away from water. Polar uncharged amino acids, including serine, threonine, asparagine, glutamine, and tyrosine, have side chains that can form hydrogen bonds with water. Positively charged amino acids (lysine, arginine, and, to a lesser extent, histidine, whose side chain is only partly protonated near neutral pH) and negatively charged amino acids (aspartate, glutamate) carry ionic charges at physiological pH and are typically found on protein surfaces, where they interact with water and other charged molecules.
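The grouping above can be written down as a small lookup table. The following is a minimal sketch in Python, not a standard library API: the names R_GROUP_CLASSES, classify, and class_composition are invented for illustration, and the example sequence is hypothetical. Glycine and cysteine are not assigned to a group in the text, so the sketch reports them as "unassigned" rather than guessing.

```python
from collections import Counter

# R-group classes as listed in the text, using standard one-letter codes.
# The dictionary name and layout are illustrative assumptions.
R_GROUP_CLASSES = {
    "nonpolar": set("AVLIFWMP"),      # Ala, Val, Leu, Ile, Phe, Trp, Met, Pro
    "polar_uncharged": set("STNQY"),  # Ser, Thr, Asn, Gln, Tyr
    "positive": set("KRH"),           # Lys, Arg, His
    "negative": set("DE"),            # Asp, Glu
}


def classify(residue: str) -> str:
    """Return the R-group class for a one-letter amino acid code."""
    code = residue.upper()
    for class_name, members in R_GROUP_CLASSES.items():
        if code in members:
            return class_name
    # Gly (G) and Cys (C) are not grouped in the text, so they land here.
    return "unassigned"


def class_composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each class."""
    return Counter(classify(residue) for residue in sequence)


if __name__ == "__main__":
    # Hypothetical short peptide, used only to exercise the functions.
    print(class_composition("MKDELLG"))
```

A composition count like this is only a rough first look at a sequence; it says nothing about where the residues end up in the folded protein.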
Amino acids are linked together by peptide bonds, which form through a condensation reaction between the carboxyl group of one amino acid and the amino group of the next, releasing a water molecule. The resulting chain of amino acids is called a polypeptide. A typical protein contains between 100 and 1,000 amino acid residues, though some are much larger. The giant muscle protein titin, for example, contains over 34,000 amino acid residues.
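Because each peptide bond releases one water molecule, a chain of n residues contains n - 1 peptide bonds and weighs less than the sum of its free amino acids by n - 1 water masses. The sketch below illustrates that bookkeeping in Python. The function names are hypothetical, and the mass table is a deliberately small subset of rounded average molar masses, included only to make the arithmetic concrete.

```python
WATER_MASS = 18.015  # g/mol, average molar mass of H2O

# Rounded average molar masses (g/mol) of three free amino acids.
# An illustrative subset, not a complete table.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
}


def peptide_bond_count(n_residues: int) -> int:
    """A linear chain of n residues is joined by n - 1 peptide bonds."""
    return max(n_residues - 1, 0)


def polypeptide_mass(sequence: str) -> float:
    """Sum the free amino acid masses, then subtract one water per bond."""
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    return total - peptide_bond_count(len(sequence)) * WATER_MASS


if __name__ == "__main__":
    print(peptide_bond_count(2))   # one bond in a dipeptide
    print(polypeptide_mass("GA"))  # glycylalanine, roughly 146.1 g/mol
```

The same counting applies at any scale: a chain of 34,000 residues would contain 33,999 peptide bonds.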
Primary Structure
The primary structure of a protein is simply its amino acid sequence, read from the amino terminus (N-terminus) to the carboxyl terminus (C-terminus). This sequence is determined by the nucleotide sequence of the gene that encodes the protein. Each set of three nucleotides (a codon) in the messenger RNA specifies one amino acid, according to the genetic code.
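Reading codons off a messenger RNA can be sketched in a few lines of Python. The example below builds the standard genetic code from a compact 64-letter string (codons ordered with the bases U, C, A, G at each position, "*" marking stop codons) and translates until the first stop codon. The function name translate and the input sequence are illustrative assumptions, and the sketch assumes the sequence is already in the correct reading frame.

```python
from itertools import product

# Standard genetic code in compact form: codons are enumerated with
# bases in the order U, C, A, G at each position; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an in-frame mRNA, N-terminus first, up to the first stop."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical mRNA: AUG (Met), GCU (Ala), UGG (Trp), UAA (stop).
    print(translate("AUGGCUUGGUAA"))  # MAW
```

Real translation also depends on locating the correct start codon and reading frame, which this sketch takes as given.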
The primary structure is the most fundamental level of protein organization because it dictates all subsequent levels of folding. Changing even a single amino acid can alter a protein's shape and function. The classic example is sickle cell disease, in which a single substitution, glutamate to valine at position 6 of the beta-globin chain, causes deoxygenated hemoglobin molecules to polymerize into rigid fibers that distort red blood cells into a sickle shape.
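The sickle cell substitution can be expressed as a one-residue edit to a sequence string. The sketch below is a minimal Python illustration with a hypothetical helper named substitute. It uses the first eight residues of mature beta-globin (numbered after removal of the initiator methionine) and checks that the expected residue is actually present at the stated position before replacing it.

```python
def substitute(sequence: str, position: int, expected: str, replacement: str) -> str:
    """Replace one residue, using 1-based numbering from the N-terminus."""
    index = position - 1
    if sequence[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# First eight residues of mature beta-globin: Val-His-Leu-Thr-Pro-Glu-Glu-Lys.
NORMAL_BETA_GLOBIN_START = "VHLTPEEK"

if __name__ == "__main__":
    sickle = substitute(NORMAL_BETA_GLOBIN_START, 6, "E", "V")
    print(NORMAL_BETA_GLOBIN_START)  # VHLTPEEK
    print(sickle)                    # VHLTPVEK
```

The edit swaps a negatively charged side chain for a nonpolar one, which is the chemical change behind the polymerization described above.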
Frederick Sanger and his colleagues determined the first complete amino acid sequence of a protein, bovine insulin, in work published in the early 1950s. This achievement demonstrated that proteins have definite chemical structures and earned Sanger the 1958 Nobel Prize in Chemistry. Today, protein sequences are routinely predicted from DNA sequences using computational tools, and databases like UniProt contain the sequences of hundreds of millions of proteins.
Secondary Structure
Secondary structure refers to local folding patterns within a polypeptide chain, stabilized by hydrogen bonds between backbone atoms. The two most common secondary structures are the alpha helix and the beta sheet.
In an alpha helix, the polypeptide backbone coils into a right-handed spiral. Each backbone carbonyl oxygen (C=O) forms a hydrogen bond with the backbone amide hydrogen (N-H) four residues ahead in the sequence. This regular pattern of hydrogen bonds gives the helix a rigid, rod-like structure. Alpha helices are common in membrane-spanning proteins, where they traverse the lipid bilayer, and in structural proteins like keratin (found in hair and nails) and myosin (found in muscle).
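The hydrogen-bonding rule for the alpha helix, carbonyl oxygen of residue i paired with the amide hydrogen of residue i + 4, can be enumerated directly. The following Python sketch uses a hypothetical function name and treats the helix as idealized; real helices often have frayed or irregular ends.

```python
def helix_hbond_pairs(n_residues: int, offset: int = 4) -> list[tuple[int, int]]:
    """List (acceptor, donor) residue pairs for an idealized alpha helix.

    The C=O of residue i accepts a hydrogen bond from the N-H of
    residue i + offset. Residues are numbered from 1.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


if __name__ == "__main__":
    for acceptor, donor in helix_hbond_pairs(10):
        print(f"C=O of residue {acceptor} ... H-N of residue {donor}")
```

For a helix of n residues this gives n - 4 backbone hydrogen bonds, so the last four carbonyl groups and the first four amide groups have no partner within the helix.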
In a beta sheet, the polypeptide chain forms extended strands that lie side by side, connected by hydrogen bonds between adjacent strands. The strands can run in the same direction (parallel beta sheet) or in opposite directions (antiparallel beta sheet). Beta sheets provide structural strength and are common in proteins like silk fibroin and the immunoglobulin fold found in antibodies.
Regions of a polypeptide that do not form regular alpha helices or beta sheets are called loops or coils. These regions are not necessarily disorganized; they often have specific, functionally important conformations. Loops frequently contribute to the active sites of enzymes and to the binding surfaces that allow proteins to interact with other molecules.
Tertiary Structure
Tertiary structure is the complete three-dimensional arrangement of all atoms in a single polypeptide chain. While secondary structure involves local interactions between nearby residues, tertiary structure involves interactions between residues that may be far apart in the primary sequence but are brought close together by the folding of the chain.
Several types of interactions stabilize tertiary structure. Hydrophobic interactions are perhaps the most important: nonpolar side chains are driven into the interior of the protein, away from the aqueous environment, creating a tightly packed hydrophobic core. Hydrogen bonds form between polar side chains and between side chains and backbone atoms. Ionic bonds (salt bridges) form between oppositely charged side chains. Disulfide bonds, covalent links between the sulfhydryl groups of two cysteine residues, provide additional stability, particularly in secreted proteins that must withstand harsh extracellular conditions.
Van der Waals forces, though individually weak, collectively contribute significant stability due to the large number of close contacts between atoms in a tightly packed protein interior. The overall result is a specific, reproducible three-dimensional shape that is essential for the protein's function.
Protein tertiary structures are determined experimentally using X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy. More recently, computational methods like AlphaFold have achieved remarkable accuracy in predicting protein structures from amino acid sequences alone, a breakthrough that earned Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry.
Quaternary Structure
Many proteins function as complexes of two or more polypeptide chains, called subunits. The arrangement of these subunits constitutes the protein's quaternary structure. Hemoglobin, the oxygen-carrying protein in red blood cells, is a classic example: it consists of four subunits, two alpha chains and two beta chains, each containing a heme group with an iron atom that binds oxygen.
Quaternary structure allows for cooperative behavior that would not be possible with a single polypeptide chain. In hemoglobin, the binding of oxygen to one subunit increases the oxygen affinity of the remaining subunits, producing a sigmoidal oxygen-binding curve that allows efficient oxygen loading in the lungs and efficient oxygen release in the tissues. This cooperativity is a direct consequence of quaternary structure: conformational changes in one subunit are transmitted to the others through the subunit interfaces.
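The effect of cooperativity on the shape of the binding curve can be illustrated with the Hill equation, Y = pO2^n / (P50^n + pO2^n), a widely used empirical approximation rather than a mechanistic model of hemoglobin. In the Python sketch below, the function names are hypothetical, and the numbers (P50 of 26 mmHg, a Hill coefficient of 2.8, and oxygen pressures of 100 mmHg for lungs and 30 mmHg for tissue) are approximate, illustrative assumptions. A Hill coefficient of 1 stands in for a hypothetical carrier with the same P50 but no cooperativity.

```python
def fractional_saturation(p_o2: float, p50: float, hill_n: float) -> float:
    """Hill equation: fraction of binding sites occupied at pressure p_o2."""
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)


# Illustrative, approximate values (assumptions, not measurements).
P50 = 26.0           # mmHg, pressure at half saturation
N_COOPERATIVE = 2.8  # Hill coefficient often quoted for hemoglobin
N_INDEPENDENT = 1.0  # no cooperativity (hyperbolic curve)
LUNG_PO2 = 100.0     # mmHg
TISSUE_PO2 = 30.0    # mmHg


def oxygen_released(hill_n: float) -> float:
    """Difference in saturation between lung and tissue oxygen pressures."""
    loaded = fractional_saturation(LUNG_PO2, P50, hill_n)
    remaining = fractional_saturation(TISSUE_PO2, P50, hill_n)
    return loaded - remaining


if __name__ == "__main__":
    print("cooperative release:   ", round(oxygen_released(N_COOPERATIVE), 3))
    print("noncooperative release:", round(oxygen_released(N_INDEPENDENT), 3))
```

Under these assumptions the cooperative carrier loads more fully in the lungs and gives up a larger fraction of its oxygen in the tissues, which is the practical meaning of the sigmoidal curve.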
Other examples of quaternary structure include DNA polymerase (a multi-subunit enzyme that replicates DNA), the proteasome (a barrel-shaped complex that degrades damaged proteins), and ATP synthase (a rotary molecular motor that synthesizes ATP). In each case, the functional properties of the complex emerge from the specific arrangement of its subunits.
Structure Determines Function
The central principle of protein biochemistry is that structure determines function. An enzyme's active site is a precisely shaped cavity that fits its substrate like a glove, positioning reactive groups for catalysis. An antibody's binding site is complementary in shape and charge to its antigen, allowing specific recognition. A channel protein forms a narrow pore through the membrane, sized to allow certain ions to pass while excluding others.
When a protein's structure is disrupted, its function is usually lost. Denaturation, caused by heat, extreme pH, organic solvents, or detergents, unfolds the protein by disrupting the noncovalent interactions that maintain its shape. A denatured protein is typically a disordered, nonfunctional chain that is prone to aggregation. In some cases, denaturation is reversible: if the denaturing agent is removed gradually, the protein can refold to its native state, demonstrating that all the information needed for correct folding is contained in the amino acid sequence.
Protein misfolding is implicated in several serious diseases. In Alzheimer's disease, the amyloid-beta peptide misfolds and aggregates into plaques in the brain. In Parkinson's disease, the protein alpha-synuclein forms similar aggregates. Prion diseases like Creutzfeldt-Jakob disease involve the misfolding of the prion protein into an infectious conformation that can template the misfolding of normal copies. Understanding protein folding and misfolding remains one of the most important and challenging problems in biochemistry.
Protein structure is organized into four hierarchical levels, from the linear amino acid sequence (primary) through local folding patterns (secondary), the overall 3D shape (tertiary), and multi-subunit assemblies (quaternary). At every level, the specific arrangement of atoms determines what the protein can do.