Search Algorithms Explained: How Computers Find What You Need

Updated May 2026
Search algorithms locate specific items within collections of data, or determine that those items do not exist. From finding a contact in your phone to querying billions of web pages, search is one of the most frequent operations in computing. The choice of search algorithm determines whether a lookup takes microseconds or minutes, making search algorithm design one of the most practically impactful areas of computer science.

Linear Search: The Simplest Approach

Linear search, also called sequential search, examines every element in a collection one at a time until it finds the target or reaches the end. It requires no special data organization and works on any collection, sorted or unsorted, arrays or linked lists. The time complexity is O(n) in the worst and average cases, since the target could be anywhere in the collection or absent entirely.

Despite its simplicity, linear search has legitimate uses. For very small collections (under 20 or so elements), the overhead of more sophisticated algorithms can exceed the cost of simply checking each element. Linear search also works on unsorted data without requiring any preprocessing. In situations where data arrives as a stream and you need to find a match in a single pass, linear search is the natural approach.

Binary Search: Divide and Conquer on Sorted Data

Binary search works on sorted collections by repeatedly dividing the search space in half. It compares the target value to the middle element. If they match, the search is complete. If the target is less than the middle element, the search continues in the lower half. If the target is greater, the search continues in the upper half. Each comparison eliminates half of the remaining possibilities, yielding O(log n) time complexity.

The efficiency gain is enormous. Searching one million sorted elements requires at most 20 comparisons with binary search, compared to up to one million with linear search. Searching one billion elements requires at most 30 comparisons. This logarithmic scaling makes binary search one of the most efficient algorithms in computer science.

Binary search has important prerequisites and subtleties. The data must be sorted, and it must support random access (jumping directly to any position), which means arrays work but linked lists do not. The algorithm is also notoriously tricky to implement correctly. Common bugs include integer overflow when computing the midpoint, off-by-one errors in boundary conditions, and infinite loops when the search space is not reduced properly. Jon Bentley, in his book Programming Pearls, reported that most professional programmers could not write a correct binary search implementation, and the Java standard library had a binary search bug for nearly a decade before it was discovered.

Hash-Based Search

Hash tables provide O(1) average-case lookup time, making them the fastest search structure for many applications. A hash function maps each key to an index in an array, allowing direct access to the stored value without any comparisons. When two keys map to the same index (a collision), various strategies resolve the conflict: chaining stores colliding elements in a linked list at that index, while open addressing probes alternative positions in the array.

The performance of hash-based search depends critically on the quality of the hash function and the load factor (ratio of stored elements to table size). A good hash function distributes keys uniformly across the array, minimizing collisions. As the load factor increases, collisions become more frequent and performance degrades. Most implementations resize the table when the load factor exceeds a threshold (commonly 0.7), maintaining near-constant lookup time at the cost of occasional rehashing.

Hash tables are used extensively in practice. Python dictionaries, JavaScript objects, and Java HashMaps are all hash table implementations. Database indexes often use hash-based structures for equality lookups. Caches, symbol tables in compilers, and duplicate detection algorithms all rely on hash tables for their O(1) average lookup time.

Tree-Based Search Structures

Binary search trees (BSTs) organize data in a tree where each node has at most two children, with all values in the left subtree less than the node and all values in the right subtree greater. Searching follows the path from root to target, going left or right at each node based on comparison. In a balanced tree, this takes O(log n) time, but an unbalanced tree can degrade to O(n), essentially becoming a linked list.

Self-balancing trees like AVL trees and red-black trees automatically maintain balance through rotations after insertions and deletions, guaranteeing O(log n) search time in all cases. AVL trees enforce a strict balance condition (height difference of at most 1 between subtrees), making lookups slightly faster but insertions and deletions slightly slower due to more frequent rotations. Red-black trees use a looser balance condition with node coloring, offering a good compromise between lookup speed and modification cost. Most standard library implementations of ordered maps and sets use red-black trees.

B-trees are designed for storage systems where data resides on disk. Each node can contain many keys and children, reducing the number of disk accesses needed to find a key. B-trees are the standard data structure for database indexes and file systems. A B-tree with a branching factor of 1000 can index one billion records with only three levels, meaning any record can be found with at most three disk reads.

Search in Specialized Domains

Text search algorithms find patterns within strings. The naive approach checks every position in the text, taking O(nm) time where n is the text length and m is the pattern length. The Knuth-Morris-Pratt (KMP) algorithm preprocesses the pattern to skip unnecessary comparisons, achieving O(n + m) time. The Boyer-Moore algorithm uses the insight that mismatches at the end of the pattern can skip larger portions of the text, making it sublinear in practice for long patterns.

Interpolation search improves on binary search when data is uniformly distributed. Instead of always checking the middle element, it estimates the target position based on the target value relative to the range of values, similar to how a person might open a dictionary closer to the beginning for words starting with "B" and closer to the end for words starting with "W." Interpolation search achieves O(log log n) average time on uniformly distributed data but degrades to O(n) on adversarial inputs.

Ternary search finds the maximum or minimum of a unimodal function by dividing the search interval into three parts. It is used in optimization problems where the objective function has a single peak or valley, such as finding the optimal angle for a projectile or the minimum cost in a convex cost function.

Search Algorithm Selection

Choosing the right search algorithm depends on the data characteristics and access patterns. For unsorted data with infrequent lookups, linear search may be acceptable. For sorted data with frequent lookups, binary search offers excellent performance with no extra memory. When the fastest possible lookup is needed and memory is available, hash tables provide O(1) average time. When both fast lookup and ordered traversal are required, balanced search trees offer O(log n) operations with ordered iteration. For persistent storage, B-trees minimize disk access and are the standard choice for databases and file systems.

Key Takeaway

The choice of search algorithm can mean the difference between 20 comparisons and one million comparisons for the same dataset. Understanding linear search, binary search, hash tables, and tree-based structures gives you the tools to make informed decisions about data organization and retrieval in any application.