Published on

What is a Hash Index? Fast Exact-Match Lookups in Databases

What is a Hash Index?

A Hash Index stores index entries in a hash table rather than a tree structure like a B-tree. When you query a column indexed by a hash index, the database applies a hash function to the key value to compute which "bucket" or slot stores the pointer(s) to the matching rows.

Because the hash function directly maps keys to buckets, a hash index can perform exact-match lookups (= or IN) very efficiently, with an average time complexity of O(1) — meaning it can retrieve matching rows in constant time regardless of table size.

How Does a Hash Index Work?

  • The indexed value (e.g., a column value) is input to a hash function.
  • The hash function outputs a hash code that determines the bucket where the row reference is stored.
  • To find matching rows, the database hashes the query key and jumps directly to the corresponding bucket.
  • Collisions (two keys hashing to the same bucket) are handled via linked lists or overflow areas, causing slightly slower lookups but still efficient if collisions are rare.

Advantages of Hash Indexes

  • Extremely fast equality lookups: Perfect for queries like WHERE AccountNumber = 12345.
  • Stable performance on large datasets: Lookup speed stays near constant even as data grows.
  • Low maintenance cost: Insertions and deletions just involve hashing and bucket insertion/removal.
  • Ideal for key-value workloads: Useful when queries are mostly exact key lookups.

Limitations of Hash Indexes

  • No support for range queries: Queries with <, >, BETWEEN, or range scans cannot use hash indexes.
  • No ordering: Since hash indexes don't store keys in sorted order, they cannot optimize ORDER BY operations.
  • No prefix or partial matching: Queries like LIKE 'John%' cannot leverage hash indexes effectively.
  • Database support varies: Not all RDBMS support user-created hash indexes, or they have limitations (e.g., PostgreSQL only recently made hash indexes crash-safe; MySQL InnoDB doesn't allow explicit hash indexes).
  • Potential resizing cost: If the hash table fills up, it may need to be resized and rehashed, which can be expensive but infrequent.

Use Cases for Hash Indexes

Use hash indexes when:

  • Your workload is dominated by exact-match lookups on large tables.
  • You do not need to perform range or ordered queries on the indexed column.
  • You want a fast, constant-time key-value lookup pattern, such as caching or lookup tables.
  • You need to index uniformly distributed values like cryptographic hashes or UUIDs.
  • Your database supports hash indexes efficiently and your queries benefit from them.

Example: Creating and Using a Hash Index

PostgreSQL:

-- Create a hash index on the CustomerCode column
CREATE INDEX idx_customer_code_hash ON Customers USING HASH(CustomerCode);

Query:

SELECT * FROM Customers WHERE CustomerCode = 'ABC123';

This query will efficiently use the hash index to directly hash 'ABC123' and retrieve matching rows.

MySQL:

  • The MEMORY storage engine uses hash indexes by default.
  • For disk-based tables (e.g., InnoDB), you cannot explicitly create hash indexes; B-trees are used instead.
  • Use hash indexes only if your workload is heavily based on equality lookups and your DB supports them.

Summary

A Hash Index is a powerful indexing method designed for extremely fast equality searches on large datasets. It offers average constant-time lookups but does not support range queries or sorting. If your application performs mostly exact key lookups and your database supports it, hash indexes can significantly speed up queries compared to traditional B-tree indexes.