Hash Table

A hash table is used to create a list of key-value pairs. With a hash table, you can retrieve the elements in the collection by specifying a key value.

Like arrays, hash tables can be used to implement other data structures.

Hash tables are commonly used because of its fast search, insert, and delete operations. The average time complexity for these operations is constant O(1), which is very fast compared to the typical linear time complexity O(n).

Click here for a comparison between hash table, linked list, and binary search tree data structures.

If you go through the implementation of these three data structures, you'll notice that hash table is different from the rest such that it's implemented with an array.

The use of key-value pairs is what makes hash tables faster compared to a regular array. When looking for a value in an array, you'd normally start by looking at the first element and move to the next until you find a match O(n). If you want to make it faster, you can sort the array first and do a binary search.

While this improves the speed of lookup, it also makes insertion of new values very slow, because you will have to move around the existing values in order to keep the array sorted.

This problem is solved by using a hash table.

Other terms for a hash table are:

  • dictionary
  • map
  • associative array
  • symbol table

Example Use Cases

Use hash tables when you need fast access and insertion to your collection, and the order of the elements is not important, such as:

  • student records mapped to their id's
  • address tables
  • password lookups

The set data structure is also implemented using a hash table instead of a regular array, to improve the time complexity of its operations. That means use cases of set are also use cases of hash table.

A simple programming exercise that can be solved with a hash table is counting the frequency of every word, given a phrase or paragraph. The word is the key and the frequency is the value. However, this simple problem can also be solved with a regular reduce function and fewer lines.

A better example problem is the Two Sum problem on LeetCode.

Two Sum Problem TypeScript Example

The Two Sum problem can be solved with a brute-force method using a loop within a loop, but the time complexity would then be quadratic.

By using a hash table, this can be significantly reduced to linear.

function getTwoSum(arr: number[], target: number): number[] | null {
const map = new Map<number, number>();
for (let i = 0; i < arr.length; i++) {
const complement = target - arr[i];
if (map.has(complement)) {
return [map.get(complement) as number, i]
}
map.set(arr[i], i);
}
return null;
}

Hash Table Concepts

1. Hash function

Hash table works by accepting a key input and running it through a hash function. The hash function maps the input value to a hash value (which is a number). The hash number is usually the index in the array.

A hash function should consistently return the same value when given the same input, as well as map different input values to different hashes.

2. Collision

A collision happens when the hash function assigns two different key inputs to the same hash number. Hash functions should be well-written to avoid collision. Such hash functions are expensive and difficult to implement.

One way to avoid collision is by storing the key-value pairs in an array first (where the first element is the key and the second element is the value) and then storing that array at the index. The array that holds the key-value pairs that are stored in the same index is usually called a "bucket".

When you perform a lookup for a pair, you'll iterate through the bucket to find the pair that you're looking for.

Hash Table Methods

  1. has(key) - returns a boolean asserting whether a value has been associated to the key in the hash table or not.
  2. set(key, value) - sets the value for the key in the hash table. Returns the hash table object with the added key-value pair.
  3. get(key) - returns the value associated to the key, or undefined if there is none.
  4. delete(key) - removes the specified element. Returns a boolean that asserts whether it was successfully removed or not.
  5. size - returns the number of key/value pairs in the hash table.

JavaScript ES6 (ES2015) has a built-in Map object, which is more useful than a regular JavaScript object, because it contains all of the methods above, and more (map.size in ES6 is a number property and not a method). In addition, it accepts any type for the keys instead of just strings and numbers.

Simple Hash Table TypeScript Implementation

Not needed irl because JavaScript has a built-in Map object, but it's useful to see how the methods are implemented at a basic level to understand under the hood.

interface IHashTable<T> {
has(key: string): boolean;
set(key: string, value: T): this;
get(key: string): T | undefined;
delete(key: string): boolean;
}
type HashTablePair<T> = [string, T];
class HashTable<T> implements IHashTable<T> {
private table: HashTablePair<T>[][] = [];
size: number = 0;
constructor(initialItems: HashTablePair<T>[], maxSize: number) {
this.table = new Array(maxSize);
initialItems.forEach(pair => {
this.set(pair[0], pair[1]);
});
}
private getHash(key: string, tableSize: number): number {
// one way to implement a hash function is to get the character code of each character in the string
let hash = 0;
for (let i = 0; i < key.length; i++) {
hash += key.charCodeAt(i); // use modulo operator to solve the problem of the hash getting too big
}
return hash % tableSize;
}
has(key: string): boolean {
const index = this.getHash(key, this.table.length);
if (!!this.table[index]) {
return !!this.table[index].find((pair: HashTablePair<T>) => pair[0] === key);
}
return false;
}
set(key: string, value: T): this {
if (!this.has(key)) {
this.size++;
}
const index = this.getHash(key, this.table.length);
if (this.table[index]) {
const pair = this.table[index].find(pair => pair[0] === key);
if (!!pair) {
pair[1] = value;
} else {
this.table[index].push([key, value]);
}
} else {
this.table[index] = [[key, value]];
}
return this;
}
get(key: string): T | undefined {
const index = this.getHash(key, this.table.length);
if (!!this.table[index]) {
// performing a find() method makes the worst case scenario's time complexity O(n), which happens if all items had a collision and are stored in the same bucket
const pair = this.table[index].find((pair: HashTablePair<T>) => pair[0] === key);
if (pair) {
return pair[1];
}
}
return undefined;
}
delete(key: string): boolean {
const index = this.getHash(key, this.table.length);
// performing a for loop makes the worst case scenario's time complexity O(n), which happens if all items had a collision and are stored in the same bucket
for (let i = 0; i < this.table[index].length; i++) {
if (this.table[index][i][0] === key) {
this.table[index][i] = [] as unknown as HashTablePair<T>;
this.size--;
return true;
}
}
return false;
}
}

This is a basic implementation. It only accepts string type as a key to keep things simple, because there are different ways to implement a hash function, and the correct implementation depends on the type of your keys.

You would also rarely need an object as a key. If you want to use an object as a key and hash it, it's often better to find what makes the object unique and use that as key instead.

This implementation's constructor also accepts a hash table size as a parameter to keep things simple. You can improve this by making the hash table size dynamic, and make the size parameter optional.

One way to do this is to calculate the load factor whenever you insert a new item anywhere. The load factor is the number of items divided by the hash table's length.

If the load factor is over a certain number, let's say 0.9 means the table size is 90% full, then resize the table. Keep in mind that the hash function takes in the table's size as a parameter, which means the hash will no longer be consistent after resizing.

You can simply solve this by re-hashing every item after every resize of the table. That means resizing can get expensive.

And the best improvement for this implementation in terms of optimization is using a linked list for the bucket arrays instead of a regular array. Then it will truly make the operations' time complexity constant O(1).

When I have the time, I'll include these in the implementation as well. But if you're feeling like a tinkerer, feel free to play around and implement it yourself ;)

TODO implement dynamic size and linked list buckets