disast.rs :: now :: notes

So much of our modern digital infrastructure is built off hash tables, it seems I should have an entry.

You can think of a hash table as a dynamic array, with a really strange policy for inserting items.

Hash Functions

Many hash functions follow the same pattern. You start with some initial hash value, usually a constant with certain carefully chosen mathematical properties. Then you walk the data to be hashed. For each byte, you mix the bits into the hash value somehow, and then scramble the resulting bits around some.

FNV-1a

Techniques

See: double hashing, cuckoo hashing, Robin Hood hashing, linear probing

Glossary

  • Collision

    :

    Occurs when the calculated index (i.e. hash) of two elements in a hashtable are the same.

  • Load Factor

    :

    The ratio between the size of a hash table, and the number of elements it contains

  • Separate Chaining

    :

    A strategy for resolving collisions between elements wherein each bucket of a hash table points to a linked-list of elements, rather than a single position.

    It's simple to implement, but tends to be memory inefficient.

  • Open Addressing

    Also known as "closed hashing"

    A hashing technique wherein each element is stored directly in the table

    Upon collision, the table probes for the next available bucket.

    The simplest form of probing is linear probing, in which the table walks its internal array looking for the first open bucket.

  • Clustering

    The presence of multiple entries in one section of the table

    High degrees of clustering in a table suggest a different hashing algorithm or probing algorithm may be appropriate.