Bloom filters are an interesting probabilistic data structure that have can be used to see if an item has never been added previously
Bloom filters work by running an item through a quick hashing function and sampling bits from that hash and setting them from a 0 to 1 at particular interval in a bitfield. To check for existence in a Bloom filter, the same bits are sampled. Many item may have bits that overlap, but since a hashing function produce unique identifiers, if a single bit from the hash is still a 0, then we know it has not been previously added.
A good use case for a Bloom filter is to check for an already used username. On a small scale, this is no problem, but as a service grows, this can be very taxing on a database. It is very simple to implement this with a ReBloom.
First, let’s add a handful of usernames as a test:
BF.ADD
> usernames funnyfred
(integer) 1
> BF.ADD usernames fredisfunny
(integer) 1
> BF.ADD usernames fred
(integer) 1
> BF.ADD usernames funfred
(integer) 1
Now, let’s run some test versus the Bloom filter.
BF.EXISTS
> BF.EXISTS usernames fred
(integer) 1
> BF.EXISTS usernames fred_is_funny
(integer) 0
As expected, fred_is_funny yields a 0. A response of zero means we can be sure that this username has not been used. A response of 1 means it might have been used. We can’t say for certain as it might a case of overlapping bits between multiple items.