I first learned about Bloom filters two or three years ago while studying common interview topics, specifically as a solution for cache penetration. Of course, I only knew it as a concept back then and never implemented it in any project. Recently, after reading the Bigtable paper and discovering that it also uses Bloom filters, I realized their practical utility.
When it came to technology selection, I considered Guava, Redis' RedisBloom module, and Redisson. Guava offers a standalone Bloom filter with high performance, but it's not suitable for cluster environments. The RedisBloom module isn't built into Redis and requires separate installation, which I preferred to avoid. Ultimately, I chose the Bloom filter functionality provided by Redisson, a Redis client, and successfully applied it in my own project.
The principle behind Bloom filters is simple. One might assume that Redisson implements Bloom filters using Lua scripts to ensure atomicity. However, I wasn't sure exactly how Redisson did it, so I decided to investigate. This blog post leverages an LLM to read the source code of the RedissonBloomFilter class and analyze its implementation.
Source Code Analysis
Redisson's Bloom filter implementation is based on Redis bitmaps and Lua scripts to guarantee atomicity, combined with the Highway 128-bit hash algorithm. Below is a detailed breakdown of the key fields and the logic of three core methods (tryInitAsync, addAsync, containsAsync) in Java code, along with their corresponding Lua script implementations.
Key Fields
Field
Type
Description
size
long
The length of the bit array (number of bits), calculated from the expected number of insertions and the false positive probability.
hashIterations
int
The number of hash functions mapped per element (i.e., the number of bits set in the bit array).
commandExecutor
CommandAsyncExecutor
Redisson's command executor, responsible for sending Redis commands.
configName
String
The Redis key storing the Bloom filter's metadata, actually a hash containing fields like size, hashIterations, expectedInsertions, and falseProbability.
The actual element bit data is stored in a Redis string structure (key is getRawName()), using setbit / getbit operations on bits.
Configuration information is stored in the configName hash to avoid recalculating hash iterations and bit array size for each operation.
public RFuture<Boolean> tryInitAsync(long expectedInsertions, double falseProbability) { // Parameter validation if (falseProbability > 1) { throw ...; } if (falseProbability < 0) { throw ...; }
// Calculate the optimal bit array size and number of hash functions // based on expected insertions and false positive probability size = optimalNumOfBits(expectedInsertions, falseProbability); if (size == 0) { throw ...; } if (size > getMaxSize()) { throw ...; } hashIterations = optimalNumOfHashFunctions(expectedInsertions, size);
// Execute Lua script to attempt initializing the configuration return commandExecutor.evalWriteAsync(configName, StringCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN, "if redis.call('exists', KEYS[1]) == 1 then " + "return 0;" + "end; " +
First, it calculates the optimal size and hashIterations (using the same algorithm as Guava's BloomFilter).
Then, via a Lua script, it checks in Redis whether configName exists. If it doesn't, it writes the configuration and returns 1; if it already exists, it returns 0.
The script ensures atomicity for initialization and the "create only if not exists" semantics.
Lua Script
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
-- KEYS[1] : configName, the hash key storing configuration -- ARGV[1] : size (bit array size) -- ARGV[2] : hashIterations (number of hash functions) -- ARGV[3] : expectedInsertions (expected number of insertions) -- ARGV[4] : falseProbability (expected false positive probability)
-- Check if configuration already exists if redis.call('exists', KEYS[1]) == 1then return0-- Already exists, initialization failed end
public RFuture<Long> addAsync(Collection<T> objects) { // If size is 0 (local config not loaded yet), asynchronously read config first CompletionStage<Void> future = CompletableFuture.completedFuture(null); if (size == 0) { future = readConfigAsync(); }
return future.thenCompose(r -> { // Calculate hash indexes for all elements List<Long> allIndexes = index(objects);
// Construct Lua parameters: size, hashIterations, index count per element, all indexes... List<Object> params = newArrayList<>(); params.add(size); params.add(hashIterations); ints= allIndexes.size() / objects.size(); // Number of indexes per element params.add(s); params.addAll(allIndexes);
"local k = 0;" + "local c = 0;" + "for i = 4, #ARGV, 1 do " + "local r = redis.call('setbit', KEYS[2], ARGV[i], 1); " + "if r == 0 then " + "k = k + 1;" + "end; " + "if ((i - 4) + 1) % ARGV[3] == 0 then " + "if k > 0 then " + "c = c + 1;" + "end; " + "k = 0; " + "end; " + "end; " + "return c;", Arrays.asList(configName, getRawName()), params.toArray()); }); }
The index(objects) method calls hash for each element, generating a 128-bit hash value (two longs), and then uses hash(long hash1, long hash2, int iterations, long size) to produce hashIterations index positions.
In the Lua script, KEYS[1] is configName, and KEYS[2] is the key storing the bit data.
ARGV[1] = size, ARGV[2] = hashIterations, ARGV[3] = number of indexes per element, followed sequentially by all indexes.
Script return value: The actual number of newly added elements (i.e., elements for which at least one bit was changed from 0 to 1).
-- KEYS[1] : configName (configuration hash key) -- KEYS[2] : Key for the bit data (bitmap) -- ARGV[1] : size (expected bit array size) -- ARGV[2] : hashIterations (expected number of hash functions) -- ARGV[3] : Number of indexes per element (equals hashIterations) -- ARGV[4] ... ARGV[N] : Index values for all elements, arranged consecutively per element
-- Read actual size and hashIterations from config hash, compare with passed values local size = redis.call('hget', KEYS[1], 'size') local hashIterations = redis.call('hget', KEYS[1], 'hashIterations') assert(size == ARGV[1] and hashIterations == ARGV[2], 'Bloom filter config has been changed')
local k = 0-- Number of newly set bits for the current element (bits that were 0) local c = 0-- Actual number of newly added elements -- Iterate through all indexes (starting from ARGV[4]) for i = 4, #ARGV, 1do -- Set the bit at the index to 1, return the old value (0 or 1) local r = redis.call('setbit', KEYS[2], ARGV[i], 1) if r == 0then k = k + 1-- This bit was 0, so it contributes a new bit end -- After processing all indexes for one element (specified by ARGV[3]) if ((i - 4) + 1) % ARGV[3] == 0then if k > 0then c = c + 1-- This element set at least one new bit, increment count end k = 0-- Reset count for the current element end end
return c -- Return the number of actually newly added elements
The script first checks configuration consistency to prevent errors due to concurrent modifications.
While iterating through indexes, every ARGV[3] indexes correspond to one element. k accumulates the number of bits that were 0 for the current element. If k>0 at the end, it means the element is newly inserted (didn't exist before), and c is incremented.
The return value can be used to count the number of truly new elements in a batch add operation.
public RFuture<Long> containsAsync(Collection<T> objects) { CompletionStage<Long> f = CompletableFuture.completedFuture(null); if (size == 0) { f = readConfigAsync().handle((r, e) -> { if (e instanceof IllegalArgumentException) { return0L; // Not initialized, consider all elements non-existent } returnnull; }); }
List<Object> params = newArrayList<>(); params.add(size); params.add(hashIterations); params.add(objects.size()); // Number of elements params.addAll(allIndexes);
"local k = 0;" + "local c = 0;" + "local cc = (#ARGV - 3) / ARGV[3];" + "for i = 4, #ARGV, 1 do " + "local r = redis.call('getbit', KEYS[2], ARGV[i]); " + "if r == 0 then " + "k = k + 1;" + "end; " + "if ((i - 4) + 1) % cc == 0 then " + "if k > 0 then " + "c = c + 1;" + "end; " + "k = 0; " + "end; " + "end; " + "return ARGV[3] - c;", Arrays.asList(configName, getRawName()), params.toArray()); }); returnnewCompletableWrapper<>(f); }
Similarly, it first ensures the configuration is loaded. If uninitialized (readConfigAsync throws an exception), it returns 0 (all elements are considered non-existent).
Calculates all indexes and passes them to the Lua script.
The script returns: the number of elements that are found to exist (i.e., elements for which all corresponding bits are 1).
-- KEYS[1] : configName (configuration hash key) -- KEYS[2] : Bit data key -- ARGV[1] : size -- ARGV[2] : hashIterations -- ARGV[3] : Number of elements to check -- ARGV[4] ... ARGV[N] : All index values, arranged consecutively per element
-- Read actual values from config; if they don't match passed values, return 0 -- (consider all elements non-existent to prevent false positives) local size = redis.call('hget', KEYS[1], 'size') local hashIterations = redis.call('hget', KEYS[1], 'hashIterations') if size ~= ARGV[1] or hashIterations ~= ARGV[2] then return0 end
local k = 0-- Number of 0 bits for the current element local c = 0-- Count of non-existent elements (elements with at least one 0 bit) -- Calculate the number of indexes per element (total indexes divided by element count) local cc = (#ARGV - 3) / ARGV[3] -- Should equal hashIterations
for i = 4, #ARGV, 1do local r = redis.call('getbit', KEYS[2], ARGV[i]) if r == 0then k = k + 1-- Record that this bit is 0 end -- After processing all indexes for one element if ((i - 4) + 1) % cc == 0then if k > 0then c = c + 1-- This element has at least one 0 bit, so it definitely does not exist end k = 0-- Reset end end
-- Total elements minus the count of definitely non-existent elements gives -- the number of possibly existing elements return ARGV[3] - c
The script first checks if the configuration matches, avoiding errors if the filter was rebuilt.
It then iterates through the indexes, using getbit to check each bit. If any index bit for an element is 0, that element definitely does not exist, and counter c is incremented.
Finally, it returns ARGV[3] - c, the number of possibly existing elements (note: Bloom filters have false positives, so this is the count of elements that might exist).
Unlike the add script, this one uses a simple return 0 on config mismatch instead of assert, allowing graceful degradation (treating all elements as non-existent to avoid false reports) when the configuration changes.
Summary
Redisson's Bloom filter implementation cleverly places computational logic (hashing, index generation) in the Java client while encapsulating atomic operations (batch bit checking/setting) within Lua scripts. This approach ensures both performance and avoids concurrency issues. By storing metadata independently in configName, it achieves separation of configuration and data and allows for runtime consistency verification. The comments within the Lua scripts above detail the underlying bit manipulation principles for Bloom filter addition and querying.