Chat with LLM: Source Code Analysis of Redisson Bloom Filter

Introduction

I first learned about Bloom filters two or three years ago while studying common interview topics, specifically as a solution for cache penetration. Of course, I only knew it as a concept back then and never implemented it in any project. Recently, after reading the Bigtable paper and discovering that it also uses Bloom filters, I realized their practical utility.

When it came to technology selection, I considered Guava, Redis' RedisBloom module, and Redisson. Guava offers a standalone Bloom filter with high performance, but it's not suitable for cluster environments. The RedisBloom module isn't built into Redis and requires separate installation, which I preferred to avoid. Ultimately, I chose the Bloom filter functionality provided by Redisson, a Redis client, and successfully applied it in my own project.

The principle behind Bloom filters is simple. One might assume that Redisson implements Bloom filters using Lua scripts to ensure atomicity. However, I wasn't sure exactly how Redisson did it, so I decided to investigate. This blog post leverages an LLM to read the source code of the RedissonBloomFilter class and analyze its implementation.

Source Code Analysis

Redisson's Bloom filter implementation is based on Redis bitmaps and Lua scripts to guarantee atomicity, combined with the Highway 128-bit hash algorithm. Below is a detailed breakdown of the key fields and the logic of three core methods (tryInitAsync, addAsync, containsAsync) in Java code, along with their corresponding Lua script implementations.


Key Fields

Field Type Description
size long The length of the bit array (number of bits), calculated from the expected number of insertions and the false positive probability.
hashIterations int The number of hash functions mapped per element (i.e., the number of bits set in the bit array).
commandExecutor CommandAsyncExecutor Redisson's command executor, responsible for sending Redis commands.
configName String The Redis key storing the Bloom filter's metadata, actually a hash containing fields like size, hashIterations, expectedInsertions, and falseProbability.
  • The actual element bit data is stored in a Redis string structure (key is getRawName()), using setbit / getbit operations on bits.
  • Configuration information is stored in the configName hash to avoid recalculating hash iterations and bit array size for each operation.

Initializing the Bloom Filter

Java Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
public RFuture<Boolean> tryInitAsync(long expectedInsertions, double falseProbability) {
// Parameter validation
if (falseProbability > 1) { throw ...; }
if (falseProbability < 0) { throw ...; }

// Calculate the optimal bit array size and number of hash functions
// based on expected insertions and false positive probability
size = optimalNumOfBits(expectedInsertions, falseProbability);
if (size == 0) { throw ...; }
if (size > getMaxSize()) { throw ...; }
hashIterations = optimalNumOfHashFunctions(expectedInsertions, size);

// Execute Lua script to attempt initializing the configuration
return commandExecutor.evalWriteAsync(configName, StringCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
"if redis.call('exists', KEYS[1]) == 1 then " +
"return 0;" +
"end; " +

"redis.call('hset', KEYS[1], 'size', ARGV[1]);" +
"redis.call('hset', KEYS[1], 'hashIterations', ARGV[2]);" +
"redis.call('hset', KEYS[1], 'expectedInsertions', ARGV[3]);" +
"redis.call('hset', KEYS[1], 'falseProbability', ARGV[4]);" +
"return 1;",
Arrays.asList(configName),
size, hashIterations, expectedInsertions, falseProbability);
}
  • First, it calculates the optimal size and hashIterations (using the same algorithm as Guava's BloomFilter).
  • Then, via a Lua script, it checks in Redis whether configName exists. If it doesn't, it writes the configuration and returns 1; if it already exists, it returns 0.
  • The script ensures atomicity for initialization and the "create only if not exists" semantics.

Lua Script

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- KEYS[1] : configName, the hash key storing configuration
-- ARGV[1] : size (bit array size)
-- ARGV[2] : hashIterations (number of hash functions)
-- ARGV[3] : expectedInsertions (expected number of insertions)
-- ARGV[4] : falseProbability (expected false positive probability)

-- Check if configuration already exists
if redis.call('exists', KEYS[1]) == 1 then
return 0 -- Already exists, initialization failed
end

-- Write all configuration fields
redis.call('hset', KEYS[1], 'size', ARGV[1])
redis.call('hset', KEYS[1], 'hashIterations', ARGV[2])
redis.call('hset', KEYS[1], 'expectedInsertions', ARGV[3])
redis.call('hset', KEYS[1], 'falseProbability', ARGV[4])

return 1 -- Initialization successful
  • This script executes only when the filter is first created, ensuring the configuration is not overwritten repeatedly.

Adding Elements

Java Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
public RFuture<Long> addAsync(Collection<T> objects) {
// If size is 0 (local config not loaded yet), asynchronously read config first
CompletionStage<Void> future = CompletableFuture.completedFuture(null);
if (size == 0) {
future = readConfigAsync();
}

return future.thenCompose(r -> {
// Calculate hash indexes for all elements
List<Long> allIndexes = index(objects);

// Construct Lua parameters: size, hashIterations, index count per element, all indexes...
List<Object> params = new ArrayList<>();
params.add(size);
params.add(hashIterations);
int s = allIndexes.size() / objects.size(); // Number of indexes per element
params.add(s);
params.addAll(allIndexes);

// Execute Lua script
return commandExecutor.evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_LONG,
"local size = redis.call('hget', KEYS[1], 'size');" +
"local hashIterations = redis.call('hget', KEYS[1], 'hashIterations');" +
"assert(size == ARGV[1] and hashIterations == ARGV[2], 'Bloom filter config has been changed')" +

"local k = 0;" +
"local c = 0;" +
"for i = 4, #ARGV, 1 do " +
"local r = redis.call('setbit', KEYS[2], ARGV[i], 1); " +
"if r == 0 then " +
"k = k + 1;" +
"end; " +
"if ((i - 4) + 1) % ARGV[3] == 0 then " +
"if k > 0 then " +
"c = c + 1;" +
"end; " +
"k = 0; " +
"end; " +
"end; " +
"return c;",
Arrays.asList(configName, getRawName()),
params.toArray());
});
}
  • The index(objects) method calls hash for each element, generating a 128-bit hash value (two longs), and then uses hash(long hash1, long hash2, int iterations, long size) to produce hashIterations index positions.
  • In the Lua script, KEYS[1] is configName, and KEYS[2] is the key storing the bit data.
  • ARGV[1] = size, ARGV[2] = hashIterations, ARGV[3] = number of indexes per element, followed sequentially by all indexes.
  • Script return value: The actual number of newly added elements (i.e., elements for which at least one bit was changed from 0 to 1).

Lua Script

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- KEYS[1] : configName (configuration hash key)
-- KEYS[2] : Key for the bit data (bitmap)
-- ARGV[1] : size (expected bit array size)
-- ARGV[2] : hashIterations (expected number of hash functions)
-- ARGV[3] : Number of indexes per element (equals hashIterations)
-- ARGV[4] ... ARGV[N] : Index values for all elements, arranged consecutively per element

-- Read actual size and hashIterations from config hash, compare with passed values
local size = redis.call('hget', KEYS[1], 'size')
local hashIterations = redis.call('hget', KEYS[1], 'hashIterations')
assert(size == ARGV[1] and hashIterations == ARGV[2], 'Bloom filter config has been changed')

local k = 0 -- Number of newly set bits for the current element (bits that were 0)
local c = 0 -- Actual number of newly added elements
-- Iterate through all indexes (starting from ARGV[4])
for i = 4, #ARGV, 1 do
-- Set the bit at the index to 1, return the old value (0 or 1)
local r = redis.call('setbit', KEYS[2], ARGV[i], 1)
if r == 0 then
k = k + 1 -- This bit was 0, so it contributes a new bit
end
-- After processing all indexes for one element (specified by ARGV[3])
if ((i - 4) + 1) % ARGV[3] == 0 then
if k > 0 then
c = c + 1 -- This element set at least one new bit, increment count
end
k = 0 -- Reset count for the current element
end
end

return c -- Return the number of actually newly added elements
  • The script first checks configuration consistency to prevent errors due to concurrent modifications.
  • While iterating through indexes, every ARGV[3] indexes correspond to one element. k accumulates the number of bits that were 0 for the current element. If k>0 at the end, it means the element is newly inserted (didn't exist before), and c is incremented.
  • The return value can be used to count the number of truly new elements in a batch add operation.

Checking Element Existence

Java Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
public RFuture<Long> containsAsync(Collection<T> objects) {
CompletionStage<Long> f = CompletableFuture.completedFuture(null);
if (size == 0) {
f = readConfigAsync().handle((r, e) -> {
if (e instanceof IllegalArgumentException) {
return 0L; // Not initialized, consider all elements non-existent
}
return null;
});
}

f = f.thenCompose(r -> {
if (r != null) {
return CompletableFuture.completedFuture(r); // Uninitialized, return 0 directly
}
List<Long> allIndexes = index(objects);

List<Object> params = new ArrayList<>();
params.add(size);
params.add(hashIterations);
params.add(objects.size()); // Number of elements
params.addAll(allIndexes);

return commandExecutor.evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_LONG,
"local size = redis.call('hget', KEYS[1], 'size');" +
"local hashIterations = redis.call('hget', KEYS[1], 'hashIterations');" +
"if size ~= ARGV[1] or hashIterations ~= ARGV[2] then " +
"return 0;" +
"end;" +

"local k = 0;" +
"local c = 0;" +
"local cc = (#ARGV - 3) / ARGV[3];" +
"for i = 4, #ARGV, 1 do " +
"local r = redis.call('getbit', KEYS[2], ARGV[i]); " +
"if r == 0 then " +
"k = k + 1;" +
"end; " +
"if ((i - 4) + 1) % cc == 0 then " +
"if k > 0 then " +
"c = c + 1;" +
"end; " +
"k = 0; " +
"end; " +
"end; " +
"return ARGV[3] - c;",
Arrays.asList(configName, getRawName()),
params.toArray());
});
return new CompletableWrapper<>(f);
}
  • Similarly, it first ensures the configuration is loaded. If uninitialized (readConfigAsync throws an exception), it returns 0 (all elements are considered non-existent).
  • Calculates all indexes and passes them to the Lua script.
  • The script returns: the number of elements that are found to exist (i.e., elements for which all corresponding bits are 1).

Lua Script

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- KEYS[1] : configName (configuration hash key)
-- KEYS[2] : Bit data key
-- ARGV[1] : size
-- ARGV[2] : hashIterations
-- ARGV[3] : Number of elements to check
-- ARGV[4] ... ARGV[N] : All index values, arranged consecutively per element

-- Read actual values from config; if they don't match passed values, return 0
-- (consider all elements non-existent to prevent false positives)
local size = redis.call('hget', KEYS[1], 'size')
local hashIterations = redis.call('hget', KEYS[1], 'hashIterations')
if size ~= ARGV[1] or hashIterations ~= ARGV[2] then
return 0
end

local k = 0 -- Number of 0 bits for the current element
local c = 0 -- Count of non-existent elements (elements with at least one 0 bit)
-- Calculate the number of indexes per element (total indexes divided by element count)
local cc = (#ARGV - 3) / ARGV[3] -- Should equal hashIterations

for i = 4, #ARGV, 1 do
local r = redis.call('getbit', KEYS[2], ARGV[i])
if r == 0 then
k = k + 1 -- Record that this bit is 0
end
-- After processing all indexes for one element
if ((i - 4) + 1) % cc == 0 then
if k > 0 then
c = c + 1 -- This element has at least one 0 bit, so it definitely does not exist
end
k = 0 -- Reset
end
end

-- Total elements minus the count of definitely non-existent elements gives
-- the number of possibly existing elements
return ARGV[3] - c
  • The script first checks if the configuration matches, avoiding errors if the filter was rebuilt.
  • It then iterates through the indexes, using getbit to check each bit. If any index bit for an element is 0, that element definitely does not exist, and counter c is incremented.
  • Finally, it returns ARGV[3] - c, the number of possibly existing elements (note: Bloom filters have false positives, so this is the count of elements that might exist).
  • Unlike the add script, this one uses a simple return 0 on config mismatch instead of assert, allowing graceful degradation (treating all elements as non-existent to avoid false reports) when the configuration changes.

Summary

Redisson's Bloom filter implementation cleverly places computational logic (hashing, index generation) in the Java client while encapsulating atomic operations (batch bit checking/setting) within Lua scripts. This approach ensures both performance and avoids concurrency issues. By storing metadata independently in configName, it achieves separation of configuration and data and allows for runtime consistency verification. The comments within the Lua scripts above detail the underlying bit manipulation principles for Bloom filter addition and querying.

References

  • LLM