Bloom is a REST API caching middleware, acting as a reverse proxy between your load balancers and your REST API workers.
It is completely agnostic of your API implementation, and requires minimal changes to your existing API code to work.
Bloom relies on `redis`, configured as a cache, to store cached data. It is built in Rust and focuses on performance and low resource usage.
Important: Bloom works great if your API implements REST conventions. Your API needs to use HTTP read methods, namely `GET`, `HEAD` and `OPTIONS`, solely as read methods (do not use HTTP GET parameters as a way to update data).
:newspaper: The Bloom project was initially announced in a post on my personal journal.
Used by: Crisp
👋 You use Bloom and you want to be listed there? Contact me.
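* A single Bloom instance can serve several API sub-systems, selected with the HTTP header `Bloom-Request-Shard` (eg. Main API uses shard `0`, Search API uses shard `1`).
* Cache is stored in buckets, specified in your REST API responses using the HTTP header `Bloom-Response-Buckets`.
* Cache is isolated per authentication token, using the standard `Authorization` HTTP header.
* Per-request caching options are set using `Bloom-Request-*` HTTP headers in the requests your Load Balancers forward to Bloom.
  * Select the caching shard with `Bloom-Request-Shard` (default shard is `0`, maximum value is `15`).
* Per-response caching options are set using `Bloom-Response-*` HTTP headers in your API responses to Bloom.
  * Disable caching for a response with `Bloom-Response-Ignore` (with value `1`).
  * Specify caching buckets with `Bloom-Response-Buckets` (comma-separated if multiple buckets).
  * Specify a caching TTL with `Bloom-Response-TTL` (other than default TTL, number in seconds).
* Serve `304 Not Modified` to non-modified route contents, lowering bandwidth usage and speeding up requests to your users.

Bloom can be hot-plugged to sit between your existing Load Balancers (eg. NGINX), and your API workers (eg. NodeJS). It was initially built to reduce workload and drastically cut CPU usage during API traffic spikes or DoS / DDoS attacks.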
A simpler caching approach could have been to enable caching at the Load Balancer level for HTTP read methods (`GET`, `HEAD`, `OPTIONS`). Although simple as a solution, it would not work with a REST API. REST APIs serve dynamic content by nature, and that content relies heavily on `Authorization` headers. Also, any cache needs to be purged at some point, when the cached content becomes stale due to data updates in some database.
NGINX Lua scripts could do that job just fine, you say! Well, I firmly believe Load Balancers should be simple, and be based on configuration only, without scripting. As Load Balancers are the entry point to all your HTTP / WebSocket services, you'd want to avoid frequent deployments and custom code there, and hand off that caching complexity to a dedicated middleware component.
Bloom is installed on the same box as each of your API workers. As seen from your Load Balancers, there is a Bloom instance per API worker. This way, your Load Balancing setup (eg. Round-Robin with health checks) is not broken. Each Bloom instance can be set to be visible from its own LAN IP your Load Balancers can point to, and then those Bloom instances can point to your API worker listeners on the local loopback.
Bloom acts as a Reverse Proxy of its own, and caches read HTTP methods (`GET`, `HEAD`, `OPTIONS`), while directly proxying HTTP write methods (`POST`, `PATCH`, `PUT` and others). All Bloom instances share the same cache storage on a common `redis` instance available on the LAN.
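The read / write split described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not Bloom's actual code; the `routing_decision` helper and its return values are hypothetical names.

```python
# Read methods that Bloom caches, per the paragraph above.
CACHED_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def routing_decision(method: str) -> str:
    """Return "cache" for read methods and "proxy" for everything else.

    Hypothetical helper that mirrors the documented rule.
    """
    return "cache" if method.upper() in CACHED_METHODS else "proxy"


if __name__ == "__main__":
    for verb in ("GET", "POST", "OPTIONS", "DELETE"):
        print(verb, "->", routing_decision(verb))
```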
Bloom is built in Rust for memory safety, code elegance and especially performance. Bloom can be compiled to native code for your server architecture.
Bloom has minimal static configuration, and relies on HTTP response headers served by your API workers to configure caching on a per-response basis. Those HTTP headers, formatted as `Bloom-Response-*`, are intercepted by Bloom and are not forwarded to your Load Balancers. When serving a response to your Load Balancers, Bloom sets a cache status header, `Bloom-Status`, which is publicly visible in HTTP responses with value `HIT`, `MISS` or `DIRECT` (it helps debug your cache configuration).
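As a rough illustration of the per-response configuration, here is a minimal Python sketch of how an API worker might assemble its `Bloom-Response-*` headers before replying. The header names come from this document; the `bloom_response_headers` helper, its parameters and the bucket names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Optional


def bloom_response_headers(
    buckets: Iterable[str] = (),
    ttl_seconds: Optional[int] = None,
    ignore: bool = False,
) -> Dict[str, str]:
    """Build the Bloom-Response-* headers an API worker could attach."""
    headers: Dict[str, str] = {}
    if ignore:
        # Value 1 asks Bloom to leave this response out of the cache.
        headers["Bloom-Response-Ignore"] = "1"
        return headers
    bucket_list = [bucket for bucket in buckets if bucket]
    if bucket_list:
        # Multiple buckets are comma-separated.
        headers["Bloom-Response-Buckets"] = ",".join(bucket_list)
    if ttl_seconds is not None:
        if ttl_seconds <= 0:
            raise ValueError("TTL must be a positive number of seconds")
        headers["Bloom-Response-TTL"] = str(ttl_seconds)
    return headers
```

For example, `bloom_response_headers(["user_id:10", "articles"], ttl_seconds=60)` would tag a response with two buckets and a one-minute TTL, while calling it with no arguments leaves Bloom on its default TTL.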
Bloom is built in Rust. To install it, either download a version from the Bloom releases page, use `cargo install` or pull the source code from `master`.
Install from sources:
If you pulled the source code from Git, you can build it using `cargo`:

```bash
cargo build --release
```

You can find the built binaries in the `./target/release` directory.
Install from Cargo:
You can install Bloom directly with `cargo install`:

```bash
cargo install bloom-server
```

Ensure that your `$PATH` is properly configured to source the Crates binaries, and then run Bloom using the `bloom` command.
Install from packages:
Debian & Ubuntu packages are also available. Refer to the How to install it on Debian & Ubuntu? section.
Use the sample config.cfg configuration file and adjust it to your own environment.
Make sure to properly configure the `[proxy]` section so that Bloom points to your API worker host and port.
Available configuration options are commented below, with allowed values:
`[server]`

* `log_level` (type: string, allowed: `debug`, `info`, `warn`, `error`, default: `warn`): Verbosity of logging, set it to `error` in production
* `inet` (type: string, allowed: IPv4 / IPv6 + port, default: `[::1]:8080`): Host and TCP port the Bloom proxy should listen on

`[control]`

* `inet` (type: string, allowed: IPv4 / IPv6 + port, default: `[::1]:8811`): Host and TCP port Bloom Control should listen on
* `tcp_timeout` (type: integer, allowed: seconds, default: `300`): Timeout of idle/dead client connections to Bloom Control

`[proxy]`

`[[proxy.shard]]`

* `shard` (type: integer, allowed: `0` to `15`, default: `0`): Shard index (routed using `Bloom-Request-Shard` in requests to Bloom)
* `inet` (type: string, allowed: IPv4 / IPv6 + port, default: `127.0.0.1:3000`): Target host and TCP port to proxy to for this shard (ie. where the API listens)

`[cache]`

* `ttl_default` (type: integer, allowed: seconds, default: `600`): Default cache TTL in seconds, when no `Bloom-Response-TTL` is provided
* `executor_pool` (type: integer, allowed: `0` to `(2^16)-1`, default: `16`): Cache executor pool size (how many cache requests can execute at the same time)
* `disable_read` (type: boolean, allowed: `true`, `false`, default: `false`): Whether to disable cache reads (useful for testing)
* `disable_write` (type: boolean, allowed: `true`, `false`, default: `false`): Whether to disable cache writes (useful for testing)

`[redis]`

* `inet` (type: string, allowed: IPv4 / IPv6 + port, default: `127.0.0.1:6379`): Target Redis host and TCP port
* `password` (type: string, allowed: password values, default: none): Redis password (if there is no password, do not set this key)
* `database` (type: integer, allowed: `0` to `255`, default: `0`): Target Redis database
* `pool_size` (type: integer, allowed: `0` to `(2^32)-1`, default: `80`): Redis connection pool size (should be a bit higher than `cache.executor_pool`, as it is used by both Bloom proxy and Bloom Control)
* `idle_timeout_seconds` (type: integer, allowed: seconds, default: `600`): Timeout of idle/dead pool connections to Redis
* `connection_timeout_seconds` (type: integer, allowed: seconds, default: `1`): Timeout in seconds to consider Redis dead and emit a `DIRECT` connection to the API without using cache (keep this low, as when Redis is down it dictates how long to wait before ignoring the Redis response and proxying directly)
* `max_key_size` (type: integer, allowed: bytes, default: `256000`): Maximum data size in bytes to store in Redis for a key (safeguard to prevent very large responses from being cached)
* `max_key_expiration` (type: integer, allowed: seconds, default: `2592000`): Maximum TTL for a key cached in Redis (prevents erroneous `Bloom-Response-TTL` values)

Bloom can be run as such:
```bash
./bloom -c /path/to/config.cfg
```
Important: make sure to spin up a Bloom instance for each API worker running on your infrastructure. Bloom does not manage the Load Balancing logic itself, so you should have a Bloom instance per API worker instance and still rely on eg. NGINX for Load Balancing.
Once Bloom is running and points to your API, you can configure your Load Balancers to point to Bloom IP and port (instead of your API IP and port as previously).
Bloom requires the `Bloom-Request-Shard` HTTP header to be set by your Load Balancer upon proxying a client request to Bloom. This header tells Bloom which cache shard to use for storing data (this way, you can have a single Bloom instance for different API sub-systems listening on the same server).
On NGINX, you may add the following rule to your existing proxy ruleset:
```
proxy_pass http://(...)
proxy_set_header Bloom-Request-Shard 0;
```
Notice: a shard number is an integer from 0 to 15 (8-bit unsigned number, capped to 16 shards).
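If requests to Bloom are issued from your own code (for instance in a test harness) rather than from NGINX, the same header can be built with the range check applied up front. A minimal Python sketch; the `shard_request_header` helper is a hypothetical name, only the header name and the 0 to 15 range come from this document.

```python
from typing import Dict

MAX_SHARD = 15  # shards are numbered from 0 to 15


def shard_request_header(shard: int = 0) -> Dict[str, str]:
    """Return the Bloom-Request-Shard header for a valid shard index."""
    if isinstance(shard, bool) or not isinstance(shard, int):
        raise TypeError("shard must be an integer")
    if not 0 <= shard <= MAX_SHARD:
        raise ValueError(f"shard must be between 0 and {MAX_SHARD}")
    return {"Bloom-Request-Shard": str(shard)}
```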
Bloom provides pre-built packages for Debian-based systems (Debian, Ubuntu, etc.).
Important: Bloom only provides Debian 8 32 bits packages for now (Debian Jessie). You should still be able to use them on other Debian versions, as well as Ubuntu.
1️⃣ Add the Bloom APT repository:

```bash
echo "deb https://packagecloud.io/valeriansaliou/bloom/debian/ jessie main" > /etc/apt/sources.list.d/valeriansaliou_bloom.list
curl -L https://packagecloud.io/valeriansaliou/bloom/gpgkey 2> /dev/null | apt-key add - &>/dev/null
apt-get update
```

2️⃣ Install the Bloom package:

```bash
apt-get install bloom
```

3️⃣ Edit the pre-filled Bloom configuration file:

```bash
nano /etc/bloom.cfg
```

4️⃣ Restart Bloom:

```
service bloom restart
```
Bloom is built in Rust, which can be compiled to native code for your architecture. Rust, unlike eg. Golang, doesn't carry a GC (Garbage Collector). A GC is usually a bad thing for high-throughput / high-load production systems, as it halts all program instruction execution for an amount of time that depends on how many references are kept in memory.
Note that some compromises have been made relative to how Bloom manages memory. Heap-allocated objects are heavily used for the sake of simplicity. For instance, responses from your API workers are fully buffered in memory before they are served to the client. This has the benefit of draining data from your API workers as fast as your loopback / LAN goes, even if the requesting client has very low bandwidth.
Authenticated routes are usually used by REST APIs to return data that's private to the requesting user. Bloom being a cache system, it is critical that no cache leak from an authenticated route occurs. Bloom solves the issue easily by isolating cache in namespaces for requests that send an HTTP `Authorization` header. This is the default, secure behavior.
If a route is requested without an HTTP `Authorization` header (ie. the request is anonymous / public), that response will be cached by Bloom, whatever the HTTP response code.
As your HTTP `Authorization` header contains sensitive authentication data (ie. username and password), Bloom stores those values hashed in `redis` (using a cryptographic hash function). That way, a `redis` database leak on your side will not allow an attacker to recover authentication key pairs.
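The idea of isolating cache per `Authorization` value without storing the raw credential can be sketched as follows. This is an illustration only: SHA-256, the `public` namespace label and the `bloom:<shard>:<namespace>:<route>` key layout are assumptions made for the example, not the hash function or key format Bloom actually uses.

```python
import hashlib
from typing import Optional


def auth_namespace(authorization: Optional[str]) -> str:
    """Derive an opaque cache namespace from an Authorization header value."""
    if not authorization:
        # Anonymous requests share one public namespace (assumed label).
        return "public"
    # Only the digest is kept, so the raw credential never reaches the cache.
    return hashlib.sha256(authorization.encode("utf-8")).hexdigest()


def cache_key(shard: int, authorization: Optional[str], route: str) -> str:
    """Build a namespaced cache key (hypothetical layout)."""
    return f"bloom:{shard}:{auth_namespace(authorization)}:{route}"


if __name__ == "__main__":
    print(cache_key(0, "Basic dXNlcjpwYXNz", "/users/me"))
    print(cache_key(0, None, "/status"))
```

Two users requesting the same route end up with different keys, and a dump of the key space does not reveal the header values they were derived from.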
Cache can also be expired programmatically. As your existing API workers perform the database updates on their end, they are already well aware of when data that might be cached by Bloom gets stale. Therefore, Bloom provides an efficient way to tell it to expire cache for a given bucket. This system is called Bloom Control.
Bloom can be configured to listen on a TCP socket to expose a cache control interface. The default TCP port is 8811. Bloom implements a basic Command-ACK protocol.
This way, your API worker (or any other worker in your infrastructure) can tell Bloom to either:

* Expire cache for a given bucket. As a bucket may hold cache for different HTTP `Authorization` headers, bucket cache for all authentication tokens is purged at the same time when you purge cache for a bucket.
* Expire cache for a given HTTP `Authorization` header. Useful if a user logs out and revokes their authentication token.

➡️ Available commands:

* `FLUSHB <namespace>`: flush cache for given bucket namespace
* `FLUSHA <authorization>`: flush cache for given authorization
* `SHARD <shard>`: select shard to use for connection
* `PING`: ping server
* `QUIT`: stop connection

⬇️ Control flow example:
```bash
telnet bloom.local 8811
Trying ::1...
Connected to bloom.local.
Escape character is '^]'.
CONNECTED <bloom v1.0.0>
HASHREQ hxHw4AXWSS
HASHRES 753a5309
STARTED
SHARD 1
OK
FLUSHB 2eb6c00c
OK
FLUSHA b44c6f8e
OK
PING
PONG
QUIT
ENDED quit
Connection closed by foreign host.
```
Notice: before any command can be issued, Bloom requires the client to validate its hasher function against the Bloom internal hasher (done with the `HASHREQ` and `HASHRES` exchange). FarmHash is used to hash keys, using `FarmHash.fingerprint32()`, whose computed results may vary between architectures. Validating the hasher upfront prevents most weird Bloom Control issues in advance.
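The control flow shown in the telnet transcript can be driven from code along these lines. This is a minimal Python sketch under several assumptions: lines are terminated with CRLF, the hash is sent as 8 lowercase hex digits (as the transcript suggests), and the 32-bit hasher is supplied by the caller, since a FarmHash `fingerprint32` implementation is not part of the Python standard library. `run_control_session` is a hypothetical name, and the demo talks to a scripted in-memory server that replays the transcript, not to a real Bloom instance.

```python
import io
from typing import Callable, Iterable, List, TextIO


def run_control_session(
    reader: TextIO,
    writer: TextIO,
    hasher: Callable[[str], int],
    commands: Iterable[str],
) -> List[str]:
    """Drive one Bloom Control session and return the reply to each command."""

    def read_line() -> str:
        line = reader.readline()
        if not line:
            raise ConnectionError("connection closed by server")
        return line.strip()

    def send(line: str) -> None:
        writer.write(line + "\r\n")  # assumption: CRLF line endings
        writer.flush()

    greeting = read_line()
    if not greeting.startswith("CONNECTED"):
        raise ConnectionError(f"unexpected greeting: {greeting}")

    # Hasher validation: hash the challenge and send the result back.
    verb, _, challenge = read_line().partition(" ")
    if verb != "HASHREQ":
        raise ConnectionError(f"expected HASHREQ, got: {verb}")
    send(f"HASHRES {hasher(challenge):08x}")
    if read_line() != "STARTED":
        raise ConnectionError("hasher validation was rejected")

    replies: List[str] = []
    for command in commands:
        send(command)
        replies.append(read_line())
    return replies


if __name__ == "__main__":
    # Scripted server side, replaying the replies from the transcript above.
    server = io.StringIO(
        "CONNECTED <bloom v1.0.0>\r\nHASHREQ hxHw4AXWSS\r\nSTARTED\r\n"
        "OK\r\nOK\r\nPONG\r\nENDED quit\r\n"
    )
    sent = io.StringIO()
    # Stand-in hasher returning the value seen in the transcript.
    fake_hasher = lambda challenge: 0x753A5309
    print(run_control_session(
        server, sent, fake_hasher,
        ["SHARD 1", "FLUSHB 2eb6c00c", "PING", "QUIT"],
    ))
```

Against a real Bloom Control port, the two streams could come from `socket.makefile()` on a connected socket, and `hasher` would need to be a real FarmHash `fingerprint32` binding.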
📦 Bloom Control Libraries:
👉 Cannot find the library for your programming language? Build your own and be referenced here! (contact me)