Kafka Streams is a powerful event stream processing framework that allows developers to easily build and deploy real-time streaming applications. A key underlying component of the architecture is the embedded storage engine RocksDB, which stores and manages state data. Optimizing RocksDB is often overlooked, but it may hold the key to resolving performance problems in a Kafka Streams implementation.
RocksDB is an open-source, embedded, persistent key-value store developed by Facebook that is optimized for fast storage and retrieval of data on disk. It is designed to handle large data sets and provides high throughput with low latency, working especially well in applications that require high-speed data access, including stream processing frameworks like Kafka Streams. Tuning your Kafka Streams & RocksDB configuration is essential to optimizing your application performance.
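To show where that tuning plugs in, here is a minimal sketch in Java of the Kafka Streams properties that relate to RocksDB. It uses only plain string keys: rocksdb.config.setter names a class implementing Kafka Streams' RocksDBConfigSetter interface, and state.dir is where the local state stores live. The application id, broker address, setter class name and directory are all hypothetical placeholders, not values from this article.

```java
import java.util.Properties;

class StreamsRocksDbProps {
    /** Builds Kafka Streams properties that point at a custom RocksDB config setter. */
    static Properties build(String setterClassName, String stateDir) {
        Properties props = new Properties();
        props.put("application.id", "rocksdb-tuning-demo");  // hypothetical application id
        props.put("bootstrap.servers", "localhost:9092");     // assumed local broker
        // Kafka Streams instantiates this class and lets it adjust each store's RocksDB options.
        props.put("rocksdb.config.setter", setterClassName);
        // Directory where the local RocksDB state stores are kept.
        props.put("state.dir", stateDir);
        return props;
    }

    public static void main(String[] args) {
        // Both arguments are placeholders for illustration only.
        Properties props = build("com.example.CustomRocksDBConfig", "/var/lib/kafka-streams");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real application these properties would be passed to the KafkaStreams constructor, and the class named under rocksdb.config.setter would have to exist on the classpath and implement org.apache.kafka.streams.state.RocksDBConfigSetter.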
Kafka Streams Performance Issues
If you’ve been using RocksDB in your Kafka Streams deployment, you’ve probably encountered at least one of the issues below:
- Resource Utilization: RocksDB is a persistent, disk-based storage engine, but it relies heavily on in-memory structures such as memtables and the block cache to stay fast. As a result, it can be both CPU- and memory-intensive, requiring significant amounts of RAM and CPU to perform well. This can lead to issues with resource utilization, making it difficult to manage and scale.
- Tuning: RocksDB requires careful tuning to avoid degraded performance or even, in some cases, data corruption. Many RocksDB parameters are interdependent, which often makes configuration a long and tedious task.
- Cold Starts: RocksDB can have relatively slow cold start times, which can cause delays in application startup or recovery from crashes.
- Compaction: RocksDB's compaction process can be resource-intensive and impact application performance, especially during peak loads. Serious compaction side effects could delay the application response time or even cause it to crash.
- I/O Operations: RocksDB can generate significant I/O operations, which can become a bottleneck in environments with limited I/O resources and contribute to wear on flash drives.
- Space Management: Due to the nature of LSM-tree-based key-value stores, RocksDB requires ongoing space management, which is a challenge for applications with unpredictable data growth patterns. Balancing write amplification against space amplification is an ongoing configuration challenge.
- Application Design: RocksDB's characteristics can impact the design of Kafka Streams applications, requiring careful consideration of factors such as data modeling, data partitioning, and performance tuning.
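To put a rough number on the resource utilization point in the list above, here is a back-of-the-envelope sketch in Java of how RocksDB's off-heap memory can add up inside a single Kafka Streams instance. It assumes one RocksDB instance per state store per assigned partition, each with its own block cache and memtables. The sizes used (a 50 MB block cache, 16 MB write buffers, up to 3 write buffers per instance) are illustrative assumptions rather than figures from this article, and the class and method names are hypothetical.

```java
class RocksDbMemoryEstimate {
    static final long MB = 1024L * 1024L;

    /** Estimated off-heap bytes: instances * (block cache + all memtables). */
    static long estimateBytes(int stores, int partitionsPerStore,
                              long blockCacheBytes, long writeBufferBytes,
                              int maxWriteBuffers) {
        // Assumption: one RocksDB instance per store per assigned partition.
        long instances = (long) stores * partitionsPerStore;
        long perInstance = blockCacheBytes + writeBufferBytes * maxWriteBuffers;
        return instances * perInstance;
    }

    public static void main(String[] args) {
        // 2 stores x 8 partitions, with the illustrative sizes described above.
        long bytes = estimateBytes(2, 8, 50 * MB, 16 * MB, 3);
        System.out.println("Estimated RocksDB off-heap memory: " + bytes / MB + " MB");
    }
}
```

With these assumed numbers, 16 instances at 98 MB each come to about 1.5 GB outside the JVM heap, before counting index and filter blocks that are not held in the block cache. That is why a handful of stores can quietly dominate a container's memory budget.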
Replacing RocksDB with Speedb
Okay, that’s a long list that might seem overwhelming as you plan out your Kafka Streams optimization. Fortunately, there is a RocksDB-based OSS project named Speedb, a fully compatible RocksDB drop-in replacement storage engine designed to address the most demanding challenges on that list (and beyond). Similar to other platforms like Redis on Flash that use Speedb as a RocksDB alternative, it can be easily dropped in to Kafka Streams as a sort of enhanced RocksDB implementation.
Speedb OSS rebases on RocksDB’s latest version with additional features that supercharge application performance and stability and enhance usability and resource utilization. There is also a Speedb enterprise version designed to boost performance at scale for datasets exceeding 50 GB per node with a unique compaction technology and other innovations.
Let's go through these challenges again and see how Speedb’s technology helps tackle some of the issues, including some real-world benchmarks:
- Resource Utilization: RocksDB can be CPU- and memory-intensive, leading to issues with increased resource utilization, especially in memory-constrained environments. Speedb is optimized for enhanced resource utilization, and can perform well with much lower CPU and memory requirements. Below is a benchmark comparing Speedb's improved bloom filter mechanism with RocksDB, showing a 25% reduction in memory consumption using Speedb.
Okay, I realize that reading six more in-depth explanations with statistics and charts may put you to sleep, so here’s a quick guide:
- Tuning: Speedb is designed to be easy to configure and requires less tuning for optimal performance.
- Cold Starts: Speedb has faster cold start times, helping to reduce application startup times and improve resiliency.
- Compaction: Speedb's compaction process is more powerful and efficient, reducing the impact on application performance.
- I/O Operations: Speedb is optimized for I/O efficiency, reducing the impact on application performance and enabling better scalability.
- Space Management: Speedb handles unpredictable data growth patterns more efficiently, reducing the need for ongoing space management.
- Application Design: Speedb is fully compatible with RocksDB, making it easy to use as a drop-in replacement library without impacting the application design.
In short, Speedb helps with all of these issues. If you’re skeptical and don’t believe me, you can easily swap Speedb in and out of your project, so feel free to try it yourself.
As you’ve seen, the configuration of RocksDB running behind the scenes can have a serious impact on the health of your Kafka Streams implementation. I’ve made a lot of claims about Speedb, and I welcome you to try it for yourself; you can find it here. If you have any questions make sure to engage with our Speedb Hive community to get answers and insights.
Give us a try; there’s nothing to lose besides future problems.