Kafka Streams is a powerful event stream processing framework that allows developers to easily build and deploy real-time streaming applications. A key underlying component of the architecture is the embedded storage engine RocksDB, which stores and manages state data. Optimizing RocksDB is often overlooked, but it may hold the key to resolving performance problems in a Kafka Streams implementation.

RocksDB is an open-source, embedded, persistent key-value store developed by Facebook that is optimized for fast storage and retrieval of data on disk. It is designed to handle large data sets and provides high throughput with low latency, working especially well in applications that require high-speed data access, including stream processing frameworks like Kafka Streams. Tuning your Kafka Streams and RocksDB configuration is essential to optimizing your application's performance.

Kafka Streams Performance Issues

If you’ve been using RocksDB in your Kafka Streams deployment, you’ve probably encountered at least one of the issues below:

  1. Scalability: RocksDB is an embedded library and cannot scale horizontally on its own. Each instance is bound to the resources of a single node, which can limit the throughput of a Kafka Streams application.
  2. Resource Utilization: Although RocksDB persists data on disk, it leans heavily on in-memory structures such as memtables and the block cache. This helps make it extremely fast, but it can also be both CPU and memory-intensive, requiring significant amounts of RAM and CPU to perform well. This can lead to issues with resource utilization, making it difficult to manage and scale.
  3. Tuning: RocksDB requires careful tuning to avoid degraded performance or even, in some cases, data corruption. Its parameters are often intertwined, which makes configuration a long and tedious task.
  4. Cold Starts: RocksDB can have relatively slow cold start times, which can cause delays in application startup or recovery from crashes. There are of course tactics for improvement, like properly configuring the WAL, but this requires expertise and time.
  5. Compaction: RocksDB's compaction process can be resource-intensive and impact application performance, especially during peak loads. Serious compaction side effects could delay the application response time or even cause it to crash.
  6. Backup and Restore: Backup and restore operations can be time-consuming and resource-intensive with RocksDB, especially for large data sets. Choosing between snapshots or checkpoints often depends on the application workload.
  7. I/O Operations: RocksDB can generate significant I/O, which can become a bottleneck in environments with limited I/O resources and can accelerate wear on flash drives.
  8. Space Management: Because RocksDB is built on an LSM tree, obsolete data is reclaimed only during compaction, so it requires ongoing space management, which is a challenge for applications with unpredictable data growth patterns. Balancing write performance against space amplification is an ongoing configuration challenge.
  9. Application Design: RocksDB's characteristics can impact the design of Kafka Streams applications, requiring careful consideration of factors such as data modeling, data partitioning, and performance tuning.
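
To make the resource utilization and tuning items above a little more concrete, here is a rough back-of-the-envelope sketch in Java of how RocksDB's off-heap memory can add up inside one Kafka Streams instance. It assumes that every state store partition opens its own RocksDB instance with its own block cache and its own memtables (write buffers), and it ignores index and filter blocks, which add to the total unless they are cached inside the block cache. The class name, method names, and example sizes (50 MiB block cache, 16 MiB write buffer, 3 write buffers, 4 stores, 12 partitions) are illustrative assumptions, not figures from this article, so check the defaults of the version you run.

```java
// Rough estimate only: assumes one RocksDB instance per state store partition,
// each with a private block cache and its own set of memtables.
final class StateStoreMemoryEstimate {

    private static final long MIB = 1024L * 1024L;

    /** Approximate off-heap bytes for one RocksDB instance: block cache plus all memtables. */
    static long bytesPerInstance(long blockCacheBytes, long writeBufferBytes, int maxWriteBuffers) {
        return blockCacheBytes + writeBufferBytes * maxWriteBuffers;
    }

    /** Approximate total for an application instance hosting several stores and partitions. */
    static long totalBytes(int stores, int partitionsPerStore, long bytesPerInstance) {
        return (long) stores * partitionsPerStore * bytesPerInstance;
    }

    public static void main(String[] args) {
        // Hypothetical sizes; replace them with the values from your own configuration.
        long perInstance = bytesPerInstance(50 * MIB, 16 * MIB, 3);
        long total = totalBytes(4, 12, perInstance);

        System.out.println("Per RocksDB instance: " + perInstance / MIB + " MiB");
        System.out.println("Total for 4 stores x 12 partitions: " + total / MIB + " MiB");
    }
}
```

With these example numbers, each instance needs about 98 MiB and the application instance as a whole about 4.6 GiB, which is why a change to a single parameter such as the write buffer size can have an outsized effect once it is multiplied across stores and partitions.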

Replacing RocksDB with Speedb

Okay, that’s a long list that might seem overwhelming as you plan out your Kafka Streams optimization. Fortunately, there is a RocksDB-based OSS project named Speedb, a fully compatible RocksDB drop-in replacement storage engine designed to address the most demanding challenges on that list (and beyond). As with other platforms that use Speedb as a RocksDB alternative, such as Redis on Flash, it can easily be dropped into Kafka Streams as a sort of enhanced RocksDB implementation.

Speedb OSS rebases on RocksDB’s latest version with additional features that supercharge application performance and stability and enhance usability and resource utilization. There is also a Speedb enterprise version designed to boost performance at scale for datasets exceeding 50 GB per node with a unique compaction technology and other innovations. 

Let's go through these challenges again and see how Speedb’s technology helps tackle some of the issues, including some real-world benchmarks:

  1. Scalability: As mentioned before, RocksDB is an embedded library and thus cannot scale horizontally. This means that when using large capacities per node, it is prone to suffer from high write amplification caused by the large compactions occurring on the lower levels of the LSM tree, causing severe performance degradation.
    In Speedb’s OSS, we have a very powerful delayed write mechanism that allows us to control compactions and reduce response time spikes in write-intensive workloads. For especially demanding installations of over 50 GB per node, we tackle the problem of scalability with our Speedb enterprise version, using a new compaction methodology that reduces the write amplification factor from roughly 30 to roughly 5. At large capacities (usually above 50 GB), we have seen significant improvements in application response times in mixed workloads. Using Speedb enterprise, Kafka Streams can store more data per node without suffering application performance degradation, and therefore needs fewer nodes in a given Kafka cluster.
  2. Resource Utilization: RocksDB can be CPU and memory-intensive, leading to issues with increased resource utilization, especially in memory-constrained environments. Speedb is optimized for better resource utilization and can perform well with much lower CPU and memory requirements. In a benchmark comparing Speedb's improved bloom filter mechanism with RocksDB, Speedb showed a 25% reduction in memory consumption.

The Speedb enterprise version improves resource utilization even further, thanks to a 6x reduction in the write amplification factor (WAF), from 24 to 4. Reducing the WAF improves SSD endurance and CPU utilization.
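
To see why the WAF matters for SSD endurance, here is a small arithmetic sketch in Java. It uses the WAF figures quoted above (24 and 4); the ingest rate, the drive's rated endurance, and all class and method names are hypothetical and chosen only for illustration. The model is deliberately simple: physical bytes written are taken to be logical bytes written multiplied by the WAF, and the drive is assumed to be worn out once its rated terabytes written (TBW) have been consumed.

```java
// Simplified model: physical writes = logical writes * WAF.
final class WriteAmplification {

    /** Physical device write rate (MB/s) for a given logical ingest rate and WAF. */
    static double physicalWriteRateMbPerSec(double logicalMbPerSec, double waf) {
        return logicalMbPerSec * waf;
    }

    /** Days until the rated endurance (TB written) is used up at a steady physical write rate. */
    static double daysUntilWornOut(double ratedTbw, double physicalMbPerSec) {
        // 86,400 seconds per day; 1,000,000 MB per TB (decimal units).
        double tbPerDay = physicalMbPerSec * 86_400 / 1_000_000.0;
        return ratedTbw / tbPerDay;
    }

    public static void main(String[] args) {
        double ingestMbPerSec = 10.0; // hypothetical logical write rate
        double ratedTbw = 3_000.0;    // hypothetical drive endurance rating

        for (double waf : new double[] {24.0, 4.0}) {
            double physical = physicalWriteRateMbPerSec(ingestMbPerSec, waf);
            System.out.printf("WAF %.0f: %.0f MB/s to disk, ~%.0f days of rated endurance%n",
                    waf, physical, daysUntilWornOut(ratedTbw, physical));
        }
    }
}
```

Whatever absolute numbers you plug in, the ratio is what matters: under this model, cutting the WAF by a factor of six stretches the same drive's write budget six times further and leaves correspondingly more I/O bandwidth for the application itself.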

Okay, I realize that reading seven more in-depth explanations with statistics and charts may put you to sleep, so here’s a quick guide to the remaining items:

  1. Tuning: Speedb is designed to be easy to configure and requires less tuning for optimal performance.
  2. Cold Starts: Speedb has faster cold start times, helping to reduce application startup times and improve resiliency.
  3. Compaction: Speedb's compaction process is more powerful and efficient, reducing the impact on application performance.
  4. Backup and Restore: Speedb is more efficient for backup and restore operations, reducing the time and resources required.
  5. I/O Operations: Speedb is optimized for I/O efficiency, reducing the impact on application performance and enabling better scalability.
  6. Space Management: Speedb handles unpredictable data growth patterns more efficiently, reducing the need for ongoing space management.
  7. Application Design: Speedb is fully compatible with RocksDB, making it easy to use as a drop-in replacement library without impacting the application design.

In short, Speedb helps with all of these issues. If you’re skeptical, you can easily swap Speedb in and out of your project, so feel free to try it yourself.

Moving Forward

As you’ve seen, the configuration of RocksDB running behind the scenes can have a serious impact on the health of your Kafka Streams implementation. I’ve made a lot of claims about Speedb, and I welcome you to try it for yourself; you can find it here. If you have any questions, engage with our Speedb Hive community to get answers and insights.

Give us a try; there’s nothing to lose besides future problems.
