At its core, a storage engine is the component of an application that manages data through a particular data structure. Many modern engines expose that structure through a simple key-value (hash-table-like) interface, and so are often referred to as key-value storage engines. Key-value stores are often called 'unstructured' databases, unlike their older brothers the 'structured' databases, which are relational in nature. The unstructured database is designed for large, random data sets, the kind typically required by the hyperscale applications we use most.

The storage engine is responsible for the physical storage, ingestion, retrieval, and modification of data. It manages the low-level details of how data is stored efficiently in application memory *and* on disk, and how applications access that data from both. In other words, it's the magic behind the scenes that allows your application to ‘do the work’ with the actual data.

CRUD

But there's more to it. The storage engine manages local application memory buffers and structures, and aligns those internal data structures with the input/output (I/O) operations for creating, reading, updating, and deleting (CRUD) data on physical disks and external disk arrays. When done right, this creates dramatic efficiency gains for the application by reducing the read, write and space amplification caused by compactions in an LSM-Tree, or the costly inserts on writes in a B-Tree.
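To make the CRUD responsibilities concrete, here is a minimal sketch of a toy engine that keeps an in-memory structure in sync with an append-only file on disk. The class name `TinyEngine`, its methods and the JSON log format are all hypothetical, chosen only for illustration; real engines such as LevelDB, RocksDB or Speedb are far more sophisticated.

```python
import json
import os
import tempfile


class TinyEngine:
    """Toy storage engine: an in-memory dict backed by an append-only log file."""

    def __init__(self, path):
        self.path = path
        self.index = {}                      # in-memory view of the data
        self._replay()                       # rebuild memory state from disk
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                op, key, value = json.loads(line)
                if op == "put":
                    self.index[key] = value
                else:                        # "del"
                    self.index.pop(key, None)

    def _append(self, op, key, value=None):
        # Every change is appended sequentially before memory is updated.
        self.log.write(json.dumps([op, key, value]) + "\n")
        self.log.flush()

    def put(self, key, value):               # create and update
        self._append("put", key, value)
        self.index[key] = value

    def get(self, key):                      # read
        return self.index.get(key)

    def delete(self, key):                   # delete
        self._append("del", key)
        self.index.pop(key, None)

    def close(self):
        self.log.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.log")
    engine = TinyEngine(path)
    engine.put("user:1", "alice")
    engine.put("user:1", "alicia")           # update
    engine.put("user:2", "bob")
    engine.delete("user:2")
    engine.close()

    reopened = TinyEngine(path)              # state is rebuilt from the log
    recovered = reopened.get("user:1")
    deleted = reopened.get("user:2")
    reopened.close()
```

The point of the sketch is the division of labor: reads are served from memory, while every change is first appended to disk so the state survives a restart.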

The LSM-Tree is natively optimized for heavy ingestion and the B-Tree for heavy reads, and through innovative modifications, both can be optimized for various workloads.

In general, because ingestion I/O is the most challenging and costly operation from a physical perspective, the LSM-Tree has been the most widely adopted. It also has the most theoretical potential for read optimization through innovation, which makes it an ideal choice for modern, large-scale workloads. The LSM-Tree aggregates bytes in memory and writes them sequentially to fill complete disk blocks, so there is no such thing as a 'random write' in the LSM-Tree. All writes become sequential and saturate the ingestion pipeline to the disks.
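The idea of aggregating bytes in memory and emitting only complete blocks can be sketched as follows. This is an illustration only: `BlockWriter` is a hypothetical name, the 4096-byte block size is an assumption, and an in-memory `io.BytesIO` stands in for the disk.

```python
import io

BLOCK_SIZE = 4096  # assumed disk block size, for illustration only


class BlockWriter:
    """Accumulate small writes in memory and emit only full blocks, in order."""

    def __init__(self, sink, block_size=BLOCK_SIZE):
        self.sink = sink                  # any binary file-like object
        self.block_size = block_size
        self.buffer = bytearray()
        self.blocks_written = 0

    def write(self, record: bytes):
        self.buffer += record
        # Emit every complete block currently buffered, strictly sequentially.
        while len(self.buffer) >= self.block_size:
            self.sink.write(bytes(self.buffer[:self.block_size]))
            del self.buffer[:self.block_size]
            self.blocks_written += 1

    def close(self):
        # Pad the final partial block so the output stays block aligned.
        if self.buffer:
            padding = self.block_size - len(self.buffer)
            self.sink.write(bytes(self.buffer) + b"\x00" * padding)
            self.buffer.clear()
            self.blocks_written += 1


sink = io.BytesIO()
writer = BlockWriter(sink)
for i in range(1000):
    writer.write(f"key{i:04d}=value{i:04d};".encode())   # 18 bytes each
writer.close()
```

Here 1,000 small 18-byte records reach the sink as just five full-block sequential writes, rather than 1,000 small ones.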

What Does Embedded Mean?

One important concept in reducing storage engine overhead is 'embedding' the engine on the application server. An embedded storage engine is built into the application itself, rather than running as a separate server, container, process or service. This is critical for the highest-performing applications because it allows the fastest theoretical (and practical) access to data and reduces the overhead of communicating with external servers, storage and networks.

And the best part? Enabling an embedded storage engine is simply a matter of linking (or adding) a library to your application. It requires no other installation or considerations and takes only about 30 seconds. This makes it extremely simple to swap your existing storage engine in or out to match your particular application's workload characteristics.
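As a rough illustration of what 'embedded' means in practice, the sketch below uses `dbm`, the small key-value store that ships in Python's standard library. It is not an LSM-Tree engine and is unrelated to LevelDB, RocksDB or Speedb; it is used here only because it shows the same pattern: import a library, open a local file, and read and write with no server, port or network hop involved.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_db")

    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        db[b"user:1"] = b"alice"
        db[b"user:2"] = b"bob"
        del db[b"user:2"]

    # Reopen read-only: the data lives in local files next to the application.
    with dbm.open(path, "r") as db:
        value = db.get(b"user:1")
        missing = db.get(b"user:2")
```

Everything runs inside the application's own process, which is the property that removes the communication overhead described above.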

Real-World Examples

Three popular examples of persistent, embedded, hyperscaler-grade storage engines are the LSM-based LevelDB, RocksDB and Speedb, developed by Google, Meta (formerly Facebook) and Speedb, respectively. All three are open source (OSS) and serve as the storage engine in thousands of enterprise-grade applications and services, including well-known applications like Google Chrome, Meta Messenger, WhatsApp, Netflix and many more.

You might be shocked to learn that the persistent embedded storage engines LevelDB, RocksDB and Speedb power many of the world's highest-performing applications. Some well-known names you might use every day:

  • Google (LevelDB)
  • Meta (RocksDB)
  • Netflix (RocksDB)
  • LinkedIn (RocksDB)
  • WhatsApp (RocksDB)
  • Pinterest (RocksDB)
  • Redis-On-Flash (Speedb)
  • Airbnb (RocksDB)
  • Twitter (RocksDB)
  • Dropbox (RocksDB)
  • XMCyber (Speedb)
  • Yahoo (LevelDB)
  • Baidu (RocksDB)
  • MinIO (Speedb)
  • and many more (all references available online)

This category of embedded storage engines offers the highest possible operational efficiency, enabling massive scale and stability under pressure, which is critical to the success of these companies and our experiences with their products. A hash table offers 'constant time' lookups on average, the best possible score on the Big O efficiency scale, and key-value engines built on an LSM-Tree aim to bring that behavior to fully random workloads on disk. Writes are absorbed in memory and flushed sequentially, and with techniques such as Bloom filters and block caches a point read can often be served with close to a single request to physical media (SSD/Flash), no matter how large the data set grows.

Why LSM (Log-Structured Merge-Tree)?

So why is a Log-Structured Merge (LSM) tree ideal for large ingest (writes) of data? An LSM tree is the data structure used by most hyperscaler storage engines, designed to handle large volumes of data in a highly efficient way. It does this by storing data in two or more structures: a memory buffer and a set of sorted on-disk files (SSTs). When the memory buffer is full, it's flushed sequentially and efficiently to disk as a new sorted file. Background compaction then ‘merges’ it with the existing sorted files, which are organized into ‘levels’ according to the engine's compaction algorithm.

This approach is the most efficient for massive ingest because new data is written quickly from sorted arrays in memory, with the aggregated bytes (in memory) aligned to fill complete blocks (on disk), while still allowing efficient retrieval of new data from memory and old data from disk.
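The flow described above (memory buffer, flush to sorted files, merge) can be sketched in a few lines. This is a deliberately simplified toy, not how LevelDB, RocksDB or Speedb are implemented: `ToyLSM` and its method names are hypothetical, the sorted 'files' are plain Python lists held in memory, and compaction merges everything into one run instead of managing levels in the background.

```python
import bisect

TOMBSTONE = object()  # marker written in place of a deleted value


class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}                # the in-memory write buffer
        self.memtable_limit = memtable_limit
        self.runs = []                    # sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)          # deletes are just another write

    def flush(self):
        # Write the buffer out as one sorted, immutable run.
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check memory first, then runs from new to old.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for run in self.runs:
            # (key,) sorts just before any (key, value) pair with that key.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest live values.
        merged = {}
        for run in reversed(self.runs):   # oldest first, newer overwrite
            merged.update(run)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.runs = [live] if live else []


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)        # buffer full: flushed as the first sorted run
db.put("a", 10)
db.delete("b")        # buffer full again: flushed as a second run
db.put("c", 3)        # stays in memory
runs_before = len(db.runs)
db.compact()
```

Note that no write ever modifies an existing run in place. Updates and deletes are appended as new entries, and the cleanup is deferred to compaction, which is where the amplification trade-offs mentioned earlier come from.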

In practice, embedded storage engines are most useful when your application data doesn’t fit in memory, needs to be written to disk for persistence, and yet must stay in sync with your application memory.

If your application data fits completely in local memory, you might instead use something like Redis, which is highly optimized to serve data from memory. Its optional persistence (snapshots and an append-only log) is there for recovery, not for serving data sets larger than RAM.

With Redis, when your application data size exceeds your local memory, the typical recommendation is Redis-On-Flash, which (surprise!) offloads writes and reads to disk, today using Speedb for maximum performance and utilization. Speedb replaces RocksDB, which was the previous option.

External Storage

But what about external data storage from commercial storage vendors like EMC and NetApp?

While these technologies play an important role in data storage, they're not the same as a storage engine. They operate *only* at the level outside of application memory, and attempt (often successfully, sometimes not) to alleviate I/O inefficiencies in the application's data structures by providing a very large external shared cache buffer and by handling the writes to disk themselves, with striping and/or various RAID options. These are excellent options if you're not prepared to optimize your application's storage engine, and they're often a reasonable way to hide application inefficiencies at the storage engine layer.

External storage solutions provide a physical place to store data over a network, but they don't provide the low-level management of that data in application memory that a storage engine does. So the ideal approach is always to optimize your storage engine first. That in turn allows your external storage to perform even better, because it removes a lot of I/O overhead and read/write/space amplification, allowing for peak performance and efficiency throughout the entire computing stack.
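Amplification is usually expressed as a simple ratio: how many bytes the device actually handles versus how many bytes the application asked for. A minimal sketch of that arithmetic follows; the function name and the numbers are hypothetical and chosen only to illustrate the calculation, not measured from any engine.

```python
def amplification(device_bytes: int, app_bytes: int) -> float:
    """Ratio of bytes handled by the device to bytes the application asked for."""
    if app_bytes <= 0:
        raise ValueError("app_bytes must be positive")
    return device_bytes / app_bytes


GIB = 1024 ** 3

# Hypothetical: the application writes 10 GiB, but flushes plus compactions
# cause 30 GiB of physical writes to the device.
write_amp = amplification(device_bytes=30 * GIB, app_bytes=10 * GIB)

# Hypothetical: 10 GiB of live data occupies 15 GiB on disk.
space_amp = amplification(device_bytes=15 * GIB, app_bytes=10 * GIB)
```

In this made-up case the write amplification is 3.0 and the space amplification is 1.5. Every point shaved off these ratios in the storage engine is I/O and capacity that the external storage layer never has to absorb.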

We can help!

Here at Speedb.io we’re obsessed with driving inefficiency out of your storage engine. Connect with me or ping me directly on LinkedIn with any questions, for details on how our OSS or Enterprise version of Speedb can help your computing projects succeed, or for tips on optimizing your RocksDB or LevelDB engines. (email me: bam@speedb.io)

And of course star us on GitHub if you love what we're doing! https://lnkd.in/dgc78wsM
