IOWOW

The persistent key/value storage engine based on Skiplist data structure.

Key features

Ultra-fast storage engine
Tiny embeddable C11 Library
Online backups support
Single data file for any number of key/value sub-databases
MIT licensed

Motivation and history

IOWOW was originally developed for use in our embeddable JSON database engine EJDB2 Previous version of EJDB 1.x based on TokyoCabinet (TC) engine and gained some popularity. Due to LGPL licensing, technical limitations and lack of support by TC developers. We did complete redesign of engine and implemented it on top of our home grown key value database — https://iowow.io

What we are looking for?

We need a free and lightweight open sourced database engine which manages all data in a single file.

Why single data file is good for embeddable database?
- It’s convenient for distribute data file as single asset between users and devices
- Lesser chance to get data inconsistency on file system as result of operation on set of database files
- Easier to develop data checking, migration and recovery tools

We need a database engine able to run on wide range of operating systems and hardware architectures. (x86/64, arm, mips).

Linux
macOS
FreeBSD
Windows
OpenWrt
Android
iOS

We need engine with liberal open source license. What does it mean?

Liberal license allows free use of software for practically any purpose
License may contains obligations concerning copyright notice and references to names of authors but no more

Any copyleft licenses are not liberal from our point of view. Any licenses with crazy obligations to source code (like this), licenses having restrictions on ways of source code distribution and so on are not liberal.

Requirements summary

Be open sourced and completely free for any use
Be fast and lightweight — no more than few hundreds kilobytes of linked code
Compatible with many ABIs and language bindings
Database data is located in single file
Support of data durability and fault recovering
Able to run on resources limited systems, mobile devices

Trying avoid reinvention of a wheel

SQLite?

Single file
Liberal license
Implemented in C language
Estimate library size overhead

curl https://www.sqlite.org/snapshot/sqlite-snapshot-202002271621.tar.gz | tar -xzv
cd ./sqlite-snapshot-202002271621 &&
./configure --disable-fts4 --disable-fts5 --disable-json1 --disable-rtree
make && strip -d ./.libs/libsqlite3.a ./.libs/libsqlite3.so.0.8.6
ls -hs ./.libs/
  1.1M libsqlite3.a
  868K libsqlite3.so.0.8.6

Support of SQL and transactions isolation costs price. According to IOArena benchmarks simple CRUD performance may be ~5-10x slower comparing to other key value stores.

Trying avoid reinvention of a wheel

Berkeley DB?

Good performance according to our micro-benchmark
Not liberal AGPL license also licensed as commercial Oracle product
Single file
Implemented in C language
Interesting thread about good and weak points of BDB

Anything else?

WiredTiger
- Good performance especially on large data sets
- Not liberal GPL license also licensed as commercial Mongodb product
- Server side oriented
- Big binary artifact size ~3M

LMDB
- Good performance in benchmarks
- Good OpenLDAP license
- Single file
- Readers and writer transactions are isolated
- The database file will never shrink
- Not able to keep arbitrary number of sub-databases
- You have to preallocate database to the maximum estimated data size on Windows platform

LevelDB
- Not good performance of insert/update operations
- Liberal BSD 3-Clause license
- Small binary artifacts size ~400K
- Known to be unreliable and corruption prone
RocksDB
- LevelDB successor developed by Facebook
- Not liberal GPL license
- Good performance especially on large data sets
- Huge binary artifacts size ~20M
- Works on Android

Out of review

There are number of remarkable database engines skipped in this presentation.

Realm
UnQLite
SophiaDB — predecessor of database backend for Mail.ru Tarantool engine

With IOArena you can compare of IOWOW against SQLite3, SophiaDB, UnQLite, RocksDB, WiredTiger, LMDB, LevelDB.

IOWOW architecture

Why we have developed IOWOW persistence layer based on Skiplist data structure?

Due to planar (non hierarchial nature) of skiplist an actual implementation may be much more simpler than B+ tree and LSM trees. Take a look on BerkeleyDB/tree/master/src/btree It ~600Kb in total which ~5x times bigger than whole IOWOW/src/kv database implementation
Skiplist is very effective for sequential scans, it is very important feature since high ratio of database queries don’t use indexes
Create a persistent skiplist implementation competitive to B+/LSM trees is very interesting and challenging task. Looking back I can say that I’m succeeded with that

Skiplist drawbacks

Not so CPU/data cache friendly comparing to B+ trees
Bigger storage overhead than B+ and LSM trees
So why the size of iowow database file limited to ~512Gb, It is the result of tradeoff between data overhead and max database size

Benchmarks

It is quite hard to provide fair database benchmarks since benchmarks are highly dependent on hardware configuration, database parameters, compiler, extra libraries, build flags, workload and so on. For example one engine may be efficient on small keys while on large keys perform poorly. IOWOW benchmarked against two suites:

Next slides show single threaded micro-benchmarks

IOWOW (iwkv)
LMDB
BerkeleyDB (bdb)
KyotoCabinet (kyc)
WiredTiger
TokyoCabinet (tc)

Benchmarks

LevelDB was excluded from the latest benchmarks since it shows very poor update/insert performance comparing to competitors

Benchmarks environment

CPU 8x Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
RAM 16320MB
SSD Micron C400 RealSSD 256GB
Filesystem ext4
OS Ubuntu 19.10

Single threaded micro-benchmark sources

Full benchmarks report on single page

Storage efficiency

Insert 1M records

16 byte keys
Random values of [0, 1000] bytes length

16 byte keys
Random values of [0..40] bytes length
- filrandom2 - insert 2M random distributed records
- readrandom - read 2M records in random order
- deleterandom - delete 2M random keys

1024 byte keys
Random values of [0..40] bytes length
- filrandom2 - insert 2M random distributed records
- readrandom - read 2M records in random order
- deleterandom - delete 2M random keys

LMDB is out of scope since keys greater than 511 bytes are not supported by default

16 byte keys
Random values of [0..400] bytes length
- filrandom2 - insert 2M random distributed records
- readrandom - read 2M records in random order
- deleterandom - delete 2M random keys

1024 byte keys
Random values of [0..400] bytes length
- filrandom2 - insert 2M random distributed records
- readrandom - read 2M records in random order
- deleterandom - delete 2M random keys

16 byte keys
400 byte values
- filseq - insert 2M records in ascending order
- overwrite - overwrite 2M values for random keys
- deleterandom - delete 2M random keys

1024 byte keys
400 byte values
- filseq - insert 2M records in ascending order
- overwrite - overwrite 2M values for random keys
- deleterandom - delete 2M random keys

16 byte keys
Random values of [0..400] bytes length
- filrandom2 - insert 10M random distributed records
- readrandom - read 10M records in random order
- readseq - read 10M records sequentially
- readreverse - read 10M records sequentially in reverse order

16 byte keys
200Kb values
- filrandom2 - insert 10K random distributed records

IOArena benchmarks

IOArena is an utility designed for evaluating performance of embedded databases. This tool able to run multi-threaded benchmarks and measure not only total time of test performed but throughput and latency.

https://github.com/pmwkaa/ioarena

git clone https://github.com/pmwkaa/ioarena
cd ioarena && mkdir build && cd build
cmake ..  -DENABLE_IOWOW=ON -DENABLE_WIREDTIGER=ON -DENABLE_ROCKSDB=ON \
          -DENABLE_LMDB=ON -DENABLE_LEVELDB=ON -DENABLE_UNQLITE=ON \
          -DENABLE_SOPHIA=ON -DENABLE_SQLITE3=ON
make
src/ioarena -h

IOArena benchmarks

Very good throughput performance (270 Kops/s) especially for small keys: iowow faster than rocksdb, wirediger and lmdb

# Fill db with 4M keys then query it
./ioarena -n 4000000 -k 8 -v 16 -B set,get -D iowow

iowow vs rocksdb:

IOWOW Internals

File structure

IOWOW weak points

Limited scalability in multithreaded environment: RW locks used, single file access
Maximum database size is limited to 512Gb
IOWOW is not intended for big-data
Isolated transactions are not supported

IOWOW strong points

Tiny library size ~220K
Support of online backups
Support of multiple sub-databases in one file
Competitive performance

Room for improvements

Use data compression - skiplist nature allows simple ways to do that
Work on performance of data update operations
Create more effective caching of data
Research ways of concurrent data modification

https://github.com/Softmotions/iowow