Bitcask Storage Model

Paper: Bitcask: A Log-Structured Hash Table for Fast Key/Value Data

Achieving some of these is easy. Achieving them all is less so.

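The origin of Bitcask is tied to the history of the Riak distributed database. In a Riak cluster, each node uses pluggable local storage, and almost any key-value storage engine can serve as the storage engine for a single node. This pluggability means storage engines can be improved and tested without affecting the rest of the codebase.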

Bitcask's design goals are:

low latency per item read or written
high throughput, especially when writing an incoming stream of random items
ability to handle datasets much larger than RAM without degradation
crash friendliness, both in terms of fast recovery and not losing data
ease of backup and restore
a relatively simple, understandable (and thus supportable) code structure and data format
predictable behavior under heavy access load or large volume

Bitcask uses hash table log merging, which can potentially be faster than LSM-trees.

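A Bitcask instance is a directory. By design, only one operating system process may open that Bitcask for writing at any given time; that process can be thought of as the Bitcask server. At any moment, exactly one file in the directory is active, and only that file can be written to. When the active file reaches a size threshold, Bitcask creates a new file to replace it as the active file. The replaced file is called an older file; from then on it is immutable, and no process ever writes to it again.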
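The single-writer directory and the active-file rotation can be sketched roughly as follows. This is a minimal illustration, not Bitcask's actual implementation: the class name LogDir, the max_file_size parameter, the write.lock file, and the NNNNNN.data naming scheme are all assumptions made for this example.

```python
import os


class LogDir:
    """Hypothetical sketch of a Bitcask-style directory with one active file."""

    def __init__(self, path, max_file_size=1024):
        self.path = path
        self.max_file_size = max_file_size  # assumed rotation threshold, in bytes
        os.makedirs(path, exist_ok=True)

        # Assumption: an exclusively created lock file stands in for the
        # "only one writer process" rule. O_EXCL makes os.open fail with
        # FileExistsError if another writer already holds the lock.
        # A real implementation would also need to handle stale locks.
        self.lock_path = os.path.join(path, "write.lock")
        fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)

        # Always start a fresh active file; existing files stay immutable.
        self.file_id = self._next_file_id()
        self._open_active()

    def _data_path(self, file_id):
        return os.path.join(self.path, f"{file_id:06d}.data")

    def _next_file_id(self):
        ids = [int(name.split(".")[0])
               for name in os.listdir(self.path) if name.endswith(".data")]
        return max(ids, default=0) + 1

    def _open_active(self):
        self.active = open(self._data_path(self.file_id), "ab")
        self.active_size = 0

    def append(self, record: bytes):
        """Append a record; return (file_id, offset) of where it was written."""
        # Once the active file has reached the threshold, close it for good
        # and switch to a new active file.
        if self.active_size >= self.max_file_size:
            self.active.close()
            self.file_id += 1
            self._open_active()
        offset = self.active_size
        self.active.write(record)
        self.active.flush()
        self.active_size += len(record)
        return self.file_id, offset

    def close(self):
        self.active.close()
        os.remove(self.lock_path)
```

With max_file_size=10, two 5-byte appends land in the first file, and the third append triggers the switch to a second file. Closed files are never reopened for writing in this sketch, which mirrors the immutability of older files described above.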

The active file is written only by appending, which means that sequential writes do not require a disk seek. Each entry written is very simple,