首页   快速返回

leveldb简介     所属分类 leveldb


LevelDB is an open source key-value store that originated from Google’s BigTable. 
It is an implementation of LSM-tree, and it has received increased attention in both industry and academia. 
Figure  illustrates the architecture of LevelDB, which consists of two MemTables in main memory 
and a set of SSTables in the disk and other auxiliary files, 
such as the Manifest file which stores the metadata of SSTables.

LSM  Log Structured Merge Tree 
MemTable + ImmutableMemTable + SSTables + log(WAL) + metadata-files



When the user inserts a key-value pair into LevelDB, it will be first saved in a log file. 
Then it is inserted into a sorted structure in memory, called MemTable, 
which holds the most recent updates. 
When the size of incoming data items reaches its full capacity, 
the MemTable will be transformed into a read-only Immutable MemTable. 
A new MemTable will be created to accumulate fresh updates.

写日志 写内存(保存最近的更新) 
MemTable 达到指定容量 , 转成 只读 MemTable ,生成一个新的 MemTable 

 
At the same time, a background thread begins to dump the Immutable MemTable into the disk 
and generate a new Sorted String Table file (SSTable).

后台线程 dump 只读 MemTable 到硬盘 生成 SSTable 

 
Deletes are a special case of update wherein a deletion marker is stored.

删除当做特殊的写入,带删除标记

An SSTable stores a sequence of data items sorted by their keys. 
The set of SSTables are organized into a series of levels, as shown in Figure . 
The youngest level, Level 0,is produced by writing the Immutable MemTable from main memory to the disk. 

SSTable 按key顺序存储   多个 SSTables 分层组织 
最年轻的层 level0  , 由 只读 memtable dump 形成

Thus SSTables in Level 0 could contain overlapping keys. 
However, in other levels the key range of SSTables are non-overlapping. 

Level 0 的 SSTables 包含重复的 key  重叠 , 一个key会出现在多个 sstable 中 
其他层的 SSTables  不会重叠 

Each level has a limit on the maximum number of SSTables, or equivalently, 
on the total amount of data because each SSTable has a fixed size in a level. 
The limit grows at an exponential rate with the level number. 
For example, the maximum amount of data in Level1 will not exceed 10 MB, 
and it will not exceed 100 MB for Level 2.

每层都有数量限制 或者 大小限制 , 限制按指数增长
例如,Level1中的最大数据量不超过10mb,level2不超过100mb


In order to keep the stored data in an optimized layout,a compaction process will be conducted. 
The background compaction thread will monitor the SSTable files. 
When the total size of Level L exceeds its limit, 
the compaction thread will pick one SSTable from Level L and all overlapping ones from the next Level L+1. 
These files are used as inputs to the compaction and are merged together to produce a series of new Level L+1 files. 
When the output file has reached the predefined size (2 MB by default), another new SSTable is created. 
All inputs will be discarded after the compaction. 

后台压缩  ,大小超出 限制 ,启动压缩
压缩合并 
挑选 重叠的 SSTable 进行压缩合并

Note that the compaction from Level 0 to Level1 is treated differently than those between other levels. 
When the number of SSTables in Level 0 exceeds an upper limit(4 by default), the compaction is triggered. 
The compactionmay involve more than one Level 0 file in case some of them overlap with each other.

Level 0 合并 特殊处理 ,SSTables 超过4个 触发合并


By conducting compaction, LevelDB eliminates overwritten values and drops deleted markers. 
The compaction operation also ensures that the freshest data reside in the lowest level. 
The stale data will gradually move to the higher levels.

压缩 删除 覆盖写的值 和 删除的值 ,保证 新鲜的数据留在 更低的level
陈旧的数据逐渐移动到更高的level。


The data retrieving, or read operation, is more complicated than the insertion. 
When LevelDB receives a Get(Key,Value) request, it will first do a look up in the MemTable,
then in Immutable MemTable, and finally search the SSTables from Level 0 to higher levels in the order until a matched KV data item is found. 
Once LevelDB finds the key in a certain level, it will stop its search. 

读比写复杂
查找顺序  
MemTable  > Immutable MemTable > level0  > level1 ... > levelN 
找到之后终止


As we mentioned before,lower levels contain fresher data items. 

低层级包含新鲜的数据

The new data will be searched earlier than old data. 
Similar to compaction, more than one Level 0 file could be searched because of their data overlapping. 
A Bloom filter  is usually adopted to reduce the I/O cost for reading data blocks that do not contain requested KV items.

布隆过滤器  确认不存在的key  避免 查找不存在的 key

上一篇     下一篇
JMX之ObjectName

tomcat之JMXProxyServlet

javap查看字节码

leveldb要点

一致性算法raft简介

aerospike网络及常用端口