H2 MVStore

http://www.h2database.com/html/mvstore.html

The MVStore is a persistent, log structured key-value store.
It is used as the default storage subsystem of H2,
but it can also be used directly within an application, without using JDBC or SQL.

MVStore: a persistent, log structured KV store
The default storage subsystem of H2
Can be used standalone, with no dependency on JDBC

MVStore stands for "multi-version store".
Each store contains a number of maps that can be accessed using the java.util.Map interface.
Both file-based persistence and in-memory operation are supported.
It is intended to be fast, simple to use, and small.
Concurrent read and write operations are supported.
Transactions are supported (including concurrent transactions and 2-phase commit).

The tool is very modular.
It supports pluggable data types and serialization,
pluggable storage (to a file, to off-heap memory),
pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage,
and a file system abstraction to support encrypted files and zip files.

MVStore = multi-version store
Concurrent reads and writes are supported

import org.h2.mvstore.*;

// open the store (in-memory if fileName is null)
MVStore s = MVStore.open(fileName);

// create/get the map named "data"
MVMap<Integer, String> map = s.openMap("data");

// add and read some data
map.put(1, "Hello World");
System.out.println(map.get(1));

// close the store (this will persist changes)
s.close();

Store Builder

MVStore s = new MVStore.Builder().
    fileName(fileName).
    encryptionKey("007".toCharArray()).
    compress().
    open();

Optional parameters

autoCommitBufferSize: the size of the write buffer.
autoCommitDisabled: to disable auto-commit.
backgroundExceptionHandler: a handler for exceptions that could occur while writing in the background.
cacheSize: the cache size in MB.
compress: compress the data when storing using a fast algorithm (LZF).
compressHigh: compress the data when storing using a slower algorithm (Deflate).
encryptionKey: the key for file encryption.
fileName: the name of the file, for file based stores.
fileStore: the storage implementation to use.
pageSplitSize: the point where pages are split.
readOnly: open the file in read-only mode.

R-Tree
The MVRTreeMap is an R-tree implementation that supports fast spatial queries.

An R-tree is a dynamic index structure for spatial searching.

Each store contains a set of named maps.
A map is sorted by key, and supports the common lookup operations,
including access to the first and last key, iterate over some or all keys, and so on.

A store contains multiple named maps; each map is stored in key order and supports lookups, access to the first and last key,
and iteration over some or all keys.
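
The sorted-map operations listed above can be tried out without H2 on the classpath. The sketch below uses java.util.TreeMap purely as a stand-in for a map sorted by key; the class name SortedMapOpsSketch and its helper methods are made up for illustration, and this is not the MVMap API itself.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Stand-in for a key-sorted map; TreeMap is used here instead of MVMap (assumption). */
class SortedMapOpsSketch {

    /** Builds a small sample map with keys 10, 20, 30, 40. */
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        for (int k = 10; k <= 40; k += 10) {
            map.put(k, "value-" + k);
        }
        return map;
    }

    /** Returns a view of all entries whose key is >= fromKey (iterate over "some" keys). */
    static SortedMap<Integer, String> from(TreeMap<Integer, String> map, int fromKey) {
        return map.tailMap(fromKey);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        System.out.println("first key: " + map.firstKey());
        System.out.println("last key:  " + map.lastKey());
        // iterate over the keys starting at 25, in key order
        for (Integer key : from(map, 25).keySet()) {
            System.out.println("key: " + key);
        }
    }
}
```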

A version is a snapshot of all the data of all maps at a given point in time.
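Creating a snapshot is fast: only those pages that are changed after a snapshot are copied. This behavior is also called COW (copy on write).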
Old versions are readable. Rollback to an old version is supported.
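
To make the copy-on-write idea concrete, here is a minimal sketch that does not use H2 at all: a tiny immutable binary search tree where an update copies only the nodes on the path to the changed key, so every committed root stays readable. The class CowVersionsSketch and its methods commit, getAt and rollbackTo are hypothetical names chosen for illustration; MVStore works on pages of a B-tree, not on single nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Copy-on-write sketch (not H2 code): old roots remain valid snapshots. */
class CowVersionsSketch {

    /** Immutable tree node; never modified after construction. */
    static final class Node {
        final int key;
        final String value;
        final Node left, right;

        Node(int key, String value, Node left, Node right) {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    private Node root;                                      // head, the newest version
    private final List<Node> versions = new ArrayList<>();  // committed roots

    /** Returns a new root; only the nodes on the path to the key are copied. */
    static Node put(Node n, int key, String value) {
        if (n == null) return new Node(key, value, null, null);
        if (key < n.key) return new Node(n.key, n.value, put(n.left, key, value), n.right);
        if (key > n.key) return new Node(n.key, n.value, n.left, put(n.right, key, value));
        return new Node(key, value, n.left, n.right);
    }

    static String get(Node n, int key) {
        while (n != null) {
            if (key == n.key) return n.value;
            n = key < n.key ? n.left : n.right;
        }
        return null;
    }

    void put(int key, String value) { root = put(root, key, value); }

    String get(int key) { return get(root, key); }

    /** Freezes the current state and returns its version number. */
    int commit() {
        versions.add(root);
        return versions.size() - 1;
    }

    /** Reads from an old, read-only version. */
    String getAt(int version, int key) { return get(versions.get(version), key); }

    /** Discards all changes made after the given version. */
    void rollbackTo(int version) { root = versions.get(version); }

    public static void main(String[] args) {
        CowVersionsSketch s = new CowVersionsSketch();
        s.put(1, "Hello");
        s.put(2, "World");
        int v0 = s.commit();
        s.put(1, "Hi");
        System.out.println(s.get(1));        // head
        System.out.println(s.getAt(v0, 1));  // snapshot taken at commit
        s.rollbackTo(v0);
        System.out.println(s.get(1));        // head after rollback
    }
}
```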

Transactions
To support multiple concurrent open transactions, a transaction utility is included, the TransactionStore.
The tool supports "read committed" transaction isolation with savepoints, two-phase commit, and other features typically available in a database.
There is no limit on the size of a transaction (the log is written to disk for large or long running transactions).

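"Read committed" transaction isolation, savepoints, two-phase commit
No limit on the size of a transaction
For large or long running transactions, the log is written to disk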

Internally, this utility stores the old versions of changed entries in a separate map,
similar to a transaction log, except that entries of a closed transaction are removed,
and the log is usually not stored for short transactions.

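Old versions of changed entries are stored in a separate map
Similar to a transaction log, except that entries of a closed transaction are removed, and the log is usually not stored for short transactions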
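
The mechanism described above (keep the old value of each changed entry, drop those entries once the transaction is closed) can be sketched as follows. This is a single-transaction toy written against java.util only; the class UndoLogSketch and all of its method names are hypothetical and are not the TransactionStore API. It also treats a null old value as "key was absent", which is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Undo-log sketch (not H2 code): old values are kept until commit or rollback. */
class UndoLogSketch {

    private final Map<String, String> data = new HashMap<>();

    /** One entry per change: {key, oldValue}; oldValue == null means the key was absent. */
    private final Deque<String[]> undoLog = new ArrayDeque<>();

    void put(String key, String value) {
        undoLog.push(new String[] {key, data.get(key)});  // remember the old version first
        data.put(key, value);
    }

    String get(String key) { return data.get(key); }

    /** A savepoint is simply the current length of the undo log. */
    int savepoint() { return undoLog.size(); }

    /** Undoes changes in reverse order until the log is back at the savepoint. */
    void rollbackTo(int savepoint) {
        while (undoLog.size() > savepoint) {
            String[] entry = undoLog.pop();
            if (entry[1] == null) {
                data.remove(entry[0]);
            } else {
                data.put(entry[0], entry[1]);
            }
        }
    }

    void rollback() { rollbackTo(0); }

    /** Closing the transaction removes its entries from the log. */
    void commit() { undoLog.clear(); }

    int undoLogSize() { return undoLog.size(); }

    public static void main(String[] args) {
        UndoLogSketch tx = new UndoLogSketch();
        tx.put("a", "1");
        tx.commit();
        tx.put("a", "2");
        int sp = tx.savepoint();
        tx.put("b", "3");
        tx.rollbackTo(sp);
        System.out.println(tx.get("a") + " " + tx.get("b"));
    }
}
```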

In-Memory Performance and Usage

Performance of in-memory operations is about 50% slower than java.util.TreeMap.

Pluggable Data Types

Serialization is pluggable.
The default serialization currently supports many common data types,
and uses Java serialization for other objects.
The following classes are currently directly supported: Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date and arrays (both primitive arrays and object arrays).
For serialized objects, the size estimate is adjusted using an exponential moving average.
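
As a rough illustration of adjusting a size estimate with an exponential moving average: each time an object is serialized, the observed size is blended into the running estimate. The class SizeEstimateSketch, the starting value and the smoothing factor alpha are all assumptions made for this sketch; the notes above do not state which constants or exact formula H2 uses.

```java
/** Exponential moving average of observed serialized sizes (illustrative, not H2 code). */
class SizeEstimateSketch {

    private final double alpha;  // weight of the newest sample, 0 < alpha <= 1 (assumed value)
    private double average;      // current estimate in bytes

    SizeEstimateSketch(double initialEstimate, double alpha) {
        this.average = initialEstimate;
        this.alpha = alpha;
    }

    /** Blends the size that was actually observed into the running estimate. */
    void update(int actualSize) {
        average = alpha * actualSize + (1 - alpha) * average;
    }

    int estimate() {
        return (int) Math.round(average);
    }

    public static void main(String[] args) {
        SizeEstimateSketch est = new SizeEstimateSketch(100, 0.5);
        est.update(200);
        est.update(200);
        System.out.println(est.estimate());
    }
}
```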

Serialization

Parameterized data types are supported (for example one could build a string data type that limits the length).

The storage engine itself does not have any length limits,
so that keys, values, pages, and chunks can be very big (as big as fits in memory).

The storage engine itself has no length limits

BLOB Support
There is a mechanism that stores large binary objects by splitting them into smaller blocks.
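
The splitting idea can be sketched in a few lines: cut the binary object into fixed-size blocks, store each block under its own id in a map, and keep the ordered list of ids as the handle for reading the object back. The class BlobBlocksSketch, the block size and the id scheme are assumptions for illustration only; this is not the H2 implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

/** Splits large byte arrays into small blocks stored in a map (illustrative, not H2 code). */
class BlobBlocksSketch {

    static final int BLOCK_SIZE = 4;  // deliberately tiny so the split is visible (assumed value)

    private final TreeMap<Long, byte[]> blocks = new TreeMap<>();
    private long nextId;

    /** Stores the data block by block and returns the ids of the blocks, in order. */
    long[] put(byte[] data) {
        int count = (data.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        long[] ids = new long[count];
        for (int i = 0; i < count; i++) {
            int from = i * BLOCK_SIZE;
            int to = Math.min(from + BLOCK_SIZE, data.length);
            blocks.put(nextId, Arrays.copyOfRange(data, from, to));
            ids[i] = nextId++;
        }
        return ids;
    }

    /** Reassembles the original data from the listed blocks. */
    byte[] get(long[] ids) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long id : ids) {
            byte[] block = blocks.get(id);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BlobBlocksSketch store = new BlobBlocksSketch();
        long[] ids = store.put("hello world".getBytes(StandardCharsets.UTF_8));
        System.out.println(ids.length + " blocks");
        System.out.println(new String(store.get(ids), StandardCharsets.UTF_8));
    }
}
```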

Concurrent Operations and Caching

Concurrent reads and writes are supported.
(reads and writes can run at the same time)

file system
page cache

Write operations first read the relevant pages from disk to memory (this can happen concurrently), and only then modify the data.
Write operations first read the relevant pages into memory, then modify the data

The in-memory parts of write operations are synchronized.

Caching is done on the page level.
The page cache is a concurrent LIRS cache, which should be resistant against scan operations.

Page-level caching; the LIRS cache is resistant to scan operations.

For fully scalable concurrent write operations to a map (in-memory and to disk),
the map could be split into multiple maps in different stores ('sharding').
The plan is to add such a mechanism later when needed.

Multiple stores, sharding
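
Since the text says such a mechanism is only planned, the following is just one way an application could shard on its own: route each key to one of several independent maps by hash, so writers to different shards do not contend. The class ShardedMapSketch is hypothetical, and ConcurrentSkipListMap stands in for "a map in a separate store".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hash-based sharding over several independent maps (illustrative, not an H2 feature). */
class ShardedMapSketch<K extends Comparable<K>, V> {

    // Each element stands in for a map living in its own store (assumption).
    private final List<Map<K, V>> shards = new ArrayList<>();

    ShardedMapSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ConcurrentSkipListMap<K, V>());
        }
    }

    /** floorMod keeps the index non-negative even for negative hash codes. */
    int shardIndex(K key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    V put(K key, V value) { return shards.get(shardIndex(key)).put(key, value); }

    V get(K key) { return shards.get(shardIndex(key)).get(key); }

    int size() {
        int total = 0;
        for (Map<K, V> shard : shards) {
            total += shard.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ShardedMapSketch<Integer, String> map = new ShardedMapSketch<>(4);
        for (int i = 0; i < 100; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.get(42) + " is in shard " + map.shardIndex(42));
    }
}
```

One trade-off of this approach: ordered iteration across all keys is lost, because each shard is only sorted internally.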

Log Structured Storage

Changes are buffered in memory, and once enough changes have accumulated,
they are written in one continuous disk write operation.

By default, changes are automatically written when more than a number of pages are modified,
and once every second in a background thread, even if only little data was changed.
Changes can also be written explicitly by calling commit().

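Changes are written automatically once more than a certain number of pages have been modified
A background thread writes once every second
Calling commit() writes explicitly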

Off-Heap and Pluggable Storage

An off-heap storage implementation is available.
This storage keeps the data in the off-heap memory, meaning outside of the regular garbage collected heap.
Memory is allocated using ByteBuffer.allocateDirect.
One chunk is allocated at a time (each chunk is usually a few MB large), so that allocation cost is low.

Off-heap memory is allocated with ByteBuffer.allocateDirect
One chunk is allocated at a time (a few MB each)
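
For readers who have not used direct buffers, this small sketch shows what "one chunk allocated with ByteBuffer.allocateDirect" looks like at the JDK level. The class OffHeapChunkSketch, the 2 MB chunk size and the length-prefixed record layout are assumptions for illustration; they do not describe the internal layout used by OffHeapStore.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Writes length-prefixed records into one direct (off-heap) buffer; not H2 code. */
class OffHeapChunkSketch {

    static final int CHUNK_SIZE = 2 * 1024 * 1024;  // "a few MB" (assumed value)

    /** One allocation per chunk keeps the number of allocateDirect calls low. */
    static ByteBuffer allocateChunk() {
        return ByteBuffer.allocateDirect(CHUNK_SIZE);
    }

    /** Appends a record and returns the position where it starts. */
    static int write(ByteBuffer chunk, byte[] data) {
        int pos = chunk.position();
        chunk.putInt(data.length);
        chunk.put(data);
        return pos;
    }

    /** Reads the record at pos through a duplicate, leaving the write position untouched. */
    static byte[] read(ByteBuffer chunk, int pos) {
        ByteBuffer view = chunk.duplicate();
        view.position(pos);
        byte[] out = new byte[view.getInt()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer chunk = allocateChunk();
        int p1 = write(chunk, "Hello".getBytes(StandardCharsets.UTF_8));
        int p2 = write(chunk, "World".getBytes(StandardCharsets.UTF_8));
        System.out.println(chunk.isDirect());
        System.out.println(new String(read(chunk, p1), StandardCharsets.UTF_8));
        System.out.println(new String(read(chunk, p2), StandardCharsets.UTF_8));
    }
}
```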

OffHeapStore offHeap = new OffHeapStore();
MVStore s = new MVStore.Builder().
    fileStore(offHeap).
    open();

File System Abstraction, File Locking and Online Backup

Each store may only be opened once within a JVM.
When opening a store, the file is locked in exclusive mode,
so that the file can only be changed from within one process.

Open mode: exclusive

The persisted data can be backed up at any time, even during write operations (online backup).
Online backup (hot backup)

To do that, automatic disk space reuse needs to be first disabled,
so that new data is always appended at the end of the file.
Then, the file can be copied. The file handle is available to the application.
It is recommended to use the utility class FileChannelInputStream to do this.

Encrypted Files

MVStoreTool
A tool to dump the contents of a file.

Exception Handling

This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed.
Unchecked exceptions are thrown:

IllegalStateException
IllegalArgumentException
UnsupportedOperationException
ConcurrentModificationException

Storage Engine for H2

For H2 version 1.4 and newer,
the MVStore is the default storage engine (supporting SQL, JDBC, transactions, MVCC, and so on).
For older versions, append ;MV_STORE=TRUE to the database URL.

H2 1.4 and newer use MVStore as the default storage engine
For older versions, append ;MV_STORE=TRUE to the connection URL

File Format

Metadata Map

In addition to the user maps,
there is one metadata map that contains names and positions of user maps, and chunk metadata.

Similar Projects and Differences to Other Storage Engines
Projects compared, and how they differ:

LevelDB, Kyoto Cabinet, BDB, SQLite 3, MapDB

Unlike similar storage engines like LevelDB and Kyoto Cabinet,
the MVStore is written in Java and can easily be embedded in a Java and Android application.

LevelDB, Kyoto Cabinet

The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java,
and is also a log structured storage, but the H2 license is more liberal.

Similar to BDB, but the H2 license is more liberal

Like SQLite 3, the MVStore keeps all data in one file.
Unlike SQLite 3, the MVStore is a log structured storage.

The API of the MVStore is similar to MapDB (previously known as JDBM),
and some code is shared between MVStore and MapDB.

Unlike MapDB, the MVStore is a log structured storage.
The MVStore does not have a record size limit.

MVStore
written in Java and can easily be embedded in a Java and Android application
keeps all data in one file
log structured storage
NO record size limit

The MVStore is included in the latest H2 jar file.
To build just the MVStore (without the database engine), run:
./build.sh jarMVStore

MVStore is included in the H2 jar and can also be built on its own (without the database engine)
