The MVStore is a persistent, log-structured key-value store.
It is used as the default storage subsystem of H2,
but it can also be used directly within an application, without using JDBC or SQL.
Note: MVStore is a persistent, log-structured KV store.
It can be used on its own, without depending on JDBC.
MVStore stands for "multi-version store".
Each store contains a number of maps that can be accessed using the java.util.Map interface.
Both file-based persistence and in-memory operation are supported.
It is intended to be fast, simple to use, and small.
Concurrent read and write operations are supported.
Transactions are supported (including concurrent transactions and 2-phase commit).
The tool is very modular.
It supports pluggable data types and serialization,
pluggable storage (to a file or to off-heap memory),
pluggable map implementations (currently B-tree, R-tree, and concurrent B-tree), BLOB storage,
and a file system abstraction that supports encrypted files and zip files.
import org.h2.mvstore.*;

// open the store (in-memory if fileName is null)
MVStore s = MVStore.open(fileName);
// create/get the map named "data"
MVMap<Integer, String> map = s.openMap("data");
// add and read some data
map.put(1, "Hello World");
System.out.println(map.get(1));
// close the store (this will persist changes)
s.close();
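Configuration options are set with the fluent MVStore.Builder, for example:
MVStore s = new MVStore.Builder().
        fileName(fileName).compress().open();
The available options are: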
autoCommitBufferSize: the size of the write buffer.
autoCommitDisabled: to disable auto-commit.
backgroundExceptionHandler: a handler for exceptions that could occur while writing in the background.
cacheSize: the cache size in MB.
compress: compress the data when storing using a fast algorithm (LZF).
compressHigh: compress the data when storing using a slower algorithm (Deflate).
encryptionKey: the key for file encryption.
fileName: the name of the file, for file based stores.
fileStore: the storage implementation to use.
pageSplitSize: the point where pages are split.
readOnly: open the file in read-only mode.
The MVRTreeMap is an R-tree implementation that supports fast spatial queries.
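An R-tree answers a spatial query by descending only into nodes whose bounding box overlaps the search box. The sketch below shows just that overlap test on 2D boxes, with a linear scan standing in for the tree. The `Box` and `SpatialScan` classes are hypothetical illustrations, not part of the MVRTreeMap API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 2D axis-aligned bounding box (not MVStore's own key class).
final class Box {
    final double minX, minY, maxX, maxY;

    Box(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    // Two boxes overlap unless one lies entirely to one side of the other.
    // Touching edges count as overlapping.
    boolean overlaps(Box o) {
        return minX <= o.maxX && o.minX <= maxX
            && minY <= o.maxY && o.minY <= maxY;
    }
}

// Hypothetical helper: a linear scan standing in for an R-tree. A real R-tree
// applies the same overlap test to internal-node boxes to skip whole subtrees.
final class SpatialScan {
    static List<Integer> search(List<Box> boxes, Box query) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < boxes.size(); i++) {
            if (boxes.get(i).overlaps(query)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

The tree structure only changes how many boxes are tested; the predicate itself stays the same.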
Each store contains a set of named maps.
A map is sorted by key and supports the common lookup operations,
including access to the first and last key and iteration over some or all keys.
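Note: a store contains multiple named maps; each map keeps its keys in sorted order and supports lookups,
access to the first and last key, and iteration over some or all keys.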
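Because an MVMap is sorted by key, the operations listed above behave like those of a java.util.NavigableMap. So that the sketch runs without H2, it uses java.util.TreeMap as a stand-in; MVMap offers similarly named methods such as firstKey() and lastKey(), but its exact API should be checked in its Javadoc. The `SortedMapDemo` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical demo using TreeMap as a stand-in for a sorted MVMap.
final class SortedMapDemo {
    // Builds a small map; keys are kept in ascending order regardless of insert order.
    static NavigableMap<Integer, String> sample() {
        NavigableMap<Integer, String> map = new TreeMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        map.put(40, "d");
        return map;
    }

    // Iterates over part of the key range: from (inclusive) to (exclusive).
    static List<Integer> keysInRange(NavigableMap<Integer, String> map, int from, int to) {
        return new ArrayList<>(map.subMap(from, true, to, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> map = sample();
        System.out.println(map.firstKey());            // 10
        System.out.println(map.lastKey());             // 40
        System.out.println(keysInRange(map, 15, 40));  // [20, 30]
        System.out.println(map.keySet());              // [10, 20, 30, 40]
    }
}
```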
A version is a snapshot of all the data of all maps at a given point in time.
Versions are implemented with copy-on-write (COW).
Old versions are readable. Rollback to an old version is supported.
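The toy class below illustrates the version model: each commit freezes a read-only snapshot, old versions stay readable, and rollback discards newer versions. For simplicity it copies the whole map at every commit, whereas copy-on-write in MVStore copies only the B-tree pages that change. `VersionedMap` and its methods are hypothetical names, not the MVStore API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy illustrating versions; not MVStore's implementation.
final class VersionedMap<K, V> {
    // committed.get(v) is the read-only snapshot of version v.
    private final List<Map<K, V>> committed = new ArrayList<>();
    // Uncommitted changes are applied to a private working copy.
    private Map<K, V> current = new HashMap<>();

    VersionedMap() {
        committed.add(Collections.<K, V>emptyMap()); // version 0: empty
    }

    void put(K key, V value) {
        current.put(key, value);
    }

    V get(K key) {
        return current.get(key);
    }

    // Freezes a full copy of the working map as a new version; returns its number.
    int commit() {
        committed.add(Collections.unmodifiableMap(new HashMap<>(current)));
        return committed.size() - 1;
    }

    // Old versions remain readable after newer commits.
    V getAtVersion(int version, K key) {
        return committed.get(version).get(key);
    }

    // Drops every version after 'version' plus any uncommitted changes.
    void rollbackTo(int version) {
        committed.subList(version + 1, committed.size()).clear();
        current = new HashMap<>(committed.get(version));
    }
}
```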
To support multiple concurrent open transactions, a transaction utility is included, the TransactionStore.
The tool supports "read committed" transaction isolation with savepoints, two-phase commit, and other features typically available in a database.
There is no limit on the size of a transaction (the log is written to disk for large or long running transactions).
Internally, this utility stores the old versions of changed entries in a separate map,
similar to a transaction log, except that entries of a closed transaction are removed,
and the log is usually not stored for short transactions.
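The sketch below shows the general undo-log idea described above: before a transaction first changes an entry, the old value is saved in a separate map, so rollback can restore it, and commit simply discards the log. It is a single-threaded illustration with hypothetical names; TransactionStore's real API, isolation handling, and on-disk log are not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical single-transaction toy; not TransactionStore's API.
// Assumes the data map never holds null values.
final class UndoLogTransaction<K, V> {
    private final Map<K, V> data;
    // Separate map with the value each key had before its first change;
    // Optional.empty() marks a key that did not exist before.
    private final Map<K, Optional<V>> undoLog = new HashMap<>();

    UndoLogTransaction(Map<K, V> data) {
        this.data = data;
    }

    void put(K key, V value) {
        // Only the first old value is logged; later changes keep the original.
        undoLog.putIfAbsent(key, Optional.ofNullable(data.get(key)));
        data.put(key, value);
    }

    // Commit: changes are already in place, so the log entries are removed.
    void commit() {
        undoLog.clear();
    }

    // Rollback: restore every logged key to its state before the transaction.
    void rollback() {
        for (Map.Entry<K, Optional<V>> e : undoLog.entrySet()) {
            if (e.getValue().isPresent()) {
                data.put(e.getKey(), e.getValue().get());
            } else {
                data.remove(e.getKey());
            }
        }
        undoLog.clear();
    }
}
```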
In-Memory Performance and Usage
Performance of in-memory operations is about 50% slower than java.util.TreeMap.
Pluggable Data Types
Serialization is pluggable.
The default serialization currently supports many common data types,
and uses Java serialization for other objects.
The following classes are currently directly supported: Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date and arrays (both primitive arrays and object arrays).
For serialized objects, the size estimate is adjusted using an exponential moving average.
Parameterized data types are supported (for example one could build a string data type that limits the length).
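As an illustration of a parameterized data type, here is a hypothetical string type whose maximum length is a constructor parameter. It serializes to a ByteBuffer as a length prefix followed by UTF-8 bytes. It is a standalone sketch and does not implement MVStore's own DataType interface, whose methods differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical data type: the length limit is the type's parameter.
final class BoundedStringType {
    private final int maxLength;

    BoundedStringType(int maxLength) {
        this.maxLength = maxLength;
    }

    // Rejects values longer than maxLength UTF-16 chars with an unchecked exception.
    void write(ByteBuffer buff, String s) {
        if (s.length() > maxLength) {
            throw new IllegalArgumentException(
                "length " + s.length() + " exceeds limit " + maxLength);
        }
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buff.putInt(bytes.length); // length prefix, in bytes
        buff.put(bytes);
    }

    String read(ByteBuffer buff) {
        byte[] bytes = new byte[buff.getInt()];
        buff.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```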
The storage engine itself does not have any length limits,
so that keys, values, pages, and chunks can be very big (as big as fits in memory).
There is a mechanism that stores large binary objects by splitting them into smaller blocks.
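A minimal sketch of the splitting idea: a large byte array is cut into fixed-size blocks, each of which could be stored as a separate map entry, and the blocks are concatenated again on read. The block size and the `BlockSplitter` class are hypothetical; H2 ships its own implementation of this mechanism, which is not reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for splitting and reassembling large binary values.
final class BlockSplitter {
    // Splits data into blocks of at most blockSize bytes; the last may be shorter.
    static List<byte[]> split(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int end = Math.min(off + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, off, end));
        }
        return blocks;
    }

    // Concatenates the blocks back into the original byte array.
    static byte[] join(List<byte[]> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] block : blocks) {
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }
}
```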
Concurrent Operations and Caching
Concurrent reads and writes are supported.
Write operations first read the relevant pages from disk to memory (this can happen concurrently), and only then modify the data.
The in-memory parts of write operations are synchronized.
Caching is done on the page level.
The page cache is a concurrent LIRS cache, which should be resistant against scan operations.
For fully scalable concurrent write operations to a map (in-memory and to disk),
the map could be split into multiple maps in different stores ('sharding').
The plan is to add such a mechanism later when needed.
Note: multiple stores = sharding.
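The sharding idea can be sketched as routing each key to one of several independent maps by its hash, so writes to different shards do not contend. The text describes this as a future plan, so the `ShardedMap` class below is purely hypothetical, with plain ConcurrentHashMaps standing in for maps that would live in separate stores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sharded map: each shard stands in for a map in its own store.
final class ShardedMap<K, V> {
    private final List<Map<K, V>> shards = new ArrayList<>();

    ShardedMap(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ConcurrentHashMap<>());
        }
    }

    // Chooses a shard from the key's hash; floorMod keeps the index non-negative.
    int shardIndex(K key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(K key, V value) {
        shards.get(shardIndex(key)).put(key, value);
    }

    V get(K key) {
        return shards.get(shardIndex(key)).get(key);
    }
}
```

Hash routing spreads writes evenly but gives up a single global key order; routing by key range would keep the order at the cost of possible hot spots.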
Log Structured Storage
Changes are buffered in memory, and once enough changes have accumulated,
they are written in one continuous disk write operation.
By default, changes are written automatically when more than a certain number of pages have been modified,
and also once every second by a background thread, even if only a little data has changed.
Changes can also be written explicitly by calling commit().
Note: call commit() to write changes explicitly.
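A minimal sketch of the buffering policy described above: changes accumulate in memory and are appended to the end of a log in one write, either when the buffer reaches a threshold or when commit() is called. The threshold, the record format, and the `LogBuffer` class are hypothetical; an in-memory ByteArrayOutputStream stands in for the file, and the once-per-second background write is not modeled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical write buffer in front of an append-only log.
final class LogBuffer {
    private final int threshold;                                   // bytes buffered before auto-write
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final ByteArrayOutputStream log = new ByteArrayOutputStream(); // stands in for the file
    private int writeCount;                                        // number of append operations

    LogBuffer(int threshold) {
        this.threshold = threshold;
    }

    // Buffers one change; writes automatically once enough has accumulated.
    void put(String key, String value) {
        byte[] record = (key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8);
        pending.write(record, 0, record.length);
        if (pending.size() >= threshold) {
            commit();
        }
    }

    // Appends all buffered changes to the end of the log in one contiguous write.
    void commit() {
        if (pending.size() == 0) {
            return;
        }
        byte[] batch = pending.toByteArray();
        log.write(batch, 0, batch.length);
        pending.reset();
        writeCount++;
    }

    int writeCount() {
        return writeCount;
    }

    String logContents() {
        return new String(log.toByteArray(), StandardCharsets.UTF_8);
    }
}
```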
Off-Heap and Pluggable Storage
An off-heap storage implementation is available.
This storage keeps the data in the off-heap memory, meaning outside of the regular garbage collected heap.
Memory is allocated using ByteBuffer.allocateDirect.
One chunk is allocated at a time (each chunk is usually a few MB large), so that allocation cost is low.
Note: off-heap memory is allocated with ByteBuffer.allocateDirect,
one chunk (a few MB) at a time.
OffHeapStore offHeap = new OffHeapStore();
MVStore s = new MVStore.Builder().
        fileStore(offHeap).open();
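To illustrate allocating one chunk at a time, the sketch below keeps a list of direct ByteBuffers obtained with ByteBuffer.allocateDirect and appends records into the current chunk until it is full. The chunk size and the `OffHeapArena` class are hypothetical; H2's OffHeapStore is not reproduced here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only off-heap arena, allocated chunk by chunk.
final class OffHeapArena {
    private final int chunkSize;
    private final List<ByteBuffer> chunks = new ArrayList<>();

    OffHeapArena(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Copies data into off-heap memory, allocating a new direct chunk only
    // when the current one lacks space. Returns the index of the chunk used.
    int append(byte[] data) {
        if (data.length > chunkSize) {
            throw new IllegalArgumentException("record larger than chunk size");
        }
        if (chunks.isEmpty() || chunks.get(chunks.size() - 1).remaining() < data.length) {
            // Direct buffers live outside the garbage-collected heap.
            chunks.add(ByteBuffer.allocateDirect(chunkSize));
        }
        chunks.get(chunks.size() - 1).put(data);
        return chunks.size() - 1;
    }

    int chunkCount() {
        return chunks.size();
    }

    boolean isDirect(int chunk) {
        return chunks.get(chunk).isDirect();
    }
}
```

Allocating a few large chunks keeps the number of allocateDirect calls low, which matters because direct buffers typically have higher allocation costs than heap buffers.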
File System Abstraction, File Locking and Online Backup
Each store may only be opened once within a JVM.
When opening a store, the file is locked in exclusive mode,
so that the file can only be changed from within one process.
Note: the file is opened in exclusive (locked) mode.
The persisted data can be backed up at any time, even during write operations (online backup).
To do that, automatic disk space reuse needs to be first disabled,
so that new data is always appended at the end of the file.
Then, the file can be copied. The file handle is available to the application.
It is recommended to use the utility class FileChannelInputStream to do this.
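As a stdlib stand-in for the FileChannelInputStream utility, the sketch below copies a file through NIO channels with FileChannel.transferTo. The `FileBackup` class is hypothetical; it reads the file size once at the start and does not itself disable space reuse or coordinate with a running store.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical backup helper using only the JDK.
final class FileBackup {
    // Copies source to target; returns the number of bytes copied.
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            // The size is read once; data appended afterwards is not copied.
            long length = in.size();
            long position = 0;
            // transferTo may copy fewer bytes than requested, so loop.
            while (position < length) {
                long n = in.transferTo(position, length - position, out);
                if (n <= 0) {
                    break; // file shrank unexpectedly
                }
                position += n;
            }
            return position;
        }
    }
}
```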
There is a tool, the MVStoreTool, to dump the contents of a file.
This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed.
Note: only unchecked exceptions are thrown.
Storage Engine for H2
For H2 version 1.4 and newer,
the MVStore is the default storage engine (supporting SQL, JDBC, transactions, MVCC, and so on).
For older versions, append ;MV_STORE=TRUE to the database URL.
Note: H2 1.4 and later use MVStore as the default storage engine;
for older versions, append ;MV_STORE=TRUE to the connection URL.
In addition to the user maps,
there is one metadata map that contains names and positions of user maps, and chunk metadata.
Similar Projects and Differences to Other Storage Engines
Compared with: LevelDB, Kyoto Cabinet, Berkeley DB (BDB), SQLite 3, MapDB.
Unlike similar storage engines such as LevelDB and Kyoto Cabinet,
the MVStore is written in Java and can easily be embedded in Java and Android applications.
The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java
and also uses log-structured storage, but the H2 license is more liberal.
Like SQLite 3, the MVStore keeps all data in one file.
Unlike SQLite 3, the MVStore uses log-structured storage.
The API of the MVStore is similar to MapDB (previously known as JDBM),
and some code is shared between MVStore and MapDB.
Unlike MapDB, the MVStore uses log-structured storage.
The MVStore does not have a record size limit.
In summary: written in Java and easily embedded in Java and Android applications;
keeps all data in one file;
uses log-structured storage;
no record size limit.
The MVStore is included in the latest H2 jar file.
To build just the MVStore (without the database engine), run:
Note: the MVStore is included in the H2 jar and can also be built on its own (without the database engine).
H2 MVStore Log Structured Storage