H2 MVStore Log Structured Storage
Internally, changes are buffered in memory,
and once enough changes have accumulated, they are written in one continuous disk write operation.
Compared to traditional database storage engines,
this should improve write performance for file systems and storage systems
that do not efficiently support small random writes, such as Btrfs, as well as SSDs.
(According to a test, write throughput of a common SSD increases with write block size,
until a block size of 2 MB, and then does not further increase.)
Improves random write performance
Btrfs uses copy-on-write (COW) to guarantee file system consistency
Write throughput test: a write block size of 2 MB is optimal
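As a rough illustration of this buffered-write model, the sketch below (file name and key/value contents are made up for the example) puts a batch of entries into an MVMap and then persists them; until the commit, the changes only live in memory.

```java
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class BufferedWriteSketch {
    public static void main(String[] args) {
        // "example.mv" is just a placeholder file name for this sketch.
        MVStore store = MVStore.open("example.mv");
        MVMap<Integer, String> map = store.openMap("data");
        // These puts are buffered in memory; nothing is forced to disk yet.
        for (int i = 0; i < 1_000; i++) {
            map.put(i, "value-" + i);
        }
        // Persisting the accumulated changes appends them to the file in one
        // continuous write (a new chunk), rather than as many small writes.
        store.commit();
        store.close();
    }
}
```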
By default, changes are written automatically when more than a certain number of pages have been modified,
and once every second in a background thread, even if only a little data was changed.
Changes can also be written explicitly by calling commit().
Changes are written automatically once more than a certain number of pages have been modified
A background thread writes once per second
Changes can also be written by calling commit()
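A minimal sketch of this commit behavior, assuming the MVStore.Builder API and setAutoCommitDelay(millis); the file name is a placeholder:

```java
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class CommitBehaviorSketch {
    public static void main(String[] args) {
        MVStore store = new MVStore.Builder()
                .fileName("autocommit.mv")   // placeholder file name
                .open();
        // Background write delay in milliseconds (1000 ms matches the
        // once-per-second behavior described above).
        store.setAutoCommitDelay(1000);
        MVMap<String, String> map = store.openMap("settings");
        map.put("theme", "dark");
        // Either wait for the automatic write (page threshold or background
        // thread), or persist the pending changes explicitly:
        store.commit();
        store.close();
    }
}
```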
When storing, all changed pages are serialized,
optionally compressed using the LZF algorithm,
and written sequentially to a free area of the file.
Each such change set is called a chunk.
All parent pages of the changed B-trees are stored in this chunk as well,
so that each chunk also contains the root of each changed map
(which is the entry point for reading this version of the data).
Written sequentially, with optional compression using the LZF algorithm
Each such change set is called a chunk
Each chunk also contains the root of each changed map
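The optional LZF compression is enabled per store; a sketch assuming the compress() option on MVStore.Builder (file name made up):

```java
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class CompressedChunkSketch {
    public static void main(String[] args) {
        // compress() asks the store to compress pages before they are
        // written sequentially into a chunk.
        MVStore store = new MVStore.Builder()
                .fileName("compressed.mv")   // placeholder file name
                .compress()
                .open();
        MVMap<Long, String> map = store.openMap("events");
        map.put(1L, "a fairly repetitive, compressible value value value");
        store.commit();                       // serializes and writes one chunk
        store.close();
    }
}
```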
There is no separate index: all data is stored as a list of pages.
Per store, there is one additional map that contains the metadata
(the list of maps, where the root page of each map is stored, and the list of chunks).
No separate index: all data is stored as a list of pages
One additional map per store holds the metadata
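The metadata map itself is internal, but parts of it are exposed through the store API; the sketch below assumes getMapNames() is available in the H2 version in use (file and map names are made up):

```java
import java.util.Set;
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class MetadataSketch {
    public static void main(String[] args) {
        MVStore store = MVStore.open("meta.mv");   // placeholder file name
        MVMap<String, String> users = store.openMap("users");
        users.put("alice", "admin");
        store.commit();
        // The list of maps is part of the per-store metadata; here we only
        // read the map names back through the public API.
        Set<String> names = store.getMapNames();
        System.out.println("maps in this store: " + names);
        store.close();
    }
}
```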
There are usually two write operations per chunk:
one to store the chunk data (the pages),
and one to update the file header (so it points to the latest chunk).
If the chunk is appended at the end of the file,
the file header is only written at the end of the chunk.
There is no transaction log, no undo log, and there are no in-place updates (however, unused chunks are overwritten by default).
Two write operations per chunk
No transaction log, no undo log, no in-place updates (unused chunks are overwritten by default)
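Because there are no in-place updates, overwriting existing keys still appends new chunks; a small sketch (placeholder file name) that makes this visible by printing the file size after each commit:

```java
import java.io.File;
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class AppendOnlySketch {
    public static void main(String[] args) {
        String fileName = "append.mv";        // placeholder file name
        MVStore store = MVStore.open(fileName);
        MVMap<Integer, String> map = store.openMap("data");
        // Each commit appends a new chunk instead of updating pages in place,
        // so the file typically grows even when existing keys are overwritten.
        for (int round = 0; round < 3; round++) {
            for (int i = 0; i < 100; i++) {
                map.put(i, "round-" + round + "-value-" + i);
            }
            store.commit();
            System.out.println("file size after commit " + round + ": "
                    + new File(fileName).length());
        }
        store.close();
    }
}
```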
Old data is kept for at least 45 seconds (configurable),
so that there are no explicit sync operations required to guarantee data consistency.
An application can also sync explicitly when needed.
To reuse disk space, the chunks with the lowest amount of live data are compacted
(the live data is stored again in the next chunk).
To improve data locality and disk space usage,
the plan is to automatically defragment and compact data.
Chunks with the least live data are compacted to reclaim disk space
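A sketch of the knobs mentioned above, assuming setRetentionTime(millis), sync(), and the offline MVStoreTool.compact(fileName, compress) helper are available (file name is a placeholder):

```java
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;
import org.h2.mvstore.MVStoreTool;

public class RetentionAndCompactSketch {
    public static void main(String[] args) {
        String fileName = "retain.mv";              // placeholder file name
        MVStore store = MVStore.open(fileName);
        // Keep old chunks for 45 seconds (the default); a higher value keeps
        // older versions readable longer, a lower one frees space sooner.
        store.setRetentionTime(45_000);
        MVMap<Integer, String> map = store.openMap("data");
        map.put(1, "value");
        store.commit();
        // An explicit sync when the application needs the data on disk now.
        store.sync();
        store.close();
        // Offline compaction: rewrites the file so that live data is packed
        // together; the second argument enables compression.
        MVStoreTool.compact(fileName, true);
    }
}
```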
Compared to traditional storage engines (that use a transaction log, undo log, and main storage area),
the log structured storage is simpler, more flexible, and typically needs fewer disk operations per change,
as data is only written once instead of two or three times,
and because the B-tree pages are always full (they are stored next to each other) and can be easily compressed.
But temporarily, disk space usage might actually be a bit higher than for a regular database,
as disk space is not immediately re-used (there are no in-place updates).