Buffering and Caching in Aerospike

current write block 

When a record is written and needs to be stored on disk, 
it is put into an in-RAM buffer that holds the current write block, 
the write block that asd is currently filling up. 
When the current write block is full, it is persisted to disk and asd starts a new write block.
Thus, all writes to a write block are coalesced into a single device write of write-block-size bytes.
This keeps write IOPS low.

The current write block is a write buffer: once it is full, it is written to disk and a new write block is created.
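The coalescing effect can be illustrated with a toy Python model. `CurrentWriteBlock` is a hypothetical class, not Aerospike's implementation. The 128 KiB block size and the 1000-byte records are assumed values chosen only for the example, and the sketch ignores the per-record metadata and padding that real write blocks carry.

```python
class CurrentWriteBlock:
    """Toy model of an in-RAM current write block (hypothetical, not asd code)."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.buffer = bytearray()  # records not yet persisted (RAM only)
        self.device_writes = 0     # number of block-sized device writes issued

    def write_record(self, record: bytes) -> None:
        if len(record) > self.block_size:
            raise ValueError("record larger than a write block")
        # If the record does not fit, persist the full block and start a new one.
        if len(self.buffer) + len(record) > self.block_size:
            self._persist()
        self.buffer += record

    def _persist(self) -> None:
        # Stand-in for one write-block-size device write.
        self.device_writes += 1
        self.buffer = bytearray()


blk = CurrentWriteBlock(block_size=128 * 1024)  # assumed block size
for _ in range(10_000):
    blk.write_record(b"x" * 1000)                # assumed record size
print(blk.device_writes, len(blk.buffer))
```

In this run, 10,000 record writes become 76 block-sized device writes. The last 44 records are still held only in RAM, which is the first-layer loss window that flush-max-ms bounds.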

post-write queue 

The most recently persisted write blocks are kept in RAM after being written to disk. 
The idea is that subsequent reads are more likely to hit recently written records than older records. 
XDR, in particular, tends to read recently written records.
If such a record is read, its data can be retrieved from the write block in the post-write-queue 
that contains the record’s data rather than having to be retrieved from the device.



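Caches recently written data so that reads of that data are faster: they are served directly from the queue.

If data-in-memory is set to true, there is no need to enable this feature.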
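Here is a minimal sketch of the post-write queue idea. It assumes a fixed capacity counted in write blocks and a caller-supplied device read function. `PostWriteQueue` and `read_device` are hypothetical names. The sketch only shows the lookup order (RAM first, device second), not how asd sizes or indexes the queue.

```python
from collections import OrderedDict


class PostWriteQueue:
    """Toy model: keep the most recently persisted write blocks in RAM."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> block bytes, oldest first

    def on_block_persisted(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # forget the oldest persisted block

    def read(self, block_id: int, offset: int, length: int, read_device) -> bytes:
        block = self.blocks.get(block_id)
        if block is not None:
            return block[offset:offset + length]       # served from RAM
        return read_device(block_id, offset, length)   # fall back to the device


device_reads = []


def read_device(block_id: int, offset: int, length: int) -> bytes:
    device_reads.append(block_id)  # record that the device was hit
    return b"?" * length           # placeholder contents


pwq = PostWriteQueue(capacity=2)
for i in range(3):
    pwq.on_block_persisted(i, bytes([ord("A") + i]) * 16)  # blocks of A, B, C

hit = pwq.read(2, 0, 4, read_device)   # block 2 is still queued
miss = pwq.read(0, 0, 4, read_device)  # block 0 was pushed out of the queue
print(hit, miss, device_reads)
```

The queue is ordered by write time rather than by read time because its purpose is to keep the most recently *written* blocks around.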

page cache 
In general, the Linux kernel performs all device I/O through the page cache.
When a process writes data, the data goes to the page cache, 
i.e., to RAM, and the Linux kernel takes care of writing it to the device asynchronously at a later point in time. 
The page cache operates with a granularity of 4-KiB pages. 
So, if two small writes hit the same 4-KiB page in rapid succession, 
these two writes will be coalesced into one 4-KiB write later, 
when the Linux kernel decides to asynchronously write the page to the underlying device. 
Between the moment data in a page is modified (in RAM)
and the moment the page is actually written to disk, the page is said to be dirty.



A modified page that has not yet been written to the underlying device is a dirty page.
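The write coalescing described above can be modeled in a few lines of Python. This is a simplified sketch that assumes writeback happens all at once when `writeback()` is called. The real kernel flushes dirty pages on its own schedule. `PageCacheWrites` is a hypothetical name.

```python
PAGE_SIZE = 4096  # page cache granularity (4 KiB)


class PageCacheWrites:
    """Toy model of write-back through the page cache."""

    def __init__(self) -> None:
        self.dirty_pages: set[int] = set()  # page numbers modified in RAM only
        self.device_page_writes = 0

    def write(self, offset: int, length: int) -> None:
        if length <= 0:
            return
        # Mark every 4-KiB page touched by [offset, offset + length) as dirty.
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        self.dirty_pages.update(range(first, last + 1))

    def writeback(self) -> None:
        # Each dirty page is written once, however many times it was modified.
        self.device_page_writes += len(self.dirty_pages)
        self.dirty_pages.clear()


pc = PageCacheWrites()
pc.write(0, 100)     # dirties page 0
pc.write(200, 100)   # page 0 again: coalesced with the first write
pc.write(4000, 200)  # straddles pages 0 and 1
pc.writeback()
print(pc.device_page_writes)
```

The code makes three small writes, but only two 4-KiB page writes reach the device.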

Reads also go through the page cache. 
When a process reads data from a device, 
the Linux kernel reads the 4-KiB pages that hold the data into the page cache, i.e., to RAM. 
From there, the Linux kernel copies the data to the read buffer provided by the process.


The page cache uses least-recently-used eviction. 
The pages that contain data that was recently read or written are thus kept in RAM, 
so that subsequent reads hopefully won’t have to go to the underlying device, 
but will find their data already in the page cache.

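LRU eviction.
Recently read or written pages stay in RAM, so subsequent reads don't go to the underlying device but are served directly from the page cache.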
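Read caching with least-recently-used eviction can be sketched with `collections.OrderedDict`. `LruPageCache` is a hypothetical name. The capacity of two pages and the fake device that returns pages filled with their page number are assumptions for the example. The real page cache is sized dynamically by the kernel.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


class LruPageCache:
    """Toy read-through page cache with LRU eviction."""

    def __init__(self, capacity_pages: int, read_page_from_device) -> None:
        self.capacity = capacity_pages
        self.read_page_from_device = read_page_from_device
        self.pages = OrderedDict()  # page number -> page bytes, least recent first
        self.device_reads = 0

    def _get_page(self, page_no: int) -> bytes:
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.device_reads += 1
        data = self.read_page_from_device(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        return data

    def read(self, offset: int, length: int) -> bytes:
        # Load whole 4-KiB pages, then copy the requested bytes for the caller.
        first, last = offset // PAGE_SIZE, (offset + length - 1) // PAGE_SIZE
        buf = bytearray()
        for page_no in range(first, last + 1):
            buf += self._get_page(page_no)
        start = offset - first * PAGE_SIZE
        return bytes(buf[start:start + length])


cache = LruPageCache(2, lambda n: bytes([n % 256]) * PAGE_SIZE)
cache.read(0, 10)              # miss: page 0
cache.read(100, 10)            # hit: page 0
second = cache.read(4096, 10)  # miss: page 1
cache.read(8192, 10)           # miss: page 2, evicts page 0 (least recent)
cache.read(0, 10)              # miss again: page 0 was evicted
print(cache.device_reads)
```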


The page cache’s lifetime is bounded by the Linux kernel’s lifetime. 
The data will be safe even if a process crashes after writing data to the page cache 
but before the Linux kernel actually writes the data to disk. 
The page cache is system-wide and not bound to a process. 
The page cache only loses data when the kernel panics 
or if there is a sudden power loss, in other terms, 
when the kernel doesn’t get cleanly shut down. 
A clean OS shutdown will write all dirty pages to disk.


hardware caches 
Disk devices and controllers can also contain caches. 
Again, the idea is to coalesce writes and to keep recently read or written data in RAM. 
The difference is just that for hardware caches, the RAM sits on the disk device or the controller. 
How exactly these caches work and which guarantees they come with differs from device to device. 
Sometimes the cache of a device can also be configured,
i.e., you can select from a set of different behaviors.
Some caches are battery-backed,
so that a sudden power loss does not cause data loss; others aren't.




Cache hierarchy:
The cache hierarchy consists of the following three layers:

the current write block
the page cache
the hardware caches


All three of them can delay the persistence of written record data, 
temporarily keeping the data in RAM, 
where it can be affected by unexpected events such as a sudden loss of power. 
The post-write-queue doesn’t factor into this, 
as it only keeps already written data around, 
but doesn’t delay the data on its way to persistent storage.

The post-write-queue does not delay persistence; it only caches the most recently written data to improve read performance.

These three buffers and caches form three layers of a hierarchy that written data moves through.

As the first layer, asd keeps data in the current write block
(unless commit-to-device is set to true).

When asd decides to actually persist a write block, 
the page cache comes into play as a second layer and may further delay persistence.

Once the Linux kernel decides to write the data from the page cache to the underlying device, 
the hardware caches come into play as a third layer and may further delay persistence.

Finally, the device’s firmware will decide to move the data from the hardware cache to persistent media. 
Only then will the data be safe from any unexpected events such as a sudden power loss.



Aerospike server version 4.3.1 and above:

By default, reads from and writes to devices use O_DIRECT and O_DSYNC. 
This bypasses the latter two layers in the three-layer cache hierarchy, 
the page cache and the hardware caches. 
However, data can still be lost in the first layer, the current write block.
This buffer loss window is bounded, though,
by how often asd writes partial write blocks to the underlying device.
Refer to the flush-max-ms configuration directive.
Therefore, the buffer loss window (in case of a crash or power loss) can be quantified and controlled.


By default, reads from and writes to files use neither O_DIRECT nor O_DSYNC.
Therefore, caching applies to both. All reads are cached in the page cache,
and for writes, data can theoretically be lost in the page cache or in the hardware caches.
As with devices, data can also be lost in the first layer. 
However, while the loss window in the first layer is bounded 
(refer to the flush-max-ms configuration directive), 
there are no definitive bounds regarding the page cache or the hardware caches. 
So, this default has slightly worse guarantees than the default for devices. 
As mentioned previously in this article, 
though, data loss in the second and third layers requires a kernel panic or a sudden power loss.
If asd crashes, the data will be preserved in these layers, 
even with these somewhat lesser guarantees.


Note the difference between devices and files: with devices, the latter two cache layers are bypassed by default.

The direct-files configuration directive enables O_DIRECT and O_DSYNC for files,
i.e., it brings the parameters for files in line with the defaults for devices.
Reads and writes now bypass the page cache and the hardware caches.
Data can still be lost in the first layer, though, just as with devices;
the flush-max-ms directive addresses this.

Enables O_DIRECT and O_DSYNC for files, matching the default for devices.
Reads and writes bypass the page cache and the hardware caches.
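The open flags implied by these rules can be written down with Python's `os` constants. `storage_open_flags` is a hypothetical helper that only encodes the rules stated here: devices always get both flags, and files get them only with direct-files. It is not taken from asd's source. `O_DIRECT` is Linux-specific, so the sketch falls back to 0 where the constant is missing.

```python
import os

# O_DIRECT is Linux-specific; use 0 where the platform lacks it.
O_DIRECT = getattr(os, "O_DIRECT", 0)  # bypass the page cache
O_DSYNC = getattr(os, "O_DSYNC", 0)    # return from write() only once data is durable
BYPASS = O_DIRECT | O_DSYNC


def storage_open_flags(is_device: bool, direct_files: bool = False) -> int:
    """Flags for opening a storage device or file, per the rules above."""
    flags = os.O_RDWR
    if is_device or direct_files:
        flags |= BYPASS
    return flags


for label, is_device, direct in [("device", True, False),
                                 ("file", False, False),
                                 ("file + direct-files", False, True)]:
    flags = storage_open_flags(is_device, direct)
    print(label, "bypasses caches:", BYPASS != 0 and (flags & BYPASS) == BYPASS)
```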


The flush-max-ms directive configures the interval (in milliseconds) at which asd writes a
partially filled current write block to the device.
This bounds the buffer loss window in layer one of the cache hierarchy,
the current write block, to the given interval.
Note that this only applies to the first layer of the cache hierarchy.
If files are used without direct-files,
caching still happens in the page cache and in the hardware caches.

Only affects the first cache layer: the flush interval of the write buffer, in milliseconds.
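A small time-stepped simulation shows what flush-max-ms bounds. It assumes one record every 10 ms and a periodic check that flushes the partial block once its oldest unflushed record is `flush_max_ms` old. Both the `TimedWriteBlock` class and this exact trigger rule are simplifications for illustration, not asd's actual timer logic.

```python
class TimedWriteBlock:
    """Toy model: a partially filled write block is flushed once its oldest
    unflushed record has waited flush_max_ms (hypothetical trigger rule)."""

    def __init__(self, flush_max_ms: int) -> None:
        self.flush_max_ms = flush_max_ms
        self.unflushed: list[tuple[int, bytes]] = []  # (write time in ms, record)
        self.partial_flushes = 0

    def write(self, now_ms: int, record: bytes) -> None:
        self.unflushed.append((now_ms, record))

    def tick(self, now_ms: int) -> None:
        # Periodic check: flush the partial block if its oldest data is old enough.
        if self.unflushed and now_ms - self.unflushed[0][0] >= self.flush_max_ms:
            self.partial_flushes += 1
            self.unflushed.clear()

    def at_risk(self) -> int:
        # Records that exist only in RAM and would be lost on a crash right now.
        return len(self.unflushed)


wb = TimedWriteBlock(flush_max_ms=1000)
worst = 0
for t in range(0, 5000, 10):  # one record every 10 ms for 5 s
    wb.write(t, b"r")
    worst = max(worst, wb.at_risk())
    wb.tick(t)
print(worst, wb.partial_flushes)
```

With a 1000 ms interval, at most about one second's worth of records (101 in this run) is ever held only in RAM, regardless of how slowly the block fills.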

The commit-to-device configuration directive takes flush-max-ms to its logical conclusion:
asd synchronously writes record data to the underlying device during a write transaction.
In contrast to flush-max-ms, though, this affects all three layers of the cache hierarchy:
if O_DIRECT and O_DSYNC aren't enabled yet, this will enable them.
For devices, O_DIRECT and O_DSYNC are enabled by default,
so this aspect of commit-to-device only applies to files.
This also means that the direct-files directive is not needed
when using commit-to-device with files.
In any case, this configuration directive disables caching in all three layers of the hierarchy.

Setting it to true disables caching in all layers.
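The combined effect of the write-side directives can be summarized as a small lookup function. `layers_at_risk` is a hypothetical helper that encodes only the rules stated in this article. It lists the layers in which written record data may still be lost on a sudden power loss.

```python
def layers_at_risk(is_device: bool, direct_files: bool = False,
                   commit_to_device: bool = False) -> list[str]:
    """Cache layers that may still hold not-yet-persisted record data."""
    if commit_to_device:
        # Record data is written synchronously, with O_DIRECT and O_DSYNC.
        return []
    # The current write block is always buffered; flush-max-ms bounds its window.
    layers = ["current write block"]
    if not (is_device or direct_files):
        # Plain files without direct-files: the page cache and hardware caches apply.
        layers += ["page cache", "hardware caches"]
    return layers


for args in [(True,), (False,), (False, True), (False, False, True)]:
    print(args, layers_at_risk(*args))
```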

The read-page-cache configuration directive removes O_DIRECT and O_DSYNC for record reads done by transactions.
This means that the read data will not only go to asd, but also into the page cache.
If the same record gets read again by a subsequent transaction,
it will not need to go to the device to get it,
but the read will be satisfied from the page cache.
This configuration directive doesn't change anything about writes,
i.e., it doesn't affect any write guarantees established
by the above configuration options, such as commit-to-device.
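The read-page-cache rule can be sketched the same way. `io_flags` is a hypothetical helper encoding the behavior stated above: with read-page-cache enabled, transaction reads drop O_DIRECT and O_DSYNC, and the write flags stay exactly as they were.

```python
import os

O_DIRECT = getattr(os, "O_DIRECT", 0)  # Linux-specific; 0 where unavailable
O_DSYNC = getattr(os, "O_DSYNC", 0)
BYPASS = O_DIRECT | O_DSYNC


def io_flags(is_device: bool, direct_files: bool = False,
             read_page_cache: bool = False) -> tuple[int, int]:
    """Return (transaction read flags, write flags)."""
    bypass = BYPASS if (is_device or direct_files) else 0
    write_flags = os.O_WRONLY | bypass
    # read-page-cache only changes reads: they go through the page cache again.
    read_flags = os.O_RDONLY | (0 if read_page_cache else bypass)
    return read_flags, write_flags


reads_direct, writes_direct = io_flags(is_device=True)
reads_cached, writes_cached = io_flags(is_device=True, read_page_cache=True)
print("write flags unchanged:", writes_direct == writes_cached)
```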


So, if reads are cached in the page cache, but writes bypass the page cache, 
won’t reads potentially read stale cached data? 
This is not an issue, because the Linux kernel guarantees page cache coherence. 
Even though in our scenario writes aren’t cached, 
a write of some data invalidates a cached copy of that data in the page cache. 
Therefore, writes don’t overwrite data in the page cache, 
but they do invalidate it, if needed. 
A subsequent read of the written data is thus forced to hit the device again 
and read the fresh data, not a stale version of it.
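Cache coherence in this scenario can be illustrated with a toy model. A cached read fills a per-page cache, and a direct write updates the device and drops any cached copy. `CoherentPageCache` is a hypothetical name. The sketch ignores partial-page writes and the concurrency that the real kernel has to handle.

```python
PAGE_SIZE = 4096


class CoherentPageCache:
    """Toy model: cached reads plus cache-bypassing writes that invalidate."""

    def __init__(self) -> None:
        self.device: dict[int, bytes] = {}  # page number -> on-device contents
        self.cache: dict[int, bytes] = {}   # page number -> cached copy
        self.device_reads = 0

    def cached_read(self, page_no: int) -> bytes:
        # Read through the page cache, as with read-page-cache enabled.
        if page_no not in self.cache:
            self.device_reads += 1
            self.cache[page_no] = self.device.get(page_no, bytes(PAGE_SIZE))
        return self.cache[page_no]

    def direct_write(self, page_no: int, data: bytes) -> None:
        # The write bypasses the cache but invalidates any cached copy.
        self.device[page_no] = data
        self.cache.pop(page_no, None)


c = CoherentPageCache()
c.direct_write(7, b"v1")
c.cached_read(7)          # miss: fetched from the device
c.cached_read(7)          # hit: served from the cache
c.direct_write(7, b"v2")  # invalidates the cached v1
fresh = c.cached_read(7)  # miss again: reads the new data
print(fresh, c.device_reads)
```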



