Handling Aerospike write failures: queue too deep
Warning: write fail: queue too deep

The namespace uses storage-engine device.
WARNING (drv_ssd): (drv_ssd.c:4153) {test} write fail: queue too deep: exceeds max 256
WARNING (drv_ssd): (drv_ssd.c:4153) (repeated:1096) {test} write fail: queue too deep: exceeds max 256

com.aerospike.client.AerospikeException: Error Code 18: Device overload.

com.aerospike.client.AerospikeException: Error Code 14: Hot key 

This error indicates that, although the disks themselves are not necessarily faulty or nearing end of life, 
they are not keeping up with the load placed upon them. 
When Aerospike writes, it returns success to the client 
once the record has been written to storage write blocks, which are flushed to disk asynchronously. 
If the storage device is not able to keep up with the write load, 
the storage write blocks are cached in the write cache. 
When this cache is full, the error above is reported. 
Because the blocks are flushed asynchronously and queue up in the cache, 
the errors may appear suddenly. 
In this scenario latencies may be visible in the reads histogram, 
as reads involve disk access in most circumstances 
(excluding data in memory and reads served from the post-write queue).

current write block 



Cause and Solution

1 Check whether the write load/client traffic has increased



2 Figure out the device and the namespace that are impacted

The following log line shows 
how many storage write blocks are currently in use and the trend of the write queue (write-q). 
If the issue is happening on only one device, 
this may point to a hot-key situation.

INFO (drv_ssd): (drv_ssd.c:2115) {test} /dev/xvdb: used-bytes 1626239616 free-wblocks 28505 write-q 39 write (8203,23.0) defrag-q 0 defrag-read (7981,21.7) defrag-write (1490,3.0) shadow-write-q 0
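
To follow write-q per device over time, the log line above can be parsed with a small script. The following is a minimal sketch, assuming the field layout matches the sample line exactly (other server versions may format it differently); the function name parse_device_line is made up for illustration.

```python
import re

# Assumption: the per-device line has the same layout as the sample above.
DEVICE_LINE = re.compile(
    r"\{(?P<namespace>[^}]+)\}\s+(?P<device>\S+):\s+"
    r"used-bytes\s+(?P<used_bytes>\d+)\s+"
    r"free-wblocks\s+(?P<free_wblocks>\d+)\s+"
    r"write-q\s+(?P<write_q>\d+)\s+"
    r"write\s+\((?P<write_total>\d+),(?P<write_rate>[\d.]+)\)"
)


def parse_device_line(line):
    """Return the namespace, device and write counters, or None if no match."""
    match = DEVICE_LINE.search(line)
    if match is None:
        return None
    return {
        "namespace": match.group("namespace"),
        "device": match.group("device"),
        "used_bytes": int(match.group("used_bytes")),
        "free_wblocks": int(match.group("free_wblocks")),
        "write_q": int(match.group("write_q")),
        "write_total": int(match.group("write_total")),
        "write_rate": float(match.group("write_rate")),
    }


sample = (
    "INFO (drv_ssd): (drv_ssd.c:2115) {test} /dev/xvdb: used-bytes 1626239616 "
    "free-wblocks 28505 write-q 39 write (8203,23.0) defrag-q 0 "
    "defrag-read (7981,21.7) defrag-write (1490,3.0) shadow-write-q 0"
)
record = parse_device_line(sample)
print(record["device"], record["write_q"], record["write_rate"])
```

Grouping the parsed records by device makes it easy to see whether a single device stands out, which is the hot-key hint described above.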

Key points: write queue length (write-q); possible hot-key problem.

3 Check whether migrations are tuned too aggressively



Note that incoming migration data goes through the write queues as well 
and can push the write-q to a very high value.

Note also that the write queue can only be tracked on a per-device basis; 
there is no direct cluster-wide statistic.

4 Check whether defragmentation is tuned too aggressively

Relevant settings: defrag-sleep and defrag-lwm-pct. 
Aggressive defragmentation can cause extra large-block reads and writes that hurt disk performance 
(extra defrag reads / defrag writes), and the defrag writes further increase the write queue.


5 Confirm the health of the device

Even though the error itself does not imply a faulty device, 
it is worth confirming that the disks are not nearing end of life.


In the long term the load on the cluster should be 
considered to see if the disks are able to keep up with the throughput. 
Factors such as level of defragmentation and whether migration is ongoing 
should be considered as these can contribute to pressure on the disks. 
Consider keeping configuration at default to lower the load on the disk.


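In short: weigh load and throughput together, taking into account business TPS, data migration, defragmentation and similar factors.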

Workaround

In case of short bursts of writes, 
increasing the write cache will provide more room for the write buffers (w-q) 
and potentially allow client writes to proceed. 
If the disks are not capable of handling the write load this will not be a permanent solution. 


The size of the write cache can be increased using the following command:

asinfo -v 'set-config:context=namespace;id=test;max-write-cache=128M'

The default size of the write cache is 64MB. 
With a write-block-size of 128K, a 64MB cache comprises 512 storage write blocks, which is where a "q 513, max 512" portion of the error message comes from. 
When increasing the cache, always set it to a multiple of the write-block-size.
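
The arithmetic above can be checked with a short sketch. It assumes binary units (K = 1024 bytes, M = 1024 * 1024 bytes); the helper names parse_size, write_queue_limit and round_up_to_block_multiple are made up for illustration and are not part of any Aerospike tool.

```python
def parse_size(text):
    """Convert a size such as '128K' or '64M' to bytes (binary units assumed)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def write_queue_limit(max_write_cache, write_block_size):
    """Number of write blocks that fit in the write cache."""
    return parse_size(max_write_cache) // parse_size(write_block_size)


def round_up_to_block_multiple(cache_bytes, block_bytes):
    """Smallest multiple of block_bytes that is >= cache_bytes."""
    return -(-cache_bytes // block_bytes) * block_bytes


print(write_queue_limit("64M", "128K"))   # default cache, 128K blocks
print(write_queue_limit("128M", "128K"))  # cache raised as in the command above
print(round_up_to_block_multiple(100_000_000, parse_size("128K")))
```

Doubling the cache to 128M doubles the number of blocks that can queue up before writes fail, but it does not make the disk any faster.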

Ensure that you have sufficient memory capacity before increasing the max-write-cache to a high value to avoid running out of memory. 

Oct 08 2017 06:46:35 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70507569408 free-wblocks 3269912 write-q 0 write (80395102,140.9) defrag-q 0 defrag-read (79565128,178.9) defrag-write (36171273,82.2) shadow-write-q 0
Oct 08 2017 06:46:55 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70515256192 free-wblocks 3269869 write-q 0 write (80397215,105.7) defrag-q 0 defrag-read (79567198,103.5) defrag-write (36172224,47.5) shadow-write-q 1
Oct 08 2017 06:47:15 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70517832704 free-wblocks 3269777 write-q 35 write (80397876,33.0) defrag-q 29 defrag-read (79567797,30.0) defrag-write (36172491,13.4) shadow-write-q 0
Oct 08 2017 06:47:35 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70516947328 free-wblocks 3269817 write-q 56 write (80398042,8.3) defrag-q 63 defrag-read (79568037,12.0) defrag-write (36172589,4.9) shadow-write-q 0
Oct 08 2017 06:47:55 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70515790720 free-wblocks 3269841 write-q 106 write (80398246,10.2) defrag-q 177 defrag-read (79568379,17.1) defrag-write (36172696,5.3) shadow-write-q 0

The write queue should always be at 0, or very close to it. 
Any time it goes higher, blocks are not being flushed as fast as they are filled.

In the log lines above, as the write-q starts increasing, 
the write rate drops to around 10 blocks per second (roughly ten times lower than before). 
So even though the write load fell by an order of magnitude, 
the disk I/O subsystem is still not able to handle it.
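
The pattern described here can be detected mechanically. The sketch below uses the write-q and write-rate values copied from the five log lines above; the function name write_queue_growing and the three-interval threshold are arbitrary choices for illustration, not Aerospike recommendations.

```python
def write_queue_growing(write_q, min_run=3):
    """True if write-q rose in each of the last `min_run` sampling intervals."""
    if len(write_q) < min_run + 1:
        return False
    tail = write_q[-(min_run + 1):]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))


# Values taken from the five log lines above (20 seconds apart).
write_q = [0, 0, 35, 56, 106]
write_rate = [140.9, 105.7, 33.0, 8.3, 10.2]

print("queue backing up:", write_queue_growing(write_q))
print("write rate drop factor:", round(write_rate[0] / write_rate[-1], 1))
```

A queue that keeps growing while the write rate falls is the signature of a device that cannot keep up, as opposed to a short burst that drains by itself.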

This can lead to high CPU load, which can slow down the whole system; 
for example, it can make the nsup cycle take a very long time.

Migration writes continue to be processed even after the max-write-cache limit has been hit. 
Prole writes and defragmentation writes continue as well. 
This can lead to very high memory usage, with all its potential consequences.




write-block-size: size in bytes of each I/O block that is written to the disk. 
This effectively sets the maximum object size. 
The maximum allowed size is 8388608 (or 8M) for versions 4.2 and higher. 
For versions prior to 4.2, the maximum allowed size is 1048576 (or 1M). 




max-write-cache: number of bytes (should be a multiple of write-block-size) 
the system is allowed to keep as pending write blocks before failing writes. 
This value is allocated on a per-device basis and needs to be accounted for in the total sizing calculation.

It must be an integer multiple of write-block-size.
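
Because the cache is allocated per device, the memory to budget for it scales with the number of devices. A minimal sketch of that sizing arithmetic follows; the namespace names and device counts are made-up example values, and write_cache_budget is a hypothetical helper.

```python
MIB = 1024 * 1024


def write_cache_budget(namespaces):
    """Sum max-write-cache over every device of every namespace.

    `namespaces` maps a namespace name to (max_write_cache_bytes, device_count).
    """
    return sum(cache * devices for cache, devices in namespaces.values())


# Hypothetical layout: two namespaces with different cache sizes.
layout = {
    "test": (128 * MIB, 4),  # cache raised to 128M, four devices
    "ns1": (64 * MIB, 2),    # default 64M cache, two devices
}
print(write_cache_budget(layout) // MIB, "MiB")
```

This is only the nominal budget: as noted above, migration, prole and defragmentation writes can continue past the limit, so actual memory use may be higher.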

The SAR tool may be used to check the load on the various devices. 
It collects and retains data on network and device usage for a rolling 7-day period. 
The default SAR interval for collecting data is 10 minutes. 
It may be useful to reduce this to 5 or 2 minutes, 
depending on how long the queue too deep errors last when they occur.

iostat can also be used to check disk performance but this must be collected when the issue is happening.

In short: sar and iostat are the tools for monitoring disk I/O load and performance.
