
Handling Aerospike write failures: queue too deep
Warning: write fail: queue too deep
https://discuss.aerospike.com/t/why-do-i-see-a-warning-write-fail-queue-too-deep/3009


From aerospike.log, on a namespace whose storage-engine is device:
 
WARNING (drv_ssd): (drv_ssd.c:4153) {test} write fail: queue too deep: exceeds max 256
WARNING (drv_ssd): (drv_ssd.c:4153) (repeated:1096) {test} write fail: queue too deep: exceeds max 256

com.aerospike.client.AerospikeException: Error Code 18: Device overload.

com.aerospike.client.AerospikeException: Error Code 14: Hot key 



This error indicates that although the disks themselves are not necessarily faulty or nearing end of life,
they are not keeping up with the load placed upon them.
When Aerospike writes a record, it returns success to the client
once the record has been written to a storage write block; these blocks are flushed to disk asynchronously.
If the storage device cannot keep up with the write load,
the storage write blocks accumulate in the write cache.
When this cache is full, the error above is reported.
Because the cache is not flushed to disk synchronously and blocks queue up,
the errors may appear suddenly.
In this scenario, latency may also be visible in the reads histogram,
as reads involve disk access in most circumstances
(excluding data-in-memory namespaces and reads served from the post-write queue).




In short: write blocks that cannot be flushed to the device in time accumulate in the write cache;
once that cache is full, the error above is reported.



Causes and Solutions

1 Check whether the write load/client traffic has increased

asloglatency
http://www.aerospike.com/docs/tools/asloglatency

Use asloglatency to check whether client write traffic has increased.
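
For example (log path and histogram name are assumptions; on recent servers the write histogram is named per namespace, e.g. '{test}-write' — see the linked docs for the full flag list):

asloglatency -l /var/log/aerospike/aerospike.log -h '{test}-write' -t 60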

2 Figure out the device and the namespace that are impacted

The following log line indicates
how many storage write blocks are currently in use and the trend of write-q.
If the issue is happening on only one device,
this can potentially indicate an existing hot-key situation.

INFO (drv_ssd): (drv_ssd.c:2115) {test} /dev/xvdb: used-bytes 1626239616 free-wblocks 28505 write-q 39 write (8203,23.0) defrag-q 0 defrag-read (7981,21.7) defrag-write (1490,3.0) shadow-write-q 0

Watch the per-device write queue length; a single hot device may point to a hot-key problem.
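
To watch the trend over time, write-q can be pulled out of the per-device log lines with standard tools (log path illustrative):

grep '/dev/xvdb' /var/log/aerospike/aerospike.log | grep -o 'write-q [0-9]*'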

3 Check whether migrations are tuned too aggressively

Check whether the following parameters deviate significantly from their defaults:

migrate-sleep (namespace context)
migrate-threads (service context)
migrate-max-num-incoming (service context)

Note that incoming migration data goes through the write queues as well
and can push write-q to a very high value.


Note that tracking the write queue is limited to a per-device basis;
there is no direct cluster-wide statistic.
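
The current migration settings can be checked with asinfo (namespace name taken from the logs above):

asinfo -v 'get-config:context=service' | tr ';' '\n' | grep migrate
asinfo -v 'get-config:context=namespace;id=test' | tr ';' '\n' | grep migrate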

4 Check whether defragmentation is tuned too aggressively

Check the following parameters:

defrag-sleep
defrag-lwm-pct
   
Overly aggressive defragmentation causes extra large-block reads and writes that hurt disk performance
(extra defrag reads / defrag writes) and further grow the write queue due to defrag writes.
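
If defragmentation is too aggressive it can be throttled dynamically; for example (values illustrative; the defaults are defrag-sleep 1000 and defrag-lwm-pct 50):

asinfo -v 'set-config:context=namespace;id=test;defrag-sleep=2000'
asinfo -v 'set-config:context=namespace;id=test;defrag-lwm-pct=45'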



5 Confirm the health of the device

Even though the error itself does not imply a faulty device,
it is worth confirming that the disks are not nearing end of life.
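
For example, SMART health and wear attributes can be checked with smartctl (device path illustrative):

smartctl -H /dev/xvdb   # overall health assessment
smartctl -a /dev/xvdb   # full attributes, e.g. media wearout and reallocated sectors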



In the long term the load on the cluster should be 
considered to see if the disks are able to keep up with the throughput. 
Factors such as level of defragmentation and whether migration is ongoing 
should be considered as these can contribute to pressure on the disks. 
Consider keeping configuration at default to lower the load on the disk.


In short: weigh the load and throughput holistically, taking business TPS, data migration, and defragmentation into account.


Workaround



In case of short bursts of writes,
increasing the write cache will provide more room for the write buffers (write-q)
and potentially allow client writes to proceed.
If the disks are not capable of handling the sustained write load, this will not be a permanent solution.

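On the client side, short overload bursts can also be absorbed by backing off and retrying when Error Code 18 (device overload) is returned. A minimal sketch against the Java client follows; the host, key, bin, and retry numbers are illustrative:

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.AerospikeException;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.ResultCode;

public class OverloadRetry {
    public static void main(String[] args) throws InterruptedException {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("test", "demo", "k1");
        Bin bin = new Bin("value", 42);
        try {
            for (int attempt = 1; attempt <= 5; attempt++) {
                try {
                    client.put(null, key, bin);
                    break; // write succeeded
                } catch (AerospikeException ae) {
                    if (ae.getResultCode() != ResultCode.DEVICE_OVERLOAD) {
                        throw ae; // only back off on Error Code 18
                    }
                    Thread.sleep(100L * attempt); // linear backoff before retrying
                }
            }
        } finally {
            client.close();
        }
    }
}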


The size of the write cache can be increased using the following command:

asinfo -v 'set-config:context=namespace;id=test;max-write-cache=128M'

The default size for the write cache is 64MB.
With a write-block-size of 128K, a 64MB cache comprises 512 storage write blocks, hence the 'q 513, max 512' portion of such an error;
the 'exceeds max 256' in the log above would correspond to a 256K write-block-size (64MB / 256K = 256).
When increasing the cache, always increase it to a multiple of the write-block-size.


Ensure that you have sufficient memory capacity before increasing the max-write-cache to a high value to avoid running out of memory. 
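
For example (numbers illustrative): with 4 devices in a namespace and max-write-cache raised to 256M, the worst case is 4 × 256M = 1GB of additional memory held in pending write blocks on that node. The log excerpt below shows write-q beginning to grow as a device falls behind: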

Oct 08 2017 06:46:35 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70507569408 free-wblocks 3269912 write-q 0 write (80395102,140.9) defrag-q 0 defrag-read (79565128,178.9) defrag-write (36171273,82.2) shadow-write-q 0
Oct 08 2017 06:46:55 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70515256192 free-wblocks 3269869 write-q 0 write (80397215,105.7) defrag-q 0 defrag-read (79567198,103.5) defrag-write (36172224,47.5) shadow-write-q 1
Oct 08 2017 06:47:15 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70517832704 free-wblocks 3269777 write-q 35 write (80397876,33.0) defrag-q 29 defrag-read (79567797,30.0) defrag-write (36172491,13.4) shadow-write-q 0
Oct 08 2017 06:47:35 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70516947328 free-wblocks 3269817 write-q 56 write (80398042,8.3) defrag-q 63 defrag-read (79568037,12.0) defrag-write (36172589,4.9) shadow-write-q 0
Oct 08 2017 06:47:55 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {ns1} /dev/sda: used-bytes 70515790720 free-wblocks 3269841 write-q 106 write (80398246,10.2) defrag-q 177 defrag-read (79568379,17.1) defrag-write (36172696,5.3) shadow-write-q 0


The write queue should always be at 0, or very close to it.
Any sustained higher value means blocks are not being flushed as fast as they are filled.




In the excerpt above, as the write-q starts increasing,
the writes per second drop to around 10 blocks per second (roughly 10 times less).
So even though the write load was reduced by an order of magnitude,
the disk I/O subsystem is still not able to handle it.

This can lead to high CPU load, which can cause the whole system to slow down;
for example, the nsup cycle can become very long.


Migration writes will continue to be processed even after the max-write-cache limit has been hit.
Prole (replica) writes and defragmentation writes will continue as well.
This can lead to very high memory usage, with its potential consequences.



Two related configuration parameters:

write-block-size
max-write-cache


write-block-size 

Size in bytes of each I/O block that is written to the disk. 
This effectively sets the maximum object size. 
The maximum allowed size is 8388608 (or 8M) for versions 4.2 and higher. 
For versions prior to 4.2, the maximum allowed size is 1048576 (or 1M). 



https://discuss.aerospike.com/t/faq-write-block-size/681
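
For reference, write-block-size is set statically in the namespace's storage-engine block of aerospike.conf (device path and value illustrative):

namespace test {
    storage-engine device {
        device /dev/xvdb
        write-block-size 128K
    }
}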




max-write-cache

Number of bytes (a multiple of write-block-size) of pending write blocks
the system is allowed to keep before failing writes.
This value is allocated on a per-device basis and needs to be accounted for in the total sizing calculation.

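It can be raised dynamically with the asinfo command shown earlier, or set statically next to write-block-size (value illustrative and a multiple of the block size):

namespace test {
    storage-engine device {
        device /dev/xvdb
        write-block-size 128K
        max-write-cache 128M
    }
}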



The SAR tool may be used to check the load on the various devices.
It collects and retains data related to network and device usage for a rolling 7-day period.
The default SAR collection interval is 10 minutes;
it may be useful to reduce this to 5 or 2 minutes,
depending on how long the queue too deep errors last when they occur.

iostat can also be used to check disk performance but this must be collected when the issue is happening.
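
For example (intervals illustrative; the sysstat file name depends on the day of month):

iostat -x 2 30               # extended per-device stats, 2-second interval, 30 samples
sar -d -f /var/log/sa/sa08   # historical block-device activity from a sysstat file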


Disk I/O load and performance monitoring tools: SAR, iostat.
