
Removing a node from an Aerospike cluster     Category: aerospike
https://www.aerospike.com/docs/operations/manage/cluster_mng/removing_node/index.html

Removing a node from a cluster is as easy as stopping the node, 
but it is important to follow the steps outlined to ensure proper operation of the cluster and its tools in the long run. 
For example, the steps:

Prevent the remaining nodes in the cluster from reconnecting to the removed node if they are restarted.
Prevent a cluster from attempting to join another cluster when one of its previously removed nodes is recommissioned.



It is a good practice to quiesce a node prior to shutting it down or removing it from a cluster. 
Refer to the Quiesce Node page for further details.

https://www.aerospike.com/docs/operations/manage/cluster_mng/quiescing_node/index.html


If the node is shipping records via XDR, it is a good practice to wait for xdr_timelag to drop to zero prior to removing the node from the cluster.
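
One way to script this wait is to poll the node's statistics and extract xdr_timelag. The sketch below covers only the parsing step. It assumes the statistics arrive as the semicolon-separated key=value text that asinfo prints; the helper names get_stat and xdr_caught_up are illustrative and not part of any Aerospike tool.

```shell
# get_stat NAME: read a semicolon-separated "key=value" info response on
# stdin and print the value of NAME (prints nothing if the key is absent).
get_stat() {
    tr ';' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# xdr_caught_up: succeed only when xdr_timelag is present and equal to 0.
xdr_caught_up() {
    lag=$(get_stat xdr_timelag)
    [ "$lag" = "0" ]
}

# Hypothetical usage on the node being removed. The info command that
# exposes xdr_timelag depends on the server version, so it is left as a
# placeholder:
#   until <command that prints the XDR statistics> | xdr_caught_up; do
#       sleep 5
#   done
```

A missing xdr_timelag key is treated as "not caught up", so the loop does not proceed on an unexpected response.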



Ensure there are no ongoing migrations

As of version 4.3, the cluster-stable info command can be used instead of manually checking the statistics listed below.
There are several ways to check the cluster's migration state. One way is to look at the migration-related statistic Pending Migrates (tx%,rx%) on all the nodes. Example on a 3-node cluster:

Admin> info namespace
Namespace                         Node   Avail%   Evictions                 Master                Replica     Repl     Stop     Pending       Disk    Disk     HWM        Mem     Mem    HWM      Stop
        .                            .        .           .   (Objects,Tombstones)   (Objects,Tombstones)   Factor   Writes    Migrates       Used   Used%   Disk%       Used   Used%   Mem%   Writes%   
        .                            .        .           .                      .                      .        .        .   (tx%,rx%)          .       .       .          .       .      .         .   
test        10.0.0.100:3000              N/E        0.000     (0.000  ,0.000  )      (0.000  ,0.000  )      2        false    (0,0)            N/E   N/E     50      0.000 B    0       60     90
test        10.0.0.103:3000              N/E        0.000     (0.000  ,0.000  )      (0.000  ,0.000  )      2        false    (0,0)            N/E   N/E     50      0.000 B    0       60     90
test        10.0.0.101:3000              N/E        0.000     (0.000  ,0.000  )      (0.000  ,0.000  )      2        false    (0,0)            N/E   N/E     50      0.000 B    0       60     90
test                                                0.000     (0.000  ,0.000  )      (0.000  ,0.000  )                        (0,0)       0.000 B                    0.000 B
Number of rows: 4

For server versions 3.11 and above, make sure the "migrate_partitions_remaining" statistic shows 0 for each node:

Admin> show statistics like migrate
NODE                        :   10.0.0.100:3000   10.0.0.103:3000   10.0.0.101:3000
migrate_allowed             :   true              true              true
migrate_partitions_remaining:   0                 0                 0
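
For version 4.3 and later, the cluster-stable check mentioned above can be scripted along these lines. This is a sketch under two assumptions: that each node answers cluster-stable with its cluster key when stable and with a string beginning with ERROR otherwise, and that a stable cluster reports the same key from every node. The function name all_nodes_stable is hypothetical.

```shell
# all_nodes_stable: read one cluster-stable response per line on stdin.
# Succeeds only if there is at least one response, none starts with
# "ERROR", and every node reported the same cluster key.
all_nodes_stable() {
    awk '
        /^ERROR/ { bad = 1 }
        NF       { keys[$1] = 1; n++ }
        END {
            distinct = 0
            for (k in keys) distinct++
            exit !(n > 0 && !bad && distinct == 1)
        }'
}

# Usage with the three example nodes shown above:
#   for h in 10.0.0.100 10.0.0.103 10.0.0.101; do
#       asinfo -h "$h" -v 'cluster-stable:'
#   done | all_nodes_stable && echo "no migrations pending"
```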


Shut down the node
Shut down the node gracefully by stopping the Aerospike daemon:

sudo service aerospike stop
The shutdown is successful when the following message is logged:

finished clean shutdown - exiting
The shutdown can also be confirmed by checking the status of the Aerospike daemon with the following command:

sudo service aerospike status
* aerospike is not running
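
The log check can be automated with a small helper. This is a minimal sketch: the function name clean_shutdown_logged is hypothetical, and looking only at the last 50 lines is an arbitrary choice meant to avoid matching a shutdown message from an earlier restart.

```shell
# clean_shutdown_logged LOGFILE: succeed if the clean-shutdown message
# appears among the last 50 lines of LOGFILE.
clean_shutdown_logged() {
    tail -n 50 "$1" | grep -q 'finished clean shutdown'
}

# Usage, assuming the default log location:
#   sudo service aerospike stop
#   clean_shutdown_logged /var/log/aerospike/aerospike.log \
#       && echo "node stopped cleanly"
```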


Update configuration (on all other nodes in the cluster)
If this node is in the seed list of other nodes, the configuration of all the other nodes should be updated to ensure that they do not try to connect to this node if they are restarted.
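
For a mesh-mode cluster, the seed list is the set of mesh-seed-address-port lines in the heartbeat section of aerospike.conf. The sketch below filters out the line for one node. It assumes 10.0.0.100 is the node being removed (the example output in this article shows only 10.0.0.101 and 10.0.0.103 remaining) and that heartbeats use port 3002; drop_seed is a hypothetical helper name.

```shell
# drop_seed IP: copy an aerospike.conf from stdin to stdout, omitting any
# mesh-seed-address-port line whose address is IP. All other lines pass
# through unchanged, including their indentation.
drop_seed() {
    awk -v ip="$1" '!($1 == "mesh-seed-address-port" && $2 == ip)'
}

# Usage on each remaining node (review the diff before replacing the file):
#   drop_seed 10.0.0.100 < /etc/aerospike/aerospike.conf > /tmp/aerospike.conf.new
#   diff /etc/aerospike/aerospike.conf /tmp/aerospike.conf.new
```

Writing to a separate file first keeps the running configuration untouched until the change has been checked.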


Tip clear
Multicast mode: skip this step and go to step 5.
Mesh mode: if the cluster is formed using mesh mode, the next step is to run the 'tip-clear' command on all the remaining nodes in the cluster.
This clears the removed node's IP or hostname from the mesh-mode heartbeat tip list, so that the remaining nodes do not keep trying to send heartbeats to it.


asadm -e "asinfo -v 'tip-clear:host-port-list=XX.XX.XX.XX:3002'"
Where XX.XX.XX.XX is the IP address of the node(s) to be removed.


To validate tip-clear, run the following command to write a heartbeat dump to the log file at /var/log/aerospike/aerospike.log.
The heartbeat dump should not contain the decommissioned node.

asadm
Admin> asinfo -v 'dump-hb:verbose=true'
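
Because the log may still contain older lines that mention the removed node, it helps to look only at what was written after the dump was requested. The sketch below does that by line number. It makes no assumption about the format of the dump beyond the node's address appearing in it; mentioned_since is a hypothetical helper name.

```shell
# mentioned_since LOGFILE START IP: succeed if IP appears as a whole
# token on any line of LOGFILE after line number START.
mentioned_since() {
    tail -n "+$(( $2 + 1 ))" "$1" | grep -F -w -q -- "$3"
}

# Usage on a remaining node:
#   log=/var/log/aerospike/aerospike.log
#   start=$(wc -l < "$log")
#   asinfo -v 'dump-hb:verbose=true'
#   if mentioned_since "$log" "$start" XX.XX.XX.XX; then
#       echo "removed node still appears in the heartbeat dump"
#   fi
```

The -w flag keeps an address such as 10.0.0.10 from matching inside 10.0.0.100.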



Alumni reset
As a final step, remove the node from the alumni list. 
The alumni list is used by some tools to refer to all nodes in a cluster, 
even nodes that may have split from the cluster, so it is important to also clear this node from the list. 
This command should be run on all the remaining nodes in the cluster:

asinfo -v 'services-alumni-reset'
From asadm:

asadm
Admin> asinfo -v 'services-alumni-reset'
10.0.0.101:3000 (10.0.0.101) returned:
ok
10.0.0.103:3000 (10.0.0.103) returned:
ok
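
To confirm the reset, the alumni list can be read back and searched for the removed node. This sketch assumes the services-alumni info command returns semicolon-separated host:port entries and that 10.0.0.100:3000 is the removed node; in_alumni is a hypothetical helper name.

```shell
# in_alumni HOSTPORT: read a services-alumni response on stdin and
# succeed if HOSTPORT is exactly one of its semicolon-separated entries.
in_alumni() {
    tr ';' '\n' | grep -F -x -q -- "$1"
}

# Usage on a remaining node:
#   if asinfo -v 'services-alumni' | in_alumni 10.0.0.100:3000; then
#       echo "removed node is still in the alumni list"
#   fi
```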


Note: steps 2, 3, 4 and 5 should be run on each of the remaining nodes in the cluster, 
which happens automatically when the asinfo commands are issued from within asadm.



If you want to take down multiple nodes from the cluster, 
make sure that you start from step 1 and take one node down at a time, 
waiting for migrations to complete between each node to avoid losing any data.



Clients are expected to see timeouts or more retries against the removed node until they retrieve an updated partition map, 
especially for write transactions, which typically cannot fall back to a replica. 
By default, a cluster takes up to 1.5 seconds to detect that a node has left (based on the configured heartbeat interval and timeout) 
and another 1 or 2 seconds to re-form, and clients by default tend the partition map every 1 second. 
It therefore usually takes up to around 5 seconds for clients to start sending transactions 
previously targeted at the removed node to the new owner(s) of the partitions the departed node owned.
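
The estimate above can be reproduced with simple arithmetic. The sketch assumes the default heartbeat settings are an interval of 150 ms and a timeout of 10 intervals, which matches the 1.5 seconds quoted; failover_ms is a hypothetical helper, not an Aerospike tool.

```shell
# failover_ms INTERVAL_MS TIMEOUT REFORM_MS TEND_MS: worst-case time, in
# milliseconds, before clients send transactions to the new owners.
#   detection = heartbeat interval * timeout (missed heartbeats)
#   plus the time for the cluster to re-form
#   plus one client tend interval to pick up the new partition map
failover_ms() {
    interval_ms=$1; timeout=$2; reform_ms=$3; tend_ms=$4
    echo $(( interval_ms * timeout + reform_ms + tend_ms ))
}

failover_ms 150 10 2000 1000   # prints 4500
```

With the slower 2-second re-form the total is 4.5 seconds, consistent with "up to around 5 seconds".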
 
