Data Distribution

Shared-Nothing architecture

Every node in the Aerospike cluster is identical.
All nodes are peers.
There is no single point of failure.


With the Smart Partitions algorithm, data is distributed evenly across all nodes in the cluster.


Each namespace is divided into 4096 logical partitions, which are evenly distributed between the cluster nodes. 


keys → RIPEMD160 digests → mod 4096 → partitions → partition assignment → partition map

Aerospike uses a deterministic hash process to consistently map each record to a single partition.

To determine a record's partition assignment,
the record's key (of any size) is hashed into a 20-byte fixed-length digest using RIPEMD160.
Twelve bits of this digest determine the record's partition ID (2^12 = 4096, one value per partition).

RIPEMD160 hashing maps each key to a 20-byte digest value.
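The key-to-partition step can be sketched as follows. The source only says that 12 bits of the digest are used. This sketch assumes those bits are the low 12 bits of the first two digest bytes read as a little-endian integer. It also assumes that only the raw key bytes are hashed; real clients may feed more into the hash. The names `key_digest`, `partition_id`, and `N_PARTITIONS` are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # 2**12 logical partitions per namespace


def key_digest(key: bytes) -> bytes:
    """Return the 20-byte RIPEMD160 digest of the raw key bytes.

    Assumption: only the key bytes are hashed here. hashlib offers RIPEMD160
    only when the underlying OpenSSL build provides it, so this call can
    raise ValueError on some systems.
    """
    return hashlib.new("ripemd160", key).digest()


def partition_id(digest: bytes) -> int:
    """Map a 20-byte digest to a partition ID in [0, 4095] using 12 bits."""
    if len(digest) != 20:
        raise ValueError("expected a 20-byte RIPEMD160 digest")
    # Assumption: read the first two bytes as a little-endian integer and keep
    # the low 12 bits, which is the same as taking that value mod 4096.
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)
```

For a digest that starts with the bytes 0x34 0x12, the little-endian value is 0x1234. Keeping 12 bits gives partition 0x234, or 564.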

RIPEMD160 is a field-tested hash function with highly random output,
so records are distributed very evenly across partitions.


Partition Distribution to Cluster Nodes

Aerospike uses a random hashing method to distribute partitions evenly across the cluster nodes.
There is no need for manual sharding.
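The source does not name the hashing method used to spread partitions over nodes. As a stand-in, this sketch uses rendezvous (highest-random-weight) hashing over SHA-256. It is meant to illustrate why no manual sharding is needed: every node can compute the same partition-to-node map from the node list alone. The helpers `weight` and `assign_partitions` are hypothetical.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096


def weight(node_id: str, pid: int) -> int:
    """Hypothetical score: a pseudo-random 64-bit value per (node, partition) pair."""
    h = hashlib.sha256(f"{node_id}:{pid}".encode()).digest()
    return int.from_bytes(h[:8], "big")


def assign_partitions(nodes: list[str]) -> dict[int, str]:
    """Give each partition to the node with the highest score for it.

    The result depends only on the set of node IDs, so every node computes
    the same map without coordination or manual sharding.
    """
    return {pid: max(nodes, key=lambda n: weight(n, pid)) for pid in range(N_PARTITIONS)}


if __name__ == "__main__":
    owners = assign_partitions(["A", "B", "C", "D"])
    # Each of the four nodes should own roughly 4096 / 4 = 1024 partitions.
    print(Counter(owners.values()))
```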

All of the nodes in the cluster are peers.
There is no single database master node that can fail and take the whole database down.

When nodes are added or removed, 
a new cluster will form and its nodes will coordinate to evenly divide partitions between themselves. 
The cluster will then automatically re-balance.
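This sketch shows why rebalancing after a membership change can stay small. It again uses rendezvous hashing as an assumed stand-in for Aerospike's real assignment. When a node leaves, only the partitions it owned get a new owner, and all other partitions stay put. The helpers `owner` and `moved_partitions` are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096


def owner(nodes: list[str], pid: int) -> str:
    """Owner of a partition under rendezvous (highest-random-weight) hashing.

    This is an assumed stand-in for Aerospike's partition assignment.
    """
    return max(nodes, key=lambda n: hashlib.sha256(f"{n}:{pid}".encode()).digest())


def moved_partitions(old_nodes: list[str], new_nodes: list[str]) -> list[int]:
    """Partition IDs whose owner differs between two cluster memberships."""
    return [pid for pid in range(N_PARTITIONS)
            if owner(old_nodes, pid) != owner(new_nodes, pid)]


if __name__ == "__main__":
    old = ["A", "B", "C", "D"]
    moved = moved_partitions(old, ["A", "B", "D"])  # node C leaves
    # Only the partitions that C owned need to migrate (about 1024 of 4096).
    print(len(moved))
```

The same property holds when a node joins. Every partition that moves goes to the new node, so existing nodes do not shuffle data among themselves.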

Data Replication and Synchronization

Aerospike replicates each partition on one or more nodes; these copies are called replicas.

One node becomes the data master for reads and writes for a partition, 
while other nodes store its replica partitions.


The replication factor is configurable; 
however, it cannot exceed the number of nodes in the cluster. 
More replicas mean better reliability,
but they place higher demand on the cluster, because write requests must go to all replicas.
Most deployments use a replication factor of 2 (one master copy and one replica).

The replication factor is configurable, but it cannot exceed the number of nodes.
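This sketch shows one way a master and its replicas could be picked for a partition while respecting the node-count limit. All nodes are ordered by a per-(node, partition) hash. The first node is treated as the master and the next ones as replicas. This rendezvous-style ordering is an assumption, and so is the choice to cap an oversized replication factor instead of rejecting it. The functions `succession_list` and `replica_set` are hypothetical. Each write must reach every node in the returned list, which is why a higher replication factor raises write load.

```python
import hashlib


def succession_list(nodes: list[str], pid: int) -> list[str]:
    """Order all nodes for one partition by a per-(node, partition) hash.

    Assumption: rendezvous-style ordering; the first node is the master and
    the following nodes hold replicas.
    """
    return sorted(nodes,
                  key=lambda n: hashlib.sha256(f"{n}:{pid}".encode()).digest(),
                  reverse=True)


def replica_set(nodes: list[str], pid: int, replication_factor: int) -> list[str]:
    """Return [master, replica, ...] for a partition."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # A partition cannot have more copies than there are nodes to hold them.
    effective = min(replication_factor, len(nodes))
    return succession_list(nodes, pid)[:effective]


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    print(replica_set(nodes, 42, 2))  # one master copy plus one replica
    print(replica_set(nodes, 42, 5))  # capped at 3, the node count
```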

Synchronous replication provides a higher level of correctness in the absence of network faults.
A write transaction propagates to all replicas before the data is committed and results are returned to the client.
In rare cases during cluster reconfiguration, the Smart Client may briefly hold an out-of-date view
and send a request to the wrong node;
the Smart Cluster then transparently proxies the request to the right node.
When a cluster is recovering from a network partition,
conflicting writes may have been applied to different copies of the same partition.
In this case, Aerospike applies a heuristic to choose the most likely version,
resolving any conflicts between the different copies of the data.
By default, the version with the largest number of changes (highest generation count) is chosen,
although the version with the most recent modification time can be chosen instead.
The right choice depends on the data model.



Write conflicts: a heuristic chooses the most likely version.
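The two conflict-resolution rules described above can be sketched as a selection over conflicting copies. The policy strings `"generation"` and `"last-update-time"`, the `RecordVersion` fields, and the tie-breaking on the secondary field are hypothetical choices for this illustration. They are not taken from Aerospike's configuration.

```python
from dataclasses import dataclass


@dataclass
class RecordVersion:
    value: object
    generation: int      # how many times this copy has been written
    last_update_ms: int  # last-modified time in milliseconds


def resolve(versions: list[RecordVersion], policy: str = "generation") -> RecordVersion:
    """Choose the winning copy among conflicting versions of one record.

    "generation": highest generation count wins (the default described above).
    "last-update-time": most recently modified copy wins.
    Ties on the primary field are broken by the other field (a sketch choice).
    """
    if policy == "generation":
        return max(versions, key=lambda v: (v.generation, v.last_update_ms))
    if policy == "last-update-time":
        return max(versions, key=lambda v: (v.last_update_ms, v.generation))
    raise ValueError(f"unknown conflict-resolution policy: {policy}")


if __name__ == "__main__":
    a = RecordVersion("left side", generation=5, last_update_ms=1_000)
    b = RecordVersion("right side", generation=3, last_update_ms=2_000)
    print(resolve([a, b]).value)                      # left side
    print(resolve([a, b], "last-update-time").value)  # right side
```

The two policies can pick different winners for the same pair of copies. This is why the right choice depends on the data model.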

Aerospike Cluster with No Replication
replication factor = 1

The Smart Client is location-aware.
It knows where each partition is located, so data is retrieved in a single hop.
Every read and write request is sent to the partition's data master for processing.
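Single-hop routing can be pictured as a client-side table lookup. The client computes the partition ID from the record digest and reads the master node for that partition from a cached partition map. The `PartitionMap` class, its methods, and the node names are hypothetical. The 12-bit extraction reuses the assumption that the low 12 bits of the first two digest bytes are read little-endian.

```python
N_PARTITIONS = 4096


class PartitionMap:
    """Hypothetical client-side table: partition ID -> master node address."""

    def __init__(self, masters: list[str]):
        if len(masters) != N_PARTITIONS:
            raise ValueError("need one master entry per partition")
        self.masters = masters

    @staticmethod
    def partition_id(digest: bytes) -> int:
        # Assumption: low 12 bits of the first two digest bytes, little-endian.
        return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

    def route(self, digest: bytes) -> str:
        """Return the master node for this record, reachable in one network hop."""
        return self.masters[self.partition_id(digest)]


if __name__ == "__main__":
    pmap = PartitionMap([f"node{p % 3}" for p in range(N_PARTITIONS)])
    digest = bytes([0x34, 0x12]) + bytes(18)  # partition 564
    print(pmap.route(digest))  # node0, since 564 % 3 == 0
```

If the cached map is briefly stale during reconfiguration, the request can reach the wrong node. In that case the cluster proxies it to the right node, as described earlier.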

When replication is enabled and a node receives a write request,
it saves the data and forwards the write request to the replica node.
Once the replica node confirms a successful write and the node has written the data itself,
a confirmation is returned to the client.
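The synchronous write path above can be reduced to a few lines. The master applies the write, forwards it to each replica, and reports success to the client only when every copy is written. The `Node` class and `write` function are hypothetical in-memory stand-ins. They leave out retries, rollback, and networking.

```python
class Node:
    """Hypothetical in-memory node holding copies of records, keyed by digest."""

    def __init__(self, name: str):
        self.name = name
        self.store: dict[bytes, object] = {}

    def apply(self, digest: bytes, value: object) -> bool:
        self.store[digest] = value
        return True  # acknowledge a successful local write


def write(master: Node, replicas: list[Node], digest: bytes, value: object) -> bool:
    """Synchronous replication: report success only after every copy is written.

    The master saves the data, forwards the write to each replica, and the
    client is acknowledged only when all of them confirm.
    """
    if not master.apply(digest, value):
        return False
    return all(replica.apply(digest, value) for replica in replicas)


if __name__ == "__main__":
    master, replica = Node("A"), Node("B")
    print(write(master, [replica], b"key-digest", "hello"))  # True
```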

Automatic Rebalancing

The transaction algorithms integrated with the data distribution system 
ensure that there is one consensus vote to coordinate a cluster change. 
Voting per cluster change, instead of per transaction, 
provides higher performance while maintaining shared-nothing simplicity.


Aerospike provides configuration options that control how fast rebalancing proceeds.
Temporarily slowing transactions heals the cluster more quickly.
If you need to maintain transaction speed and volume, the cluster rebalances more slowly.
Rebalancing speed is configurable; temporarily slowing transaction processing heals the cluster faster.

During rebalancing, Aerospike does not maintain the full replication factor for every partition.
Some in-transit partitions temporarily have only a single replica,
which maximizes memory and storage availability as the cluster rebalances to a new stable state.


Because no operator intervention is required,
the cluster self-heals even at the most demanding times.
For example, in one customer deployment a rack circuit breaker tripped, 
and one node of an 8-node cluster went down. No operator intervention was required. 
After several hours the fault was corrected and the rack came back online. 
Operators never had to take special steps to maintain the Aerospike cluster.


With proper capacity planning and system monitoring, Aerospike can handle virtually any failure with no loss of service.
You can configure and provision your hardware capacity, 
and set up the replication/synchronization policies 
so that the database recovers from failures without affecting users.


Traffic Saturation Management
The Aerospike Database monitoring tools let you evaluate bottlenecks. 
Network bottlenecks decrease database throughput capacity, making requests slow.


Capacity Overflows
On storage overflow, the Aerospike stop-write limit prevents new record writes. 
Replica writes, migration writes, and reads continue to be processed.
So even beyond optimal capacity, the database does not stop handling requests;
it does as much as possible to keep processing user requests.

On storage overflow, the stop-write limit blocks writes of new records.
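The stop-write behavior can be sketched as an admission check that only blocks client writes of new records. The `WriteKind` enum, the `accept_write` function, and the default `stop_write_pct` of 90 are hypothetical. They are not Aerospike's configuration names or default values. Reads never go through this check.

```python
from enum import Enum


class WriteKind(Enum):
    CLIENT = "client"        # application write creating a new record
    REPLICA = "replica"      # copy forwarded from a partition's master
    MIGRATION = "migration"  # copy moved during rebalancing


def accept_write(kind: WriteKind, used_bytes: int, capacity_bytes: int,
                 stop_write_pct: int = 90) -> bool:
    """Decide whether a write is processed under a stop-write limit.

    stop_write_pct is a hypothetical threshold. Once storage use reaches it,
    new client writes are refused, while replica and migration writes still
    proceed. Reads never pass through this check.
    """
    over_limit = used_bytes * 100 >= capacity_bytes * stop_write_pct
    return kind is not WriteKind.CLIENT or not over_limit


if __name__ == "__main__":
    print(accept_write(WriteKind.CLIENT, 95, 100))   # False: stop-write reached
    print(accept_write(WriteKind.REPLICA, 95, 100))  # True: replication continues
```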

