Elasticsearch 5.0 Index Status Management

Translated and compiled from the original documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/indices-clearcache.html

Clear Cache

POST /twitter/_cache/clear

By default, the API clears all caches.
Specific caches can be cleared explicitly by setting the query, fielddata or request parameter.

POST /kimchy,elasticsearch/_cache/clear

POST /_cache/clear
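A minimal sketch of how these URLs could be assembled from Python. The helper name clear_cache_url and the base URL are hypothetical, and the sketch assumes the cache flags are passed as boolean query-string parameters. It only builds the URL and sends nothing.

```python
from urllib.parse import quote, urlencode

# Cache names mentioned in the text above.
KNOWN_CACHES = ("query", "fielddata", "request")


def clear_cache_url(base_url, indices=(), caches=()):
    """Build the URL for a clear-cache call (hypothetical helper).

    indices: index names; empty means all indices.
    caches:  subset of KNOWN_CACHES; empty means clear every cache.
    """
    unknown = [c for c in caches if c not in KNOWN_CACHES]
    if unknown:
        raise ValueError(f"unknown cache type(s): {unknown}")
    path = "/_cache/clear"
    if indices:
        # Several indices are joined with commas, as in the examples above.
        path = "/" + ",".join(quote(name, safe="") for name in indices) + path
    query = urlencode({c: "true" for c in caches})
    return base_url.rstrip("/") + path + ("?" + query if query else "")


print(clear_cache_url("http://localhost:9200", ["twitter"], ["fielddata"]))
```

The resulting URL would be sent with the POST method, as in the console examples above.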

Flush

The flush API allows flushing one or more indices.

Flushing an index frees memory by writing data to the index storage
and clearing the internal transaction log.
By default, Elasticsearch uses memory heuristics to trigger
flush operations automatically as needed to free memory.

POST twitter/_flush

wait_if_ongoing

If set to true, the flush operation blocks until it can be executed
when another flush operation is already running.
The default is false, in which case an exception is thrown at the shard level
if another flush operation is already running.

force

Whether a flush should be forced even if it is not necessarily needed,
i.e. if no changes will be committed to the index.
This is useful if transaction log IDs should be incremented even when
no uncommitted changes are present.
(This setting can be considered internal.)

POST kimchy,elasticsearch/_flush

POST _flush
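As an illustration of the two parameters, the sketch below builds (but does not send) a flush request with Python's standard library. The function name flush_request and the base URL are assumptions made for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def flush_request(base_url, indices=(), wait_if_ongoing=False, force=False):
    """Build a POST request for the flush API without sending it."""
    path = "/_flush"
    if indices:
        path = "/" + ",".join(indices) + path
    # Only include options that differ from the defaults described above.
    params = {}
    if wait_if_ongoing:
        params["wait_if_ongoing"] = "true"
    if force:
        params["force"] = "true"
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="POST")


req = flush_request("http://localhost:9200", ["twitter"], wait_if_ongoing=True)
print(req.get_method(), req.full_url)
# To actually send it against a running cluster: urllib.request.urlopen(req)
```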

Synced Flush

Elasticsearch tracks the indexing activity of each shard.
Shards that have not received any indexing operations for 5 minutes are automatically marked as inactive.
This gives Elasticsearch an opportunity to reduce shard resources
and to perform a special kind of flush, called a synced flush.
A synced flush performs a normal flush,
then adds a generated unique marker (sync_id) to all shards.

Since the sync id marker was added when there were no ongoing indexing operations,
it can be used as a quick way to check whether two shards' Lucene indices are identical.
This quick sync id comparison (if the marker is present) is used during recovery
or restarts to skip the first and most costly phase of the process, copying segment files.
In that case, no segment files need to be copied
and the transaction log replay phase of the recovery can start immediately.
Note that since the sync id marker was applied together with a flush,
it is very likely that the transaction log will be empty,
speeding up recoveries even more.

This is particularly useful for use cases with many indices
that are never or very rarely updated, such as time-based data.
Such use cases typically generate many indices whose recovery
would take a long time without the synced flush marker.

To check whether a shard has a marker,
look at the commit section of the shard stats returned by the indices stats API.
For example, for an index named bank:

GET http://localhost:9200/bank/_stats/commit?level=shards

The commit section then contains the marker, for example:
sync_id AWhZn76Lt8zQvWGuVKCx
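The following sketch shows one way the marker could be pulled out of such a stats response. It assumes the response nests shard copies under indices, then the index name, then shards, and that sync_id sits inside each copy's commit.user_data. The sample dictionary is hand-written for illustration and is not real cluster output, and sync_ids_by_shard is a hypothetical helper.

```python
def sync_ids_by_shard(stats):
    """Map (index, shard number) to the sync_id of each shard copy.

    A copy that carries no marker contributes None.
    """
    result = {}
    for index_name, index_stats in stats.get("indices", {}).items():
        for shard_no, copies in index_stats.get("shards", {}).items():
            ids = []
            for copy in copies:
                user_data = copy.get("commit", {}).get("user_data", {})
                ids.append(user_data.get("sync_id"))
            result[(index_name, int(shard_no))] = ids
    return result


# Hand-written sample in the assumed response shape.
sample = {
    "indices": {
        "bank": {
            "shards": {
                "0": [
                    {"commit": {"user_data": {"sync_id": "AWhZn76Lt8zQvWGuVKCx"}}},
                    {"commit": {"user_data": {"sync_id": "AWhZn76Lt8zQvWGuVKCx"}}},
                ],
                "1": [{"commit": {"user_data": {}}}],
            }
        }
    }
}

for key, ids in sync_ids_by_shard(sample).items():
    print(key, ids)
```

Copies of the same shard that report the same sync_id can be treated as identical during recovery; a None entry means that copy has no marker.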

POST twitter/_flush/synced
POST _flush/synced
POST kimchy,elasticsearch/_flush/synced

Any ongoing indexing operations will cause the synced flush to fail on that shard.

The sync_id marker is removed as soon as the shard is flushed again.

It is harmless to request a synced flush while indexing is ongoing.
Shards that are idle will succeed and shards that are not will fail.
Any shards that succeeded will have faster recovery times.

When a synced flush fails due to concurrent indexing operations,
the HTTP status code is 409 CONFLICT.
Sometimes the failures are specific to a shard copy.
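A small sketch of how a client might list the shards that failed. The response shape used here (a top-level _shards summary plus one entry per index with an optional failures list) is an assumption based on the reference documentation, the sample is hand-written, and failed_shards is a hypothetical helper.

```python
def failed_shards(response):
    """Return {index: [(shard, reason), ...]} for indices that report failures."""
    failures = {}
    for name, summary in response.items():
        if name == "_shards":  # overall totals, not an index entry
            continue
        entries = [
            (f.get("shard"), f.get("reason"))
            for f in summary.get("failures", [])
        ]
        if entries:
            failures[name] = entries
    return failures


# Hand-written sample in the assumed response shape.
sample = {
    "_shards": {"total": 4, "successful": 3, "failed": 1},
    "twitter": {
        "total": 4,
        "successful": 3,
        "failed": 1,
        "failures": [
            {"shard": 1, "reason": "[2] ongoing operations on primary"}
        ],
    },
}

print(failed_shards(sample))
```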

Synced Flush and Flush

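When a node restarts, comparing the synced flush IDs of the shards first shows whether two shard copies are identical.
This avoids unnecessary segment file copying and greatly speeds up the recovery of cold indices.
Synced flush only helps cold indices; it has no effect on hot indices (indices updated within the last 5 minutes).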

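To speed up recovery, follow these steps when restarting a node:
Pause the programs that write data.
Disable cluster shard allocation.
Manually run POST /_flush/synced.
Restart the node.
Re-enable cluster shard allocation.
Wait for recovery to finish and the cluster health status to turn green.
Resume the programs that write data.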
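The procedure above can be sketched as an ordered list of requests. This is only an outline under stated assumptions: shard allocation is toggled through the transient cluster.routing.allocation.enable setting, the wait for green uses the cluster health API's wait_for_status parameter, and the helper names and the "MANUAL" marker are hypothetical.

```python
import json


def allocation_setting(mode):
    """Body for PUT /_cluster/settings that sets shard allocation to `mode`."""
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def restart_plan():
    """Return the procedure as (method, target, body) tuples.

    Steps with method "MANUAL" are done outside Elasticsearch.
    """
    return [
        ("MANUAL", "pause the indexing clients", None),
        ("PUT", "/_cluster/settings", allocation_setting("none")),
        ("POST", "/_flush/synced", None),
        ("MANUAL", "restart the node", None),
        ("PUT", "/_cluster/settings", allocation_setting("all")),
        ("GET", "/_cluster/health?wait_for_status=green", None),
        ("MANUAL", "resume the indexing clients", None),
    ]


for method, target, body in restart_plan():
    print(method, target, json.dumps(body) if body else "")
```

Keeping the plan as data makes the ordering explicit: the synced flush runs after allocation is disabled and before the node goes down.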

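Why hot indices recover slowly
For cold indices, the data is no longer updated, so the synced flush marker allows data to be recovered quickly and directly from local disk.
For hot indices, especially those with very large shards, synced flush does not help.
Besides the large volume of segment files that must be copied across nodes, translog recovery also slows recovery down.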

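Recovering data from a primary shard to a replica shard goes through three phases:
First, a snapshot of the segment files on the primary is taken and copied to the node the replica is allocated to. Indexing requests are not blocked during the copy; new indexing operations are recorded in the translog.
Second, a snapshot of the translog is taken, which contains the indexing requests added during the first phase, and the operations in that snapshot are replayed. Indexing requests are still not blocked in this phase; new indexing operations are recorded in the translog.
Finally, to bring the primary and the replica fully in sync, new indexing requests are blocked and the translog operations added during the second phase are replayed.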
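The three phases can be illustrated with a toy in-memory model. This is not how Elasticsearch is implemented: documents are plain strings, the "segments" and the translog are Python lists, and the function name recover_replica is made up for this sketch.

```python
def recover_replica(committed_docs, ops_during_phase1, ops_during_phase2):
    """Toy model of the three recovery phases; returns (primary, replica)."""
    primary = list(committed_docs)
    translog = []

    # Phase 1: snapshot the segment files and copy them to the replica.
    # Indexing continues; new operations are recorded in the translog.
    segment_snapshot = list(primary)
    for op in ops_during_phase1:
        primary.append(op)
        translog.append(op)
    replica = list(segment_snapshot)

    # Phase 2: snapshot the translog and replay it on the replica.
    # Indexing still continues and keeps appending to the translog.
    translog_snapshot = list(translog)
    for op in ops_during_phase2:
        primary.append(op)
        translog.append(op)
    replica.extend(translog_snapshot)

    # Phase 3: block new indexing and replay what arrived during phase 2.
    remaining = translog[len(translog_snapshot):]
    replica.extend(remaining)

    return primary, replica


primary, replica = recover_replica(["d1", "d2"], ["d3"], ["d4", "d5"])
print(replica == primary)
```

In this model the replica only matches the primary after the third phase, which is why that phase has to block writes.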

Refresh

The refresh API explicitly refreshes one or more indices,
making all operations performed since the last refresh available for search.
The (near) real-time capabilities depend on the index engine used.

POST /twitter/_refresh

POST /kimchy,elasticsearch/_refresh

POST /_refresh

Force Merge

The merge relates to the number of segments a Lucene index holds within each shard.
The force merge operation reduces the number of segments by merging them.

POST /twitter/_forcemerge
POST /kimchy,elasticsearch/_forcemerge
POST /_forcemerge

This call will block until the merge is complete.
If the http connection is lost, the request will continue in the background,
and any new requests will block until the previous force merge is complete.

max_num_segments
The number of segments to merge to. To fully merge the index, set it to 1.
Defaults to simply checking if a merge needs to execute, and if so, executes it.

only_expunge_deletes
Whether the merge process should only expunge segments that contain deletes.
In Lucene, a document is not deleted from a segment, just marked as deleted.
During a merge of segments,
a new segment is created that does not contain those deleted documents.
Setting this flag to true merges only the segments that have deletes.
Defaults to false. Note that this won’t override the
index.merge.policy.expunge_deletes_allowed threshold.

flush
Whether a flush should be performed after the forced merge. Defaults to true.
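To show how the three parameters combine, here is a hedged sketch that builds a force merge URL. The helper name forcemerge_url and the base URL are assumptions, and nothing is sent to a cluster.

```python
from urllib.parse import urlencode


def forcemerge_url(base_url, indices=(), max_num_segments=None,
                   only_expunge_deletes=False, flush=True):
    """Build the URL for a force merge call (hypothetical helper)."""
    if max_num_segments is not None and max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    path = "/_forcemerge"
    if indices:
        path = "/" + ",".join(indices) + path
    # Only include options that differ from the defaults described above.
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    if not flush:
        params["flush"] = "false"
    url = base_url.rstrip("/") + path
    return url + ("?" + urlencode(params) if params else "")


# Fully merge the twitter index down to a single segment.
print(forcemerge_url("http://localhost:9200", ["twitter"], max_num_segments=1))
```

As with the other index APIs above, the URL would be sent with POST, and the call blocks until the merge completes.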
