
elasticsearch5.0 Bulk API
Translated and condensed from the original documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/docs-bulk.html


The bulk API makes it possible to perform many index/delete operations in a single API call. 
This can greatly increase the indexing speed.

In short: batch many index and delete operations into one request to speed up indexing.

The endpoints are /_bulk, /{index}/_bulk, and /{index}/{type}/_bulk.

Request body format (newline-delimited JSON, one JSON object per line):

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

The final line of data must end with a newline character \n. Don't forget the newline after the last line!
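Because every line, including the last, must end in \n, hand-built bodies are easy to get wrong. A minimal Python sketch (build_bulk_body is a hypothetical helper name) that serializes action/source pairs with json.dumps, which emits no newlines unless indent is set, and terminates every line:

```python
import json

def build_bulk_body(items):
    """Build an NDJSON bulk body from (action_and_meta_data, optional_source) pairs.

    Pass None as the source for actions that take none (delete).
    """
    lines = []
    for action_and_meta, source in items:
        lines.append(json.dumps(action_and_meta))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, ends with "\n".
    return "".join(line + "\n" for line in lines)

body = build_bulk_body([
    ({"index": {"_index": "test", "_type": "type2", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_type": "type2", "_id": "2"}}, None),
])
print(body, end="")
```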

The possible actions are index, create, delete and update. 

index and create expect a source on the next line, and have the same semantics as the op_type parameter of the standard index API: create will fail if a document with the same index, type and id already exists, whereas index will add or replace a document as necessary.
delete does not expect a source on the following line.
update expects the partial doc, upsert, script and their options to be specified on the next line.
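The rule that delete has no source line while index, create and update do is what determines how a body is read back. A minimal parsing sketch under that rule (parse_bulk_body is a hypothetical helper, not part of any client library):

```python
import json

# index, create and update are followed by a source line; delete is not.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}
VALID_ACTIONS = ACTIONS_WITH_SOURCE | {"delete"}

def parse_bulk_body(body):
    """Split a bulk body into (action, meta, source) triples; source is None for delete."""
    if body and not body.endswith("\n"):
        raise ValueError("the final line must end with a newline")
    lines = body.split("\n")[:-1]  # drop the empty string after the final "\n"
    ops, i = [], 0
    while i < len(lines):
        (action, meta), = json.loads(lines[i]).items()  # exactly one action per line
        if action not in VALID_ACTIONS:
            raise ValueError("unknown action: " + action)
        i += 1
        source = None
        if action in ACTIONS_WITH_SOURCE:
            if i >= len(lines):
                raise ValueError(action + " expects a source on the next line")
            source = json.loads(lines[i])
            i += 1
        ops.append((action, meta, source))
    return ops

ops = parse_bulk_body(
    '{ "delete" : { "_index" : "test", "_type" : "type2", "_id" : "2" } }\n'
    '{ "create" : { "_index" : "test", "_type" : "type2", "_id" : "3" } }\n'
    '{ "field1" : "value3" }\n'
)
for op in ops:
    print(op)
```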

When sending a text file with curl, use the --data-binary flag instead of plain -d: -d does not preserve newlines, which breaks the format above.

cat data.txt
{ "index" : { "_index" : "test", "_type" : "type2", "_id" : "1" } }
{ "field1" : "value1" }

curl -s -X POST http://127.0.0.1:9200/_bulk --data-binary "@data.txt"

-s (--silent): silent mode, so curl shows neither the progress meter nor error messages.
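The same request can be built from Python's standard library. This sketch only constructs the request (make_bulk_request is a hypothetical helper); the body is passed as raw bytes, so newlines survive just as with --data-binary. Elasticsearch 5.0 does not insist on a Content-Type here; application/x-ndjson is set on the assumption that later versions expect an explicit one, so check it against your version.

```python
import urllib.request

def make_bulk_request(body, host="http://127.0.0.1:9200"):
    """Build (but do not send) a POST /_bulk request carrying the body as raw bytes."""
    data = body.encode("utf-8")  # bytes are sent verbatim: newlines are preserved
    return urllib.request.Request(
        host + "/_bulk",
        data=data,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )

body = ('{ "index" : { "_index" : "test", "_type" : "type2", "_id" : "1" } }\n'
        '{ "field1" : "value1" }\n')
req = make_bulk_request(body)
# Against a running node, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```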
 
To check the result, search the index: http://127.0.0.1:9200/test/type2/_search


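A fuller example: run the same command with a data.txt that mixes all four actions.

curl -s -X POST http://127.0.0.1:9200/_bulk --data-binary "@data.txt"

Contents of data.txt: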

{ "index" : { "_index" : "test", "_type" : "type2", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type2", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type2", "_index" : "test"} }
{ "doc" : {"field1" : "value1-new"} }

In the example above, the source of the update action is a partial document that is merged with the already stored document.
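To make the merge concrete, here is a minimal sketch of the semantics as we understand them from the update API documentation: objects are merged recursively, while scalar values and arrays are replaced. merge_partial_doc is a hypothetical helper that models the behavior locally; it does not call Elasticsearch.

```python
import copy

def merge_partial_doc(stored, partial):
    """Model of update-with-doc: merge objects recursively, replace scalars and arrays."""
    merged = copy.deepcopy(stored)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial_doc(merged[key], value)  # inner object merge
        else:
            merged[key] = copy.deepcopy(value)  # scalar or array: replaced wholesale
    return merged

stored = {"field1": "value1", "tags": ["a", "b"], "meta": {"owner": "x", "level": 1}}
partial = {"field1": "value1-new", "tags": ["c"], "meta": {"level": 2}}
print(merge_partial_doc(stored, partial))
# {'field1': 'value1-new', 'tags': ['c'], 'meta': {'owner': 'x', 'level': 2}}
```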

A note on the format: the idea is to make processing as fast as possible. Because some of the actions will be redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.

Client libraries should do something similar on their side and reduce buffering as much as possible.
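To illustrate why only action_meta_data needs parsing: the receiving node can decide where each entry goes by reading the action line alone and forward the source line untouched. The sketch below is a simplified model (split_by_index is hypothetical): it groups entries by target _index as a stand-in for the real per-shard grouping, and never decodes a source line.

```python
import json
from collections import defaultdict

ACTIONS_WITH_SOURCE = {"index", "create", "update"}

def split_by_index(body):
    """Parse only the action_and_meta_data lines; keep source lines as raw bytes.

    Grouping by _index stands in for the real grouping by destination shard.
    """
    lines = body.split(b"\n")[:-1]  # body must end with b"\n"
    groups = defaultdict(list)
    i = 0
    while i < len(lines):
        meta_line = lines[i]
        (action, meta), = json.loads(meta_line).items()  # the only line that is parsed
        i += 1
        raw_source = None
        if action in ACTIONS_WITH_SOURCE:
            raw_source = lines[i]  # forwarded as-is, never decoded
            i += 1
        groups[meta.get("_index")].append((meta_line, raw_source))
    return dict(groups)

body = (b'{"index": {"_index": "a", "_type": "t", "_id": "1"}}\n'
        b'{"f": 1}\n'
        b'{"delete": {"_index": "b", "_type": "t", "_id": "2"}}\n')
groups = split_by_index(body)
```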

The response is a large JSON structure with the individual result of each action performed, in request order. The failure of a single action does not affect the remaining actions.
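Because one failed action does not stop the others, callers must inspect the per-item results themselves. A sketch, assuming a response with top-level took/errors/items, where each item is keyed by its action and carries an error object when it failed. The sample response is hand-written for illustration, not captured from a cluster.

```python
import json

def failed_items(response):
    """Return (position, action, item) for each bulk item that carries an "error" key."""
    failures = []
    if not response.get("errors"):
        return failures  # top-level flag says no item failed
    for pos, entry in enumerate(response["items"]):
        (action, item), = entry.items()
        if "error" in item:
            failures.append((pos, action, item))
    return failures

# Abridged, hand-written response in the shape of a bulk reply.
response = json.loads('''{
  "took": 30, "errors": true,
  "items": [
    {"index":  {"_index": "test", "_type": "type2", "_id": "1", "status": 201}},
    {"create": {"_index": "test", "_type": "type2", "_id": "3", "status": 409,
                "error": {"type": "version_conflict_engine_exception"}}}
  ]
}''')
for pos, action, item in failed_items(response):
    print(pos, action, item["status"], item["error"]["type"])
# 1 create 409 version_conflict_engine_exception
```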

There is no "correct" number of actions to perform in a single bulk call. Experiment with different batch sizes to find the optimum for your particular workload.
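One way to run that experiment is to cap each request by both number of actions and size in bytes, then tune the two limits. The sketch assumes each entry is already encoded as complete NDJSON lines (action line plus optional source line). batch_items and its default limits are arbitrary illustrations, not recommended values.

```python
def batch_items(entries, max_actions=1000, max_bytes=5 * 1024 * 1024):
    """Yield bulk bodies, each holding at most max_actions entries and about max_bytes bytes.

    entries: iterable of bytes, each a complete entry ending in b"\n".
    An entry larger than max_bytes is still sent, alone in its own batch.
    """
    batch, size = [], 0
    for entry in entries:
        if batch and (len(batch) >= max_actions or size + len(entry) > max_bytes):
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield b"".join(batch)

entries = [b'{"delete": {"_index": "test", "_type": "type2", "_id": "%d"}}\n' % i
           for i in range(5)]
batches = list(batch_items(entries, max_actions=2))
print(len(batches))  # 3
```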

Versioning: each bulk item can include a version value via the _version/version field, and a version type via version_type/_version_type.

Routing: each bulk item can include a routing value via the _routing/routing field.
 
Parent: each bulk item can include a parent value via the _parent/parent field. It automatically follows the behavior of the index / delete operation based on the _parent / _routing mapping.
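Routing and parent values only change which shard an item is sent to. A toy model, assuming the commonly documented rule shard = hash(_routing) % number_of_primary_shards, with _routing defaulting to _id and a child indexed with _parent routed by its parent's id. zlib.crc32 is swapped in for Elasticsearch's own Murmur3-based hash, so the shard numbers differ from a real cluster; pick_shard is hypothetical.

```python
import zlib

def pick_shard(doc_id, num_primary_shards, routing=None):
    """Illustrate shard = hash(_routing) % number_of_primary_shards.

    _routing defaults to the document _id. zlib.crc32 is only a stand-in hash.
    """
    routing_value = routing if routing is not None else doc_id
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# Documents that share a routing value always land on the same shard.
same = pick_shard("1", 5, routing="user-42") == pick_shard("2", 5, routing="user-42")
# A child routed by its parent's id follows the parent to the same shard.
child_with_parent = pick_shard("child-1", 5, routing="parent-7") == pick_shard("parent-7", 5)
print(same, child_with_parent)  # True True
```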

wait_for_active_shards: require a minimum number of active shard copies before the bulk request starts being processed.

refresh: controls when the changes made by this request become visible to search.

_retry_on_conflict: for the update action, set in the action line itself (not in the payload line) to specify how many times an update should be retried in the case of a version conflict.
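Conceptually, _retry_on_conflict wraps the update's read-modify-write cycle in a bounded retry loop. A toy model follows; Store, VersionConflict and update_with_retry are hypothetical names, and this illustrates the idea rather than how Elasticsearch implements it internally.

```python
class VersionConflict(Exception):
    """Raised when a write is attempted against a stale version."""

class Store:
    """Hypothetical in-memory store with per-document versions (not an Elasticsearch client)."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put_if_version(self, doc_id, expected_version, source):
        current_version, _ = self.docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, source)

def update_with_retry(store, doc_id, apply, retry_on_conflict=0):
    """Read, modify, write; on a version conflict re-read and retry up to retry_on_conflict times.

    Returns the number of retries that were needed.
    """
    for attempt in range(retry_on_conflict + 1):
        version, source = store.get(doc_id)
        try:
            store.put_if_version(doc_id, version, apply(dict(source)))
            return attempt
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise  # out of retries: surface the conflict

# Simulate a concurrent writer that bumps the document during the first attempt.
store = Store()
store.docs["0"] = (1, {"counter": 1})
interfered = []

def increment(source):
    if not interfered:
        store.docs["0"] = (2, {"counter": 10})  # another client wins the race
        interfered.append(True)
    source["counter"] += 1
    return source

retries = update_with_retry(store, "0", increment, retry_on_conflict=3)
print(retries, store.docs["0"])  # 1 (3, {'counter': 11})
```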

The update action payload supports the following options: doc (partial document), upsert, doc_as_upsert, script and _source.


{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : { "inline": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_type" : "type1", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
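The update entries above can also be generated programmatically. A minimal sketch with a hypothetical update_entry helper, showing that _retry_on_conflict goes into the action line while doc, upsert, doc_as_upsert and script go into the payload line:

```python
import json

def update_entry(index, doc_type, doc_id, payload, retry_on_conflict=None):
    """Return the two NDJSON lines of one bulk update: action line plus payload line.

    _retry_on_conflict belongs in the action line, not in the payload.
    """
    meta = {"_index": index, "_type": doc_type, "_id": doc_id}
    if retry_on_conflict is not None:
        meta["_retry_on_conflict"] = retry_on_conflict
    return json.dumps({"update": meta}) + "\n" + json.dumps(payload) + "\n"

body = (
    update_entry("index1", "type1", "1", {"doc": {"field": "value"}}, retry_on_conflict=3)
    + update_entry("index1", "type1", "0", {
        "script": {"inline": "ctx._source.counter += params.param1",
                   "lang": "painless", "params": {"param1": 1}},
        "upsert": {"counter": 1}}, retry_on_conflict=3)
    + update_entry("index1", "type1", "2", {"doc": {"field": "value"}, "doc_as_upsert": True})
)
print(body, end="")
```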
