
Elasticsearch 5.0 Document Index API
Category: elasticsearch
Published 2019-01-22, last modified 2019-01-22
Translated and compiled from the original documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/docs-index_.html



Index API


The following example inserts a JSON document into the "twitter" index, under a type called "tweet", with an id of 1:

PUT twitter/tweet/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

The result of the above index operation is:
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true,
    "result" : "created"
}

Replica shards may not all be started when an indexing operation successfully returns
(by default, only the primary is required, but this behavior can be changed).

Automatic Index Creation

The index operation automatically creates an index if it has not been created before,
and also automatically creates a dynamic type mapping for the specific type if one has not yet been created.
The mapping itself is very flexible and is schema-free.
New fields and objects will automatically be added to the mapping definition of the type specified.


To create an index manually, see:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/indices-create-index.html

Automatic index creation can be disabled by setting action.auto_create_index to false in the node configuration file.
Automatic mapping creation can be disabled by setting index.mapper.dynamic to false per-index as an index setting.

Automatic index creation can include a pattern based white/black list, 
for example, set action.auto_create_index to +aaa*,-bbb*,+ccc*,-* 
(+ meaning allowed, and - meaning disallowed).
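
How such a pattern list could be evaluated is sketched below in Python. This is a simplified model, not Elasticsearch's implementation. It assumes that patterns are checked from left to right, that the first matching pattern decides, and that a name matching no pattern is rejected. The function name auto_create_allowed is hypothetical.

```python
from fnmatch import fnmatchcase


def auto_create_allowed(setting: str, index_name: str) -> bool:
    """Model of an action.auto_create_index value.

    Assumptions (not stated in the text above): patterns are evaluated
    left to right, the first match wins, and no match means "disallowed".
    """
    value = setting.strip().lower()
    if value in ("true", "false"):
        return value == "true"
    for raw in setting.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        # "-" disallows; "+" (or no prefix) allows.
        allowed = not pattern.startswith("-")
        glob = pattern.lstrip("+-")
        if fnmatchcase(index_name, glob):
            return allowed
    return False


setting = "+aaa*,-bbb*,+ccc*,-*"
decisions = {name: auto_create_allowed(setting, name)
             for name in ("aaa-logs", "bbb-logs", "ccc1", "other")}
```

Under these assumptions, the trailing -* acts as a catch-all that rejects every index name not explicitly allowed earlier in the list.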


Versioning

Each indexed document is given a version number.
The index API optionally allows for optimistic concurrency control when the version parameter is specified.

A good example of a use case for versioning is performing a transactional read-then-update.
Specifying a version from the document initially read ensures no changes have happened in the meantime
(when reading in order to update, it is recommended to set preference to _primary).

PUT twitter/tweet/1?version=2
{
    "message" : "elasticsearch now has versioning support, double cool!"
}
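
The read-then-update pattern can be illustrated with a small in-memory model in Python. It only mimics the internal version check and does not talk to Elasticsearch. The VersionedStore class and the VersionConflict exception are hypothetical names introduced for this sketch.

```python
class VersionConflict(Exception):
    """Raised when the supplied version does not match the stored one."""


class VersionedStore:
    """Toy document store that mimics internal versioning."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # The write only succeeds when the caller's version equals
        # the version that is currently stored.
        if version is not None and version != current:
            raise VersionConflict(f"expected {version}, stored {current}")
        new_version = current + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
store.index("1", {"message": "trying out Elasticsearch"})  # stored as version 1
version, doc = store.get("1")                               # read ...
store.index("1", {"message": "concurrent edit"})            # another writer: version 2
try:
    # ... then update, passing the version that was read earlier.
    store.index("1", {**doc, "message": "my edit"}, version=version)
    outcome = "updated"
except VersionConflict:
    outcome = "conflict"  # the caller should re-read and retry
```

Because another write slipped in between the read and the update, the versioned write is rejected instead of silently overwriting the concurrent change.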



Optionally, the version number can be supplemented with an external value
(for example, if maintained in a database).
To enable this functionality, version_type should be set to external.
The value provided must be a numeric, long value greater than or equal to 0.

If the value provided is less than or equal to the stored document's version number,
a version conflict will occur and the index operation will fail.

External version numbers may start at 0. However,
documents with a version number equal to zero can neither be updated using the Update-By-Query API
nor be deleted using the Delete By Query API as long as their version number is equal to zero.


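Version types
Here is an overview of the different version types and their semantics.

internal
only index the document if the given version is identical to the version of the stored document.

external or external_gt
only index the document if the given version is strictly higher than the version of the stored document, or if there is no existing document. The given version is used as the new version and stored with the new document. The supplied version must be a non-negative long number.

external_gte
only index the document if the given version is equal to or higher than the version of the stored document. If there is no existing document, the operation succeeds as well. The given version is used as the new version and stored with the new document. The supplied version must be a non-negative long number.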
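
The checks behind these version types can be summarised as a small Python predicate. This is a sketch of the semantics only, assuming that external and external_gt require a strictly higher version, that external_gte also accepts an equal version, that both external types accept a write when no document exists yet, and that an internal version check on a missing document fails. The function name write_allowed is hypothetical.

```python
def write_allowed(version_type, given, stored):
    """Return True if a write carrying version `given` passes the check.

    `stored` is the version of the existing document, or None when the
    document does not exist yet.
    """
    if version_type == "internal":
        # Assumed: a versioned write to a missing document is a conflict.
        return stored is not None and given == stored
    if version_type not in ("external", "external_gt", "external_gte"):
        raise ValueError(f"unknown version_type: {version_type}")
    if given < 0:
        raise ValueError("external versions must be non-negative")
    if stored is None:
        return True
    if version_type == "external_gte":
        return given >= stored
    return given > stored  # external / external_gt
```

For example, write_allowed("external", 4, 4) is rejected while write_allowed("external_gte", 4, 4) is accepted, which is the only difference between the two external types.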


Operation Type

The index operation also accepts an op_type that can be used to force a create operation,
allowing for "put-if-absent" behavior.
When create is used, the index operation will fail if a document by that id already exists in the index.

PUT twitter/tweet/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

Another way to specify create is to use the _create endpoint:

PUT twitter/tweet/1/_create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}


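Automatic ID Generation
The index operation can be executed without specifying an id. In that case an id is generated automatically
and the op_type is automatically set to create.
Note that such a request uses POST instead of PUT (for example, POST twitter/tweet/).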
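
The choice between PUT and POST can be sketched as a small Python helper that builds the request line. It does not send anything. The helper name index_request is hypothetical; the paths follow the request examples used in this article.

```python
from urllib.parse import quote, urlencode


def index_request(index, doc_type, doc_id=None, **params):
    """Return the (method, path) pair for an index request.

    PUT is used when the caller supplies an id; POST is used when the
    id is omitted so that one can be generated automatically.
    """
    path = f"/{quote(index, safe='')}/{quote(doc_type, safe='')}"
    if doc_id is None:
        method = "POST"
    else:
        method = "PUT"
        path += f"/{quote(str(doc_id), safe='')}"
    if params:
        # Query parameters such as op_type, routing or timeout.
        path += "?" + urlencode(params)
    return method, path


with_id = index_request("twitter", "tweet", 1, op_type="create")
without_id = index_request("twitter", "tweet")
```

Here with_id describes a PUT to /twitter/tweet/1?op_type=create, while without_id describes a POST to /twitter/tweet.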

Routing
By default, shard placement (or routing) is controlled by using a hash of the document's id value.
For more explicit control, the value fed into the hash function used by the router
can be directly specified on a per-operation basis using the routing parameter.

POST twitter/tweet?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

In the example above, the "tweet" document is routed to a shard based on the routing parameter provided: "kimchy".
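
The routing step can be pictured as hashing the routing value and taking the result modulo the number of primary shards. The Python sketch below illustrates that idea only: zlib.crc32 is a stand-in and is not the hash function Elasticsearch uses, the shard count of 5 is arbitrary, and pick_shard is a hypothetical name.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a primary shard number.

    zlib.crc32 is only a stand-in hash for this sketch; it is not the
    hash function Elasticsearch uses.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# Without an explicit routing parameter the document id is hashed;
# with ?routing=kimchy the string "kimchy" is hashed instead.
shard_by_id = pick_shard("1", 5)
shard_by_user = pick_shard("kimchy", 5)
```

Because the same routing value always yields the same shard, all documents indexed with routing=kimchy end up on one shard.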


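When an explicit mapping is set up, the _routing field can optionally be used to direct the index operation to extract the routing value from the document itself.
If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.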

A child document can be indexed by specifying its parent when indexing. 

When indexing a child document, the routing value is automatically set to be the same as its parent,
unless the routing value is explicitly specified using the routing parameter.

The index operation is directed to the primary shard based on its route 
and performed on the actual node containing this shard. 
After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

To improve the resiliency of writes to the system,
indexing operations can be configured to wait for a certain number of active shard copies
before proceeding with the operation.

By default, write operations only wait for the primary shards to be active before proceeding
(i.e. wait_for_active_shards=1).

This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. 

Valid values are all or any positive integer up to the total number of configured copies per shard in the index 
(which is number_of_replicas+1). 
Specifying a negative value or a number greater than the number of shard copies will throw an error.

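If number_of_replicas is set to 0, the primary shard has no replicas and is the only shard copy.

For example, suppose a cluster of three nodes and an index with number_of_replicas set to 3
(1 primary + 3 replicas, 4 shard copies in total).
By default, an indexing operation only requires the primary copy to be active.
If wait_for_active_shards is set to 3, the operation requires 3 active shard copies before proceeding.
If it is set to 4 (or all), the operation requires 4 active copies and will time out,
unless a new node joins the cluster to host the fourth copy of the shard.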
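
The validation rule and the three-node scenario can be modelled in a few lines of Python. This is a sketch, not Elasticsearch code: it assumes that every shard copy that can be assigned is active and that a node holds at most one copy of a given shard. Both function names are hypothetical.

```python
def resolve_wait_for_active_shards(value, number_of_replicas):
    """Validate the setting and return the required number of active copies."""
    total_copies = number_of_replicas + 1  # primary + replicas
    if value == "all":
        return total_copies
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if not is_int or value < 1 or value > total_copies:
        raise ValueError(
            f"must be 'all' or an integer in 1..{total_copies}, got {value!r}")
    return value


def can_proceed(value, number_of_replicas, data_nodes):
    """True if enough shard copies can be active for the write to proceed.

    Assumes every assignable copy is active and that at most one copy
    of a shard is placed on each node.
    """
    required = resolve_wait_for_active_shards(value, number_of_replicas)
    active_copies = min(number_of_replicas + 1, data_nodes)
    return active_copies >= required


# Three nodes, number_of_replicas = 3 (four copies configured, three assignable).
results = {value: can_proceed(value, 3, 3) for value in (1, 3, 4, "all")}
```

In this model the values 1 and 3 let the write proceed, while 4 and all do not until a fourth node is available.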

active shard copies

refresh

(Note the difference between refresh and flush.)

When updating a document using the index api a new version 
of the document is always created even if the document hasn’t changed. 

If this isn't acceptable, use the _update API with detect_noop set to true,
which checks whether an update is actually needed.

This option isn't available on the index API because the index API doesn't fetch the old source
and isn't able to compare it against the new source.
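
The idea behind noop detection can be sketched in Python as a comparison between the stored source and the merged result. This is a simplified model with a shallow merge and an in-memory document; apply_update is a hypothetical name, and the dictionary layout is an assumption made for illustration.

```python
def apply_update(stored, partial_doc, detect_noop=True):
    """Merge `partial_doc` into the stored document.

    `stored` is a dict with "_version" and "_source" keys. The version
    is bumped only when something is actually written.
    """
    # Shallow merge only; nested objects are replaced, not merged.
    merged = {**stored["_source"], **partial_doc}
    if detect_noop and merged == stored["_source"]:
        return "noop"
    stored["_source"] = merged
    stored["_version"] += 1
    return "updated"


doc = {"_version": 1, "_source": {"user": "kimchy"}}
first = apply_update(doc, {"user": "kimchy"})                      # nothing changes
second = apply_update(doc, {"user": "kimchy"}, detect_noop=False)  # written anyway
```

The comparison needs the old source, which is why it fits an update operation that reads the document first and not a plain index operation.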

Timeout

The primary shard assigned to perform the index operation 
might not be available when the index operation is executed. 
Some reasons for this might be that the primary shard 
is currently recovering from a gateway or undergoing relocation. 
By default, the index operation will wait on the primary shard to become available 
for up to 1 minute before failing and responding with an error. 
The timeout parameter can be used to explicitly specify how long it waits. 

Here is an example of setting it to 5 minutes:
PUT twitter/tweet/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
