Elasticsearch 5.0 Glossary

Translated and compiled from the original documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/glossary.html

Glossary of terms

analysis
Analysis is the process of converting full text to terms.
Depending on which analyzer is used, these phrases: FOO BAR, Foo-Bar, foo,bar
will probably all result in the terms foo and bar.
These terms are what is actually stored in the index.
A full text query (not a term query) for FoO:bAR will also be analyzed to the terms foo,bar
and will thus match the terms stored in the index.
It is this process of analysis (both at index time and at search time)
that allows elasticsearch to perform full text queries.

Keywords: analysis (tokenization), full-text indexing, search.
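
To make the analysis step concrete, here is a minimal sketch of a toy analyzer. It is not Elasticsearch's standard analyzer; it only assumes that analysis means "split on non-alphanumeric characters and lowercase", which is enough to reproduce the example phrases above. The function name `analyze` is hypothetical.

```python
import re

def analyze(text):
    """Toy analyzer: split on non-alphanumeric characters, lowercase each token."""
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]

# All four spellings from the glossary entry reduce to the same two terms.
for phrase in ["FOO BAR", "Foo-Bar", "foo,bar", "FoO:bAR"]:
    print(phrase, "->", analyze(phrase))
```

Because the same function runs on the indexed text and on the query text, both sides end up with the terms foo and bar, which is why the query matches.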

cluster
A cluster consists of one or more nodes which share the same cluster name.
Each cluster has a single master node which is chosen automatically by the cluster
and which can be replaced if the current master node fails.

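Note: a cluster is one or more nodes that use the same cluster name.
Each cluster has one master node, elected automatically by the cluster and replaced if it fails.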

document
A document is a JSON document which is stored in elasticsearch.
It is like a row in a table in a relational database.
Each document is stored in an index and has a type and an id.
A document is a JSON object (also known in other languages as a hash / hashmap / associative array)
which contains zero or more fields, or key-value pairs.
The original JSON document that is indexed will be stored in the _source field,
which is returned by default when getting or searching for a document.

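Note: every document is stored in an index and has a type and an id. A document is a JSON object.
The original JSON document is kept in the _source field, which is returned with search results.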

id
The ID of a document identifies a document. The index/type/id of a document must be unique.
If no ID is provided, then it will be auto-generated.

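Note: the ID identifies a document and must be unique within its index and type; if none is supplied, one is generated automatically.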
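
The document and id entries can be illustrated with a small in-memory model. This is only a sketch: `ToyStore` is a hypothetical class, not a client API, and `uuid4` stands in for whatever ID scheme Elasticsearch actually uses when no ID is given.

```python
import uuid

class ToyStore:
    """In-memory stand-in for a cluster: documents keyed by (index, type, id)."""

    def __init__(self):
        self.docs = {}

    def index(self, index, doc_type, source, doc_id=None):
        # No ID supplied: generate one (uuid4 is a stand-in, not Elasticsearch's scheme).
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # Reusing an existing (index, type, id) replaces the document,
        # so the triple always identifies at most one document.
        self.docs[(index, doc_type, doc_id)] = source
        return doc_id

    def get(self, index, doc_type, doc_id):
        # The original JSON comes back under _source, next to its coordinates.
        source = self.docs[(index, doc_type, doc_id)]
        return {"_index": index, "_type": doc_type, "_id": doc_id, "_source": source}

store = ToyStore()
store.index("blog", "post", {"title": "Glossary", "tags": ["es", "notes"]}, doc_id="1")
auto_id = store.index("blog", "post", {"title": "Untitled"})
print(store.get("blog", "post", "1"))
print(auto_id)
```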

field
A document contains a list of fields, or key-value pairs.
The value can be a simple (scalar) value (eg a string, integer, date),
or a nested structure like an array or an object.
A field is similar to a column in a table in a relational database.
The mapping for each field has a field type (not to be confused with document type)
which indicates the type of data that can be stored in that field, eg integer, string, object.
The mapping also allows you to define (amongst other things) how the value for a field should be analyzed.

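Note: a document contains multiple fields; a value can be a scalar or a nested structure such as an array or an object.
A field is similar to a column in a relational database.
The field mapping defines the field type and how the value is analyzed.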

index
An index is like a table in a relational database.
It has a mapping which defines the fields in the index,
which are grouped into multiple types.
An index is a logical namespace which maps to one or more primary shards
and can have zero or more replica shards.

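Note: the glossary compares an index to a table in a relational database. The analogy index = database, type = table may be the more accurate one (to be confirmed).
The mapping defines the fields in the index, grouped into types.
An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards.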

mapping
A mapping is like a schema definition in a relational database.
Each index has a mapping, which defines each type within the index, plus a number of index-wide settings.
A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.

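Note: a mapping plays the role of a schema definition in a relational database.
Each index has one; it defines every type in the index plus a number of index-level settings.
It can be declared explicitly or generated automatically when documents are indexed.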
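
As an illustration of an explicit mapping, the sketch below builds the kind of request body used when creating an index in 5.x. The type name `tweet` and the field names are hypothetical; `keyword`, `text`, and `date` are field types, and the `analyzer` setting is where the mapping says how a field should be analyzed.

```python
import json

# Hypothetical index body: one type ("tweet") with three fields.
index_body = {
    "mappings": {
        "tweet": {
            "properties": {
                "user": {"type": "keyword"},                         # exact value, not analyzed
                "message": {"type": "text", "analyzer": "standard"},  # analyzed full text
                "posted": {"type": "date"},
            }
        }
    }
}

# Serialize to the JSON that would be sent as the request body.
print(json.dumps(index_body, indent=2))
```

If no such body is supplied, the mapping for these fields would instead be generated automatically from the first documents indexed.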

node
A node is a running instance of elasticsearch which belongs to a cluster.
Multiple nodes can be started on a single server for testing purposes,
but usually you should have one node per server.
At startup, a node will use unicast to discover an existing cluster with the same cluster name
and will try to join that cluster.

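Note: a node is a running Elasticsearch instance that belongs to a cluster.
Several nodes can be started on one server for testing, but normally each server runs one node.
On startup, a node uses unicast to find an existing cluster with the same cluster name and tries to join it.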

primary shard
Each document is stored in a single primary shard.
When you index a document, it is indexed first on the primary shard,
then on all replicas of the primary shard.
By default, an index has 5 primary shards.
You can specify fewer or more primary shards to scale the number of documents that your index can handle.
You cannot change the number of primary shards in an index, once the index is created.

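Note: a document is indexed on its primary shard first; an index has 5 primary shards by default.
Choose the number of shards according to the expected document count, so that the size of each shard stays under control.
The number of primary shards cannot be changed after the index is created.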
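
A short sketch of how the shard count is fixed at creation time and what it implies for the total number of shard copies. `number_of_shards` and `number_of_replicas` are the index settings involved; the helper `total_shard_copies` is a hypothetical name introduced here, and the values are the 5.0 defaults of 5 primaries with one replica each.

```python
def total_shard_copies(primaries, replicas_per_primary):
    """Every primary shard plus each of its replica copies."""
    return primaries * (1 + replicas_per_primary)

# Settings body supplied when the index is created.
index_settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}

s = index_settings["settings"]
print(total_shard_copies(s["number_of_shards"], s["number_of_replicas"]))  # 10
```

Only `number_of_replicas` can be adjusted later on an existing index; `number_of_shards` stays as created.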

replica shard
Each primary shard can have zero or more replicas.
A replica is a copy of the primary shard, and has two purposes:
increase failover: a replica shard can be promoted to a primary shard if the primary fails
increase performance: get and search requests can be handled by primary or replica shards.
By default, each primary shard has one replica,
but the number of replicas can be changed dynamically on an existing index.
A replica shard will never be started on the same node as its primary shard.

routing
When you index a document, it is stored on a single primary shard.
That shard is chosen by hashing the routing value.
By default, the routing value is derived from the ID of the document or,
if the document has a specified parent document,
from the ID of the parent document (to ensure that child and parent documents are stored on the same shard).
This value can be overridden by specifying a routing value at index time, or a routing field in the mapping.

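Note: a primary shard can have zero or more replicas; the default is one.
Replicas serve two purposes:
improve failover
improve performance: get and search requests can be spread across primary and replica shards
Routing hashes the routing value, which defaults to the document ID.
Parent and child documents must be stored on the same shard.
The routing value can be set at index time or defined through a routing field in the mapping.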
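
The routing rule described above amounts to "hash the routing value, then take it modulo the number of primary shards". The sketch below shows that idea only: `zlib.crc32` is a stand-in hash chosen for illustration, not the hash Elasticsearch uses, and `pick_shard` and `routing_for` are hypothetical helper names.

```python
import zlib

def pick_shard(routing_value, num_primary_shards):
    """hash(routing) mod number of primaries; crc32 is a stand-in hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

def routing_for(doc_id, parent_id=None, routing=None):
    """Explicit routing wins, then the parent ID, then the document's own ID."""
    if routing is not None:
        return routing
    return parent_id if parent_id is not None else doc_id

# A child routed by its parent's ID lands on the parent's shard.
parent_shard = pick_shard(routing_for("post-1"), 5)
child_shard = pick_shard(routing_for("comment-9", parent_id="post-1"), 5)
print(parent_shard, child_shard)
```

The modulo also suggests why the number of primary shards is fixed after creation: changing it would send existing routing values to different shards.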

shard
A shard is a single Lucene instance.
It is a low-level “worker” unit which is managed automatically by elasticsearch.
An index is a logical namespace which points to primary and replica shards.
Other than defining the number of primary and replica shards that an index should have,
you never need to refer to shards directly.
Instead, your code should deal only with an index.
Elasticsearch distributes shards amongst all nodes in the cluster,
and can move shards automatically from one node to another in the case of node failure,
or the addition of new nodes.

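Note: a shard is a complete Lucene instance.
An index is a logical namespace that points to its primary and replica shards.
Shards are moved automatically when a node fails or a new node is added.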

source field
By default, the JSON document that you index will be stored in the _source field
and will be returned by all get and search requests.
This allows you access to the original object directly from search results,
rather than requiring a second step to retrieve the object from an ID.

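Note: the JSON document is stored in the _source field and returned with query results, which avoids a second fetch of the document by ID.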
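
To show what "directly from search results" looks like, the sketch below pulls the original objects out of a search response. The response is a hand-written sample in the shape a 5.x search returns (`hits.hits[]._source`); the index, type, and field values are made up, and `sources` is a hypothetical helper.

```python
def sources(search_response):
    """Collect the original JSON documents from a search response."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]

# Hand-written sample response, trimmed to the parts used here.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "blog",
                "_type": "post",
                "_id": "1",
                "_score": 1.0,
                "_source": {"title": "Glossary"},
            }
        ],
    }
}

print(sources(response))  # [{'title': 'Glossary'}]
```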

term
A term is an exact value that is indexed in elasticsearch.
The terms foo, Foo, FOO are NOT equivalent.
Terms (i.e. exact values) can be searched for using term queries.

Note: exact terms are searched with term queries.

text
Text (or full text) is ordinary unstructured text, such as this paragraph.
By default, text will be analyzed into terms, which is what is actually stored in the index.
Text fields need to be analyzed at index time in order to be searchable as full text,
and keywords in full text queries must be analyzed at search time to produce (and search for) the same terms
that were generated at index time.

Note: text is analyzed into terms.
Text fields are analyzed at index time.
Search keywords are analyzed at search time.
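
The difference between a term and full text can be sketched with a toy inverted index. Everything here is a simplified assumption: `analyze` is a lowercase-and-split stand-in for a real analyzer, and `term_query` and `match_query` are hypothetical functions that only mimic the behaviour described above (exact lookup versus analyze-then-lookup).

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: split on non-alphanumeric characters, lowercase each token."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

inverted = defaultdict(set)  # term -> ids of documents containing it

def index_text(doc_id, text):
    # Index time: the text is analyzed and only the resulting terms are stored.
    for term in analyze(text):
        inverted[term].add(doc_id)

def term_query(term):
    """Exact lookup: the input is used as-is, without analysis."""
    return inverted.get(term, set())

def match_query(text):
    """Full-text lookup: analyze the input, then match documents containing any term."""
    result = set()
    for term in analyze(text):
        result |= inverted.get(term, set())
    return result

index_text(1, "Foo-Bar")
print(term_query("Foo"))   # set()
print(term_query("foo"))   # {1}
print(match_query("FoO"))  # {1}
```

The term query for Foo finds nothing because only the lowercased term foo was stored, while the full-text query succeeds because its input goes through the same analysis.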

type
A type represents the type of document, e.g. an email, a user, or a tweet.
The search API can filter documents by type.
An index can contain multiple types,
and each type has a list of fields that can be specified for documents of that type.
Fields with the same name in different types in the same index must have the same mapping
(which defines how each field in the document is indexed and made searchable).

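Note: a type is the kind of document.
Documents can be filtered by type.
An index can contain multiple types.
Fields with the same name in different types of the same index must have the same mapping.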
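
The same-name, same-mapping rule can be checked mechanically. The sketch below is a simplified reading of the rule that treats any difference in a field's definition as a conflict; the type names, field names, and the helper `conflicting_fields` are all hypothetical.

```python
def conflicting_fields(mappings):
    """Return names of fields that are mapped differently in different types of one index."""
    seen = {}
    conflicts = set()
    for type_name, type_mapping in mappings.items():
        for field, definition in type_mapping["properties"].items():
            # A field already seen in another type must carry the same definition.
            if field in seen and seen[field] != definition:
                conflicts.add(field)
            seen.setdefault(field, definition)
    return sorted(conflicts)

mappings = {
    "user": {"properties": {"name": {"type": "keyword"}, "created": {"type": "date"}}},
    "tweet": {"properties": {"name": {"type": "text"}, "body": {"type": "text"}}},
}

print(conflicting_fields(mappings))  # ['name']
```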
