Elasticsearch 5.0 Data Indexing and Querying in Practice

Category: elasticsearch
Translated and compiled from the original:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/_exploring_your_data.html




Let's try to work with a more realistic dataset. I've prepared a sample of fictitious JSON documents of customer bank account information, which can be downloaded from:

https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

Each document has the following schema:



{
    "account_number":1,
    "balance":39225,
    "firstname":"Amber",
    "lastname":"Duke",
    "age":32,
    "gender":"M",
    "address":"880 Holmes Lane",
    "employer":"Pyrami",
    "email":"amberduke@pyrami.com",
    "city":"Brogan",
    "state":"IL"
}


For the curious, I generated this data from www.json-generator.com/, so please ignore the actual values and semantics of the data, as these are all randomly generated.

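Bulk-index the data into the bank index under the account type (run this from the directory containing accounts.json):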
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
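The _bulk endpoint expects newline-delimited JSON. Each action line, such as {"index":{"_id":"1"}}, is followed by the document source on the next line, and the body must end with a newline. The sketch below builds such a body from your own data held in a Python list. It assumes accounts.json uses this same index-action layout, and the helper name to_bulk_body is hypothetical.

```python
import json

def to_bulk_body(docs, id_field="account_number"):
    """Build an NDJSON body for the _bulk API (hypothetical helper).

    Each document becomes two lines: an "index" action naming the _id,
    then the document source itself. The bulk API requires the body
    to end with a newline, so one is appended.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# One document taken from the schema above, trimmed to three fields.
sample = [{"account_number": 1, "balance": 39225, "firstname": "Amber"}]
body = to_bulk_body(sample)
print(body, end="")
```

If you write the result to a file and send it with curl, use --data-binary as in the command above. Plain -d strips the newlines that the bulk format depends on.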

Then list the indices, either with curl or by opening the URL in a browser:

curl 'localhost:9200/_cat/indices?v'
http://127.0.0.1:9200/_cat/indices?v

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank      k0hpKiOGQFWeKwnuTQy-2w   5   1       1000            0    648.3kb        648.3kb
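The ?v flag adds the header row to the _cat output. To check docs.count from a script, a minimal sketch can split the table on whitespace. It assumes no cell contains spaces, which holds for the _cat/indices columns shown here, and parse_cat_table is a hypothetical helper name.

```python
def parse_cat_table(text):
    """Parse whitespace-aligned _cat output (with ?v) into a list of dicts.

    The first line holds the column names; each following line is one row.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]

# The output shown above, copied verbatim.
output = """\
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank      k0hpKiOGQFWeKwnuTQy-2w   5   1       1000            0    648.3kb        648.3kb
"""
indices = parse_cat_table(output)
print(indices[0]["index"], indices[0]["docs.count"])
```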

This means we have just successfully bulk-indexed 1000 documents into the bank index (under the account type).


There are two basic ways to run searches. One is to send search parameters through the REST request URI (the query string). The other is to send them through the REST request body. The request body method allows you to be more expressive, and it also lets you define your searches in a more readable JSON format.

The REST API for search is accessible from the _search endpoint.

A URI search for all documents (q=*), sorted by account_number ascending and then descending:

http://127.0.0.1:9200/bank/_search?q=*&sort=account_number:asc&pretty

http://127.0.0.1:9200/bank/_search?q=*&sort=account_number:desc&pretty

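The same ascending search using a request body (the URL is quoted so the shell does not treat ? as a glob):

curl -X POST 'http://127.0.0.1:9200/bank/_search?pretty' -d '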
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'

POST a JSON-style query request body to the _search API.
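To contrast the two styles, here is a sketch using only the Python standard library. It assumes a node at 127.0.0.1:9200, as in the examples above, and the names uri_search_url and body_search_request are hypothetical. The sketch only builds the requests; passing req to urllib.request.urlopen would send it. The Content-Type header is not required by 5.0, but it is harmless and becomes mandatory from 6.0 on.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit
from urllib.request import Request

BASE = "http://127.0.0.1:9200"  # assumed local node

def uri_search_url(index, q="*", sort="account_number:asc"):
    """URI search: parameters travel in the query string (URL-encoded)."""
    params = urlencode({"q": q, "sort": sort, "pretty": "true"})
    return f"{BASE}/{index}/_search?{params}"

def body_search_request(index, body):
    """Request-body search: the Query DSL is POSTed as JSON."""
    return Request(
        f"{BASE}/{index}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

url = uri_search_url("bank")
req = body_search_request("bank", {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
})
print(url)
print(req.get_method(), req.full_url)
```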

The response contains the following fields:

took – time in milliseconds for Elasticsearch to execute the search
timed_out – tells us if the search timed out or not
_shards – tells us how many shards were searched, as well as a count of the successful/failed searched shards
hits – search results
hits.total – total number of documents matching our search criteria
hits.hits – actual array of search results (defaults to first 10 documents)
sort - sort key for results (missing if sorting by score)
_score and max_score - ignore these fields for now
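The fields listed above can be read from the parsed JSON like this. The response dict is hypothetical and abridged; it is only shaped like a 5.x _search response, where hits.total is a plain integer. The helper summarize is an illustrative name.

```python
def summarize(response):
    """Pick out the top-level fields described above from a _search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_ok": response["_shards"]["successful"],
        "total": hits["total"],                 # integer in 5.x
        "returned": len(hits["hits"]),          # at most `size` (default 10)
        "first_sort": hits["hits"][0].get("sort") if hits["hits"] else None,
    }

# Hypothetical, abridged response for illustration only.
resp = {
    "took": 5,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": None,
        "hits": [
            {"_index": "bank", "_type": "account", "_id": "0",
             "_score": None, "sort": [0],
             "_source": {"account_number": 0}},
        ],
    },
}
print(summarize(resp))
```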

 
Introducing the Query Language
Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries. This is referred to as the Query DSL.

The query language is quite comprehensive. The best way to learn it is to start with a few basic examples.
 

GET /bank/_search
{
  "query": { "match_all": {} }
}

The match_all query is simply a search for all documents in the specified index.

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 3,
  "sort": { "balance": { "order": "desc" } },
  "_source": ["account_number", "balance"]
}

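The from parameter (0-based) specifies which document to start from, and size specifies how many documents to return starting at from. If from is not specified, it defaults to 0. If size is not specified, it defaults to 10. So from: 10 with size: 3 returns documents 11 through 13. With the default size of 10, it would return documents 11 through 20.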

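Pagination, sorting, and field selection:

"sort": { "balance": { "order": "desc" } } sorts the results by balance in descending order.

"_source": ["account_number", "balance"] returns only these two fields from each document's _source instead of the full document.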
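As a mental model, from/size/sort/_source behave like sort, then slice, then project. The sketch below is a local emulation in plain Python, not Elasticsearch itself. The accounts are hypothetical, and from_ carries a trailing underscore because from is a Python keyword.

```python
def paginate(docs, from_=0, size=10, sort_field="balance", desc=True, source=None):
    """Emulate sort + from/size + _source filtering on a list of dicts.

    from_ is 0-based, so from_=10 skips the first ten sorted hits.
    size defaults to 10, matching the Elasticsearch default.
    source, if given, keeps only those fields (like "_source": [...]).
    """
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=desc)
    page = ordered[from_:from_ + size]
    if source is not None:
        page = [{k: d[k] for k in source if k in d} for d in page]
    return page

# Hypothetical accounts: account n has balance 1000 * n.
docs = [{"account_number": n, "balance": 1000 * n, "age": 30} for n in range(1, 21)]
page = paginate(docs, from_=10, size=3, source=["account_number", "balance"])
print(page)
```

With balances sorted descending, position 10 (the 11th hit) is account 10, so this page holds accounts 10, 9 and 8.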

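A match query searches a specific field. This one returns the account numbered 20:

"query": { "match": { "account_number": 20 } }

A match_phrase query matches the whole phrase instead of individual terms. This one returns all accounts whose address contains the phrase "mill lane":

"query": { "match_phrase": { "address": "mill lane" } }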
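The difference between match (its default operator is OR, so any query term is enough) and match_phrase (all terms, adjacent and in order) can be sketched locally. The tokenizer here is a rough stand-in for the standard analyzer (lowercased word tokens), and all addresses except "880 Holmes Lane" are hypothetical. account_number is numeric, so a match on it behaves like an exact value lookup and is not analyzed.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", str(text).lower())

def match(field_value, query_text):
    """match with the default OR operator: any query token appears."""
    tokens = set(analyze(field_value))
    return any(t in tokens for t in analyze(query_text))

def match_phrase(field_value, query_text):
    """match_phrase: all query tokens appear consecutively, in order."""
    tokens, phrase = analyze(field_value), analyze(query_text)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

addresses = ["880 Holmes Lane", "198 Mill Lane", "244 Columbus Place", "671 Mill Avenue"]
print([a for a in addresses if match(a, "mill lane")])
print([a for a in addresses if match_phrase(a, "mill lane")])
```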

The bool query allows us to compose smaller queries into bigger queries using boolean logic. The following example composes two match queries and returns all accounts whose address contains both "mill" and "lane":

{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

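A bool query supports these clause types:
must – all of the clauses must match
should – at least one of the clauses must match when there is no must clause; in query context with a must clause present, should clauses are optional and only raise the score
must_not – none of the clauses may match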

This example returns all accounts of anybody who is 40 years old but does not live in ID (Idaho):


{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
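The matching logic of the bool clauses can be sketched with predicates. This is a simplified local model for query context; it ignores scoring, and the accounts are hypothetical. In 5.x, a bool used in filter context behaves differently: it requires at least one should clause to match even when filter clauses are present.

```python
def bool_query(doc, must=(), should=(), must_not=(), filter_=()):
    """Return True if doc matches a bool query (query context, no scoring).

    must / filter_: every clause must match.
    must_not:       no clause may match.
    should:         if there are no must/filter_ clauses, at least one
                    should clause must match; otherwise should clauses
                    only affect the score, so they are ignored here.
    Each clause is a predicate: doc -> bool.
    """
    required = list(must) + list(filter_)
    if not all(clause(doc) for clause in required):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not required:
        return any(clause(doc) for clause in should)
    return True

# Hypothetical accounts.
accounts = [
    {"account_number": 1, "age": 40, "state": "ID"},
    {"account_number": 2, "age": 40, "state": "IL"},
    {"account_number": 3, "age": 32, "state": "TX"},
]
# age 40, but not living in ID: mirrors the request above.
hits = [a["account_number"] for a in accounts
        if bool_query(a,
                      must=[lambda d: d["age"] == 40],
                      must_not=[lambda d: d["state"] == "ID"])]
print(hits)
```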

Executing Filters

Each search hit carries a document score (the _score field in the search results). The score is a numeric value that measures, relative to the other hits, how well the document matches the search query we specified. The higher the score, the more relevant the document.

But queries do not always need to produce scores, in particular when they are only used for "filtering" the document set. Elasticsearch detects these situations and automatically optimizes query execution so that it does not compute useless scores.


GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
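The bool query contains a match_all query (the query part) and a range query (the filter part). gte and lte are inclusive bounds, so this returns all accounts with balances from 20000 to 30000, inclusive.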
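The range filter's inclusive bounds can be checked with a small local sketch. The balances are hypothetical. Because match_all gives every document the same constant score and the filter clause contributes nothing to scoring, all matching hits end up with an equal _score. gt and lt are the exclusive counterparts of gte and lte.

```python
def in_range(value, gte=None, lte=None):
    """Inclusive range check, like "gte"/"lte" in a range query."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True

# Hypothetical balances around the boundaries.
balances = [19999, 20000, 25000, 30000, 30001]
matched = [b for b in balances if in_range(b, gte=20000, lte=30000)]
print(matched)
```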

Aggregations
Aggregations provide the ability to group and extract statistics from your data. 

The following example groups all the accounts by state and returns the top 10 states (10 is the default), sorted by count in descending order (also the default):

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

In SQL, the above aggregation is similar in concept to:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
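The same GROUP BY idea can be sketched in plain Python with collections.Counter. This is a local illustration on hypothetical documents. Breaking ties by key is this sketch's own choice. On a multi-shard index, the counts Elasticsearch returns can be approximate (the response reports doc_count_error_upper_bound), whereas this single-list version is exact.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Local stand-in for a terms aggregation ordered by doc count desc.

    Returns at most `size` buckets (default 10, as in Elasticsearch).
    Ties are broken by key, an assumption of this sketch.
    """
    counts = Counter(d[field] for d in docs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ordered[:size]]

# Hypothetical documents.
docs = [{"state": s} for s in ["TX", "IL", "TX", "ID", "TX", "IL"]]
print(terms_agg(docs, "state"))
```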

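We set size=0 so that no search hits are shown, because we only want to see the aggregation results in the response.

The aggregated field must be of keyword type (here, the state.keyword sub-field). A text field can be aggregated only if fielddata is set to true on it. It is then aggregated by its analyzed terms (the individual tokens), not by whole values.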
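The practical difference shows up in the bucket keys. In 5.0, dynamic mapping indexes a string as a text field with a .keyword sub-field, which is why the requests use state.keyword. The sketch below contrasts the two kinds of buckets locally. Its tokenizer is a rough stand-in for the standard analyzer, and the second address is hypothetical.

```python
import re
from collections import Counter

def keyword_terms(values):
    """keyword field: each whole value is a single term."""
    return Counter(values)

def text_terms(values):
    """text field with fielddata=true: buckets are analyzed tokens.
    The lowercase word split is a rough stand-in for the standard analyzer."""
    return Counter(tok for v in values for tok in re.findall(r"\w+", v.lower()))

addresses = ["880 Holmes Lane", "198 Mill Lane"]
print(keyword_terms(addresses))
print(text_terms(addresses))
```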


GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Nested aggregation: we nested the average_balance aggregation inside the group_by_state aggregation, so an average balance is computed for each state bucket.
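In SQL terms, this is roughly SELECT state, AVG(balance) ... GROUP BY state. The local sketch below computes the same per-bucket average on hypothetical documents. It ignores the bucket limit and ordering that the terms aggregation also applies.

```python
from collections import defaultdict

def avg_balance_by_state(docs):
    """terms on state with a nested avg on balance (local sketch)."""
    groups = defaultdict(list)
    for d in docs:
        groups[d["state"]].append(d["balance"])
    return {state: sum(b) / len(b) for state, b in groups.items()}

# Hypothetical documents.
docs = [
    {"state": "TX", "balance": 1000},
    {"state": "TX", "balance": 3000},
    {"state": "IL", "balance": 39225},
]
result = avg_balance_by_state(docs)
print(result)
```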
 
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

This sorts the state buckets by average_balance in descending order.
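Ordering by a sub-aggregation changes which buckets survive the size cut. The ordering is applied before truncating to the top 10, so the states returned can differ from the default top 10 by count. The sketch works on buckets shaped like the response of the request above; the values are hypothetical.

```python
def order_by_sub_agg(buckets, metric, desc=True, size=10):
    """Order terms buckets by a sub-aggregation's value, then keep `size`."""
    return sorted(buckets, key=lambda b: b[metric]["value"], reverse=desc)[:size]

# Hypothetical buckets.
buckets = [
    {"key": "TX", "doc_count": 2, "average_balance": {"value": 2000.0}},
    {"key": "IL", "doc_count": 1, "average_balance": {"value": 39225.0}},
    {"key": "ID", "doc_count": 5, "average_balance": {"value": 15000.0}},
]
keys = [b["key"] for b in order_by_sub_agg(buckets, "average_balance")]
print(keys)
```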

Bucketing by age range

The following example groups by age brackets (ages 20-29, 30-39, and 40-49), then by gender, and finally computes the average account balance per age bracket, per gender. Each range includes its from value and excludes its to value.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
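The three-level aggregation (range, then terms, then avg) can be emulated locally to make the half-open ranges concrete. This is a sketch on hypothetical documents, not Elasticsearch's implementation. A range with no documents comes back empty here, much as Elasticsearch returns a bucket with doc_count 0.

```python
from collections import defaultdict

RANGES = [(20, 30), (30, 40), (40, 50)]  # same bounds as the request above

def age_gender_avg(docs, ranges=RANGES):
    """range on age -> terms on gender -> avg balance (local sketch).

    Each range includes `from` and excludes `to`, so age 30 falls in
    the (30, 40) bucket, not (20, 30).
    """
    result = {}
    for lo, hi in ranges:
        by_gender = defaultdict(list)
        for d in docs:
            if lo <= d["age"] < hi:
                by_gender[d["gender"]].append(d["balance"])
        result[(lo, hi)] = {g: sum(b) / len(b) for g, b in by_gender.items()}
    return result

# Hypothetical documents.
docs = [
    {"age": 29, "gender": "F", "balance": 1000},
    {"age": 30, "gender": "F", "balance": 3000},
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 30, "gender": "F", "balance": 5000},
]
res = age_gender_avg(docs)
print(res)
```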