DocValues in Elasticsearch


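A search engine is built on an inverted index: a mapping from terms to the documents that contain them, with all terms kept in a sorted list.
Aggregation vs. search: search looks up documents by term, while aggregation needs the opposite lookup, from a document to its field values.
To support aggregations, ES introduced fielddata.
fielddata is held in memory, so it consumes heap space and can lead to OutOfMemory errors and full GC pauses, which hurts stability.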

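fielddata is loaded on demand, triggered by a concrete search request. On a cache miss, fielddata has to be built in memory first, which hurts performance and response time.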

The problem with fielddata is that memory is limited, and garbage collection on a large JVM heap puts system stability at risk.
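
The on-demand loading described above can be sketched in a few lines of Python. This is a toy model, not Elasticsearch's implementation: the `FielddataCache` class, the `body` field name, and the sample terms are all made up for illustration. It shows the first request on a field paying the cost of un-inverting the inverted index into an in-memory doc-to-terms map, while later requests reuse the cached result.

```python
from collections import defaultdict

# Hypothetical inverted index for a single field: term -> list of doc ids.
inverted_index = {
    "brown": [0, 2],
    "fox": [0, 1],
    "quick": [1, 2],
}


class FielddataCache:
    """Toy model of fielddata: built lazily on first use and kept in memory."""

    def __init__(self, index):
        self._index = index
        self._cache = {}  # field name -> {doc id: [terms]}
        self.builds = 0   # how many times a field had to be un-inverted

    def load(self, field):
        if field not in self._cache:
            # Cache miss: walk every term of the field and flip the mapping.
            doc_to_terms = defaultdict(list)
            for term, doc_ids in self._index.items():
                for doc_id in doc_ids:
                    doc_to_terms[doc_id].append(term)
            self._cache[field] = dict(doc_to_terms)
            self.builds += 1
        return self._cache[field]


cache = FielddataCache(inverted_index)
first = cache.load("body")   # slow path: builds the structure in memory
second = cache.load("body")  # fast path: served from the cache
```

In this model the whole field is materialized at once and stays on the heap, which is where the memory pressure comes from.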

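The DocValues mechanism was introduced to address this: DocValues are persisted to disk and built in advance.
When data is written, both the inverted index and DocValues are generated. This costs extra storage space but reduces the demand on memory.
Because DocValues are pre-built, there is no need to build fielddata when a query misses the cache.
Overall, DocValues are only about 10~25% slower than in-memory fielddata, while stability improves substantially.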


Use the inverted index for search, and the column-oriented DocValues store for analytics.
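
As a rough illustration of this division of labour, the sketch below builds both structures at write time and then uses each for the job it suits. It is a simplified in-memory model under assumed names (`docs`, `search`, `terms_agg`, and the `status` field are hypothetical); real DocValues are stored on disk in a compressed columnar format.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; the list position is the doc id.
docs = [
    {"session_id": "a1", "status": "ok"},
    {"session_id": "b2", "status": "error"},
    {"session_id": "c3", "status": "ok"},
]

inverted = defaultdict(list)    # (field, term) -> doc ids, used for search
doc_values = defaultdict(list)  # field -> column of values, indexed by doc id

# Write path: both structures are filled when a document is indexed.
for doc_id, doc in enumerate(docs):
    for field, value in doc.items():
        inverted[(field, value)].append(doc_id)
        doc_values[field].append(value)


def search(field, term):
    """Term lookup: go from a term to the matching doc ids."""
    return inverted.get((field, term), [])


def terms_agg(field, doc_ids):
    """Aggregation: go from doc ids to their values by reading the column."""
    column = doc_values[field]
    return Counter(column[doc_id] for doc_id in doc_ids)


hits = search("status", "ok")
counts = terms_agg("session_id", hits)
```

The search step never touches the column, and the aggregation step never touches the term dictionary, which is why the two structures can be enabled or disabled independently per field.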


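DocValues are enabled by default for all fields except analyzed strings.
To disable DocValues for a field (the mappings here use the Elasticsearch 2.x syntax, with string fields and a mapping type):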
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "session_id": {
          "type":       "string",
          "index":      "not_analyzed",
          "doc_values": false 
        }
      }
    }
  }
}

With doc_values: false, this field can no longer be used for aggregations, sorting, or scripts.

Conversely, you can disable the inverted index, so that the field cannot be searched normally but can still be sorted on:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "customer_token": {
          "type":       "string",
          "doc_values": true,
          "index":      "no"
        }
      }
    }
  }
}

With doc_values: true and index: no, we get a field that can only be used for aggregations, sorting, and scripts.
