Elasticsearch document scoring model


https://www.elastic.co/guide/en/elasticsearch/reference/5.6/index-modules-similarity.html

A similarity (scoring / ranking model) defines how matching documents are scored. 

Text similarity

org.apache.lucene.search.similarities.BM25Similarity


  /**
   * BM25 with the supplied parameter values.
   * @param k1 Controls non-linear term frequency normalization (saturation).
   * @param b Controls to what degree document length normalizes tf values.
   */
  public BM25Similarity(float k1, float b) {
    this.k1 = k1;
    this.b  = b;
  }
  
  public BM25Similarity() {
    this.k1 = 1.2f;
    this.b  = 0.75f;
  }
  
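In the traditional TF-IDF algorithm the influence of term frequency grows without bound: the more often a keyword appears in a document, the higher its TF-IDF relevance score. BM25 instead saturates the term-frequency contribution; k1 controls how quickly it levels off, and b controls how strongly document length normalizes it.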
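
To make the contrast concrete, here is a minimal sketch, not Lucene's actual implementation. It compares an unbounded term-frequency weight (the square root of tf, as used by Lucene's classic TF-IDF similarity) with the textbook BM25 term-frequency component, leaving IDF out. The class name `TfSaturationDemo` and its method names are hypothetical; k1 and b are the defaults from the constructor above, and the document lengths are made-up values.

```java
public class TfSaturationDemo {

    // Unbounded term-frequency weight: sqrt(tf), as in Lucene's classic TF-IDF similarity.
    public static double classicTf(double tf) {
        return Math.sqrt(tf);
    }

    // Textbook BM25 term-frequency component (IDF omitted, illustrative only).
    public static double bm25Tf(double tf, double k1, double b, double docLen, double avgDocLen) {
        double lengthNorm = 1 - b + b * (docLen / avgDocLen);
        return tf * (k1 + 1) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75;   // defaults shown in BM25Similarity()
        // Assumed: a document of average length (100 tokens, average 100).
        for (int tf : new int[] {1, 2, 5, 10, 100, 1000}) {
            System.out.printf("tf=%4d  classic=%7.3f  bm25=%.3f%n",
                    tf, classicTf(tf), bm25Tf(tf, k1, b, 100, 100));
        }
    }
}
```

For a document of average length the BM25 value stays below k1 + 1 however large tf becomes, while the square-root weight keeps growing.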

  
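elasticsearch.yml (Elasticsearch 2.x only; from 5.0 on BM25 is already the default and index-level settings can no longer be set in this file):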
index.similarity.default.type: BM25

PUT /your_index/
{
  "settings": {
    "similarity": {
      "bm25-inverse-zero": {
        "type": "BM25",
        "b": 0
      }
    }
  }
}
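
The custom similarity above sets `b` to 0. As a rough illustration of what that does, the sketch below evaluates the textbook BM25 length-normalization factor for a short and a long document. With b = 0 the factor is always 1, so document length no longer affects the score. The names `LengthNormDemo`, `lengthNorm` and `tfWeight` are hypothetical, the document lengths are made-up values, and this is not Lucene's internal code path.

```java
public class LengthNormDemo {

    // Textbook BM25 length-normalization factor: 1 - b + b * (docLen / avgDocLen).
    public static double lengthNorm(double b, double docLen, double avgDocLen) {
        return 1 - b + b * (docLen / avgDocLen);
    }

    // BM25 term-frequency component for a given tf (IDF omitted, illustrative only).
    public static double tfWeight(double tf, double k1, double b, double docLen, double avgDocLen) {
        return tf * (k1 + 1) / (tf + k1 * lengthNorm(b, docLen, avgDocLen));
    }

    public static void main(String[] args) {
        double k1 = 1.2, avgDocLen = 100;   // assumed average length in tokens
        for (double b : new double[] {0.75, 0.0}) {
            System.out.printf("b=%.2f  short doc=%.3f  long doc=%.3f%n",
                    b, tfWeight(2, k1, b, 20, avgDocLen), tfWeight(2, k1, b, 500, avgDocLen));
        }
    }
}
```

With b = 0.75 the short document scores higher than the long one for the same tf; with b = 0 both get the same weight.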
