Elasticsearch document scoring models
Category: elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/index-modules-similarity.html
A similarity (scoring / ranking model) defines how matching documents are scored.
Text similarity
org.apache.lucene.search.similarities.BM25Similarity
/**
 * BM25 with the supplied parameter values.
 * @param k1 Controls non-linear term frequency normalization (saturation).
 * @param b Controls to what degree document length normalizes tf values.
 */
public BM25Similarity(float k1, float b) {
    this.k1 = k1;
    this.b = b;
}

/** BM25 with the default values k1 = 1.2, b = 0.75. */
public BM25Similarity() {
    this.k1 = 1.2f;
    this.b = 0.75f;
}
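To make the roles of k1 and b concrete, here is a minimal sketch of the per-term BM25 score. It assumes the formula used by Lucene 6.x's BM25Similarity (the Lucene line behind Elasticsearch 5.x), where the tf part is multiplied by (k1 + 1); later Lucene versions dropped that factor. The class name Bm25Sketch is hypothetical, and the sketch uses exact document lengths, whereas Lucene stores them lossily in a one-byte norm.

```java
/**
 * Hypothetical sketch of the per-term BM25 score, assuming the Lucene 6.x
 * formula. Simplification: exact document lengths instead of Lucene's
 * lossy one-byte norms.
 */
public class Bm25Sketch {
    private final double k1; // term-frequency saturation
    private final double b;  // strength of document-length normalization

    public Bm25Sketch(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** Same defaults as the no-arg BM25Similarity constructor. */
    public static Bm25Sketch defaults() {
        return new Bm25Sketch(1.2, 0.75);
    }

    /** idf = ln(1 + (N - n + 0.5) / (n + 0.5)); the "1 +" keeps it positive. */
    public double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** tf part: freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl)). */
    public double tfNorm(double freq, double docLen, double avgDocLen) {
        double lengthNorm = 1 - b + b * docLen / avgDocLen;
        return freq * (k1 + 1) / (freq + k1 * lengthNorm);
    }

    /** Score contribution of one query term to one document. */
    public double score(double freq, double docLen, double avgDocLen,
                        long docCount, long docFreq) {
        return idf(docCount, docFreq) * tfNorm(freq, docLen, avgDocLen);
    }

    public static void main(String[] args) {
        Bm25Sketch bm25 = Bm25Sketch.defaults();
        // A term occurring 3 times in an average-length (100-term) document,
        // present in 10 of 1000 documents.
        System.out.println(bm25.score(3, 100, 100, 1000, 10));
    }
}
```

For a document of exactly average length the length factor is 1, so with the defaults a single occurrence gives a tf part of 2.2 / 2.2 = 1.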
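In classic TF-IDF, the weight of term frequency grows without bound: the more often a keyword appears in a document, the higher its TF-IDF relevance score. BM25 instead saturates term frequency (controlled by k1), so each extra occurrence adds less than the previous one, and it normalizes tf by document length (controlled by b).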
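The difference in how term frequency grows can be seen numerically. This sketch assumes classic TF-IDF uses tf = sqrt(freq), as Lucene's ClassicSimilarity does, and evaluates the BM25 tf part with k1 = 1.2 for a document of exactly average length, so b has no effect. The class name TfSaturation is hypothetical.

```java
/**
 * Hypothetical comparison of term-frequency growth.
 * Assumptions: classic tf = sqrt(freq) (Lucene ClassicSimilarity);
 * BM25 with k1 = 1.2 and an average-length document (length factor = 1).
 */
public class TfSaturation {
    static final double K1 = 1.2;

    /** Classic TF-IDF term frequency: keeps growing with freq. */
    static double classicTf(double freq) {
        return Math.sqrt(freq);
    }

    /** BM25 tf part for an average-length document: approaches k1 + 1. */
    static double bm25Tf(double freq) {
        return freq * (K1 + 1) / (freq + K1);
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 2, 5, 10, 100, 1000}) {
            System.out.printf("freq=%4d  classic=%7.3f  bm25=%.3f%n",
                    freq, classicTf(freq), bm25Tf(freq));
        }
    }
}
```

The classic value keeps climbing (sqrt(100) is already 10), while the BM25 value stays below k1 + 1 = 2.2 however large freq gets.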
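Setting BM25 as the default similarity in the elasticsearch.yml file (Elasticsearch 2.x):
index.similarity.default.type: BM25
From Elasticsearch 5.0 on, BM25 is already the default similarity, and index-level settings are no longer accepted in elasticsearch.yml; set index.similarity.default in the index settings when creating the index instead.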
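Defining a custom BM25 similarity with b set to 0, which turns off document-length normalization:
PUT /your_index/
{
  "settings": {
    "similarity": {
      "bm25-inverse-zero": {
        "type": "BM25",
        "b": 0
      }
    }
  }
}
The custom similarity only applies to fields whose mapping sets "similarity": "bm25-inverse-zero".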
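What "b": 0 changes can be checked with the BM25 tf formula. This sketch assumes the Lucene 6.x formula with exact document lengths, and compares a 50-term and a 400-term document that each contain the query term 3 times, in a collection whose average length is 100 terms. The class name LengthNormDemo and the numbers are hypothetical.

```java
/**
 * Hypothetical demo of the b parameter, assuming the Lucene 6.x BM25 tf
 * formula and exact document lengths (Lucene stores them lossily).
 */
public class LengthNormDemo {
    /** BM25 tf part with explicit k1 and b. */
    static double tfNorm(double freq, double docLen, double avgDocLen,
                         double k1, double b) {
        // With b = 0 this factor is always 1, so length is ignored.
        double lengthNorm = 1 - b + b * docLen / avgDocLen;
        return freq * (k1 + 1) / (freq + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2, avg = 100, freq = 3;
        for (double b : new double[] {0.75, 0.0}) {
            System.out.printf("b=%.2f  short doc=%.3f  long doc=%.3f%n",
                    b, tfNorm(freq, 50, avg, k1, b), tfNorm(freq, 400, avg, k1, b));
        }
    }
}
```

With b = 0.75 the shorter document gets the higher tf score; with b = 0 both documents score the same, since only the raw term frequency matters.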