Lucene Query Types and Examples
Lucene 8.2.0
org.apache.lucene.search.Query
MatchAllDocsQuery
TermQuery
BooleanQuery
PhraseQuery
MultiPhraseQuery
PrefixQuery
WildcardQuery
RegexpQuery
FuzzyQuery
PointRangeQuery
TermRangeQuery
ConstantScoreQuery
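DisjunctionMaxQuery (scores a document by its best-matching subquery plus an optional tie-breaker; mainly used to control scoring)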
SpanQuery
BooleanQuery
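SHOULD: OR (optional clause)
MUST: AND (required clause)
FILTER: AND. Unlike MUST, it does not contribute to the score, so it is cheaper than MUST. If you only care whether a document matches and not how well it matches (its score), prefer FILTER.
MUST_NOT: NOT (the document must not match the clause)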
// Both terms are required: matches documents containing "python" AND "snake"
BooleanQuery.Builder mustBoth = new BooleanQuery.Builder();
mustBoth.add(new TermQuery(new Term(CONTENT, "python")), BooleanClause.Occur.MUST);
mustBoth.add(new TermQuery(new Term(CONTENT, "snake")), BooleanClause.Occur.MUST);
Query pythonAndSnake = mustBoth.build();
// "python" is required and "snake" is excluded
BooleanQuery.Builder mustNot = new BooleanQuery.Builder();
mustNot.add(new TermQuery(new Term(CONTENT, "python")), BooleanClause.Occur.MUST);
mustNot.add(new TermQuery(new Term(CONTENT, "snake")), BooleanClause.Occur.MUST_NOT);
Query pythonNotSnake = mustNot.build();
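To make the MUST / MUST_NOT semantics concrete, here is a minimal plain-Java sketch that mimics the two queries above against the first two lines of the test data. It does not use Lucene at all: the class OccurSketch, its tokens helper (a crude stand-in for an analyzer) and its matches method are hypothetical names for illustration, and scoring is ignored entirely.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class OccurSketch {
    /** Lower-cases the text and splits on non-letters; a crude stand-in for an analyzer. */
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z]+")));
    }

    /** True if the document contains every MUST term and none of the MUST_NOT terms. */
    static boolean matches(String doc, List<String> must, List<String> mustNot) {
        Set<String> docTerms = tokens(doc);
        return docTerms.containsAll(must) && mustNot.stream().noneMatch(docTerms::contains);
    }

    public static void main(String[] args) {
        String snake = "The python is a larger snake";
        String language = "Python is a programming language that lets you work quickly";
        List<String> none = Collections.emptyList();

        // python MUST, snake MUST
        System.out.println(matches(snake, Arrays.asList("python", "snake"), none));    // true
        System.out.println(matches(language, Arrays.asList("python", "snake"), none)); // false

        // python MUST, snake MUST_NOT
        System.out.println(matches(snake, Arrays.asList("python"), Arrays.asList("snake")));    // false
        System.out.println(matches(language, Arrays.asList("python"), Arrays.asList("snake"))); // true
    }
}
```

In this simplified model FILTER would behave exactly like MUST, because the only difference between the two is whether the clause contributes to the score.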
PhraseQuery
public PhraseQuery(int slop, String field, String... terms)
public PhraseQuery(String field, String... terms) {
    this(0, field, terms);
}
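// slop 0 (the default): exact phrase, "quick" immediately followed by "fox"
PhraseQuery exact = new PhraseQuery(CONTENT, "quick", "fox");
// slop 1: the terms may be one position apart
PhraseQuery slop1 = new PhraseQuery(1, CONTENT, "quick", "fox");
// slop 2: the terms may be up to two position moves apart
PhraseQuery slop2 = new PhraseQuery(2, CONTENT, "fox", "dog");
slop: the edit distance, measured in term positions, allowed between the phrase as written in the query and the terms as they appear in the document. A slop of 0 requires an exact phrase match.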
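As a rough illustration of slop, the sketch below computes how many positions lie between two terms that occur in order in a text. This is a simplified model rather than Lucene's matcher: it only handles two-term phrases whose terms appear in query order, and the class SlopSketch and method requiredSlop are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SlopSketch {
    /**
     * Returns the smallest number of intervening positions between `first` and a later
     * `second`, or -1 if the two terms never occur in that order.
     */
    static int requiredSlop(String text, String first, String second) {
        List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z]+"));
        int best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                if (tokens.get(j).equals(second)) {
                    int gap = j - i - 1; // number of tokens between the two terms
                    if (best < 0 || gap < best) {
                        best = gap;
                    }
                    break; // later occurrences of `second` can only be farther away
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String dog = "the quick brown fox jumped over the lazy dog.";
        String cat = "the quick fox jumped over the lazy cat.";
        System.out.println(requiredSlop(cat, "quick", "fox")); // 0
        System.out.println(requiredSlop(dog, "quick", "fox")); // 1
        System.out.println(requiredSlop(dog, "fox", "dog"));   // 4
        System.out.println(requiredSlop(cat, "fox", "dog"));   // -1
    }
}
```

Under this model, "quick fox" with slop 0 fits only the second sample sentence, slop 1 fits both, and "fox dog" would need a slop of 4, so a slop of 2 is not enough for either sentence.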
PrefixQuery
Prefix query: matches all terms that start with the given prefix.
WildcardQuery
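Supports the wildcards * (matches zero or more characters) and ? (matches exactly one character).
PrefixQuery is a special case of WildcardQuery, but it is not built on top of WildcardQuery; it has its own implementation.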
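The wildcard semantics can be sketched in plain Java as follows. This is an illustrative matcher over a single term, not Lucene's implementation: the class WildcardSketch and its matches method are hypothetical, and escape characters are not handled. A prefix match is expressed here as the pattern prefix + "*".

```java
public class WildcardSketch {
    /** Returns true if `text` matches `pattern`, where * = any run of chars and ? = one char. */
    static boolean matches(String pattern, String text) {
        int p = 0;          // current index in pattern
        int t = 0;          // current index in text
        int star = -1;      // index of the most recent '*' in pattern
        int mark = 0;       // text index where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;             // remember the star, first try matching zero chars
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1;           // backtrack: let the last star absorb one more char
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;                        // trailing stars match the empty string
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("ja*", "java"));   // true  (prefix "ja")
        System.out.println(matches("ja*", "jetty"));  // false
        System.out.println(matches("j?va", "java"));  // true
        System.out.println(matches("j?va", "jva"));   // false ('?' needs exactly one char)
        System.out.println(matches("j*y", "jetty"));  // true
    }
}
```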
RegexpQuery: supports regular expressions (Lucene's own regular expression syntax, not java.util.regex)
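FuzzyQuery (based on edit distance; similar in spirit to PhraseQuery's slop, but applied to the characters of a term rather than to term positions)
public FuzzyQuery(Term term, int maxEdits)
// maxEdits is the maximum edit distance allowed; Lucene supports 0, 1, or 2
FuzzyQuery fuzzy1 = new FuzzyQuery(new Term(CONTENT, "jack"), 1);
FuzzyQuery fuzzy2 = new FuzzyQuery(new Term(CONTENT, "jack"), 2);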
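Edit distance: a measure of how similar two strings are (a word is just a special kind of string)
Levenshtein distance and its extension, Damerau–Levenshtein distance
If the minimum number of single-symbol insertions, deletions, and substitutions needed to make two strings equal is n, then the distance between the two strings is n.
Only three operations are allowed: insertion, deletion, and substitution.
Each operation affects exactly one symbol. When measuring the distance between two words, a symbol is a letter; when measuring the distance between two sentences, a symbol is a word.
The distance is the minimum number of operations needed to reach the target.
Damerau–Levenshtein distance extends Levenshtein distance
by adding a transposition operation: swapping two adjacent symbols counts as one operation, that is, a distance of 1.
Distance between cat and cta
Levenshtein distance: 2. Two substitutions are needed to turn cat into cta: replace a with t, then replace t with a.
Damerau–Levenshtein distance: 1. A single transposition of the adjacent symbols a and t turns cat into cta.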
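The two distances can be computed with the standard dynamic-programming table. The sketch below is a plain-Java illustration, not Lucene's automaton-based implementation; the class EditDistanceSketch and its distance method are hypothetical names. With allowTransposition set to true it computes the optimal string alignment distance, which is the restricted form of Damerau–Levenshtein distance.

```java
public class EditDistanceSketch {
    /**
     * Edit distance between a and b using insertion, deletion and substitution,
     * plus adjacent transposition when allowTransposition is true.
     */
    static int distance(String a, String b, boolean allowTransposition) {
        // d[i][j] = distance between the first i chars of a and the first j chars of b
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i chars
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j chars
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // substitution (or match)
                if (allowTransposition && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1); // swap adjacent symbols
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("cat", "cta", false));   // 2
        System.out.println(distance("cat", "cta", true));    // 1
        System.out.println(distance("jack", "jazz", false)); // 2
    }
}
```

By this measure jack is two edits away from both jazz and java, which is why the maxEdits value matters for the FuzzyQuery examples on the term "jack".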
TermRangeQuery
// Matches terms in the range ["java", "jb"): lower bound inclusive, upper bound exclusive
TermRangeQuery termRangeQuery = new TermRangeQuery(CONTENT, new BytesRef("java"), new BytesRef("jb"), true, false);
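A term range is a lexicographic slice of the sorted terms. The sketch below models this with a TreeSet, assuming plain ASCII terms, where String ordering agrees with the byte ordering of the terms. The class TermRangeSketch and the sample term list are hypothetical; the terms actually indexed depend on the analyzer.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

public class TermRangeSketch {
    /** Returns the terms t with lower <= t < upper, in sorted order. */
    static NavigableSet<String> range(NavigableSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, false);
    }

    public static void main(String[] args) {
        // A few "j" terms that could plausibly come from the test data (illustrative only)
        NavigableSet<String> terms =
                new TreeSet<>(Arrays.asList("jack", "java", "jazz", "jetty", "jumped"));
        System.out.println(range(terms, "java", "jb")); // [java, jazz]
    }
}
```

Here "jack" sorts before "java" and is excluded, while "jetty" and "jumped" sort after "jb".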
PointRangeQuery
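Common numeric point field types: IntPoint, LongPoint, FloatPoint, DoublePoint
// Matches documents whose "length2" value is between 30 and 90 (both bounds inclusive)
Query query = IntPoint.newRangeQuery("length2", 30, 90);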
ConstantScoreQuery
A wrapper query that replaces the score of every document matched by the wrapped query with a constant (1.0 by default).
Full code
https://gitee.com/dyyx/demos/tree/master/lucenedemo
Indexing: IndexDemo
Querying: QueryDemo
Test data: src/main/resources/data/data.txt
The python is a larger snake
Python is a programming language that lets you work quickly and integrate systems more effectively.
the quick brown fox jumped over the lazy dog.
the quick fox jumped over the lazy cat.
Jazz is a style of music that combined ragtime with experimental orchestral techniques.
Java is a programming language
Jetty is an open-source project providing an HTTP server, HTTP client, and javax.servlet container.
jack ma is a good teacher