
Lucene Query Types and Examples
Lucene 8.2.0

org.apache.lucene.search.Query

MatchAllDocsQuery
TermQuery
BooleanQuery
PhraseQuery
MultiPhraseQuery
PrefixQuery
WildcardQuery
RegexpQuery
FuzzyQuery
PointRangeQuery
TermRangeQuery
ConstantScoreQuery


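DisjunctionMaxQuery  mainly used to control scoring: a document's score is the maximum score among its matching subqueries, plus a tie-breaker share of the other matching subqueries' scores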
SpanQuery


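BooleanQuery
SHOULD    OR: the clause is optional
MUST      AND: the clause must match and contributes to the score
FILTER    AND: like MUST, but the clause does not contribute to the score, so it is usually cheaper than MUST. If you only care whether a document matches, not how well it matches (its score), prefer FILTER
MUST_NOT  NOT: the clause must not match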

// Documents that contain both "python" and "snake"
BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
andBuilder.add(new TermQuery(new Term(CONTENT, "python")), BooleanClause.Occur.MUST);
andBuilder.add(new TermQuery(new Term(CONTENT, "snake")), BooleanClause.Occur.MUST);
Query andQuery = andBuilder.build();

// Documents that contain "python" but not "snake"
BooleanQuery.Builder notBuilder = new BooleanQuery.Builder();
notBuilder.add(new TermQuery(new Term(CONTENT, "python")), BooleanClause.Occur.MUST);
notBuilder.add(new TermQuery(new Term(CONTENT, "snake")), BooleanClause.Occur.MUST_NOT);
Query notQuery = notBuilder.build();

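PhraseQuery
public PhraseQuery(int slop, String field, String... terms)
public PhraseQuery(String field, String... terms) {
    this(0, field, terms);
}

// slop 0 (the default): "quick" must be immediately followed by "fox"
PhraseQuery exact = new PhraseQuery(CONTENT, "quick", "fox");
// slop 1: allows one position move, e.g. one other word between "quick" and "fox"
PhraseQuery slop1 = new PhraseQuery(1, CONTENT, "quick", "fox");
// slop 2: allows up to two position moves between "fox" and "dog"
PhraseQuery slop2 = new PhraseQuery(2, CONTENT, "fox", "dog");

slop  an edit distance over term positions: the maximum number of position moves allowed for the terms in a document to still match the phrase; 0 requires the exact phrase in order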


PrefixQuery
Prefix query: matches every term that starts with the given prefix

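WildcardQuery
Supports wildcards: * (matches zero or more characters) and ? (matches exactly one character)
PrefixQuery is a special case of WildcardQuery, but it is implemented separately rather than on top of WildcardQuery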
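
To make the meaning of * and ? concrete, here is a minimal plain-Java matcher. It is only an illustration of the wildcard semantics, not how Lucene evaluates a WildcardQuery, and it ignores the backslash escape that WildcardQuery also supports. The class and method names are hypothetical.

```java
final class WildcardSketch {
    /** Returns true if the whole term matches the pattern, where * is any run of characters and ? is exactly one. */
    static boolean matches(String pattern, String term) {
        int p = 0, t = 0;
        int starP = -1; // position of the most recent * in the pattern, or -1
        int starT = -1; // term position where that * started matching
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the star and first try to let it match zero characters.
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                // Mismatch: backtrack and let the last * absorb one more character.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Any trailing * in the pattern can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("ja*", "java"));
        System.out.println(matches("ja?", "java"));
    }
}
```

A pattern such as ja* behaves like a prefix match on "ja", which is why PrefixQuery can be seen as a special case.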

RegexpQuery  supports regular expressions


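FuzzyQuery (similar in spirit to PhraseQuery's slop, which is an edit distance over term positions; FuzzyQuery uses an edit distance over the characters of a term)
public FuzzyQuery(Term term, int maxEdits)
// maxEdits must be 0, 1, or 2
FuzzyQuery oneEdit = new FuzzyQuery(new Term(CONTENT, "jack"), 1);
FuzzyQuery twoEdits = new FuzzyQuery(new Term(CONTENT, "jack"), 2);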


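Edit distance  a measure of how similar two strings are (a word is just a special kind of string)
Levenshtein distance and its extension, Damerau–Levenshtein distance
If n is the smallest number of single-symbol insertions, deletions, and substitutions that makes two strings equal, then the distance between the two strings is n

Only three operations are allowed: insertion, deletion, and substitution.
Each operation affects exactly one symbol. When comparing two words, a symbol is a letter; when comparing two sentences, a symbol is a word.
The distance is the minimum number of operations needed to reach the target.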
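
The definition above maps directly onto the classic dynamic-programming computation. The following is a minimal plain-Java sketch of Levenshtein distance over characters; it is not Lucene's implementation (FuzzyQuery does not compute distances this way term by term), and the class name is hypothetical.

```java
final class LevenshteinSketch {
    /** Minimum number of single-character insertions, deletions, and substitutions that turn a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(deletion, insertion), substitution);
            }
            // Reuse the two rows instead of keeping the full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("cat", "cta"));
        System.out.println(distance("kitten", "sitting"));
    }
}
```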

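Damerau–Levenshtein distance extends Levenshtein distance
It adds a transposition operation: swapping two adjacent symbols counts as one operation, i.e. a distance of 1

Distance between cat and cta
Levenshtein distance  2: two substitutions are needed (replace a with t, then t with a) to turn cat into cta
Damerau–Levenshtein distance  1: a single transposition of the adjacent a and t is enough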
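
The cat/cta comparison can be checked with a small sketch. The version below is the restricted form of Damerau–Levenshtein distance (often called optimal string alignment), which adds an adjacent-transposition step to the Levenshtein recurrence. It is an assumption that this simplified variant is enough for illustration; the class name is hypothetical and this is not Lucene's code.

```java
final class OsaDistanceSketch {
    /** Edit distance with insertion, deletion, substitution, and transposition of two adjacent characters. */
    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // i deletions
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // j insertions
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Transposition: the last two characters of each prefix are the same pair, swapped.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("cat", "cta"));
    }
}
```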

TermRangeQuery
// Terms t with "java" <= t < "jb" (lower bound inclusive, upper bound exclusive)
TermRangeQuery termRangeQuery = new TermRangeQuery(CONTENT, new BytesRef("java"), new BytesRef("jb"), true, false);
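
To see which terms fall into such a range, here is a minimal plain-Java sketch of the inclusive/exclusive bound check. It assumes plain ASCII terms, for which String.compareTo gives the same order as comparing the terms' bytes; the class and method names are hypothetical and this is not Lucene's implementation.

```java
final class TermRangeSketch {
    /** Returns true if term lies between lower and upper, honoring the inclusive flags. */
    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int vsLower = term.compareTo(lower);
        int vsUpper = term.compareTo(upper);
        boolean aboveLower = includeLower ? vsLower >= 0 : vsLower > 0;
        boolean belowUpper = includeUpper ? vsUpper <= 0 : vsUpper < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Same bounds as the query above: "java" inclusive, "jb" exclusive.
        for (String term : new String[] {"jack", "java", "javax", "jazz", "jb", "jetty"}) {
            System.out.println(term + " " + inRange(term, "java", "jb", true, false));
        }
    }
}
```

With these bounds, terms such as java, javax, and jazz are inside the range, while jack, jb, and jetty are outside it.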

PointRangeQuery
Commonly used with numeric point fields: IntPoint, LongPoint, FloatPoint, DoublePoint
// Both bounds are inclusive: 30 <= length2 <= 90
Query query = IntPoint.newRangeQuery("length2", 30, 90);


ConstantScoreQuery
Wraps another query and gives every matching document the same constant score, equal to the query's boost (1.0 by default)


Full code: https://gitee.com/dyyx/demos/tree/master/lucenedemo
Indexing: IndexDemo
Querying: QueryDemo
Test text: src/main/resources/data/data.txt

The python is a larger snake
Python is a programming language that lets you work quickly and integrate systems more effectively.
the quick brown fox jumped over the lazy dog.
the quick fox jumped over the lazy cat.
Jazz is a style of music that combined ragtime with experimental orchestral techniques.
Java is a programming language
Jetty is an open-source project providing an HTTP server, HTTP client, and javax.servlet container.
jack ma is a good teacher
