Lucene Key Points Summary
Version history
2018-09 7.5.0
2017-09 7.0
2016-04 6.0
2015-02 5.0
2012-10 4.0
2009-11 3.0
Full-text search, inverted index
Indexing and searching
IndexWriter (builds the index), IndexSearcher (runs queries)
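To make the inverted index idea concrete, here is a minimal sketch in plain Java, with no Lucene dependency. The class InvertedIndexSketch and its tokenizer are hypothetical stand-ins for illustration only; Lucene's real index format and analyzers are far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index (hypothetical): maps each term to the sorted ids of documents containing it. */
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Lowercases and splits on anything that is not a letter or digit; a stand-in for an analyzer. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Indexing step: record the document id under every term found in the text. */
    void addDocument(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Query step: a single-term lookup returns the ids of the matching documents. */
    List<Integer> search(String term) {
        TreeSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs);
    }
}
```

Adding a few documents and then calling search("lucene") returns the ids of the documents containing that term, without scanning the document texts at query time.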
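String name = "dyyx";
// Lucene 3.x style Field constructor: store the value and index it as a single unanalyzed term
Field field = new Field("name", name, Field.Store.YES, Field.Index.NOT_ANALYZED);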
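Store
YES: store the original value so it can be returned with search results
NO: do not store the value
Index
ANALYZED: run the value through the analyzer (tokenize) and index the tokens
NOT_ANALYZED: index the whole value as a single term, without tokenizing
NO: do not index the field
Stored data + index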
Document
Field
Chinese analyzers: Paoding (庖丁), IKAnalyzer
Custom stop words, custom dictionary
IKAnalyzer.cfg.xml
ext_dict
ext_stopwords
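A minimal sketch of reading the two keys above, assuming IKAnalyzer.cfg.xml follows the standard Java XML properties format. The class IkConfigSketch and the file names ext.dic and stopword.dic are hypothetical; this is not IKAnalyzer's own loading code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that reads ext_dict / ext_stopwords from an XML properties document. */
final class IkConfigSketch {
    /** Sample configuration; ext.dic and stopword.dic are made-up file names. */
    static final String SAMPLE_CONFIG =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <comment>IK Analyzer extension settings</comment>\n"
        + "  <entry key=\"ext_dict\">ext.dic;</entry>\n"
        + "  <entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
        + "</properties>\n";

    /** Returns the semicolon-separated paths stored under the given key, or an empty list. */
    static List<String> readPaths(String xml, String key) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> paths = new ArrayList<>();
        String value = props.getProperty(key);
        if (value != null) {
            for (String path : value.split(";")) {
                if (!path.trim().isEmpty()) {
                    paths.add(path.trim());
                }
            }
        }
        return paths;
    }
}
```

Under these assumptions, ext_dict lists extra dictionary files and ext_stopwords lists extra stop word files, with several files separated by semicolons.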
Standard analyzer: StandardAnalyzer
Lucene 4.4.0
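Writing the index
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
// Where the index is stored
Directory indexdir = FSDirectory.open(new File("indexdir"));
// Pin the version to the release in use; Version.LUCENE_CURRENT is deprecated
Version matchVersion = Version.LUCENE_44;
// Use the standard analyzer
Analyzer analyzer = new StandardAnalyzer(matchVersion);
IndexWriterConfig conf = new IndexWriterConfig(matchVersion, analyzer);
IndexWriter indexWriter = new IndexWriter(indexdir, conf);
Document doc = new Document();
IndexableField id = new IntField("id", 1, Store.YES);
IndexableField title = new StringField("title", "hello lucene", Store.YES);
IndexableField content = new TextField("content", "hello lucene,i love lucene", Store.YES);
doc.add(id);
doc.add(title);
doc.add(content);
indexWriter.addDocument(doc);
indexWriter.close();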
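Searching
import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
Directory dir = FSDirectory.open(new File("indexdir"));
IndexReader indexReader = DirectoryReader.open(dir);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
Query query = new TermQuery(new Term("content", "lucene"));
// Fetch the top 100 matching documents
TopDocs topDocs = indexSearcher.search(query, 100);
System.out.println("Total hits: " + topDocs.totalHits);
// Loop over the hits; reading scoreDocs[0] directly fails when nothing matches
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    System.out.println("Relevance score: " + scoreDoc.score);
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println(doc.get("id"));
    System.out.println(doc.get("title"));
    System.out.println(doc.get("content"));
}
indexReader.close();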
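Use a Filter to restrict search results. Filters can have a large impact on performance.
// directory is the index Directory; LuceneUtils is a helper class not shown in these notes
IndexReader indexReader = DirectoryReader.open(directory);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
String[] fields = { "title" };
QueryParser queryParser = new MultiFieldQueryParser(
        LuceneUtils.getMatchVersion(), fields, LuceneUtils.getAnalyzer());
Query query = queryParser.parse("lucene");
// Keep only documents with 1 < id <= 10 (lower bound exclusive, upper bound inclusive)
Filter filter = NumericRangeFilter.newIntRange("id", 1, 10, false, true);
TopDocs topDocs = indexSearcher.search(query, filter, 100);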
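Query types
Term (keyword) query: TermQuery
// TermQuery does not analyze its text, so the term must equal an indexed term exactly (case-sensitive)
Query query = new TermQuery(new Term("title", "Lucene"));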
Query string search
QueryParser searches a single field
MultiFieldQueryParser searches several fields
String[] fields = { "title", "content" };
QueryParser queryParser = new MultiFieldQueryParser(LuceneUtils.getMatchVersion(), fields, LuceneUtils.getAnalyzer());
Query query = queryParser.parse("Lucene");
Match all documents
Query query = new MatchAllDocsQuery();
Range query; can be used in place of a filter
Query query = NumericRangeQuery.newIntRange("id", 1, 10, true, true);
Wildcard query
Query query = new WildcardQuery(new Term("title", "luce*"));
? matches exactly one character; * matches zero or more characters.
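The two wildcard rules can be illustrated with a small matcher in plain Java. WildcardSketch is a hypothetical class for illustration, not how WildcardQuery is implemented; note also that Lucene applies the pattern to indexed terms rather than to the whole field text.

```java
/** Toy wildcard matcher (hypothetical): ? matches one character, * matches zero or more. */
final class WildcardSketch {
    static boolean matches(String pattern, String text) {
        int p = 0;          // position in pattern
        int t = 0;          // position in text
        int star = -1;      // index of the most recent * in the pattern
        int mark = 0;       // text position where that * started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the star and first try to let it match nothing
                star = p++;
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                // Mismatch: let the last star absorb one more character and retry
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the pattern
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

With this sketch, the pattern luce* matches both lucene and luce, while luc?ne matches lucene but not lucne.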
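Fuzzy query
// The second argument is the maximum number of edits allowed between the query term and a matching term
Query query = new FuzzyQuery(new Term("title", "Lucene"), 1);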
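The notion of "within one edit" can be sketched with a plain Levenshtein distance. EditDistanceSketch is a hypothetical class; it counts insertions, deletions, and substitutions only, whereas FuzzyQuery may, depending on its settings, also count a transposition of adjacent characters as a single edit, and it uses a much more efficient matching strategy.

```java
/** Toy Levenshtein distance (hypothetical) to illustrate the maxEdits idea. */
final class EditDistanceSketch {
    /** Minimum number of single-character insertions, deletions, or substitutions turning a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when b is reachable from a in at most maxEdits edits. */
    static boolean withinEdits(String a, String b, int maxEdits) {
        return distance(a, b) <= maxEdits;
    }
}
```

Under this model, withinEdits("lucene", "lucine", 1) holds, while withinEdits("lucene", "lucky", 1) does not.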
Phrase query
PhraseQuery query = new PhraseQuery();
query.add(new Term("title", "solr"));
Boolean query
BooleanQuery query = new BooleanQuery();
Query query1 = NumericRangeQuery.newIntRange("id", 1, 10, true, true);
Query query2 = NumericRangeQuery.newIntRange("id", 5, 15, true, true);
// The first clause must match
query.add(query1, Occur.MUST);
// The second clause is optional; documents that also match it score higher
query.add(query2, Occur.SHOULD);
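A minimal sketch of how MUST and SHOULD combine, using the two id ranges from the snippet. BooleanClauseSketch is hypothetical, and its two-tier ranking only stands in for Lucene's real scoring: a document has to satisfy the MUST clause to appear at all, and satisfying the SHOULD clause merely moves it up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model (hypothetical) of a boolean query with one MUST and one SHOULD clause. */
final class BooleanClauseSketch {
    /** Returns ids in 1..maxId that satisfy must; ids that also satisfy should come first. */
    static List<Integer> search(int maxId, IntPredicate must, IntPredicate should) {
        List<Integer> mustAndShould = new ArrayList<>();
        List<Integer> mustOnly = new ArrayList<>();
        for (int id = 1; id <= maxId; id++) {
            if (!must.test(id)) {
                continue; // failing the MUST clause excludes the document entirely
            }
            if (should.test(id)) {
                mustAndShould.add(id); // optional clause matched: ranked higher
            } else {
                mustOnly.add(id);
            }
        }
        List<Integer> ranked = new ArrayList<>(mustAndShould);
        ranked.addAll(mustOnly);
        return ranked;
    }
}
```

For ids 1 to 20 with MUST = [1, 10] and SHOULD = [5, 15], the sketch returns 5 through 10 first, then 1 through 4; ids 11 through 15 match only the SHOULD clause and are left out.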
Highlighting search results
lucene-highlighter-4.4.0.jar
lucene-memory-4.4.0.jar
Optimization
MergePolicy
LogDocMergePolicy mergePolicy = new LogDocMergePolicy();
mergePolicy.setMergeFactor(6);
mergeFactor: how many segments accumulate before they are merged
Smaller values make searching faster but indexing slower
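The effect of mergeFactor can be sketched with a deliberately simplified model: every flush creates a segment of size 1, and whenever mergeFactor segments of equal size sit at the end of the list they are merged into one. MergeFactorSketch is hypothetical and is not LogDocMergePolicy's actual algorithm, which works with size levels and further limits.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified simulation (hypothetical) of merging equal-sized segments in groups of mergeFactor. */
final class MergeFactorSketch {
    /** Returns the segment sizes, oldest first, after the given number of flushes. */
    static List<Integer> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(1); // each flush writes one new small segment
            // Merges can cascade: the merged segment may complete a larger group
            while (tailRun(segments) >= mergeFactor) {
                int size = segments.get(segments.size() - 1);
                for (int k = 0; k < mergeFactor; k++) {
                    segments.remove(segments.size() - 1);
                }
                segments.add(size * mergeFactor);
            }
        }
        return segments;
    }

    /** Length of the run of equal-sized segments at the end of the list. */
    private static int tailRun(List<Integer> segments) {
        if (segments.isEmpty()) {
            return 0;
        }
        int last = segments.get(segments.size() - 1);
        int run = 0;
        for (int i = segments.size() - 1; i >= 0 && segments.get(i) == last; i--) {
            run++;
        }
        return run;
    }
}
```

In this model a small mergeFactor keeps the segment count low, so a search visits fewer segments, at the price of merging more often during indexing; a large mergeFactor does the opposite.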
MaxMergeDocs
RAMBufferSizeMB
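MaxBufferedDocs and RAMBufferSizeMB can be used together.
When both are set, whichever limit is reached first triggers a flush to disk, producing a new index segment.
FSDirectory (file system) vs RAMDirectory (in memory)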
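The "whichever limit is reached first" rule can be sketched as follows. FlushTriggerSketch is a hypothetical class with made-up byte accounting; it only models the decision and does not reflect how IndexWriter measures its memory use.

```java
/** Toy model (hypothetical) of flushing on a document-count limit or a RAM limit, whichever comes first. */
final class FlushTriggerSketch {
    private final int maxBufferedDocs;
    private final long ramBufferBytes;
    private int bufferedDocs;
    private long bufferedBytes;
    private int flushCount;

    FlushTriggerSketch(int maxBufferedDocs, double ramBufferSizeMB) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferBytes = (long) (ramBufferSizeMB * 1024 * 1024);
    }

    /** Buffers one document of the given size; returns true if this add triggered a flush. */
    boolean addDocument(long docBytes) {
        bufferedDocs++;
        bufferedBytes += docBytes;
        if (bufferedDocs >= maxBufferedDocs || bufferedBytes >= ramBufferBytes) {
            // Either limit is enough: write a new segment and reset the buffer
            flushCount++;
            bufferedDocs = 0;
            bufferedBytes = 0;
            return true;
        }
        return false;
    }

    /** Number of segments flushed so far. */
    int flushCount() {
        return flushCount;
    }
}
```

With limits of 3 documents and 1 MB, three small documents flush because of the document count, while a single 2 MB document flushes because of the RAM limit.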