Lucene 7 Index File Formats
V7.0.0

Basic concepts: index, document, field, term

An index contains a sequence of documents.
A document is a sequence of fields.
A field is a named sequence of terms.
A term is a sequence of bytes.
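
The containment hierarchy above can be pictured with a few plain types. This is only an illustrative sketch in Python, not Lucene's API: the class names `Field`, `Document` and `Index` and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    name: str            # a field is a *named* sequence of terms
    terms: List[bytes]   # a term is a sequence of bytes


@dataclass
class Document:
    fields: List[Field] = field(default_factory=list)


@dataclass
class Index:
    documents: List[Document] = field(default_factory=list)


# One document with two fields (sample data, purely for illustration).
index = Index([
    Document([
        Field("title", [b"lucene", b"index"]),
        Field("id", [b"doc-001"]),
    ]),
])
first_term = index.documents[0].fields[0].terms[0]
```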


Inverted Indexing
The index stores statistics about terms in order to make term-based search more efficient. 
Lucene's index falls into the family of indexes known as an inverted index. 
This is because it can list, for a term, the documents that contain it. 
This is the inverse of the natural relationship, in which documents list terms.

In short: an inverted index stores per-term statistics so that term-based search is efficient.
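
To make the "inverse relationship" concrete, here is a minimal sketch that turns a forward mapping (document number to terms) into an inverted one (term to document numbers). It is a toy in-memory model, not how Lucene lays out its files; the function name `invert` and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

# Forward view: document number -> terms it contains (sample data).
docs = {
    0: ["lucene", "index", "format"],
    1: ["inverted", "index"],
    2: ["lucene", "segment"],
}


def invert(forward):
    """Build term -> ascending list of document numbers."""
    postings = defaultdict(list)
    for doc_id in sorted(forward):
        for term in set(forward[doc_id]):  # record each document once per term
            postings[term].append(doc_id)
    return dict(postings)


postings = invert(docs)
# A simple per-term statistic: how many documents contain the term.
doc_freq = {term: len(ids) for term, ids in postings.items()}
```

Looking up `postings["lucene"]` then answers "which documents contain this term" without scanning every document.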

In Lucene, fields may be stored, in which case their text is stored in the index literally, in a non-inverted manner. 
Fields that are inverted are called indexed. A field may be both stored and indexed.

The text of a field may be tokenized into terms to be indexed, 
or the text of a field may be used literally as a term to be indexed. 
Most fields are tokenized, but sometimes it is useful for certain identifier fields to be indexed literally.

In short, a field's text is indexed either tokenized or untokenized.
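
The difference between the two modes can be sketched as follows. Lowercasing and splitting on whitespace is only a stand-in for a real analyzer, and `index_terms` and the sample strings are hypothetical names chosen for this example.

```python
def index_terms(text, tokenized=True):
    """Return the terms that would be indexed for a field's text."""
    if tokenized:
        # Stand-in for an analyzer: lowercase, then split on whitespace.
        return text.lower().split()
    # Untokenized: the whole text is used literally as a single term.
    return [text]


body_terms = index_terms("Lucene Index File Formats")
id_terms = index_terms("SKU-42 A", tokenized=False)
```

An identifier field indexed literally matches only on the exact value, which is usually what you want for keys.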

An index may be composed of multiple sub-indexes, or segments. Each segment is a fully independent index that can be searched separately.

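Document Numbers

The first document added to an index is numbered 0, and each subsequently added document gets a number one greater than the previous one.
A document's number may change (for example, when segments are merged and deleted documents are dropped), so take care when storing these numbers outside Lucene.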
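
A rough sketch of why a document number is not a stable identifier. It assumes a simplified model in which each segment numbers its documents from zero, an index-wide number is the segment's base offset plus the local number, and a merge drops deleted documents and renumbers the rest. The helper names and data are hypothetical; this is not Lucene code.

```python
def global_number(segments, seg_idx, local):
    """Index-wide number = base of the segment + segment-local number."""
    base = sum(len(s) for s in segments[:seg_idx])
    return base + local


def merge_segments(segments):
    """Merge into one segment, dropping deleted docs and renumbering."""
    merged = []
    for segment in segments:
        for payload, is_live in segment:
            if is_live:
                merged.append((payload, True))
    return merged


# Each entry is (payload, is_live); list position is the local number.
seg_a = [("a", True), ("b", False), ("c", True)]  # "b" has been deleted
seg_b = [("d", True)]

before = global_number([seg_a, seg_b], 1, 0)       # number of "d" before merge
merged = merge_segments([seg_a, seg_b])
after = [payload for payload, _ in merged].index("d")  # number after merge
```

Here "d" moves from number 3 to number 2 once the deleted document is merged away, which is why an application-level key should live in its own field.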


Name | Extension | Brief Description
Segments File | segments_N | Stores information about a commit point.
Lock File | write.lock | The write lock prevents multiple IndexWriters from writing to the same file.
Segment Info | .si | Stores metadata about a segment.
Compound File | .cfs, .cfe | An optional "virtual" file consisting of all the other index files, for systems that frequently run out of file handles.
Fields | .fnm | Stores information about the fields.
Field Index | .fdx | Contains pointers to field data.
Field Data | .fdt | The stored fields for documents.
Term Dictionary | .tim | The term dictionary; stores term info.
Term Index | .tip | The index into the Term Dictionary.
Frequencies | .doc | Contains the list of docs that contain each term, along with the term frequency.
Positions | .pos | Stores position information about where a term occurs in the index.
Payloads | .pay | Stores additional per-position metadata such as character offsets and user payloads.
Norms | .nvd, .nvm | Encodes length and boost factors for docs and fields.
Per-Document Values | .dvd, .dvm | Encodes additional scoring factors or other per-document information.
Term Vector Index | .tvx | Stores offset into the document data file.
Term Vector Data | .tvd | Contains term vector data.
Live Documents | .liv | Info about what documents are live.
Point values | .dii, .dim | Holds indexed points, if any.
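
When inspecting an index directory, the listing above can be turned into a small lookup. The sketch below maps a file name to the name from the listing; the function `describe` and the sample file names are hypothetical, and anything not in the listing is reported as "unknown".

```python
import os

# Extension -> name, taken from the listing above.
EXTENSIONS = {
    ".si": "Segment Info",
    ".cfs": "Compound File", ".cfe": "Compound File",
    ".fnm": "Fields",
    ".fdx": "Field Index",
    ".fdt": "Field Data",
    ".tim": "Term Dictionary",
    ".tip": "Term Index",
    ".doc": "Frequencies",
    ".pos": "Positions",
    ".pay": "Payloads",
    ".nvd": "Norms", ".nvm": "Norms",
    ".dvd": "Per-Document Values", ".dvm": "Per-Document Values",
    ".tvx": "Term Vector Index",
    ".tvd": "Term Vector Data",
    ".liv": "Live Documents",
    ".dii": "Point values", ".dim": "Point values",
}


def describe(filename):
    """Return the listing name for an index file, or "unknown"."""
    if filename == "write.lock":
        return "Lock File"
    if filename.startswith("segments_"):
        return "Segments File"
    _, ext = os.path.splitext(filename)
    return EXTENSIONS.get(ext, "unknown")


# Sample names for illustration only.
kinds = {name: describe(name) for name in ["segments_3", "_0.cfs", "_0.si"]}
```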
