
Twenty Years of Lucene

https://www.elastic.co/cn/celebrating-lucene

2000
When Doug Cutting decided to learn Java, he honed his skills by creating a new search indexer project.
This endeavour led him to author the first version of Lucene — making it available
(as free and open source software) via SourceForge in April of 2000.
Lucene was Doug's fifth search engine: he had previously written two while at Xerox PARC, one at Apple, and a fourth at Excite.

2001
Lucene joins Apache
In September, Lucene joins the Apache Jakarta Project.
Jakarta was a family of open source Java projects, including Apache Tomcat and Apache Struts.

2002
Lucene 1.2 becomes the first release of Lucene under the Apache license, marking its departure from LGPL licensing.
In 2002, Lucene saw nearly 400 commits from 13 unique authors — and it's only grown since then.

2003
Lucene 1.3 introduces some early flexibility with PerFieldAnalyzerWrapper,
allowing fields to use different approaches to tokenizing.
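
The idea of per-field analysis can be sketched in a few lines. This is not Lucene's Java API: the class `PerFieldAnalyzer` and the two toy analyzers below are hypothetical names chosen for illustration, and the sketch only shows the dispatch pattern (one default analyzer, with overrides keyed by field name).

```python
def whitespace_analyzer(text):
    # Lowercase the text and split it on whitespace.
    return text.lower().split()


def keyword_analyzer(text):
    # Keep the whole field value as a single, unmodified token.
    return [text]


class PerFieldAnalyzer:
    """Hypothetical wrapper: picks an analyzer by field name."""

    def __init__(self, default, per_field):
        self.default = default
        self.per_field = dict(per_field)

    def analyze(self, field, text):
        # Fall back to the default analyzer for fields without an override.
        return self.per_field.get(field, self.default)(text)


analyzer = PerFieldAnalyzer(whitespace_analyzer, {"sku": keyword_analyzer})
title_tokens = analyzer.analyze("title", "Quick Brown Fox")
sku_tokens = analyzer.analyze("sku", "AB 12")
```

Here a free-text field such as `title` is split into words, while an identifier field such as `sku` stays intact so it can only be matched exactly.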

2004
Navigating search
Shay Banon releases Compass, an open source project built on top of Lucene
aiming to simplify the integration of search into any Java application.
Compass would serve as the precursor for Elasticsearch.

Lucene 1.4 introduces hit sorting, allowing the results of a given query to be sorted by any indexed field.

2005
In 2005, there were 484 commits from 11 unique authors.
Committed to solving problems, big and small

2006
Lucene 1.9 introduces DateTools, allowing users to format dates for better readability, as well as handle dates before 1970.
Lucene 2.0 mirrors 1.9, aside from dropping deprecated APIs. Some attribute the longevity of the project to this type of maintenance.

2007
Lucene 2.1 updates QueryParser to allow Unicode characters to be added in their Unicode escape form. \u1F973 \u1F973 \u1F973
Lucene 2.2 includes some small optimizations with big impact. Increasing two buffer sizes yields a 10-18% write performance boost.

2008
Lucene 2.3 improves how the IndexWriter utilizes RAM for buffering documents — leading to a 2-8x indexing speed boost.

2009
Lucene 2.9 changes its API to reflect the segmented structure of its indices, boosting speeds with this new realignment.
Lucene 3.0 is the first version to require Java 5 at runtime, making use of new JVM features like generics, enums, and variable arguments.

2010
The Lucene and Solr repositories are merged. While sharing the same repo, Lucene is still available to download and use by itself.
Elasticsearch 0.1.0 is released on February 7! The open source, distributed, RESTful search engine is built on top of Lucene.

2011
Lucene 3.4 introduces block joins, allowing for efficient 1-N joins.
Twitter implements a patched version of Lucene for real-time search.

2012
Lucene 4.0 is full of important new features.
Improved index flexibility via the Codecs API,
added similarity models (BM25, DFR, and more),
and the introduction of doc values elevate Lucene into the world of serious analytics.

2013
Lucene 4.3 adds a cost API to approximate query match counts.
This adds significant speed by running cheap queries first.
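
Why running the cheap clause first helps can be seen in a small conjunction sketch. This is not Lucene code. It assumes each clause is a sorted list of doc IDs and uses the list length as a stand-in for the cost estimate; the function name `intersect` is hypothetical.

```python
from bisect import bisect_left


def intersect(postings_lists):
    """Intersect sorted doc-ID lists, driving the loop from the cheapest one."""
    if not postings_lists:
        return []
    # Cost here is simply the list length, an upper bound on the match count.
    ordered = sorted(postings_lists, key=len)
    lead, others = ordered[0], ordered[1:]
    positions = [0] * len(others)
    matches = []
    for doc in lead:
        for i, plist in enumerate(others):
            # Advance this list to the first doc ID >= doc.
            positions[i] = bisect_left(plist, doc, positions[i])
            if positions[i] == len(plist) or plist[positions[i]] != doc:
                break
        else:
            # Every other list also contains this doc.
            matches.append(doc)
    return matches


common = intersect([[1, 3, 5, 7, 50], [3, 50], [2, 3, 4, 50, 60]])
```

Because the outer loop runs over the shortest list, the number of candidates examined is bounded by the rarest clause rather than the most frequent one.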

Search suggestions are standard these days, but that doesn't mean they're easy.

2014
Lucene 4.8 implements a common format that includes checksums for all index files, making it possible to better detect hardware errors.

2015
Lucene 5.0 is primarily motivated by the removal of support for 3.x indices.
Managing technical debt isn't glamorous, but it is important.

Lucene 5.1 adds two-phase intersections for phrase queries,
boosting speeds by splitting approximation and confirmation into two steps.
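
The two steps can be illustrated with a toy positional index. This is a sketch under stated assumptions, not Lucene's implementation: the index is a plain dict mapping term to `{doc_id: [positions]}`, and `phrase_matches` is a hypothetical helper name.

```python
def phrase_matches(index, terms):
    """Return the doc IDs that contain `terms` as an exact phrase."""
    postings = [index.get(term, {}) for term in terms]
    if not postings:
        return []
    # Phase 1 (approximation): docs containing every term, positions ignored.
    candidates = set(postings[0])
    for plist in postings[1:]:
        candidates &= set(plist)
    # Phase 2 (confirmation): read positions only for the surviving candidates.
    hits = []
    for doc in sorted(candidates):
        starts = postings[0][doc]
        if any(
            all(start + i in postings[i][doc] for i in range(1, len(terms)))
            for start in starts
        ):
            hits.append(doc)
    return hits


index = {
    "new": {1: [0], 2: [3]},
    "york": {1: [1], 2: [0]},
}
hits = phrase_matches(index, ["new", "york"])
```

Both documents pass the cheap approximation, but only document 1 has "york" directly after "new", so the more expensive position check rejects document 2.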

2016
Lucene 6.0 adds support for indexing multi-dimensional points (BKD trees) and sets BM25 as the default similarity model (not TF/IDF).
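
For reference, the textbook BM25 formula can be sketched as follows. This is a standalone illustration rather than Lucene's `BM25Similarity` code, which differs in details such as how document lengths are encoded. The defaults `k1=1.2` and `b=0.75` match Lucene's defaults; the function names are hypothetical.

```python
import math


def bm25_idf(doc_count, doc_freq):
    # Rare terms (low doc_freq) get a higher weight.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    # Length normalization: documents longer than average are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last.
    return bm25_idf(doc_count, doc_freq) * tf * (k1 + 1) / (tf + norm)


one_hit = bm25_score(tf=1, doc_len=100, avg_doc_len=100, doc_count=1000, doc_freq=10)
ten_hits = bm25_score(tf=10, doc_len=100, avg_doc_len=100, doc_count=1000, doc_freq=10)
```

Unlike classic TF/IDF, the score here is bounded: no matter how often a term repeats, a single term contributes at most `idf * (k1 + 1)`.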

With the introduction of BKD tree data structures for multidimensional search,
Lucene is no longer just a full-text search engine, but a search engine of anything.
This opened the door for the advancements we're currently watching unfold in the world of geospatial search, scoring using dynamic features, and more.

2017
Lucene 6.5 introduces logic to automatically run range queries in the most efficient way, creating considerable performance boosts.
Lucene 7.0 updates the doc value API from random access to iterative, optimizing performance with sparse data (nearly empty fields).

2018
Lucene 7.6 allows BKD trees to index a subset of their dimensions, effectively treating them like R-trees (the standard structure in geo).

2019
Lucene 8.0 implements the Block-Max WAND algorithm,
creating a tremendous boost in search result speeds over large collections of documents by excluding low scoring results from the set.
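
The pruning idea can be shown in a heavily simplified form. The sketch below is not the Block-Max WAND algorithm itself: it handles a single postings list and omits WAND's pivot selection across multiple terms. It only illustrates the block-max part, assuming postings are grouped into non-empty blocks of `(doc_id, score)` pairs. The function name is hypothetical.

```python
import heapq


def top_k_with_block_max(blocks, k):
    """Return the k best (score, doc_id) pairs and the number of blocks skipped."""
    heap = []  # min-heap holding the k best (score, doc_id) seen so far
    skipped = 0
    for block in blocks:
        # A real engine stores this maximum at index time; here it is recomputed.
        block_max = max(score for _, score in block)
        if len(heap) == k and block_max <= heap[0][0]:
            # Nothing in this block can beat the current k-th best score.
            skipped += 1
            continue
        for doc_id, score in block:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), skipped


blocks = [
    [(1, 0.9), (2, 0.8)],
    [(3, 0.1), (4, 0.2)],
    [(5, 1.5), (6, 0.3)],
]
best, skipped = top_k_with_block_max(blocks, k=2)
```

Once two hits are collected, the second block's maximum score (0.2) cannot beat the current threshold (0.8), so the whole block is skipped without scoring its documents.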

2020
