When Doug Cutting decided to learn Java, he honed his skills by creating a new search indexer project.
This endeavour led him to author the first version of Lucene — making it available
(as free and open source software) via SourceForge in April of 2000.
Lucene was Doug's fifth search engine: he had previously written two at Xerox PARC, one at Apple, and a fourth at Excite.
Lucene joins Apache
In September 2001, Lucene joins the Apache Jakarta Project.
Jakarta was a family of open source Java projects, including Apache Tomcat and Apache Struts.
Lucene 1.2 becomes the first release of Lucene under the Apache license, marking its departure from LGPL licensing.
In 2002, Lucene saw nearly 400 commits from 13 unique authors — and it's only grown since then.
Lucene 1.3 introduces some early flexibility with PerFieldAnalyzerWrapper,
allowing fields to use different approaches to tokenizing.
Shay Banon releases Compass, an open source project built on top of Lucene
aiming to simplify the integration of search into any Java application.
Compass would serve as the precursor for Elasticsearch.
Lucene 1.4 introduces hit sorting, allowing the results of a given query to be sorted by any indexed field.
In 2005, there were 484 commits from 11 unique authors.
Committed to solving problems, big and small
Lucene 1.9 introduces DateTools, allowing users to format dates for better readability, as well as handle dates before 1970.
Lucene 2.0 mirrors 1.9, aside from dropping deprecated APIs. Some attribute the longevity of the project to this type of maintenance.
Lucene 2.1 updates QueryParser to allow Unicode characters to be added in their Unicode escape form. \u1F973 \u1F973 \u1F973
Lucene 2.2 includes some small optimizations with big impact. Increasing two buffer sizes yields a 10-18% write performance boost.
Lucene 2.3 improves how the IndexWriter utilizes RAM for buffering documents — leading to a 2-8x indexing speed boost.
Lucene 2.9 changes its API to reflect the segmented structure of its indices, boosting speeds with this new realignment.
Lucene 3.0 is the first version to require Java 5 at runtime, making use of Java 5 language features like generics, enums, and variable arguments.
The Lucene and Solr repositories are merged. While sharing the same repo, Lucene is still available to download and use by itself.
Elasticsearch 0.1.0 is released on February 7! The open source, distributed, RESTful search engine is built on top of Lucene.
Lucene 3.4 introduces block joins, allowing for efficient 1-N joins.
Twitter implements a patched version of Lucene for real-time search.
Lucene 4.0 is full of important new features.
Improved index flexibility via the Codecs API,
added similarity models (BM25, DFR, and more),
and the introduction of doc values elevate Lucene into the world of serious analytics.
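To make the BM25 model mentioned above concrete, here is a minimal Python sketch of the textbook scoring formula for a single term. It is not Lucene's Java implementation: the function name and parameter names are chosen for illustration, and the defaults k1 = 1.2 and b = 0.75 are the commonly used values.

```python
import math


def bm25(tf, doc_len, avg_doc_len, doc_freq, doc_count, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document (illustrative sketch)."""
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: documents longer than average are penalized.
    norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: each extra occurrence adds less than the last.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


once = bm25(tf=1, doc_len=100, avg_doc_len=100, doc_freq=10, doc_count=1000)
twice = bm25(tf=2, doc_len=100, avg_doc_len=100, doc_freq=10, doc_count=1000)
print(twice / once)  # less than 2: the second occurrence counts for less
```

The saturation in the last line is the main practical difference from classic TF/IDF, where repeated terms keep increasing the score without a bound of this kind.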
Lucene 4.3 adds a cost API to approximate query match counts.
This adds significant speed by running cheap queries first.
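The idea of running cheap queries first can be sketched in Python. This is not Lucene code: `PostingsIterator` and `conjunction` are hypothetical names, and the cost of a clause is assumed to be the length of its postings list.

```python
from bisect import bisect_left


class PostingsIterator:
    """Hypothetical stand-in for a per-clause iterator over sorted doc IDs."""

    def __init__(self, doc_ids):
        self.doc_ids = sorted(doc_ids)

    def cost(self):
        # Assumption: the estimate is simply the number of matching docs.
        return len(self.doc_ids)

    def contains(self, doc_id):
        # Binary search keeps each membership check cheap.
        i = bisect_left(self.doc_ids, doc_id)
        return i < len(self.doc_ids) and self.doc_ids[i] == doc_id


def conjunction(iterators):
    """Return doc IDs matched by every clause, led by the cheapest one."""
    if not iterators:
        return []
    ordered = sorted(iterators, key=lambda it: it.cost())
    lead, others = ordered[0], ordered[1:]
    # Only the lead clause is walked in full; the rest are probed per candidate.
    return [d for d in lead.doc_ids if all(o.contains(d) for o in others)]


rare = PostingsIterator([7, 42, 99])
common = PostingsIterator(range(1000))
print(conjunction([common, rare]))  # [7, 42, 99]
```

Leading with the rare clause means three candidates are checked instead of a thousand, whatever order the clauses were written in.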
Search suggestions are standard these days, but that doesn't mean they're easy.
Lucene 4.8 implements a common format that includes checksums for all index files, making it possible to better detect hardware errors.
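A checksum footer of this kind can be illustrated with a short Python sketch. The layout here (a 4-byte CRC32 appended to the payload) is a simplification chosen for the example, not Lucene's actual file format, and the function names are hypothetical.

```python
import zlib


def write_with_footer(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC32 of the payload as a footer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(data: bytes) -> bool:
    """Recompute the checksum and compare it with the stored footer."""
    if len(data) < 4:
        return False
    payload, footer = data[:-4], data[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == footer


stored = write_with_footer(b"segment data")
corrupted = bytes([stored[0] ^ 1]) + stored[1:]  # flip one bit, as a bad disk might
print(verify(stored), verify(corrupted))  # True False
```

A check like this only detects the corruption; it cannot repair it, but it does stop a damaged file from being read as if it were valid.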
Lucene 5.0 is primarily motivated by the removal of support for 3.x indices.
Managing technical debt isn't glamorous, but it is important.
Lucene 5.1 adds two-phase intersections for phrase queries,
boosting speeds by splitting approximation and confirmation into two steps.
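The two steps can be sketched in Python for a phrase query. This is an illustration, not Lucene's implementation: `phrase_search` and the toy index layout (term mapped to doc ID mapped to a list of positions) are assumptions made for the example.

```python
def phrase_search(index, terms):
    """Find docs that contain `terms` as an exact phrase.

    `index` maps term -> {doc_id: [positions]} (a hypothetical toy layout).
    """
    postings = [index.get(t, {}) for t in terms]
    if not postings:
        return []

    # Phase 1 (approximation): docs that contain every term, in any order.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)

    # Phase 2 (confirmation): read positions only for the surviving candidates.
    def positions_match(doc_id):
        return any(
            all(start + i in postings[i][doc_id] for i in range(1, len(terms)))
            for start in postings[0][doc_id]
        )

    return sorted(d for d in candidates if positions_match(d))


index = {
    "quick": {1: [0], 2: [3]},
    "fox": {1: [1], 2: [0], 3: [5]},
}
print(phrase_search(index, ["quick", "fox"]))  # [1]
```

Document 2 passes the cheap first phase because it contains both words, but fails confirmation because they are not adjacent. Document 3 is rejected before any positions are read.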
Lucene 6.0 adds support for indexing multi-dimensional points (BKD trees) and sets BM25 as the default similarity model, replacing TF/IDF.
With the introduction of BKD tree data structures for multidimensional search,
Lucene is no longer just a full-text search engine, but a search engine of anything.
This opened the door for the advancements we're currently watching unfold in the world of geospatial search, scoring using dynamic features, and more.
Lucene 6.5 introduces logic to automatically run range queries in the most efficient way, creating considerable performance boosts.
Lucene 7.0 updates the doc value API from random access to iterative, optimizing performance with sparse data (nearly empty fields).
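The benefit of an iterative API for sparse fields can be shown with a small Python sketch. The class and method names below are hypothetical, loosely modeled on the iterator style described above rather than copied from Lucene.

```python
class SparseDocValues:
    """Hypothetical iterator over only the docs that have a value."""

    NO_MORE_DOCS = float("inf")  # sentinel returned once the iterator is exhausted

    def __init__(self, values_by_doc):
        # Store (doc_id, value) pairs in doc ID order; empty docs take no space.
        self._entries = sorted(values_by_doc.items())
        self._pos = -1

    def next_doc(self):
        """Advance to the next doc that has a value and return its ID."""
        self._pos += 1
        if self._pos >= len(self._entries):
            return self.NO_MORE_DOCS
        return self._entries[self._pos][0]

    def value(self):
        """Value for the doc the iterator is currently positioned on."""
        return self._entries[self._pos][1]


def total(doc_values):
    """Sum all values by visiting only the docs that have one."""
    result = 0
    doc = doc_values.next_doc()
    while doc != SparseDocValues.NO_MORE_DOCS:
        result += doc_values.value()
        doc = doc_values.next_doc()
    return result


# Two values in an index of about a million docs: only two steps are taken.
print(total(SparseDocValues({3: 10, 900_000: 5})))  # 15
```

With a random-access design, the consumer would typically ask for a value for every document ID, including the vast majority that have none. Here both storage and work scale with the number of documents that actually have a value.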
Lucene 7.6 allows BKD trees to index a subset of their dimensions, effectively treating them like R-trees (the standard structure in geospatial indexing).
Lucene 8.0 implements the Block-Max WAND algorithm,
creating a tremendous speedup for searches over large collections of documents by skipping low-scoring results.