lucene

Apache Lucene open-source search software

Top Related Projects

elasticsearch

72,500

Free and Open Source, Distributed, RESTful Search Engine

vespa

6,226

AI + Data, online. https://vespa.ai

OpenSearch

10,574

🔎 Open source distributed and RESTful search engine.

meilisearch

50,860

A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.

typesense

22,898

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

sonic

20,705

🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

Quick Overview

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform applications. Lucene is the foundation for many popular search platforms, including Apache Solr and Elasticsearch.
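Full-text search libraries such as Lucene are built around an inverted index, which maps each term to the documents that contain it. The following is a deliberately simplified, in-memory model of that idea in plain Java. The class `TinyInvertedIndex` and its naive tokenizer are hypothetical illustrations, not Lucene's API. Lucene's real index stores compressed postings in immutable segments and does much more.

```java
import java.util.*;

// Hypothetical, simplified inverted index: term -> sorted set of document IDs.
// Conceptual model only; Lucene's on-disk postings format is far more elaborate.
class TinyInvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Naive analysis: lowercase, then split on runs of non-letter/non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Indexes a document and returns the ID assigned to it.
    int add(String text) {
        int id = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Returns the IDs of documents containing every query term (AND semantics),
    // computed by intersecting the postings of each term.
    SortedSet<Integer> searchAll(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<Integer>() : result;
    }
}
```

Looking up a term is a single map access, which is why query cost depends on the size of the postings involved rather than on scanning every document.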

Pros

Highly efficient and scalable search capabilities
Supports advanced features like faceting, highlighting, and geospatial search
Extensive language support with analyzers for many languages
Active development and strong community support
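Faceting, mentioned above, means counting how often each value of a categorical field occurs among the documents that matched a query. The sketch below models that counting step in plain Java, assuming documents are simple field-to-value maps. `FacetCounter` is a hypothetical name; Lucene's facet module computes counts over indexed doc values rather than over in-memory maps like this.

```java
import java.util.*;

// Hypothetical facet counter: counts the values of one field across matched documents.
class FacetCounter {
    static List<Map.Entry<String, Integer>> topFacets(
            List<Map<String, String>> matchedDocs, String field, int topN) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : matchedDocs) {
            String value = doc.get(field);
            if (value != null) counts.merge(value, 1, Integer::sum); // skip docs without the field
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so the order is deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(topN, entries.size()));
    }
}
```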

Cons

Steep learning curve for beginners
Requires significant memory and CPU resources for large-scale applications
Can be complex to configure and fine-tune for optimal performance
Direct usage of Lucene may require more low-level implementation compared to higher-level search platforms

Code Examples

Creating an index and adding documents:

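import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// In-memory index, handy for examples and tests
Directory index = new ByteBuffersDirectory();
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

// try-with-resources commits pending changes and closes the writer
try (IndexWriter writer = new IndexWriter(index, config)) {
    Document doc = new Document();
    doc.add(new TextField("title", "Example Document", Field.Store.YES));
    doc.add(new TextField("content", "This is the content of the document.", Field.Store.YES));
    writer.addDocument(doc);
}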

Searching the index:

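import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

try (DirectoryReader reader = DirectoryReader.open(index)) {
    IndexSearcher searcher = new IndexSearcher(reader);

    // parse() throws ParseException if the query syntax is invalid
    Query query = new QueryParser("content", new StandardAnalyzer()).parse("content");
    TopDocs results = searcher.search(query, 10);

    // storedFields() (Lucene 9.5+) replaces the deprecated per-document lookup methods
    StoredFields storedFields = reader.storedFields();
    for (ScoreDoc scoreDoc : results.scoreDocs) {
        Document doc = storedFields.document(scoreDoc.doc);
        System.out.println("Title: " + doc.get("title"));
    }
}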
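The search above returns hits ordered by relevance score. Lucene's default similarity is BM25 (`BM25Similarity`, with k1 = 1.2 and b = 0.75 by default). The sketch below computes the textbook BM25 score of a single query term in a single document. `Bm25` is a hypothetical helper, and Lucene's implementation differs in some details, so treat this as an explanation of the formula rather than a reproduction of Lucene's scores.

```java
// Hypothetical helper: textbook BM25 contribution of one term to one document's score.
// k1 controls how quickly repeated occurrences saturate; b controls length normalization.
final class Bm25 {
    static final double K1 = 1.2;
    static final double B = 0.75;

    // docCount: documents in the index; docFreq: documents containing the term.
    // Rare terms (small docFreq) get a larger weight.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // termFreq: occurrences in this document; lengths are measured in tokens.
    static double score(double termFreq, double docLength, double avgDocLength,
                        long docCount, long docFreq) {
        double norm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf(docCount, docFreq) * termFreq * (K1 + 1.0) / (termFreq + norm);
    }
}
```

With these defaults, extra occurrences of a term keep raising the score but by smaller and smaller amounts, and documents longer than average are penalized.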

Using custom analyzers:

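import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;

// build() throws IOException; token filters run in the order they are added
Analyzer customAnalyzer = CustomAnalyzer.builder()
    .withTokenizer(StandardTokenizerFactory.class)
    .addTokenFilter(LowerCaseFilterFactory.class)
    .addTokenFilter(StopFilterFactory.class)
    .build();

// Use a fresh config: IndexWriterConfig instances cannot be shared between writers
IndexWriterConfig customConfig = new IndexWriterConfig(customAnalyzer);
IndexWriter writer = new IndexWriter(index, customConfig);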
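An analyzer like the custom one above is a pipeline: a tokenizer splits text into tokens, and each token filter transforms the token stream in turn. The plain-Java model below mirrors the tokenizer, lowercase, stop-word sequence. `AnalysisChain` and its stop-word list are hypothetical illustrations, not Lucene classes or Lucene's default stop set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical model of an analysis chain: tokenizer, then filters applied in order.
final class AnalysisChain {
    // Illustrative stop words (an assumption, not Lucene's default set).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the");

    // Tokenizer: split on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    // Order matters: stop words are stored in lowercase, so lowercasing must run first.
    static final List<UnaryOperator<List<String>>> FILTERS =
            List.of(AnalysisChain::lowercase, AnalysisChain::removeStopWords);

    static List<String> analyze(String text) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : FILTERS) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }
}
```

Swapping the two filters would let a capitalized "The" slip through, which is the same ordering concern that applies when adding filters to a real analyzer.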

Getting Started

To use Apache Lucene in your Java project, add the following Maven dependency:

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>9.5.0</version>
</dependency>

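For additional features, include the relevant modules: lucene-analysis-common provides the additional analyzers (including CustomAnalyzer and the token filter factories), and lucene-queryparser provides QueryParser: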

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analysis-common</artifactId>
    <version>9.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>9.5.0</version>
</dependency>

Then, import the necessary classes and start using Lucene in your Java code as shown in the examples above.

Competitor Comparisons

elasticsearch

Pros of Elasticsearch

Full-featured search and analytics engine with a RESTful API
Distributed architecture for scalability and high availability
Rich ecosystem of plugins and integrations

Cons of Elasticsearch

Higher resource requirements and complexity
Steeper learning curve for advanced features
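Licensing changes since 2021 (a move from Apache 2.0 to SSPL and the Elastic License, with AGPLv3 added as an option in 2024) may concern some users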

Code Comparison

Lucene (Java):

IndexWriter writer = new IndexWriter(directory, config);
Document doc = new Document();
doc.add(new TextField("title", "My Document", Field.Store.YES));
writer.addDocument(doc);
writer.close();

Elasticsearch (JSON API):

POST /my_index/_doc
{
  "title": "My Document"
}

Elasticsearch builds upon Lucene's core search functionality, providing a more user-friendly and feature-rich solution for distributed search and analytics. While Lucene offers low-level control and efficiency, Elasticsearch simplifies deployment and scaling for large-scale applications. However, this comes at the cost of increased complexity and resource usage.

Lucene is better suited for embedded search functionality or when fine-grained control over indexing and searching is required. Elasticsearch excels in scenarios requiring distributed search, real-time analytics, and integration with other data processing tools.
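Elasticsearch's distributed architecture splits each index into shards, and each shard is a separate Lucene index. A document is routed to a shard by hashing a routing value (the document ID by default). The sketch below shows the idea with `String.hashCode` and a modulo as stand-ins; Elasticsearch actually uses a murmur3 hash and a slightly more involved formula, and `ShardRouter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hash-based router: each document ID maps deterministically to one shard.
final class ShardRouter {
    private final int numShards;
    private final List<List<String>> shards = new ArrayList<>();

    ShardRouter(int numShards) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be positive");
        this.numShards = numShards;
        for (int i = 0; i < numShards; i++) shards.add(new ArrayList<>());
    }

    // floorMod keeps the result in [0, numShards) even when hashCode() is negative.
    int shardFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), numShards);
    }

    void index(String docId) {
        shards.get(shardFor(docId)).add(docId);
    }

    List<String> shard(int i) {
        return shards.get(i);
    }
}
```

Because the shard depends on the shard count, changing that count would move most documents; this is why the number of primary shards is fixed when an Elasticsearch index is created and changing it requires the split, shrink, or reindex APIs.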

vespa

Pros of Vespa

Offers real-time, scalable search and recommendation capabilities
Provides advanced machine learning and AI integration out-of-the-box
Supports structured data and complex queries with a flexible query language

Cons of Vespa

Steeper learning curve due to its comprehensive feature set
Requires more system resources for optimal performance
Less extensive community support compared to Lucene's ecosystem

Code Comparison

Vespa query example:

select * from music where
  lyrics contains phrase("hey", "jude") and
  artist contains "beatles"

Lucene query example:

Query query = new BooleanQuery.Builder()
    // "hey jude" is two tokens after analysis, so match it as a phrase, not a single term
    .add(new PhraseQuery("lyrics", "hey", "jude"), BooleanClause.Occur.MUST)
    .add(new TermQuery(new Term("artist", "beatles")), BooleanClause.Occur.MUST)
    .build();

Both examples demonstrate querying capabilities, but Vespa's query language is more concise and readable for complex queries. Lucene's approach offers more programmatic flexibility but may require more code for similar functionality.
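Matching a multi-word phrase such as "hey jude" against an analyzed field needs term positions, not just term presence: a document reading "jude said hey" contains both terms but not the phrase. Lucene can record positions in its postings, and phrase queries rely on them. The following is a hypothetical, simplified positional index (`PositionalIndex` is an illustrative name, and whitespace splitting stands in for real analysis).

```java
import java.util.*;

// Hypothetical positional index: term -> (docId -> token positions).
final class PositionalIndex {
    private final Map<String, Map<Integer, List<Integer>>> index = new HashMap<>();

    void add(int docId, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            index.computeIfAbsent(tokens[pos], k -> new HashMap<>())
                 .computeIfAbsent(docId, k -> new ArrayList<>())
                 .add(pos);
        }
    }

    // True if the terms appear consecutively and in order somewhere in the document.
    boolean matchesPhrase(int docId, String... terms) {
        if (terms.length == 0) return false;
        for (int start : positions(terms[0], docId)) {
            boolean ok = true;
            // Term i must occur exactly i positions after the first term.
            for (int i = 1; i < terms.length && ok; i++) {
                ok = positions(terms[i], docId).contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    private List<Integer> positions(String term, int docId) {
        return index.getOrDefault(term, Map.of()).getOrDefault(docId, List.of());
    }
}
```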

OpenSearch

Pros of OpenSearch

More comprehensive search solution with built-in analytics and visualization capabilities
Offers a complete ecosystem with plugins, dashboards, and additional features
Designed for scalability and distributed environments out of the box

Cons of OpenSearch

Higher resource requirements and complexity compared to Lucene
Steeper learning curve for implementation and management
Less flexibility for custom low-level search implementations

Code Comparison

Lucene (Java):

IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(new StandardAnalyzer()));
Document doc = new Document();
doc.add(new TextField("content", "Lucene example", Field.Store.YES));
writer.addDocument(doc);
writer.close();

OpenSearch (JSON):

PUT /my-index/_doc/1
{
  "content": "OpenSearch example"
}

Summary

Lucene is a low-level search library offering fine-grained control and efficiency for custom search implementations. OpenSearch, built on Lucene, provides a more comprehensive search and analytics solution with additional features and scalability. While OpenSearch offers a complete ecosystem, it comes with increased complexity and resource requirements. Lucene is better suited for lightweight, custom search implementations, while OpenSearch excels in large-scale, distributed search and analytics scenarios.
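In distributed engines such as OpenSearch, a search is sent to every shard, each shard returns its local top hits, and a coordinating node merges them into a global top-k. The sketch below models only that merge step with a size-k min-heap. `TopKMerge` is a hypothetical name; Lucene itself offers `TopDocs.merge` for combining per-shard results, and real systems also handle paging, ties, and shard failures.

```java
import java.util.*;

// Hypothetical scatter-gather merge of per-shard hit lists into a global top-k.
final class TopKMerge {
    static final class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
    }

    static List<Hit> merge(List<List<Hit>> perShardHits, int k) {
        // Min-heap ordered by score: the root is the weakest hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shardHits : perShardHits) {
            for (Hit hit : shardHits) {
                heap.offer(hit);
                if (heap.size() > k) heap.poll(); // drop the weakest hit to keep at most k
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```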

meilisearch

Pros of Meilisearch

Easy to set up and use, with a more user-friendly API
Typo-tolerance and relevancy-focused search out of the box
Faster indexing and query performance for smaller datasets

Cons of Meilisearch

Less mature and battle-tested compared to Lucene
Limited advanced features and customization options
Smaller community and ecosystem

Code Comparison

Meilisearch query example:

client.index('movies').search('botman', {
  limit: 5,
  attributesToHighlight: ['title']
})

Lucene query example:

IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
Query query = new QueryParser("title", analyzer).parse("botman");
TopDocs docs = searcher.search(query, 5);

Both Lucene and Meilisearch are powerful search tools, but they cater to different use cases: Lucene is an embeddable Java library, while Meilisearch runs as a standalone search server. Lucene offers more flexibility and advanced features for complex search requirements, while Meilisearch focuses on simplicity and ease of use for small to medium-sized applications. Lucene's maturity and extensive ecosystem make it suitable for large-scale enterprise applications, whereas Meilisearch's modern approach and user-friendly design make it attractive for developers looking for quick implementation and good out-of-the-box performance.
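The Meilisearch example searches for the misspelled "botman" and relies on built-in typo tolerance. In Lucene, a parsed term like `botman` matches only that exact term; typo tolerance is opt-in, for example via `FuzzyQuery` or the `botman~` syntax of the classic QueryParser, which support up to two edits. The sketch below shows the underlying metric, classic Levenshtein distance, plus a brute-force filter over a small dictionary. `EditDistance` is a hypothetical helper; Lucene avoids comparing against every term by intersecting a Levenshtein automaton with the term dictionary, and its `FuzzyQuery` also counts a transposition as a single edit by default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: Levenshtein distance and a brute-force fuzzy term lookup.
final class EditDistance {
    // Minimum number of single-character insertions, deletions, or substitutions.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // keep only two rows in memory
        }
        return prev[b.length()];
    }

    // Dictionary terms within maxEdits of the (possibly misspelled) query term.
    static List<String> fuzzyMatches(String term, List<String> dictionary, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String candidate : dictionary) {
            if (levenshtein(term, candidate) <= maxEdits) out.add(candidate);
        }
        return out;
    }
}
```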

typesense

Pros of Typesense

Easier to set up and use, with a more user-friendly API
Built-in typo tolerance and relevance tuning
Faster indexing and search performance for certain use cases

Cons of Typesense

Less mature and battle-tested compared to Lucene
Smaller community and ecosystem
Limited advanced features and customization options

Code Comparison

Typesense query example:

client.collections('books').documents().search({
  q: 'harry potter',
  query_by: 'title,author',
  sort_by: 'ratings_count:desc'
})

Lucene query example:

IndexSearcher searcher = new IndexSearcher(reader);
Query query = new QueryParser("title", analyzer).parse("harry potter");
TopDocs docs = searcher.search(query, 10);

Lucene and Typesense cater to different needs: Lucene is an embeddable search library, whereas Typesense runs as a standalone search server. Lucene is highly flexible and powerful and forms the foundation for many enterprise search solutions. It offers extensive customization options and supports complex queries.

Typesense, on the other hand, is designed to be a simpler, more user-friendly alternative that prioritizes ease of use and fast setup. It provides out-of-the-box features like typo tolerance and relevance tuning, making it attractive for developers who want a quick and efficient search solution without diving deep into search engine intricacies.

While Lucene has a larger ecosystem and more advanced features, Typesense may be preferable for projects that require rapid development and straightforward integration of search functionality.
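The Typesense example orders results with `sort_by: 'ratings_count:desc'`. In Lucene a comparable ordering is expressed with a `Sort` over a numeric doc-values field; the plain-Java sketch below only illustrates the ordering itself: by ratings count descending, with relevance score as the tie-breaker. `FieldSort` and `Book` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical field sort: primary key ratingsCount (desc), tie-break on score (desc).
final class FieldSort {
    static final class Book {
        final String title;
        final long ratingsCount;
        final float score;
        Book(String title, long ratingsCount, float score) {
            this.title = title;
            this.ratingsCount = ratingsCount;
            this.score = score;
        }
    }

    static List<Book> sortByRatings(List<Book> hits) {
        List<Book> sorted = new ArrayList<>(hits); // copy so the input list is untouched
        sorted.sort(Comparator.comparingLong((Book b) -> b.ratingsCount).reversed()
                .thenComparing(Comparator.comparingDouble((Book b) -> b.score).reversed()));
        return sorted;
    }
}
```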

sonic

Pros of Sonic

Lightweight and fast, designed for high-performance search and suggest operations
Simple to set up and use, with a straightforward API
Written in Rust, offering memory safety and concurrent processing benefits

Cons of Sonic

Less feature-rich compared to Lucene's extensive capabilities
Smaller community and ecosystem, with fewer resources and integrations
Limited language analysis and advanced text processing features

Code Comparison

Sonic (search query):

let results = channel.query("collection", "bucket", "quick brown fox", 10, None);

Lucene (search query):

Query query = new QueryParser("content", analyzer).parse("quick brown fox");
TopDocs results = searcher.search(query, 10);

Summary

Sonic is a lightweight, fast search engine focused on simplicity and performance, ideal for basic search and suggest functionalities. Lucene, on the other hand, is a more comprehensive and feature-rich search library with advanced text analysis capabilities and a larger ecosystem. Sonic may be preferable for projects requiring quick setup and simple search operations, while Lucene is better suited for complex search requirements and extensive text processing needs.
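Sonic's "suggest" operation is prefix-based autocompletion. The core idea is that, in a sorted term dictionary, all terms sharing a prefix form one contiguous range. The sketch below uses a `TreeSet` to show this; `PrefixSuggester` is a hypothetical name, and neither Sonic nor Lucene's suggest module is implemented this way internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical prefix suggester over a sorted set of terms.
final class PrefixSuggester {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    // Walks the sorted set from the prefix onward and stops at the first term
    // that no longer starts with it, or once the limit is reached.
    List<String> suggest(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(p, true)) {
            if (!t.startsWith(p) || out.size() >= limit) break;
            out.add(t);
        }
        return out;
    }
}
```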

README

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Latest Releases: https://lucene.apache.org/core/documentation.html
Nightly: https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/javadoc/
New contributors should start by reading the Contributing Guide
Build System Documentation: help/
Migration Guide: lucene/MIGRATE.md

Building

Basic steps:

Install OpenJDK 23.
Clone Lucene's git repository (or download the source distribution).
Run gradle launcher script (gradlew).

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

Additional Developer Documentation: dev-docs/

Discussion and Support

Users Mailing List
Developers Mailing List
IRC: #lucene and #lucene-dev on freenode.net