lucene-solr

Apache Lucene and Solr open-source search software

Top Related Projects

elasticsearch

69,881

Free and Open Source, Distributed, RESTful Search Engine

vespa

5,880

AI + Data, online. https://vespa.ai

crate

4,140

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

OpenSearch

9,671

🔎 Open source distributed and RESTful search engine.

algoliasearch-client-javascript

1,329

⚡️ A fully-featured and blazing-fast JavaScript API client to interact with Algolia.

Quick Overview

Apache Lucene is a high-performance, full-featured text search library written entirely in Java. Apache Solr is a search server that uses Lucene as its core search engine. Together, they provide scalable, efficient, and feature-rich search capabilities for a wide range of applications.

Pros

Highly scalable and performant, capable of handling large volumes of data
Rich set of features including full-text search, faceting, highlighting, and geospatial search
Active community and regular updates, ensuring ongoing improvements and support
Flexible and customizable, allowing for integration with various data sources and applications

Cons

Steep learning curve, especially for beginners
Complex configuration and setup process
Resource-intensive, requiring significant memory and CPU for optimal performance
May be overkill for simple search requirements in smaller applications

Code Examples

Creating an index and adding documents:

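// In-memory index; swap in FSDirectory.open(path) for an on-disk index
Directory index = new ByteBuffersDirectory();
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(index, config);

// TextField values are analyzed (tokenized) at index time; Store.YES keeps the raw value retrievable
Document doc = new Document();
doc.add(new TextField("title", "Example Document", Field.Store.YES));
doc.add(new TextField("content", "This is the content of the document.", Field.Store.YES));
writer.addDocument(doc);

// Closing the writer commits the pending changes
writer.close();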

Performing a search:

// Open a reader over the index built above
DirectoryReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);

// "content" is the default field; parse() throws ParseException on malformed input
Query query = new QueryParser("content", new StandardAnalyzer()).parse("document");
TopDocs results = searcher.search(query, 10);

for (ScoreDoc scoreDoc : results.scoreDocs) {
    Document doc = searcher.doc(scoreDoc.doc);
    System.out.println("Title: " + doc.get("title"));
}

reader.close();

Using Solr's SolrJ client to index documents:

SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

SolrInputDocument document = new SolrInputDocument();
document.addField("id", "1");
document.addField("title", "Solr Example");
document.addField("content", "This is an example document for Solr.");

solr.add(document);
solr.commit();   // make the document visible to searches
solr.close();    // release the underlying HTTP resources

Getting Started

Download Apache Lucene or Solr from the official website.
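For Lucene, add the following Maven dependency (the QueryParser class used in the search example ships separately, in the lucene-queryparser artifact):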

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>8.8.2</version>
</dependency>

For Solr, start the Solr server:

bin/solr start

Create a core:

bin/solr create -c mycore

Begin indexing documents and performing searches using the provided APIs.
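Once a core exists, searches can also be issued over plain HTTP against Solr's /select request handler. The following is a minimal sketch, not an official client: the class and method names are hypothetical, and it assumes the default local address and the `mycore` core created above. It only builds the request URL, which can then be fetched with any HTTP client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of SolrJ): builds a URL for Solr's /select handler.
class SolrSelectUrl {

    // baseUrl: e.g. "http://localhost:8983/solr"; core: the core name;
    // query: a query string in Lucene/Solr syntax; rows: maximum number of hits.
    static String selectUrl(String baseUrl, String core, String query, int rows) {
        // The query must be URL-encoded because it may contain ':' or spaces.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + q + "&rows=" + rows;
    }

    public static void main(String[] args) {
        // Assumes Solr is running locally with a core named "mycore".
        System.out.println(selectUrl("http://localhost:8983/solr", "mycore", "title:Solr", 10));
    }
}
```

Keeping URL construction in one small function makes the encoding step explicit; for anything beyond quick experiments, the SolrJ client shown earlier is the more robust choice.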

Competitor Comparisons

elasticsearch

69,881

Free and Open Source, Distributed, RESTful Search Engine

Pros of Elasticsearch

More user-friendly and easier to set up and configure
Built-in RESTful API for easier integration and management
Better distributed architecture for improved scalability

Cons of Elasticsearch

Proprietary features in recent versions limit full open-source usage
Higher resource consumption, especially memory usage
Steeper learning curve for advanced features and optimizations

Code Comparison

Elasticsearch query:

{
  "query": {
    "match": {
      "title": "search example"
    }
  }
}

Lucene-Solr query:

title:search title:example

Both Elasticsearch and Solr are search servers built on top of Apache Lucene. Elasticsearch offers a distributed design with easier setup and management, while Solr provides a more traditional, highly customizable solution. Elasticsearch's JSON-based query DSL is more verbose but often easier to read and construct, especially for complex queries. Solr's query syntax is more compact and closer to Lucene's native query language. In the example above, the two forms are equivalent when the default operator is OR: each matches documents whose title contains either term. The choice between the two often depends on specific project requirements, existing infrastructure, and team expertise.
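To make the correspondence between the two query forms concrete, here is a small sketch that turns a field name and free text into the compact Lucene-style string shown above. The class and method names are hypothetical. Lowercasing and whitespace splitting stand in for a real analyzer, and special characters are not escaped, so treat it as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: mimics how a match-style query expands into per-term clauses.
class MatchToLuceneSyntax {

    // Produces one "field:term" clause per whitespace-separated token.
    // With OR as the default operator, space-separated clauses are optional alternatives.
    static String toLuceneQuery(String field, String text) {
        List<String> clauses = new ArrayList<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                clauses.add(field + ":" + token);
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(toLuceneQuery("title", "search example"));
    }
}
```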

vespa

5,880

AI + Data, online. https://vespa.ai

Pros of Vespa

Real-time indexing and serving, allowing immediate updates and queries
Built-in machine learning capabilities for advanced ranking and recommendation
Scalable distributed architecture for handling large datasets and high query loads

Cons of Vespa

Steeper learning curve due to its comprehensive feature set
Less widespread adoption compared to Lucene/Solr
Requires more resources for deployment and management

Code Comparison

Vespa query example (the YQL string is the substantive part; the surrounding builder calls are illustrative):

SearchRequest request = new SearchRequest.Builder()
    .yql("select * from music where artist contains 'Beatles'")
    .hits(10)
    .build();
Result result = container.search(request);

Lucene query example:

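// TermQuery is not analyzed: if the field was indexed with StandardAnalyzer,
// terms are stored lowercased, so the term must be lowercase to match
Query query = new TermQuery(new Term("artist", "beatles"));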
TopDocs docs = searcher.search(query, 10);
ScoreDoc[] hits = docs.scoreDocs;

Both examples demonstrate basic search functionality, but Vespa's query language (YQL) offers more flexibility for complex queries. Lucene's approach is more low-level, requiring additional code for advanced features.

Vespa excels in real-time processing and machine learning integration, while Lucene/Solr offers a more established ecosystem and simpler setup for basic search needs. The choice between them depends on specific project requirements and scalability needs.

crate

4,140

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

Pros of Crate

Designed from the start as a distributed SQL database, which eases scaling for large datasets
Supports real-time full-text search and geospatial queries out of the box
Easier setup and maintenance, with a more modern architecture

Cons of Crate

Less mature and battle-tested compared to Lucene/Solr
Smaller community and ecosystem, potentially leading to fewer resources and third-party integrations
May not be as feature-rich for advanced text analysis and information retrieval tasks

Code Comparison

Crate SQL query:

SELECT * FROM users WHERE match(name, 'John') ORDER BY _score DESC LIMIT 10;

Lucene/Solr query:

SolrQuery query = new SolrQuery();
query.setQuery("name:John");
query.setSort("score", SolrQuery.ORDER.desc);
query.setRows(10);

Both repositories offer powerful search capabilities, but Crate provides a more SQL-like syntax for querying, while Lucene/Solr uses a Java-based API. Crate's approach may be more familiar to developers with SQL backgrounds, whereas Lucene/Solr's approach offers more fine-grained control over query construction and execution.

OpenSearch

9,671

🔎 Open source distributed and RESTful search engine.

Pros of OpenSearch

More active development and frequent updates
Broader feature set, including machine learning capabilities
Better support for distributed systems and cloud environments

Cons of OpenSearch

Newer project with less established ecosystem
Potentially steeper learning curve for newcomers
Some compatibility issues with older Elasticsearch versions

Code Comparison

OpenSearch (query DSL):

{
  "query": {
    "match": {
      "title": "OpenSearch example"
    }
  }
}

Lucene-Solr (Lucene query syntax):

title:"Lucene Solr example"

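Both queries target the title field, but they are not equivalent: the OpenSearch match query matches documents containing any of the analyzed terms, while the quoted Lucene form is a phrase query that requires the terms to appear adjacent and in order. OpenSearch expresses queries as a JSON DSL, while Lucene-Solr typically uses a more compact string syntax.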
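To illustrate how an any-term match differs from a quoted phrase query, here is a minimal sketch over plain strings. The class and method names are hypothetical, and lowercased whitespace tokens are an assumption standing in for a real analyzer. It does not use the Lucene or OpenSearch APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of "any term" matching versus phrase matching.
class PhraseVsMatch {

    // Simplified analysis: lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // True if the document contains at least one of the query terms.
    static boolean matchesAny(String doc, String query) {
        List<String> docTokens = tokenize(doc);
        for (String term : tokenize(query)) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // True only if the query terms occur in the document adjacent and in order.
    static boolean matchesPhrase(String doc, String query) {
        return Collections.indexOfSubList(tokenize(doc), tokenize(query)) >= 0;
    }

    public static void main(String[] args) {
        String doc = "Solr and Lucene example";
        System.out.println(matchesAny(doc, "Lucene Solr example"));
        System.out.println(matchesPhrase(doc, "Lucene Solr example"));
    }
}
```

A document titled "Solr and Lucene example" satisfies the any-term check for the query "Lucene Solr example" but not the phrase check, because the terms are not adjacent in that order.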

Summary

OpenSearch is a more modern and actively developed project, offering advanced features and better cloud support. However, Lucene-Solr has a longer history and a more established ecosystem. The choice between them depends on specific project requirements, existing infrastructure, and desired features.

sphinx

6,483

The Sphinx documentation generator

Pros of Sphinx

Lightweight and easy to set up for documentation projects
Supports multiple output formats (HTML, PDF, ePub)
Extensible through plugins and extensions

Cons of Sphinx

Limited to documentation use cases, not a full-text search engine
Less scalable for large-scale search applications
Smaller community and ecosystem compared to Lucene/Solr

Code Comparison

Sphinx (Python):

from sphinx.application import Sphinx

app = Sphinx(srcdir, confdir, outdir, doctreedir, buildername)
app.build()

Lucene/Solr (Java):

IndexWriter writer = new IndexWriter(directory, config);
Document doc = new Document();
doc.add(new TextField("content", "Hello, world!", Field.Store.YES));
writer.addDocument(doc);
writer.close();

The code snippets demonstrate the core functionality of each project. Sphinx focuses on building documentation, while Lucene/Solr is designed for indexing and searching text content. Sphinx uses a higher-level API for generating documentation, whereas Lucene/Solr provides low-level indexing capabilities for search applications.

algoliasearch-client-javascript

1,329

⚡️ A fully-featured and blazing-fast JavaScript API client to interact with Algolia.

Pros of algoliasearch-client-javascript

Lightweight and easy to integrate into JavaScript projects
Provides real-time search capabilities with minimal configuration
Offers built-in analytics and personalization features

Cons of algoliasearch-client-javascript

Less flexible for complex, custom search implementations
Requires ongoing subscription for Algolia's hosted service
Limited control over search infrastructure and data storage

Code Comparison

algoliasearch-client-javascript:

const client = algoliasearch('APP_ID', 'API_KEY');
const index = client.initIndex('your_index_name');
index.search('query').then(({ hits }) => {
  console.log(hits);
});

lucene-solr:

SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
QueryResponse response = client.query("your_core_name", query);
SolrDocumentList results = response.getResults();

The algoliasearch-client-javascript code demonstrates a simpler setup and search execution, while the lucene-solr example shows a more verbose but potentially more customizable approach. Algolia's client is designed for quick integration and real-time search, whereas Lucene-Solr offers more control over the search process and infrastructure.

README

Apache Lucene and Solr have separate repositories now!

Solr has become a top-level Apache project and main line development for Lucene and Solr is happening in each project's git repository now:

Lucene: https://gitbox.apache.org/repos/asf/lucene.git
Solr: https://gitbox.apache.org/repos/asf/solr.git

Development for bugfixes of 8.11.x releases remains on branch branch_8_11 in the shared repository:

https://gitbox.apache.org/repos/asf/lucene-solr.git

GitHub forks?

If you are using GitHub, make a clone of the corresponding repository mirror and create your pull requests against the main branch:

Lucene: https://github.com/apache/lucene
Solr: https://github.com/apache/solr
