Top Related Projects
Free and Open Source, Distributed, RESTful Search Engine
AI + Data, online. https://vespa.ai
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
🔎 Open source distributed and RESTful search engine.
The Sphinx documentation generator
⚡️ A fully-featured and blazing-fast JavaScript API client to interact with Algolia.
Quick Overview
Apache Lucene is a high-performance, full-featured text search library written in Java. Apache Solr is a search server that uses Lucene as its core search engine. Together, they provide scalable, efficient, and feature-rich search capabilities for a wide range of applications.
Pros
- Highly scalable and performant, capable of handling large volumes of data
- Rich set of features including full-text search, faceting, highlighting, and geospatial search
- Active community and regular updates, ensuring ongoing improvements and support
- Flexible and customizable, allowing for integration with various data sources and applications
Cons
- Steep learning curve, especially for beginners
- Complex configuration and setup process
- Resource-intensive, requiring significant memory and CPU for optimal performance
- May be overkill for simple search requirements in smaller applications
Code Examples
- Creating an index and adding documents:
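// Index held in memory; FSDirectory.open(path) gives an on-disk index instead.
Directory index = new ByteBuffersDirectory();
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(index, config);
Document doc = new Document();
// TextField values are tokenized; Field.Store.YES also keeps the original value for retrieval.
doc.add(new TextField("title", "Example Document", Field.Store.YES));
doc.add(new TextField("content", "This is the content of the document.", Field.Store.YES));
writer.addDocument(doc);
// close() commits pending changes and releases the write lock.
writer.close();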
- Performing a search:
// Reuses the "index" Directory from the previous example.
DirectoryReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
// Searches the "content" field for the term "content"; parse() throws ParseException on malformed input.
Query query = new QueryParser("content", new StandardAnalyzer()).parse("content");
TopDocs results = searcher.search(query, 10);
for (ScoreDoc scoreDoc : results.scoreDocs) {
    Document doc = searcher.doc(scoreDoc.doc);
    System.out.println("Title: " + doc.get("title"));
}
reader.close();
- Using Solr's SolrJ client to index documents:
SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
SolrInputDocument document = new SolrInputDocument();
document.addField("id", "1");
document.addField("title", "Solr Example");
document.addField("content", "This is an example document for Solr.");
solr.add(document);
// Documents become searchable only after a commit.
solr.commit();
solr.close();
Getting Started
- Download Apache Lucene or Solr from the official website.
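- For Lucene, add the following Maven dependency (the QueryParser class used in the search example above ships in the separate lucene-queryparser artifact, which needs its own dependency entry):
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>8.8.2</version>
</dependency>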
- For Solr, start the Solr server:
bin/solr start
- Create a core:
bin/solr create -c mycore
- Begin indexing documents and performing searches using the provided APIs.
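Once a core exists, searches can also be sent to Solr over HTTP. The sketch below only builds the request URL, using nothing but the JDK. It assumes the default `/select` request handler and the `mycore` core from the steps above; the class and method names are hypothetical and not part of any Solr API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds a URL for Solr's /select request handler. */
final class SolrSelectUrl {

    /**
     * baseUrl is the Solr root (for example http://localhost:8983/solr),
     * core is the core name, query is a string in Lucene query syntax,
     * and rows limits the number of returned documents.
     */
    static String build(String baseUrl, String core, String query, int rows) {
        // The query may contain ':' and spaces, so it has to be URL-encoded.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + q + "&rows=" + rows;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "mycore", "title:example", 10));
    }
}
```

The resulting URL can be opened in a browser or passed to any HTTP client; SolrJ, shown earlier, wraps the same kind of request behind a typed API.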
Competitor Comparisons
Free and Open Source, Distributed, RESTful Search Engine
Pros of Elasticsearch
- More user-friendly and easier to set up and configure
- Built-in RESTful API for easier integration and management
- Better distributed architecture for improved scalability
Cons of Elasticsearch
- Proprietary features in recent versions limit full open-source usage
- Higher resource consumption, especially memory usage
- Steeper learning curve for advanced features and optimizations
Code Comparison
Elasticsearch query:
{
  "query": {
    "match": {
      "title": "search example"
    }
  }
}
Lucene-Solr query:
title:search title:example
Both Elasticsearch and Lucene-Solr are powerful search engines built on top of Apache Lucene. Elasticsearch offers a more modern, distributed approach with easier setup and management, while Lucene-Solr provides a more traditional, highly customizable solution. Elasticsearch's JSON-based query language is more verbose but often easier to read and construct, especially for complex queries. Lucene-Solr's query syntax is more compact and closer to Lucene's native query language. The choice between the two often depends on specific project requirements, existing infrastructure, and team expertise.
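To make the relationship between the two query forms concrete, here is a minimal sketch that turns the field and text of a match-style query into the compact Lucene syntax shown above. It is an illustration only: it assumes terms are combined with OR and approximates text analysis with lowercasing and whitespace splitting, whereas a real engine runs the field's configured analyzer. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical helper: renders a match-style query in Lucene query syntax. */
final class MatchToLuceneSyntax {

    static String toLuceneQuery(String field, String text) {
        // Assumption: lowercase + whitespace split stands in for the analyzer.
        return Arrays.stream(text.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(term -> !term.isEmpty())
                // Each term becomes its own field:term clause; clauses separated
                // by a space are optional (OR) under the assumed default operator.
                .map(term -> field + ":" + term)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toLuceneQuery("title", "search example"));
    }
}
```

Terms that contain query-syntax characters such as `:` or `*` would need escaping before being placed in the query string; the sketch leaves that out for brevity.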
AI + Data, online. https://vespa.ai
Pros of Vespa
- Real-time indexing and serving, allowing immediate updates and queries
- Built-in machine learning capabilities for advanced ranking and recommendation
- Scalable distributed architecture for handling large datasets and high query loads
Cons of Vespa
- Steeper learning curve due to its comprehensive feature set
- Less widespread adoption compared to Lucene/Solr
- Requires more resources for deployment and management
Code Comparison
Vespa query example (JSON body for a POST to the /search/ endpoint):
{
  "yql": "select * from music where artist contains 'Beatles'",
  "hits": 10
}
Lucene query example:
Query query = new TermQuery(new Term("artist", "Beatles"));
TopDocs docs = searcher.search(query, 10);
ScoreDoc[] hits = docs.scoreDocs;
Both examples demonstrate basic search functionality, but Vespa's query language (YQL) offers more flexibility for complex queries. Lucene's approach is more low-level, requiring additional code for advanced features.
Vespa excels in real-time processing and machine learning integration, while Lucene/Solr offers a more established ecosystem and simpler setup for basic search needs. The choice between them depends on specific project requirements and scalability needs.
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
Pros of Crate
- Built specifically for distributed SQL databases, offering better scalability for large datasets
- Supports real-time full-text search and geospatial queries out of the box
- Easier setup and maintenance, with a more modern architecture
Cons of Crate
- Less mature and battle-tested compared to Lucene/Solr
- Smaller community and ecosystem, potentially leading to fewer resources and third-party integrations
- May not be as feature-rich for advanced text analysis and information retrieval tasks
Code Comparison
Crate SQL query:
SELECT * FROM users WHERE match(name, 'John') ORDER BY _score DESC LIMIT 10;
Lucene/Solr query:
SolrQuery query = new SolrQuery();
query.setQuery("name:John");
query.setSort("score", SolrQuery.ORDER.desc);
query.setRows(10);
Both projects offer powerful search capabilities, but CrateDB exposes them through SQL, while the Solr example here uses the SolrJ Java client. CrateDB's approach may be more familiar to developers with SQL backgrounds, whereas Lucene/Solr offers more fine-grained control over query construction and execution.
🔎 Open source distributed and RESTful search engine.
Pros of OpenSearch
- More active development and frequent updates
- Broader feature set, including machine learning capabilities
- Better support for distributed systems and cloud environments
Cons of OpenSearch
- Newer project with less established ecosystem
- Potentially steeper learning curve for newcomers
- Some compatibility issues with older Elasticsearch versions
Code Comparison
OpenSearch (query DSL):
{
  "query": {
    "match": {
      "title": "OpenSearch example"
    }
  }
}
Lucene-Solr (Lucene query syntax):
title:"Lucene Solr example"
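Both projects accept similar queries, but OpenSearch expresses them in a JSON-based DSL, while Lucene-Solr commonly uses the more compact Lucene query syntax. The two examples are not equivalent: by default the match query finds documents containing either term, whereas the quoted Lucene query is a phrase query that requires the terms to appear together and in order.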
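The quoted form in the Lucene example is a phrase query. As a rough sketch, a phrase clause can be assembled from user-supplied text as below. The helper is hypothetical, and it relies on the assumption that, inside a quoted phrase, the double quote and the backslash are the characters that need a backslash escape in the classic Lucene query syntax.

```java
/** Hypothetical helper: builds a field:"phrase" clause in Lucene query syntax. */
final class LucenePhraseQuery {

    static String phrase(String field, String text) {
        // Escape backslashes first so the escapes added for quotes are not doubled.
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(phrase("title", "Lucene Solr example"));
    }
}
```

Dropping the surrounding quotes would turn the same text into independent term clauses, which is closer to what the JSON match query does.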
Summary
OpenSearch is a more modern and actively developed project, offering advanced features and better cloud support. However, Lucene-Solr has a longer history and a more established ecosystem. The choice between them depends on specific project requirements, existing infrastructure, and desired features.
The Sphinx documentation generator
Pros of Sphinx
- Lightweight and easy to set up for documentation projects
- Supports multiple output formats (HTML, PDF, ePub)
- Extensible through plugins and extensions
Cons of Sphinx
- Limited to documentation use cases, not a full-text search engine
- Less scalable for large-scale search applications
- Smaller community and ecosystem compared to Lucene/Solr
Code Comparison
Sphinx (Python):
from sphinx.application import Sphinx
app = Sphinx(srcdir, confdir, outdir, doctreedir, buildername)
app.build()
Lucene/Solr (Java):
IndexWriter writer = new IndexWriter(directory, config);
Document doc = new Document();
doc.add(new TextField("content", "Hello, world!", Field.Store.YES));
writer.addDocument(doc);
writer.close();
The code snippets demonstrate the core functionality of each project. Sphinx focuses on building documentation, while Lucene/Solr is designed for indexing and searching text content. Sphinx uses a higher-level API for generating documentation, whereas Lucene/Solr provides low-level indexing capabilities for search applications.
⚡️ A fully-featured and blazing-fast JavaScript API client to interact with Algolia.
Pros of algoliasearch-client-javascript
- Lightweight and easy to integrate into JavaScript projects
- Provides real-time search capabilities with minimal configuration
- Offers built-in analytics and personalization features
Cons of algoliasearch-client-javascript
- Less flexible for complex, custom search implementations
- Requires ongoing subscription for Algolia's hosted service
- Limited control over search infrastructure and data storage
Code Comparison
algoliasearch-client-javascript:
const client = algoliasearch('APP_ID', 'API_KEY');
const index = client.initIndex('your_index_name');
index.search('query').then(({ hits }) => {
  console.log(hits);
});
lucene-solr:
SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
QueryResponse response = client.query("your_core_name", query);
SolrDocumentList results = response.getResults();
The algoliasearch-client-javascript code demonstrates a simpler setup and search execution, while the lucene-solr example shows a more verbose but potentially more customizable approach. Algolia's client is designed for quick integration and real-time search, whereas Lucene-Solr offers more control over the search process and infrastructure.
README
Apache Lucene and Solr have separate repositories now!
Solr has become a top-level Apache project and main line development for Lucene and Solr is happening in each project's git repository now:
- Lucene: https://gitbox.apache.org/repos/asf/lucene.git
- Solr: https://gitbox.apache.org/repos/asf/solr.git
Development for bugfixes of 8.11.x releases remains on branch branch_8_11 in the shared repository:
GitHub forks?
If you are using GitHub, make a clone of the corresponding repository mirror and create your pull requests against the main branch: