
vespa-engine/vespa

AI + Data, online. https://vespa.ai

Top Related Projects

  • Elasticsearch: Free and Open, Distributed, RESTful Search Engine
  • Lucene-Solr: Apache Lucene and Solr open-source search software
  • algoliasearch-client-javascript: ⚡️ A fully-featured and blazing-fast JavaScript API client to interact with Algolia.
  • OpenSearch: 🔎 Open source distributed and RESTful search engine.
  • Meilisearch: A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
  • Typesense: Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

Quick Overview

Vespa is an open-source big data processing and serving engine. It provides real-time, scalable computation and selection of data, allowing for low-latency ranking and organization of large datasets. Vespa is designed to handle complex queries and machine-learned models in production environments.
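Besides its Java APIs, any Vespa application can be queried over HTTP through the /search/ endpoint. A minimal Python sketch of building such a request URL, assuming the default local endpoint http://localhost:8080 and an illustrative YQL string:

```python
from urllib.parse import urlencode

def build_vespa_query_url(endpoint: str, yql: str, hits: int = 10) -> str:
    """Build a URL for Vespa's HTTP query API (GET /search/)."""
    params = urlencode({"yql": yql, "hits": hits})
    return f"{endpoint}/search/?{params}"

url = build_vespa_query_url(
    "http://localhost:8080",
    'select * from sources * where title contains "hello"',
)
print(url)
```

Other query parameters (ranking profile, timeout, and so on) are passed the same way; see the Vespa query API documentation for the full set.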

Pros

  • Highly scalable and performant for large-scale data processing and serving
  • Supports real-time updates and queries on big data
  • Flexible query language and ranking framework
  • Integrates machine learning models seamlessly

Cons

  • Steep learning curve for beginners
  • Complex setup and configuration process
  • Limited community support compared to some other big data technologies
  • Requires significant resources for optimal performance

Code Examples

  1. Creating a simple document (using the document Java API; the type manager, namespace, and field values here are illustrative):

Document document = new Document(
        manager.getDocumentType("mydocument"),
        new DocumentId("id:mynamespace:mydocument::1"));
document.setFieldValue("title", new StringFieldValue("Hello, Vespa!"));
document.setFieldValue("body", new StringFieldValue("This is a sample document."));

  2. Performing a simple query (inside a Searcher component):

Query query = new Query("/search/?query=hello");
Result result = execution.search(query);
for (Hit hit : result.hits().asList()) {
    System.out.println(hit.getField("title"));
}

  3. Defining a custom ranking profile (in the schema definition):

rank-profile my_ranking_profile {
    first-phase {
        expression: nativeRank(title) + 0.1 * attribute(popularity)
    }
}
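To see what the first-phase expression nativeRank(title) + 0.1 * attribute(popularity) computes, here is a toy Python evaluation over two documents; the signal values are made up for illustration, not produced by Vespa:

```python
def first_phase_score(native_rank_title: float, popularity: float) -> float:
    """Toy version of: nativeRank(title) + 0.1 * attribute(popularity)."""
    return native_rank_title + 0.1 * popularity

docs = [
    {"id": 1, "native_rank_title": 0.8, "popularity": 3.0},
    {"id": 2, "native_rank_title": 0.5, "popularity": 9.0},
]
ranked = sorted(
    docs,
    key=lambda d: first_phase_score(d["native_rank_title"], d["popularity"]),
    reverse=True,
)
print([d["id"] for d in ranked])  # doc 2 (score 1.4) ranks above doc 1 (score 1.1)
```

In a real application the weight on popularity would be tuned, or learned, per use case.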

Getting Started

  1. Install Docker and Docker Compose
  2. Clone the Vespa sample apps repository:
    git clone https://github.com/vespa-engine/sample-apps.git
    
  3. Navigate to the basic search app:
    cd sample-apps/basic-search
    
  4. Build and start the Docker container:
    docker-compose up -d
    
  5. Wait for the application to start, then feed and search data:
    ./feed_and_search.py
    

This will set up a basic Vespa application with sample data and demonstrate simple search functionality.
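Under the hood, feeding typically goes through Vespa's /document/v1 REST API. A small sketch of how its document paths are composed (the namespace, document type, and id here are illustrative):

```python
def document_v1_path(namespace: str, doctype: str, docid: str) -> str:
    """Path for Vespa's /document/v1 API: PUT feeds, GET reads, DELETE removes."""
    return f"/document/v1/{namespace}/{doctype}/docid/{docid}"

print(document_v1_path("mynamespace", "basic", "doc1"))
```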

Competitor Comparisons

Free and Open, Distributed, RESTful Search Engine

Pros of Elasticsearch

  • More mature and widely adopted, with a larger community and ecosystem
  • Extensive documentation and learning resources available
  • Powerful full-text search capabilities out of the box

Cons of Elasticsearch

  • Can be resource-intensive and may require significant hardware for large-scale deployments
  • Complex configuration and tuning process for optimal performance
  • Limited support for real-time updates and streaming data

Code Comparison

Elasticsearch query:

{
  "query": {
    "match": {
      "title": "search example"
    }
  }
}

Vespa query:

yql: select * from sources * where title contains "search example"

Both Elasticsearch and Vespa are powerful search and analytics engines, but they have different strengths and use cases. Elasticsearch excels in full-text search and log analysis, while Vespa is designed for more complex, real-time applications with advanced ranking and machine learning capabilities.

Vespa offers better support for real-time updates and streaming data, making it more suitable for applications that require frequent data changes and immediate visibility. It also provides more flexibility in terms of ranking and relevance tuning.

Elasticsearch, on the other hand, has a larger ecosystem and more third-party integrations, making it easier to find solutions for common use cases and extend its functionality.

Apache Lucene and Solr open-source search software

Pros of Lucene-Solr

  • More mature and widely adopted in the industry
  • Extensive documentation and community support
  • Flexible and customizable for various use cases

Cons of Lucene-Solr

  • Can be complex to set up and configure
  • May require more resources for large-scale deployments
  • Less integrated machine learning capabilities compared to Vespa

Code Comparison

Lucene-Solr (Java):

IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
Document doc = new Document();
doc.add(new TextField("title", "My Document", Field.Store.YES));
writer.addDocument(doc);
writer.close();

Vespa (Java):

DocumentProcessor processor = new DocumentProcessor() {
    @Override
    public Progress process(Processing processing) {
        for (DocumentOperation op : processing.getDocumentOperations()) {
            if (op instanceof DocumentPut) {
                Document doc = ((DocumentPut) op).getDocument();
                doc.setFieldValue("title", new StringFieldValue("My Document"));
            }
        }
        return Progress.DONE;
    }
};

Both repositories offer powerful search and indexing capabilities, but Vespa provides a more integrated platform for real-time big data serving and processing, while Lucene-Solr focuses on core search functionality with extensive customization options.

⚡️ A fully-featured and blazing-fast JavaScript API client to interact with Algolia.

Pros of algoliasearch-client-javascript

  • Lightweight and focused on search functionality
  • Easy integration with JavaScript and TypeScript projects
  • Extensive documentation and examples for quick implementation

Cons of algoliasearch-client-javascript

  • Limited to search and indexing operations
  • Requires external hosting and management of search infrastructure
  • Less flexibility for custom search algorithms and ranking models

Code Comparison

algoliasearch-client-javascript:

const client = algoliasearch('APP_ID', 'API_KEY');
const index = client.initIndex('your_index_name');
index.search('query').then(({ hits }) => {
  console.log(hits);
});

Vespa:

// Inside a Searcher component:
String yql = "select * from sources * where userQuery()";
Query query = new Query("/search/?yql=" + URLEncoder.encode(yql, StandardCharsets.UTF_8));
Result result = execution.search(query);
System.out.println(result.hits());

The algoliasearch-client-javascript code demonstrates a simple search operation using the Algolia client, while the Vespa code shows a basic search request using Vespa's Java API. Algolia's client is more concise and JavaScript-friendly, whereas Vespa offers more control over the search process and can be integrated into larger Java applications.

🔎 Open source distributed and RESTful search engine.

Pros of OpenSearch

  • More extensive documentation and community support
  • Broader ecosystem with plugins and integrations
  • Better suited for large-scale distributed search and analytics

Cons of OpenSearch

  • Higher resource requirements and complexity
  • Steeper learning curve for beginners
  • Less flexible for custom application-specific search solutions

Code Comparison

OpenSearch query example:

GET /my-index/_search
{
  "query": {
    "match": {
      "title": "search example"
    }
  }
}

Vespa query example:

yql: select * from sources * where title contains "search example";

OpenSearch focuses on JSON-based queries, while Vespa uses a SQL-like syntax called YQL. Vespa's approach can be more intuitive for developers familiar with SQL, while OpenSearch's JSON structure aligns well with modern web development practices.
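For the simple single-field case above, the mapping between the two query styles is mechanical. A sketch of translating one Elasticsearch/OpenSearch match query into YQL (illustrative only; it handles just this one query shape):

```python
import json

def match_query_to_yql(es_query_json: str, source: str = "sources *") -> str:
    """Translate a single-field `match` query into a YQL `contains` clause."""
    query = json.loads(es_query_json)
    (field, text), = query["query"]["match"].items()
    return f'select * from {source} where {field} contains "{text}"'

print(match_query_to_yql('{"query": {"match": {"title": "search example"}}}'))
```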

Both systems offer powerful search capabilities, but Vespa provides more flexibility for custom ranking and real-time big data applications. OpenSearch, being a fork of Elasticsearch, benefits from a larger ecosystem and is often preferred for general-purpose search and analytics use cases.

A lightning-fast search API that fits effortlessly into your apps, websites, and workflow

Pros of Meilisearch

  • Easier to set up and use, with a focus on simplicity and developer experience
  • Faster indexing and search performance for smaller datasets
  • Built-in typo tolerance and relevancy ranking out of the box

Cons of Meilisearch

  • Limited scalability for very large datasets compared to Vespa
  • Fewer advanced features and customization options
  • Less support for complex query types and distributed search

Code Comparison

Meilisearch query example:

const search = await client.index('movies').search('batman', {
  limit: 10,
  attributesToRetrieve: ['title', 'year']
});

Vespa query example:

String yql = "select * from movies where title contains 'batman'";
Query query = new Query("/search/?yql=" + URLEncoder.encode(yql, StandardCharsets.UTF_8));
query.setHits(10);
query.getRanking().setProfile("default");
Result result = execution.search(query);

Both examples demonstrate basic search functionality, but Vespa's query syntax is more SQL-like and offers more advanced options for ranking and filtering. Meilisearch's API is simpler and more intuitive for basic use cases, aligning with its focus on ease of use.
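The built-in typo tolerance credited to Meilisearch above is, at its core, edit-distance matching. A minimal Python sketch of the idea (real engines use heavily optimized variants of this):

```python
def edit_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def typo_tolerant_match(query: str, term: str, max_typos: int = 1) -> bool:
    return edit_distance(query, term) <= max_typos

print(typo_tolerant_match("batmann", "batman"))  # True: one extra letter
```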

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

Pros of Typesense

  • Simpler setup and configuration, making it easier for beginners
  • Faster indexing and search performance for smaller datasets
  • Built-in typo tolerance and fuzzy search capabilities

Cons of Typesense

  • Limited scalability for very large datasets compared to Vespa
  • Fewer advanced features and customization options
  • Less support for complex query types and machine learning integrations

Code Comparison

Typesense query example:

client.collections('books').documents().search({
  q: 'harry potter',
  query_by: 'title,author',
  sort_by: 'ratings_count:desc'
})

Vespa query example:

String yql = "select * from sources * where title contains 'harry potter'"
        + " or author contains 'harry potter' order by ratings_count desc";
Query query = new Query("/search/?yql=" + URLEncoder.encode(yql, StandardCharsets.UTF_8));
Result result = execution.search(query);

Both Typesense and Vespa are powerful search engines, but they cater to different use cases. Typesense is more suitable for smaller to medium-sized applications that require quick setup and straightforward search functionality. Vespa, on the other hand, excels in large-scale, complex search and recommendation systems with advanced machine learning capabilities. The choice between the two depends on the specific requirements of your project, including scalability needs, complexity of search queries, and desired level of customization.

README

Vespa

Search, make inferences in, and organize vectors, tensors, text and structured data, at serving time and any scale.

This repository contains all the code required to build and run all of Vespa yourself, and is where you can follow all development as it happens. All the content in this repository is licensed under the Apache 2.0 license.

A new release of Vespa is made from this repository's master branch every morning CET Monday through Thursday.

Background

Use cases such as search, recommendation and personalization need to select a subset of data in a large corpus, evaluate machine-learned models over the selected data, organize and aggregate it and return it, typically in less than 100 milliseconds, all while the data corpus is continuously changing.

This is hard to do, especially with large data sets that need to be distributed over multiple nodes and evaluated in parallel. Vespa is a platform that performs these operations for you with high availability and performance. It has been in development for many years and is used in a number of large internet services and apps which serve hundreds of thousands of queries from Vespa per second.
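The serving pattern described above (select a subset of the corpus, evaluate a model over it, organize and return the best results) can be sketched in a few lines of Python; the corpus, category filter, and scoring function are all made up for illustration, and real Vespa distributes each step across nodes:

```python
corpus = [
    {"id": i, "category": "news" if i % 2 else "sports", "score_signal": i * 0.1}
    for i in range(10)
]

def serve(query_category: str, model, top_k: int = 3):
    selected = [d for d in corpus if d["category"] == query_category]  # selection
    ranked = sorted(selected, key=model, reverse=True)                 # model evaluation
    return ranked[:top_k]                                              # organize and return

top = serve("news", model=lambda d: d["score_signal"])
print([d["id"] for d in top])  # [9, 7, 5]
```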

Install

Deploy your Vespa applications to the cloud service: https://cloud.vespa.ai, or run your own Vespa instance: https://docs.vespa.ai/en/getting-started.html

Usage

  • The applications created in the getting-started guides linked above are fully functional and production-ready, but you may want to add more nodes for redundancy.
  • See developing applications for how to add your own Java components to your Vespa application.
  • The Vespa APIs reference is useful for understanding how to interface with Vespa.
  • Explore the sample applications.
  • Follow the Vespa Blog for feature updates and use cases.

Full documentation is at https://docs.vespa.ai.

Contribute

We welcome contributions! See CONTRIBUTING.md to learn how to contribute.

If you want to contribute to the documentation, see https://github.com/vespa-engine/documentation

Building

You do not need to build Vespa to use it, but if you want to contribute you need to be able to build the code. This section explains how to build and test Vespa. To understand where to make changes, see Code-map.md. Some suggested improvements with pointers to code are in TODO.md.

Development environment

C++ and Java building is supported on AlmaLinux 8. The Java source can also be built on any platform having Java 17 and Maven installed. Use the following guide to set up a complete development environment using Docker for building Vespa, running unit tests and running system tests: Vespa development on AlmaLinux 8.

Build Java modules

export MAVEN_OPTS="-Xms128m -Xmx1024m"
./bootstrap.sh java
mvn install --threads 1C

Use this if you only need to build the Java modules, otherwise follow the complete development guide above.

License

Code licensed under the Apache 2.0 license. See LICENSE for terms.