Toshi

A full-text search engine in rust

Top Related Projects

  • Elasticsearch: Free and Open, Distributed, RESTful Search Engine
  • Meilisearch: A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
  • Typesense: Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
  • Sonic: 🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
  • Quickwit: Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
  • ZincSearch: A lightweight alternative to Elasticsearch that requires minimal resources, written in Go.

Quick Overview

Toshi is an open-source full-text search engine written in Rust. It aims to provide a fast, reliable, and scalable solution for text search and indexing, with a focus on performance and ease of use.

Pros

  • High performance potential from its Rust implementation on top of Tantivy
  • Elasticsearch-like JSON queries served over HTTP
  • Easy to set up: one binary and a single TOML configuration file
  • Still under active development, according to the README

Cons

  • Far from production ready by the README's own account, and lacking some advanced features of established search engines
  • Documentation could be more comprehensive
  • Limited ecosystem compared to more mature search solutions
  • May require Rust knowledge for advanced customization

Code Examples

The snippets below are illustrative pseudocode for an index, add, and search workflow. The README describes Toshi as a standalone server queried over HTTP rather than a library, so names such as toshi::Index, Document, and Query are not a confirmed API.

  1. Creating an index:

    use toshi::Index;

    let index = Index::create("my_index").unwrap();

  2. Adding documents to the index:

    use toshi::Document;

    let doc = Document::new()
        .add_text("title", "Sample Document")
        .add_text("content", "This is the content of the sample document.");

    index.add_document(doc).unwrap();

  3. Performing a search:

    use toshi::query::Query;

    let query = Query::parse("sample content").unwrap();
    let results = index.search(&query).unwrap();

    for (doc_id, score) in results {
        println!("Document ID: {}, Score: {}", doc_id, score);
    }

Getting Started

To get started with Toshi, follow these steps:

  1. Add Toshi to your Cargo.toml:

    [dependencies]
    toshi = "0.x.x"
    
  2. In your Rust code:

    use toshi::{Index, Document, query::Query};
    
    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let index = Index::create("my_index")?;
    
        let doc = Document::new()
            .add_text("title", "Hello Toshi")
            .add_text("content", "This is a sample document for Toshi search engine.");
    
        index.add_document(doc)?;
    
        let query = Query::parse("sample toshi")?;
        let results = index.search(&query)?;
    
        for (doc_id, score) in results {
            println!("Found document: {}, score: {}", doc_id, score);
        }
    
        Ok(())
    }
    

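This example sketches creating an index, adding a document, and running a search query. The crate API it uses is not confirmed by the README, which instead documents building the server with cargo and querying it over HTTP.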

Competitor Comparisons

Free and Open, Distributed, RESTful Search Engine

Pros of Elasticsearch

  • Mature and widely adopted, with extensive documentation and community support
  • Rich ecosystem of plugins and integrations
  • Highly scalable and distributed architecture

Cons of Elasticsearch

  • Resource-intensive, requiring significant memory and CPU
  • Complex configuration and setup process
  • Steep learning curve for advanced features

Code Comparison

Elasticsearch query:

{
  "query": {
    "match": {
      "title": "search engine"
    }
  }
}

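Toshi query (the closest form documented in the README, a phrase query sent as a JSON body):

{ "query": {"phrase": {"title": {"terms": ["search", "engine"] } } }, "limit": 10 }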

Key Differences

  • Language: Elasticsearch is written in Java, while Toshi is written in Rust
  • Performance: Toshi aims for better performance and lower resource usage
  • Simplicity: Toshi focuses on a more straightforward API and configuration
  • Features: Elasticsearch offers more advanced features and aggregations
  • Indexing: Toshi builds on Tantivy where Elasticsearch builds on Lucene; the README describes Toshi as aiming to be to Elasticsearch what Tantivy is to Lucene

Use Cases

Elasticsearch is well-suited for large-scale enterprise applications with complex search requirements, while Toshi may be a good fit for projects prioritizing performance and simplicity, especially those already using Rust in their stack.

A lightning-fast search API that fits effortlessly into your apps, websites, and workflow

Pros of Meilisearch

  • More mature and widely adopted, with a larger community and ecosystem
  • Offers a user-friendly web interface for easy management and monitoring
  • Provides built-in typo tolerance and language detection features

Cons of Meilisearch

  • Uses its own search API rather than an Elasticsearch-style query DSL, so existing Elasticsearch queries need rewriting
  • Less flexible in terms of customization compared to Toshi's modular architecture

Code Comparison

Meilisearch query example:

let search_results = index.search()
    .with_query("search term")
    .with_limit(20)
    .execute()
    .await?;

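Toshi query example (illustrative pseudocode; the README documents JSON queries over HTTP rather than a Rust client API):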

let query = Query::new("search term".to_string());
let results = index.search(&query).await?;

Both Meilisearch and Toshi are open-source search engines written in Rust, aiming to provide fast and efficient full-text search capabilities. Meilisearch offers a more polished and feature-rich experience out of the box, while Toshi focuses on modularity and extensibility.

Meilisearch is better suited for projects requiring a ready-to-use solution with minimal setup, whereas Toshi may appeal to developers who want more control over the search engine's internals and are willing to invest time in customization.

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

Pros of Typesense

  • More mature and feature-rich, with a larger community and better documentation
  • Offers out-of-the-box support for various data types and advanced search features
  • Provides official client libraries for multiple programming languages

Cons of Typesense

  • Open source under the GPL-3.0 license, whose copyleft terms may not suit every project
  • Keeps the index in memory, so RAM requirements grow with the size of the dataset
  • Steeper learning curve due to more complex configuration options

Code Comparison

Typesense query example:

client.collections['books'].documents.search({
  'q': 'harry potter',
  'query_by': 'title,author',
  'sort_by': 'ratings_count:desc'
})

Toshi query example (illustrative pseudocode; this builder API is not confirmed by the README):

let query = Query::new("harry potter")
    .with_fields(vec!["title", "author"])
    .with_order_by("ratings_count", Order::Desc);
index.search(&query).await?;

Both examples demonstrate similar querying capabilities, with Typesense using a Python client and Toshi using Rust. Typesense's API appears more concise, while Toshi's query builder offers a more explicit structure. The core functionality of searching and sorting is present in both systems, showcasing their comparable basic features despite differences in implementation and language choice.

🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

Pros of Sonic

  • Lightweight and fast, with low memory footprint
  • Simple setup and configuration
  • Supports multiple languages out of the box

Cons of Sonic

  • Limited advanced search features compared to Toshi
  • Less flexible schema and indexing options
  • Smaller community and ecosystem

Code Comparison

Sonic (Rust):

use sonic_channel::*;

let channel = IngestChannel::start("localhost:1491", "SecretPassword").unwrap();
channel.push("collection", "bucket", "object:1", "Hello World!").unwrap();

Toshi (Rust, illustrative pseudocode; these calls are not confirmed by the README):

use toshi_types::{Catalog, Document, IndexHandle};

let mut catalog = Catalog::new("my_index");
let doc = Document::new("1", "Hello World!");
catalog.add_document(doc).unwrap();

Both Sonic and Toshi are search engines written in Rust, but they have different focuses. Sonic aims to be a lightweight, fast search backend with simple setup, while Toshi provides more advanced search capabilities and flexibility.

Sonic excels in scenarios requiring quick deployment and low resource usage, supporting multiple languages out of the box. However, it lacks some of the advanced features and flexibility offered by Toshi, such as complex querying and schema customization.

Toshi, on the other hand, provides more powerful search capabilities and greater control over indexing and querying. It's better suited for applications requiring advanced search features, but may have a steeper learning curve and higher resource requirements compared to Sonic.

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Pros of Quickwit

  • Built on Tantivy, like Toshi, and developed by the team that maintains Tantivy
  • Supports distributed search and horizontal scaling
  • More active development and larger community support

Cons of Quickwit

  • Focused on logs and observability data rather than general-purpose document search
  • Uses its own query language instead of Toshi's Elasticsearch-like JSON queries
  • More components to operate, given its distributed, cloud-native design

Code Comparison

Toshi (Elasticsearch-like query syntax):

{
  "query": {
    "term": { "field": "value" }
  }
}

Quickwit (Custom query syntax):

field:value AND other_field:[1 TO 10]

Both projects aim to provide fast and efficient search capabilities, and both are built on top of Tantivy, but they differ in their approach. Toshi follows an Elasticsearch-like architecture, while Quickwit is designed around distributed search and cloud-native deployments.

Toshi offers a familiar query syntax for those coming from Elasticsearch, which can be an advantage for teams already familiar with that ecosystem. On the other hand, Quickwit's custom query syntax may require some adjustment but offers flexibility and potential performance benefits.

Because both projects are written in Rust on top of Tantivy, the implementation language gives neither an edge; performance differences come from architecture and workload. Toshi's README describes it as far from production ready, which matters for deployments that need stability.

Ultimately, the choice between Toshi and Quickwit depends on specific project requirements, team expertise, and scalability needs.

ZincSearch. A lightweight alternative to Elasticsearch that requires minimal resources, written in Go.

Pros of ZincSearch

  • Written in Go and designed to require minimal resources
  • Simpler setup and configuration process
  • Provides a built-in web UI for easier management and data visualization

Cons of ZincSearch

  • May offer fewer search features than Toshi's Elasticsearch-like query set
  • Smaller community and potentially less support
  • Limited advanced querying capabilities

Code Comparison

Toshi (Rust, illustrative pseudocode; these calls are not confirmed by the README):

let index = Index::create("my_index", IndexOptions::default())?;
index.add_document(doc!{
    "title" => "Example Document",
    "content" => "This is a sample document."
})?;

ZincSearch (Go):

index := zincsearch.NewIndex("my_index")
doc := map[string]interface{}{
    "title":   "Example Document",
    "content": "This is a sample document.",
}
index.AddDocument(doc)

Both projects aim to provide full-text search capabilities, but they differ in implementation language and feature sets. Toshi, built in Rust, offers more advanced features and potentially better performance for complex queries. ZincSearch, written in Go, focuses on simplicity and ease of use, making it more accessible for quick setup and basic search functionality. The choice between the two depends on specific project requirements and the level of complexity needed in search operations.

README

Toshi

A Full-Text Search Engine in Rust

License: MIT. Join the chat at https://gitter.im/toshi-search/Toshi

Please note that this is far from production ready. Toshi is still under active development; I'm just slow.

Description

Toshi is meant to be a full-text search engine similar to Elasticsearch. Toshi strives to be to Elasticsearch what Tantivy is to Lucene.

Motivations

Toshi will always target stable Rust and will try its best never to use unsafe Rust. While underlying libraries may make some use of unsafe, Toshi will make a concerted effort to vet these libraries in an effort to be completely free of unsafe Rust usage. I chose this because I felt that, for Toshi to become an attractive option for people to consider, it would have to be safe, stable, and consistent. Stable Rust was chosen for the guarantees and safety it provides; I did not want to go down the rabbit hole of using nightly features only to have issues with their stability later on. Since Toshi is not meant to be a library, I'm perfectly fine with this requirement, because people who want to use it will more than likely take it off the shelf and not modify it. My motivation was to cater to that use case when building Toshi.

Build Requirements

At this time Toshi should build and work fine on Windows, Mac OS X, and Linux. Because of dependency requirements, you will need Rust 1.39.0 and Cargo installed in order to build. You can get Rust easily from rustup.

Configuration

There is a default configuration file in config/config.toml:

host = "127.0.0.1"
port = 8080
path = "data2/"
writer_memory = 200000000
log_level = "info"
json_parsing_threads = 4
bulk_buffer_size = 10000
auto_commit_duration = 10
experimental = false

[experimental_features]
master = true
nodes = [
    "127.0.0.1:8081"
]

[merge_policy]
kind = "log"
min_merge_size = 8
min_layer_size = 10_000
level_log_size = 0.75

Host

host = "localhost"

The hostname Toshi will bind to upon start.

Port

port = 8080

The port Toshi will bind to upon start.
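
As a rough illustration of how the host and port settings could combine into a single bind address, here is a minimal Rust sketch. The helper name bind_addr is hypothetical and is not taken from Toshi's source; it only shows the standard library's address resolution.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical helper (not Toshi's API): resolve `host` and `port`
/// into the first socket address the system reports.
fn bind_addr(host: &str, port: u16) -> io::Result<SocketAddr> {
    (host, port)
        .to_socket_addrs()?
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no address resolved"))
}
```

A literal IP such as 127.0.0.1 is parsed directly, while a name such as localhost goes through the system resolver and may yield an IPv4 or IPv6 address.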

Path

path = "data/"

The data path where Toshi will store its data and indices.

Writer Memory

writer_memory = 200000000

The amount of memory (in bytes) Toshi should allocate to commits for new documents.

Log Level

log_level = "info"

The detail level to use for Toshi's logging.

Json Parsing

json_parsing_threads = 4

When Toshi does a bulk ingest of documents it will spin up a number of threads to parse the document's json as it's received. This controls the number of threads spawned to handle this job.
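
The idea of a fixed number of parsing threads can be sketched with the standard library alone. This is not Toshi's implementation: parse_bulk and looks_like_json_object are hypothetical names, and the brace check is only a stand-in for real JSON parsing.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Stand-in for real JSON parsing: accepts anything shaped like `{...}`.
/// This is an assumption for illustration, not Toshi's parser.
fn looks_like_json_object(line: &str) -> bool {
    let t = line.trim();
    t.len() >= 2 && t.starts_with('{') && t.ends_with('}')
}

/// Spawn `threads` workers (at least one) that pull lines from a shared
/// queue, and return how many lines passed the stand-in parse step.
fn parse_bulk(lines: Vec<String>, threads: usize) -> usize {
    let (tx, rx) = mpsc::channel::<String>();
    let rx = Arc::new(Mutex::new(rx));

    let workers: Vec<thread::JoinHandle<usize>> = (0..threads.max(1))
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut parsed = 0;
                loop {
                    // The lock is released at the end of this statement.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(line) => {
                            if looks_like_json_object(&line) {
                                parsed += 1;
                            }
                        }
                        // The sender was dropped and the queue is empty.
                        Err(_) => break,
                    }
                }
                parsed
            })
        })
        .collect();

    for line in lines {
        tx.send(line).unwrap();
    }
    drop(tx); // Signal the workers that no more lines are coming.

    workers.into_iter().map(|w| w.join().unwrap()).sum()
}
```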

Bulk Buffer

bulk_buffer_size = 10000

This will control the buffer size for parsing documents into an index. It will control the amount of memory a bulk ingest will take up by blocking when the message buffer is filled. If you want to go totally off the rails you can set this to 0 in order to make the buffer unbounded.
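
The difference between a bounded and an unbounded buffer can be shown with Rust's standard channels. The sketch below is an assumption about how such a setting might be wired, not Toshi's code; BulkSender and bulk_channel are hypothetical names. Note that 0 has to be mapped to an unbounded channel explicitly, because sync_channel(0) would create a rendezvous channel instead.

```rust
use std::sync::mpsc::{self, Receiver, SendError};

/// Sending half of either a bounded or an unbounded queue.
enum BulkSender<T> {
    Bounded(mpsc::SyncSender<T>),
    Unbounded(mpsc::Sender<T>),
}

impl<T> BulkSender<T> {
    /// Blocks while a bounded buffer is full; never blocks when unbounded.
    fn send(&self, item: T) -> Result<(), SendError<T>> {
        match self {
            BulkSender::Bounded(tx) => tx.send(item),
            BulkSender::Unbounded(tx) => tx.send(item),
        }
    }

    /// Non-blocking variant: returns false if a bounded buffer is full
    /// or the receiver has gone away.
    fn try_send(&self, item: T) -> bool {
        match self {
            BulkSender::Bounded(tx) => tx.try_send(item).is_ok(),
            BulkSender::Unbounded(tx) => tx.send(item).is_ok(),
        }
    }
}

/// Assumed mapping of `bulk_buffer_size`: 0 means unbounded, as the text says.
fn bulk_channel<T>(bulk_buffer_size: usize) -> (BulkSender<T>, Receiver<T>) {
    if bulk_buffer_size == 0 {
        let (tx, rx) = mpsc::channel();
        (BulkSender::Unbounded(tx), rx)
    } else {
        let (tx, rx) = mpsc::sync_channel(bulk_buffer_size);
        (BulkSender::Bounded(tx), rx)
    }
}
```

With a bounded channel, a slow consumer applies back-pressure to producers, which is what caps memory use during a bulk ingest.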

Auto Commit Duration

auto_commit_duration = 10

This controls how often an index will automatically commit documents if there are docs to be committed. Set this to 0 to disable this feature, but you will have to do commits yourself when you submit documents.
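
The commit rule described above can be written as a small decision function. This is a sketch under two assumptions that the text does not confirm: the duration is measured in seconds, and the check runs periodically. The name should_auto_commit is hypothetical.

```rust
use std::time::Duration;

/// Decide whether an index should auto-commit now.
/// Assumptions: `auto_commit_duration` is in seconds, and 0 disables
/// auto-commit entirely (commits must then be requested explicitly).
fn should_auto_commit(
    auto_commit_duration: u64,
    pending_docs: usize,
    since_last_commit: Duration,
) -> bool {
    // Nothing to do when the feature is off or no documents are waiting.
    if auto_commit_duration == 0 || pending_docs == 0 {
        return false;
    }
    since_last_commit >= Duration::from_secs(auto_commit_duration)
}
```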

Merge Policy

[merge_policy]
kind = "log"

Tantivy will merge index segments according to the configuration outlined here. There are two options. The first is "log", the default segment merge behavior, which takes three additional values. Any of the three can be omitted to use Tantivy's default; the default values are listed below.

min_merge_size = 8
min_layer_size = 10_000
level_log_size = 0.75

In addition there is the "nomerge" option, in which Tantivy will do no merging of segments.
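
One way to model these two options and their defaults in Rust is an enum with a fallback for each omitted value. The type and function names below are hypothetical, and the field types are guesses; only the option names and default values come from the text above.

```rust
/// Sketch of the two merge policy options described above.
#[derive(Debug, Clone, PartialEq)]
enum MergePolicy {
    Log {
        min_merge_size: usize,
        min_layer_size: u32,
        level_log_size: f32,
    },
    NoMerge,
}

impl MergePolicy {
    /// Build a policy from optional config values, filling in the
    /// documented defaults for anything that was omitted.
    /// Returns None for an unrecognized `kind`.
    fn from_config(
        kind: &str,
        min_merge_size: Option<usize>,
        min_layer_size: Option<u32>,
        level_log_size: Option<f32>,
    ) -> Option<MergePolicy> {
        match kind {
            "log" => Some(MergePolicy::Log {
                min_merge_size: min_merge_size.unwrap_or(8),
                min_layer_size: min_layer_size.unwrap_or(10_000),
                level_log_size: level_log_size.unwrap_or(0.75),
            }),
            "nomerge" => Some(MergePolicy::NoMerge),
            _ => None,
        }
    }
}
```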

Experimental Settings

experimental = false

[experimental_features]
master = true
nodes = [
    "127.0.0.1:8081"
]

In general these settings aren't ready for usage yet as they are very unstable or flat out broken. Right now the distribution of Toshi is behind this flag, so if experimental is set to false then all these settings are ignored.
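
The gating behavior can be pictured as follows. This is a sketch with hypothetical struct and method names that mirror the setting names above; it is not how Toshi itself reads its configuration.

```rust
/// Hypothetical mirror of the [experimental_features] table.
struct ExperimentalFeatures {
    master: bool,
    nodes: Vec<String>,
}

/// Hypothetical mirror of the relevant top-level settings.
struct Settings {
    experimental: bool,
    experimental_features: ExperimentalFeatures,
}

impl Settings {
    /// Cluster nodes to use; always empty unless `experimental` is true.
    fn active_nodes(&self) -> &[String] {
        if self.experimental {
            self.experimental_features.nodes.as_slice()
        } else {
            &[]
        }
    }

    /// The master flag only counts when `experimental` is true.
    fn is_master(&self) -> bool {
        self.experimental && self.experimental_features.master
    }
}
```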

Building and Running

Toshi can be built using cargo build --release. Once Toshi is built, you can run ./target/release/toshi from the top level directory to start Toshi according to the configuration in config/config.toml.

You should get a startup message like this.

  ______         __   _   ____                 __
 /_  __/__  ___ / /  (_) / __/__ ___ _________/ /
  / / / _ \(_-</ _ \/ / _\ \/ -_) _ `/ __/ __/ _ \
 /_/  \___/___/_//_/_/ /___/\__/\_,_/_/  \__/_//_/
 Such Relevance, Much Index, Many Search, Wow
 
 INFO  toshi::index > Indexes: []

You can verify Toshi is running with:

curl -X GET http://localhost:8080/

which should return:

{
  "name": "Toshi Search",
  "version": "0.1.1"
}

Once Toshi is running, check the requests.http file in the root of this project for more usage examples.

Example Queries

Term Query

{ "query": {"term": {"test_text": "document" } }, "limit": 10 }

Fuzzy Term Query

{ "query": {"fuzzy": {"test_text": {"value": "document", "distance": 0, "transposition": false } } }, "limit": 10 }

Phrase Query

{ "query": {"phrase": {"test_text": {"terms": ["test","document"] } } }, "limit": 10 }

Range Query

{ "query": {"range": { "test_i64": { "gte": 2012, "lte": 2015 } } }, "limit": 10 }

Regex Query

{ "query": {"regex": { "test_text": "d[ou]{1}c[k]?ument" } }, "limit": 10 }

Boolean Query

{ "query": {"bool": {"must": [ { "term": { "test_text": "document" } } ], "must_not": [ {"range": {"test_i64": { "gt": 2017 } } } ] } }, "limit": 10 }

Usage

To try any of the above queries, send it as the JSON body of a POST request to the index, for example:

curl -X POST http://localhost:8080/test_index -H 'Content-Type: application/json' -d '{ "query": {"term": {"test_text": "document" } }, "limit": 10 }'

Also, to note, limit is optional, 10 is the default value. It's only included here for completeness.
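
For readers who would rather issue the same request from Rust than from curl, here is a minimal sketch using only the standard library. The function names are hypothetical, the query builder does no JSON escaping, and the raw HTTP handling is deliberately simple; a real client would normally use an HTTP library.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a term query body in the JSON shape shown above.
/// No escaping is done, so `field` and `value` must not contain
/// quotes or backslashes.
fn term_query(field: &str, value: &str, limit: usize) -> String {
    format!(
        r#"{{ "query": {{"term": {{"{}": "{}" }} }}, "limit": {} }}"#,
        field, value, limit
    )
}

/// Wrap a JSON body in a plain HTTP/1.1 POST request for `index`.
fn post_request(host: &str, port: u16, index: &str, body: &str) -> String {
    format!(
        "POST /{index} HTTP/1.1\r\nHost: {host}:{port}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        index = index,
        host = host,
        port = port,
        len = body.len(),
        body = body
    )
}

/// Send the request and return the raw response text, headers included.
/// Requires a server listening on `host:port`.
fn send(host: &str, port: u16, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    // `Connection: close` lets us read until the server closes the socket.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling send("localhost", 8080, &post_request("localhost", 8080, "test_index", &term_query("test_text", "document", 10))) would issue the same request as the curl command above, assuming Toshi is running with the default configuration.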

Running Tests

cargo test

What is a Toshi?

Toshi is a three year old Shiba Inu. He is a very good boy and is the official mascot of this project. Toshi personally reviews all code before it is committed to this repository and is dedicated to only accepting the highest quality contributions from his human. He will, though, accept treats for easier code reviews.