
nmslib

Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.


Top Related Projects

  • FAISS: A library for efficient similarity search and clustering of dense vectors.
  • Annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.
  • UMAP: Uniform Manifold Approximation and Projection.
  • ann-benchmarks: Benchmarks of approximate nearest neighbor libraries in Python.
  • Milvus: A high-performance, cloud-native vector database built for scalable vector ANN search.
  • google-research: A collection of research code released by Google Research teams.

Quick Overview

NMSLIB (Non-Metric Space Library) is a powerful and efficient similarity search library for both metric and non-metric spaces. It provides a comprehensive set of algorithms for approximate nearest neighbor search, making it suitable for various applications such as information retrieval, machine learning, and data mining.

Pros

  • High performance and scalability for large datasets
  • Supports both metric and non-metric spaces
  • Offers a wide range of algorithms and index types
  • Provides bindings for multiple programming languages (C++, Python, Java)

Cons

  • Steeper learning curve compared to some simpler libraries
  • Documentation can be sparse or outdated in some areas
  • Limited support for distributed computing scenarios
  • Some advanced features may require in-depth understanding of the underlying algorithms

Code Examples

  1. Creating an index and adding data:

```python
import nmslib

# Create an HNSW index over the cosine-similarity space
index = nmslib.init(method='hnsw', space='cosinesimil')

# Add data to the index
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
index.addDataPointBatch(data)

# Build the index
index.createIndex({'post': 2}, print_progress=True)
```

  2. Performing a k-nearest neighbor search:

```python
# Perform a k-NN search
query = [2.0, 3.0, 4.0]
k = 2
ids, distances = index.knnQuery(query, k=k)

print(f"Nearest neighbors for query {query}:")
for i, d in zip(ids, distances):
    print(f"ID: {i}, Distance: {d}")
```

  3. Saving and loading an index:

```python
# Save the index together with the data points (NMSLIB 2.x)
index.saveIndex('my_index.bin', save_data=True)

# Load the index from a file
loaded_index = nmslib.init(method='hnsw', space='cosinesimil')
loaded_index.loadIndex('my_index.bin', load_data=True)
```

Getting Started

To get started with NMSLIB, follow these steps:

  1. Install NMSLIB using pip:

    pip install nmslib
    
  2. Import the library and create an index:

    import nmslib
    index = nmslib.init(method='hnsw', space='l2')
    
  3. Add data to the index:

    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    index.addDataPointBatch(data)
    
  4. Build the index and perform a search:

    index.createIndex({'post': 2})
    query = [2.0, 3.0, 4.0]
    ids, distances = index.knnQuery(query, k=2)
    

For more detailed information and advanced usage, refer to the NMSLIB documentation and examples in the GitHub repository.

Competitor Comparisons

FAISS

A library for efficient similarity search and clustering of dense vectors.

Pros of FAISS

  • Better GPU support and optimization for large-scale similarity search
  • More comprehensive documentation and examples
  • Stronger integration with PyTorch and other deep learning frameworks

Cons of FAISS

  • Steeper learning curve, especially for beginners
  • Less flexibility in terms of index types and distance metrics compared to NMSLIB

Code Comparison

NMSLIB example:

```python
import nmslib
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2}, print_progress=True)
```

FAISS example:

```python
import faiss
index = faiss.IndexFlatL2(dimension)  # exact brute-force L2 index
index.add(data)
```

Both libraries offer efficient similarity search capabilities, but FAISS excels at GPU-accelerated operations and integration with deep learning workflows, while NMSLIB provides more flexibility in index types and distance metrics, making it suitable for a wider range of applications. FAISS has a steeper learning curve but more comprehensive documentation. Note that the snippets are not strictly equivalent: the NMSLIB example builds an approximate HNSW index, whereas IndexFlatL2 is an exact brute-force index, and FAISS's approximate indexes require additional configuration.

Annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Pros of Annoy

  • Simpler to use and integrate, especially for Python projects
  • Optimized for memory usage, allowing larger datasets on limited hardware
  • Supports incremental index building, useful for dynamic datasets

Cons of Annoy

  • Generally slower search performance compared to NMSLIB
  • Supports only a small set of distance metrics (angular, Euclidean, Manhattan, Hamming, dot product)
  • Less flexibility in terms of index construction parameters

Code Comparison

Annoy:

```python
from random import random
from annoy import AnnoyIndex

f = 40  # Length of item vector
t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random() for _ in range(f)]
    t.add_item(i, v)
t.build(10)  # 10 trees
```

NMSLIB:

```python
from random import random
import nmslib

data = [[random() for _ in range(40)] for _ in range(1000)]
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2}, print_progress=True)
```

Both libraries offer efficient approximate nearest neighbor search capabilities, but NMSLIB generally provides better performance and more flexibility at the cost of increased complexity. Annoy is easier to use and more memory-efficient, making it a good choice for simpler projects or resource-constrained environments.

UMAP

Uniform Manifold Approximation and Projection

Pros of UMAP

  • Focuses on dimensionality reduction and visualization
  • Better preserves global structure in low-dimensional embeddings
  • Faster runtime for large datasets compared to t-SNE

Cons of UMAP

  • Designed for dimensionality reduction, not for serving nearest neighbor queries
  • Its internal nearest-neighbor computation is approximate and is not exposed as a general-purpose search index

Code Comparison

UMAP example:

```python
import umap
reducer = umap.UMAP()
embedding = reducer.fit_transform(data)
```

NMSLIB example:

```python
import nmslib
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2})
```

Key Differences

  • UMAP is primarily for dimensionality reduction and visualization
  • NMSLIB is a more general-purpose library for approximate nearest neighbor search
  • UMAP offers better global structure preservation in low dimensions
  • NMSLIB provides more flexibility in index types and distance metrics

Use Cases

  • UMAP: Ideal for visualizing high-dimensional data and preserving global structure
  • NMSLIB: Better suited for large-scale approximate nearest neighbor search tasks (it also provides exact brute-force search for evaluation purposes)

Both libraries have their strengths, and the choice between them depends on the specific requirements of your project.

ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python

Pros of ann-benchmarks

  • Comprehensive benchmarking suite for various ANN algorithms
  • Regularly updated with new algorithms and datasets
  • Provides visualizations and easy-to-interpret results

Cons of ann-benchmarks

  • Focuses on benchmarking rather than providing a ready-to-use library
  • May not include all the latest optimizations for each algorithm
  • Results can be sensitive to hardware and configuration differences

Code comparison

ann-benchmarks (the suite is driven from the command line rather than imported as a library; the flags below follow the project's README, though exact algorithm and dataset names vary by version):

```shell
git clone https://github.com/erikbern/ann-benchmarks.git
cd ann-benchmarks
pip install -r requirements.txt
python run.py --dataset glove-100-angular --algorithm bruteforce
```

nmslib (example of using the library):

```python
import nmslib

index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2})
ids, distances = index.knnQuery(query, k=10)
```

While ann-benchmarks provides a framework for comparing different ANN algorithms, nmslib is a specific implementation of efficient similarity search algorithms. ann-benchmarks is useful for researchers and developers choosing the best algorithm for their needs, while nmslib is more suitable for direct integration into applications requiring fast nearest neighbor search.

Milvus

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Pros of Milvus

  • Designed for cloud-native environments with distributed architecture
  • Supports multiple index types and similarity metrics
  • Offers advanced features like data management and access control

Cons of Milvus

  • Higher complexity and resource requirements for setup and maintenance
  • Steeper learning curve for beginners
  • May be overkill for smaller-scale applications

Code Comparison

Milvus (Python client):

```python
from pymilvus import Collection, connections

connections.connect()  # connect to a running Milvus server
collection = Collection("example")
collection.insert(entities)  # entities: field values matching the collection schema
results = collection.search(query_vectors, "vector_field", param, limit=10)
```

NMSLIB:

```python
import nmslib

index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2})
ids, distances = index.knnQuery(query, k=10)
```

Summary

Milvus is a more comprehensive solution for large-scale vector similarity search, offering cloud-native features and advanced functionality. NMSLIB is simpler and more lightweight, making it easier to integrate into existing projects but with fewer enterprise-level features. Choose Milvus for scalable, distributed applications, and NMSLIB for simpler, single-machine setups or when ease of integration is a priority.

Google Research

Pros of google-research

  • Broader scope, covering various research areas in AI and machine learning
  • Regularly updated with new projects and papers from Google's research teams
  • Provides implementations of cutting-edge algorithms and techniques

Cons of google-research

  • Less focused on a specific problem domain, making it harder to find relevant code
  • May contain experimental or incomplete implementations
  • Documentation can be sparse for some projects

Code comparison

nmslib (via the Python bindings):

```python
import nmslib

# Load a previously built index
index = nmslib.init(method='hnsw', space='cosinesimil')
index.loadIndex('index.bin')
```

google-research:

```python
import tensorflow as tf
from bert import modeling

bert_config = modeling.BertConfig.from_json_file("bert_config.json")
model = modeling.BertModel(config=bert_config, is_training=True)
```

Summary

nmslib is a specialized library for approximate nearest neighbor search, while google-research is a collection of diverse research projects. nmslib offers a more focused solution for similarity search tasks, with better documentation and stability. google-research provides access to a wide range of cutting-edge research implementations but may require more effort to navigate and use effectively.


README


Non-Metric Space Library (NMSLIB)

Important Notes

  • NMSLIB is generic but fast, see the results of ANN benchmarks.
  • A standalone implementation of our fastest method HNSW also exists as a header-only library.
  • All the documentation (including using Python bindings and the query server, description of methods and spaces, building the library, etc) can be found on this page.
  • For generic questions/inquiries, please use the Gitter chat; the GitHub issues page is for bugs and feature requests.

Objectives

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does not have any third-party dependencies. The library is widely used: in particular, it has become a part of Amazon Elasticsearch Service.

The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular, on methods for non-metric spaces. NMSLIB is possibly the first library with a principled support for non-metric space searching.

NMSLIB is an extendible library, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is also possible to build a query server, which can be used from Java (or other languages supported by Apache Thrift, version 0.12). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.

Authors: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak. With contributions from Ben Frederickson, Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, @gregfriedland, Scott Gigante, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

Brief History

NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base, the Python bindings, and participated in earlier evaluations. The most successful class of methods--neighborhood/proximity graphs--is represented by the Hierarchical Navigable Small World Graph (HNSW) due to Malkov and Yashunin (see the publications below). Other particularly useful methods include a modification of the VP-tree due to Boytsov and Naidan (2013), a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and improved by David Novak, as well as a vanilla uncompressed inverted file.

Credits and Citing

If you find this library useful, feel free to cite our SISAP paper [BibTex] as well as other papers listed at the end. One crucial contribution to cite is the fast Hierarchical Navigable Small World graph (HNSW) method [BibTex]. Please also check out the stand-alone HNSW implementation by Yury Malkov, which is released as a header-only HNSWLib library.

License

The code is released under the Apache License Version 2.0 http://www.apache.org/licenses/. Older versions of the library included additional components with different licenses (this does not apply to NMSLIB 2.x):

  • The LSHKIT, which is embedded in our library, is distributed under the GNU General Public License, see http://www.gnu.org/licenses/.
  • The k-NN graph construction algorithm NN-Descent due to Dong et al. 2011 (see the links below), which is also embedded in our library, seems to be covered by a free-to-use license, similar to Apache 2.
  • FALCONN library's licence is MIT.

Funding

Leonid Boytsov was supported by the Open Advancement of Question Answering Systems (OAQA) group and the following NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". Bileg was supported by the iAd Center.

Related Publications

The most important related papers are listed below in chronological order: