nmslib
Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.
Top Related Projects
- faiss: A library for efficient similarity search and clustering of dense vectors.
- annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
- umap: Uniform Manifold Approximation and Projection
- ann-benchmarks: Benchmarks of approximate nearest neighbor libraries in Python
- milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
- google-research: Google Research
Quick Overview
NMSLIB (Non-Metric Space Library) is a powerful and efficient similarity search library for both metric and non-metric spaces. It provides a comprehensive set of algorithms for approximate nearest neighbor search, making it suitable for various applications such as information retrieval, machine learning, and data mining.
Pros
- High performance and scalability for large datasets
- Supports both metric and non-metric spaces
- Offers a wide range of algorithms and index types
- Provides bindings for multiple programming languages (C++, Python, Java)
Cons
- Steeper learning curve compared to some simpler libraries
- Documentation can be sparse or outdated in some areas
- Limited support for distributed computing scenarios
- Some advanced features may require in-depth understanding of the underlying algorithms
Code Examples
- Creating an index and adding data:
import nmslib
# Create an index
index = nmslib.init(method='hnsw', space='cosinesimil')
# Add data to the index
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
index.addDataPointBatch(data)
# Build the index
index.createIndex({'post': 2}, print_progress=True)
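Beyond the defaults, HNSW exposes build-time parameters that trade construction time and index size for recall. A minimal sketch with commonly tuned knobs (the values here are illustrative, not recommendations):
import numpy as np
import nmslib

# Random vectors stand in for real data
data = np.random.random((10000, 128)).astype(np.float32)

index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
# M: maximum graph out-degree; efConstruction: search width during construction
index.createIndex({'M': 16, 'efConstruction': 200, 'post': 0}, print_progress=True)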
- Performing a k-nearest neighbor search:
# Perform k-NN search
query = [2.0, 3.0, 4.0]
k = 2
ids, distances = index.knnQuery(query, k=k)
print(f"Nearest neighbors for query {query}:")
for i, d in zip(ids, distances):
    print(f"ID: {i}, Distance: {d}")
- Saving and loading an index:
# Save the index to a file
index.saveIndex('my_index.bin')
# Load the index from a file
loaded_index = nmslib.init(method='hnsw', space='cosinesimil')
loaded_index.loadIndex('my_index.bin')
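In NMSLIB 2.x, the raw data points can also be persisted alongside the index structure, which some spaces require for correct reloading. A sketch assuming a 2.x install:
# Persist the graph together with the data points
index.saveIndex('my_index.bin', save_data=True)

loaded_index = nmslib.init(method='hnsw', space='cosinesimil')
loaded_index.loadIndex('my_index.bin', load_data=True)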
Getting Started
To get started with NMSLIB, follow these steps:
- Install NMSLIB using pip:
pip install nmslib
- Import the library and create an index:
import nmslib
index = nmslib.init(method='hnsw', space='l2')
- Add data to the index:
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
index.addDataPointBatch(data)
- Build the index and perform a search (a consolidated script follows this list):
index.createIndex({'post': 2})
query = [2.0, 3.0, 4.0]
ids, distances = index.knnQuery(query, k=2)
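Putting the steps together, a minimal end-to-end script (toy data and illustrative parameters):
import nmslib

data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

index = nmslib.init(method='hnsw', space='l2')
index.addDataPointBatch(data)
index.createIndex({'post': 2})

ids, distances = index.knnQuery([2.0, 3.0, 4.0], k=2)
print(ids, distances)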
For more detailed information and advanced usage, refer to the NMSLIB documentation and examples in the GitHub repository.
Competitor Comparisons
FAISS: A library for efficient similarity search and clustering of dense vectors.
Pros of FAISS
- Better GPU support and optimization for large-scale similarity search
- More comprehensive documentation and examples
- Stronger integration with PyTorch and other deep learning frameworks
Cons of FAISS
- Steeper learning curve, especially for beginners
- Less flexibility in terms of index types and distance metrics compared to NMSLIB
Code Comparison
NMSLIB example:
import nmslib
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2}, print_progress=True)
FAISS example:
import faiss
import numpy as np

data = np.random.random((1000, 64)).astype('float32')  # FAISS expects float32
index = faiss.IndexFlatL2(data.shape[1])  # exact L2 index over 64-dim vectors
index.add(data)
Both libraries offer efficient similarity search capabilities, but FAISS excels in GPU-accelerated operations and integration with deep learning workflows. NMSLIB provides more flexibility in index types and distance metrics, making it suitable for a wider range of applications. FAISS has a steeper learning curve but offers more comprehensive documentation. The code examples demonstrate that NMSLIB requires more setup, while FAISS has a more streamlined API for basic operations.
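To illustrate the GPU point above: FAISS can clone a CPU index onto available GPUs. This sketch assumes a faiss-gpu build and at least one visible CUDA device:
import faiss
import numpy as np

data = np.random.random((1000, 64)).astype('float32')
cpu_index = faiss.IndexFlatL2(data.shape[1])

# Requires the GPU build of FAISS and at least one CUDA device
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
gpu_index.add(data)
distances, ids = gpu_index.search(data[:5], 3)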
Annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Pros of Annoy
- Simpler to use and integrate, especially for Python projects
- Optimized for memory usage, allowing larger datasets on limited hardware
- Supports incremental index building, useful for dynamic datasets
Cons of Annoy
- Generally slower search performance compared to NMSLIB
- Limited to angular and Euclidean distance metrics
- Less flexibility in terms of index construction parameters
Code Comparison
Annoy:
from random import random
from annoy import AnnoyIndex

f = 40  # Length of item vector
t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random() for _ in range(f)]
    t.add_item(i, v)
t.build(10)  # 10 trees
NMSLIB:
from random import random
import nmslib

data = [[random() for _ in range(40)] for _ in range(1000)]
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2}, print_progress=True)
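Querying differs as well; a brief sketch of the two query APIs (k=10 is arbitrary):
# Annoy: query by vector, optionally returning distances
ids, dists = t.get_nns_by_vector([random() for _ in range(40)], 10, include_distances=True)

# NMSLIB: query the index built above
ids, dists = index.knnQuery([random() for _ in range(40)], k=10)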
Both libraries offer efficient approximate nearest neighbor search capabilities, but NMSLIB generally provides better performance and more flexibility at the cost of increased complexity. Annoy is easier to use and more memory-efficient, making it a good choice for simpler projects or resource-constrained environments.
UMAP: Uniform Manifold Approximation and Projection
Pros of UMAP
- Focuses on dimensionality reduction and visualization
- Better preserves global structure in low-dimensional embeddings
- Faster runtime for large datasets compared to t-SNE
Cons of UMAP
- Solves a narrower problem: dimensionality reduction rather than general-purpose similarity search
- Not designed to serve as a standalone nearest neighbor search engine, exact or approximate
Code Comparison
UMAP example:
import umap
reducer = umap.UMAP()
embedding = reducer.fit_transform(data)
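UMAP's behavior is controlled largely by a handful of constructor arguments; a sketch with commonly tuned knobs (values are illustrative):
import numpy as np
import umap

data = np.random.random((1000, 64))
# n_neighbors balances local vs. global structure; min_dist controls how tightly points pack
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, metric='cosine')
embedding = reducer.fit_transform(data)  # 2-D embedding of shape (1000, 2)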
NMSLIB example:
import nmslib
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2})
Key Differences
- UMAP is primarily for dimensionality reduction and visualization
- NMSLIB is a more general-purpose library for approximate nearest neighbor search
- UMAP offers better global structure preservation in low dimensions
- NMSLIB provides more flexibility in index types and distance metrics
Use Cases
- UMAP: Ideal for visualizing high-dimensional data and preserving global structure
- NMSLIB: Better suited for large-scale nearest neighbor search tasks, whether approximate or exact
Both libraries have their strengths, and the choice between them depends on the specific requirements of your project.
ann-benchmarks: Benchmarks of approximate nearest neighbor libraries in Python
Pros of ann-benchmarks
- Comprehensive benchmarking suite for various ANN algorithms
- Regularly updated with new algorithms and datasets
- Provides visualizations and easy-to-interpret results
Cons of ann-benchmarks
- Focuses on benchmarking rather than providing a ready-to-use library
- May not include all the latest optimizations for each algorithm
- Results can be sensitive to hardware and configuration differences
Code comparison
ann-benchmarks (it is a benchmarking harness driven from the command line, not an importable search library; the typical workflow looks roughly like this):
# after cloning the repository and installing its requirements
python install.py                           # build Docker images for the algorithms
python run.py --dataset glove-100-angular   # run benchmarks on a dataset
python plot.py --dataset glove-100-angular  # plot recall/throughput trade-offs
nmslib (example of using the library; the C++ core is typically driven through the Python bindings):
import nmslib

index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2})
ids, distances = index.knnQuery(query, k=10)
While ann-benchmarks provides a framework for comparing different ANN algorithms, nmslib is a specific implementation of efficient similarity search algorithms. ann-benchmarks is useful for researchers and developers choosing the best algorithm for their needs, while nmslib is more suitable for direct integration into applications requiring fast nearest neighbor search.
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Pros of Milvus
- Designed for cloud-native environments with distributed architecture
- Supports multiple index types and similarity metrics
- Offers advanced features like data management and access control
Cons of Milvus
- Higher complexity and resource requirements for setup and maintenance
- Steeper learning curve for beginners
- May be overkill for smaller-scale applications
Code Comparison
Milvus (Python client):
from pymilvus import Collection, connections

connections.connect()  # defaults to localhost:19530
collection = Collection("example")  # assumes the collection already exists
collection.insert(entities)  # `entities` must match the collection's schema
results = collection.search(query_vectors, "vector_field", param, limit=10)
NMSLIB:
import nmslib
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2})
ids, distances = index.knnQuery(query, k=10)
Summary
Milvus is a more comprehensive solution for large-scale vector similarity search, offering cloud-native features and advanced functionality. NMSLIB is simpler and more lightweight, making it easier to integrate into existing projects but with fewer enterprise-level features. Choose Milvus for scalable, distributed applications, and NMSLIB for simpler, single-machine setups or when ease of integration is a priority.
Google Research
Pros of google-research
- Broader scope, covering various research areas in AI and machine learning
- Regularly updated with new projects and papers from Google's research teams
- Provides implementations of cutting-edge algorithms and techniques
Cons of google-research
- Less focused on a specific problem domain, making it harder to find relevant code
- May contain experimental or incomplete implementations
- Documentation can be sparse for some projects
Code comparison
nmslib (the C++ core is usually accessed through the Python bindings; a minimal usage sketch):
import nmslib

index = nmslib.init(method='hnsw', space='l2')
index.addDataPointBatch(data)
index.createIndex({'post': 2})
index.saveIndex('index.bin')
google-research:
import tensorflow as tf
from bert import modeling
bert_config = modeling.BertConfig.from_json_file("bert_config.json")
model = modeling.BertModel(config=bert_config, is_training=True)
Summary
nmslib is a specialized library for approximate nearest neighbor search, while google-research is a collection of diverse research projects. nmslib offers a more focused solution for similarity search tasks, with better documentation and stability. google-research provides access to a wide range of cutting-edge research implementations but may require more effort to navigate and use effectively.
README
Non-Metric Space Library (NMSLIB)
Important Notes
- NMSLIB is generic but fast, see the results of ANN benchmarks.
- A standalone implementation of our fastest method HNSW also exists as a header-only library.
- All the documentation (including using Python bindings and the query server, description of methods and spaces, building the library, etc) can be found on this page.
- For generic questions/inquiries, please use the Gitter chat; the GitHub issues page is for bugs and feature requests.
Objectives
Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library has no third-party dependencies. It has been gaining popularity recently; in particular, it has become a part of Amazon Elasticsearch Service.
The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular, on methods for non-metric spaces. NMSLIB is possibly the first library with a principled support for non-metric space searching.
NMSLIB is an extendible library, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is also possible to build a query server, which can be used from Java or other languages supported by Apache Thrift (version 0.12). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.
Authors: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak. With contributions from Ben Frederickson, Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, @gregfriedland, Scott Gigante, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.
Brief History
NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base, the Python bindings, and participated in earlier evaluations. The most successful class of methods--neighborhood/proximity graphs--is represented by the Hierarchical Navigable Small World Graph (HNSW) due to Malkov and Yashunin (see the publications below). Other especially useful methods include a modification of the VP-tree due to Boytsov and Naidan (2013), a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and improved by David Novak, as well as a vanilla uncompressed inverted file.
Credits and Citing
If you find this library useful, feel free to cite our SISAP paper [BibTex] as well as other papers listed in the end. One crucial contribution to cite is the Hierarchical Navigable Small World graph (HNSW) method [BibTex]. Please also check out the stand-alone HNSW implementation by Yury Malkov, which is released as a header-only HNSWLib library.
License
The code is released under the Apache License Version 2.0 http://www.apache.org/licenses/. Older versions of the library included additional components with different licenses (this does not apply to NMSLIB 2.x):
- The LSHKIT, which is embedded in our library, is distributed under the GNU General Public License, see http://www.gnu.org/licenses/.
- The k-NN graph construction algorithm NN-Descent due to Dong et al. 2011 (see the links below), which is also embedded in our library, seems to be covered by a free-to-use license, similar to Apache 2.
- The FALCONN library's license is MIT.
Funding
Leonid Boytsov was supported by the Open Advancement of Question Answering Systems (OAQA) group and the following NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". Bileg was supported by the iAd Center.
Related Publications
Most important related papers are listed below in the chronological order:
- L. Boytsov, D. Novak, Y. Malkov, E. Nyberg (2016). Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search. In Proceedings of CIKM'16. [BibTex] We use a special branch of this library, plus the following Java code.
- Malkov, Y.A., Yashunin, D.A.. (2016). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. CoRR, abs/1603.09320. [BibTex]
- Bilegsaikhan, N., Boytsov, L., 2015. Permutation Search Methods are Efficient, Yet Faster Search is Possible. PVLDB, 8(12):1618--1629. [BibTex]
- Ponomarenko, A., Averlin, N., Bilegsaikhan, N., Boytsov, L., 2014. Comparative Analysis of Data Structures for Approximate Nearest Neighbor Search. [BibTex]
- Malkov, Y., Ponomarenko, A., Logvinov, A., & Krylov, V., 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems, 45, 61-68. [BibTex]
- Boytsov, L., Bilegsaikhan, N., 2013. Engineering Efficient and Effective Non-Metric Space Library. In Proceedings of the 6th International Conference on Similarity Search and Applications (SISAP 2013). [BibTex]
- Boytsov, L., Bilegsaikhan, N., 2013. Learning to Prune in Metric and Non-Metric Spaces. In Advances in Neural Information Processing Systems 2013. [BibTex]
- Tellez, Eric Sadit, Edgar Chávez, and Gonzalo Navarro. Succinct nearest neighbor search. Information Systems 38.7 (2013): 1019-1030. [BibTex]
- A. Ponomarenko, Y. Malkov, A. Logvinov, and V. Krylov, 2011. Approximate nearest neighbor search small world approach. ICTA 2011.
- Dong, Wei, Charikar Moses, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. Proceedings of the 20th international conference on World wide web. ACM, 2011. [BibTex]
- L. Cayton, 2008. Fast nearest neighbor retrieval for Bregman divergences. Twenty-Fifth International Conference on Machine Learning (ICML). [BibTex]
- Amato, Giuseppe, and Pasquale Savino. 2008 Approximate similarity search in metric spaces using inverted files. [BibTex]
- Gonzalez, Edgar Chavez, Karina Figueroa, and Gonzalo Navarro. Effective proximity retrieval by ordering permutations. Pattern Analysis and Machine Intelligence, IEEE Transactions on 30.9 (2008): 1647-1658. [BibTex]