tensorflow/similarity

TensorFlow Similarity is a Python package focused on making similarity learning quick and easy.


Top Related Projects

  • BERT (37,810 stars): TensorFlow code and pre-trained models for BERT
  • 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
  • FAISS (30,390 stars): A library for efficient similarity search and clustering of dense vectors
  • Annoy (13,073 stars): Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
  • NMSLIB (3,373 stars): Non-Metric Space Library, an efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces
  • UMAP (7,343 stars): Uniform Manifold Approximation and Projection

Quick Overview

TensorFlow Similarity is a Python library for similarity learning and metric learning. It provides tools and utilities to train models that can learn similarity between inputs, which is useful for tasks like image retrieval, face recognition, and recommendation systems. The library is built on top of TensorFlow and Keras, making it easy to integrate with existing TensorFlow workflows.

Pros

  • Easy to use API for similarity learning tasks
  • Built on top of TensorFlow and Keras, allowing for seamless integration
  • Supports various loss functions and architectures for similarity learning
  • Includes pre-trained models and datasets for quick experimentation

Cons

  • Limited documentation and examples compared to more established libraries
  • Requires understanding of TensorFlow and Keras for advanced usage
  • May have a steeper learning curve for beginners in similarity learning
  • Fewer community contributions and support compared to larger projects

Code Examples

  1. Creating a similarity model (the backbone-plus-embedding pattern mirrors the README example further down):

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.models import SimilarityModel

# ResNet50 backbone topped with an L2-normalized metric embedding
inputs = layers.Input(shape=(224, 224, 3))
x = tf.keras.applications.ResNet50(weights=None, include_top=False)(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = MetricEmbedding(128)(x)
model = SimilarityModel(inputs, outputs)

  2. Training the model:

from tensorflow_similarity.losses import TripletLoss

# train_dataset: a tf.data.Dataset or sampler yielding (images, labels)
model.compile(optimizer="adam", loss=TripletLoss())
model.fit(train_dataset, epochs=10)

  3. Indexing reference examples so they become searchable:

model.index(x=reference_images, y=reference_labels, data=reference_images)

  4. Performing nearest neighbor search:

neighbors = model.single_lookup(query_image, k=5)

Getting Started

To get started with TensorFlow Similarity, follow these steps:

  1. Install the library:

pip install tensorflow-similarity

  2. Import the library and create a simple model:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.losses import TripletLoss
from tensorflow_similarity.models import SimilarityModel

# MobileNetV2 backbone topped with an L2-normalized metric embedding
inputs = layers.Input(shape=(96, 96, 3))
x = tf.keras.applications.MobileNetV2(weights=None, include_top=False)(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = MetricEmbedding(128)(x)
model = SimilarityModel(inputs, outputs)
model.compile(optimizer="adam", loss=TripletLoss())

  3. Prepare your dataset and train the model:

# Assume train_dataset yields batches of images and integer class labels
model.fit(train_dataset, epochs=10)

  4. Use the trained model for similarity tasks:

model.index(x=reference_images, y=reference_labels, data=reference_images)
neighbors = model.single_lookup(query_image, k=5)

Competitor Comparisons

BERT

TensorFlow code and pre-trained models for BERT

Pros of BERT

  • More comprehensive and widely adopted for natural language processing tasks
  • Offers pre-trained models for various languages and domains
  • Extensive documentation and community support

Cons of BERT

  • Larger model size and higher computational requirements
  • More complex to fine-tune and adapt for specific tasks
  • Less focused on similarity-specific tasks

Code Comparison

BERT example:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

TensorFlow Similarity example:

import tensorflow as tf
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.models import SimilarityModel

# Small convolutional embedding model built from standard Keras layers
inputs = tf.keras.layers.Input((28, 28, 1))
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = MetricEmbedding(64)(x)
model = SimilarityModel(inputs, outputs)

Both repositories offer valuable tools for different aspects of machine learning. BERT excels in general natural language processing tasks, while TensorFlow Similarity focuses on similarity-based learning and retrieval. The choice between them depends on the specific requirements of your project.

Transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX

Pros of Transformers

  • Broader scope, supporting a wide range of NLP tasks and models
  • Larger community and more frequent updates
  • Extensive documentation and examples

Cons of Transformers

  • Steeper learning curve due to its comprehensive nature
  • Potentially higher resource requirements for some tasks

Code Comparison

Transformers:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)

TensorFlow Similarity:

import tensorflow as tf
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.losses import MultiSimilarityLoss
from tensorflow_similarity.models import SimilarityModel

# Simple dense embedding model ending in an L2-normalized embedding
inputs = tf.keras.layers.Input(shape=(784,))
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = MetricEmbedding(64)(x)
model = SimilarityModel(inputs, outputs)
model.compile(optimizer="adam", loss=MultiSimilarityLoss())

Transformers offers a higher-level API for working with pre-trained models, while TensorFlow Similarity provides a more flexible approach for building custom similarity models. Transformers is better suited for general NLP tasks, whereas TensorFlow Similarity excels in similarity-based applications.

FAISS

A library for efficient similarity search and clustering of dense vectors

Pros of FAISS

  • Highly optimized for large-scale similarity search and clustering
  • Supports GPU acceleration for faster processing
  • Offers a wide range of indexing algorithms for different use cases

Cons of FAISS

  • Steeper learning curve due to its low-level C++ implementation
  • Less integrated with TensorFlow ecosystem
  • Requires separate installation and setup

Code Comparison

FAISS:

import numpy as np
import faiss

d = 64                                               # vector dimensionality
xb = np.random.random((10000, d)).astype('float32')  # database vectors
xq = np.random.random((5, d)).astype('float32')      # query vectors

index = faiss.IndexFlatL2(d)  # exact (brute-force) L2 index
index.add(xb)
D, I = index.search(xq, 5)    # distances and ids of the 5 nearest neighbors

TensorFlow Similarity:

# model is a trained SimilarityModel (see the README example below)
model.index(x=examples, y=labels, data=examples)
neighbors = model.lookup(queries, k=5)

Key Differences

  • FAISS is focused on efficient similarity search and clustering, while TensorFlow Similarity covers the full training-to-retrieval workflow within the TensorFlow ecosystem.
  • FAISS offers more advanced indexing algorithms and optimizations, making it better suited for large-scale applications.
  • TensorFlow Similarity provides a higher-level API that integrates seamlessly with TensorFlow, making it easier to use for TensorFlow developers.
  • FAISS ships both CPU and GPU implementations, while TensorFlow Similarity primarily leverages TensorFlow's built-in device management. The two can also be combined, as the sketch below shows.
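As a hedged sketch of combining the two (the variable names and the IndexFlatIP choice are illustrative assumptions, not a documented workflow of either library), embeddings produced by a trained SimilarityModel can be indexed and searched with FAISS:

import numpy as np
import faiss

# Embed reference and query images with a trained SimilarityModel
ref_emb = model.predict(reference_images).astype('float32')
query_emb = model.predict(query_images).astype('float32')

# MetricEmbedding outputs are L2-normalized, so inner product equals cosine similarity
index = faiss.IndexFlatIP(ref_emb.shape[1])
index.add(ref_emb)
scores, ids = index.search(query_emb, 5)  # top-5 neighbors per query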
Annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Pros of Annoy

  • Lightweight and efficient, with a focus on memory usage and speed
  • Supports multiple distance metrics (Euclidean, Manhattan, Cosine, Hamming)
  • Can be used from both C++ and Python

Cons of Annoy

  • Limited to approximate nearest neighbor search
  • Does not provide built-in support for GPU acceleration
  • Less integrated with machine learning workflows compared to TensorFlow Similarity

Code Comparison

Annoy:

from random import random

from annoy import AnnoyIndex

f = 40  # length of each item vector
t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random() for _ in range(f)]
    t.add_item(i, v)
t.build(10)  # 10 trees
neighbors = t.get_nns_by_item(0, 10)  # 10 approximate nearest neighbors of item 0

TensorFlow Similarity:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.losses import MultiSimilarityLoss
from tensorflow_similarity.models import SimilarityModel

# ResNet50 backbone with a metric embedding head
inputs = layers.Input(shape=(224, 224, 3))
x = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = MetricEmbedding(128)(x)
model = SimilarityModel(inputs, outputs)
model.compile(optimizer='adam', loss=MultiSimilarityLoss())
model.fit(x_train, y_train, epochs=10)

The code examples highlight Annoy's focus on efficient indexing and searching, while TensorFlow Similarity integrates more closely with TensorFlow's machine learning ecosystem.

NMSLIB

Non-Metric Space Library (NMSLIB): an efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces

Pros of NMSLIB

  • Supports a wider range of distance metrics and index types
  • Generally faster for high-dimensional data and large datasets
  • More mature project with extensive documentation and benchmarks

Cons of NMSLIB

  • Less integration with TensorFlow ecosystem
  • Requires more manual tuning and parameter selection
  • Not as well-suited for deep learning-based similarity search

Code Comparison

nmslib:

import numpy as np
import nmslib

data = np.random.randn(10000, 64).astype(np.float32)  # database vectors
query = np.random.randn(64).astype(np.float32)        # query vector

index = nmslib.init(method='hnsw', space='cosinesimil')  # HNSW graph index
index.addDataPointBatch(data)
index.createIndex({'post': 2})
ids, distances = index.knnQuery(query, k=10)

TensorFlow Similarity:

# model is a trained SimilarityModel (see the README example below)
model.index(x=examples, y=labels, data=examples)
neighbors = model.lookup(queries, k=10)

Summary

NMSLIB is a more general-purpose similarity search library with broader algorithm support and better performance on large-scale datasets. TensorFlow Similarity is more tightly integrated with the TensorFlow ecosystem and is easier to use for deep learning based similarity search. The choice between the two depends on the specific use case, dataset size, and integration requirements with existing TensorFlow workflows.

UMAP

Uniform Manifold Approximation and Projection

Pros of UMAP

  • More general-purpose dimensionality reduction tool, not limited to similarity search
  • Faster runtime for large datasets compared to t-SNE and other methods
  • Preserves both local and global structure of data

Cons of UMAP

  • Requires more manual parameter tuning than TensorFlow Similarity
  • Less integrated with TensorFlow ecosystem
  • May be overkill for simple similarity search tasks

Code Comparison

UMAP:

import numpy as np
import umap

data = np.random.rand(1000, 32)          # example high-dimensional data
reducer = umap.UMAP()
embedding = reducer.fit_transform(data)  # 2-D embedding by default

TensorFlow Similarity:

# model is a SimilarityModel built and trained as in the README example below
model.index(x=reference_data, y=reference_labels, data=reference_data)
similar = model.lookup(query, k=5)

Key Differences

  • UMAP focuses on dimensionality reduction and visualization
  • TensorFlow Similarity specializes in efficient similarity search and retrieval
  • UMAP is more flexible but requires more expertise to use effectively
  • TensorFlow Similarity integrates seamlessly with TensorFlow models and workflows, and the two can be combined, as sketched below
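Because UMAP reduces any vector data, it pairs naturally with the embeddings a similarity model produces. A minimal hedged sketch (the variable names are illustrative, and it assumes a trained SimilarityModel as in the README example below):

import umap
import matplotlib.pyplot as plt

# Project the similarity embeddings down to 2-D for visual inspection
embeddings = model.predict(images)
coords = umap.UMAP().fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=3)
plt.show()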

Use Cases

  • UMAP: Exploratory data analysis, visualization, and general dimensionality reduction
  • TensorFlow Similarity: Building recommendation systems, content-based search, and similarity-based clustering

Both libraries have their strengths, and the choice depends on the specific requirements of your project and your familiarity with the TensorFlow ecosystem.


README

TensorFlow Similarity: Metric Learning for Humans

TensorFlow Similarity is a TensorFlow library for similarity learning, which includes techniques such as self-supervised learning, metric learning, and contrastive learning. TensorFlow Similarity is still in beta and we may push breaking changes.

Introduction

TensorFlow Similarity offers state-of-the-art algorithms for metric learning, along with all the components needed to research, train, evaluate, and serve similarity and contrastive learning based models: models, losses, metrics, samplers, visualizers, and indexing subsystems that make this quick and easy.

Example of nearest neighbors search performed on the embedding generated by a similarity model trained on the Oxford IIIT Pet Dataset.

With TensorFlow Similarity you can train two main types of models:

  1. Self-supervised models: Used to learn general data representations on unlabeled data to boost the accuracy of downstream tasks where you have few labels. For example, you can pre-train a model on a large number of unlabeled images using one of the contrastive methods supported by TensorFlow Similarity, and then fine-tune it on a small labeled dataset to achieve higher accuracy. To get started training your own self-supervised model, see this notebook.

  2. Similarity models: Output embeddings that allow you to find and cluster similar examples, such as images representing the same object, within a large corpus of examples. For instance, as shown above, you can train a similarity model to find and cluster similar-looking, unseen cat and dog images from the Oxford IIIT Pet Dataset while training on only a few of the dataset classes. To get started training your own similarity model, see this notebook.

What's new

  • [Mar 2023] 0.17: more losses and metrics, plus a major refactoring
    • Added VicReg loss to the contrastive losses.
    • Added metrics used in retrieval papers, such as Precision@K.
    • Native support for distributed training, e.g., SimCLR now works correctly with distributed training (see the sketch below).
    • Initial multi-modal embedding support (CLIP).
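As a hedged illustration of the distributed training support: a SimilarityModel is a standard Keras model, so the usual tf.distribute workflow should apply. In this sketch, build_model() is a hypothetical stand-in for the model construction shown in the README example below, and sampler is the data sampler from that same example:

import tensorflow as tf
from tensorflow_similarity.losses import MultiSimilarityLoss

strategy = tf.distribute.MirroredStrategy()  # data-parallel training across local GPUs
with strategy.scope():
    model = build_model()  # hypothetical helper; see the README example below
    model.compile('adam', loss=MultiSimilarityLoss())
model.fit(sampler, epochs=5)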

For more details and information on previous releases, see the changelog.

Getting Started

Installation

Use pip to install the library.

NOTE: the [tensorflow] extras key can be omitted if you already have tensorflow>=2.4 installed.

pip install --upgrade-strategy=only-if-needed tensorflow_similarity[tensorflow] 

Documentation

The detailed and narrated notebooks are a good way to get started with TensorFlow Similarity. There is likely to be one that is similar to your data or your problem (if not, let us know). You can start working with the examples immediately in Google Colab by clicking the Google Colab icon.

For more information about specific functions, check the API documentation.

For contributing to the project, please check out the contribution guidelines.

Minimal Example: MNIST similarity


Here is a bare bones example demonstrating how to train a TensorFlow Similarity model on the MNIST data. This example illustrates some of the main components provided by TensorFlow Similarity and how they fit together. Please refer to the hello_world notebook for a more detailed introduction.

Preparing data

TensorFlow Similarity provides data samplers for various dataset types that balance the batches to ensure smoother training. In this example, we use the multi-shot sampler, which loads data directly from the TensorFlow Datasets catalog.

from tensorflow_similarity.samplers import TFDatasetMultiShotMemorySampler

# Data sampler that generates balanced batches from MNIST dataset
sampler = TFDatasetMultiShotMemorySampler(dataset_name='mnist', classes_per_batch=10)

Building a Similarity model

Building a TensorFlow Similarity model is similar to building a standard Keras model, except that the output layer is usually a MetricEmbedding() layer, which enforces L2 normalization, and the model is instantiated as the specialized subclass SimilarityModel(), which supports additional similarity-specific functionality.

from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.models import SimilarityModel

# Build a Similarity model using standard Keras layers
inputs = layers.Input(shape=(28, 28, 1))
x = layers.experimental.preprocessing.Rescaling(1/255)(inputs)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = MetricEmbedding(64)(x)

# Build a specialized Similarity model
model = SimilarityModel(inputs, outputs)

Training model via contrastive learning

To output metric embeddings that are searchable via approximate nearest neighbor search, the model needs to be trained using a similarity loss. Here we use MultiSimilarityLoss(), one of the most efficient loss functions.

from tensorflow_similarity.losses import MultiSimilarityLoss

# Train Similarity model using contrastive loss
model.compile('adam', loss=MultiSimilarityLoss())
model.fit(sampler, epochs=5)

Building images index and querying it

Once the model is trained, reference examples must be indexed via the model index API to be searchable. After indexing, you can use the model lookup API to search the index for the K most similar items.

from tensorflow_similarity.visualization import viz_neigbors_imgs

# Index 100 embedded MNIST examples to make them searchable
sx, sy = sampler.get_slice(0, 100)
model.index(x=sx, y=sy, data=sx)

# Find the top 5 most similar indexed MNIST examples for a given example
qx, qy = sampler.get_slice(3713, 1)
nns = model.single_lookup(qx[0])

# Visualize the query example and its top 5 neighbors
viz_neigbors_imgs(qx[0], qy[0], nns)

Supported Algorithms

Self-Supervised Models

  • SimCLR
  • SimSiam
  • Barlow Twins

Supervised Losses

  • Triplet Loss
  • PN Loss
  • Multi Sim Loss
  • Circle Loss
  • Soft Nearest Neighbor Loss
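All of these supervised losses are drop-in Keras loss objects, so switching between them only changes the compile() call. A hedged sketch (MultiSimilarityLoss appears in the example above; the TripletLoss and CircleLoss class names are assumptions inferred from this list):

from tensorflow_similarity.losses import CircleLoss, MultiSimilarityLoss, TripletLoss

# model is a SimilarityModel as in the minimal example above;
# swap similarity losses without touching the model definition
model.compile('adam', loss=TripletLoss())
# model.compile('adam', loss=CircleLoss())
# model.compile('adam', loss=MultiSimilarityLoss())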

Metrics

TensorFlow Similarity offers many of the most common metrics used for classification and retrieval evaluation, including:

Name        | Type
----------- | --------------
Precision   | Classification
Recall      | Classification
F1 Score    | Classification
Recall@K    | Retrieval
Binary NDCG | Retrieval
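These metrics come into play when measuring a trained model's retrieval quality on held-out data. A hedged sketch (the evaluate_retrieval method and the RecallAtK import path are assumptions based on the metric names above, not verified API):

from tensorflow_similarity.retrieval_metrics import RecallAtK

# Assumes the index has already been built with model.index(...)
results = model.evaluate_retrieval(
    x_test, y_test,
    retrieval_metrics=[RecallAtK(k=1), RecallAtK(k=5)],
)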

Citing

Please cite this reference if you use any part of TensorFlow Similarity in your research:

@article{EBSIM21,
  title={TensorFlow Similarity: A Usable, High-Performance Metric Learning Library},
  author={Elie Bursztein and James Long and Shun Lin and Owen Vallis and Francois Chollet},
  journal={Fixme},
  year={2021}
}

Disclaimer

This is not an official Google product.