jamesturk/jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.


Top Related Projects


fastText: Library for fast text representation and classification.

FuzzyWuzzy: Fuzzy String Matching in Python

Thefuzz: Fuzzy String Matching in Python

Quick Overview

Jellyfish is a Python library that provides functions for approximate string comparison, such as Levenshtein distance and Jaro-Winkler similarity, and for phonetic encoding, such as Soundex. It is designed to be fast and easy to use, making it a useful tool for tasks like spell-checking, data cleaning, and record linkage.
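For intuition about what a distance metric like Levenshtein measures, here is a minimal pure-Python sketch of the textbook dynamic-programming algorithm. It is not Jellyfish's own (compiled) implementation, and the function name `levenshtein` is chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # prev[j] holds the distance between the previous prefix of `a` and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("hello", "world"))           # 4
print(levenshtein("jellyfish", "smellyfish"))  # 2
```

Only two rows of the table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths.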


Pros

  • Fast and Efficient: Jellyfish's algorithms are implemented in a compiled extension with a Python interface (Rust in current releases; earlier releases used C), making it significantly faster than pure Python implementations of the same algorithms.
  • Comprehensive Functionality: The library includes a wide range of distance metrics and string comparison functions, covering a variety of use cases.
  • Easy to Use: Jellyfish has a simple and intuitive API, making it easy to integrate into existing projects.
  • Well-Documented: The project provides documentation with usage examples for each function, making it easy to get started.


Cons

  • Limited Scope: Jellyfish focuses on pairwise string comparison and phonetic encoding; it does not provide tokenization, searching a list of candidates for the best match, or other text-processing features.
  • Compiled Extension: The compiled core provides performance benefits, but on platforms without prebuilt wheels, installing from source may require a compiler toolchain.
  • Potential Compatibility Issues: New Python versions need matching builds, and some function names have changed between releases (for example, jaro_distance and jaro_winkler were renamed to jaro_similarity and jaro_winkler_similarity), which can break older code.
  • Limited Customization: The library provides a fixed set of algorithms and does not expose options such as custom edit costs for Levenshtein distance.

Code Examples

Here are a few examples of how to use Jellyfish:

import jellyfish

# Calculate the Levenshtein distance between two strings
distance = jellyfish.levenshtein_distance("hello", "world")
print(distance)  # Output: 4

# Calculate the Jaro-Winkler similarity between two strings
# (jaro_winkler_similarity replaces the older jaro_winkler name)
similarity = jellyfish.jaro_winkler_similarity("John Smith", "Jon Smythe")
print(similarity)  # Output: approximately 0.893

# Soundex encoding of a string
soundex = jellyfish.soundex("Jellyfish")
print(soundex)  # Output: J412
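To see where a Jaro-Winkler score comes from, here is a pure-Python sketch of the standard Jaro and Jaro-Winkler definitions. It assumes Winkler's usual conventions (prefix scale 0.1, a shared prefix of at most 4 characters, and a bonus applied only when the Jaro score exceeds 0.7); the helper names are hypothetical and this is not Jellyfish's own code, only a way to show the arithmetic.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical."""
    if not s1 or not s2:
        return 0.0
    # Characters only count as matching if they lie within this distance
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    s1_flags = [False] * len(s1)
    s2_flags = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not s2_flags[j] and s2[j] == ch:
                s1_flags[i] = s2_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order
    s1_seq = [c for c, used in zip(s1, s1_flags) if used]
    s2_seq = [c for c, used in zip(s2, s2_flags) if used]
    transpositions = sum(a != b for a, b in zip(s1_seq, s2_seq)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1,
                 max_prefix: int = 4, threshold: float = 0.7) -> float:
    """Jaro score boosted for a shared prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    if sim <= threshold:
        return sim  # no prefix bonus for weak matches
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * scale * (1 - sim)


print(round(jaro_winkler("John Smith", "Jon Smythe"), 4))  # 0.8933
print(round(jaro("jellyfish", "smellyfish"), 4))           # 0.8963
```

For "John Smith" and "Jon Smythe", 8 of the 10 characters match in the same order, so Jaro = (0.8 + 0.8 + 1) / 3 ≈ 0.867; the shared two-letter prefix "Jo" then raises the score to about 0.893.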

Getting Started

To get started with Jellyfish, you can install it using pip:

pip install jellyfish

Once installed, you can import the library and start using its functions. Here's an example of how to use the levenshtein_distance function:

import jellyfish

word1 = "hello"
word2 = "world"
distance = jellyfish.levenshtein_distance(word1, word2)
print(f"The Levenshtein distance between '{word1}' and '{word2}' is {distance}")

This will output:

The Levenshtein distance between 'hello' and 'world' is 4

You can find more examples and documentation in the Jellyfish GitHub repository.

Competitor Comparisons


fastText: Library for fast text representation and classification.

Pros of fastText

  • fastText is a highly efficient and scalable library for text representation and classification, capable of handling large-scale datasets.
  • It provides pre-trained word vectors for a variety of languages, which can be used for various NLP tasks without the need for extensive training.
  • fastText supports a wide range of applications, including text classification, word analogies, and sentence representation.

Cons of fastText

  • fastText solves a different problem: it learns vector representations and classifiers from text, so it does not offer the edit-distance, Jaro-Winkler, or phonetic-encoding functions that Jellyfish provides for direct string comparison.
  • The documentation and community support for fastText may not be as extensive as for some other popular NLP libraries.

Code Comparison

Jellyfish (jamesturk/jellyfish):

from jellyfish import jaro_winkler_similarity
jaro_winkler_similarity('jellyfish', 'smellyfish')
# Output: approximately 0.896

fastText (facebookresearch/fastText):

import fasttext
model = fasttext.load_model('cc.en.300.bin')  # pre-trained English word vectors
vector = model.get_word_vector('jellyfish')
# Output: a 300-dimensional numpy float32 array
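fastText compares words through their vectors rather than their spelling. A common way to compare two such vectors is cosine similarity. The stdlib sketch below uses made-up three-dimensional vectors as stand-ins; real ones would come from a call such as `model.get_word_vector(...)` and have 300 dimensions. The function and variable names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treat as unrelated
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for real fastText embeddings
jellyfish_vec = [0.2, 0.7, 0.1]
octopus_vec = [0.3, 0.6, 0.2]
print(round(cosine_similarity(jellyfish_vec, octopus_vec), 3))  # 0.972
```

Unlike an edit distance, this score can be high for words with completely different spellings, as long as the model places them close together.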

FuzzyWuzzy: Fuzzy String Matching in Python

Pros of FuzzyWuzzy

  • Flexible Matching Algorithms: FuzzyWuzzy provides several scoring functions, including ratio, partial_ratio, token_sort_ratio, and token_set_ratio, which make it robust to substrings and reordered words.
  • Extensive Documentation: The FuzzyWuzzy project has detailed documentation, including usage examples and explanations of the different matching techniques.
  • Large Community: FuzzyWuzzy has a large user base. The project has since been renamed to Thefuzz, where development continues.

Cons of FuzzyWuzzy

  • Slow Default Backend: Unless the optional python-Levenshtein package is installed, FuzzyWuzzy falls back to the pure-Python difflib.SequenceMatcher from the standard library and warns about it.
  • Potentially Slower Performance: With that default backend, FuzzyWuzzy may be slower than Jellyfish's compiled functions, especially for large datasets.
  • Limited Functionality: While FuzzyWuzzy excels at fuzzy string matching, it does not provide phonetic encodings such as Soundex, Metaphone, or NYSIIS, or metrics such as Jaro-Winkler and Hamming distance, all of which Jellyfish includes.

Code Comparison


from jellyfish import jaro_similarity

print(jaro_similarity("jellyfish", "seallyfish"))  # Output: approximately 0.896


from fuzzywuzzy import fuzz

print(fuzz.ratio("jellyfish", "seallyfish"))  # Output: 84

Both libraries compare strings, but their algorithms and output formats differ: Jellyfish's jaro_similarity returns a float between 0 and 1, while FuzzyWuzzy's fuzz.ratio returns an integer score from 0 to 100. Jellyfish focuses on a narrower set of string metrics and phonetic encodings, while FuzzyWuzzy offers more matching techniques aimed at messy, multi-word strings.
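FuzzyWuzzy's default fuzz.ratio can be reproduced with the standard library: with the difflib backend, it is essentially `SequenceMatcher.ratio()` scaled to 0-100 and rounded. The helper name `ratio_score` is hypothetical; this is a sketch of the idea, not FuzzyWuzzy's code.

```python
from difflib import SequenceMatcher


def ratio_score(a: str, b: str) -> int:
    """0-100 similarity in the style of fuzz.ratio with the difflib backend."""
    # SequenceMatcher.ratio() returns 2 * matched_chars / (len(a) + len(b))
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(ratio_score("jellyfish", "seallyfish"))  # 84
```

Here SequenceMatcher finds the blocks "llyfish" and "e", 8 matching characters out of 19 in total, so the ratio is 2 × 8 / 19 ≈ 0.842, which rounds to 84.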


Thefuzz: Fuzzy String Matching in Python

Pros of Thefuzz

  • Thefuzz provides a family of fuzzy-matching scorers (ratio, partial_ratio, token_sort_ratio, token_set_ratio, WRatio) plus helpers in its process module for finding the best matches in a list of choices.
  • Thefuzz is the maintained continuation of FuzzyWuzzy, so it inherits FuzzyWuzzy's large user base and is still actively developed.
  • Thefuzz offers a user-friendly API, with clear documentation and examples.

Cons of Thefuzz

  • Jellyfish is more lightweight for tasks that only need a single, well-defined metric, such as a Levenshtein distance or a Soundex code, and it also covers phonetic encodings that Thefuzz lacks.
  • Thefuzz relies on a separate string-matching backend (recent releases use RapidFuzz), whereas Jellyfish ships its algorithms in its own compiled extension.
  • The performance of Thefuzz may be slightly slower than Jellyfish for certain string comparison tasks.

Code Comparison


from jellyfish import levenshtein_distance
levenshtein_distance("hello", "world")  # Output: 4


from thefuzz import fuzz
fuzz.ratio("hello", "world")  # Output: 20

The two calls answer different questions: Jellyfish's levenshtein_distance returns the number of edits separating the strings (lower means more similar), while Thefuzz's fuzz.ratio returns a similarity score from 0 to 100 (higher means more similar).
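Thefuzz's ratio is a similarity score rather than an edit distance. Recent Thefuzz releases delegate to RapidFuzz, whose ratio is a normalized insertion/deletion ("Indel") similarity. Assuming that definition, the score can be sketched with a longest-common-subsequence helper; both function names below are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> int:
    """0-100 similarity: 100 * (1 - indel_distance / (len(a) + len(b)))."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # two empty strings treated as identical (assumption)
    # indel_distance = total - 2 * lcs, so the formula simplifies to 2 * lcs / total
    return round(100 * 2 * lcs_length(a, b) / total)


print(indel_ratio("hello", "world"))           # 20
print(indel_ratio("jellyfish", "seallyfish"))  # 84
```

"hello" and "world" share only one character in the same order, so the score is 100 × 2 × 1 / 10 = 20, even though their Levenshtein distance is 4 out of 5.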




jellyfish is a library for approximate & phonetic matching of strings.





Included Algorithms

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaccard Index
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance
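Hamming and Damerau-Levenshtein distance are small variations on the edit-distance idea. The sketch below shows Hamming distance, with an assumed rule that extra characters in the longer string count as mismatches, and the restricted "optimal string alignment" (OSA) variant of Damerau-Levenshtein, which adds adjacent transpositions to Levenshtein. The function names are hypothetical; Jellyfish's damerau_levenshtein_distance may implement the unrestricted variant, which gives smaller results for some inputs (for "ca" vs "abc", 2 instead of the OSA result of 3).

```python
from itertools import zip_longest


def hamming(a: str, b: str) -> int:
    """Count positions that differ; extra characters count as mismatches."""
    return sum(ca != cb for ca, cb in zip_longest(a, b))


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus adjacent transpositions, editing no substring twice."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # swap of two adjacent characters counts as a single edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(hamming("karolin", "kathrin"))           # 3
print(osa_distance("jellyfish", "jellyfihs"))  # 1
```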

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex
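American Soundex is simple enough to sketch in a few lines. This version follows the commonly cited rules: keep the first letter, map consonants to digits, code adjacent letters with the same digit only once (including the first letter), let vowels separate repeated digits, treat H and W as transparent, and pad or truncate to four characters. It is an illustration with hypothetical names, not Jellyfish's implementation.

```python
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first]
    prev_code = _SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")  # vowels and Y map to ""
        if code and code != prev_code:
            result.append(code)
        prev_code = code  # a vowel resets this, so A-vowel-A codes twice
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


print(soundex("Jellyfish"))  # J412
print(soundex("Ashcraft"))   # A261
```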

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
>>> jellyfish.jaro_similarity('jellyfish', 'smellyfish')
>>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')

>>> jellyfish.metaphone('Jellyfish')
>>> jellyfish.soundex('Jellyfish')
>>> jellyfish.nysiis('Jellyfish')
>>> jellyfish.match_rating_codex('Jellyfish')
