jamesturk/jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.

Quick Overview

Jellyfish is a Python library for approximate and phonetic matching of strings. It provides string comparison functions such as Levenshtein distance and Jaro-Winkler similarity, and phonetic encodings such as Soundex. It is designed to be fast and easy to use, which makes it useful for spell-checking, data cleaning, and record linkage.

Pros

  • Fast and Efficient: The core of Jellyfish is a compiled extension (written in Rust in recent releases) with a Python interface, making it significantly faster than pure Python implementations of the same algorithms.
  • Comprehensive Functionality: The library includes a wide range of distance metrics and string comparison functions, covering a variety of use cases.
  • Easy to Use: Jellyfish has a simple and intuitive API, making it easy to integrate into existing projects.
  • Well-Documented: The project has detailed documentation, including examples and usage guides, making it easy to get started.

Cons

  • Limited to String Matching: Jellyfish is focused solely on string comparison and phonetic encoding and does not provide other text processing functionality.
  • Compiled Extension: While the compiled core provides performance benefits, installing it requires a prebuilt wheel for your platform or a working build toolchain, which may be a barrier for some users.
  • Potential Compatibility Issues: Because it ships a compiled extension, Jellyfish may lag behind new Python versions or platforms until matching wheels are published.
  • Limited Customization: The library provides a fixed set of distance metrics and does not allow for easy customization or extension of the available functions.

Code Examples

Here are a few examples of how to use Jellyfish:

import jellyfish

# Calculate the Levenshtein distance between two strings
distance = jellyfish.levenshtein_distance("hello", "world")
print(distance)  # Output: 4

# Calculate the Jaro-Winkler similarity between two strings (a float from 0 to 1)
similarity = jellyfish.jaro_winkler_similarity("John Smith", "Jon Smythe")
print(similarity)

# Soundex encoding of a string
soundex = jellyfish.soundex("Jellyfish")
print(soundex)  # Output: "J412"

Getting Started

To get started with Jellyfish, you can install it using pip:

pip install jellyfish

Once installed, you can import the library and start using its functions. Here's an example of how to use the levenshtein_distance function:

import jellyfish

word1 = "hello"
word2 = "world"
distance = jellyfish.levenshtein_distance(word1, word2)
print(f"The Levenshtein distance between '{word1}' and '{word2}' is {distance}")

This will output:

The Levenshtein distance between 'hello' and 'world' is 4

You can find more examples and documentation in the Jellyfish GitHub repository.
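
To make the number returned by levenshtein_distance less opaque, here is a minimal pure-Python sketch of the classic dynamic-programming edit distance. It is an illustration only, not Jellyfish's implementation; the names levenshtein_sketch and closest_match are hypothetical helpers introduced for this example.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Edit distance using a two-row dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def closest_match(query: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest edit distance to query."""
    return min(candidates, key=lambda c: levenshtein_sketch(query, c))


print(levenshtein_sketch("hello", "world"))
print(closest_match("jelyfish", ["starfish", "jellyfish", "swordfish"]))
```

In real code the library call would replace levenshtein_sketch; the closest_match pattern stays the same, which is the typical shape of a spell-checking or record-linkage lookup.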

Competitor Comparisons

fastText (25,835 GitHub stars)

Library for fast text representation and classification.

Pros of fastText

  • fastText is a highly efficient and scalable library for text representation and classification, capable of handling large-scale datasets.
  • It provides pre-trained word vectors for a variety of languages, which can be used for various NLP tasks without the need for extensive training.
  • fastText supports a wide range of applications, including text classification, word analogies, and sentence representation.

Cons of fastText

  • fastText targets text representation and classification rather than string comparison, so it does not provide the edit-distance and phonetic-matching functions that Jellyfish offers.
  • The documentation and community support for fastText may not be as extensive as for some other popular NLP libraries.

Code Comparison

Jellyfish (jamesturk/jellyfish):

from jellyfish import jaro_winkler_similarity
jaro_winkler_similarity('jellyfish', 'smellyfish')
# Output: approximately 0.896 (the strings share no prefix, so this equals the plain Jaro similarity)

fastText (facebookresearch/fastText):

import fasttext
model = fasttext.load_model('cc.en.300.bin')
model.get_word_vector('dog')
# Output: array([-0.0235,  0.0493, -0.0266, ...,  0.0249, -0.0408,  0.0481], dtype=float32)

FuzzyWuzzy

Fuzzy String Matching in Python

Pros of FuzzyWuzzy

  • Flexible Matching Algorithms: FuzzyWuzzy provides several Levenshtein-based scorers, including simple ratio, partial ratio, token sort ratio, and token set ratio, which tolerate reordered or partially matching strings.
  • Extensive Documentation: The FuzzyWuzzy project has detailed documentation, including usage examples and explanations of the different matching techniques.
  • Widely Used: FuzzyWuzzy has a large user base, though the project has since been renamed TheFuzz, where development continues.

Cons of FuzzyWuzzy

  • Optional Native Dependency: FuzzyWuzzy falls back to difflib from the Python standard library and needs the optional python-Levenshtein package for faster matching.
  • Potentially Slower Performance: Without that optional package, FuzzyWuzzy may be slower than Jellyfish, especially for large datasets.
  • Limited Functionality: FuzzyWuzzy does not provide the phonetic encodings (Soundex, Metaphone, NYSIIS) or the Jaro and Hamming metrics that Jellyfish includes.

Code Comparison

Jellyfish:

from jellyfish import jaro_similarity

print(jaro_similarity("jellyfish", "seallyfish"))  # Output: approximately 0.896

FuzzyWuzzy:

from fuzzywuzzy import fuzz

print(fuzz.ratio("jellyfish", "seallyfish"))  # Output: an integer score from 0 to 100

Both libraries compare strings, but the algorithms and output formats differ. Jellyfish exposes individual distance metrics and phonetic encodings and returns raw distances or similarities between 0 and 1, while FuzzyWuzzy offers ratio-style scorers (simple, partial, and token-based) that return scores from 0 to 100.
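
The ratio-style score can be approximated with the standard library alone. The sketch below uses difflib.SequenceMatcher, whose ratio is 2*M/T (M matched characters, T total characters in both strings), and scales it to 0-100. It is an assumption-laden stand-in for illustration, not FuzzyWuzzy's own code; ratio_score is a hypothetical name.

```python
from difflib import SequenceMatcher


def ratio_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score based on SequenceMatcher.ratio()."""
    # ratio() is 2*M/T: M = characters in matching blocks, T = len(a) + len(b)
    return round(100 * SequenceMatcher(None, a, b).ratio())


for left, right in [("jellyfish", "jellyfish"),
                    ("jellyfish", "seallyfish"),
                    ("abc", "xyz")]:
    print(left, right, ratio_score(left, right))
```

A score like this is convenient to threshold, whereas a raw edit distance has to be interpreted relative to the string lengths.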

TheFuzz (2,754 GitHub stars)

Fuzzy String Matching in Python

Pros of Thefuzz

  • TheFuzz provides ratio-style scorers built on Levenshtein distance (simple, partial, token sort, and token set ratios), plus a process module for picking the best match from a list of choices.
  • The library has more stars on GitHub than Jellyfish, which suggests a larger user base.
  • TheFuzz returns a single score from 0 to 100, which can be simpler to threshold than a raw edit distance.

Cons of Thefuzz

  • TheFuzz does not include phonetic encodings (Soundex, Metaphone, NYSIIS) or metrics such as Jaro-Winkler and Hamming distance, all of which Jellyfish provides.
  • TheFuzz is the renamed successor to FuzzyWuzzy and relies on a separate library for the underlying distance calculations (RapidFuzz in recent releases), which adds a dependency.
  • The performance of TheFuzz may be slower than Jellyfish for certain string comparison tasks.

Code Comparison

Jellyfish:

from jellyfish import levenshtein_distance
levenshtein_distance("hello", "world")  # Output: 4

Thefuzz:

from thefuzz import fuzz
fuzz.ratio("hello", "world")  # returns a similarity score from 0 to 100, not an edit distance

The two calls look similar, but they return different things: Jellyfish returns the raw edit distance, while TheFuzz's fuzz module has no raw Levenshtein function and instead returns a normalized similarity score.

README

Overview

jellyfish is a library for approximate & phonetic matching of strings.

Source: https://github.com/jamesturk/jellyfish

Documentation: https://jamesturk.github.io/jellyfish/

Issues: https://github.com/jamesturk/jellyfish/issues

Included Algorithms

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaccard Index
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
>>> jellyfish.jaro_similarity('jellyfish', 'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
1

>>> jellyfish.metaphone('Jellyfish')
'JLFX'
>>> jellyfish.soundex('Jellyfish')
'J412'
>>> jellyfish.nysiis('Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex('Jellyfish')
'JLLFSH'
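
The jaro_similarity value in the example above can be reproduced by hand. The following is a minimal pure-Python sketch of the standard Jaro formula, (m/|s1| + m/|s2| + (m - t)/m) / 3, where m is the number of matching characters within the match window and t is the number of transpositions. It is an illustration under that textbook definition, not Jellyfish's implementation; jaro_sketch is a hypothetical name.

```python
def jaro_sketch(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, from 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


print(jaro_sketch("jellyfish", "smellyfish"))
```

For 'jellyfish' and 'smellyfish' there are 8 matches and no transpositions, giving (8/9 + 8/10 + 1) / 3, which agrees with the value shown above to the precision printed.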
