fuzzywuzzy

Fuzzy String Matching in Python

Top Related Projects

RapidFuzz

2,606

Rapid fuzzy string matching in Python using various string metrics

python-Levenshtein

1,260

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

textdistance

3,354

📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

jellyfish

2,040

🪼 a python library for doing approximate and phonetic matching of strings.

Quick Overview

The fuzzywuzzy project is a Python library that provides a set of functions to perform fuzzy string matching. It can be used to compare and match similar strings, even if they are not exactly the same. This is useful in a variety of applications, such as data cleaning, record linkage, and spell-checking.
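For data cleaning and record linkage, the usual pattern is to score a messy value against a list of canonical values. You then keep the best match above a cutoff. FuzzyWuzzy's own helper for this is process.extractOne.

The sketch below only illustrates the idea, using the standard library's difflib.SequenceMatcher (the pure-Python matcher fuzzywuzzy falls back to). The helper names similarity and best_match are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple


def similarity(a: str, b: str) -> int:
    """0-100 score in the style of fuzz.ratio, using difflib's 2*M/T ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query: str, choices: Sequence[str],
               cutoff: int = 0) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (choice, score) for the best-scoring choice, or None."""
    best = None
    for choice in choices:
        # Lowercase both sides so case differences do not lower the score.
        score = similarity(query.lower(), choice.lower())
        if score >= cutoff and (best is None or score > best[1]):
            best = (choice, score)
    return best


canonical = ["New York Mets", "New York Yankees", "Atlanta Braves"]
print(best_match("new york mets", canonical))   # ('New York Mets', 100)
print(best_match("zzz", canonical, cutoff=90))  # None
```

Lowercasing here is a simple stand-in for the string preprocessing that some FuzzyWuzzy scorers apply. The cutoff keeps unrelated values from being forced onto a canonical entry.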

Pros

Flexible Matching: fuzzywuzzy offers several scorers built on Levenshtein-style string similarity, including simple ratio, partial ratio, token sort ratio, token set ratio, and the weighted WRatio, allowing you to choose the most appropriate one for your use case.
Easy to Use: The library provides a simple and intuitive API, making it easy to integrate into your Python projects.
Efficient Performance: fuzzywuzzy itself is pure Python, but when the optional python-Levenshtein C extension is installed it uses it for the core comparisons, which is significantly faster than the pure-Python fallback.
Maintained Successor: the project has been renamed to TheFuzz (https://github.com/seatgeek/thefuzz), where development continues.

Cons

Limited to Python: fuzzywuzzy is a Python-specific library, which means it may not be suitable for projects in other programming languages.
Potential for False Positives: Depending on the matching algorithm and the data you're working with, fuzzywuzzy may sometimes return false positive matches, which may require additional validation.
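Optional Dependency on the python-Levenshtein package: for good performance fuzzywuzzy relies on python-Levenshtein, a C extension that may be more difficult to install on certain platforms. Without it, fuzzywuzzy falls back to the slower difflib.SequenceMatcher and emits a warning.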
Limited Customization: While fuzzywuzzy provides several matching algorithms, the options for customizing the matching process may be limited compared to more advanced fuzzy string matching libraries.
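The optional dependency described above follows a common Python pattern: try to import the C extension and fall back to difflib if it is missing.

This sketch illustrates that pattern only; it is not FuzzyWuzzy's actual source. It assumes python-Levenshtein's top-level Levenshtein.ratio function, which returns a 0-1 similarity. The ratio wrapper and the BACKEND flag are hypothetical names.

```python
import warnings
from difflib import SequenceMatcher

try:
    # Optional C extension (python-Levenshtein); ratio() returns a 0-1 similarity.
    from Levenshtein import ratio as _similarity
    BACKEND = "python-Levenshtein"
except ImportError:
    warnings.warn("python-Levenshtein not found; using slower pure-Python difflib")

    def _similarity(a: str, b: str) -> float:
        # Pure-Python fallback from the standard library.
        return SequenceMatcher(None, a, b).ratio()

    BACKEND = "difflib"


def ratio(a: str, b: str) -> int:
    """Hypothetical wrapper: scale the backend's 0-1 similarity to a 0-100 integer."""
    return round(100 * _similarity(a, b))


print(BACKEND, ratio("hello", "world"))
```

For this pair both backends produce the same 0-100 score. On other inputs their scores can differ slightly, because the two libraries find matching characters in different ways.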

Code Examples

Here are a few examples of how to use the fuzzywuzzy library:

from fuzzywuzzy import fuzz

# Comparing two strings
print(fuzz.ratio("hello", "hello"))  # Output: 100
print(fuzz.ratio("hello", "world"))  # Output: 20

# Partial string matching
print(fuzz.partial_ratio("hello", "hello world"))  # Output: 100
print(fuzz.partial_ratio("hello", "world hello"))  # Output: 100

# Token-based matching
print(fuzz.token_sort_ratio("hello world", "world hello"))  # Output: 100
print(fuzz.token_set_ratio("hello world", "hello there world"))  # Output: 100
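To see where scores like these come from, here is a simplified re-creation of three scorers using only difflib.SequenceMatcher. Its ratio() is 2*M/T, where M is the number of matched characters and T is the total length of both strings.

This is a sketch, not FuzzyWuzzy's code:
- The real partial_ratio picks candidate windows from SequenceMatcher's matching blocks instead of trying every window.
- The real token scorers also strip punctuation and, by default, non-ASCII characters.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # difflib's ratio() is 2*M/T: M matched characters, T = len(a) + len(b)
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a: str, b: str) -> int:
    # Simplified: best ratio of the shorter string against every
    # same-length window of the longer string.
    shorter, longer = sorted((a, b), key=len)
    n = len(shorter)
    if n == 0:
        return 0
    return max(ratio(shorter, longer[i:i + n]) for i in range(len(longer) - n + 1))


def token_sort_ratio(a: str, b: str) -> int:
    # Lowercase, split on whitespace, sort the tokens, then compare.
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return ratio(normalize(a), normalize(b))


print(ratio("hello", "world"))                         # 20
print(partial_ratio("hello", "world hello"))           # 100
print(token_sort_ratio("hello world", "world hello"))  # 100
```

With difflib, "hello" vs "world" scores 20 because the only matching block found is a single "l". That gives M = 1 and T = 10, so 2 * 1 / 10 = 0.2.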

Getting Started

To get started with fuzzywuzzy, you can install the library using pip:

pip install fuzzywuzzy

Once installed, you can import the necessary functions and start using the library in your Python code:

from fuzzywuzzy import fuzz

# Compare two strings
result = fuzz.ratio("hello", "hello world")
print(result)  # Output: 62

# Perform partial string matching
result = fuzz.partial_ratio("hello", "hello world")
print(result)  # Output: 100

# Use token-based matching
result = fuzz.token_sort_ratio("hello world", "world hello")
print(result)  # Output: 100
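The score for ratio("hello", "hello world") can be checked by hand. The strings share the 5-character block "hello" and have a combined length of 16, so the raw score is 100 * 2 * 5 / 16 = 62.5.

The snippet below reproduces that arithmetic with difflib. The final step assumes the score is rounded with Python 3's built-in round(), which rounds halves to even, so 62.5 becomes 62.

```python
from difflib import SequenceMatcher

a, b = "hello", "hello world"
matcher = SequenceMatcher(None, a, b)

# M: total size of the matching blocks ("hello" -> 5 characters)
matched = sum(block.size for block in matcher.get_matching_blocks())
total = len(a) + len(b)            # T = 5 + 11 = 16

score = 100 * 2 * matched / total  # 62.5
print(matched, total, score, round(score))  # 5 16 62.5 62
```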

For more advanced usage and customization, you can refer to the fuzzywuzzy documentation.

Competitor Comparisons

RapidFuzz

2,606

Rapid fuzzy string matching in Python using various string metrics

Pros of RapidFuzz

Performance: RapidFuzz is designed to be faster than FuzzyWuzzy, especially for larger datasets.
Flexibility: RapidFuzz supports a wider range of string comparison algorithms, including Levenshtein, Damerau-Levenshtein, and Jaro-Winkler.
Scalability: RapidFuzz is implemented largely in C++ (with Cython bindings), which allows it to take advantage of low-level optimizations and perform well on large datasets.
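The difference between Levenshtein and Damerau-Levenshtein matters for typos that swap two adjacent letters.

The sketch below computes plain Levenshtein distance. With transpositions=True it computes the optimal string alignment (OSA) variant, a restricted form of Damerau-Levenshtein. edit_distance is a hypothetical helper, not a RapidFuzz function; RapidFuzz ships its own implementations in its rapidfuzz.distance module.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance; with transpositions=True, the OSA variant of Damerau-Levenshtein."""
    # d[i][j] = minimum edits to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


print(edit_distance("hello", "hlelo", transpositions=False))  # 2
print(edit_distance("hello", "hlelo"))                        # 1
```

Plain Levenshtein needs two substitutions to fix the swapped "el". The transposition-aware variant counts it as a single edit.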

Cons of RapidFuzz

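Subtle Differences: RapidFuzz implements the same scorers as FuzzyWuzzy (including partial_ratio, token_sort_ratio, token_set_ratio, and WRatio), but it returns float scores and may preprocess strings differently, so results are not always identical.
Larger API Surface: beyond its FuzzyWuzzy-style fuzz and process modules, RapidFuzz adds a distance module with many metrics, which can take longer to learn.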
Smaller Community: FuzzyWuzzy has a larger user base and more community support than RapidFuzz.

Code Comparison

FuzzyWuzzy:

from fuzzywuzzy import fuzz

ratio = fuzz.ratio("hello", "world")
print(ratio)  # Output: 20

RapidFuzz:

from rapidfuzz import fuzz

ratio = fuzz.ratio("hello", "world")
print(ratio)  # Output: 20.0

As you can see, the basic usage of the fuzz.ratio() function is very similar between the two libraries; the main visible difference is that RapidFuzz returns a float rather than an int.

python-Levenshtein

1,260

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Pros of python-Levenshtein

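Faster than FuzzyWuzzy, especially for larger datasets, since it avoids FuzzyWuzzy's Python-level wrapper (FuzzyWuzzy itself calls python-Levenshtein when it is installed)
Exposes raw edit distances (Levenshtein.distance) as well as 0-1 similarity scores (Levenshtein.ratio)
Compares Unicode strings as-is, with no default preprocessing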

Cons of python-Levenshtein

May require a C compiler to install when no prebuilt wheel is available for the platform, which may be a barrier for some users
Lacks some of the advanced features and functionality of FuzzyWuzzy, such as partial string matching
May have a steeper learning curve for users unfamiliar with the Levenshtein distance algorithm

Code Comparison

FuzzyWuzzy:

from fuzzywuzzy import fuzz

fuzz.ratio("hello", "world")  # Output: 20
fuzz.partial_ratio("hello", "world")  # Low score: the strings share only single letters

python-Levenshtein:

import Levenshtein

Levenshtein.distance("hello", "world")  # Output: 4
Levenshtein.ratio("hello", "world")  # Output: 0.2
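The two python-Levenshtein numbers measure different things: distance counts single-character edits, while ratio is a 0-1 similarity.

The sketch below reproduces both values for this pair under one assumption about the library's internals. The assumption is that ratio() counts a substitution as two edits (a deletion plus an insertion), so ratio = (len(a) + len(b) - weighted_distance) / (len(a) + len(b)). The function names are hypothetical.

```python
def weighted_levenshtein(a: str, b: str, sub_cost: int = 1) -> int:
    """Edit distance with unit insert/delete cost and a configurable substitution cost."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                                  # delete ca
                cur[j - 1] + 1,                               # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """0-1 similarity counting a substitution as cost 2 (assumed to mirror Levenshtein.ratio)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - weighted_levenshtein(a, b, sub_cost=2)) / total


print(weighted_levenshtein("hello", "world"))  # 4
print(levenshtein_ratio("hello", "world"))     # 0.2
```

Scaled by 100, that 0.2 corresponds to the score of 20 FuzzyWuzzy reports for this pair when python-Levenshtein is installed.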

textdistance

3,354

📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

Pros of textdistance

Supports a wider range of distance algorithms, including Levenshtein, Hamming, Jaro-Winkler, and more.
Provides a common interface across algorithms, including normalization methods such as normalized_distance() and normalized_similarity().
Under more active development than FuzzyWuzzy, whose development has moved to TheFuzz.
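Normalization turns a raw distance into a 0-1 score, so pairs of different lengths can be compared on the same scale.

The sketch below uses Hamming distance and divides by the longer string's length. Two choices here are assumptions for illustration and may not match textdistance's exact definitions:
- the normalization rule (dividing by the longer length);
- counting surplus characters of the longer string as mismatches.

The function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Positions that differ; surplus characters in the longer string count as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def normalized_distance(a: str, b: str) -> float:
    # 0.0 means identical, 1.0 means every position differs.
    longest = max(len(a), len(b))
    return hamming(a, b) / longest if longest else 0.0


def normalized_similarity(a: str, b: str) -> float:
    # Complement of normalized_distance, computed directly to avoid float noise.
    longest = max(len(a), len(b))
    return (longest - hamming(a, b)) / longest if longest else 1.0


print(hamming("hello", "world"))                # 4
print(normalized_distance("hello", "world"))    # 0.8
print(normalized_similarity("hello", "world"))  # 0.2
```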

Cons of textdistance

May have a steeper learning curve due to the broader set of features and algorithms.
Potentially slower performance for simple use cases compared to the more focused FuzzyWuzzy library.
May have less integration with other popular libraries and frameworks compared to FuzzyWuzzy.

Code Comparison

FuzzyWuzzy:

from fuzzywuzzy import fuzz
fuzz.ratio("hello", "world")  # Output: 20

textdistance:

import textdistance
textdistance.levenshtein("hello", "world")  # Output: 4

jellyfish

2,040

🪼 a python library for doing approximate and phonetic matching of strings.

Pros of Jellyfish

Broader Functionality: Jellyfish provides a wider range of string similarity and distance metrics, including Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and more.
Performance: Jellyfish is generally faster than FuzzyWuzzy, especially for larger datasets, due to its optimized implementation.
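Multilingual Support: Jellyfish works directly on Unicode strings, whereas several FuzzyWuzzy scorers (such as token_sort_ratio and WRatio) strip non-ASCII characters by default (force_ascii=True), which can hurt matching of non-English text.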

Cons of Jellyfish

Fewer Text-Matching Scorers: FuzzyWuzzy offers scorers aimed at messy real-world strings, such as partial ratio, token sort ratio, and token set ratio, which Jellyfish does not provide.
Less Intuitive API: The Jellyfish API may be less intuitive and user-friendly compared to the more straightforward FuzzyWuzzy API.
Smaller Community: FuzzyWuzzy has a larger user base and more active community, which can mean more support and resources available.

Code Comparison

FuzzyWuzzy:

from fuzzywuzzy import fuzz

print(fuzz.ratio("hello", "world"))  # Output: 20
print(fuzz.partial_ratio("hello", "hello world"))  # Output: 100

Jellyfish:

import jellyfish

print(jellyfish.levenshtein_distance("hello", "world"))  # Output: 4
print(jellyfish.jaro_winkler_similarity("hello", "hello world"))  # Output: approximately 0.891 (older releases named this function jaro_winkler)
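Jaro-Winkler rewards strings that share a prefix. That is why "hello" scores fairly high against "hello world", even though the second string is more than twice as long.

The sketch below implements the textbook formula:
- a match window of max(len) // 2 - 1;
- a Winkler prefix of up to 4 characters with scaling factor 0.1;
- the boost applied only when the Jaro score exceeds 0.7.

It is an illustration, not jellyfish's code, and the function names are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    matched1 = [c for c, u in zip(s1, used1) if u]
    matched2 = [c for c, u in zip(s2, used2) if u]
    t = sum(x != y for x, y in zip(matched1, matched2)) / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    j = jaro(s1, s2)
    if j <= 0.7:  # Winkler's boost threshold
        return j
    prefix = 0
    for x, y in zip(s1[:max_prefix], s2[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(jaro("hello", "hello world"))          # about 0.818
print(jaro_winkler("hello", "hello world"))  # about 0.891
```

For this pair all five letters of "hello" match with no transpositions, so Jaro = (1 + 5/11 + 1) / 3 = 9/11. The shared 4-letter prefix then lifts it to 9/11 + 0.4 * 2/11 = 9.8/11.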

README

This project has been renamed and moved to https://github.com/seatgeek/thefuzz

TheFuzz version 0.19.0 correlates with this project's 0.18.0 version with thefuzz replacing all instances of this project's name.

PRs and issues here will need to be resubmitted to TheFuzz.
