python-Levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

1,274

157


Top Related Projects

pyperclip

1,732

Python module for cross-platform clipboard functions.

jellyfish

2,115

🪼 a python library for doing approximate and phonetic matching of strings.

RapidFuzz

2,969

Rapid fuzzy string matching in Python using various string metrics

paper-tips-and-tricks

3,662

Best practice and tips & tricks to write scientific papers in LaTeX, with figures generated in Python or Matlab.

Quick Overview

The python-Levenshtein library is a Python C extension that computes the Levenshtein (edit) distance, a widely used metric for how different two strings are: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. It is useful in applications such as spell-checking, text processing, and data analysis.

Pros

Fast and Efficient: The library is written in C, which makes it significantly faster than pure Python implementations of the Levenshtein distance algorithm.
Established: The package has been around for many years and is available on PyPI, although the README states that the current maintainer is looking for a successor.
Flexible: Besides distance, the library exposes related functions such as ratio, editops, and setratio for similarity scores, edit operations, and set comparison.
Well-Documented: The project has good documentation, including examples and usage guides, which makes it easy to get started with.

Cons

Limited to String Comparison: The library focuses on string comparison (edit distance, edit operations, similarity ratios, median strings, and sequence and set similarity) and offers nothing outside that area.
Dependency on C Extensions: The library requires the installation of C extensions, which can be more complex than installing a pure Python library.
Limited to Python and C: Levenshtein.c can also be compiled as a plain C library, but no bindings for other programming languages are provided.
Potential Performance Issues: While the library is generally fast, it may still experience performance issues when working with very large strings or large datasets.

Code Examples

Here are a few examples of how to use the python-Levenshtein library:

from Levenshtein import distance

# Calculate the Levenshtein distance between two strings
distance("hello", "world")  # Output: 4

from Levenshtein import ratio

# Calculate the similarity ratio between two strings
ratio("hello", "world")  # Output: 0.2

from Levenshtein import editops

# Get the edit operations required to transform one string into another
# Each operation is a tuple: (operation, source position, destination position)
editops("hello", "world")  # Output: [('replace', 0, 0), ('replace', 1, 1), ('replace', 2, 2), ('replace', 4, 4)]

from Levenshtein import setratio

# Calculate the set similarity ratio between two sequences of strings (order does not matter)
setratio(["hello", "world"], ["world", "hello"])  # Output: 1.0
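The ratio value is easier to interpret once the formula behind it is visible. The following is a minimal pure-Python sketch, assuming that ratio counts a substitution as two edits (one deletion plus one insertion) and normalizes by the combined length of both strings. The names weighted_distance and ratio_sketch are hypothetical helpers for illustration, not part of the library.

```python
def weighted_distance(a: str, b: str, substitution_cost: int = 2) -> int:
    """Edit distance where insertions and deletions cost 1 and a
    substitution costs `substitution_cost` (assumed to be 2 here)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,       # delete char_a
                current[j - 1] + 1,    # insert char_b
                previous[j - 1] + (0 if char_a == char_b else substitution_cost),
            ))
        previous = current
    return previous[-1]


def ratio_sketch(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for no overlap."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return (total - weighted_distance(a, b)) / total


print(ratio_sketch("hello", "world"))  # 0.2
```

Under this weighting, "hello" and "world" share only one alignable character, so the weighted distance is 8 out of a combined length of 10.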

Getting Started

To get started with the python-Levenshtein library, you can install it using pip:

pip install python-Levenshtein

Once installed, you can import the necessary functions and start using the library in your Python code. Here's a simple example:

from Levenshtein import distance

# Calculate the Levenshtein distance between two strings
print(distance("hello", "world"))  # Output: 4

For more advanced usage and customization, you can refer to the project's documentation.
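As an illustration of the spell-checking use case, the sketch below picks the closest entry from a word list. It imports distance from the library when available and otherwise falls back to a pure-Python equivalent; the function closest_word and the vocabulary are hypothetical and exist only for this example.

```python
try:
    from Levenshtein import distance  # C implementation, if installed
except ImportError:
    def distance(a: str, b: str) -> int:
        """Pure-Python fallback computing the plain Levenshtein distance."""
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a, start=1):
            current = [i]
            for j, char_b in enumerate(b, start=1):
                current.append(min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution
                ))
            previous = current
        return previous[-1]


def closest_word(word: str, vocabulary: list) -> str:
    """Return the vocabulary entry with the smallest edit distance to `word`.
    Ties are resolved in favour of the entry that appears first."""
    return min(vocabulary, key=lambda candidate: distance(word, candidate))


vocabulary = ["hello", "world", "python"]  # hypothetical word list
print(closest_word("wrld", vocabulary))    # world
```

A linear scan like this is fine for small word lists; for large vocabularies the number of distance calls grows with the list size, which is where the performance caveat above applies.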

Competitor Comparisons

pyperclip

1,732

Python module for cross-platform clipboard functions.

Pros of pyperclip

pyperclip is a cross-platform clipboard library, allowing you to copy and paste text on Windows, macOS, and Linux.
The library is easy to use and has a simple API, making it accessible for beginners.
pyperclip is actively maintained and has a larger community compared to python-Levenshtein.

Cons of pyperclip

pyperclip does not provide any functionality related to string similarity or distance calculation, which is the primary focus of python-Levenshtein.
The library may have limited functionality compared to more specialized clipboard management tools.
pyperclip does not offer the same level of performance and optimization as python-Levenshtein for certain tasks.

Code Comparison

pyperclip:

import pyperclip
pyperclip.copy("Hello, World!")
text = pyperclip.paste()
print(text)  # Output: "Hello, World!"

python-Levenshtein:

from Levenshtein import distance
distance("hello", "world")  # Output: 4

fuzzywuzzy

9,249

Fuzzy String Matching in Python

Pros of FuzzyWuzzy

FuzzyWuzzy provides a more user-friendly API with functions like process.extract() and process.extractOne() that make it easier to find the best match for a given input.
FuzzyWuzzy builds several scorers on top of the basic ratio, including partial_ratio, token_sort_ratio, and token_set_ratio.
FuzzyWuzzy has better documentation and a more active community, with more contributors and more frequent updates.

Cons of FuzzyWuzzy

FuzzyWuzzy falls back to Python's difflib.SequenceMatcher when python-Levenshtein is not installed, which is considerably slower than calling the C library directly.
python-Levenshtein is an optional dependency of FuzzyWuzzy, so getting full speed means installing both packages.
FuzzyWuzzy may be overkill for simple use cases where the basic Levenshtein distance is sufficient.

Code Comparison

Here's a simple example of comparing two strings with each library. Note that the Levenshtein example returns an edit distance, while the FuzzyWuzzy example returns a similarity score from 0 to 100:

Levenshtein:

from Levenshtein import distance

print(distance("hello", "world"))  # Output: 4

FuzzyWuzzy:

from fuzzywuzzy import fuzz

print(fuzz.ratio("hello", "world"))  # Output: 20

The Levenshtein library gives direct access to the raw edit distance, while FuzzyWuzzy provides a higher-level API that reports similarity as a percentage and adds helpers for picking the best match from a list.

jellyfish

2,115

🪼 a python library for doing approximate and phonetic matching of strings.

Pros of Jellyfish

Jellyfish provides a wider range of string similarity and distance metrics, including Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and more.
Jellyfish is actively maintained and has a larger community, with more contributors and more frequent updates.
Jellyfish has better documentation and more examples, making it easier to get started and understand the available functionality.

Cons of Jellyfish

Jellyfish is a larger and more complex library, which may be overkill if you only need basic Levenshtein distance functionality.
Jellyfish has a slightly higher learning curve compared to the more focused python-Levenshtein library.
Jellyfish may have a slightly higher performance overhead for simple Levenshtein distance calculations, as it has to handle a wider range of functionality.

Code Comparison

python-Levenshtein:

from Levenshtein import distance
distance("hello", "world")  # Output: 4

Jellyfish:

import jellyfish
jellyfish.levenshtein_distance("hello", "world")  # Output: 4

As you can see, the basic usage of Levenshtein distance is very similar between the two libraries, with Jellyfish providing a slightly more explicit function name.

RapidFuzz

2,969

Rapid fuzzy string matching in Python using various string metrics

Pros of RapidFuzz

RapidFuzz is implemented mostly in C++ (with Cython bindings), so like python-Levenshtein it runs as compiled code rather than pure Python, and it is distributed under the MIT license rather than the GPL.
RapidFuzz supports a wider range of string similarity algorithms, including Levenshtein, Damerau-Levenshtein, and Jaro-Winkler.
RapidFuzz provides a more user-friendly API with additional features like fuzzy string matching and string normalization.

Cons of RapidFuzz

Building RapidFuzz from source requires a C++ toolchain and Cython; prebuilt wheels avoid this on common platforms.
The installation process for RapidFuzz may be more complex, especially on certain platforms, compared to the simpler installation of python-Levenshtein.
RapidFuzz may have a slightly higher memory footprint due to the Cython-based implementation.

Code Comparison

python-Levenshtein:

from Levenshtein import distance
distance("hello", "world")  # Output: 4

RapidFuzz:

from rapidfuzz.distance import Levenshtein
Levenshtein.distance("hello", "world")  # Output: 4

Both calls return the same value; in RapidFuzz the metric lives in the rapidfuzz.distance.Levenshtein namespace alongside the other distance metrics.

paper-tips-and-tricks

3,662

Best practice and tips & tricks to write scientific papers in LaTeX, with figures generated in Python or Matlab.

Pros of paper-tips-and-tricks

Provides a comprehensive collection of tips and tricks for writing and publishing academic papers, covering a wide range of topics such as LaTeX, figures, citations, and more.
Includes contributions from multiple authors, providing diverse perspectives and experiences.
Regularly updated with new content, ensuring the information remains relevant and up-to-date.

Cons of paper-tips-and-tricks

Primarily focused on academic paper writing, which may not be directly applicable to other types of technical writing or software development.
The repository is a writing guide rather than a software library, so it is of limited use to developers looking for reusable code.
The content is primarily in the form of Markdown files, which may not be as visually appealing or interactive as a website or a more structured documentation format.

Code Comparison

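The two repositories serve unrelated purposes, so the snippets below are a contrast rather than a like-for-like comparison.

python-Levenshtein (the library itself is written in C; this pure-Python function only illustrates the algorithm it computes):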

def distance(s1, s2):
    """
    Calculates the Levenshtein distance between two strings.
    """
    if len(s1) < len(s2):
        return distance(s2, s1)

    if len(s2) == 0:
        return len(s1)

    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row

    return previous_row[-1]

paper-tips-and-tricks:

# Figures

## Subfigures

To create subfigures, you can use the `subfig` package. Here's an example:

```latex
\usepackage{subfig}

\begin{figure}
  \centering
  \subfloat[Subfigure 1 caption]{{\includegraphics[width=5cm]{figure1a.png} }}%
  \qquad
  \subfloat[Subfigure 2 caption]{{\includegraphics[width=5cm]{figure1b.png} }}%
  \caption{Figure caption}
  \label{fig:example}
\end{figure}
```

This will create a figure with two subfigures, each with its own caption.


README

.. contents ::

Maintainer wanted

|MaintainerWanted|_

.. |MaintainerWanted| image:: https://img.shields.io/badge/maintainers-wanted-red.svg
.. _MaintainerWanted: https://github.com/pickhardt/maintainers-wanted

I am looking for a new maintainer for the project, as it is apparent that I haven't needed this particular library for well over 7 years now, due to it being a C-only library and its somewhat restrictive original license.

Introduction

The Levenshtein Python C extension module contains functions for fast computation of

Levenshtein (edit) distance, and edit operations
string similarity
approximate median strings, and generally string averaging
string sequence and set similarity
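To make "edit operations" concrete, here is a minimal pure-Python sketch that recovers one optimal list of operations by backtracking through the dynamic-programming table. It assumes the (operation, source position, destination position) tuple format; editops_sketch is a hypothetical name, and when several optimal edit sequences exist, the one it returns may differ from the one the C implementation chooses.

```python
def editops_sketch(source: str, dest: str) -> list:
    """Return one minimal list of (operation, source_pos, dest_pos) tuples
    that turns `source` into `dest`."""
    rows, cols = len(source) + 1, len(dest) + 1
    # dist[i][j] holds the edit distance between source[:i] and dest[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == dest[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete from source
                dist[i][j - 1] + 1,         # insert into source
                dist[i - 1][j - 1] + cost,  # keep or replace
            )

    # Walk back from the bottom-right corner, recording each edit taken.
    ops = []
    i, j = len(source), len(dest)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == dest[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, nothing to record
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1
            ops.append(("replace", i, j))
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1
            ops.append(("delete", i, j))
        else:
            j -= 1
            ops.append(("insert", i, j))
    ops.reverse()
    return ops


print(editops_sketch("hello", "world"))
# [('replace', 0, 0), ('replace', 1, 1), ('replace', 2, 2), ('replace', 4, 4)]
```

The number of operations returned always equals the edit distance, which is a convenient sanity check.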

It supports both normal and Unicode strings.

Python 2.2 or newer is required; Python 3 is supported.

StringMatcher.py is an example SequenceMatcher-like class built on top of Levenshtein. It lacks some of SequenceMatcher's functionality but, on the other hand, has some extras.

Levenshtein.c can be used as a pure C library, too. You only have to define the NO_PYTHON preprocessor symbol (-DNO_PYTHON) when compiling it. The functionality is similar to that of the Python extension. No separate docs are provided yet, so read the source. The two builds are not interchangeable, however:

C functions exported when compiling with -DNO_PYTHON (see Levenshtein.h) are not exported when compiling as a Python extension (and vice versa)
The Unicode character type used with -DNO_PYTHON is wchar_t, while the Python extension uses Py_UNICODE; they may be the same, but don't count on it

Installation

pip install python-Levenshtein

Documentation

`Documentation for the current version <https://rawgit.com/ztane/python-Levenshtein/master/docs/Levenshtein.html>`_

gendoc.sh generates HTML API documentation. You probably want a self-contained rather than an includable version, so run ./gendoc.sh --selfcontained. It needs Levenshtein already installed, as well as genextdoc.py.

License

Levenshtein is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

See the file COPYING for the full text of GNU General Public License version 2.

History

This package was long missing from the Python Package Index and available as source checkout only, but can now be found on `PyPI again <https://pypi.python.org/pypi/python-Levenshtein>`_.

We needed to restore this package for the `Go Mobile for Plone <http://webandmobile.mfabrik.com>`_ and `Pywurfl <http://celljam.net/>`_ projects, which depend on it.

Source code

http://github.com/ztane/python-Levenshtein/

Authors

Maintainer: Antti Haapala <antti@haapala.name>
Python 3 compatibility: Esa Määttä
Jonatas CD: Fixed documentation generation
Previous maintainer: `Mikko Ohtamaa <http://opensourcehacker.com>`_
Original code: David Necas (Yeti)
