Top Related Projects
Fuzzy String Matching in Python
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
Fuzzy String Matching in Python
Quick Overview
FuzzySet.js is a fuzzy string matching library for JavaScript. It provides a way to perform approximate string matching, allowing you to find the closest match to a given string within a set of strings. This is particularly useful for tasks like autocomplete, spell checking, or finding similar items in a dataset.
Pros
- Easy to use with a simple API
- Supports both browser and Node.js environments
- Customizable matching threshold and gram size
- Lightweight with no external dependencies
Cons
- Scores character-level similarity only (n-grams and, by default, Levenshtein distance); it does not capture semantic similarity
- Performance may degrade with large datasets
- Not actively maintained (last update was in 2019)
- Limited documentation and examples
Code Examples
- Creating a FuzzySet and adding items:
const FuzzySet = require('fuzzyset'); // published on npm as "fuzzyset" (formerly "fuzzyset.js")
const set = FuzzySet(['apple', 'banana', 'orange']);
- Finding the closest match:
const result = set.get('aple');
console.log(result); // [[0.8, 'apple']]
- Adding items dynamically and adjusting the threshold:
set.add('grape');
set.add('pineapple');
// Use a new name: `result` is already declared above.
const closeMatch = set.get('grap', null, 0.7);
console.log(closeMatch); // [[0.8, 'grape']]
Getting Started
To use FuzzySet.js in your project, follow these steps:
- Install the package:
npm install fuzzyset
- Import and use it in your JavaScript code:
const FuzzySet = require('fuzzyset');
const set = FuzzySet(['hello', 'world', 'fuzzy', 'matching']);
const result = set.get('helo');
console.log(result); // [[0.8, 'hello']]
- For browser usage, include the script in your HTML:
<script src="https://cdnjs.cloudflare.com/ajax/libs/fuzzyset.js/0.0.91/fuzzyset.min.js"></script>
Competitor Comparisons
Fuzzy String Matching in Python
Pros of fuzzywuzzy
- More comprehensive set of string matching algorithms, including Levenshtein distance and token-based matching
- Better support for handling non-ASCII characters and Unicode strings
- Includes built-in functions for extracting best matches from a list of choices
Cons of fuzzywuzzy
- Generally slower performance compared to fuzzyset.js, especially for large datasets
- Needs the optional python-Levenshtein package for good performance; without it, it falls back to the slower difflib module from Python's standard library
- Less suitable for browser-based applications due to its Python implementation
Code Comparison
fuzzywuzzy:
from fuzzywuzzy import fuzz
ratio = fuzz.ratio("this is a test", "this is a test!")
fuzzyset.js:
const FuzzySet = require('fuzzyset');
const a = FuzzySet(['this is a test']);
const result = a.get('this is a test!');
Both libraries provide simple interfaces for fuzzy string matching, but fuzzywuzzy offers more built-in algorithms and options for customization. fuzzyset.js is more lightweight and better suited for JavaScript environments, while fuzzywuzzy provides a broader range of functionalities at the cost of performance and language limitations.
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
Pros of python-Levenshtein
- Implemented in C, offering superior performance for large-scale operations
- Provides a wider range of string similarity algorithms beyond Levenshtein distance
- Supports Unicode strings natively
Cons of python-Levenshtein
- Limited to Python environment, not suitable for JavaScript projects
- Requires compilation, which may be challenging on some systems
- Less intuitive API for simple fuzzy matching tasks
Code Comparison
python-Levenshtein:
from Levenshtein import distance
result = distance("kitten", "sitting")
fuzzyset.js:
const FuzzySet = require('fuzzyset');
const a = FuzzySet(['kitten']);
const result = a.get('sitting');
python-Levenshtein provides a more direct approach to calculating string distances, while fuzzyset.js offers a higher-level API for fuzzy matching. The python-Levenshtein example calculates the Levenshtein distance between two strings, whereas fuzzyset.js creates a set of strings and performs a fuzzy search against it.
Both libraries serve different purposes and environments. python-Levenshtein is better suited for high-performance, low-level string operations in Python, while fuzzyset.js provides an easy-to-use fuzzy matching solution for JavaScript applications.
Fuzzy String Matching in Python
Pros of thefuzz
- More scoring strategies built on Levenshtein-style edit distance, including simple, partial, token-sort, and token-set ratios
- Better performance for large datasets due to optimized C implementations
- Active development and maintenance with regular updates
Cons of thefuzz
- Larger library size, which may impact load times in browser environments
- Slightly more complex API, requiring more setup for basic use cases
- Python-based, which may not be ideal for JavaScript-centric projects
Code Comparison
thefuzz:
from thefuzz import fuzz
ratio = fuzz.ratio("this is a test", "this is a test!")
fuzzyset.js:
const FuzzySet = require('fuzzyset');
const a = FuzzySet(['this is a test']);
const result = a.get('this is a test!');
Both libraries provide fuzzy string matching capabilities, but thefuzz offers a wider range of algorithms and is better suited for large-scale applications. fuzzyset.js, on the other hand, is more lightweight and easier to integrate into JavaScript projects. The choice between the two depends on the specific requirements of your project, such as the programming language, performance needs, and the complexity of string matching tasks.
README
Fuzzyset - A fuzzy string set for javascript
Fuzzyset is a data structure that performs something akin to full-text search against data to determine likely misspellings and approximate string matches.
Usage
The usage is simple. Just add a string to the set, and ask for it later by using .get:
const a = FuzzySet();
a.add("michael axiak");
a.get("micael asiak");
// will be [[0.8461538461538461, 'michael axiak']];
The result will be an array of [score, matched_value] arrays.
The score is between 0 and 1, with 1 being a perfect match.
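The score in the example above is consistent with a Levenshtein distance normalized by the length of the longer string: "micael asiak" and "michael axiak" differ by two edits and the longer string has 13 characters, so 1 - 2/13 ≈ 0.846. The sketch below shows one way such a score can be computed. It is an illustration under that assumption, not the library's own code, and `levenshtein` and `similarityScore` are hypothetical names.

```javascript
// Classic dynamic-programming Levenshtein distance
// (counts insertions, deletions, and substitutions).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: 1 means identical, lower values mean more edits
// relative to the longer string.
function similarityScore(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

console.log(similarityScore('micael asiak', 'michael axiak')); // 0.8461538461538461
```

Dividing by the longer length keeps the score in the 0 to 1 range regardless of which string is the query.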
Install
npm install fuzzyset
(Used to be fuzzyset.js.)
Then:
import FuzzySet from 'fuzzyset'
// or, depending on your JavaScript environment...
const FuzzySet = require('fuzzyset')
Or for use directly on the web:
<script type="text/javascript" src="dist/fuzzyset.js"></script>
This library should work just fine with TypeScript, too.
Construction Arguments
- array: An array of strings to initialize the data structure with
- useLevenshtein: Whether or not to use the Levenshtein distance to determine the match scoring. Default: true
- gramSizeLower: The lower bound of gram sizes to use, inclusive (see interactive documentation). Default: 2
- gramSizeUpper: The upper bound of gram sizes to use, inclusive (see interactive documentation). Default: 3
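To make "gram size" concrete, here is a minimal sketch of splitting a string into overlapping grams of sizes 2 and 3, the default lower and upper bounds. The `grams` helper is hypothetical, and the lowercasing and "-" padding are assumptions made for this illustration; the interactive documentation is the reference for what the library actually does.

```javascript
// Hypothetical helper: split a string into overlapping grams of a given size.
// Lowercasing and padding both ends with '-' are assumptions for this sketch.
function grams(value, size) {
  const padded = `-${value.toLowerCase()}-`;
  const out = [];
  for (let i = 0; i + size <= padded.length; i++) {
    out.push(padded.slice(i, i + size));
  }
  return out;
}

console.log(grams('apple', 2)); // ['-a', 'ap', 'pp', 'pl', 'le', 'e-']
console.log(grams('apple', 3)); // ['-ap', 'app', 'ppl', 'ple', 'le-']
```

Two strings that share many grams are likely to be close matches, which is why a misspelling such as "aple" can still be traced back to "apple".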
Methods
- get(value, [default], [minScore=.33]): try to match a string to entries with a score of at least minScore (defaulted to .33), otherwise return null or default if it is given.
- add(value): add a value to the set, returning false if it is already in the set.
- length(): return the number of items in the set.
- isEmpty(): returns true if the set is empty.
- values(): returns an array of the values in the set.
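Because get can return null (or the supplied default) instead of an array, calling code usually needs a guard before reading the first result. The wrapper below is a small sketch of that pattern. `bestMatch` is a hypothetical helper, not part of the library, and it relies only on the get(value, default, minScore) signature listed above.

```javascript
// Hypothetical helper: return the best match as { score, value }, or a fallback.
// `set` is any object exposing a FuzzySet-style get(value, default, minScore).
function bestMatch(set, query, minScore = 0.33, fallback = null) {
  const results = set.get(query, null, minScore);
  if (!results || results.length === 0) return fallback;
  // Pick the highest score explicitly rather than relying on result order.
  const [score, value] = results.reduce((best, cur) => (cur[0] > best[0] ? cur : best));
  return { score, value };
}
```

With a real set this might be called as bestMatch(FuzzySet(['apple']), 'aple'), which would yield an object instead of a nested array and never throw on a miss.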
Interactive Documentation
To play with the library or see how it works internally, check out the amazing interactive documentation:
Develop
To contribute to the library, edit the lib/fuzzyset.js file, then run npm run build to generate all the different file formats in the dist/ directory. Or run npm run dev while developing to auto-build as you change files.
License
This package is licensed under the Prosperity Public License 3.0.
That means that this package is free to use for non-commercial projects: personal projects, public benefit projects, research, education, etc. (see the license for full details). If your project is commercial (even for internal use at your company), you have 30 days to try this package for free before you have to pay a one-time licensing fee of $42.
You can purchase a commercial license instantly here.
Why this license scheme? Since I quit tech to become a therapist, my income is much lower (due to the unjust costs of mental health care in the US, but don't get me started). I'm asking for paid licenses for Fuzzyset.js to support all the free work I've done on this project over the past 10 years (!) and so I can live a sustainable life in service of my therapy clients. If you're a small operation that would like to use Fuzzyset.js but can't swing the license cost, please reach out to me and we can work something out.