Top Related Projects
Style and Grammar Checker for 25+ Languages
:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion
This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of the Google's Trillion Word Corpus.
📜 A collection of wordlists for many different usages
Quick Overview
The wooorm/dictionaries repository is a comprehensive collection of dictionaries for various languages, primarily intended for spell-checking purposes. It provides a standardized format for dictionaries across multiple languages, making it easier for developers to integrate spell-checking capabilities into their applications.
Pros
- Extensive language support with dictionaries for numerous languages and dialects
- Consistent format across all dictionaries, simplifying integration and usage
- Regular updates and contributions from the community
- The project itself is MIT licensed, although each dictionary's index.dic and index.aff files keep their original license
Cons
- Some languages may have less comprehensive dictionaries compared to others
- Dictionaries are crawled from upstream sources rather than maintained in this repository, so quality and freshness can vary by language
- Large repository size due to the number of dictionaries included
- May require additional processing or tools to be used effectively in applications
Getting Started
To use a dictionary from this repository in your project:
- Clone the repository or download the specific dictionary file you need.
- Install a spell-checking library that can work with these dictionary files (e.g., Hunspell).
- Point your spell-checking library to the downloaded dictionary file.
Example using Node.js with the nspell package:
npm install nspell dictionary-en-gb
import nspell from 'nspell'
import enGB from 'dictionary-en-gb'
// The default export is an object holding the `aff` and `dic` buffers.
const spell = nspell(enGB)
console.log(spell.correct('color')) // false
console.log(spell.correct('colour')) // true
console.log(spell.suggest('color')) // an array of suggested spellings
Note: The exact implementation may vary depending on your programming language and chosen spell-checking library.
Competitor Comparisons
Style and Grammar Checker for 25+ Languages
Pros of LanguageTool
- Comprehensive grammar and style checker with support for multiple languages
- Offers both a standalone application and integration options for various platforms
- Actively maintained with regular updates and improvements
Cons of LanguageTool
- Larger and more complex codebase, potentially harder to contribute to or customize
- Requires more system resources due to its extensive feature set
- May have a steeper learning curve for developers looking to integrate it
Code Comparison
LanguageTool (Java):
public class SentenceTokenizer implements Tokenizer {
    public List<String> tokenize(String text) {
        List<String> sentences = new ArrayList<>();
        // Tokenization logic here
        return sentences;
    }
}
Dictionaries (JavaScript):
import en from 'dictionary-en'
// Each package exports the raw Hunspell files, not text-processing helpers.
console.log(en) // {aff: <Buffer>, dic: <Buffer>}
Summary
LanguageTool is a full-featured grammar and style checker with broad language support, while Dictionaries provides installable Hunspell dictionaries (index.aff and index.dic files) for use with spell-checking libraries. LanguageTool offers more advanced features but comes with increased complexity, while Dictionaries is lightweight and easier to integrate when only spell-checking data is needed.
:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion
Pros of english-words
- Simple, straightforward list of English words in a single file
- Easy to use and integrate into projects
- Includes a large number of words (about 479k, per the project description)
Cons of english-words
- Limited to English language only
- Lacks additional linguistic information (e.g., parts of speech, definitions)
- May include some non-standard or uncommon words
Code Comparison
english-words:
# Simple text file with one word per line
aardvark
aardwolf
aaron
aback
abacus
dictionaries:
// Each package exports the raw Hunspell affix and word-list files as buffers
{aff: <Buffer>, dic: <Buffer>}
Key Differences
- dictionaries offers multi-language support with separate files for each language
- dictionaries ships Hunspell affix (index.aff) and word-list (index.dic) files as installable packages
- english-words is a simple text file, easier to parse but less feature-rich
- dictionaries includes additional tools and scripts for processing dictionaries
- english-words focuses solely on providing a comprehensive list of English words
Both repositories serve as valuable resources for developers working on language-related projects, with dictionaries offering a more comprehensive and structured approach across multiple languages, while english-words provides a simpler, English-focused solution.
This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of the Google's Trillion Word Corpus.
Pros of google-10000-english
- Focused list of the most common English words, ideal for basic language processing
- Simple, easy-to-use format with words sorted by frequency
- Lightweight and quick to implement in projects
Cons of google-10000-english
- Limited vocabulary scope compared to comprehensive dictionaries
- Lacks additional linguistic information (e.g., parts of speech, definitions)
- May not be suitable for advanced natural language processing tasks
Code Comparison
google-10000-english:
the
of
and
to
a
dictionaries:
// Each package exports the raw Hunspell affix and word-list files as buffers
{aff: <Buffer>, dic: <Buffer>}
Summary
google-10000-english provides a straightforward list of common English words, making it ideal for simple language processing tasks. However, it covers only English and carries no affix rules. dictionaries offers a broader approach: Hunspell word lists and affix files for many languages, packaged for use with spell-checkers. While google-10000-english is easier to implement quickly, dictionaries is the better fit for spell-checking across languages.
📜 A collection of wordlists for many different usages
Pros of wordlists
- More diverse content, including specialized wordlists for security testing and penetration testing
- Regularly updated with new wordlists and contributions
- Includes wordlists in multiple languages and for various purposes (e.g., passwords, usernames)
Cons of wordlists
- Less structured organization compared to dictionaries
- May contain potentially sensitive or offensive content
- Lacks the extensive language coverage found in dictionaries
Code Comparison
wordlists:
# Example of a simple wordlist (passwords.txt)
password123
qwerty
letmein
admin
dictionaries:
// Each package exports the raw Hunspell affix and word-list files as buffers
{aff: <Buffer>, dic: <Buffer>}
Key Differences
- Purpose: wordlists focuses on security and penetration testing, while dictionaries aims to provide spell-checking dictionaries.
- Structure: wordlists uses simple text files, whereas dictionaries packages Hunspell index.aff and index.dic files for each language.
- Scope: wordlists covers a broader range of applications, while dictionaries concentrates on spell-checking and covers more languages.
- Maintenance: wordlists is more frequently updated due to its community-driven nature, while dictionaries has a more stable, curated approach.
Both repositories serve different purposes and cater to distinct user needs, making direct comparison challenging in some aspects.
README
dictionaries
Collection of normalized and installable hunspell dictionaries.
Contents
- What is this?
- When should I use this?
- Install
- Use
- List of dictionaries
- Examples
- Types
- Security
- Contribute
- License
What is this?
This monorepo is a bunch of scripts that crawls dictionaries from several sources, normalizes them, and packs them so that they can each be installed and used in one single way. Dictionaries are not maintained here but they are usable from here.
When should I use this?
You can particularly use the packages here as a programmer when integrating with other tools (such as nodehun or nspell) or when making such tools.
Install
These packages are ESM only. In Node.js (version 16+), install with npm:
npm install dictionary-en
👉 Note: replace `en` with the language code you want.
⚠️ Important: this project itself is MIT, but each `index.dic` and `index.aff` file still has its original license!
Use
import en from 'dictionary-en'
console.log(en)
// To do: use `en` somehow
Yields:
{aff: <Buffer>, dic: <Buffer>}
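To give a feel for what the `dic` buffer contains, here is a minimal sketch. It assumes the usual Hunspell layout: the first line is an approximate entry count, and each following line is a word optionally followed by `/` and affix flags. `parseDic` is a hypothetical helper and not part of these packages. It ignores details such as escaped slashes and treats anything after a tab or space in the flag part as extra data to drop.

```javascript
// Hypothetical helper, not part of `dictionary-en`: split the text of a
// Hunspell `.dic` file into its declared count and its entries.
function parseDic(text) {
  const lines = String(text)
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
  // By Hunspell convention the first line holds an (approximate) entry count.
  const declaredCount = Number.parseInt(lines[0], 10)
  const entries = lines.slice(1).map((line) => {
    // Simplification: keep only the part before the first tab.
    const head = line.split('\t')[0]
    const slash = head.indexOf('/')
    if (slash === -1) return {word: head, flags: ''}
    // Simplification: flags end at the first whitespace character.
    const flags = head.slice(slash + 1).split(/\s/)[0]
    return {word: head.slice(0, slash), flags}
  })
  return {declaredCount, entries}
}

// Usage with a tiny inline sample instead of a real dictionary:
const sample = '3\nhello\ntry/B\nwork/AB\n'
console.log(parseDic(sample))
```

With a real package, something like `parseDic(en.dic)` should work as long as `dic` is a Buffer of UTF-8 text, which is an assumption based on the output shown above.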
List of dictionaries
👉 Note: preferred BCP-47 codes are used (according to Unicode CLDR). To illustrate, as American English and Brazilian Portuguese are the most common types of English and Portuguese respectively, they get the codes `en` and `pt`.
In total 92 dictionaries are provided.
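As a small illustration of how a language code maps to a package, the sketch below builds a package name from a code. It assumes that package names are `dictionary-` followed by the lowercased code, which matches `dictionary-en` and `dictionary-en-gb` but is not confirmed here for every dictionary. `packageNameFor` is a hypothetical helper.

```javascript
// Hypothetical helper. Assumption: packages are named `dictionary-` plus the
// lowercased language code (as with `dictionary-en` and `dictionary-en-gb`).
function packageNameFor(code) {
  const tag = String(code).trim()
  // Loose shape check only: a 2-3 letter language, then optional subtags.
  if (!/^[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*$/.test(tag)) {
    throw new Error('Expected a language code such as `en` or `en-GB`, got: ' + code)
  }
  return 'dictionary-' + tag.toLowerCase()
}

console.log(packageNameFor('en'))
console.log(packageNameFor('en-GB'))
```

The shape check is deliberately loose and does not validate that a code is real BCP-47 or that a matching package exists.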
Examples
Example: use with nspell
This example uses `dictionary-en` in combination with `nspell`.
Install command for this example:
npm install dictionary-en nspell
import en from 'dictionary-en'
import nspell from 'nspell'
const spell = nspell(en)
console.log(spell.correct('color'))
console.log(spell.correct('colour'))
Yields:
true
false
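Checking single words extends naturally to checking a whole string. The sketch below is one way to do that, not something these packages provide: `findMisspellings` is a hypothetical helper that accepts any object with a `correct(word)` method, such as the value returned by `nspell(en)` above. The tokenizer is a simplification that treats runs of letters (optionally joined by apostrophes or hyphens) as words.

```javascript
// Hypothetical helper: list the distinct words in `text` that the checker rejects.
// `spell` is any object with a `correct(word)` method, e.g. the result of `nspell(en)`.
function findMisspellings(text, spell) {
  // Simplified tokenizer: letters, optionally joined by apostrophes or hyphens.
  const words = String(text).match(/\p{L}+(?:['’-]\p{L}+)*/gu) || []
  const seen = new Set()
  const misspelled = []
  for (const word of words) {
    if (seen.has(word)) continue
    seen.add(word)
    if (!spell.correct(word)) misspelled.push(word)
  }
  return misspelled
}

// Stand-in checker so the sketch runs without installing anything;
// swap in `nspell(en)` in a real project.
const known = new Set(['the', 'color', 'of', 'sky'])
const stubSpell = {correct: (word) => known.has(word.toLowerCase())}

// Logs an array containing the one rejected word, 'colour'.
console.log(findMisspellings('The colour of the sky', stubSpell))
```

Passing the checker in as a parameter keeps the helper independent of any particular dictionary or spell-checking library.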
Example: load files
This example loads the `index.dic` and `index.aff` files located in `dictionary-hyw` (Western Armenian) from a Node.js JavaScript module (ESM).
It uses a ponyfill (`import-meta-resolve`) for an experimental Node API.
Install command for this example:
npm install dictionary-hyw import-meta-resolve
import fs from 'node:fs/promises'
import {resolve} from 'import-meta-resolve'
const base = await resolve('dictionary-hyw', import.meta.url)
const aff = await fs.readFile(new URL('index.aff', base))
const dic = await fs.readFile(new URL('index.dic', base))
console.log(aff, dic)
Example: use with macOS
Follow these steps to use a dictionary on macOS:
- navigate to the dictionary you want on GitHub, such as `dictionaries/$code` (replace `$code` with the language code you want)
- download the `index.aff` and `index.dic` files (as in, open them, right-click “Raw”, and “download linked files”)
- rename the downloaded files to `$code.aff` and `$code.dic`
- move `$code.aff` and `$code.dic` into the folder `~/Library/Spelling/`
- go to System Preferences > Keyboard > Text > Spelling and select your added language (it should come with the `(Library)` suffix and is situated at the bottom)
Types
The packages are typed with TypeScript.
Security
These packages are safe.
Contribute
Yes please! See How to Contribute to Open Source.
Build
To build this project, on macOS, you at least need to install:
- wget: `brew install wget` (crawling)
- hunspell: `brew install hunspell` (many dictionaries)
- sed: `brew install gnu-sed` (crawling, many dictionaries)
- coreutils: `brew install coreutils` (many dictionaries)
- ispell: `brew install ispell` (German)
👉 Note: sed and the GNU replacements should be set up in PATH to override the macOS defaults.
Updating a dictionary
Dictionaries are not maintained here. Report problems upstream.
Adding a new dictionary
Dictionaries are not maintained here. Most languages have a small community or institute that maintains a dictionary, and they often do so on GitHub or similar. Please ask in the issues to request that such a dictionary is included here.
👉 Note: acceptable dictionaries must:
- have a significant affix file (not just a `.dic` file)
- have an open source license
- have recent contributions
License
MIT © Titus Wormer
See the `license` files in each dictionary for the licensing of the `index.dic` and `index.aff` files.