wooorm/dictionaries

Hunspell dictionaries in UTF-8

Top Related Projects

Style and Grammar Checker for 25+ Languages

:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of the Google's Trillion Word Corpus.

📜 A collection of wordlists for many different usages

Quick Overview

The wooorm/dictionaries repository is a comprehensive collection of dictionaries for various languages, primarily intended for spell-checking purposes. It provides a standardized format for dictionaries across multiple languages, making it easier for developers to integrate spell-checking capabilities into their applications.

Pros

  • Extensive language support with dictionaries for numerous languages and dialects
  • Consistent format across all dictionaries, simplifying integration and usage
  • Regular updates and contributions from the community
  • The project itself is MIT licensed, but each dictionary keeps its original license (often GPL or LGPL), so check the license of the dictionary you use

Cons

  • Some languages may have less comprehensive dictionaries compared to others
  • Relies on community contributions for updates and additions, which may lead to inconsistencies
  • Large repository size due to the number of dictionaries included
  • May require additional processing or tools to be used effectively in applications

Getting Started

To use a dictionary from this repository in your project:

  1. Install the package for your language, or download its index.aff and index.dic files from the repository.
  2. Install a spell-checking library that can work with Hunspell dictionary files (e.g., nspell or nodehun).
  3. Pass the dictionary's aff and dic data to your spell-checking library.

Example using Node.js with the nspell package and the British English dictionary:

npm install nspell dictionary-en-gb

import nspell from 'nspell'
import enGB from 'dictionary-en-gb'

// Each dictionary package exports an object with `aff` and `dic` buffers.
const spell = nspell(enGB)
console.log(spell.correct('color')) // false
console.log(spell.correct('colour')) // true
console.log(spell.suggest('color')) // array of suggested corrections

Note: The exact implementation may vary depending on your programming language and chosen spell-checking library.

Competitor Comparisons

Style and Grammar Checker for 25+ Languages

Pros of LanguageTool

  • Comprehensive grammar and style checker with support for multiple languages
  • Offers both a standalone application and integration options for various platforms
  • Actively maintained with regular updates and improvements

Cons of LanguageTool

  • Larger and more complex codebase, potentially harder to contribute to or customize
  • Requires more system resources due to its extensive feature set
  • May have a steeper learning curve for developers looking to integrate it

Code Comparison

LanguageTool (Java):

public class SentenceTokenizer implements Tokenizer {
  public List<String> tokenize(String text) {
    List<String> sentences = new ArrayList<>();
    // Tokenization logic here
    return sentences;
  }
}

dictionaries (JavaScript):

import en from 'dictionary-en'

console.log(en) // {aff: <Buffer>, dic: <Buffer>}

Summary

LanguageTool is a full-featured grammar and style checker with broad language support, while dictionaries only packages Hunspell dictionary files (index.aff and index.dic) for use by a separate spell checker such as nspell or nodehun. LanguageTool offers more advanced features but comes with increased complexity; dictionaries is lightweight, but it checks nothing on its own and covers spelling only.

:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

Pros of english-words

  • Simple, straightforward list of English words in a single file
  • Easy to use and integrate into projects
  • Includes a large number of words (479k, per the project description)

Cons of english-words

  • Limited to English language only
  • Lacks additional linguistic information (e.g., parts of speech, definitions)
  • May include some non-standard or uncommon words

Code Comparison

english-words:

# Simple text file with one word per line
aardvark
aardwolf
aaron
aback
abacus

dictionaries:

// Each package exports the two Hunspell files as buffers
{aff: <Buffer>, dic: <Buffer>}

Key Differences

  • dictionaries offers multi-language support with a separate package for each language
  • dictionaries ships Hunspell affix (index.aff) and word list (index.dic) files rather than a plain word list
  • english-words is a simple text file, easier to parse but less feature-rich
  • dictionaries includes scripts that crawl, normalize, and pack dictionaries from upstream sources
  • english-words focuses solely on providing a comprehensive list of English words

Both repositories serve as valuable resources for developers working on language-related projects, with dictionaries offering a more comprehensive and structured approach across multiple languages, while english-words provides a simpler, English-focused solution.

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of the Google's Trillion Word Corpus.

Pros of google-10000-english

  • Focused list of the most common English words, ideal for basic language processing
  • Simple, easy-to-use format with words sorted by frequency
  • Lightweight and quick to implement in projects

Cons of google-10000-english

  • Limited vocabulary scope compared to comprehensive dictionaries
  • Lacks additional linguistic information (e.g., parts of speech, definitions)
  • May not be suitable for advanced natural language processing tasks

Code Comparison

google-10000-english:

the
of
and
to
a

dictionaries:

// Each package exports the two Hunspell files as buffers
{aff: <Buffer>, dic: <Buffer>}

Summary

google-10000-english provides a straightforward list of common English words, making it ideal for simple language processing tasks. However, it lacks the vocabulary range of dictionaries, which packages Hunspell affix and word list files for each language. While google-10000-english is easier to implement quickly, dictionaries is the better fit for spell checking across many languages.

📜 A collection of wordlists for many different usages

Pros of wordlists

  • More diverse content, including specialized wordlists for security testing and penetration testing
  • Regularly updated with new wordlists and contributions
  • Includes wordlists in multiple languages and for various purposes (e.g., passwords, usernames)

Cons of wordlists

  • Less structured organization compared to dictionaries
  • May contain potentially sensitive or offensive content
  • Lacks the extensive language coverage found in dictionaries

Code Comparison

wordlists:

# Example of a simple wordlist (passwords.txt)
password123
qwerty
letmein
admin

dictionaries:

// Each package exports the two Hunspell files as buffers
{aff: <Buffer>, dic: <Buffer>}

Key Differences

  1. Purpose: wordlists focuses on security and penetration testing, while dictionaries aims to provide installable spell-checking dictionaries.
  2. Structure: wordlists uses simple text files, whereas dictionaries ships Hunspell index.aff and index.dic files, exposed as buffers by each package.
  3. Scope: wordlists covers a broader range of applications, while dictionaries covers spell checking only, across 92 languages and variants.
  4. Maintenance: wordlists takes community contributions directly, while dictionaries does not maintain word lists itself and instead repackages upstream sources.

Both repositories serve different purposes and cater to distinct user needs, making direct comparison challenging in some aspects.

README

dictionaries

Collection of normalized and installable hunspell dictionaries.

What is this?

This monorepo is a bunch of scripts that crawls dictionaries from several sources, normalizes them, and packs them so that they can each be installed and used in one single way. Dictionaries are not maintained here but they are usable from here.

When should I use this?

You can particularly use the packages here as a programmer when integrating with other tools (such as nodehun or nspell) or when making such tools.

Install

These packages are ESM only. In Node.js (version 16+), install with npm:

npm install dictionary-en

👉 Note: replace en with the language code you want.

⚠️ Important: this project itself is MIT, but each index.dic and index.aff file still has its original license!

Use

import en from 'dictionary-en'

console.log(en)
// To do: use `en` somehow

Yields:

{aff: <Buffer>, dic: <Buffer>}
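
As a rough sketch of what the `dic` buffer contains: a Hunspell word list starts with a line holding the (approximate) entry count, followed by one entry per line, optionally with affix flags after a slash. The helper name `parseDic` and the sample data below are hypothetical, and the parsing is simplified (it ignores escaped slashes and entries that contain spaces).

```javascript
// Sketch: parse the text of a Hunspell .dic buffer into entries.
// Assumes UTF-8 content, which is what this repository provides.
function parseDic(dic) {
  const text = new TextDecoder('utf-8').decode(dic)
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '')
  // The first line holds the (approximate) number of entries.
  const declaredCount = Number.parseInt(lines[0], 10)
  const entries = lines.slice(1).map((line) => {
    // Keep only the first field; morphological data may follow after whitespace.
    const [field] = line.trim().split(/\s+/)
    const slash = field.indexOf('/')
    return slash === -1
      ? {word: field, flags: ''}
      : {word: field.slice(0, slash), flags: field.slice(slash + 1)}
  })
  return {declaredCount, entries}
}

// Hypothetical sample, not taken from any real dictionary.
const sample = Buffer.from('3\nhello\nwalk/DGS\ntry/B\n')
const parsed = parseDic(sample)
console.log(parsed.declaredCount, parsed.entries.length)
console.log(parsed.entries[1])
```

With a real package, `parseDic(en.dic)` would take the place of the sample buffer. The flags only gain meaning together with the rules in the matching `aff` file, which is why a spell checker needs both.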

List of dictionaries

👉 Note: preferred BCP-47 codes are used (according to Unicode CLDR). To illustrate, as American English and Brazilian Portuguese are the most common types of English and Portuguese respectively, they get the codes en and pt.

In total 92 dictionaries are provided.

Name | Description | License
--- | --- | ---
dictionary-bg | Bulgarian | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-br | Breton | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-ca | Catalan | (GPL-2.0 OR LGPL-2.1)
dictionary-ca-valencia | Catalan (Valencia) | (GPL-2.0 OR LGPL-2.1)
dictionary-cs | Czech | GPL-2.0
dictionary-cy | Welsh | LGPL-3.0
dictionary-da | Danish | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-de | German | (GPL-2.0 OR GPL-3.0)
dictionary-de-at | German (Austria) | (GPL-2.0 OR GPL-3.0)
dictionary-de-ch | German (Switzerland) | (GPL-2.0 OR GPL-3.0)
dictionary-el | Greek | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-el-polyton | Greek (Polyton) | GPL-3.0
dictionary-en | English | (MIT AND BSD)
dictionary-en-au | English (Australia) | (MIT AND BSD)
dictionary-en-ca | English (Canada) | (MIT AND BSD)
dictionary-en-gb | English (United Kingdom) | (MIT AND BSD)
dictionary-en-za | English (South Africa) | LGPL-2.1
dictionary-eo | Esperanto | GPL-2.0
dictionary-es | Spanish | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-ar | Spanish (Argentina) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-bo | Spanish (Bolivia) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-cl | Spanish (Chile) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-co | Spanish (Colombia) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-cr | Spanish (Costa Rica) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-cu | Spanish (Cuba) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-do | Spanish (Dominican Republic) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-ec | Spanish (Ecuador) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-gt | Spanish (Guatemala) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-hn | Spanish (Honduras) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-mx | Spanish (Mexico) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-ni | Spanish (Nicaragua) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-pa | Spanish (Panama) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-pe | Spanish (Peru) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-ph | Spanish (Philippines) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-pr | Spanish (Puerto Rico) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-py | Spanish (Paraguay) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-sv | Spanish (El Salvador) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-us | Spanish (United States of America) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-uy | Spanish (Uruguay) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-es-ve | Spanish (Venezuela) | (GPL-3.0 OR LGPL-3.0 OR MPL-1.1)
dictionary-et | Estonian | LGPL-2.1
dictionary-eu | Basque | GPL-2.0
dictionary-fa | Persian | Apache-2.0
dictionary-fo | Faroese | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-fr | French | MPL-2.0
dictionary-fur | Friulian | GPL-2.0
dictionary-fy | Western Frisian | GPL-3.0
dictionary-ga | Irish | GPL-2.0
dictionary-gd | Scottish Gaelic | GPL-3.0
dictionary-gl | Galician | GPL-3.0
dictionary-he | Hebrew | AGPL-3.0
dictionary-hr | Croatian | (LGPL-2.1 OR SISSL)
dictionary-hu | Hungarian | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-hy | Armenian | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-hyw | Western Armenian | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-ia | Interlingua | GPL-3.0
dictionary-ie | Interlingue | Apache-2.0
dictionary-is | Icelandic | CC-BY-SA-3.0
dictionary-it | Italian | GPL-3.0
dictionary-ka | Georgian | MIT
dictionary-ko | Korean | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-la | Latin | GPL-2.0
dictionary-lb | Luxembourgish | EUPL-1.1
dictionary-lt | Lithuanian | BSD-3-Clause
dictionary-ltg | Latgalian | LGPL-2.1
dictionary-lv | Latvian | LGPL-2.1
dictionary-mk | Macedonian | GPL-3.0
dictionary-mn | Mongolian | LPPL-1.3c
dictionary-nb | Norwegian Bokmål | GPL-2.0
dictionary-nds | Low German | GPL-3.0
dictionary-ne | Nepali | LGPL-2.1
dictionary-nl | Dutch | (BSD-3-Clause OR CC-BY-3.0)
dictionary-nn | Norwegian Nynorsk | GPL-2.0
dictionary-oc | Occitan | GPL-2.0
dictionary-pl | Polish | (GPL-3.0 OR LGPL-3.0 OR MPL-2.0)
dictionary-pt | Portuguese | (LGPL-3.0 OR MPL-2.0)
dictionary-pt-pt | Portuguese (Portugal) | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-ro | Romanian | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-ru | Russian | BSD-3-Clause
dictionary-rw | Kinyarwanda | GPL-3.0
dictionary-sk | Slovak | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1)
dictionary-sl | Slovenian | (GPL-3.0 OR LGPL-2.1)
dictionary-sr | Serbian | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1 OR CC-BY-SA-3.0)
dictionary-sr-latn | Serbian (Latin script) | (GPL-2.0 OR LGPL-2.1 OR MPL-1.1 OR CC-BY-SA-3.0)
dictionary-sv | Swedish | LGPL-3.0
dictionary-sv-fi | Swedish (Finland) | LGPL-3.0
dictionary-tk | Turkmen | Apache-2.0
dictionary-tlh | Klingon | Apache-2.0
dictionary-tlh-latn | Klingon (Latin script) | Apache-2.0
dictionary-tr | Turkish | MIT
dictionary-uk | Ukrainian | GPL-3.0
dictionary-vi | Vietnamese | GPL-2.0
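
The package names above appear to follow the pattern dictionary-<lowercased language code>. Because the most common variant of a language gets the bare code (American English is en, not en-us), a user's locale tag may not match a package name directly. The sketch below lists names to try, most specific first. The helper `candidatePackages` is hypothetical, and truncating the tag is only an assumption about a sensible fallback; each candidate still has to be checked against the list.

```javascript
// Sketch: list package names to try for a BCP-47 tag, most specific first.
// Assumes names follow the `dictionary-<lowercased tag>` pattern in the list above.
function candidatePackages(tag) {
  const parts = tag.toLowerCase().split('-')
  const names = []
  while (parts.length > 0) {
    names.push('dictionary-' + parts.join('-'))
    // Drop the last subtag (region, script, or variant) and try again.
    parts.pop()
  }
  return names
}

console.log(candidatePackages('en-US')) // [ 'dictionary-en-us', 'dictionary-en' ]
console.log(candidatePackages('pt-BR')) // [ 'dictionary-pt-br', 'dictionary-pt' ]
```

Falling back is not always appropriate: dropping a script subtag, as in sr-Latn to sr, would pick a dictionary in a different script.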

Examples

Example: use with nspell

This example uses dictionary-en in combination with nspell.

Install command for this example:

npm install dictionary-en nspell

import en from 'dictionary-en'
import nspell from 'nspell'

const spell = nspell(en)
console.log(spell.correct('color'))
console.log(spell.correct('colour'))

Yields:

true
false
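
Checking a whole text rather than single words could look roughly like the sketch below. The helper `findMisspellings` is hypothetical. It accepts any object with a `correct(word)` method, such as the `spell` instance above, and a stub checker stands in here so the sketch runs without installing anything.

```javascript
// Sketch: collect the distinct words in a text that a checker rejects.
// `spell` is any object with a `correct(word)` method, such as an nspell instance.
function findMisspellings(text, spell) {
  // Match runs of letters, combining marks, and apostrophes.
  const words = text.match(/[\p{L}\p{M}']+/gu) ?? []
  const seen = new Set()
  const misspelled = []
  for (const word of words) {
    if (seen.has(word)) continue
    seen.add(word)
    if (!spell.correct(word)) misspelled.push(word)
  }
  return misspelled
}

// Hypothetical stand-in for a real checker, so the sketch runs on its own.
const known = new Set(['the', 'color', 'of', 'sky'])
const stubSpell = {correct: (word) => known.has(word.toLowerCase())}

console.log(findMisspellings('The colour of the sky', stubSpell)) // [ 'colour' ]
```

The word-splitting pattern is a simplification. Languages with other word boundary rules would need a proper tokenizer.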

Example: load files

This example loads the index.dic and index.aff files located in dictionary-hyw (Western Armenian) from a Node.js JavaScript module (ESM).

It uses a ponyfill (import-meta-resolve) for an experimental Node API.

Install command for this example:

npm install dictionary-hyw import-meta-resolve

import fs from 'node:fs/promises'
import {resolve} from 'import-meta-resolve'

const base = await resolve('dictionary-hyw', import.meta.url)
const aff = await fs.readFile(new URL('index.aff', base))
const dic = await fs.readFile(new URL('index.dic', base))
console.log(aff, dic)
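
Once the files are loaded, a quick sanity check is to read the `SET` directive that Hunspell affix files use to declare their character encoding. The sketch below is illustrative only: the helper `affEncoding` and the sample buffer are hypothetical, and a real `aff` buffer would be passed in instead.

```javascript
// Sketch: read the `SET` directive from a Hunspell .aff buffer.
function affEncoding(aff) {
  // Decoding as latin1 never fails and is enough to find the ASCII-only SET line.
  const text = Buffer.from(aff).toString('latin1')
  const match = /^SET[ \t]+(\S+)/m.exec(text)
  return match ? match[1] : null
}

// Hypothetical sample, not taken from any real dictionary.
const sampleAff = Buffer.from('SET UTF-8\nTRY esianrtolcdugmphbyfvkwz\n')
console.log(affEncoding(sampleAff)) // UTF-8
```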

Example: use with macOS

Follow these steps to use a dictionary on macOS:

  1. navigate to the dictionary you want on GitHub, such as dictionaries/$code (replace $code with the language code you want)
  2. download the index.aff and index.dic files (as in open them, right-click “Raw”, and “download linked files”)
  3. rename the downloaded files to $code.aff and $code.dic
  4. move $code.aff and $code.dic into the folder ~/Library/Spelling/
  5. go to System Preferences > Keyboard > Text > Spelling and select your added language (it should come with the (Library) suffix and is situated at the bottom)

Types

The packages are typed with TypeScript.

Security

These packages are safe.

Contribute

Yes please! See How to Contribute to Open Source.

Build

To build this project, on macOS, you at least need to install:

  • wget: brew install wget (crawling)
  • hunspell: brew install hunspell (many dictionaries)
  • sed: brew install gnu-sed (crawling, many dictionaries)
  • coreutils: brew install coreutils (many dictionaries)
  • ispell: brew install ispell (German)

👉 Note: sed and the GNU replacements should be set up in PATH to override the macOS defaults.

Updating a dictionary

Dictionaries are not maintained here. Report problems upstream.

Adding a new dictionary

Dictionaries are not maintained here. Most languages have a small community or institute that maintains a dictionary, and they often do so on GitHub or similar. Please ask in the issues to request that such a dictionary is included here.

👉 Note: acceptable dictionaries must:

  • have a significant affix file (not just a .dic file)
  • have an open source license
  • have recent contributions

License

MIT © Titus Wormer

See license files in each dictionary for the licensing of index.dic and index.aff files.