minimaxir/big-list-of-naughty-strings

The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

Quick Overview

The "Big List of Naughty Strings" is a collection of strings that have a high probability of causing issues when used as user-input data. It's designed to help developers test their systems against potentially problematic inputs, including special characters, Unicode sequences, and strings used in common injection attacks (such as SQL and script injection).

Pros

  • Comprehensive collection of edge-case strings for thorough testing
  • Regularly updated with new problematic strings
  • Easy to integrate into existing testing frameworks
  • Open-source and community-driven

Cons

  • May not cover all possible edge cases for specific applications
  • Some strings may be considered offensive or inappropriate in certain contexts
  • Requires careful handling when used in production environments
  • May need customization for specific use cases or languages

Code Examples

This project is not a code library, but rather a data resource. However, here are some examples of how you might use it in your testing:

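# Python example: Reading and using the naughty strings
with open('blns.txt', 'r', encoding='utf-8') as f:
    # Split on '\n' only (splitlines() also breaks on U+2028 and similar characters),
    # then drop blank lines and the '#' comment lines that label each section
    naughty_strings = [s for s in f.read().split('\n') if s and not s.startswith('#')]

for string in naughty_strings:
    test_function(string)  # test_function is a placeholder for your own code under test
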
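// JavaScript example: Fetching and using the naughty strings
// blns.json is an array of strings with the comments already stripped out
fetch('https://raw.githubusercontent.com/minimaxir/big-list-of-naughty-strings/master/blns.json')
  .then(response => {
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  })
  .then(naughtyStrings => {
    naughtyStrings.forEach(string => {
      testFunction(string); // placeholder for your own function under test
    });
  })
  .catch(err => console.error('Failed to load blns.json:', err));
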
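# Ruby example: Using naughty strings in RSpec tests
require 'json'

# blns.json is an array of strings (comments already stripped out)
naughty_strings = JSON.parse(File.read('blns.json', encoding: 'UTF-8'))

RSpec.describe 'InputValidator' do
  naughty_strings.each_with_index do |string, i|
    # InputValidator is a placeholder for your own class under test; the goal is
    # graceful handling (no crash), not that every naughty string is accepted as valid
    it "handles naughty string ##{i}: #{string[0, 20].inspect}" do
      expect { InputValidator.validate(string) }.not_to raise_error
    end
  end
end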

Getting Started

To use the Big List of Naughty Strings in your project:

  1. Clone the repository:

    git clone https://github.com/minimaxir/big-list-of-naughty-strings.git
    
  2. Choose the format you prefer: blns.txt (newline-delimited strings with # comment lines) or blns.json (a JSON array with the comments stripped out).

  3. Incorporate the strings into your testing framework or scripts as shown in the code examples above.

  4. Run your tests with these strings to identify potential vulnerabilities or bugs in your input handling.
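The steps above can be turned into a small harness. The sketch below is one possible shape, not part of the project: it assumes blns.json sits next to the script, and `load_blns_json`, `find_crashing_inputs`, and `parse_quantity` are hypothetical names. It records every input that makes the function under test raise an exception.

```python
import json
from typing import Callable, Iterable


def load_blns_json(path: str = "blns.json") -> list[str]:
    """Hypothetical helper: blns.json is a JSON array of strings with comments stripped out."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_crashing_inputs(func: Callable[[str], object], strings: Iterable[str]) -> list[tuple[str, Exception]]:
    """Call func on every string; collect (input, exception) pairs for inputs that raise."""
    failures = []
    for s in strings:
        try:
            func(s)
        except Exception as exc:  # any uncaught error counts as a failure here
            failures.append((s, exc))
    return failures


# Hypothetical function under test: a deliberately naive quantity parser.
def parse_quantity(text: str) -> int:
    return int(text)


# In a real run you would pass load_blns_json(); inline strings keep the demo self-contained.
sample = ["42", "undefined", "null", "１２３"]
failures = find_crashing_inputs(parse_quantity, sample)
for s, exc in failures:
    print(f"{s!r} -> {type(exc).__name__}")
```

Python's `int()` accepts full-width digits such as "１２３", so only two of the four sample strings fail here; whether that leniency is acceptable is a decision for your own input handling.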

Competitor Comparisons

SecLists

SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.

Pros of SecLists

  • More comprehensive, covering a wider range of security testing scenarios
  • Better organized into categories (e.g., passwords, usernames, fuzzing)
  • Regularly updated with contributions from the security community

Cons of SecLists

  • Larger file size, which may be overwhelming for simple testing needs
  • Some lists may contain potentially offensive content
  • Requires more time to navigate and find specific lists

Code Comparison

big-list-of-naughty-strings:

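# Example usage: skip blank lines and '#' section comments
with open('blns.txt', encoding='utf-8') as f:
    strings = [s for s in f.read().split('\n') if s and not s.startswith('#')]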

SecLists:

# Example usage (using wfuzz); the URL is quoted so the shell does not treat "&" as a background operator
wfuzz -c -z file,/path/to/SecLists/Passwords/Common-Credentials/10-million-password-list-top-1000000.txt "http://example.com/login.php?username=admin&password=FUZZ"

The big-list-of-naughty-strings is a single file containing various problematic strings, making it easy to use in simple testing scenarios. SecLists, on the other hand, provides multiple files organized into directories, offering more flexibility for different security testing needs but requiring more specific file selection.
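Because SecLists is spread across directories, picking the right file is most of the work. A minimal sketch, assuming a local SecLists checkout: `list_wordlists` and `load_wordlist` are hypothetical helpers, and the demo builds a throwaway directory tree (with an invented file name) instead of touching a real checkout.

```python
import tempfile
from pathlib import Path


def list_wordlists(seclists_root: str, category: str) -> list[str]:
    """Hypothetical helper: every .txt file under one top-level SecLists directory, root-relative."""
    root = Path(seclists_root)
    return sorted(p.relative_to(root).as_posix() for p in (root / category).rglob("*.txt"))


def load_wordlist(path: Path) -> list[str]:
    """One entry per line; errors="replace" guards against files that are not valid UTF-8."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# Self-contained demo: a tiny fake SecLists tree in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp, "Passwords", "Common-Credentials")
    sub.mkdir(parents=True)
    (sub / "example-top-3.txt").write_text("123456\npassword\n\nqwerty\n", encoding="utf-8")
    found = list_wordlists(tmp, "Passwords")
    words = load_wordlist(Path(tmp, found[0]))

print(found)
print(words)
```

Blank and whitespace-only lines are dropped here, which suits password lists but would be the wrong choice for the naughty strings, where whitespace-only lines may be deliberate test cases.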

List of Dirty, Naughty, Obscene, and Otherwise Bad Words

Pros of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

  • Focused specifically on profanity and offensive language
  • Available in multiple languages
  • Simpler structure, easier to integrate for basic profanity filtering

Cons of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

  • Limited scope compared to big-list-of-naughty-strings
  • Lacks context-specific strings and edge cases
  • May not cover all potential input validation scenarios

Code Comparison

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words:

2g1c
2 girls 1 cup
acrotomophilia
anal
anilingus
anus

big-list-of-naughty-strings:

undefined
undef
null
NULL
(null)
nil
NIL
true
false
True
False

The List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words repository focuses on explicit profanity and offensive terms, while big-list-of-naughty-strings covers a broader range of potentially problematic inputs, including programming-related strings, Unicode characters, and various edge cases for input validation.

big-list-of-naughty-strings is more comprehensive and suitable for thorough input validation and security testing, while List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words is better suited for simple profanity filtering in multiple languages.
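For the simple profanity filtering described above, a word list is typically loaded into a set and matched against whole words. This is a hedged sketch rather than how either project prescribes its use: the helper names are hypothetical, and mild placeholder terms stand in for real list entries.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and reduce to space-separated word tokens."""
    # The word-character class is Unicode-aware for str patterns in Python 3
    return " ".join(re.findall(r"\w+", text.lower()))


def build_term_set(lines: list[str]) -> set[str]:
    """Hypothetical helper: one term or phrase per line; entries that normalize to "" are dropped."""
    return {t for t in (normalize(line) for line in lines) if t}


def find_listed_terms(text: str, terms: set[str]) -> list[str]:
    """Return listed terms that appear as whole words or phrases, never as substrings."""
    padded = f" {normalize(text)} "
    return sorted(t for t in terms if f" {t} " in padded)


# Demo with placeholder terms instead of real list entries.
terms = build_term_set(["darn", "heck", "gosh darn it", ""])
print(find_listed_terms("Well, gosh darn it, what the heck!", terms))  # ['darn', 'gosh darn it', 'heck']
print(find_listed_terms("The crowd heckled him.", terms))              # []
```

Matching whole tokens avoids flagging innocent words that merely contain a listed term, but it does nothing against deliberate misspellings, and text in languages written without spaces between words will not tokenize this way.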

corpora

A collection of small corpuses of interesting data for the creation of bots and similar stuff.

Pros of corpora

  • Broader scope: Contains diverse datasets beyond just problematic strings
  • More structured: Data organized into categories and subcategories
  • Regularly updated: Active community contributions and maintenance

Cons of corpora

  • Less focused: Not specifically tailored for testing edge cases or security
  • Larger size: May be overkill for simple string validation tasks
  • More complex: Requires more effort to navigate and utilize specific datasets

Code Comparison

big-list-of-naughty-strings:

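with open('blns.txt', 'r', encoding='utf-8') as f:
    # Skip blank lines and '#' section comments
    naughty_strings = [s for s in f.read().split('\n') if s and not s.startswith('#')]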

corpora:

import json
with open('corpora/data/technology/programming_languages.json') as f:
    programming_languages = json.load(f)['programming_languages']

The big-list-of-naughty-strings is a simple text file, while corpora uses JSON format for structured data. corpora requires parsing JSON and navigating the data structure, whereas big-list-of-naughty-strings can be used directly as a list of strings.
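The difference in loading can be sketched with small in-memory JSON samples instead of the real files: a corpora file needs one extra lookup by key, while blns.json decodes straight to a list. The helper names and the abridged sample contents are illustrative assumptions.

```python
import json


def load_corpora_list(raw_json: str, key: str) -> list:
    """Hypothetical helper: a corpora file is a JSON object, so you must know which key holds the data."""
    return json.loads(raw_json)[key]


def load_blns_list(raw_json: str) -> list[str]:
    """Hypothetical helper: blns.json is already a flat JSON array of strings."""
    return json.loads(raw_json)


# Abridged, illustrative inputs (not the real file contents).
corpora_raw = '{"description": "Programming languages", "programming_languages": ["C", "Python"]}'
blns_raw = '["undefined", "undef", "null"]'

print(load_corpora_list(corpora_raw, "programming_languages"))
print(load_blns_list(blns_raw))
```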

Both repositories serve different purposes: big-list-of-naughty-strings focuses on problematic input for testing, while corpora provides a wide range of curated datasets for various applications.

google-10000-english

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.

Pros of google-10000-english

  • Contains a comprehensive list of common English words, useful for various language-related tasks
  • Organized by frequency, allowing for easy selection of most common words
  • Simple and straightforward format, easy to integrate into projects

Cons of google-10000-english

  • Limited to English words, not suitable for testing edge cases or special characters
  • Lacks variety in string types (e.g., no numbers, symbols, or Unicode characters)
  • Not designed for security testing or input validation scenarios

Code Comparison

big-list-of-naughty-strings:

# Example strings
"undefined"
"undef"
"null"
"NULL"
"(null)"

google-10000-english:

# Example words
the
of
and
to
a

The big-list-of-naughty-strings repository contains a diverse set of problematic strings for testing input validation and edge cases. It includes various types of strings that could potentially cause issues in software systems.

In contrast, google-10000-english provides a list of common English words, sorted by frequency. This repository is more suitable for natural language processing tasks, vocabulary building, or general language-related applications.
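Because google-10000-english is ordered by frequency, a word's line position doubles as its rank. A minimal sketch using the five sample words shown above; `load_ranked_words` and `rank_lookup` are hypothetical helpers.

```python
def load_ranked_words(lines: list[str]) -> list[str]:
    """Hypothetical helper: the list is ordered most-frequent first, so position encodes rank."""
    return [w.strip() for w in lines if w.strip()]


def rank_lookup(words: list[str]) -> dict[str, int]:
    """Map each word to its 1-based rank, keeping the first (most frequent) occurrence."""
    ranks: dict[str, int] = {}
    for rank, word in enumerate(words, start=1):
        ranks.setdefault(word, rank)
    return ranks


# The first five lines of the file, as in the comparison above.
words = load_ranked_words(["the\n", "of\n", "and\n", "to\n", "a\n"])
print(words[:3])                # the three most frequent words: ['the', 'of', 'and']
print(rank_lookup(words)["to"]) # 4
```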

While big-list-of-naughty-strings focuses on identifying potential vulnerabilities and edge cases, google-10000-english serves as a resource for common English vocabulary. The choice between these repositories depends on the specific requirements of your project, whether it's security testing or language processing.

english-words

A text file containing 479k English words for all your dictionary/word-based projects, e.g., auto-completion or autosuggestion.

Pros of english-words

  • Comprehensive list of English words, useful for various language-related applications
  • Regularly updated and maintained, ensuring accuracy and relevance
  • Simple and straightforward structure, easy to integrate into projects

Cons of english-words

  • Limited to English language only, not suitable for multilingual applications
  • Lacks special characters, symbols, or edge cases found in big-list-of-naughty-strings
  • May not be as effective for security testing or input validation purposes

Code Comparison

english-words:

aback
abacus
abandon
abandoned
abandoning
abandonment

big-list-of-naughty-strings:

undefined
undef
null
NULL
(null)
nil
NIL
true
false
True
False

The english-words repository contains a simple list of English words, one per line. In contrast, big-list-of-naughty-strings includes various types of strings that could potentially cause issues in software applications, such as null values, boolean literals, and special characters.

While english-words is ideal for general language processing tasks, big-list-of-naughty-strings is more suited for testing input validation, security, and edge cases in software development.
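Since english-words is a plain one-word-per-line file, a sorted copy of it supports the auto-completion use case from its description through binary search. This is a sketch under that assumption: `build_index` and `complete` are hypothetical names, and the sample words are the ones listed in the comparison above.

```python
import bisect


def build_index(lines: list[str]) -> list[str]:
    """Hypothetical helper: lowercase, deduplicate, and sort so words sharing a prefix are contiguous."""
    return sorted({w.strip().lower() for w in lines if w.strip()})


def complete(index: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` words starting with `prefix`, located via binary search."""
    prefix = prefix.lower()
    i = bisect.bisect_left(index, prefix)  # first position where prefix would sort
    results = []
    while i < len(index) and len(results) < limit and index[i].startswith(prefix):
        results.append(index[i])
        i += 1
    return results


index = build_index(["aback", "abacus", "abandon", "abandoned", "abandoning", "abandonment"])
print(complete(index, "aband", limit=3))  # ['abandon', 'abandoned', 'abandoning']
print(complete(index, "abac"))            # ['aback', 'abacus']
```

Sorting once and searching with `bisect` avoids scanning all 479k words on every keystroke; a trie would be the usual next step if you need ranking or fuzzy matching.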

README

Big List of Naughty Strings

The Big List of Naughty Strings is an evolving list of strings which have a high probability of causing issues when used as user-input data. This is intended for use in helping both automated and manual QA testing; useful for whenever your QA engineer walks into a bar.

Why Test Naughty Strings?

Even multi-billion-dollar companies with huge amounts of automated testing can't find every bad input. For example, trying to Tweet a zero-width space (U+200B) on Twitter resulted in an "internal server error".

Although this is not a malicious error, and typical users aren't Tweeting weird Unicode, an "internal server error" for unexpected input is never a positive experience for the user, and may in fact be a symptom of deeper string-validation issues. The Big List of Naughty Strings is intended to help reveal such issues.
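Part of what makes a zero-width space tricky is that common "is this blank?" checks do not see it: Python's `str.strip()` leaves U+200B in place because it is a format character (Unicode category Cf), not whitespace. The helper below is a hypothetical sketch of a stricter check, not a description of what went wrong at Twitter.

```python
import unicodedata


def is_visibly_blank(text: str) -> bool:
    """Hypothetical helper: True if every character is whitespace or an invisible format character (Cf)."""
    return all(ch.isspace() or unicodedata.category(ch) == "Cf" for ch in text)


zwsp = "\u200b"  # ZERO WIDTH SPACE
print(zwsp.strip() == "")      # False: strip() does not treat U+200B as whitespace
print(is_visibly_blank(zwsp))  # True
```

`all()` returns True for an empty string, so the helper also treats "" as blank. Whether to reject such input or strip the invisible characters first is an application decision; category Cf also covers characters such as U+200D (zero-width joiner) that legitimately appear inside emoji sequences.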

Usage

blns.txt consists of newline-delimited strings and comments which are preceded with #. The comments divide the strings into sections for easy manual reading and copy/pasting into input forms. For those who want to access the strings programmatically, a blns.json file is provided containing an array with all the comments stripped out (the scripts folder contains a Python script used to generate the blns.json).
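The comment-stripping step can be sketched as below. This is not the Python script in the repository's scripts folder, only an assumption-labeled equivalent: it treats any line starting with "#" as a comment and blank lines as section separators, and the sample text is a hypothetical excerpt.

```python
import json


def blns_lines_to_strings(text: str) -> list[str]:
    """Hypothetical helper: drop '#' comment lines and blank separator lines, keep the rest verbatim."""
    # split("\n") rather than splitlines(): splitlines() also breaks on characters
    # such as U+2028 or form feed, which could cut a test string in two.
    return [line for line in text.split("\n") if line and not line.startswith("#")]


# Abridged, hypothetical excerpt in the blns.txt layout.
sample_txt = "#\tSection heading\n#\tWhat the section tests\n\nundefined\nundef\nnull\n"
strings = blns_lines_to_strings(sample_txt)

# ensure_ascii=False keeps non-ASCII strings readable; write the result as UTF-8.
print(json.dumps(strings, ensure_ascii=False))  # ["undefined", "undef", "null"]
```

Because blank lines are dropped, the empty string never makes it into the output; add it back explicitly if you want it as a test input.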

Contributions

Feel free to send a pull request to add more strings or additional sections. However, please do not send pull requests with very long strings (255+ characters), as that makes the list much more difficult to view.

Likewise, please do not send pull requests which compromise manual usability of the file. This includes the EICAR test string, which can cause the file to be flagged by antivirus scanners, and files which alter the encoding of blns.txt. Also, do not send a null character (U+0000) string, as it changes the file format on GitHub to binary and renders it unreadable in pull requests. Finally, when adding or removing a string please update all files when you perform a pull request.
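Contributors could run the checks described above locally before opening a pull request. The sketch below is hypothetical, not a tool from the repository: `check_contribution` flags strings of 255 or more characters, a null character in blns.txt, and strings present in only one of blns.txt and blns.json.

```python
import json

MAX_LEN = 255  # the README asks for no strings of 255+ characters


def check_contribution(txt: str, json_text: str) -> list[str]:
    """Hypothetical helper: return a list of problems; an empty list means every check passed."""
    problems = []
    if "\x00" in txt:
        problems.append("blns.txt contains a null character (U+0000)")
    txt_strings = [s for s in txt.split("\n") if s and not s.startswith("#")]
    for s in txt_strings:
        if len(s) >= MAX_LEN:  # len() counts code points, not bytes
            problems.append(f"too long ({len(s)} code points): {s[:20]!r}...")
    # Blank lines in blns.txt are separators, so the empty string is ignored here.
    json_strings = {s for s in json.loads(json_text) if s}
    for s in sorted(set(txt_strings) - json_strings):
        problems.append(f"in blns.txt but not blns.json: {s!r}")
    for s in sorted(json_strings - set(txt_strings)):
        problems.append(f"in blns.json but not blns.txt: {s!r}")
    return problems


# Hypothetical excerpts of the two files.
txt = "#\tSection heading\nundefined\nnull\n"
print(check_contribution(txt, '["", "undefined", "null"]'))  # []
print(len(check_contribution(txt + "x" * 300 + "\n", '["undefined"]')))  # 3
```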

Disclaimer

The Big List of Naughty Strings is intended to be used for software you own and manage. Some of the Naughty Strings can indicate security vulnerabilities, and as a result using such strings with third-party software may be a crime. The maintainer is not responsible for any negative actions that result from the use of the list.

Additionally, the Big List of Naughty Strings is not a fully-comprehensive substitute for formal security/penetration testing for your service.

Library / Packages

Several implementations of the Big List of Naughty Strings have been published to package managers. They are maintained by outside parties and can be found here:

  • Node: https://www.npmjs.com/package/blns
  • Node: https://www.npmjs.com/package/big-list-of-naughty-strings
  • .NET: https://github.com/SimonCropp/NaughtyStrings
  • PHP: https://github.com/mattsparks/blns-php
  • C++: https://github.com/eliabieri/blnscpp

Please open a PR to list others.

Maintainer/Creator

Max Woolf (@minimaxir)

License

MIT