big-list-of-naughty-strings

The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

Top Related Projects

SecLists

64,602

SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

3,121

List of Dirty, Naughty, Obscene, and Otherwise Bad Words

corpora

5,011

A collection of small corpuses of interesting data for the creation of bots and similar stuff.

google-10000-english

4,138

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of the Google's Trillion Word Corpus.

english-words

11,470

:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

Quick Overview

The "Big List of Naughty Strings" is a collection of strings that have a high probability of causing issues when used as user-input data. It's designed to help developers test their systems against potentially problematic inputs, including special characters, unusual Unicode sequences, reserved words such as null and undefined, and injection-style payloads.

Pros

Comprehensive collection of edge-case strings for thorough testing
Regularly updated with new problematic strings
Easy to integrate into existing testing frameworks
Open-source and community-driven

Cons

May not cover all possible edge cases for specific applications
Some strings may be considered offensive or inappropriate in certain contexts
Best run against test environments for software you own, since some strings can expose real vulnerabilities
May need customization for specific use cases or languages

Code Examples

This project is not a code library, but rather a data resource. However, here are some examples of how you might use it in your testing:

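# Python example: Reading and using the naughty strings
# blns.txt mixes test strings with "#" comment lines and blank separators.
# Iterating over the file splits on newlines only; str.splitlines() would also
# split on Unicode line separators that can occur inside the strings themselves.
with open('blns.txt', 'r', encoding='utf-8') as f:
    lines = (line.rstrip('\n') for line in f)
    naughty_strings = [line for line in lines if line and not line.startswith('#')]

for string in naughty_strings:
    test_function(string)  # test_function is a placeholder for your own code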

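// JavaScript example: Fetching and using the naughty strings
// testFunction is a placeholder for the code under test
fetch('https://raw.githubusercontent.com/minimaxir/big-list-of-naughty-strings/master/blns.json')
  .then(response => {
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  })
  .then(naughtyStrings => {
    naughtyStrings.forEach(string => {
      testFunction(string);
    });
  })
  .catch(error => console.error('Could not load blns.json:', error));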

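# Ruby example: Using naughty strings in RSpec tests
# InputValidator is a placeholder for the class under test
require 'json'

naughty_strings = JSON.parse(File.read('blns.json', encoding: 'UTF-8'))

RSpec.describe 'InputValidator' do
  naughty_strings.each_with_index do |string, index|
    # The index keeps descriptions unique; inspect escapes invisible characters
    it "handles naughty string #{index}: #{string[0, 20].inspect}" do
      # A validator may reject a string, but it should never crash on one
      expect { InputValidator.validate(string) }.not_to raise_error
    end
  end
end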
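
A comparable sketch using Python's built-in unittest module follows. The function `normalize_username` and the location of `blns.json` are assumptions made for illustration, not part of the project. `subTest` is used so that each failing string is reported on its own rather than stopping at the first failure.

```python
import json
import unittest
from pathlib import Path


def load_naughty_strings(path):
    """Load the JSON array of strings from a local copy of blns.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def normalize_username(raw):
    """Hypothetical function under test: trim whitespace, cap the length."""
    return raw.strip()[:64]


class NaughtyStringTests(unittest.TestCase):
    # Assumed location of the file; adjust to wherever the repo was cloned.
    blns_path = Path("blns.json")

    def test_normalize_username_survives_every_string(self):
        if not self.blns_path.exists():
            self.skipTest(f"{self.blns_path} not found")
        for value in load_naughty_strings(self.blns_path):
            # Each string becomes its own sub-test, labelled with an escaped
            # prefix so invisible characters stay readable in the report.
            with self.subTest(value=repr(value)[:40]):
                result = normalize_username(value)
                self.assertIsInstance(result, str)
                self.assertLessEqual(len(result), 64)
```

Saved as a test module, this can be run with `python -m unittest` once `blns_path` points at a local copy of the file.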

Getting Started

To use the Big List of Naughty Strings in your project:

Clone the repository:

git clone https://github.com/minimaxir/big-list-of-naughty-strings.git

Choose the format you prefer from the repository: blns.txt (newline-delimited, with # comment lines) or blns.json (a plain array of strings).
Incorporate the strings into your testing framework or scripts as shown in the code examples above.
Run your tests with these strings to identify potential vulnerabilities or bugs in your input handling.
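
The steps above can also be followed without a test framework. The sketch below is one possible harness: it feeds every string to a handler and collects the inputs that raised. The names `run_against`, `report`, and `parse_quantity` are hypothetical, and the path to `blns.json` is whatever location you cloned the repository to.

```python
import json
from pathlib import Path


def run_against(handler, strings):
    """Call handler on every string; return (string, exception) pairs."""
    failures = []
    for value in strings:
        try:
            handler(value)
        except Exception as exc:  # broad on purpose: record every exception
            failures.append((value, exc))
    return failures


def parse_quantity(raw):
    """Hypothetical handler under test: parses an integer quantity."""
    return int(raw)


def report(blns_json_path, handler):
    """Run handler over blns.json, print each failure, return the count."""
    strings = json.loads(Path(blns_json_path).read_text(encoding="utf-8"))
    failures = run_against(handler, strings)
    for value, exc in failures:
        # repr() escapes control and invisible characters in the output.
        print(f"{value[:40]!r} -> {type(exc).__name__}")
    return len(failures)
```

A call such as `report("big-list-of-naughty-strings/blns.json", parse_quantity)` would list every string the handler could not process. In practice a handler is expected to reject many of these inputs, so the interesting results are the unexpected exception types.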

Competitor Comparisons

SecLists

Pros of SecLists

More comprehensive, covering a wider range of security testing scenarios
Better organized into categories (e.g., passwords, usernames, fuzzing)
Regularly updated with contributions from the security community

Cons of SecLists

Larger file size, which may be overwhelming for simple testing needs
Some lists may contain potentially offensive content
Requires more time to navigate and find specific lists

Code Comparison

big-list-of-naughty-strings:

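# Example usage: blns.txt also contains "#" comment lines and blank lines
with open('blns.txt', encoding='utf-8') as f:
    strings = [s for s in (line.rstrip('\n') for line in f) if s and not s.startswith('#')]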

SecLists:

# Example usage (using wfuzz)
# The URL is quoted so the shell does not treat "&" as a background operator
wfuzz -c -z file,/path/to/SecLists/Passwords/Common-Credentials/10-million-password-list-top-1000000.txt "http://example.com/login.php?username=admin&password=FUZZ"

The big-list-of-naughty-strings is a single list of problematic strings (shipped as blns.txt and blns.json), making it easy to use in simple testing scenarios. SecLists, on the other hand, provides many files organized into directories, offering more flexibility for different security testing needs but requiring you to pick the right list for each task.

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Pros of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Focused specifically on profanity and offensive language
Available in multiple languages
Simpler structure, easier to integrate for basic profanity filtering

Cons of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Limited scope compared to big-list-of-naughty-strings
Lacks context-specific strings and edge cases
May not cover all potential input validation scenarios

Code Comparison

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words:

2g1c
2 girls 1 cup
acrotomophilia
anal
anilingus
anus

big-list-of-naughty-strings:

undefined
undef
null
NULL
(null)
nil
NIL
true
false
True
False

The List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words repository focuses on explicit profanity and offensive terms, while big-list-of-naughty-strings covers a broader range of potentially problematic inputs, including programming-related strings, Unicode characters, and various edge cases for input validation.

big-list-of-naughty-strings is more comprehensive and suitable for thorough input validation and security testing, while List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words is better suited for simple profanity filtering in multiple languages.

corpora

Pros of corpora

Broader scope: Contains diverse datasets beyond just problematic strings
More structured: Data organized into categories and subcategories
Regularly updated: Active community contributions and maintenance

Cons of corpora

Less focused: Not specifically tailored for testing edge cases or security
Larger size: May be overkill for simple string validation tasks
More complex: Requires more effort to navigate and utilize specific datasets

Code comparison

big-list-of-naughty-strings:

with open('blns.txt', 'r', encoding='utf-8') as f:
    naughty_strings = [s for s in (line.rstrip('\n') for line in f) if s and not s.startswith('#')]

corpora:

import json
with open('corpora/data/technology/programming_languages.json') as f:
    programming_languages = json.load(f)['programming_languages']

big-list-of-naughty-strings is a flat list: blns.txt only needs its # comment lines and blank lines filtered out, and blns.json is a plain array of strings. corpora stores structured JSON documents, so each dataset must be parsed and the relevant key selected before use.

Both repositories serve different purposes: big-list-of-naughty-strings focuses on problematic input for testing, while corpora provides a wide range of curated datasets for various applications.

google-10000-english

Pros of google-10000-english

Contains a comprehensive list of common English words, useful for various language-related tasks
Organized by frequency, allowing for easy selection of most common words
Simple and straightforward format, easy to integrate into projects

Cons of google-10000-english

Limited to English words, not suitable for testing edge cases or special characters
Lacks variety in string types (e.g., no numbers, symbols, or Unicode characters)
Not designed for security testing or input validation scenarios

Code Comparison

big-list-of-naughty-strings:

# Example strings
"undefined"
"undef"
"null"
"NULL"
"(null)"

google-10000-english:

# Example words
the
of
and
to
a

The big-list-of-naughty-strings repository contains a diverse set of problematic strings for testing input validation and edge cases. It includes various types of strings that could potentially cause issues in software systems.

In contrast, google-10000-english provides a list of common English words, sorted by frequency. This repository is more suitable for natural language processing tasks, vocabulary building, or general language-related applications.

While big-list-of-naughty-strings focuses on identifying potential vulnerabilities and edge cases, google-10000-english serves as a resource for common English vocabulary. The choice between these repositories depends on the specific requirements of your project, whether it's security testing or language processing.

english-words

Pros of english-words

Comprehensive list of English words, useful for various language-related applications
Regularly updated and maintained, ensuring accuracy and relevance
Simple and straightforward structure, easy to integrate into projects

Cons of english-words

Limited to English language only, not suitable for multilingual applications
Lacks special characters, symbols, or edge cases found in big-list-of-naughty-strings
May not be as effective for security testing or input validation purposes

Code Comparison

english-words:

aback
abacus
abandon
abandoned
abandoning
abandonment

big-list-of-naughty-strings:

undefined
undef
null
NULL
(null)
nil
NIL
true
false
True
False

The english-words repository contains a simple list of English words, one per line. In contrast, big-list-of-naughty-strings includes various types of strings that could potentially cause issues in software applications, such as strings that look like null values or boolean literals, as well as special characters.

While english-words is ideal for general language processing tasks, big-list-of-naughty-strings is more suited for testing input validation, security, and edge cases in software development.

README

Big List of Naughty Strings

The Big List of Naughty Strings is an evolving list of strings which have a high probability of causing issues when used as user-input data. This is intended for use in helping both automated and manual QA testing; useful for whenever your QA engineer walks into a bar.

Why Test Naughty Strings?

Even multi-billion dollar companies with huge amounts of automated testing can't find every bad input. For example, look at what happens when you try to Tweet a zero-width space (U+200B) on Twitter: the request fails with an "internal server error".

Although this is not a malicious error, and typical users aren't Tweeting weird Unicode, an "internal server error" for unexpected input is never a positive experience for the user, and may in fact be a symptom of deeper string-validation issues. The Big List of Naughty Strings is intended to help reveal such issues.
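
To illustrate how such an input can slip past validation, the sketch below compares two emptiness checks. It is not a reconstruction of what happened at Twitter. The function names are hypothetical, and it only shows that Python's `str.strip()` does not treat U+200B as whitespace, so a naive "is this blank?" test accepts a string that renders as nothing.

```python
import unicodedata

ZERO_WIDTH_SPACE = "\u200b"


def is_blank_naive(text):
    """Treat input as blank only if stripping whitespace leaves nothing."""
    return text.strip() == ""


def is_blank_strict(text):
    """Also ignore invisible format characters (Unicode category Cf)."""
    visible = [
        ch for ch in text
        if not ch.isspace() and unicodedata.category(ch) != "Cf"
    ]
    return not visible
```

With these definitions, `is_blank_naive(ZERO_WIDTH_SPACE)` is false while `is_blank_strict(ZERO_WIDTH_SPACE)` is true, which is the kind of mismatch the list is meant to surface.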

Usage

blns.txt consists of newline-delimited strings and comments which are preceded with #. The comments divide the strings into sections for easy manual reading and copy/pasting into input forms. For those who want to access the strings programmatically, a blns.json file is provided containing an array with all the comments stripped out (the scripts folder contains a Python script used to generate the blns.json).
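
The relationship between the two files can be sketched as follows. This is not the script from the scripts folder, only a minimal illustration of the format described above, and `parse_blns_text` and `to_json` are hypothetical names. It splits on "\n" alone and drops only truly empty lines, because `str.splitlines()` and `str.strip()` would also act on Unicode separators and whitespace that may be the test string itself.

```python
import json


def parse_blns_text(text):
    """Return the test strings from blns.txt-style content."""
    strings = []
    # Split on "\n" only: str.splitlines() would also break on characters
    # such as U+2028, which can legitimately appear inside a test string.
    for line in text.split("\n"):
        # Skip empty separator lines and "#" comment lines. A line made of
        # whitespace characters is kept, since it may be a test string.
        if line == "" or line.startswith("#"):
            continue
        strings.append(line)
    return strings


def to_json(strings):
    """Serialize the strings as a JSON array, keeping non-ASCII readable."""
    return json.dumps(strings, ensure_ascii=False, indent=2)
```

For example, `parse_blns_text("#\tReserved Strings\n\nundefined\nnull\n")` yields the list `["undefined", "null"]`. If your copy of the file uses CRLF line endings, a trailing "\r" would need to be removed as well.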

Contributions

Feel free to send a pull request to add more strings, or additional sections. However, please do not send pull requests with very-long strings (255+ characters), as that makes the list much more difficult to view.

Likewise, please do not send pull requests which compromise manual usability of the file. This includes the EICAR test string, which can cause the file to be flagged by antivirus scanners, and files which alter the encoding of blns.txt. Also, do not send a null character (U+0000) string, as it changes the file format on GitHub to binary and renders it unreadable in pull requests. Finally, when adding or removing a string please update all files when you perform a pull request.
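
Some of these guidelines can be checked mechanically before opening a pull request. The sketch below is a hypothetical helper, not a tool shipped with the repository. It assumes "255+ characters" means Unicode code points as counted by Python's `len`, and it does not attempt to detect the EICAR string.

```python
MAX_ALLOWED_LENGTH = 254  # guideline excludes strings of 255+ characters


def contribution_problems(candidate):
    """Return the reasons a candidate string conflicts with the guidelines."""
    problems = []
    if len(candidate) > MAX_ALLOWED_LENGTH:
        problems.append("too long (255+ characters)")
    if "\x00" in candidate:
        problems.append("contains a null character (U+0000)")
    return problems


def is_valid_utf8(data):
    """Check that raw file bytes still decode as UTF-8 after an edit."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

An empty list from `contribution_problems` means the string passed these two checks. The remaining guidelines, such as updating all files together, still need a manual review.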

Disclaimer

The Big List of Naughty Strings is intended to be used for software you own and manage. Some of the Naughty Strings can indicate security vulnerabilities, and as a result using such strings with third-party software may be a crime. The maintainer is not responsible for any negative actions that result from the use of the list.

Additionally, the Big List of Naughty Strings is not a fully-comprehensive substitute for formal security/penetration testing for your service.

Library / Packages

Various implementations of the Big List of Naughty Strings have made it to various package managers. Those are maintained by outside parties, but can be found here:

Library	Link
Node	https://www.npmjs.com/package/blns
Node	https://www.npmjs.com/package/big-list-of-naughty-strings
.NET	https://github.com/SimonCropp/NaughtyStrings
PHP	https://github.com/mattsparks/blns-php
C++	https://github.com/eliabieri/blnscpp

Please open a PR to list others.

Maintainer/Creator

Max Woolf (@minimaxir)

Social Media Discussions

June 10, 2015 [Hacker News]: Show HN: Big List of Naughty Strings for testing user-input data
August 17, 2015 [Reddit]: Big list of naughty strings.
February 9, 2016 [Reddit]: Big List of Naughty Strings
January 15, 2017 [Hacker News]: Naughty Strings: A list of strings likely to cause issues as user-input data
January 16, 2017 [Reddit]: Naughty Strings: A list of strings likely to cause issues as user-input data
November 16, 2018 [Hacker News]: Big List of Naughty Strings
November 16, 2018 [Reddit]: Naughty Strings - A list of strings which have a high probability of causing issues when used as user-input data

License

MIT
