big-list-of-naughty-strings
The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.
Top Related Projects
SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of the Google's Trillion Word Corpus.
:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion
Quick Overview
The "Big List of Naughty Strings" is a collection of strings that have a high probability of causing issues when used as user-input data. It is designed to help developers test their systems against potentially problematic inputs, including special characters, Unicode sequences, and common injection payloads such as script and SQL injection strings.
Pros
- Comprehensive collection of edge-case strings for thorough testing
- Regularly updated with new problematic strings
- Easy to integrate into existing testing frameworks
- Open-source and community-driven
Cons
- May not cover all possible edge cases for specific applications
- Some strings may be considered offensive or inappropriate in certain contexts
- Requires careful handling: the list is intended for software you own and manage, not for third-party systems
- May need customization for specific use cases or languages
Code Examples
This project is not a code library, but rather a data resource. However, here are some examples of how you might use it in your testing:
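# Python example: reading and using the naughty strings
# (test_function is a placeholder for the code under test)
with open('blns.txt', 'r', encoding='utf-8') as f:
    # Split on '\n' only; skip blank lines and '#' comment lines
    naughty_strings = [line for line in f.read().split('\n')
                       if line and not line.startswith('#')]
for string in naughty_strings:
    test_function(string)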
// JavaScript example: fetching and using the naughty strings
// (testFunction is a placeholder for the code under test)
fetch('https://raw.githubusercontent.com/minimaxir/big-list-of-naughty-strings/master/blns.json')
  .then(response => response.json())
  .then(naughtyStrings => {
    naughtyStrings.forEach(string => {
      testFunction(string);
    });
  });
# Ruby example: using naughty strings in RSpec tests
# (InputValidator is a placeholder for the class under test)
require 'json'
naughty_strings = JSON.parse(File.read('blns.json'))
RSpec.describe 'InputValidator' do
  naughty_strings.each do |string|
    # inspect escapes control and invisible characters in the test name
    it "handles naughty string: #{string[0..20].inspect}" do
      expect(InputValidator.validate(string)).to be_valid
    end
  end
end
Getting Started
To use the Big List of Naughty Strings in your project:
- Clone the repository:
  git clone https://github.com/minimaxir/big-list-of-naughty-strings.git
- Choose the format you prefer: blns.txt (with section comments, suited to manual use) or blns.json (a plain array, suited to programmatic use).
- Incorporate the strings into your testing framework or scripts as shown in the code examples above.
- Run your tests with these strings to identify potential vulnerabilities or bugs in your input handling.
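The last step can be sketched as a small harness that feeds each string to the code under test and records which inputs raise. This is a minimal illustration, not part of the project: `find_failures`, `parse_quantity`, and the sample strings are hypothetical names chosen for the example, and in practice the list would come from blns.json.

```python
from typing import Callable


def find_failures(
    handler: Callable[[str], object], strings: list[str]
) -> list[tuple[str, Exception]]:
    """Call handler on every string and collect the inputs that raise."""
    failures = []
    for s in strings:
        try:
            handler(s)
        except Exception as exc:  # broad on purpose: any crash is a finding
            failures.append((s, exc))
    return failures


def parse_quantity(raw: str) -> int:
    """Hypothetical handler under test: parses a quantity field."""
    return int(raw)


# A tiny stand-in for the real list (assumption: loaded from blns.json in practice).
sample_strings = ["42", "undefined", "NULL", "-1"]

for bad_input, error in find_failures(parse_quantity, sample_strings):
    print(repr(bad_input), type(error).__name__)
```

Whether an exception counts as a bug depends on the handler's contract; a clean validation error is often the desired outcome, while an unhandled crash is not.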
Competitor Comparisons
SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.
Pros of SecLists
- More comprehensive, covering a wider range of security testing scenarios
- Better organized into categories (e.g., passwords, usernames, fuzzing)
- Regularly updated with contributions from the security community
Cons of SecLists
- Larger file size, which may be overwhelming for simple testing needs
- Some lists may contain potentially offensive content
- Requires more time to navigate and find specific lists
Code Comparison
big-list-of-naughty-strings:
# Example usage
with open('blns.txt', encoding='utf-8') as f:
    strings = f.read().split('\n')
SecLists:
# Example usage (using wfuzz)
wfuzz -c -z file,/path/to/SecLists/Passwords/Common-Credentials/10-million-password-list-top-1000000.txt "http://example.com/login.php?username=admin&password=FUZZ"
The big-list-of-naughty-strings is a single file containing various problematic strings, making it easy to use in simple testing scenarios. SecLists, on the other hand, provides multiple files organized into directories, offering more flexibility for different security testing needs but requiring more specific file selection.
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
Pros of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
- Focused specifically on profanity and offensive language
- Available in multiple languages
- Simpler structure, easier to integrate for basic profanity filtering
Cons of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
- Limited scope compared to big-list-of-naughty-strings
- Lacks context-specific strings and edge cases
- May not cover all potential input validation scenarios
Code Comparison
List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words:
2g1c
2 girls 1 cup
acrotomophilia
anal
anilingus
anus
big-list-of-naughty-strings:
undefined
undef
null
NULL
(null)
nil
NIL
true
false
True
False
The List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words repository focuses on explicit profanity and offensive terms, while big-list-of-naughty-strings covers a broader range of potentially problematic inputs, including programming-related strings, Unicode characters, and various edge cases for input validation.
big-list-of-naughty-strings is more comprehensive and suitable for thorough input validation and security testing, while List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words is better suited for simple profanity filtering in multiple languages.
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Pros of corpora
- Broader scope: Contains diverse datasets beyond just problematic strings
- More structured: Data organized into categories and subcategories
- Regularly updated: Active community contributions and maintenance
Cons of corpora
- Less focused: Not specifically tailored for testing edge cases or security
- Larger size: May be overkill for simple string validation tasks
- More complex: Requires more effort to navigate and utilize specific datasets
Code Comparison
big-list-of-naughty-strings:
with open('blns.txt', 'r', encoding='utf-8') as f:
    naughty_strings = f.read().split('\n')
corpora:
import json
with open('corpora/data/technology/programming_languages.json') as f:
    programming_languages = json.load(f)['programming_languages']
The big-list-of-naughty-strings is a simple text file, while corpora uses JSON format for structured data. corpora requires parsing JSON and navigating the data structure, whereas big-list-of-naughty-strings can be used directly as a list of strings.
Both repositories serve different purposes: big-list-of-naughty-strings focuses on problematic input for testing, while corpora provides a wide range of curated datasets for various applications.
This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of the Google's Trillion Word Corpus.
Pros of google-10000-english
- Contains a comprehensive list of common English words, useful for various language-related tasks
- Organized by frequency, allowing for easy selection of most common words
- Simple and straightforward format, easy to integrate into projects
Cons of google-10000-english
- Limited to English words, not suitable for testing edge cases or special characters
- Lacks variety in string types (e.g., no numbers, symbols, or Unicode characters)
- Not designed for security testing or input validation scenarios
Code Comparison
big-list-of-naughty-strings:
# Example strings
"undefined"
"undef"
"null"
"NULL"
"(null)"
google-10000-english:
# Example words
the
of
and
to
a
The big-list-of-naughty-strings repository contains a diverse set of problematic strings for testing input validation and edge cases. It includes various types of strings that could potentially cause issues in software systems.
In contrast, google-10000-english provides a list of common English words, sorted by frequency. This repository is more suitable for natural language processing tasks, vocabulary building, or general language-related applications.
While big-list-of-naughty-strings focuses on identifying potential vulnerabilities and edge cases, google-10000-english serves as a resource for common English vocabulary. The choice between these repositories depends on the specific requirements of your project, whether it's security testing or language processing.
:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion
Pros of english-words
- Comprehensive list of English words, useful for various language-related applications
- Regularly updated and maintained, ensuring accuracy and relevance
- Simple and straightforward structure, easy to integrate into projects
Cons of english-words
- Limited to English language only, not suitable for multilingual applications
- Lacks special characters, symbols, or edge cases found in big-list-of-naughty-strings
- May not be as effective for security testing or input validation purposes
Code Comparison
english-words:
aback
abacus
abandon
abandoned
abandoning
abandonment
big-list-of-naughty-strings:
undefined
undef
null
NULL
(null)
nil
NIL
true
false
True
False
The english-words repository contains a simple list of English words, one per line. In contrast, big-list-of-naughty-strings includes various types of strings that could potentially cause issues in software applications, such as null values, boolean expressions, and special characters.
While english-words is ideal for general language processing tasks, big-list-of-naughty-strings is more suited for testing input validation, security, and edge cases in software development.
README
Big List of Naughty Strings
The Big List of Naughty Strings is an evolving list of strings which have a high probability of causing issues when used as user-input data. This is intended for use in helping both automated and manual QA testing; useful for whenever your QA engineer walks into a bar.
Why Test Naughty Strings?
Even multi-billion dollar companies with huge amounts of automated testing can't find every bad input. For example, look at what happens when you try to Tweet a zero-width space (U+200B) on Twitter:
Although this is not a malicious error, and typical users aren't Tweeting weird unicode, an "internal server error" for unexpected input is never a positive experience for the user, and may in fact be a symptom of deeper string-validation issues. The Big List of Naughty Strings is intended to help reveal such issues.
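To illustrate why a zero-width space can slip past ordinary checks, here is a minimal Python sketch. It does not describe how Twitter handled the input; `is_effectively_blank` is a hypothetical helper written for this example. It shows that `str.strip()` leaves U+200B in place, because the character is a format character (Unicode category `Cf`) rather than whitespace.

```python
import unicodedata


def is_effectively_blank(text: str) -> bool:
    """True if text has no visible content once whitespace and
    Unicode format characters (category Cf) are ignored."""
    visible = [
        ch for ch in text
        if not ch.isspace() and unicodedata.category(ch) != "Cf"
    ]
    return not visible


zero_width_space = "\u200b"

# A naive emptiness check still sees one character here.
print(zero_width_space.strip() == "")          # False
print(is_effectively_blank(zero_width_space))  # True
```

A validator that only trims whitespace would therefore accept this input as non-empty, which is the kind of gap the list is meant to expose.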
Usage
blns.txt consists of newline-delimited strings and comments which are preceded with #. The comments divide the strings into sections for easy manual reading and copy/pasting into input forms. For those who want to access the strings programmatically, a blns.json file is provided containing an array with all the comments stripped out (the scripts folder contains a Python script used to generate the blns.json).
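The file format described above can be parsed with a few lines of Python. The sketch below is an illustration only and is not the script from the scripts folder: `parse_blns` is a hypothetical name, and treating blank lines as separators to skip is an assumption.

```python
import json


def parse_blns(text: str) -> list[str]:
    """Return the test strings from blns.txt-style text, without comments."""
    strings = []
    # Split on "\n" only: str.splitlines() also breaks on characters such
    # as U+2028, which could appear inside a test string.
    for line in text.split("\n"):
        if line.startswith("#"):
            continue  # section comment
        if line == "":
            continue  # assumption: blank lines are separators, not strings
        strings.append(line)
    return strings


# Inline sample in the same layout; in practice the text would come from
# open("blns.txt", encoding="utf-8").read().
sample = "#\tReserved Strings\n#\n\nundefined\nundef\nnull\n"
print(json.dumps(parse_blns(sample)))
```

If the comment-free array is all that is needed, loading blns.json with `json.load` avoids the parsing step entirely.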
Contributions
Feel free to send a pull request to add more strings, or additional sections. However, please do not send pull requests with very-long strings (255+ characters), as that makes the list much more difficult to view.
Likewise, please do not send pull requests which compromise manual usability of the file. This includes the EICAR test string, which can cause the file to be flagged by antivirus scanners, and files which alter the encoding of blns.txt. Also, do not send a null character (U+0000) string, as it changes the file format on GitHub to binary and renders it unreadable in pull requests. Finally, when adding or removing a string please update all files when you perform a pull request.
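Two of these rules are mechanical enough to check before opening a pull request. The following sketch is a hypothetical helper, not a tool shipped with the repository; `contribution_problems` and `MAX_LENGTH` are names invented here, and measuring length in Python code points is an assumption about how "255+ characters" is counted.

```python
MAX_LENGTH = 255  # strings of 255 or more characters are discouraged


def contribution_problems(candidate: str) -> list[str]:
    """List the guideline violations found in a proposed string."""
    problems = []
    if len(candidate) >= MAX_LENGTH:
        problems.append("too long (255+ characters)")
    if "\x00" in candidate:
        problems.append("contains a null character (U+0000)")
    return problems


print(contribution_problems("Robert'); DROP TABLE Students;--"))  # []
print(contribution_problems("a" * 300))
print(contribution_problems("bad\x00string"))
```

The EICAR and file-encoding rules are not covered here and still need a manual check.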
Disclaimer
The Big List of Naughty Strings is intended to be used for software you own and manage. Some of the Naughty Strings can indicate security vulnerabilities, and as a result using such strings with third-party software may be a crime. The maintainer is not responsible for any negative actions that result from the use of the list.
Additionally, the Big List of Naughty Strings is not a fully-comprehensive substitute for formal security/penetration testing for your service.
Library / Packages
Various implementations of the Big List of Naughty Strings have made it to various package managers. Those are maintained by outside parties, but can be found here:
Please open a PR to list others.
Maintainer/Creator
Max Woolf (@minimaxir)
Social Media Discussions
- June 10, 2015 [Hacker News]: Show HN: Big List of Naughty Strings for testing user-input data
- August 17, 2015 [Reddit]: Big list of naughty strings.
- February 9, 2016 [Reddit]: Big List of Naughty Strings
- January 15, 2017 [Hacker News]: Naughty Strings: A list of strings likely to cause issues as user-input data
- January 16, 2017 [Reddit]: Naughty Strings: A list of strings likely to cause issues as user-input data
- November 16, 2018 [Hacker News]: Big List of Naughty Strings
- November 16, 2018 [Reddit]: Naughty Strings - A list of strings which have a high probability of causing issues when used as user-input data
License
MIT