berzerk0/Probable-Wordlists

Version 2 is live! Wordlists sorted by probability originally created for password generation and testing - make sure your passwords aren't popular!

Top Related Projects

SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.

:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

Collection of some common wordlists such as RDP password, user name list, ssh password wordlist for brute force. IP Cameras Default Passwords.

Crack password hashes without the fuss :cat2:

A collection of wordlists dictionaries for password cracking

Quick Overview

Probable-Wordlists is a collection of password wordlists compiled from real-world data breaches and other sources. It aims to provide security researchers, penetration testers, and system administrators with realistic password datasets for testing and analysis. The project includes various wordlists sorted by probability of use and categorized by length and character types.

Pros

  • Extensive collection of real-world passwords
  • Wordlists are sorted by probability and categorized for easy use
  • Regularly updated with new data
  • Useful for security testing and password policy analysis

Cons

  • Large file sizes can be difficult to handle
  • May contain sensitive or personal information
  • Potential for misuse if not handled responsibly
  • Some wordlists may be outdated or less relevant over time

Getting Started

As this is not a code library but a collection of wordlists, there's no code to run. However, you can get started with the project by following these steps:

  1. Clone the repository:

    git clone https://github.com/berzerk0/Probable-Wordlists.git
    
  2. Navigate to the desired wordlist directory:

    cd Probable-Wordlists/Real-Passwords
    
  3. Use the wordlists with your preferred password cracking or analysis tools. For example, with John the Ripper:

    john --wordlist=Top1575-probable-v2.txt target_hashes.txt
    

Remember to use these wordlists responsibly and only on systems you have permission to test.
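
A popularity-sorted list can also be used defensively, to see how common one of your own candidate passwords is. The following is a minimal Python sketch, assuming a locally downloaded list with one password per line. The function name `find_rank` is illustrative and is not part of the repository.

```python
from typing import Iterable, Optional


def find_rank(entries: Iterable[str], candidate: str) -> Optional[int]:
    """Return the 1-based position of candidate in entries, or None if absent.

    In a popularity-sorted list, a lower rank means a more common password.
    """
    for rank, entry in enumerate(entries, start=1):
        # Strip only line endings so passwords containing spaces still match.
        if entry.rstrip("\r\n") == candidate:
            return rank
    return None


if __name__ == "__main__":
    # Small in-memory sample standing in for a real wordlist file.
    sample = ["123456\n", "password\n", "123456789\n"]
    print(find_rank(sample, "password"))       # 2
    print(find_rank(sample, "correct horse"))  # None
```

To check a real file, an open file handle can be passed in place of the sample list, for example one opened with `encoding="utf-8", errors="replace"` so that unusual bytes in leaked data do not raise decoding errors.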

Competitor Comparisons

SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.

Pros of SecLists

  • More comprehensive collection of wordlists for various purposes (passwords, usernames, URLs, etc.)
  • Regularly updated with contributions from the security community
  • Well-organized directory structure for easy navigation

Cons of SecLists

  • Large repository size may be overwhelming for some users
  • Some lists may contain redundant or less relevant entries
  • Requires more filtering to find specific, targeted wordlists

Code Comparison

SecLists:

# Example from SecLists/Passwords/Common-Credentials/10-million-password-list-top-1000000.txt
123456
password
12345678
qwerty
123456789

Probable-Wordlists:

# Example from Probable-Wordlists/Real-Passwords/Top12Thousand-probable-v2.txt
123456
password
123456789
12345678
12345

Both repositories provide valuable wordlists for security testing and password analysis. SecLists offers a broader range of lists for various purposes, while Probable-Wordlists focuses more on password-specific collections. The choice between them depends on the user's specific needs and the scope of their security testing or research.

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.

Pros of google-10000-english

  • Smaller, more focused dataset of common English words
  • Easier to integrate and use for basic language processing tasks
  • Faster to process due to its compact size

Cons of google-10000-english

  • Limited vocabulary compared to Probable-Wordlists
  • Less suitable for advanced password cracking or security testing
  • May not include domain-specific or technical terms

Code Comparison

Probable-Wordlists:

with open('Top204Thousand-WPA-probable-v2.txt', 'r') as f:
    wordlist = f.read().splitlines()

for word in wordlist:
    # Process each word (here it is simply printed)
    print(word)

google-10000-english:

with open('google-10000-english.txt', 'r') as f:
    wordlist = f.read().splitlines()

for word in wordlist:
    # Process each word (here it is simply printed)
    print(word)

The code usage is similar for both repositories, with the main difference being the file name and potentially the number of words processed.

:memo: A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

Pros of english-words

  • Simple and straightforward word list
  • Includes a large number of English words (466k+)
  • Easy to use and integrate into various projects

Cons of english-words

  • Limited to English words only
  • Lacks frequency or probability information
  • May include less common or archaic words

Code comparison

english-words:

with open('words_alpha.txt', 'r') as f:
    words = f.read().splitlines()

Probable-Wordlists:

with open('Top1575-probable-v2.txt', 'r') as f:
    words = f.read().splitlines()

Key differences

  • Probable-Wordlists focuses on common passwords and probable word combinations
  • english-words is a comprehensive list of English words
  • Probable-Wordlists includes frequency information and categorized lists
  • english-words is more suitable for general language processing tasks
  • Probable-Wordlists is better for security and password-related applications

Use cases

english-words:

  • Spell-checking
  • Word games
  • Natural language processing

Probable-Wordlists:

  • Password strength analysis
  • Penetration testing
  • Security research

Both repositories offer valuable word lists, but they serve different purposes. Choose the one that best fits your project's requirements.

Collection of some common wordlists such as RDP password, user name list, ssh password wordlist for brute force. IP Cameras Default Passwords.

Pros of wordlist

  • More diverse content, including multiple languages and specialized lists (e.g., SQL injection payloads)
  • Smaller file sizes, making it easier to download and manage
  • Includes additional tools and scripts for wordlist manipulation

Cons of wordlist

  • Less comprehensive in terms of English language coverage
  • Fewer variations and combinations of common words
  • Less frequently updated compared to Probable-Wordlists

Code comparison

Probable-Wordlists:

Top207-probable-v2.txt
Top1575-probable-v2.txt
Top95Thousand-probable-v2.txt
Top304Thousand-probable-v2.txt
Top1pt6Million-probable-v2.txt

wordlist:

bruteforce-database.txt
common-passwords.txt
directory-list-2.3-medium.txt
sql-injection-payload-list.txt
user-agents.txt

Both repositories provide wordlists for various purposes, but Probable-Wordlists focuses on real passwords sorted by popularity, while wordlist offers a broader range of specialized lists for different applications. Probable-Wordlists is more suitable for password cracking and password analysis, whereas wordlist caters to a wider array of security testing scenarios.

Crack password hashes without the fuss :cat2:

Pros of naive-hashcat

  • Focuses on password cracking and provides a streamlined approach for using Hashcat
  • Includes scripts and tools for automating password cracking tasks
  • Offers educational resources on password security and cracking techniques

Cons of naive-hashcat

  • Limited in scope compared to the extensive wordlists provided by Probable-Wordlists
  • May require more technical knowledge to use effectively
  • Doesn't offer as comprehensive a collection of pre-generated wordlists

Code Comparison

Probable-Wordlists (example of wordlist content):

password123
qwerty
123456
letmein
admin

naive-hashcat (example of a cracking script):

#!/bin/bash
hashcat -m 0 -a 0 -o cracked.txt target_hashes.txt wordlist.txt

While Probable-Wordlists primarily consists of wordlist files, naive-hashcat includes scripts and tools for password cracking. The code examples demonstrate the difference in focus between the two repositories.

Probable-Wordlists offers a vast collection of pre-generated wordlists, making it useful for various password-related tasks. naive-hashcat, on the other hand, provides tools and scripts specifically tailored for password cracking using Hashcat, offering a more specialized approach to this particular aspect of security testing and analysis.

A collection of wordlists dictionaries for password cracking

Pros of wpa2-wordlists

  • Focused specifically on WPA2 password cracking
  • Includes pre-generated rainbow tables for faster cracking
  • Smaller file sizes, more portable for specific use cases

Cons of wpa2-wordlists

  • Less comprehensive than Probable-Wordlists
  • Not regularly updated (last commit in 2018)
  • Limited to WPA2 use cases, less versatile for other password cracking scenarios

Code Comparison

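Both repositories contain wordlists rather than code; the Probable-Wordlists README states that the repository does not contain code. The snippet below is therefore only an illustrative helper for counting the entries in a downloaded wordlist, not a script shipped with the project:

Probable-Wordlists (illustrative):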

def count_lines(filename):
    with open(filename, 'r') as f:
        return sum(1 for line in f)

wpa2-wordlists doesn't contain any code, focusing solely on providing wordlists and rainbow tables.

Summary

Probable-Wordlists offers a more comprehensive and regularly updated collection of wordlists for various password cracking scenarios. It also includes analysis files, such as Hashcat rules and character masks, making it more versatile.

wpa2-wordlists is a specialized repository focused on WPA2 password cracking, offering pre-generated rainbow tables for faster cracking in this specific context. However, it's less frequently updated and has a narrower scope.

Choose Probable-Wordlists for a wide range of password cracking tasks and ongoing updates. Opt for wpa2-wordlists if you're specifically targeting WPA2 passwords and want pre-generated rainbow tables.

README

Probable Wordlists - Version 2.0

Do you know what the world's most common passwords are?
Do you know what they look like?
You'll want to avoid them to be secure!

Thinking of Cloning?

This repository does not contain code, but links to a group of lists.
A clone may not be necessary to get the files you need.
Visit the downloads page for more information.

Check out the Password Trend Analysis - and learn!

I visualized the trends of passwords that appeared 10 times or more in the Version 1 files. The charts contain immediately actionable advice on how to make your passwords more unique.

Methodology: Why and How

The Why

Password wordlists are not hard to find. It seems like every few weeks we hear about a massive, record-breaking data breach that has scattered millions of credentials across the internet for everyone to see. If our data is leaked, we'll change our passwords, the hard-working security teams will address the vulnerabilities and everyone will wait until they hear about the next breach.

While leaks may be published with malicious intent, I see an opportunity here for us to make ourselves a bit more secure online.

Passwords, by definition, are meant to be secret. If it weren't for these leaks, we might not have any idea what a password looks like. Sure, we might know the password to a friend's home Wifi network, or for a company expense account, but passwords are usually only intended to be known by the user and an authentication system.

But, consider this:
If you are never supposed to tell me yours, and I am never going to tell you mine...
How do we know that we aren't using the same passwords?

How do we know we aren't using the same passwords as millions of other people?

If crooks are the only ones who understand what common passwords look like, then the rest of us may never change our passwords! Without this knowledge, we may just continue believing that our password is one of a kind. Data shows that, frequently, passwords are not one of a kind at all.

This is confirmed year after year when "password" is found to be among the top 3 passwords for the umpteenth time in a row. Until we know what common passwords look like, we will come up with passwords that appear on dozens of leaks.

If any of your passwords has been published on the internet for everyone to see, then can you really claim it as your password?

The How

While studying password wordlists, I noticed most were either sorted alphabetically or not sorted at all. This might be okay for computerized analysis, but I wanted to learn something about the way people think.

I determined that for the most practical analysis, lists had to be sorted in a manner that reflected actual human behavior, not an arbitrary alphabet system or random chronology.

For the better part of a year, I went to sites like SecLists, Weakpass, and Hashes.org to download nearly every single Wordlist containing real passwords I could find. After attempting to remove non-pertinent information, this harvest yielded 1600 files spanning more than 350GB worth of leaked passwords.

For each file, I removed internal duplicates and ensured that they all used the same style of newline character. Some of these lists were composed of smaller lists, and some lists were exact copies, but I took care that the source material was as "pure" as possible. Then, all files were combined into a single amalgamation that represented all of the source files.
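
The per-file cleanup described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual tooling (which the README does not describe). The function name `clean_entries` is hypothetical, and holding everything in memory is an assumption that would not hold for sources totalling hundreds of gigabytes.

```python
from typing import List


def clean_entries(raw_text: str) -> List[str]:
    """Normalize newline styles and drop duplicate entries, keeping first-seen order."""
    # Convert Windows (\r\n) and classic Mac (\r) line endings to Unix (\n).
    normalized = raw_text.replace("\r\n", "\n").replace("\r", "\n")
    seen = set()
    unique = []
    for line in normalized.split("\n"):
        # Skip empty lines and anything already seen in this file.
        if line and line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


if __name__ == "__main__":
    mixed = "123456\r\npassword\r123456\n\nqwerty"
    print(clean_entries(mixed))  # ['123456', 'password', 'qwerty']
```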

Each occurrence of a password in this combined file represented one source file in which it was found. I considered the number of times a password was found across all of the files to be an approximation of its overall popularity. If an entry was found in fewer than 5 files, it isn't commonly used. But if an entry could be found in more than 350 files, it is incredibly popular. The passwords that were found in the highest number of source files are considered to be the most popular and are placed at the top of the list. Passwords that didn't appear frequently were placed at the bottom.

The giant source file represented nearly 13 billion passwords! However, since this project aims to find the most popular passwords, and not just list as many passwords as I could find, a password needed to be found at least 5 times in analysis to be included on these lists.

The end result is a list of approximately 2 billion real passwords, sorted in order of their popularity, not by the alphabet.
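
The counting, thresholding, and sorting steps can be sketched as follows. This is a small in-memory Python illustration under stated assumptions, not the author's implementation. The name `rank_by_file_count` and the alphabetical tie-breaking are hypothetical choices. The default threshold of 5 comes from the description above.

```python
from collections import Counter
from typing import Dict, Iterable, List


def rank_by_file_count(sources: Dict[str, Iterable[str]], min_count: int = 5) -> List[str]:
    """Rank passwords by the number of source files that contain them."""
    counts: Counter = Counter()
    for entries in sources.values():
        # set() makes each file contribute at most one count per password,
        # mirroring the removal of duplicates within each source file.
        counts.update(set(entries))
    kept = [(pw, n) for pw, n in counts.items() if n >= min_count]
    # Most widespread first; ties are broken alphabetically (an assumption)
    # so that the output is deterministic.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [pw for pw, _ in kept]


if __name__ == "__main__":
    demo = {
        "a.txt": ["123456", "password", "qwerty"],
        "b.txt": ["password", "123456"],
        "c.txt": ["123456", "letmein"],
    }
    # A lower threshold is used here only because the demo has three files.
    print(rank_by_file_count(demo, min_count=2))  # ['123456', 'password']
```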


Directories In This Repository

Files sorted by popularity will include probable-v2 in the filename

Real-Passwords

These are REAL passwords.

The files in this folder come from sites like https://github.com/danielmiessler/SecLists, https://weakpass.com/ and https://hashes.org/

Some files contain only entries between 8 and 40 characters long. These can be found in the Real-Passwords/WPA-Length directory.
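
A length filter of this kind is simple to reproduce on any list. The Python sketch below is illustrative only. The name `wpa_length_only` is hypothetical, and it assumes length is measured in characters, which the README does not specify.

```python
from typing import Iterable, Iterator


def wpa_length_only(entries: Iterable[str], min_len: int = 8, max_len: int = 40) -> Iterator[str]:
    """Yield only entries whose length falls within [min_len, max_len]."""
    for entry in entries:
        # Remove line endings before measuring, so they do not count as characters.
        word = entry.rstrip("\r\n")
        if min_len <= len(word) <= max_len:
            yield word


if __name__ == "__main__":
    sample = ["123456\n", "password\n", "iloveyou123\n"]
    print(list(wpa_length_only(sample)))  # ['password', 'iloveyou123']
```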

Dictionary-Style Lists

Files including dictionaries, encyclopedic lists and miscellaneous. Wordlists in this folder were not necessarily associated with the "password" label.

Some technically useful lists, such as common usernames, tlds, directories, etc. are included.

Analysis Files

Files useful for password recovery and analysis. Includes HashCat Rules and Character Masks.

These files were generated using the PACK project.
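
To give a sense of what a character mask is, the Python sketch below maps each character of a password to one of Hashcat's built-in charset placeholders (`?l` lowercase, `?u` uppercase, `?d` digit, `?s` symbol) and counts how often each resulting mask occurs. It is an illustration of the concept only and is not PACK's implementation. The names `to_mask` and `mask_counts` are hypothetical.

```python
import string
from collections import Counter
from typing import Iterable


def to_mask(password: str) -> str:
    """Map each character to a Hashcat-style charset placeholder."""
    parts = []
    for ch in password:
        if ch in string.ascii_lowercase:
            parts.append("?l")
        elif ch in string.ascii_uppercase:
            parts.append("?u")
        elif ch in string.digits:
            parts.append("?d")
        elif ch in string.punctuation or ch == " ":
            parts.append("?s")
        else:
            # Simplification: non-ASCII characters are marked as one arbitrary
            # byte, although a multi-byte character would need several.
            parts.append("?b")
    return "".join(parts)


def mask_counts(passwords: Iterable[str]) -> Counter:
    """Count how many passwords share each mask."""
    return Counter(to_mask(pw) for pw in passwords)


if __name__ == "__main__":
    print(to_mask("Password1!"))  # ?u?l?l?l?l?l?l?l?d?s
    print(mask_counts(["abc", "xyz", "123"]).most_common(1))  # [('?l?l?l', 2)]
```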

Attributions

People Are Talking About Probable-Wordlists?!

Note that the author is not affiliated with or officially endorsing the visiting of any of the links below.

I found most (if not all) of these mentions by simply searching for the project in various engines

Thanks for the shout-outs!


Disclaimer and License

  • These lists are for LAWFUL, ETHICAL AND EDUCATIONAL PURPOSES ONLY.
  • The files contained in this repository are released "as is" without warranty, support, or guarantee of effectiveness.
  • However, I am open to hearing about any issues found within these files and will be actively maintaining this repository for the foreseeable future. If you find anything noteworthy, let me know and I'll see what I can do about it.

The author did not steal, phish, deceive or hack in any way to get hold of these passwords. All lines in these files were obtained through freely available means.

The author's intent for this project is to provide information on insecure passwords in order to increase overall password security. The lists will show you what passwords are the most common, what patterns are the most common, and what you should avoid when creating your own passwords.

License: CC BY-SA 4.0

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Enjoy!