Top Related Projects
Incredibly fast crawler designed for OSINT.
Fast subdomains enumeration tool for penetration testers
Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.
SpiderFoot automates OSINT for threat intelligence and mapping your attack surface.
Hunt down social media accounts by username across social networks
Fast passive subdomain enumeration tool.
Quick Overview
theHarvester is an open-source tool designed for gathering open source intelligence (OSINT) during the early stages of a penetration test or red team engagement. It collects emails, names, subdomains, IPs, and URLs using multiple public data sources.
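To make the kind of data being collected concrete, here is a minimal sketch of pulling emails and hostnames for one domain out of raw text, such as a search result page. The function name `extract_emails_and_hosts` and the regular expressions are illustrative assumptions, not theHarvester's actual parsing code.

```python
import re


def extract_emails_and_hosts(text, domain):
    """Return (emails, hosts) found in text that belong to the given domain.

    Hypothetical helper for illustration; not part of theHarvester.
    """
    escaped = re.escape(domain)
    # Local part, "@", then optional subdomain labels ending in the domain.
    email_re = re.compile(
        rf"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*{escaped}\b", re.IGNORECASE
    )
    # At least one label in front of the domain, e.g. mail.example.com.
    host_re = re.compile(rf"\b(?:[A-Za-z0-9-]+\.)+{escaped}\b", re.IGNORECASE)
    emails = {match.lower() for match in email_re.findall(text)}
    hosts = {match.lower() for match in host_re.findall(text)}
    return emails, hosts


sample = (
    "Contact alice@example.com or visit https://mail.example.com/login "
    "and dev.api.example.com for the API."
)
emails, hosts = extract_emails_and_hosts(sample, "example.com")
print(sorted(emails))
print(sorted(hosts))
```

Results are lowercased and stored in sets so that the same address found by several sources is only reported once.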
Pros
- Comprehensive data collection from various sources
- Easy-to-use command-line interface
- Actively maintained and regularly updated
- Supports both passive and active information gathering techniques
Cons
- Some data sources require API keys or subscriptions
- Results may vary depending on the target and available public information
- Active techniques such as DNS brute forcing generate traffic that the target organization may detect
- May require additional tools for result analysis and visualization
Getting Started
- Install theHarvester:
git clone https://github.com/laramies/theHarvester.git
cd theHarvester
python3 -m pip install -r requirements/base.txt
- Basic usage:
python3 theHarvester.py -d example.com -b all
This command searches for information related to "example.com" using all available data sources.
- Specify data sources and limit results:
python3 theHarvester.py -d example.com -b duckduckgo,crtsh,dnsdumpster -l 100
This command queries only DuckDuckGo, crt.sh, and DNSDumpster, and caps the number of search results at 100.
- Save results to files:
python3 theHarvester.py -d example.com -b all -f output_file
This command saves the results to files with the prefix "output_file" in various formats (HTML, XML, JSON).
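When these commands are driven from a script, it can help to build the argument list in one place. The sketch below uses only the flags shown above (`-d`, `-b`, `-l`, `-f`); the helper name `build_harvester_command` is an assumption made for illustration, and the script path should be adjusted to match your installation.

```python
import shlex


def build_harvester_command(domain, sources, limit=None, output_prefix=None):
    """Build an argv list for the theHarvester invocations shown above.

    Hypothetical helper; it only assembles the -d, -b, -l and -f flags.
    """
    cmd = ["python3", "theHarvester.py", "-d", domain, "-b", ",".join(sources)]
    if limit is not None:
        cmd += ["-l", str(limit)]
    if output_prefix:
        cmd += ["-f", output_prefix]
    return cmd


command = build_harvester_command("example.com", ["all"], output_prefix="output_file")
print(shlex.join(command))
# prints: python3 theHarvester.py -d example.com -b all -f output_file
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual domain names or file prefixes.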
Competitor Comparisons
Incredibly fast crawler designed for OSINT.
Pros of Photon
- More versatile, capable of crawling websites and extracting various types of data
- Faster execution due to multi-threading and asynchronous requests
- User-friendly command-line interface with customizable options
Cons of Photon
- Limited to web-based information gathering
- May require more setup and dependencies
- Less focused on specific OSINT tasks compared to theHarvester
Code Comparison
Photon:
def photon(url, level, threadCount, delay, timeout, headers):
    # Initialization and crawling logic
    for url in urls:
        # Extract information from each URL
        ...
theHarvester:
def start(self):
    # Initialization
    for source in self.sources:
        # Gather information from each source
        ...
Summary
Photon is a versatile web crawler and information gathering tool, while theHarvester focuses on collecting email addresses, subdomains, and other specific OSINT data. Photon offers faster execution and more customization options but may require additional setup. theHarvester provides a more targeted approach to OSINT tasks and is easier to use out of the box. The choice between the two depends on the specific requirements of the information gathering task at hand.
Fast subdomains enumeration tool for penetration testers
Pros of Sublist3r
- Faster subdomain enumeration due to multi-threading
- Supports more search engines and sources for subdomain discovery
- Provides a clean, easy-to-read output format
Cons of Sublist3r
- Less actively maintained compared to theHarvester
- Fewer overall features and data sources for information gathering
- Limited to subdomain enumeration, while theHarvester offers broader OSINT capabilities
Code Comparison
Sublist3r:
def main(domain, threads, savefile, ports, silent, verbose, enable_bruteforce, engines):
    bruteforce_list = []
    subdomains = []
    search_list = []
    # Rest of the code...
theHarvester:
async def start(self):
    self.domain = self.domain.strip()
    self.emails = []
    self.hosts = []
    self.results = []
    # Rest of the code...
Both projects use Python and have similar main function structures. However, Sublist3r focuses on subdomain enumeration with multi-threading, while theHarvester has a broader scope for information gathering.
Sublist3r is more specialized for subdomain discovery, making it potentially more efficient for that specific task. theHarvester, on the other hand, offers a wider range of OSINT capabilities, making it more versatile for general reconnaissance.
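The difference in scope is easier to see with a sketch of the "query many sources, merge the results" pattern that an async `start` method implies. Everything here is an assumption for illustration: `fake_source` stands in for a real network-backed source, and `gather_hosts` is a hypothetical name, not a function from either project.

```python
import asyncio


async def fake_source(name, domain):
    """Stand-in for a real data source; returns canned hostnames."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [f"{name}.{domain}", f"www.{domain}"]


async def gather_hosts(domain, sources):
    """Query every source concurrently and merge the de-duplicated hosts."""
    domain = domain.strip()
    tasks = [fake_source(name, domain) for name in sources]
    # return_exceptions=True keeps one failing source from aborting the run.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    hosts = set()
    for result in results:
        if isinstance(result, Exception):
            continue
        hosts.update(result)
    return sorted(hosts)


print(asyncio.run(gather_hosts(" example.com ", ["alpha", "beta"])))
# prints: ['alpha.example.com', 'beta.example.com', 'www.example.com']
```

Because the sources run concurrently, total run time is bounded by the slowest source rather than the sum of all of them.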
Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.
Pros of recon-ng
- More comprehensive and modular framework for reconnaissance
- Supports a wider range of data sources and modules
- Offers a command-line interface with interactive shell capabilities
Cons of recon-ng
- Steeper learning curve due to its more complex structure
- Requires more setup and configuration compared to theHarvester
Code Comparison
theHarvester:
from theHarvester.discovery import *
from theHarvester.discovery.constants import *
search = googlesearch.search_google(word, limit, start)
recon-ng:
from recon.core.module import BaseModule

class Module(BaseModule):
    def module_run(self):
        self.query('SELECT * FROM domains WHERE domain LIKE ?', ('%{}%'.format(self.options['domain']),))
Both tools are written in Python, but recon-ng has a more structured approach with modules and a core framework. theHarvester focuses on simpler, direct searches using various discovery methods. recon-ng offers a more extensible and customizable platform for reconnaissance tasks, while theHarvester provides a straightforward tool for gathering open-source intelligence.
SpiderFoot automates OSINT for threat intelligence and mapping your attack surface.
Pros of SpiderFoot
- More comprehensive and feature-rich OSINT platform
- User-friendly web interface for easier operation
- Supports a wider range of data sources and modules
Cons of SpiderFoot
- Steeper learning curve due to its complexity
- Requires more system resources to run effectively
Code Comparison
SpiderFoot:
class SpiderFootPlugin(object):
    def __init__(self, options):
        self._opts = options

    def setup(self):
        pass

    def enrichTarget(self, target):
        pass
theHarvester:
class Plugin:
    def __init__(self, word):
        self.word = word
        self.results = []
        self.totalresults = []

    def do_search(self):
        pass
Key Differences
- SpiderFoot offers a more modular and extensible architecture
- theHarvester is more focused on email and domain harvesting
- SpiderFoot provides a broader range of OSINT capabilities
- theHarvester is generally easier to use for beginners
- SpiderFoot has a more active development community
Both tools are valuable for OSINT, but SpiderFoot is more suitable for comprehensive investigations, while theHarvester excels in quick email and domain reconnaissance.
Hunt down social media accounts by username across social networks
Pros of Sherlock
- Focuses specifically on finding usernames across multiple social networks and websites
- Supports a larger number of sites (350+) compared to theHarvester
- Provides a more user-friendly command-line interface with colorful output
Cons of Sherlock
- Limited to username searches, while theHarvester offers broader information gathering capabilities
- May produce more false positives due to its wide-ranging search across numerous platforms
- Lacks some of the advanced features found in theHarvester, such as DNS brute forcing and Shodan search
Code Comparison
Sherlock:
def sherlock(username, site_data, timeout=60):
    results = {}
    for social_network, net_info in site_data.items():
        results[social_network] = {"url_main": net_info.get("urlMain")}
        url = net_info["url"].format(username)
        results[social_network]["url"] = url
        results[social_network]["exists"] = "yes"
theHarvester:
async def search(self, domain: str) -> None:
    self.domain = domain
    url = f'https://api.github.com/search/code?q="{domain}"'
    async with aiohttp.ClientSession(headers=self.headers) as session:
        async with session.get(url) as resp:
            self.results = await resp.json()
Both tools are useful for OSINT purposes, but Sherlock is more specialized for username searches across social media platforms, while theHarvester offers a broader range of information gathering capabilities for domains and organizations.
Fast passive subdomain enumeration tool.
Pros of Subfinder
- Faster subdomain enumeration with concurrent processing
- More extensive list of supported sources for subdomain discovery
- Better integration with other tools in the ProjectDiscovery ecosystem
Cons of Subfinder
- More focused on subdomain enumeration, less versatile for general OSINT
- Steeper learning curve for advanced features and configuration
- May require additional tools for comprehensive information gathering
Code Comparison
theHarvester:
from theHarvester.discovery import *
from theHarvester.discovery.constants import *
search = googlesearch.search_google(word, limit, start)
Subfinder:
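package main

import (
	"github.com/projectdiscovery/subfinder/v2/pkg/runner"
)

func main() {
	// Options for a subfinder run; the values are illustrative.
	options := &runner.Options{
		Threads: 10,
		Timeout: 30,
		Sources: []string{"alienvault", "bufferover", "crtsh"},
	}
	_ = options // pass to the runner in real code
}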
theHarvester is written in Python and offers a more modular approach for various search engines and data sources. Subfinder, written in Go, focuses on efficient subdomain enumeration with concurrent processing.
Both tools are valuable for reconnaissance, but Subfinder excels in rapid subdomain discovery, while theHarvester provides a broader range of OSINT capabilities. The choice between them depends on the specific requirements of your information gathering tasks.
README
About
theHarvester is a simple-to-use yet powerful tool designed for the reconnaissance stage of a red team assessment or penetration test. It performs open source intelligence (OSINT) gathering to help determine a domain's external threat landscape. The tool gathers names, emails, IPs, subdomains, and URLs from multiple public resources, which are listed under "Passive modules" below.
Install and dependencies
- Python 3.12 or higher.
- https://github.com/laramies/theHarvester/wiki/Installation
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
Clone the repository:
git clone https://github.com/laramies/theHarvester
cd theHarvester
Install dependencies and create a virtual environment:
uv sync
Run theHarvester:
uv run theHarvester
Development
To install development dependencies:
uv sync --extra dev
To run tests:
uv run pytest
To run linting and formatting:
uv run ruff check
uv run ruff format
Passive modules
- baidu: Baidu search engine (https://www.baidu.com)
- bevigil: CloudSEK BeVigil scans mobile applications for OSINT assets (https://bevigil.com/osint-api)
- brave: Brave search engine, now using the official Brave Search API (https://api-dashboard.search.brave.com)
- bufferoverun: Fast domain name lookups for TLS certificates in IPv4 space (https://tls.bufferover.run)
- builtwith: Find out what websites are built with (https://builtwith.com)
- censys: Uses certificate searches to enumerate subdomains and gather emails (https://censys.io)
- certspotter: Cert Spotter monitors Certificate Transparency logs (https://sslmate.com/certspotter)
- criminalip: Specialized Cyber Threat Intelligence (CTI) search engine (https://www.criminalip.io)
- crtsh: Comodo Certificate search (https://crt.sh)
- dehashed: Take your data security to the next level (https://dehashed.com)
- dnsdumpster: Domain research tool that can discover hosts related to a domain (https://dnsdumpster.com)
- duckduckgo: DuckDuckGo search engine (https://duckduckgo.com)
- fullhunt: Next-generation attack surface security platform (https://fullhunt.io)
- github-code: GitHub code search engine (https://www.github.com)
- hackertarget: Online vulnerability scanners and network intelligence to help organizations (https://hackertarget.com)
- haveibeenpwned: Check if your email address is in a data breach (https://haveibeenpwned.com)
- hunter: Hunter search engine (https://hunter.io)
- hunterhow: Internet search engines for security researchers (https://hunter.how)
- intelx: Intelx search engine (https://intelx.io)
- leaklookup: Data breach search engine (https://leak-lookup.com)
- netlas: A Shodan or Censys competitor (https://app.netlas.io)
- onyphe: Cyber defense search engine (https://www.onyphe.io)
- otx: AlienVault Open Threat Exchange (https://otx.alienvault.com)
- pentesttools: Cloud-based toolkit for offensive security testing, focused on web applications and network penetration testing (https://pentest-tools.com)
- projecdiscovery: Actively collects and maintains internet-wide asset data to enhance research and analyse changes around DNS for better insights (https://chaos.projectdiscovery.io)
- rapiddns: DNS query tool that makes it easy to query subdomains or sites sharing the same IP (https://rapiddns.io)
- rocketreach: Access real-time verified personal/professional emails, phone numbers, and social media links (https://rocketreach.co)
- securityscorecard: Helps TPRM and SOC teams detect, prioritize, and remediate vendor risk across their entire supplier ecosystem at scale (https://securityscorecard.com)
- securityTrails: Security Trails search engine, the world's largest repository of historical DNS data (https://securitytrails.com)
- -s, --shodan: Shodan search engine will search for ports and banners from discovered hosts (https://shodan.io)
- subdomaincenter: A subdomain finder tool used to find subdomains of a given domain (https://www.subdomain.center)
- subdomainfinderc99: A subdomain finder tool used to find the subdomains of a given domain (https://subdomainfinder.c99.nl)
- threatminer: Data mining for threat intelligence (https://www.threatminer.org)
- tomba: Tomba search engine (https://tomba.io)
- urlscan: A sandbox for the web that is a URL and website scanner (https://urlscan.io)
- venacus: Venacus search engine (https://venacus.com)
- virustotal: Domain search (https://www.virustotal.com)
- whoisxml: Subdomain search (https://subdomains.whoisxmlapi.com/api/pricing)
- yahoo: Yahoo search engine (https://www.yahoo.com)
- zoomeye: China's version of Shodan (https://www.zoomeye.org)
Active modules
- DNS brute force: dictionary brute force enumeration
- Screenshots: Take screenshots of subdomains that were found
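Dictionary-based DNS brute forcing, the first active module listed above, can be sketched as follows. This is a simplified illustration and not theHarvester's implementation: the names `resolve` and `brute_force_subdomains` are assumptions, and the resolver is injectable so the logic can be exercised without touching the network.

```python
import socket


def resolve(hostname):
    """Return the IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def brute_force_subdomains(domain, words, resolver=resolve):
    """Try each dictionary word as a subdomain and keep the ones that resolve."""
    found = {}
    for word in words:
        label = word.strip()
        if not label:
            continue  # skip blank lines in the wordlist
        candidate = f"{label}.{domain}"
        address = resolver(candidate)
        if address is not None:
            found[candidate] = address
    return found


# Offline demonstration with a fake resolver (192.0.2.0/24 is reserved for documentation).
fake_records = {"www.example.com": "192.0.2.10"}
print(brute_force_subdomains("example.com", ["www", "mail", ""], resolver=fake_records.get))
# prints: {'www.example.com': '192.0.2.10'}
```

With the default resolver, every candidate triggers a real DNS query, which is why this technique counts as active rather than passive and should only be run against targets you are authorized to test.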
Modules that require an API key
Documentation on setting up API keys can be found at https://github.com/laramies/theHarvester/wiki/Installation#api-keys
- bevigil - 50 free queries/month, 1k queries/month $50
- brave - Free plan available, Pro plans for higher limits
- bufferoverun - 100 free queries/month, 10k/month $25
- builtwith - 50 free queries ever, $2950/yr
- censys - 500 credits $100
- criminalip - 100 free queries/month, 700k/month $59
- dehashed - 500 credits $15, 5k credits $150
- dnsdumpster - 50 free queries/day, $49
- fullhunt - 50 free queries, 200 queries $29/month, 500 queries $59/month
- github-code
- haveibeenpwned - 10 email searches/min $4.50, 50 email searches/min $22
- hunter - 50 credits/month free, 12k credits/yr $34
- hunterhow - 10k free API results per 30 days, 50k API results per 30 days $10
- intelx
- leaklookup - 20 credits $10, 50 credits $20, 140 credits $50, 300 credits $100
- netlas - 50 free requests/day, 1k requests $49, 10k requests $249
- onyphe - 10M results/month $587
- pentesttools - 5 assets netsec $95/month, 5 assets webnetsec $140/month
- projecdiscovery - requires work email. Free monthly discovery and vulnerability scans on sign-up email domain, enterprise $
- rocketreach - 100 email lookups/month $48, 250 email lookups/month $108
- securityscorecard
- securityTrails - 50 free queries/month, 20k queries/month $500
- shodan - Freelancer $69/month, Small Business $359/month
- tomba - 25 searches/month free, 1k searches/month $39, 5k searches/month $89
- venacus - 1 search/day free, 10 searches/day $12, 30 searches/day $36
- whoisxml - 2k queries $50, 5k queries $105
- zoomeye - 5 results/day free, 30 results/day $190/yr
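Before a run, it can be useful to check which of the selected sources still need a key. The sketch below hard-codes the module names from the list above; the helper `missing_api_keys` and the idea of passing in a set of already-configured names are assumptions for illustration, and how keys are actually stored is covered by the wiki page linked above.

```python
# Module names taken from the "Modules that require an API key" list, lowercased.
API_KEY_SOURCES = frozenset({
    "bevigil", "brave", "bufferoverun", "builtwith", "censys", "criminalip",
    "dehashed", "dnsdumpster", "fullhunt", "github-code", "haveibeenpwned",
    "hunter", "hunterhow", "intelx", "leaklookup", "netlas", "onyphe",
    "pentesttools", "projecdiscovery", "rocketreach", "securityscorecard",
    "securitytrails", "shodan", "tomba", "venacus", "whoisxml", "zoomeye",
})


def missing_api_keys(selected, configured):
    """Return the selected sources that need an API key but have none configured."""
    configured_lower = {name.lower() for name in configured}
    return sorted(
        name.lower()
        for name in selected
        if name.lower() in API_KEY_SOURCES and name.lower() not in configured_lower
    )


print(missing_api_keys(["crtsh", "shodan", "hunter"], configured={"hunter"}))
# prints: ['shodan']
```

Sources that are not in the list, such as crtsh, are treated as key-free and never reported.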
Comments, bugs, and requests
Christian Martorella @laramies cmartorella@edge-security.com
Matthew Brown @NotoriousRebel1
Jay "L1ghtn1ng" Townsend @jay_townsend1
Thanks
- John Matherly - Shodan project
- Ahmed Aboul Ela - subdomain names dictionaries (big and small)