theHarvester

E-mails, subdomains and names Harvester - OSINT

12,453

2,140

Top Related Projects

Photon

11,525

Incredibly fast crawler designed for OSINT.

Sublist3r

10,311

Fast subdomains enumeration tool for penetration testers

recon-ng

4,762

Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.

spiderfoot

14,694

SpiderFoot automates OSINT for threat intelligence and mapping your attack surface.

sherlock

63,927

Hunt down social media accounts by username across social networks

subfinder

11,579

Fast passive subdomain enumeration tool.

Quick Overview

theHarvester is an open-source tool designed for gathering open source intelligence (OSINT) during the early stages of a penetration test or red team engagement. It collects emails, names, subdomains, IPs, and URLs using multiple public data sources.
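As a rough illustration of what "collecting emails and subdomains" means in practice, the sketch below pulls addresses and hostnames for a target domain out of raw text with regular expressions. It is a minimal example under stated assumptions, not theHarvester's actual parsing code; the function name `harvest` and the patterns are hypothetical.

```python
import re


def harvest(text: str, domain: str) -> tuple[set[str], set[str]]:
    """Return (emails, hosts) found in `text` that belong to `domain`.

    Hypothetical helper for illustration only.
    """
    d = re.escape(domain)
    # Local part, "@", optional subdomain labels, then the target domain.
    email_re = re.compile(
        rf"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*{d}\b", re.IGNORECASE
    )
    # One or more labels followed by the target domain; the lookbehind
    # prevents matching the host part of an email address or a partial label.
    host_re = re.compile(
        rf"(?<![A-Za-z0-9@._-])(?:[A-Za-z0-9-]+\.)+{d}\b", re.IGNORECASE
    )
    emails = {m.lower() for m in email_re.findall(text)}
    hosts = {m.lower() for m in host_re.findall(text)}
    return emails, hosts
```

The patterns are deliberately simple: they will, for instance, also match a hostname such as `mail.example.com.evil.org`, so real tooling needs stricter validation.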

Pros

Comprehensive data collection from various sources
Easy-to-use command-line interface
Actively maintained and regularly updated
Supports both passive and active information gathering techniques

Cons

Some data sources require API keys or subscriptions
Results may vary depending on the target and available public information
Can be noisy and potentially detectable by target organizations
May require additional tools for result analysis and visualization

Getting Started

Install theHarvester:

git clone https://github.com/laramies/theHarvester.git
cd theHarvester
python3 -m pip install -r requirements/base.txt

Basic usage:

python3 theHarvester.py -d example.com -b all

This command searches for information related to "example.com" using all available data sources.

Specify data sources and limit results:

python3 theHarvester.py -d example.com -b bing,dnsdumpster,crtsh -l 100

This command uses Bing, DNSDumpster, and crt.sh as sources (all listed among the passive modules in the README below), limiting results to 100 entries.
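When the tool is driven from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown above (`-d`, `-b`, `-l`, `-f`); the helper name `build_command` is hypothetical, and the resulting list would still have to be passed to something like `subprocess.run` from inside the cloned repository.

```python
def build_command(domain, sources=("all",), limit=None, output_prefix=None):
    """Build a theHarvester argument list (hypothetical helper)."""
    cmd = ["python3", "theHarvester.py", "-d", domain, "-b", ",".join(sources)]
    if limit is not None:
        cmd += ["-l", str(limit)]  # cap the number of results
    if output_prefix:
        cmd += ["-f", output_prefix]  # prefix for the saved report files
    return cmd
```

For example, `build_command("example.com", ["bing", "dnsdumpster"], limit=100)` yields the arguments for a two-source run capped at 100 results.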

Save results to files:

python3 theHarvester.py -d example.com -b all -f output_file

This command saves the results to files with the prefix "output_file"; recent releases write XML and JSON reports.
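A saved report can then be post-processed with a few lines of Python. The sketch below assumes the run produced a JSON file (for example `output_file.json`) whose top level is an object containing lists of findings; the exact key names are not taken from the source, so the helper simply counts every top-level list it finds. The function name `summarize_report` is hypothetical.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Return {key: item count} for every top-level list in a JSON report.

    Assumes the report is a JSON object; non-list values are ignored.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: len(value) for key, value in data.items()
            if isinstance(value, list)}
```

Counting generically keeps the sketch independent of whichever keys a given release actually writes.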

Competitor Comparisons

Photon

Pros of Photon

More versatile, capable of crawling websites and extracting various types of data
Faster execution due to multi-threading and asynchronous requests
User-friendly command-line interface with customizable options

Cons of Photon

Limited to web-based information gathering
May require more setup and dependencies
Less focused on specific OSINT tasks compared to theHarvester

Code Comparison

Photon:

def photon(url, level, threadCount, delay, timeout, headers):
    # Initialization and crawling logic (simplified outline, not Photon's actual source)
    urls = [url]  # placeholder seed list
    for url in urls:
        # Extract information from each URL
        pass

theHarvester:

def start(self):
    # Initialization (simplified outline, not theHarvester's actual source)
    for source in self.sources:
        # Gather information from each source
        pass

Summary

Photon is a versatile web crawler and information gathering tool, while theHarvester focuses on collecting email addresses, subdomains, and other specific OSINT data. Photon offers faster execution and more customization options but may require additional setup. theHarvester provides a more targeted approach to OSINT tasks and is easier to use out of the box. The choice between the two depends on the specific requirements of the information gathering task at hand.

Sublist3r

Pros of Sublist3r

Faster subdomain enumeration due to multi-threading
Queries multiple search engines and services (Google, Bing, Netcraft, VirusTotal, and others) for subdomain discovery
Provides a clean, easy-to-read output format

Cons of Sublist3r

Less actively maintained compared to theHarvester
Fewer overall features and data sources for information gathering
Limited to subdomain enumeration, while theHarvester offers broader OSINT capabilities

Code Comparison

Sublist3r:

def main(domain, threads, savefile, ports, silent, verbose, enable_bruteforce, engines):
    bruteforce_list = []
    subdomains = []
    search_list = []
    
    # Rest of the code...

theHarvester:

async def start(self):
    self.domain = self.domain.strip()
    self.emails = []
    self.hosts = []
    self.results = []
    
    # Rest of the code...

Both projects use Python and have similar main function structures. However, Sublist3r focuses on subdomain enumeration with multi-threading, while theHarvester has a broader scope for information gathering.

Sublist3r is more specialized for subdomain discovery, making it potentially more efficient for that specific task. theHarvester, on the other hand, offers a wider range of OSINT capabilities, making it more versatile for general reconnaissance.
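The `async def start` signature above suggests a coroutine-based design, as opposed to Sublist3r's threads. A minimal sketch of that pattern follows: several sources are queried concurrently with `asyncio.gather` and their results merged. The two source functions are hypothetical stand-ins that return canned data, not real theHarvester modules.

```python
import asyncio


async def fake_source_a(domain):
    """Hypothetical source returning canned hostnames."""
    await asyncio.sleep(0)  # stands in for a network request
    return {f"www.{domain}", f"mail.{domain}"}


async def fake_source_b(domain):
    """Second hypothetical source with partially overlapping results."""
    await asyncio.sleep(0)
    return {f"mail.{domain}", f"vpn.{domain}"}


async def gather_hosts(domain, sources):
    """Run all sources concurrently and return the deduplicated hosts."""
    results = await asyncio.gather(
        *(source(domain) for source in sources), return_exceptions=True
    )
    hosts = set()
    for result in results:
        if isinstance(result, Exception):
            continue  # one failing source should not abort the whole run
        hosts |= result
    return sorted(hosts)
```

Calling `asyncio.run(gather_hosts("example.com", [fake_source_a, fake_source_b]))` runs both coroutines on a single thread and merges the overlap.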

recon-ng

Pros of recon-ng

More comprehensive and modular framework for reconnaissance
Supports a wider range of data sources and modules
Offers a command-line interface with interactive shell capabilities

Cons of recon-ng

Steeper learning curve due to its more complex structure
Requires more setup and configuration compared to theHarvester

Code Comparison

theHarvester:

from theHarvester.discovery import *
from theHarvester.discovery.constants import *
search = googlesearch.search_google(word, limit, start)

recon-ng:

from recon.core.module import BaseModule
class Module(BaseModule):
    def module_run(self):
        self.query('SELECT * FROM domains WHERE domain LIKE ?', ('%{}%'.format(self.options['domain']),))

Both tools are written in Python, but recon-ng has a more structured approach with modules and a core framework. theHarvester focuses on simpler, direct searches using various discovery methods. recon-ng offers a more extensible and customizable platform for reconnaissance tasks, while theHarvester provides a straightforward tool for gathering open-source intelligence.

spiderfoot

Pros of SpiderFoot

More comprehensive and feature-rich OSINT platform
User-friendly web interface for easier operation
Supports a wider range of data sources and modules

Cons of SpiderFoot

Steeper learning curve due to its complexity
Requires more system resources to run effectively

Code Comparison

SpiderFoot:

class SpiderFootPlugin(object):
    def __init__(self, options):
        self._opts = options

    def setup(self):
        pass

    def enrichTarget(self, target):
        pass

theHarvester:

class Plugin:
    def __init__(self, word):
        self.word = word
        self.results = []
        self.totalresults = []

    def do_search(self):
        pass

Key Differences

SpiderFoot offers a more modular and extensible architecture
theHarvester is more focused on email and domain harvesting
SpiderFoot provides a broader range of OSINT capabilities
theHarvester is generally easier to use for beginners
SpiderFoot has a more active development community

Both tools are valuable for OSINT, but SpiderFoot is more suitable for comprehensive investigations, while theHarvester excels in quick email and domain reconnaissance.

sherlock

Pros of Sherlock

Focuses specifically on finding usernames across multiple social networks and websites
Supports a larger number of sites (350+) compared to theHarvester
Provides a more user-friendly command-line interface with colorful output

Cons of Sherlock

Limited to username searches, while theHarvester offers broader information gathering capabilities
May produce more false positives due to its wide-ranging search across numerous platforms
Lacks some of the advanced features found in theHarvester, such as DNS brute forcing and Shodan search

Code Comparison

Sherlock:

def sherlock(username, site_data, timeout=60):
    results = {}
    for social_network, net_info in site_data.items():
        results[social_network] = {"url_main": net_info.get("urlMain")}
        url = net_info["url"].format(username)
        results[social_network]["url"] = url
        results[social_network]["exists"] = "yes"

theHarvester:

async def search(self, domain: str) -> None:
    self.domain = domain
    url = f'https://api.github.com/search/code?q="{domain}"'
    async with aiohttp.ClientSession(headers=self.headers) as session:
        async with session.get(url) as resp:
            self.results = await resp.json()

Both tools are useful for OSINT purposes, but Sherlock is more specialized for username searches across social media platforms, while theHarvester offers a broader range of information gathering capabilities for domains and organizations.

subfinder

Pros of Subfinder

Faster subdomain enumeration with concurrent processing
More extensive list of supported sources for subdomain discovery
Better integration with other tools in the ProjectDiscovery ecosystem

Cons of Subfinder

More focused on subdomain enumeration, less versatile for general OSINT
Steeper learning curve for advanced features and configuration
May require additional tools for comprehensive information gathering

Code Comparison

theHarvester:

from theHarvester.discovery import *
from theHarvester.discovery.constants import *
search = googlesearch.search_google(word, limit, start)

Subfinder:

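package main

import (
    "github.com/projectdiscovery/subfinder/v2/pkg/runner"
)

func main() {
    // In Go, short variable declarations must live inside a function
    options := &runner.Options{
        Threads: 10,
        Timeout: 30,
        Sources: []string{"alienvault", "bufferover", "crtsh"},
    }
    _ = options // placeholder use so the fragment compiles
}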

theHarvester is written in Python and offers a more modular approach for various search engines and data sources. Subfinder, written in Go, focuses on efficient subdomain enumeration with concurrent processing.

Both tools are valuable for reconnaissance, but Subfinder excels in rapid subdomain discovery, while theHarvester provides a broader range of OSINT capabilities. The choice between them depends on the specific requirements of your information gathering tasks.

README

What is this?

theHarvester is a simple to use, yet powerful tool designed to be used during the reconnaissance stage of a red
team assessment or penetration test. It performs open source intelligence (OSINT) gathering to help determine
a domain's external threat landscape. The tool gathers names, emails, IPs, subdomains, and URLs by using
multiple public resources that include:

Passive modules:

baidu: Baidu search engine - www.baidu.com
bevigil: CloudSEK BeVigil scans mobile applications for OSINT assets (Requires an API key, see below.) - https://bevigil.com/osint-api
bing: Microsoft search engine - https://www.bing.com
bingapi: Microsoft search engine, through the API (Requires an API key, see below.)
brave: Brave search engine - https://search.brave.com/
bufferoverun: Fast domain name lookups for TLS certificates in IPv4 space (Requires an API key, see below.) - https://tls.bufferover.run
builtwith: Find out what websites are built with (Requires an API key, see below.) - https://builtwith.com
censys: Censys search engine uses certificate searches to enumerate subdomains and gather emails (Requires an API key, see below.) - https://censys.io
certspotter: Cert Spotter monitors Certificate Transparency logs - https://sslmate.com/certspotter/
criminalip: Specialized Cyber Threat Intelligence (CTI) search engine (Requires an API key, see below.) - https://www.criminalip.io
crtsh: Comodo Certificate search - https://crt.sh
dehashed: Take your data security to the next level (Requires an API key, see below.) - https://dehashed.com
dnsdumpster: Domain research tool that can discover hosts related to a domain - https://dnsdumpster.com
duckduckgo: DuckDuckGo search engine - https://duckduckgo.com
fullhunt: Next-generation attack surface security platform (Requires an API key, see below.) - https://fullhunt.io
github-code: GitHub code search engine (Requires a GitHub Personal Access Token, see below.) - www.github.com
hackertarget: Online vulnerability scanners and network intelligence to help organizations - https://hackertarget.com
haveibeenpwned: Check if your email address is in a data breach (Requires an API key, see below.) - https://haveibeenpwned.com
hunter: Hunter search engine (Requires an API key, see below.) - https://hunter.io
hunterhow: Internet search engines for security researchers (Requires an API key, see below.) - https://hunter.how
intelx: Intelx search engine (Requires an API key, see below.) - http://intelx.io
leaklookup: Data breach search engine (Requires an API key, see below.) - https://leak-lookup.com
netlas: A Shodan or Censys competitor (Requires an API key, see below.) - https://app.netlas.io
onyphe: Cyber defense search engine (Requires an API key, see below.) - https://www.onyphe.io/
otx: AlienVault open threat exchange - https://otx.alienvault.com
pentestTools: Cloud-based toolkit for offensive security testing, focused on web applications and network penetration testing (Requires an API key, see below.) - https://pentest-tools.com/
projectDiscovery: We actively collect and maintain internet-wide assets data, to enhance research and analyse changes around DNS for better insights (Requires an API key, see below.) - https://chaos.projectdiscovery.io
rapiddns: DNS query tool which makes querying subdomains or sites of the same IP easy! - https://rapiddns.io
rocketreach: Access real-time verified personal/professional emails, phone numbers, and social media links (Requires an API key, see below.) - https://rocketreach.co
securityscorecard: Helps TPRM and SOC teams detect, prioritize, and remediate vendor risk across their entire supplier ecosystem at scale (Requires an API key, see below.) - https://securityscorecard.com
securityTrails: Security Trails search engine, the world's largest repository of historical DNS data (Requires an API key, see below.) - https://securitytrails.com
-s, --shodan: Shodan search engine will search for ports and banners from discovered hosts (Requires an API key, see below.) - https://shodan.io
sitedossier: Find available information on a site - http://www.sitedossier.com
subdomaincenter: A subdomain finder tool used to find subdomains of a given domain - https://www.subdomain.center/
subdomainfinderc99: A subdomain finder is a tool used to find the subdomains of a given domain - https://subdomainfinder.c99.nl
threatminer: Data mining for threat intelligence - https://www.threatminer.org/
tomba: Tomba search engine (Requires an API key, see below.) - https://tomba.io
urlscan: A sandbox for the web that is a URL and website scanner - https://urlscan.io
venacus: Venacus search engine (Requires an API key, see below.) - https://venacus.com
vhost: Bing virtual hosts search
virustotal: Domain search (Requires an API key, see below.) - https://www.virustotal.com
whoisxml: Subdomain search (Requires an API key, see below.) - https://subdomains.whoisxmlapi.com/api/pricing
yahoo: Yahoo search engine
zoomeye: China's version of Shodan (Requires an API key, see below.) - https://www.zoomeye.org

Active modules:

DNS brute force: dictionary brute force enumeration
Screenshots: Take screenshots of subdomains that were found
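Dictionary brute forcing amounts to prepending each word from a wordlist to the domain and checking whether the resulting name resolves. The sketch below illustrates the idea with the standard library only; it is not theHarvester's implementation, and the function name `brute_force` and its injectable `resolve` parameter are hypothetical (the latter exists so the logic can be exercised without touching the network).

```python
import socket


def brute_force(domain, words, resolve=socket.gethostbyname):
    """Return {hostname: address} for every candidate that resolves."""
    found = {}
    for word in words:
        host = f"{word}.{domain}"
        try:
            found[host] = resolve(host)
        except OSError:  # socket.gaierror (name not found) is a subclass
            continue
    return found
```

Because this sends DNS queries for every candidate, it is an active technique and should only be run against domains you are authorized to test.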

Modules that require an API key:

Documentation for setting up API keys can be found at https://github.com/laramies/theHarvester/wiki/Installation#api-keys

bevigil - Free up to 50 queries. Pricing can be found here: https://bevigil.com/pricing/osint
bing
bufferoverun - uses the free binaAPI
builtwith
censys - API keys are required and can be retrieved from your Censys account.
criminalip
dehashed
fullhunt
github-code
haveibeenpwned
hunter - limited to 10 results on the free plan, so you will need to pass the -l 10 switch
hunterhow
intelx
leaklookup
netlas - $
onyphe - $
pentestTools - $
projectDiscovery - invite only for now
rocketreach - $
securityscorecard
securityTrails
shodan - $
tomba - Free up to 50 searches.
venacus - $
whoisxml
zoomeye
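Before a run it can be useful to know which requested sources will need credentials. The sketch below hard-codes the module names from the list above (lowercased) and splits a source selection into those that work without a key and those that require one. The helper name and the assumption that these spellings match what the `-b` flag accepts are mine, not the README's.

```python
# Module names copied from the list above, lowercased for comparison.
KEYED_SOURCES = frozenset({
    "bevigil", "bing", "bufferoverun", "builtwith", "censys", "criminalip",
    "dehashed", "fullhunt", "github-code", "haveibeenpwned", "hunter",
    "hunterhow", "intelx", "leaklookup", "netlas", "onyphe", "pentesttools",
    "projectdiscovery", "rocketreach", "securityscorecard", "securitytrails",
    "shodan", "tomba", "venacus", "whoisxml", "zoomeye",
})


def split_by_key_requirement(sources):
    """Return (no_key_needed, key_needed), preserving input order."""
    keyed = [s for s in sources if s.lower() in KEYED_SOURCES]
    free = [s for s in sources if s.lower() not in KEYED_SOURCES]
    return free, keyed
```

A wrapper script could use the second list to warn about missing keys before starting a long scan.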

Install and dependencies:

Python 3.11+
https://github.com/laramies/theHarvester/wiki/Installation

Comments, bugs, and requests:

Christian Martorella @laramies cmartorella@edge-security.com
Matthew Brown @NotoriousRebel1
Jay "L1ghtn1ng" Townsend @jay_townsend1

Main contributors:

Matthew Brown @NotoriousRebel1
Jay "L1ghtn1ng" Townsend @jay_townsend1
Lee Baird @discoverscripts

Thanks:

John Matherly - Shodan project
Ahmed Aboul Ela - subdomain names dictionaries (big and small)
