
s0md3v/Photon

Incredibly fast crawler designed for OSINT.

Top Related Projects

theHarvester: E-mails, subdomains and names Harvester - OSINT

SpiderFoot: automates OSINT for threat intelligence and mapping your attack surface.

recon-ng: Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.

Sublist3r: Fast subdomains enumeration tool for penetration testers

gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.

assetfinder: Find domains and subdomains related to a given domain

Quick Overview

Photon is an open-source intelligence (OSINT) automation tool designed for fast and comprehensive web reconnaissance. It crawls websites to gather information such as URLs, emails, social media accounts, and more, making it a valuable asset for security researchers and penetration testers.

Pros

  • Fast and efficient crawling with multi-threading support
  • Extensive data extraction capabilities, including URLs, emails, social media accounts, and more
  • Customizable output formats (JSON, CSV, TXT) for easy integration with other tools
  • Active development and community support

Cons

  • May trigger website security measures if not used carefully
  • Requires Python knowledge for advanced customization
  • Limited documentation for some advanced features
  • Potential for misuse if not used responsibly

Code Examples

  1. Basic usage to crawl a website:
from photon import Photon

url = "https://example.com"
photon = Photon(url)
photon.crawl()
  2. Extracting specific data types:
from photon import Photon

url = "https://example.com"
photon = Photon(url)
photon.crawl(extract=['urls', 'emails', 'social'])
  3. Customizing output format:
from photon import Photon

url = "https://example.com"
photon = Photon(url, output_file="results.json")
photon.crawl(output_format="json")

Getting Started

To get started with Photon, follow these steps:

  1. Install Photon:
pip install photon-crawler
  2. Import and use Photon in your Python script:
from photon import Photon

url = "https://example.com"
photon = Photon(url)
photon.crawl()
  3. Run your script to start crawling and gathering information.

For more advanced usage and configuration options, refer to the official documentation on the GitHub repository.

Competitor Comparisons

theHarvester: E-mails, subdomains and names Harvester - OSINT

Pros of theHarvester

  • Broader scope of information gathering, including email addresses, subdomains, and more
  • Supports multiple search engines and data sources
  • Actively maintained with regular updates

Cons of theHarvester

  • Slower execution compared to Photon
  • Less focused on web crawling and content extraction
  • May require additional dependencies for full functionality

Code Comparison

Photon:

def extract_links(self, soup, name):
    links = soup.find_all('a')
    for link in links:
        href = link.get('href')
        if href:
            self.links.add(href)

theHarvester:

def get_emails(self):
    rawres = myparser.Parser(self.totalresults, self.word)
    return rawres.emails()

The code snippets show that Photon focuses on extracting links from web pages, while theHarvester emphasizes parsing and extracting specific information like email addresses from search results.

Both tools serve different purposes within the realm of information gathering and reconnaissance. Photon excels at web crawling and content extraction, while theHarvester offers a broader range of information gathering capabilities across multiple sources.

SpiderFoot automates OSINT for threat intelligence and mapping your attack surface.

Pros of Spiderfoot

  • More comprehensive OSINT tool with a wider range of modules and data sources
  • Provides a web-based GUI for easier interaction and visualization of results
  • Supports automation and integration with other tools through its API

Cons of Spiderfoot

  • Steeper learning curve due to its extensive features and configuration options
  • Requires more system resources and setup time compared to Photon

Code Comparison

Photon (Python):

def photon(url, level, threadCount):
    processed = set()
    storage = set()
    forms = set()
    processed.add(url)
    # ... (rest of the function)

Spiderfoot (Python):

class SpiderFootPlugin(object):
    def __init__(self, options):
        self.sf = SpiderFoot(options)
        self.results = dict()
        self.errorState = False
    # ... (rest of the class)

Both projects are written in Python, but Spiderfoot has a more modular structure with plugins, while Photon has a more straightforward approach. Spiderfoot's code is organized around a plugin system, allowing for easier extensibility, whereas Photon's code is more focused on specific crawling and information gathering tasks.
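
To make the architectural contrast concrete, here is a small, purely illustrative sketch of the two styles in Python. It is not code from either project; PluginBase, run_all, and the example plugins are hypothetical names used only to show how a plugin registry differs from a single crawl function.

# Illustrative sketch: plugin-registry style (SpiderFoot-like) vs. single-function style (Photon-like).
class PluginBase:
    """Hypothetical base class: each capability is a self-contained plugin."""
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        PluginBase.registry.append(cls)

    def run(self, target):
        raise NotImplementedError

class EmailPlugin(PluginBase):
    def run(self, target):
        return {"emails": [f"admin@{target}"]}  # placeholder result

class SubdomainPlugin(PluginBase):
    def run(self, target):
        return {"subdomains": [f"www.{target}"]}  # placeholder result

def run_all(target):
    """Plugin style: run every registered plugin and merge the results."""
    results = {}
    for plugin_cls in PluginBase.registry:
        results.update(plugin_cls().run(target))
    return results

def crawl(target):
    """Single-function style: all gathering logic lives in one routine."""
    return {"emails": [f"admin@{target}"], "subdomains": [f"www.{target}"]}

print(run_all("example.com"))  # extensible: adding a plugin class requires no changes here
print(crawl("example.com"))    # simpler: one function, but new features mean editing crawl() itself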

recon-ng: Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.

Pros of recon-ng

  • More comprehensive and modular framework for reconnaissance
  • Supports a wide range of modules for various recon tasks
  • Integrates with multiple external APIs and services

Cons of recon-ng

  • Steeper learning curve due to its complexity
  • Requires more setup and configuration
  • May be overkill for simpler web reconnaissance tasks

Code Comparison

Photon example:

from photon import Photon

photon = Photon(url='https://example.com')
photon.crawl()

recon-ng example:

from recon.core.recon import Recon

recon = Recon()
recon.do_load('recon/domains-hosts/google_site_web')
recon.do_run()

Summary

Photon is a lightweight, easy-to-use web crawler and information gathering tool, while recon-ng is a more comprehensive reconnaissance framework. Photon is better suited for quick web scraping and basic information gathering, whereas recon-ng offers a broader range of capabilities for in-depth reconnaissance tasks. The choice between the two depends on the specific requirements of the project and the user's expertise level.

Sublist3r: Fast subdomains enumeration tool for penetration testers

Pros of Sublist3r

  • Specialized in subdomain enumeration, providing more focused results
  • Utilizes multiple search engines and sources for comprehensive subdomain discovery
  • Supports multithreading for faster scanning

Cons of Sublist3r

  • Limited to subdomain enumeration, lacking broader web crawling capabilities
  • Less actively maintained, with fewer recent updates compared to Photon
  • Doesn't offer features like data extraction or JavaScript analysis

Code Comparison

Sublist3r (subdomain enumeration):

def main(domain, threads, savefile, ports, silent, verbose, enable_bruteforce, engines):
    bruteforce_list = []
    subdomains = []
    search_list = []
    
    # ... (subdomain enumeration logic)

Photon (web crawling and information gathering):

def photon(seedUrl, headers, depth, threadCount, timeout, delay, cookie):
    requests.packages.urllib3.disable_warnings()
    dataset = set()
    processed = set()
    
    # ... (web crawling and data extraction logic)

The code snippets highlight the different focus areas of each tool. Sublist3r concentrates on subdomain enumeration, while Photon offers broader web crawling and information gathering capabilities.

gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.

Pros of gau

  • Faster execution due to its focus on URL discovery
  • Supports multiple input formats (stdin, file, URL)
  • Can output results in JSON format for easier parsing

Cons of gau

  • Limited functionality compared to Photon's broader feature set
  • Lacks built-in crawling capabilities
  • Does not perform content analysis or extraction

Code Comparison

Photon:

photon = Photon(url, options)
photon.crawl()
photon.extract_info()
photon.store_results()

gau:

urls := gau.GetURLs(domain)
for url := range urls {
    fmt.Println(url)
}

Summary

Photon is a more comprehensive web reconnaissance tool that offers crawling, content analysis, and information extraction. It's suitable for in-depth analysis of a target website.

gau focuses specifically on URL discovery, leveraging various sources to find URLs associated with a domain. It's faster and more specialized but lacks the broader feature set of Photon.

Choose Photon for thorough website analysis and information gathering, or gau for rapid URL discovery and enumeration. The selection depends on the specific requirements of your project or security assessment.

assetfinder: Find domains and subdomains related to a given domain

Pros of assetfinder

  • Lightweight and fast, focusing solely on subdomain discovery
  • Written in Go, making it easy to compile and distribute as a single binary
  • Utilizes multiple data sources for comprehensive subdomain enumeration

Cons of assetfinder

  • Limited functionality compared to Photon's broader web reconnaissance capabilities
  • Lacks the ability to extract additional information like emails, social media accounts, etc.
  • Does not perform crawling or content analysis

Code Comparison

assetfinder:

func main() {
    domain := flag.String("domain", "", "The domain to find assets for")
    flag.Parse()
    for result := range assetfinder.Run(*domain) {
        fmt.Println(result)
    }
}

Photon:

def main():
    args = parser.parse_args()
    target = args.url
    crawl(target, args)
    if args.dns:
        dnsdumpster(target)
    if args.export:
        exporter(args.export)

Summary

assetfinder is a focused tool for subdomain discovery, offering speed and simplicity. It's ideal for quick reconnaissance but lacks the comprehensive features of Photon. Photon, on the other hand, provides a more extensive set of web reconnaissance capabilities, including crawling, content analysis, and information extraction. The choice between the two depends on the specific needs of the user and the depth of information required for the task at hand.

README


Photon

Incredibly fast crawler designed for OSINT.

Photon Wiki • How To Use • Compatibility • Photon Library • Contribution • Roadmap

Key Features

Data Extraction

Photon can extract the following data while crawling:

  • URLs (in-scope & out-of-scope)
  • URLs with parameters (example.com/gallery.php?id=2)
  • Intel (emails, social media accounts, amazon buckets etc.)
  • Files (pdf, png, xml etc.)
  • Secret keys (auth/API keys & hashes)
  • JavaScript files & Endpoints present in them
  • Strings matching custom regex pattern
  • Subdomains & DNS related data

The extracted information is saved in an organized manner or can be exported as json.
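
If you export results as JSON, they are easy to post-process from Python. The snippet below is a minimal sketch that assumes the default layout of one output directory per target containing an exported.json file; the exact path and key names depend on your Photon version and options, so adjust them to match what your run actually produces.

import json
from pathlib import Path

# Assumed layout: one directory per target with an exported.json inside (verify against your output).
results_file = Path("example.com") / "exported.json"
with results_file.open() as f:
    data = json.load(f)

# "links" and "intel" are assumed key names based on the categories listed above.
for category in ("links", "intel"):
    items = data.get(category, [])
    print(f"{category}: {len(items)} item(s)")
    for item in list(items)[:5]:
        print("  ", item)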

Flexible

Control timeout, delay, add seeds, exclude URLs matching a regex pattern and other cool stuff. The extensive range of options provided by Photon lets you crawl the web exactly the way you want.
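
As a sketch of how these options fit together, the snippet below drives photon.py from Python with a crawl level, a timeout, a delay, and an exclusion regex. The flag names follow the project's documented usage, but treat the exact spelling and the script path as assumptions to verify against python photon.py --help on your install.

import subprocess

# Assumed invocation of a cloned photon.py; verify the flags with python photon.py --help.
cmd = [
    "python", "photon.py",
    "-u", "https://example.com",  # seed URL
    "-l", "2",                    # crawl depth (level)
    "--timeout", "5",             # per-request timeout in seconds
    "-d", "1",                    # delay between requests
    "--exclude", ".*logout.*",    # skip URLs matching this regex
]
subprocess.run(cmd, check=True)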

Genius

Photon's smart thread management & refined logic give you top-notch performance.

Still, crawling can be resource intensive, but Photon has some tricks up its sleeves. You can fetch URLs archived by archive.org to be used as seeds with the --wayback option.
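
The --wayback option is Photon's built-in way of doing this. For context, the underlying idea can be sketched independently against the Wayback Machine's public CDX API, which returns previously archived URLs for a domain that could then serve as crawl seeds. This is an illustration of the technique, not Photon's internal code.

import requests

def wayback_seeds(domain, limit=20):
    # Query the Wayback Machine CDX API for archived URLs of the domain (illustrative sketch).
    resp = requests.get(
        "http://web.archive.org/cdx/search/cdx",
        params={
            "url": f"{domain}/*",
            "output": "json",
            "fl": "original",
            "collapse": "urlkey",
            "limit": str(limit),
        },
        timeout=10,
    )
    resp.raise_for_status()
    rows = resp.json()
    return [row[0] for row in rows[1:]]  # the first row is the field header

for url in wayback_seeds("example.com"):
    print(url)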

Plugins

Docker

Photon can be launched using a lightweight Python-Alpine (103 MB) Docker image.

$ git clone https://github.com/s0md3v/Photon.git
$ cd Photon
$ docker build -t photon .
$ docker run -it --name photon photon:latest -u google.com

To view the results, you can either head over to the local Docker volume, which you can find by running docker inspect photon, or mount the target loot folder:

$ docker run -it --name photon -v "$PWD:/Photon/google.com" photon:latest -u google.com

Frequent & Seamless Updates

Photon is under heavy development; updates that fix bugs, optimize performance, and add new features are rolled out regularly.

If you would like to see the features and issues that are being worked on, you can do so on the Development project board.

Updates can be checked for and installed with the --update option. Photon has seamless update capabilities, which means you can update Photon without losing any of your saved data.

Contribution & License

You can contribute in the following ways:

  • Report bugs
  • Develop plugins
  • Add more "APIs" for ninja mode
  • Give suggestions to make it better
  • Fix issues & submit a pull request

Please read the guidelines before submitting a pull request or issue.

Do you want to have a conversation in private? Hit me up on Twitter; my inbox is open :)

Photon is licensed under the GPL v3.0 license.