tomnomnom/waybackurls

Fetch all the URLs that the Wayback Machine knows about for a domain

Top Related Projects

  • gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
  • hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application.
  • Katana: A next-generation crawling and spidering framework.
  • Gospider: Fast web spider written in Go.
  • Amass: In-depth attack surface mapping and asset discovery.

Quick Overview

Waybackurls is a command-line tool written in Go that fetches all the URLs that the Wayback Machine knows about for a given domain. It's designed to help security researchers and penetration testers discover historical and potentially forgotten endpoints of a website.
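
Under the hood, the tool amounts to a single request against the Wayback Machine's CDX API, the same endpoint that appears in the code comparisons below. The following is a minimal sketch of that lookup in Go, not the tool's actual source; the query parameters are taken from the waybackurls snippet shown later on this page, and error handling is kept to the bare minimum:

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

func main() {
    domain := "example.com" // illustrative target

    // Same CDX query the waybackurls snippet below issues: every archived URL
    // under the domain, as JSON, original URL only, collapsed on the URL key.
    endpoint := fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", domain)

    resp, err := http.Get(endpoint)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    defer resp.Body.Close()

    // The CDX JSON output is an array of rows; the first row is the header.
    var rows [][]string
    if err := json.NewDecoder(resp.Body).Decode(&rows); err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    for i, row := range rows {
        if i == 0 || len(row) == 0 {
            continue // skip the ["original"] header row
        }
        fmt.Println(row[0])
    }
}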

Pros

  • Fast and efficient, capable of processing large domains quickly
  • Easy to use with a simple command-line interface
  • Can be integrated into other tools and scripts easily
  • Provides valuable historical data for security assessments

Cons

  • Limited to data available in the Wayback Machine
  • May return outdated or irrelevant URLs
  • No built-in filtering options for results; output is typically piped through an external filter (see the sketch after this list)
  • Requires manual analysis of output for meaningful insights
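
Because filtering is left to the user, a small external filter usually sits downstream of waybackurls. Here is a rough sketch of one such filter in Go; the list of extensions is purely illustrative and not part of waybackurls itself:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// Reads waybackurls output on stdin and prints only URLs whose path ends in
// an extension that is often worth a closer look during a security review.
func main() {
    interesting := []string{".php", ".asp", ".aspx", ".jsp", ".json", ".js", ".bak", ".zip", ".sql"}

    sc := bufio.NewScanner(os.Stdin)
    for sc.Scan() {
        line := sc.Text()

        // Drop any query string before checking the extension.
        path := line
        if i := strings.IndexByte(path, '?'); i != -1 {
            path = path[:i]
        }

        for _, ext := range interesting {
            if strings.HasSuffix(strings.ToLower(path), ext) {
                fmt.Println(line)
                break
            }
        }
    }
}

Compile it once and drop it into a pipeline after waybackurls, adjusting the extension list to the target.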

Getting Started

To use waybackurls, follow these steps:

  1. Install Go on your system if not already installed.
  2. Install waybackurls:
    go install github.com/tomnomnom/waybackurls@latest
    
  3. Run waybackurls with a domain:
    echo "example.com" | waybackurls
    
    Or:
    cat domains.txt | waybackurls
    

The tool will output a list of URLs associated with the given domain(s) that have been archived by the Wayback Machine.
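
Because the tool reads domains on stdin and writes plain URLs to stdout, driving it from another program is mostly plumbing. A rough sketch in Go, assuming the waybackurls binary is already on your PATH; the deduplication step is just one example of post-processing:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os/exec"
    "strings"
)

func main() {
    // Feed a single domain on stdin, exactly as the shell examples above do.
    cmd := exec.Command("waybackurls")
    cmd.Stdin = strings.NewReader("example.com\n")

    out, err := cmd.Output()
    if err != nil {
        fmt.Println("waybackurls failed:", err)
        return
    }

    // Deduplicate the URLs before handing them to the next stage.
    seen := make(map[string]bool)
    sc := bufio.NewScanner(bytes.NewReader(out))
    for sc.Scan() {
        u := sc.Text()
        if u == "" || seen[u] {
            continue
        }
        seen[u] = true
        fmt.Println(u)
    }
}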

Competitor Comparisons

gau

Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.

Pros of gau

  • Supports multiple sources (Wayback Machine, AlienVault's OTX, Common Crawl)
  • Offers concurrent fetching for faster results
  • Provides filtering options (e.g., by status code, content-length)

Cons of gau

  • May produce more noise due to multiple sources
  • Requires more configuration to fine-tune results
  • Potentially higher resource usage due to concurrent fetching

Code Comparison

waybackurls:

// Query the Wayback Machine CDX API for every archived URL under the target domain
resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", url))

gau:

urls, err := gau.FromDomains([]string{domain}, gau.WithThreads(threads), gau.WithProviders(providers...))
for url := range urls {
    fmt.Println(url)
}

Both tools aim to retrieve historical URLs, but gau offers more flexibility and sources at the cost of potential complexity. waybackurls is simpler and focuses solely on the Wayback Machine, making it easier to use for basic tasks. The choice between them depends on the specific requirements of your project and the depth of URL discovery needed.

hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

Pros of hakrawler

  • Crawls websites in real-time, potentially discovering new or dynamic content
  • Supports custom headers and cookies for authenticated crawling
  • Offers various output formats (JSON, URLs only, etc.)

Cons of hakrawler

  • May be slower for large-scale reconnaissance compared to waybackurls
  • Requires active crawling, which can be more intrusive and detectable
  • Potentially less comprehensive for historical data compared to Wayback Machine

Code comparison

waybackurls:

// fetchURL is the Wayback Machine CDX endpoint built for the target domain
resp, err := http.Get(fetchURL)
if err != nil {
    return
}
defer resp.Body.Close()

hakrawler:

c := colly.NewCollector(
    colly.UserAgent("hakrawler"),
    colly.MaxDepth(depth),
)
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    link := e.Attr("href")
    // Process link
})

hakrawler uses the Colly framework for crawling, while waybackurls focuses on fetching data from the Wayback Machine API. hakrawler's approach allows for more flexible and interactive crawling, but waybackurls is more efficient for retrieving historical URL data from a single source.

Katana

A next-generation crawling and spidering framework.

Pros of Katana

  • More comprehensive crawling capabilities, including JavaScript rendering
  • Faster performance due to concurrent crawling
  • Supports multiple output formats (JSON, HTML, etc.)

Cons of Katana

  • More complex setup and usage compared to Waybackurls
  • Requires more system resources due to its advanced features
  • May produce more noise in results, requiring additional filtering

Code Comparison

Waybackurls:

echo "example.com" | waybackurls

Katana:

katana -u https://example.com

Feature Comparison

Waybackurls:

  • Simple and straightforward URL extraction from Wayback Machine
  • Lightweight and easy to integrate into scripts
  • Focuses solely on historical URL data

Katana:

  • Active web crawling with customizable depth and scope
  • Ability to handle modern web applications and single-page apps
  • Includes additional features like screenshot capture and custom headers

Use Case Scenarios

Waybackurls is ideal for:

  • Quick historical URL discovery
  • Integration into simple recon workflows
  • Low-resource environments

Katana is better suited for:

  • Comprehensive web application mapping
  • Discovering dynamically generated content
  • Advanced reconnaissance with detailed output

Both tools have their place in a security researcher's toolkit, with Waybackurls offering simplicity and Katana providing more advanced crawling capabilities.

Gospider - Fast web spider written in Go

Pros of gospider

  • More comprehensive crawling capabilities, including JavaScript rendering
  • Supports multiple output formats (JSON, Markdown, CSV)
  • Offers additional features like form submission and custom headers

Cons of gospider

  • More complex to use due to additional features and options
  • Potentially slower execution for simple URL extraction tasks
  • Requires more system resources for full functionality

Code comparison

waybackurls:

func getWaybackURLs(domain string, results chan<- string) {
    resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", domain))
    if err != nil {
        return
    }
    defer resp.Body.Close()
    // ... (processing and sending results)
}

gospider:

func (s *Spider) Start() error {
    for _, site := range s.C.Sites {
        go func(site string) {
            s.crawl(site)
        }(site)
    }
    s.wait()
    return nil
}

The code snippets highlight the different approaches: waybackurls focuses on retrieving URLs from the Wayback Machine, while gospider implements a more complex crawling mechanism with concurrent processing.
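
The gospider excerpt hints at its concurrency model: one goroutine per target site, followed by a wait for all of them to finish. A stripped-down sketch of that fan-out pattern using sync.WaitGroup; crawl here is a placeholder, not gospider's real implementation:

package main

import (
    "fmt"
    "sync"
)

// crawl stands in for the per-site work a spider would do.
func crawl(site string) {
    fmt.Println("crawling", site)
}

func main() {
    sites := []string{"https://example.com", "https://example.org"}

    var wg sync.WaitGroup
    for _, site := range sites {
        wg.Add(1)
        // Pass site as an argument so each goroutine gets its own copy,
        // mirroring the go func(site string){...}(site) shape in the excerpt.
        go func(site string) {
            defer wg.Done()
            crawl(site)
        }(site)
    }
    wg.Wait() // the equivalent of s.wait() in the excerpt
}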

Amass

In-depth attack surface mapping and asset discovery

Pros of Amass

  • More comprehensive subdomain enumeration, using multiple data sources and techniques
  • Actively maintained with regular updates and new features
  • Supports advanced features like DNS resolution and certificate transparency checks

Cons of Amass

  • Steeper learning curve due to more complex functionality
  • Requires more system resources and may be slower for simple tasks
  • Can be overkill for basic URL discovery needs

Code Comparison

Waybackurls (simple usage):

echo example.com | waybackurls

Amass (basic subdomain enumeration):

amass enum -d example.com

Summary

Waybackurls is a lightweight tool focused on retrieving URLs from the Wayback Machine, making it ideal for quick and simple URL discovery tasks. Amass, on the other hand, is a more powerful and comprehensive tool for subdomain enumeration and asset discovery, offering a wide range of features and data sources. While Amass provides more thorough results, it comes with increased complexity and resource requirements. The choice between the two depends on the specific needs of the user and the scope of the project.

README

waybackurls

Accept line-delimited domains on stdin, fetch known URLs from the Wayback Machine for *.domain and output them on stdout.

Usage example:

▶ cat domains.txt | waybackurls > urls

Install:

▶ go install github.com/tomnomnom/waybackurls@latest

Credit

This tool was inspired by @mhmdiaa's waybackurls.py script. Thanks to them for the great idea!