
tomnomnom/waybackurls

Fetch all the URLs that the Wayback Machine knows about for a domain


Top Related Projects

  • gau - Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl
  • hakrawler - Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
  • Katana - A next-generation crawling and spidering framework
  • Gospider - Fast web spider written in Go
  • Amass - In-depth attack surface mapping and asset discovery

Quick Overview

Waybackurls is a command-line tool written in Go that fetches all the URLs that the Wayback Machine knows about for a given domain. It's designed to help security researchers and penetration testers discover historical and potentially forgotten endpoints of a website.
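
Under the hood, this amounts to querying the Wayback Machine's CDX API. The following is a minimal, self-contained Go sketch of such a query; the endpoint parameters shown here are illustrative assumptions and may not match the exact request waybackurls builds.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
    "os"
)

func main() {
    domain := "example.com" // placeholder target domain

    // Ask the CDX API for archived captures under *.domain, returning only
    // the original URL field with duplicate URL keys collapsed. These query
    // parameters are assumptions for illustration, not necessarily the exact
    // ones waybackurls uses.
    endpoint := fmt.Sprintf(
        "http://web.archive.org/cdx/search/cdx?url=*.%s/*&fl=original&collapse=urlkey",
        url.QueryEscape(domain),
    )

    resp, err := http.Get(endpoint)
    if err != nil {
        fmt.Fprintln(os.Stderr, "request failed:", err)
        os.Exit(1)
    }
    defer resp.Body.Close()

    // With a plain-text response and a single field per line, the body can
    // be streamed straight to stdout as a URL list.
    io.Copy(os.Stdout, resp.Body)
}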

Pros

  • Fast and efficient, capable of processing large domains quickly
  • Easy to use with a simple command-line interface
  • Can be integrated into other tools and scripts easily
  • Provides valuable historical data for security assessments

Cons

  • Limited to data available in the Wayback Machine
  • May return outdated or irrelevant URLs
  • No built-in filtering options for results
  • Requires manual analysis of output for meaningful insights

Getting Started

To use waybackurls, follow these steps:

  1. Install Go on your system if not already installed.
  2. Install waybackurls:
    go install github.com/tomnomnom/waybackurls@latest
    
  3. Run waybackurls with a domain:
    echo "example.com" | waybackurls
    
    Or:
    cat domains.txt | waybackurls
    

The tool will output a list of URLs associated with the given domain(s) that have been archived by the Wayback Machine.
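
Because there are no built-in filters, the output is usually piped through other tools for cleanup. As a small illustration (a hypothetical helper written for this sketch, not part of waybackurls), the Go program below reads URLs from stdin, drops duplicates, and keeps only JavaScript files:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    seen := make(map[string]bool) // URLs already printed

    sc := bufio.NewScanner(os.Stdin)
    sc.Buffer(make([]byte, 1024*1024), 1024*1024) // allow very long URLs

    for sc.Scan() {
        u := strings.TrimSpace(sc.Text())
        // Keep unique .js endpoints; change the suffix check to whatever
        // asset type you are hunting for.
        if u == "" || seen[u] || !strings.HasSuffix(u, ".js") {
            continue
        }
        seen[u] = true
        fmt.Println(u)
    }
}

A typical invocation would look like echo "example.com" | waybackurls | go run filter.go, where filter.go is simply the name chosen for this sketch.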

Competitor Comparisons

gau - Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl

Pros of gau

  • Supports multiple sources (Wayback Machine, AlienVault's OTX, Common Crawl)
  • Offers concurrent fetching for faster results
  • Provides filtering options (e.g., by status code, content-length)

Cons of gau

  • May produce more noise due to multiple sources
  • Requires more configuration to fine-tune results
  • Potentially higher resource usage due to concurrent fetching

Code Comparison

waybackurls:

resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", url))

gau:

urls, err := gau.FromDomains([]string{domain}, gau.WithThreads(threads), gau.WithProviders(providers...))
for url := range urls {
    fmt.Println(url)
}

Both tools aim to retrieve historical URLs, but gau offers more flexibility and sources at the cost of potential complexity. waybackurls is simpler and focuses solely on the Wayback Machine, making it easier to use for basic tasks. The choice between them depends on the specific requirements of your project and the depth of URL discovery needed.
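
As a rough sketch of that multi-source, concurrent model (generic Go, not gau's actual internals), several provider goroutines can feed a single channel while the consumer deduplicates the merged stream:

package main

import (
    "fmt"
    "sync"
)

// fetchFromProvider is a stand-in for querying one source (e.g. the Wayback
// Machine, OTX, or Common Crawl) and streaming its URLs into out.
func fetchFromProvider(name string, out chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    out <- "https://example.com/index.php?src=" + name // placeholder result
}

func main() {
    providers := []string{"wayback", "otx", "commoncrawl"} // assumed source names
    out := make(chan string)

    var wg sync.WaitGroup
    for _, p := range providers {
        wg.Add(1)
        go fetchFromProvider(p, out, &wg)
    }

    // Close the channel once every provider goroutine has finished.
    go func() {
        wg.Wait()
        close(out)
    }()

    // Deduplicate across sources, since the same URL is often known to
    // more than one provider.
    seen := make(map[string]bool)
    for u := range out {
        if !seen[u] {
            seen[u] = true
            fmt.Println(u)
        }
    }
}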

hakrawler - Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

Pros of hakrawler

  • Crawls websites in real-time, potentially discovering new or dynamic content
  • Supports custom headers and cookies for authenticated crawling
  • Offers various output formats (JSON, URLs only, etc.)

Cons of hakrawler

  • May be slower for large-scale reconnaissance compared to waybackurls
  • Requires active crawling, which can be more intrusive and detectable
  • Potentially less comprehensive for historical data compared to Wayback Machine

Code comparison

waybackurls:

resp, err := http.Get(fetchURL)
if err != nil {
    return
}
defer resp.Body.Close()

hakrawler:

c := colly.NewCollector(
    colly.UserAgent("hakrawler"),
    colly.MaxDepth(depth),
)
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    link := e.Attr("href")
    // Process link
})

hakrawler uses the Colly framework for crawling, while waybackurls focuses on fetching data from the Wayback Machine API. hakrawler's approach allows for more flexible and interactive crawling, but waybackurls is more efficient for retrieving historical URL data from a single source.

Katana - A next-generation crawling and spidering framework

Pros of Katana

  • More comprehensive crawling capabilities, including JavaScript rendering
  • Faster performance due to concurrent crawling
  • Supports multiple output formats (JSON, HTML, etc.)

Cons of Katana

  • More complex setup and usage compared to Waybackurls
  • Requires more system resources due to its advanced features
  • May produce more noise in results, requiring additional filtering

Code Comparison

Waybackurls:

echo "example.com" | waybackurls

Katana:

katana -u https://example.com

Feature Comparison

Waybackurls:

  • Simple and straightforward URL extraction from Wayback Machine
  • Lightweight and easy to integrate into scripts
  • Focuses solely on historical URL data

Katana:

  • Active web crawling with customizable depth and scope
  • Ability to handle modern web applications and single-page apps
  • Includes additional features like screenshot capture and custom headers

Use Case Scenarios

Waybackurls is ideal for:

  • Quick historical URL discovery
  • Integration into simple recon workflows
  • Low-resource environments

Katana is better suited for:

  • Comprehensive web application mapping
  • Discovering dynamically generated content
  • Advanced reconnaissance with detailed output

Both tools have their place in a security researcher's toolkit, with Waybackurls offering simplicity and Katana providing more advanced crawling capabilities.

Gospider - Fast web spider written in Go

Pros of gospider

  • More comprehensive crawling capabilities, including JavaScript rendering
  • Supports multiple output formats (JSON, Markdown, CSV)
  • Offers additional features like form submission and custom headers

Cons of gospider

  • More complex to use due to additional features and options
  • Potentially slower execution for simple URL extraction tasks
  • Requires more system resources for full functionality

Code comparison

waybackurls:

func getWaybackURLs(domain string, results chan<- string) {
    resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", domain))
    if err != nil {
        return
    }
    defer resp.Body.Close()
    // ... (processing and sending results)
}

gospider:

func (s *Spider) Start() error {
    for _, site := range s.C.Sites {
        go func(site string) {
            s.crawl(site)
        }(site)
    }
    s.wait()
    return nil
}

The code snippets highlight the different approaches: waybackurls focuses on retrieving URLs from the Wayback Machine, while gospider implements a more complex crawling mechanism with concurrent processing.
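
For readers unfamiliar with the pattern in the gospider excerpt, here is a generic sketch of the same goroutine fan-out (not gospider's actual code), with sync.WaitGroup standing in for the s.wait() call:

package main

import (
    "fmt"
    "sync"
)

// crawl is a stand-in for the per-site crawling work.
func crawl(site string) {
    fmt.Println("crawling", site)
}

func main() {
    sites := []string{"https://example.com", "https://example.org"} // hypothetical targets

    var wg sync.WaitGroup
    for _, site := range sites {
        wg.Add(1)
        go func(site string) { // one goroutine per site
            defer wg.Done()
            crawl(site)
        }(site)
    }
    wg.Wait() // block until every crawler goroutine has finished
}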

Amass - In-depth attack surface mapping and asset discovery

Pros of Amass

  • More comprehensive subdomain enumeration, using multiple data sources and techniques
  • Actively maintained with regular updates and new features
  • Supports advanced features like DNS resolution and certificate transparency checks

Cons of Amass

  • Steeper learning curve due to more complex functionality
  • Requires more system resources and may be slower for simple tasks
  • Can be overkill for basic URL discovery needs

Code Comparison

Waybackurls (simple usage):

echo example.com | waybackurls

Amass (basic subdomain enumeration):

amass enum -d example.com

Summary

Waybackurls is a lightweight tool focused on retrieving URLs from the Wayback Machine, making it ideal for quick and simple URL discovery tasks. Amass, on the other hand, is a more powerful and comprehensive tool for subdomain enumeration and asset discovery, offering a wide range of features and data sources. While Amass provides more thorough results, it comes with increased complexity and resource requirements. The choice between the two depends on the specific needs of the user and the scope of the project.


README

waybackurls

Accept line-delimited domains on stdin, fetch known URLs from the Wayback Machine for *.domain and output them on stdout.

Usage example:

▶ cat domains.txt | waybackurls > urls

Install:

▶ go install github.com/tomnomnom/waybackurls@latest

Credit

This tool was inspired by @mhmdiaa's waybackurls.py script. Thanks to them for the great idea!