gau
Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
Top Related Projects
- waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain
- katana: A next-generation crawling and spidering framework
- hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
- gospider: Fast web spider written in Go
Quick Overview
Gau (Get All URLs) is a command-line tool designed to fetch known URLs from various sources for a given domain. It's particularly useful for web security researchers and penetration testers to quickly gather a comprehensive list of URLs associated with a target domain.
Pros
- Fast and efficient URL discovery from multiple sources
- Easy to use with a simple command-line interface
- Supports output in various formats (JSON, TXT)
- Can be integrated into other tools and workflows
Cons
- May produce a large number of results, requiring additional filtering
- Depends on the availability and accuracy of third-party sources
- Limited customization options for advanced users
- Potential for false positives in URL discovery
Getting Started
To install and use gau:
# Install gau
go install github.com/lc/gau/v2/cmd/gau@latest
# Basic usage
gau example.com
# Output to a file
gau example.com --o urls.txt
# Use specific providers
gau example.com --providers wayback,otx,commoncrawl
# Get URLs from a list of domains, skipping common image extensions
cat domains.txt | gau --blacklist png,jpg,gif --o urls.txt
Note: Ensure you have Go installed and your Go bin directory is in your system's PATH.
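Because gau writes one URL per line to standard output, it drops straight into ordinary shell pipelines. A minimal sketch using only standard Unix tools (the domain and output filename are placeholders):
# Collect URLs, de-duplicate them, and keep only those with query strings
gau example.com | sort -u | grep -F "?" > urls-with-params.txt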
Competitor Comparisons
waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain
Pros of waybackurls
- Simpler and more focused tool, specifically for fetching URLs from the Wayback Machine
- Lightweight and easy to use with minimal dependencies
- Can be easily integrated into other tools or scripts
Cons of waybackurls
- Limited to only the Wayback Machine as a data source
- Fewer features and customization options compared to gau
- May retrieve fewer unique URLs for a given domain
Code comparison
waybackurls:
// Query the Wayback Machine's CDX API for every archived URL under the domain
resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", domain))
gau:
// Loop over each enabled data source (Wayback, Common Crawl, OTX, ...)
for _, source := range sources {
    urls, err := source.Fetch(ctx, domain, providers)
    if err != nil {
        return fmt.Errorf("error fetching URLs from %s: %s", source.Name(), err)
    }
    // Forward every URL to the shared results channel
    for url := range urls {
        results <- url
    }
}
Summary
waybackurls is a straightforward tool focused on retrieving URLs from the Wayback Machine, making it easy to use and integrate. However, it lacks the versatility and extensive features of gau, which can fetch URLs from multiple sources and offers more customization options. gau's code demonstrates its ability to handle multiple data sources, while waybackurls is specifically tailored for the Wayback Machine. Choose waybackurls for simplicity and quick Wayback Machine queries, or opt for gau when you need a more comprehensive URL gathering solution.
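For a concrete sense of the overlap, the two invocations below are roughly equivalent; waybackurls' stdin-based usage is taken from its own documentation, and the --providers flag is documented in the gau README below:
# Wayback Machine results via each tool
echo example.com | waybackurls > wayback-only.txt
echo example.com | gau --providers wayback > wayback-via-gau.txt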
katana: A next-generation crawling and spidering framework
Pros of Katana
- More comprehensive crawling capabilities, including JavaScript rendering and form submission
- Faster crawling speed due to its concurrent design and Go implementation
- Extensive configuration options for customizing the crawling process
Cons of Katana
- Higher resource consumption compared to Gau
- Steeper learning curve due to more complex configuration options
- May be overkill for simple URL discovery tasks
Code Comparison
Gau usage:
gau example.com
Katana usage:
katana -u https://example.com
Both tools are designed for URL discovery, but Katana offers more advanced features and configuration options. While Gau is simpler and more straightforward to use, Katana provides a more comprehensive crawling solution at the cost of increased complexity and resource usage.
Gau is better suited for quick and lightweight URL discovery tasks, while Katana excels in scenarios requiring deep crawling, JavaScript rendering, and advanced configuration options. The choice between the two depends on the specific requirements of the project and the desired level of crawling depth and customization.
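In practice the two are often complementary: gau for passive, historical URL collection and Katana for crawling the live site. A rough sketch, using only the invocations shown above:
# Passive collection from third-party sources (no crawling of the target)
gau example.com > passive.txt
# Active crawl of the live site
katana -u https://example.com > active.txt
# Merge and de-duplicate
sort -u passive.txt active.txt > all-urls.txt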
hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Pros of hakrawler
- Written in Go, offering potentially better performance
- Supports crawling JavaScript files for additional endpoints
- Can follow redirects and handle cookies
Cons of hakrawler
- Limited to crawling a single domain at a time
- Doesn't offer as many data sources as gau
- May require more manual configuration for complex scenarios
Code Comparison
hakrawler:
// Recursive, depth-limited crawl of a single site
func crawl(url string, depth int) {
    if depth <= 0 {
        return
    }
    // Crawl logic here
}
gau:
// Fetch known URLs for each domain from the configured providers
func getUrls(domains []string, providers []string, client *http.Client) {
    // URL fetching logic here
}
Key Differences
- hakrawler focuses on active crawling of websites, while gau retrieves URLs from various sources without crawling
- gau can process multiple domains simultaneously, whereas hakrawler is designed for single-domain crawling
- hakrawler provides more granular control over the crawling process, including depth and JavaScript parsing
Use Cases
hakrawler is better suited for:
- In-depth exploration of a single website
- Discovering hidden endpoints in JavaScript files
- Scenarios requiring cookie handling and redirect following
gau is more appropriate for:
- Quickly gathering URLs from multiple domains
- Collecting historical URL data from various sources
- Situations where active crawling is not feasible or desired
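A short, hedged illustration of that contrast; the hakrawler invocation assumes its stdin-based interface, and the gau form mirrors the examples above:
# Many domains at once, purely from third-party sources
cat domains.txt | gau > historical-urls.txt
# Active crawl of a single site, including endpoints found in its JavaScript
echo https://example.com | hakrawler > crawled-urls.txt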
Gospider - Fast web spider written in Go
Pros of gospider
- More comprehensive crawling capabilities, including JavaScript rendering
- Supports multiple output formats (JSON, Markdown, CSV)
- Offers more customization options for crawling behavior
Cons of gospider
- May be slower due to more extensive crawling features
- Potentially more complex to use for simple URL extraction tasks
- Requires more system resources for JavaScript rendering
Code comparison
gospider:
crawler := gospider.NewCrawler(
    gospider.WithConcurrency(10),
    gospider.WithDepth(3),
    gospider.WithJSRendering(true),
)
gau:
client := gau.NewClient()
urls, err := client.Fetch(ctx, "example.com")
gospider offers more configuration options and advanced crawling features, while gau provides a simpler interface for quick URL extraction. gospider is better suited for comprehensive web crawling tasks, whereas gau excels at rapid URL discovery from various sources.
gospider's JavaScript rendering capability allows it to discover dynamically generated content, making it more thorough but potentially slower. gau focuses on speed and simplicity, making it ideal for quick reconnaissance or when dealing with large numbers of domains.
Choose gospider for in-depth web crawling and content analysis, and gau for fast URL enumeration and initial reconnaissance tasks.
README
getallurls (gau)
getallurls (gau) fetches known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, Common Crawl, and URLScan for any given domain. Inspired by Tomnomnom's waybackurls.
Usage:
Examples:
$ printf example.com | gau
$ cat domains.txt | gau --threads 5
$ gau example.com google.com
$ gau --o example-urls.txt example.com
$ gau --blacklist png,jpg,gif example.com
To display the help for the tool, use the -h flag:
$ gau -h
Flag | Description | Example |
---|---|---|
--blacklist | list of extensions to skip | gau --blacklist ttf,woff,svg,png |
--fc | list of status codes to filter | gau --fc 404,302 |
--from | fetch urls from date (format: YYYYMM) | gau --from 202101 |
--ft | list of mime-types to filter | gau --ft text/plain |
--fp | remove different parameters of the same endpoint | gau --fp |
--json | output as json | gau --json |
--mc | list of status codes to match | gau --mc 200,500 |
--mt | list of mime-types to match | gau --mt text/html,application/json |
--o | filename to write results to | gau --o out.txt |
--providers | list of providers to use (wayback,commoncrawl,otx,urlscan) | gau --providers wayback |
--proxy | http proxy to use (socks5:// or http://) | gau --proxy http://proxy.example.com:8080 |
--retries | retries for HTTP client | gau --retries 10 |
--timeout | timeout (in seconds) for HTTP client | gau --timeout 60 |
--subs | include subdomains of target domain | gau example.com --subs |
--threads | number of workers to spawn | gau example.com --threads 5 |
--to | fetch urls to date (format: YYYYMM) | gau example.com --to 202101 |
--verbose | show verbose output | gau --verbose example.com |
--version | show gau version | gau --version |
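These flags can be combined; the sketch below uses only flags documented in the table above, with arbitrary example values (whether every filter applies to every provider may vary):
# Include subdomains, query only wayback and otx, keep URLs recorded with
# status 200 during 2021, and write JSON output to a file
gau --subs --providers wayback,otx --mc 200 --from 202101 --to 202112 --json --o example-2021.json example.com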
Configuration Files
gau automatically looks for a configuration file at $HOME/.gau.toml or %USERPROFILE%\.gau.toml. You can specify options there and they will be used for every subsequent run of gau. Any options provided via command line flags will override options set in the configuration file.
An example configuration file can be found here
Installation:
From source:
$ go install github.com/lc/gau/v2/cmd/gau@latest
From GitHub:
git clone https://github.com/lc/gau.git; \
cd gau/cmd; \
go build; \
sudo mv gau /usr/local/bin/; \
gau --version;
From binary:
You can download the pre-built binaries from the releases page and then move them into your $PATH.
$ tar xvf gau_2.0.6_linux_amd64.tar.gz
$ mv gau /usr/bin/gau
From Docker:
You can run gau via docker like so:
docker run --rm sxcurity/gau:latest --help
You can also build a Docker image with the following command:
docker build -t gau .
and then run it:
docker run gau example.com
Bear in mind that the piping workflow (echo "example.com" | gau) will not work with the Docker container.
ohmyzsh note:
ohmyzsh's git plugin has an alias which maps gau to the git add --update command. This is problematic, as it causes a conflict between this tool's binary ("gau") and the zsh plugin alias ("gau", i.e. git add --update). There are a few workarounds, which can be found in this GitHub issue.
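One generic workaround (not necessarily the one recommended in that issue) is to drop or bypass the alias in your own zsh configuration:
# In ~/.zshrc, after oh-my-zsh has loaded its plugins
unalias gau 2>/dev/null
# Or bypass the alias for a single invocation
\gau example.com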