gau
Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
Top Related Projects
- waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain
- katana: A next-generation crawling and spidering framework
- hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
- gospider: Fast web spider written in Go
Quick Overview
Gau (Get All URLs) is a command-line tool designed to fetch known URLs from various sources for a given domain. It's particularly useful for web security researchers and penetration testers to quickly gather a comprehensive list of URLs associated with a target domain.
Pros
- Fast and efficient URL discovery from multiple sources
- Easy to use with a simple command-line interface
- Supports output in various formats (JSON, TXT)
- Can be integrated into other tools and workflows
Cons
- May produce a large number of results, requiring additional filtering
- Depends on the availability and accuracy of third-party sources
- Limited customization options for advanced users
- Potential for false positives in URL discovery
Getting Started
To install and use gau:
# Install gau
go install github.com/lc/gau/v2/cmd/gau@latest
# Basic usage
gau example.com
# Output to a file
gau example.com --o urls.txt
# Use specific providers
gau example.com --providers wayback,otx,commoncrawl
# Get URLs from a list of domains, skipping common image extensions
cat domains.txt | gau --blacklist png,jpg,gif --o urls.txt
Note: Ensure you have Go installed and your Go bin directory is in your system's PATH.
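Because gau writes one URL per line to standard output, it drops straight into ordinary shell pipelines. A minimal sketch using only standard Unix tools (the domain and output filename are placeholders):
# Collect URLs, de-duplicate them, and keep only those with query strings
gau example.com | sort -u | grep -F "?" > urls-with-params.txt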
Competitor Comparisons
waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain
Pros of waybackurls
- Simpler and more focused tool, specifically for fetching URLs from the Wayback Machine
- Lightweight and easy to use with minimal dependencies
- Can be easily integrated into other tools or scripts
Cons of waybackurls
- Limited to only the Wayback Machine as a data source
- Fewer features and customization options compared to gau
- May retrieve fewer unique URLs for a given domain
Code comparison
waybackurls:
// Query the Wayback Machine's CDX API for every archived URL under the domain
resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", domain))
gau:
// Loop over each enabled data source (Wayback, Common Crawl, OTX, ...)
for _, source := range sources {
    urls, err := source.Fetch(ctx, domain, providers)
    if err != nil {
        return fmt.Errorf("error fetching URLs from %s: %s", source.Name(), err)
    }
    // Forward every URL to the shared results channel
    for url := range urls {
        results <- url
    }
}
Summary
waybackurls is a straightforward tool focused on retrieving URLs from the Wayback Machine, making it easy to use and integrate. However, it lacks the versatility and extensive features of gau, which can fetch URLs from multiple sources and offers more customization options. gau's code demonstrates its ability to handle multiple data sources, while waybackurls is specifically tailored for the Wayback Machine. Choose waybackurls for simplicity and quick Wayback Machine queries, or opt for gau when you need a more comprehensive URL gathering solution.
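For a concrete sense of the overlap, the two invocations below are roughly equivalent; waybackurls' stdin-based usage is taken from its own documentation, and the --providers flag is documented in the gau README below:
# Wayback Machine results via each tool
echo example.com | waybackurls > wayback-only.txt
echo example.com | gau --providers wayback > wayback-via-gau.txt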
katana: A next-generation crawling and spidering framework
Pros of Katana
- More comprehensive crawling capabilities, including JavaScript rendering and form submission
- Faster crawling speed due to its concurrent design and Go implementation
- Extensive configuration options for customizing the crawling process
Cons of Katana
- Higher resource consumption compared to Gau
- Steeper learning curve due to more complex configuration options
- May be overkill for simple URL discovery tasks
Code Comparison
Gau usage:
gau example.com
Katana usage:
katana -u https://example.com
Both tools are designed for URL discovery, but Katana offers more advanced features and configuration options. While Gau is simpler and more straightforward to use, Katana provides a more comprehensive crawling solution at the cost of increased complexity and resource usage.
Gau is better suited for quick and lightweight URL discovery tasks, while Katana excels in scenarios requiring deep crawling, JavaScript rendering, and advanced configuration options. The choice between the two depends on the specific requirements of the project and the desired level of crawling depth and customization.
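In practice the two are often complementary: gau for passive, historical URL collection and Katana for crawling the live site. A rough sketch, using only the invocations shown above:
# Passive collection from third-party sources (no crawling of the target)
gau example.com > passive.txt
# Active crawl of the live site
katana -u https://example.com > active.txt
# Merge and de-duplicate
sort -u passive.txt active.txt > all-urls.txt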
hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Pros of hakrawler
- Written in Go, offering potentially better performance
- Supports crawling JavaScript files for additional endpoints
- Can follow redirects and handle cookies
Cons of hakrawler
- Limited to crawling a single domain at a time
- Doesn't offer as many data sources as gau
- May require more manual configuration for complex scenarios
Code Comparison
hakrawler:
// Recursive, depth-limited crawl of a single site
func crawl(url string, depth int) {
    if depth <= 0 {
        return
    }
    // Crawl logic here
}
gau:
// Fetch known URLs for each domain from the configured providers
func getUrls(domains []string, providers []string, client *http.Client) {
    // URL fetching logic here
}
Key Differences
- hakrawler focuses on active crawling of websites, while gau retrieves URLs from various sources without crawling
- gau can process multiple domains simultaneously, whereas hakrawler is designed for single-domain crawling
- hakrawler provides more granular control over the crawling process, including depth and JavaScript parsing
Use Cases
hakrawler is better suited for:
- In-depth exploration of a single website
- Discovering hidden endpoints in JavaScript files
- Scenarios requiring cookie handling and redirect following
gau is more appropriate for:
- Quickly gathering URLs from multiple domains
- Collecting historical URL data from various sources
- Situations where active crawling is not feasible or desired
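A short, hedged illustration of that contrast; the hakrawler invocation assumes its stdin-based interface, and the gau form mirrors the examples above:
# Many domains at once, purely from third-party sources
cat domains.txt | gau > historical-urls.txt
# Active crawl of a single site, including endpoints found in its JavaScript
echo https://example.com | hakrawler > crawled-urls.txt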
Gospider - Fast web spider written in Go
Pros of gospider
- More comprehensive crawling capabilities, including JavaScript rendering
- Supports multiple output formats (JSON, Markdown, CSV)
- Offers more customization options for crawling behavior
Cons of gospider
- May be slower due to more extensive crawling features
- Potentially more complex to use for simple URL extraction tasks
- Requires more system resources for JavaScript rendering
Code comparison
gospider:
crawler := gospider.NewCrawler(
    gospider.WithConcurrency(10),
    gospider.WithDepth(3),
    gospider.WithJSRendering(true),
)
gau:
client := gau.NewClient()
urls, err := client.Fetch(ctx, "example.com")
gospider offers more configuration options and advanced crawling features, while gau provides a simpler interface for quick URL extraction. gospider is better suited for comprehensive web crawling tasks, whereas gau excels at rapid URL discovery from various sources.
gospider's JavaScript rendering capability allows it to discover dynamically generated content, making it more thorough but potentially slower. gau focuses on speed and simplicity, making it ideal for quick reconnaissance or when dealing with large numbers of domains.
Choose gospider for in-depth web crawling and content analysis, and gau for fast URL enumeration and initial reconnaissance tasks.
README
getallurls (gau)
getallurls (gau) fetches known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, Common Crawl, and URLScan for any given domain. Inspired by Tomnomnom's waybackurls.
Usage:
Examples:
$ printf example.com | gau
$ cat domains.txt | gau --threads 5
$ gau example.com google.com
$ gau --o example-urls.txt example.com
$ gau --blacklist png,jpg,gif example.com
To display the help for the tool, use the -h flag:
$ gau -h
Flag | Description | Example |
---|---|---|
--blacklist | list of extensions to skip | gau --blacklist ttf,woff,svg,png |
--fc | list of status codes to filter | gau --fc 404,302 |
--from | fetch urls from date (format: YYYYMM) | gau --from 202101 |
--ft | list of mime-types to filter | gau --ft text/plain |
--fp | remove different parameters of the same endpoint | gau --fp |
--json | output as json | gau --json |
--mc | list of status codes to match | gau --mc 200,500 |
--mt | list of mime-types to match | gau --mt text/html,application/json |
--o | filename to write results to | gau --o out.txt |
--providers | list of providers to use (wayback,commoncrawl,otx,urlscan) | gau --providers wayback |
--proxy | http proxy to use (socks5:// or http://) | gau --proxy http://proxy.example.com:8080 |
--retries | retries for HTTP client | gau --retries 10 |
--timeout | timeout (in seconds) for HTTP client | gau --timeout 60 |
--subs | include subdomains of target domain | gau example.com --subs |
--threads | number of workers to spawn | gau example.com --threads 5 |
--to | fetch urls to date (format: YYYYMM) | gau example.com --to 202101 |
--verbose | show verbose output | gau --verbose example.com |
--version | show gau version | gau --version |
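These flags can be combined; the sketch below uses only flags documented in the table above, with arbitrary example values (whether every filter applies to every provider may vary):
# Include subdomains, query only wayback and otx, keep URLs recorded with
# status 200 during 2021, and write JSON output to a file
gau --subs --providers wayback,otx --mc 200 --from 202101 --to 202112 --json --o example-2021.json example.com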
Configuration Files
gau automatically looks for a configuration file at $HOME/.gau.toml or %USERPROFILE%\.gau.toml. You can specify options there and they will be used for every subsequent run of gau. Any options provided via command line flags will override options set in the configuration file.
An example configuration file can be found here
Installation:
From source:
$ go install github.com/lc/gau/v2/cmd/gau@latest
From GitHub:
git clone https://github.com/lc/gau.git; \
cd gau/cmd; \
go build; \
sudo mv gau /usr/local/bin/; \
gau --version;
From binary:
You can download the pre-built binaries from the releases page and then move them into your $PATH.
$ tar xvf gau_2.0.6_linux_amd64.tar.gz
$ mv gau /usr/bin/gau
From Docker:
You can run gau via docker like so:
docker run --rm sxcurity/gau:latest --help
You can also build a Docker image with the following command:
docker build -t gau .
and then run it:
docker run gau example.com
Bear in mind that the piping workflow (echo "example.com" | gau) will not work with the Docker container.
ohmyzsh note:
ohmyzsh's git plugin has an alias which maps gau to the git add --update command. This is problematic, as it causes a conflict between this tool's binary ("gau") and the zsh plugin alias ("gau", i.e. git add --update). There are a few workarounds, which can be found in this GitHub issue.
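One generic workaround (not necessarily the one recommended in that issue) is to drop or bypass the alias in your own zsh configuration:
# In ~/.zshrc, after oh-my-zsh has loaded its plugins
unalias gau 2>/dev/null
# Or bypass the alias for a single invocation
\gau example.com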