hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

Top Related Projects

  • waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain
  • Katana: A next-generation crawling and spidering framework.
  • gospider: Fast web spider written in Go
  • gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
  • Photon: Incredibly fast crawler designed for OSINT.

Quick Overview

Hakrawler is a fast web crawler designed for easy, quick discovery of endpoints and assets within a web application. It's written in Go and can be used for reconnaissance during web application security assessments or bug bounty hunting.

Pros

  • Fast and efficient, capable of crawling large websites quickly
  • Supports various output formats (JSON, plain text) for easy integration with other tools
  • Collects JavaScript file locations as well as page links (it parses HTML with Gocolly rather than rendering JavaScript in a headless browser)
  • Customizable with options for depth, threads, and domain scope

Cons

  • May miss some dynamically generated content or complex JavaScript-based navigation
  • Can potentially overload target servers if not used carefully
  • Limited built-in filtering options compared to some more comprehensive crawlers
  • Requires manual analysis of results for identifying security issues

Getting Started

  1. Install Go on your system if not already installed.
  2. Install hakrawler:
go install github.com/hakluke/hakrawler@latest
  3. Basic usage:
echo https://example.com | hakrawler
  4. More advanced usage with options:
echo https://example.com | hakrawler -d 3 -t 20 -h "User-Agent: MyCustomCrawler" -insecure

This command crawls https://example.com with a depth of 3, using 20 threads, a custom User-Agent header, and ignoring SSL certificate errors.
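To make the depth option concrete, here is a minimal sketch of depth-limited, breadth-first link following over an in-memory link graph. The map stands in for fetching pages and extracting their links, the name crawlDepth is hypothetical, and the start URL is counted as depth 0. hakrawler itself hands depth limiting to Gocolly, whose counting convention may differ.

```go
package main

import "sort"

// crawlDepth returns every URL reachable from start by following at most
// maxDepth links. The links map is a hypothetical stand-in for fetching a
// page and extracting its links; the start URL counts as depth 0.
func crawlDepth(start string, maxDepth int, links map[string][]string) []string {
	seen := map[string]bool{start: true}
	frontier := []string{start}
	for depth := 0; depth < maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, u := range frontier {
			for _, l := range links[u] {
				if !seen[l] { // never queue the same URL twice
					seen[l] = true
					next = append(next, l)
				}
			}
		}
		frontier = next
	}
	out := make([]string, 0, len(seen))
	for u := range seen {
		out = append(out, u)
	}
	sort.Strings(out) // map order is random, so sort for stable output
	return out
}
```

With links a to b, b to c and c to d, a depth of 2 reaches a, b and c but not d.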

Competitor Comparisons

waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain

Pros of waybackurls

  • Simpler and more focused tool, specifically for fetching URLs from the Wayback Machine
  • Faster execution for its specific task
  • Lightweight with minimal dependencies

Cons of waybackurls

  • Limited functionality compared to hakrawler's broader feature set
  • Lacks the ability to crawl websites directly
  • No built-in filtering or pattern matching capabilities

Code Comparison

waybackurls:

resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", url))
if err != nil {
    return nil, err
}
defer resp.Body.Close()

hakrawler:

c := colly.NewCollector(
    colly.UserAgent("Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"),
    colly.MaxDepth(depth),
    colly.IgnoreRobotsTxt(),
)
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    link := e.Attr("href")
    absoluteURL := e.Request.AbsoluteURL(link)
    // ... (additional processing)
})

Both tools serve different purposes within the realm of URL discovery and web crawling. waybackurls is a focused tool for retrieving historical URLs from the Wayback Machine, while hakrawler is a more comprehensive web crawler with additional features for discovering and analyzing web content.
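The waybackurls excerpt builds its query with fmt.Sprintf and leaves out the response handling. The sketch below builds a query with the same CDX parameters using net/url, so the domain is escaped, and decodes the body. It assumes that output=json returns a JSON array of rows whose first row holds the field names (here just "original"); cdxQuery and parseCDX are illustrative names, not waybackurls' own functions.

```go
package main

import (
	"encoding/json"
	"net/url"
)

// cdxQuery builds a Wayback Machine CDX API request for every capture under
// domain, using the same parameters as the waybackurls excerpt.
func cdxQuery(domain string) string {
	q := url.Values{}
	q.Set("url", domain+"/*")
	q.Set("output", "json")
	q.Set("fl", "original")
	q.Set("collapse", "urlkey")
	return "http://web.archive.org/cdx/search/cdx?" + q.Encode() // Encode sorts keys
}

// parseCDX decodes a JSON CDX response, assumed to be an array of rows where
// row 0 is the header (e.g. ["original"]), and returns the first column of
// every data row.
func parseCDX(body []byte) ([]string, error) {
	var rows [][]string
	if err := json.Unmarshal(body, &rows); err != nil {
		return nil, err
	}
	var urls []string
	for i, row := range rows {
		if i == 0 || len(row) == 0 {
			continue // skip the header row and any empty rows
		}
		urls = append(urls, row[0])
	}
	return urls, nil
}
```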

Katana: A next-generation crawling and spidering framework.

Pros of Katana

  • More advanced crawling capabilities, including JavaScript rendering and form submission
  • Supports multiple output formats (JSON, Markdown, etc.)
  • Actively maintained with frequent updates and new features

Cons of Katana

  • Higher resource consumption due to more complex functionality
  • Steeper learning curve for advanced features
  • May be overkill for simple crawling tasks

Code Comparison

Hakrawler (simple usage):

cat urls.txt | hakrawler

Katana (simple usage):

katana -u https://example.com

Key Differences

  • Hakrawler is lightweight and focused on speed, while Katana offers more comprehensive crawling features
  • Katana provides better support for modern web applications with JavaScript rendering
  • Hakrawler is easier to use for basic tasks, while Katana offers more customization options

Use Cases

  • Hakrawler: Quick reconnaissance and simple crawling tasks
  • Katana: In-depth web application scanning and complex crawling scenarios

Both tools have their merits, and the choice depends on the specific requirements of the task at hand. Hakrawler excels in simplicity and speed, while Katana offers more advanced features for thorough web application analysis.

Gospider - Fast web spider written in Go

Pros of gospider

  • More comprehensive crawling capabilities, including link extraction from JavaScript files, sitemap.xml, and robots.txt
  • Supports plain-text and JSON output
  • Includes built-in modules for extracting specific types of information (e.g., subdomains, AWS S3 buckets)

Cons of gospider

  • Potentially slower due to more extensive crawling and JavaScript file parsing
  • May require more system resources for larger scans
  • Steeper learning curve due to more advanced features and options

Code comparison

hakrawler:

func crawl(url string, depth int) {
    if depth <= 0 {
        return
    }
    // Crawl logic here
}

gospider:

func (s *Spider) Start() error {
    for _, u := range s.C.URLs {
        s.wg.Add(1)
        go func(url string) {
            defer s.wg.Done()
            s.crawl(url)
        }(u)
    }
    s.wg.Wait()
    return nil
}

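Both projects are written in Go. The gospider excerpt starts one goroutine per URL and waits for them with a sync.WaitGroup, while the hakrawler excerpt is a simplified sketch of a depth-limited crawl function; in practice hakrawler hands depth limiting and request concurrency to Gocolly (see the -d and -t options).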

gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.

Pros of gau

  • Faster execution due to concurrent fetching from multiple sources
  • Supports more data sources, including Wayback Machine, AlienVault OTX, and Common Crawl
  • Provides options for custom output formatting

Cons of gau

  • Does not crawl live pages or follow links; it only reports URLs already recorded by its data sources
  • May produce more noise in results due to its broader data sources
  • Built-in filtering is limited to simple options such as skipping file extensions; more complex pattern matching needs external tools

Code Comparison

hakrawler:

func crawl(url string, depth int, source string) {
    if depth >= *maxDepth {
        return
    }
    // ... (crawling logic)
}

gau:

func getUrls(domain string, providers []string) {
    var wg sync.WaitGroup
    for _, provider := range providers {
        wg.Add(1)
        go func(provider string) {
            defer wg.Done()
            // ... (fetching logic for each provider)
        }(provider)
    }
    wg.Wait()
}

The code snippets highlight the different approaches: hakrawler crawls live pages up to a maximum depth, while gau queries several archive providers concurrently and does no crawling of its own.
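The gau excerpt elides what each goroutine does with its results. A minimal sketch of the fan-in step, assuming each provider is modelled as a plain function that returns URLs (a hypothetical simplification of real clients for sources such as the Wayback Machine or OTX), merges the results under a mutex and drops duplicates. Provider and gatherURLs are illustrative names.

```go
package main

import (
	"sort"
	"sync"
)

// Provider is a hypothetical stand-in for one URL source.
type Provider func(domain string) []string

// gatherURLs queries every provider concurrently and returns the
// de-duplicated union of their results, sorted.
func gatherURLs(domain string, providers []Provider) []string {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen = make(map[string]struct{})
	)
	for _, p := range providers {
		wg.Add(1)
		go func(p Provider) {
			defer wg.Done()
			found := p(domain) // slow I/O would happen here, outside the lock
			mu.Lock()
			defer mu.Unlock()
			for _, u := range found {
				seen[u] = struct{}{}
			}
		}(p)
	}
	wg.Wait()
	out := make([]string, 0, len(seen))
	for u := range seen {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}
```

Because the provider call runs outside the lock, slow sources do not block each other; only the short map update is serialized.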

Photon: Incredibly fast crawler designed for OSINT.

Pros of Photon

  • More comprehensive crawling capabilities, including JavaScript parsing and DNS enumeration
  • Supports multiple output formats (JSON, CSV, TXT)
  • Includes additional features like parameter discovery and intelligent error handling

Cons of Photon

  • Slower performance compared to hakrawler, especially for large-scale scans
  • More complex setup and usage, requiring additional dependencies
  • Less frequent updates and maintenance

Code Comparison

Photon:

def photon(url, seeds, level, threads, delay, timeout, cook, headers):
    # ... (initialization code)
    for url in urls:
        ...  # (URL processing)
    # ... (output handling)

hakrawler:

func crawl(url string, depth int, source string) {
    // ... (initialization code)
    for _, link := range links {
        // ... (link processing)
    }
    // ... (output handling)
}

Both tools use similar approaches for crawling, but Photon's implementation in Python allows for more flexibility and additional features, while hakrawler's Go implementation focuses on speed and simplicity.

README

Hakrawler

Fast golang web crawler for gathering URLs and JavaScript file locations. This is basically a simple implementation of the awesome Gocolly library.

Example usages

Single URL:

echo https://google.com | hakrawler

Multiple URLs:

cat urls.txt | hakrawler

Timeout for each line of stdin after 5 seconds:

cat urls.txt | hakrawler -timeout 5

Send all requests through a proxy:

cat urls.txt | hakrawler -proxy http://localhost:8080

Include subdomains:

echo https://google.com | hakrawler -subs

Note: a common issue is that the tool returns no URLs. This usually happens when a domain is specified (https://example.com), but it redirects to a subdomain (https://www.example.com). The subdomain is not included in the scope, so no URLs are printed. To overcome this, either specify the final URL in the redirect chain or use the -subs option to include subdomains.
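The scope rule behind this note can be sketched as a host comparison. In this minimal version, which is an assumption about the behaviour rather than hakrawler's implementation, a discovered URL is in scope when its host equals the start host or, with subdomains enabled, ends with a dot followed by the start host. inScope and includeSubs are illustrative names.

```go
package main

import (
	"net/url"
	"strings"
)

// inScope reports whether link stays within the scope of start.
// Without includeSubs only the exact host matches; with it, subdomains match too.
func inScope(start, link string, includeSubs bool) bool {
	s, err := url.Parse(start)
	if err != nil {
		return false
	}
	l, err := url.Parse(link)
	if err != nil {
		return false
	}
	sh, lh := strings.ToLower(s.Hostname()), strings.ToLower(l.Hostname())
	if lh == sh {
		return true
	}
	// The leading dot stops "notexample.com" from matching "example.com".
	return includeSubs && strings.HasSuffix(lh, "."+sh)
}
```

With includeSubs false, a start URL of https://example.com rejects every link on https://www.example.com, which matches the empty output described in the note.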

Example tool chain

Get all subdomains of google, find the ones that respond to http(s), crawl them all.

echo google.com | haktrails subdomains | httpx | hakrawler

Installation

Normal Install

First, you'll need to install go.

Then run this command to download + compile hakrawler:

go install github.com/hakluke/hakrawler@latest

You can now run ~/go/bin/hakrawler. If you'd like to just run hakrawler without the full path, you'll need to export PATH="$HOME/go/bin:$PATH". You can also add this line to your ~/.bashrc file if you'd like this to persist.

Docker Install (from dockerhub)

echo https://www.google.com | docker run --rm -i hakluke/hakrawler:v2 -subs

Local Docker Install

It's much easier to use the dockerhub method above, but if you'd prefer to run it locally:

git clone https://github.com/hakluke/hakrawler
cd hakrawler
sudo docker build -t hakluke/hakrawler .
sudo docker run --rm -i hakluke/hakrawler --help

Kali Linux: Using apt

Note: This will install an older version of hakrawler without all the features, and it may be buggy. I recommend using one of the other methods.

sudo apt install hakrawler

Then, to run hakrawler:

echo https://www.google.com | hakrawler -subs

Command-line options

Usage of hakrawler:
  -d int
    	Depth to crawl. (default 2)
  -dr
    	Disable following HTTP redirects.
  -h string
    	Custom headers separated by two semi-colons. E.g. -h "Cookie: foo=bar;;Referer: http://example.com/"
  -i	Only crawl inside path
  -insecure
    	Disable TLS verification.
  -json
    	Output as JSON.
  -proxy string
    	Proxy URL. E.g. -proxy http://127.0.0.1:8080
  -s	Show the source of URL based on where it was found. E.g. href, form, script, etc.
  -size int
    	Page size limit, in KB. (default -1)
  -subs
    	Include subdomains for crawling.
  -t int
    	Number of threads to utilise. (default 8)
  -timeout int
    	Maximum time to crawl each URL from stdin, in seconds. (default -1)
  -u	Show only unique urls.
  -w	Show at which link the URL is found.
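The -h option packs several headers into one argument, separated by two semicolons. A minimal parsing sketch, assuming each entry has the form "Name: value" and that malformed entries are skipped (hakrawler's own handling may differ), could look like this; parseHeaders is a hypothetical name.

```go
package main

import (
	"net/http"
	"strings"
)

// parseHeaders splits a -h style string such as
// "Cookie: foo=bar;;Referer: http://example.com/" into an http.Header.
func parseHeaders(raw string) http.Header {
	h := http.Header{}
	for _, entry := range strings.Split(raw, ";;") {
		name, value, ok := strings.Cut(entry, ":") // split on the first colon only
		if !ok {
			continue // skip entries without a colon
		}
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		h.Add(name, strings.TrimSpace(value)) // Add canonicalizes the header name
	}
	return h
}
```

Splitting on the first colon matters because values such as the Referer URL contain colons themselves.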