
jaeles-project / gospider

Gospider - Fast web spider written in Go


Top Related Projects

  • katana: A next-generation crawling and spidering framework.
  • hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application.
  • waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain.
  • gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
  • ffuf: Fast web fuzzer written in Go.
  • gobuster: Directory/File, DNS and VHost busting tool written in Go.

Quick Overview

The gospider project is a fast and efficient web crawler written in Go. It is designed to quickly discover and extract information from web pages, making it a useful tool for security researchers, web developers, and data analysts.

Pros

  • Fast and Efficient: gospider is built using the Go programming language, which is known for its speed and concurrency capabilities, allowing it to crawl web pages quickly and efficiently.
  • Customizable: The project provides a wide range of configuration options, allowing users to tailor the crawling process to their specific needs, such as setting depth limits, filtering URLs, and more.
  • Robust: gospider is designed to handle a variety of web page structures and can extract data from both HTML and JSON-based content.
  • Extensible: The project's modular design makes it easy to extend with custom functionality, such as additional data extraction or processing capabilities.

Cons

  • Limited Functionality: While gospider is a powerful web crawler, it may not provide all the features and functionality that some users might require, such as advanced data analysis or visualization tools.
  • Steep Learning Curve: Configuring and using gospider may require a certain level of technical expertise, which could be a barrier for some users.
  • Potential for Abuse: Like any web crawler, gospider could potentially be used for malicious purposes, such as scraping data without permission or overwhelming web servers with excessive requests.
  • Dependency on Go: The project is written in Go, which may not be the preferred language for all users, and may require additional setup and configuration for those not familiar with the language.

Code Examples

// Example 1: Basic web crawling
package main

import (
    "fmt"
    "github.com/jaeles-project/gospider"
)

func main() {
    spider := gospider.New()
    spider.Start("https://example.com")

    for result := range spider.Results {
        fmt.Println(result.URL)
    }
}

This code illustrates the basic idea: crawl https://example.com and print the URLs of the discovered pages. The gospider.New and Results identifiers used throughout these examples are illustrative; Gospider is primarily driven from the command line, so check the project source for the exact package API before building against it.

// Example 2: Customizing the crawling process
package main

import (
    "fmt"

    "github.com/jaeles-project/gospider"
)

func main() {
    spider := gospider.New()
    spider.Depth = 2
    spider.Concurrency = 10
    spider.Filters = []string{"*.jpg", "*.png"}
    spider.Start("https://example.com")

    for result := range spider.Results {
        // Process the crawled data
        fmt.Println(result.URL)
    }
}

This example shows how to customize the gospider configuration, such as setting the crawling depth, concurrency level, and URL filters.

// Example 3: Extracting data from web pages
package main

import (
    "fmt"
    "github.com/jaeles-project/gospider"
)

func main() {
    spider := gospider.New()
    spider.Extractor = func(result *gospider.Result) {
        fmt.Println("Title:", result.Title)
        fmt.Println("Description:", result.Description)
    }
    spider.Start("https://example.com")
}

This code demonstrates how to use the gospider extractor functionality to extract specific data, such as the title and description, from the crawled web pages.

Getting Started

To get started with gospider, follow these steps:

  1. Install Go on your system if you haven't already. You can download it from the official Go website: https://golang.org/dl/.

  2. Add gospider to your Go module as a dependency (run go mod init first if your project is not yet a module):

    go get github.com/jaeles-project/gospider

    To install only the command-line tool, use go install github.com/jaeles-project/gospider@latest as shown in the README's Installation section below.
  3. Create a new Go file (e.g., main.go) and import the gospider package:

    package main
    
    import (
        "fmt"
        "github.com/jaeles-project/gospider"
    )
    
  4. Initialize a new gospider instance and start the crawling process, as in the sketch below.
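Putting the steps together, a complete main.go might look like the following. This is a minimal sketch that follows the illustrative gospider.New / Start / Results API used in the Code Examples section above; those identifiers are assumptions rather than a verified package API, so check the project source for the current one.

// main.go - minimal crawl-and-print sketch.
// NOTE: gospider.New, Start, and Results mirror the illustrative API used
// earlier on this page; treat them as assumptions, not a verified API.
package main

import (
    "fmt"

    "github.com/jaeles-project/gospider"
)

func main() {
    spider := gospider.New()

    // Run the crawl in the background so results can be read as they arrive.
    go spider.Start("https://example.com")

    // Print every discovered URL until the Results channel is closed.
    for result := range spider.Results {
        fmt.Println(result.URL)
    }
}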

Competitor Comparisons

katana: A next-generation crawling and spidering framework.

Pros of Katana

  • More advanced crawling capabilities, including JavaScript rendering and dynamic content extraction
  • Better performance and scalability for large-scale web crawling tasks
  • Extensive configuration options and customizable output formats

Cons of Katana

  • Steeper learning curve due to more complex configuration options
  • Potentially higher resource consumption for advanced features

Code Comparison

GoSpider:

crawler := gospider.NewCrawler(
    gospider.WithConcurrency(10),
    gospider.WithDepth(3),
    gospider.WithIgnoreRobotsTxt(true),
)

Katana:

crawler, err := katana.New(
    katana.WithConcurrency(10),
    katana.WithMaxDepth(3),
    katana.WithJSRendering(true),
    katana.WithCustomHeaders(map[string]string{"User-Agent": "Katana"}),
)

Both tools are web crawlers written in Go, but Katana offers more advanced features and configuration options. GoSpider is simpler to use and may be sufficient for basic crawling tasks, while Katana is better suited for complex, large-scale web crawling projects that require JavaScript rendering and dynamic content extraction. The code comparison shows that Katana provides more granular control over the crawling process, including JavaScript rendering and custom headers, which are not available in the GoSpider example.

hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application.

Pros of hakrawler

  • Lightweight and fast, with minimal dependencies
  • Supports custom headers and cookies for authenticated crawling
  • Offers flexible output options (JSON, plain text, etc.)

Cons of hakrawler

  • Less feature-rich compared to gospider
  • Limited configuration options for crawl depth and scope
  • May miss some dynamic content that gospider can detect

Code Comparison

hakrawler:

func crawl(url string, depth int, c *colly.Collector) {
    c.Visit(url)
}

gospider:

func crawl(url string, depth int, c *colly.Collector) {
    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        c.Visit(e.Request.AbsoluteURL(link))
    })
    c.Visit(url)
}

The code snippets show that gospider implements more advanced crawling logic, including recursive link following, while hakrawler's approach is simpler.

Both tools are useful for web crawling and reconnaissance, but gospider offers more advanced features and customization options. hakrawler excels in simplicity and speed, making it suitable for quick scans. The choice between the two depends on the specific requirements of the task at hand, with gospider being more suitable for comprehensive scans and hakrawler for rapid initial reconnaissance.

waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain.

Pros of waybackurls

  • Lightweight and focused on a single task: fetching URLs from the Wayback Machine
  • Simple to use with minimal configuration required
  • Can be easily integrated into other tools or scripts

Cons of waybackurls

  • Limited functionality compared to gospider's broader feature set
  • Doesn't perform active crawling or spidering of websites
  • Lacks advanced filtering options for retrieved URLs

Code comparison

waybackurls:

func getWaybackURLs(domain string, results chan<- string) {
    resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", domain))
    if err != nil {
        return
    }
    defer resp.Body.Close()
    // ... (processing and sending results)
}

gospider:

func (s *Spider) Start() error {
    for _, site := range s.C.Sites {
        go func(site string) {
            s.crawl(site)
        }(site)
    }
    s.wait()
    return nil
}

The code snippets highlight the different approaches: waybackurls focuses on retrieving URLs from the Wayback Machine, while gospider implements a more complex crawling mechanism. gospider offers broader functionality for web crawling and information gathering, whereas waybackurls is a specialized tool for accessing historical URL data.

gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.

Pros of gau

  • Faster execution due to its focus on URL discovery
  • Simpler to use with fewer configuration options
  • Integrates well with other tools in a pipeline

Cons of gau

  • Less comprehensive crawling capabilities
  • Fewer customization options for specific use cases
  • Limited built-in filtering options

Code Comparison

gau:

func main() {
    urls := make(chan string)
    var wg sync.WaitGroup
    for i := 0; i < *threads; i++ {
        wg.Add(1)
        go func() {
            for url := range urls {
                process(url)
            }
            wg.Done()
        }()
    }
}

gospider:

func main() {
    crawler := spider.New(opts)
    crawler.Start()
    for _, result := range crawler.Results {
        fmt.Println(result)
    }
}

gau focuses on URL discovery and uses a simple goroutine-based approach for processing URLs. gospider, on the other hand, provides a more comprehensive crawling solution with a dedicated crawler object and additional features.

Both tools serve different purposes within web crawling and URL discovery. gau is better suited for quick URL enumeration, while gospider offers more advanced crawling capabilities and customization options. The choice between them depends on the specific requirements of the task at hand.

ffuf: Fast web fuzzer written in Go.

Pros of ffuf

  • Faster performance for fuzzing tasks
  • More flexible configuration options
  • Supports multiple output formats (JSON, CSV, etc.)

Cons of ffuf

  • Limited to fuzzing and doesn't offer web crawling capabilities
  • Requires more manual setup for complex scanning scenarios

Code Comparison

ffuf:

func main() {
    flag.Parse()
    if err := ffuf.New().Run(); err != nil {
        fmt.Printf("\n[ERR] %s\n", err)
        os.Exit(1)
    }
}

gospider:

func main() {
    flag.Parse()
    core.Banner()
    if err := core.Run(); err != nil {
        log.Fatal(err)
    }
}

Summary

ffuf is a fast web fuzzer focused on performance and flexibility, while gospider is a more comprehensive web spider and crawler. ffuf excels in targeted fuzzing tasks with various configuration options, but lacks the broader web crawling capabilities of gospider. gospider offers a more all-in-one solution for web reconnaissance but may not match ffuf's speed in specific fuzzing scenarios. The choice between the two depends on the specific requirements of the task at hand.

gobuster: Directory/File, DNS and VHost busting tool written in Go.

Pros of gobuster

  • More focused on directory and DNS enumeration
  • Supports multiple wordlists and file extensions
  • Offers wildcard detection to reduce false positives

Cons of gobuster

  • Less versatile in terms of web crawling capabilities
  • Limited output formats compared to gospider
  • Lacks some advanced features like JavaScript parsing

Code comparison

gospider:

func (s *Spider) Start() error {
    for _, site := range s.C.Sites {
        go func(url string) {
            s.crawl(url)
        }(site)
    }
    return nil
}

gobuster:

func (d *DNSBuster) Run(ctx context.Context) error {
    d.resultChan = make(chan Result)
    d.errorChan = make(chan error)
    d.wildcardChan = make(chan string)
    return d.process(ctx)
}

Key differences

  • gospider is designed for broader web crawling and information gathering
  • gobuster focuses on specific enumeration tasks (directory, DNS, vhost)
  • gospider offers more extensive output options and data extraction
  • gobuster provides better control over enumeration parameters

Both tools are valuable for different aspects of web reconnaissance and penetration testing. gospider excels in comprehensive crawling and data extraction, while gobuster is more specialized for targeted enumeration tasks.


README

GoSpider

GoSpider - Fast web spider written in Go

Painlessly integrate Gospider into your recon workflow? This project was part of the Osmedeus Engine; check out how it was integrated at @OsmedeusEngine.

Installation

Go install

GO111MODULE=on go install github.com/jaeles-project/gospider@latest

Docker

# Clone the repo
git clone https://github.com/jaeles-project/gospider.git
# Build the container
docker build -t gospider:latest gospider
# Run the container
docker run -t gospider -h

Features

  • Fast web crawling
  • Brute force and parse sitemap.xml
  • Parse robots.txt
  • Generate and verify link from JavaScript files
  • Link Finder
  • Find AWS-S3 from response source
  • Find subdomains from response source (the sketch after this list illustrates the kind of pattern matching behind these two features)
  • Get URLs from Wayback Machine, Common Crawl, Virus Total, Alien Vault
  • Format output easy to Grep
  • Support Burp input
  • Crawl multiple sites in parallel
  • Random mobile/web User-Agent
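
To make the "Find AWS-S3 from response source" and "Find subdomains from response source" features concrete, the sketch below shows the general kind of pattern matching a crawler applies to response bodies. The regular expressions and the example.com target are simplified illustrations, not gospider's actual patterns.

package main

import (
    "fmt"
    "regexp"
)

// Simplified, illustrative patterns; gospider's real extraction is more thorough.
var (
    s3Pattern  = regexp.MustCompile(`[a-z0-9.\-]+\.s3(?:[.\-][a-z0-9\-]+)?\.amazonaws\.com`)
    subPattern = regexp.MustCompile(`[a-zA-Z0-9][a-zA-Z0-9\-]*\.example\.com`)
)

func main() {
    // A fake response body standing in for a crawled page.
    body := `<a href="https://assets.example.com/app.js"></a>
<script src="https://my-bucket.s3.amazonaws.com/static/main.js"></script>`

    for _, m := range s3Pattern.FindAllString(body, -1) {
        fmt.Println("[aws-s3]", m)
    }
    for _, m := range subPattern.FindAllString(body, -1) {
        fmt.Println("[subdomains]", m)
    }
}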

Showcases

(asciinema demo recording)

Usage

Fast web spider written in Go - v1.1.5 by @thebl4ckturtle & @j3ssiejjj

Usage:
  gospider [flags]

Flags:
  -s, --site string               Site to crawl
  -S, --sites string              Site list to crawl
  -p, --proxy string              Proxy (Ex: http://127.0.0.1:8080)
  -o, --output string             Output folder
  -u, --user-agent string         User Agent to use
                                  	web: random web user-agent
                                  	mobi: random mobile user-agent
                                  	or you can set your special user-agent (default "web")
      --cookie string             Cookie to use (testA=a; testB=b)
  -H, --header stringArray        Header to use (Use multiple flag to set multiple header)
      --burp string               Load headers and cookie from burp raw http request
      --blacklist string          Blacklist URL Regex
      --whitelist string          Whitelist URL Regex
      --whitelist-domain string   Whitelist Domain
  -t, --threads int               Number of threads (Run sites in parallel) (default 1)
  -c, --concurrent int            The number of the maximum allowed concurrent requests of the matching domains (default 5)
  -d, --depth int                 MaxDepth limits the recursion depth of visited URLs. (Set it to 0 for infinite recursion) (default 1)
  -k, --delay int                 Delay is the duration to wait before creating a new request to the matching domains (second)
  -K, --random-delay int          RandomDelay is the extra randomized duration to wait added to Delay before creating a new request (second)
  -m, --timeout int               Request timeout (second) (default 10)
  -B, --base                      Disable all and only use HTML content
      --js                        Enable linkfinder in javascript file (default true)
      --subs                      Include subdomains
      --sitemap                   Try to crawl sitemap.xml
      --robots                    Try to crawl robots.txt (default true)
  -a, --other-source              Find URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
  -w, --include-subs              Include subdomains crawled from 3rd party. Default is main domain
  -r, --include-other-source      Also include other-source's urls (still crawl and request)
      --debug                     Turn on debug mode
      --json                      Enable JSON output
  -v, --verbose                   Turn on verbose
  -l, --length                    Turn on length
  -L, --filter-length             Turn on length filter
  -R, --raw                       Turn on raw
  -q, --quiet                     Suppress all the output and only show URL
      --no-redirect               Disable redirect
      --version                   Check version
  -h, --help                      help for gospider
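
The --json flag above makes the output easy to consume from other programs. The sketch below runs an installed gospider binary from Go and decodes its output; it assumes gospider is on PATH and that, with --json, each output line is a self-contained JSON object (it decodes into a generic map so no particular field names are assumed).

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "log"
    "os/exec"
)

func main() {
    // Flags taken from the usage above: single site, depth 1, JSON output.
    cmd := exec.Command("gospider", "-s", "https://example.com", "-d", "1", "--json")

    stdout, err := cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }

    // Assumption: with --json, each line describes one finding as a JSON object.
    scanner := bufio.NewScanner(stdout)
    for scanner.Scan() {
        var record map[string]interface{}
        if err := json.Unmarshal(scanner.Bytes(), &record); err != nil {
            fmt.Println(scanner.Text()) // fall back to the raw line
            continue
        }
        fmt.Printf("%v\n", record)
    }

    if err := cmd.Wait(); err != nil {
        log.Fatal(err)
    }
}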

Example commands

Quiet output

gospider -q -s "https://google.com/"

Run with single site

gospider -s "https://google.com/" -o output -c 10 -d 1

Run with site list

gospider -S sites.txt -o output -c 10 -d 1

Run with 20 sites at the same time with 10 bot each site

gospider -S sites.txt -o output -c 10 -d 1 -t 20

Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source

Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com) and include subdomains

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs

Use custom header/cookies

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt
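
For the --burp option, burp_req.txt is a raw HTTP request (for example, one saved out of Burp Suite). A minimal file might look like the following; the host, header, and cookie values are placeholders, and gospider picks up the headers and cookies from it as described in the flag list above.

GET / HTTP/1.1
Host: google.com
User-Agent: Mozilla/5.0
Accept: */*
Cookie: testA=a; testB=b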

Blacklist URL/file extensions.

P/S: gospider blacklists .(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico) by default

gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)"

Show response length and blacklist responses by length.

gospider -s "https://google.com/" -o output -c 10 -d 1 --length --filter-length "6871,24432"   

License

Gospider is made with ♥ by @j3ssiejjj & @thebl4ckturtle and it is released under the MIT license.

Donation

Gospider accepts donations via PayPal.