Top Related Projects
Katana: A next-generation crawling and spidering framework.
hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application.
waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain.
gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
ffuf: Fast web fuzzer written in Go.
gobuster: Directory/File, DNS and VHost busting tool written in Go.
Quick Overview
The gospider project is a fast and efficient web crawler written in Go. It is designed to quickly discover and extract information from web pages, making it a useful tool for security researchers, web developers, and data analysts.
Pros
- Fast and Efficient: gospider is built using the Go programming language, which is known for its speed and concurrency capabilities, allowing it to crawl web pages quickly and efficiently.
- Customizable: The project provides a wide range of configuration options, allowing users to tailor the crawling process to their specific needs, such as setting depth limits, filtering URLs, and more.
- Robust: gospider is designed to handle a variety of web page structures and can extract data from both HTML and JSON-based content.
- Extensible: The project's modular design makes it easy to extend with custom functionality, such as additional data extraction or processing capabilities.
Cons
- Limited Functionality: While gospider is a powerful web crawler, it may not provide all the features that some users require, such as advanced data analysis or visualization tools.
- Steep Learning Curve: Configuring and using gospider may require a certain level of technical expertise, which could be a barrier for some users.
- Potential for Abuse: Like any web crawler, gospider could be used for malicious purposes, such as scraping data without permission or overwhelming web servers with excessive requests.
- Dependency on Go: The project is written in Go, which may not be the preferred language for all users and may require additional setup for those not familiar with the language.
Code Examples
// Example 1: Basic web crawling
package main

import (
    "fmt"

    "github.com/jaeles-project/gospider"
)

func main() {
    spider := gospider.New()
    spider.Start("https://example.com")
    for result := range spider.Results {
        fmt.Println(result.URL)
    }
}

This code demonstrates the basic usage of gospider to crawl the website https://example.com and print the URLs of the discovered pages.
// Example 2: Customizing the crawling process
package main

import (
    "fmt"

    "github.com/jaeles-project/gospider"
)

func main() {
    spider := gospider.New()
    spider.Depth = 2
    spider.Concurrency = 10
    spider.Filters = []string{"*.jpg", "*.png"}
    spider.Start("https://example.com")
    for result := range spider.Results {
        // Process the crawled data
        fmt.Println(result.URL)
    }
}

This example shows how to customize the gospider configuration, such as setting the crawling depth, concurrency level, and URL filters.
// Example 3: Extracting data from web pages
package main

import (
    "fmt"

    "github.com/jaeles-project/gospider"
)

func main() {
    spider := gospider.New()
    spider.Extractor = func(result *gospider.Result) {
        fmt.Println("Title:", result.Title)
        fmt.Println("Description:", result.Description)
    }
    spider.Start("https://example.com")
}

This code demonstrates how to use the gospider extractor functionality to extract specific data, such as the title and description, from the crawled web pages.
Getting Started
To get started with gospider, follow these steps:
- Install Go on your system if you haven't already. You can download it from the official Go website: https://golang.org/dl/.
- Install the gospider package using the Go package manager:
  go get -u github.com/jaeles-project/gospider
- Create a new Go file (e.g., main.go) and import the gospider package:
  package main

  import (
      "fmt"

      "github.com/jaeles-project/gospider"
  )
- Initialize a new gospider instance and start the crawling process, as shown in the sketch below.
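A minimal sketch of that last step, reusing the illustrative API from the Code Examples section above. Treat New, Start, Results, and URL as placeholder identifiers rather than a guaranteed package surface; the CLI usage shown later on this page is the documented way to run gospider.

package main

import (
    "fmt"

    "github.com/jaeles-project/gospider"
)

func main() {
    // Create a crawler and point it at the target site.
    // New, Start and Results mirror the illustrative examples above
    // and are placeholders, not a verified library API.
    spider := gospider.New()
    spider.Start("https://example.com")

    // Print every URL the crawler discovers.
    for result := range spider.Results {
        fmt.Println(result.URL)
    }
}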
Competitor Comparisons
Katana: A next-generation crawling and spidering framework.
Pros of Katana
- More advanced crawling capabilities, including JavaScript rendering and dynamic content extraction
- Better performance and scalability for large-scale web crawling tasks
- Extensive configuration options and customizable output formats
Cons of Katana
- Steeper learning curve due to more complex configuration options
- Potentially higher resource consumption for advanced features
Code Comparison
GoSpider:
crawler := gospider.NewCrawler(
    gospider.WithConcurrency(10),
    gospider.WithDepth(3),
    gospider.WithIgnoreRobotsTxt(true),
)
Katana:
crawler, err := katana.New(
    katana.WithConcurrency(10),
    katana.WithMaxDepth(3),
    katana.WithJSRendering(true),
    katana.WithCustomHeaders(map[string]string{"User-Agent": "Katana"}),
)
Both tools are web crawlers written in Go, but Katana offers more advanced features and configuration options. GoSpider is simpler to use and may be sufficient for basic crawling tasks, while Katana is better suited for complex, large-scale web crawling projects that require JavaScript rendering and dynamic content extraction. The code comparison shows that Katana provides more granular control over the crawling process, including JavaScript rendering and custom headers, which are not available in the GoSpider example.
hakrawler: Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application.
Pros of hakrawler
- Lightweight and fast, with minimal dependencies
- Supports custom headers and cookies for authenticated crawling
- Offers flexible output options (JSON, plain text, etc.)
Cons of hakrawler
- Less feature-rich compared to gospider
- Limited configuration options for crawl depth and scope
- May miss some dynamic content that gospider can detect
Code Comparison
hakrawler:
func crawl(url string, depth int, c *colly.Collector) {
    c.Visit(url)
}
gospider:
func crawl(url string, depth int, c *colly.Collector) {
    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        c.Visit(e.Request.AbsoluteURL(link))
    })
    c.Visit(url)
}
The code snippets show that gospider implements more advanced crawling logic, including recursive link following, while hakrawler's approach is simpler.
Both tools are useful for web crawling and reconnaissance, but gospider offers more advanced features and customization options. hakrawler excels in simplicity and speed, making it suitable for quick scans. The choice between the two depends on the specific requirements of the task at hand, with gospider being more suitable for comprehensive scans and hakrawler for rapid initial reconnaissance.
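Both snippets build on the colly library. For context, here is a minimal, self-contained colly crawler with depth limiting; it illustrates the shared underlying approach rather than either tool's exact code, and the v2 import path is an assumption (the tools may pin a different version).

package main

import (
    "fmt"

    "github.com/gocolly/colly/v2"
)

func main() {
    // Bound the recursion so the crawl cannot run away.
    c := colly.NewCollector(colly.MaxDepth(2))

    // Follow every <a href="..."> link found in the HTML.
    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        // Request.Visit resolves the link against the current page URL.
        e.Request.Visit(link)
    })

    // Log each page as it is requested.
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("visiting:", r.URL)
    })

    c.Visit("https://example.com")
}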
waybackurls: Fetch all the URLs that the Wayback Machine knows about for a domain.
Pros of waybackurls
- Lightweight and focused on a single task: fetching URLs from the Wayback Machine
- Simple to use with minimal configuration required
- Can be easily integrated into other tools or scripts
Cons of waybackurls
- Limited functionality compared to gospider's broader feature set
- Doesn't perform active crawling or spidering of websites
- Lacks advanced filtering options for retrieved URLs
Code comparison
waybackurls:
func getWaybackURLs(domain string, results chan<- string) {
    resp, err := http.Get(fmt.Sprintf("http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey", domain))
    if err != nil {
        return
    }
    defer resp.Body.Close()
    // ... (processing and sending results)
}
gospider:
func (s *Spider) Start() error {
    for _, site := range s.C.Sites {
        go func(site string) {
            s.crawl(site)
        }(site)
    }
    s.wait()
    return nil
}
The code snippets highlight the different approaches: waybackurls focuses on retrieving URLs from the Wayback Machine, while gospider implements a more complex crawling mechanism. gospider offers broader functionality for web crawling and information gathering, whereas waybackurls is a specialized tool for accessing historical URL data.
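To make the elided processing step concrete, here is one way the CDX JSON response could be decoded and forwarded on the results channel. With output=json the API returns an array of rows whose first row is a field-name header; this is an illustration of that format, not waybackurls' actual implementation.

package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// decodeCDX parses a CDX output=json body and sends each URL on results.
// Illustrative only; waybackurls' real code differs.
func decodeCDX(body string, results chan<- string) error {
    var rows [][]string
    if err := json.NewDecoder(strings.NewReader(body)).Decode(&rows); err != nil {
        return err
    }
    for i, row := range rows {
        if i == 0 || len(row) == 0 {
            continue // skip the header row (e.g. ["original"]) and empty rows
        }
        results <- row[0]
    }
    return nil
}

func main() {
    // A tiny hard-coded sample standing in for the HTTP response body.
    sample := `[["original"],["http://example.com/"],["http://example.com/login"]]`
    results := make(chan string, 8)
    if err := decodeCDX(sample, results); err != nil {
        panic(err)
    }
    close(results)
    for u := range results {
        fmt.Println(u)
    }
}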
gau: Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
Pros of gau
- Faster execution due to its focus on URL discovery
- Simpler to use with fewer configuration options
- Integrates well with other tools in a pipeline
Cons of gau
- Less comprehensive crawling capabilities
- Fewer customization options for specific use cases
- Limited built-in filtering options
Code Comparison
gau:
func main() {
    urls := make(chan string)
    var wg sync.WaitGroup
    for i := 0; i < *threads; i++ {
        wg.Add(1)
        go func() {
            for url := range urls {
                process(url)
            }
            wg.Done()
        }()
    }
}
gospider:
func main() {
    crawler := spider.New(opts)
    crawler.Start()
    for _, result := range crawler.Results {
        fmt.Println(result)
    }
}
gau focuses on URL discovery and uses a simple goroutine-based approach for processing URLs. gospider, on the other hand, provides a more comprehensive crawling solution with a dedicated crawler object and additional features.
Both tools serve different purposes within web crawling and URL discovery. gau is better suited for quick URL enumeration, while gospider offers more advanced crawling capabilities and customization options. The choice between them depends on the specific requirements of the task at hand.
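To complete the picture around the gau snippet, here is a self-contained sketch of the same worker-pool pattern, including the parts the excerpt omits: feeding the channel, closing it, and waiting for the workers. The process function and URL list are placeholders, not gau's real code.

package main

import (
    "fmt"
    "sync"
)

// process stands in for whatever per-URL work the tool performs.
func process(url string) {
    fmt.Println("processing:", url)
}

func main() {
    const threads = 4
    urls := make(chan string)
    var wg sync.WaitGroup

    // Start a fixed pool of workers that drain the channel.
    for i := 0; i < threads; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for url := range urls {
                process(url)
            }
        }()
    }

    // Feed the pool, then close the channel so the workers exit.
    for _, u := range []string{"https://example.com/a", "https://example.com/b"} {
        urls <- u
    }
    close(urls)
    wg.Wait()
}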
ffuf: Fast web fuzzer written in Go.
Pros of ffuf
- Faster performance for fuzzing tasks
- More flexible configuration options
- Supports multiple output formats (JSON, CSV, etc.)
Cons of ffuf
- Limited to fuzzing and doesn't offer web crawling capabilities
- Requires more manual setup for complex scanning scenarios
Code Comparison
ffuf:
func main() {
    flag.Parse()
    if err := ffuf.New().Run(); err != nil {
        fmt.Printf("\n[ERR] %s\n", err)
        os.Exit(1)
    }
}
gospider:
func main() {
    flag.Parse()
    core.Banner()
    if err := core.Run(); err != nil {
        log.Fatal(err)
    }
}
Summary
ffuf is a fast web fuzzer focused on performance and flexibility, while gospider is a more comprehensive web spider and crawler. ffuf excels in targeted fuzzing tasks with various configuration options, but lacks the broader web crawling capabilities of gospider. gospider offers a more all-in-one solution for web reconnaissance but may not match ffuf's speed in specific fuzzing scenarios. The choice between the two depends on the specific requirements of the task at hand.
gobuster: Directory/File, DNS and VHost busting tool written in Go.
Pros of gobuster
- More focused on directory and DNS enumeration
- Supports multiple wordlists and file extensions
- Offers wildcard detection to reduce false positives
Cons of gobuster
- Less versatile in terms of web crawling capabilities
- Limited output formats compared to gospider
- Lacks some advanced features like JavaScript parsing
Code comparison
gospider:
func (s *Spider) Start() error {
    for _, site := range s.C.Sites {
        go func(url string) {
            s.crawl(url)
        }(site)
    }
    return nil
}
gobuster:
func (d *DNSBuster) Run(ctx context.Context) error {
    d.resultChan = make(chan Result)
    d.errorChan = make(chan error)
    d.wildcardChan = make(chan string)
    return d.process(ctx)
}
Key differences
- gospider is designed for broader web crawling and information gathering
- gobuster focuses on specific enumeration tasks (directory, DNS, vhost)
- gospider offers more extensive output options and data extraction
- gobuster provides better control over enumeration parameters
Both tools are valuable for different aspects of web reconnaissance and penetration testing. gospider excels in comprehensive crawling and data extraction, while gobuster is more specialized for targeted enumeration tasks.
README
GoSpider
GoSpider - Fast web spider written in Go
Want to integrate Gospider into your recon workflow painlessly? This project was part of the Osmedeus Engine. Check out how it was integrated at @OsmedeusEngine.
Installation
Go install
GO111MODULE=on go install github.com/jaeles-project/gospider@latest
Docker
# Clone the repo
git clone https://github.com/jaeles-project/gospider.git
# Build the container
docker build -t gospider:latest gospider
# Run the container
docker run -t gospider -h
Features
- Fast web crawling
- Brute force and parse sitemap.xml
- Parse robots.txt
- Generate and verify links from JavaScript files
- Link Finder
- Find AWS-S3 from response source
- Find subdomains from response source (see the sketch after this list)
- Get URLs from Wayback Machine, Common Crawl, Virus Total, Alien Vault
- Easy-to-grep output format
- Support Burp input
- Crawl multiple sites in parallel
- Random mobile/web User-Agent
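As an illustration of the "find subdomains from response source" feature, here is a minimal regex-based sketch of the general technique. The pattern, helper name, and sample body are assumptions; gospider's own matching and normalization may differ.

package main

import (
    "fmt"
    "regexp"
)

// findSubdomains pulls host names ending in rootDomain out of a response body.
// Purely illustrative; not gospider's actual implementation.
func findSubdomains(body, rootDomain string) []string {
    re := regexp.MustCompile(`(?i)\b([a-z0-9][a-z0-9.-]*\.` + regexp.QuoteMeta(rootDomain) + `)\b`)
    seen := map[string]bool{}
    var out []string
    for _, m := range re.FindAllStringSubmatch(body, -1) {
        if !seen[m[1]] {
            seen[m[1]] = true
            out = append(out, m[1])
        }
    }
    return out
}

func main() {
    body := `<a href="https://api.example.com/v1">API</a> <script src="//cdn.example.com/app.js"></script>`
    fmt.Println(findSubdomains(body, "example.com"))
}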
Showcases
Usage
Fast web spider written in Go - v1.1.5 by @thebl4ckturtle & @j3ssiejjj
Usage:
gospider [flags]
Flags:
-s, --site string Site to crawl
-S, --sites string Site list to crawl
-p, --proxy string Proxy (Ex: http://127.0.0.1:8080)
-o, --output string Output folder
-u, --user-agent string User Agent to use
web: random web user-agent
mobi: random mobile user-agent
or you can set your special user-agent (default "web")
--cookie string Cookie to use (testA=a; testB=b)
-H, --header stringArray Header to use (Use multiple flag to set multiple header)
--burp string Load headers and cookie from burp raw http request
--blacklist string Blacklist URL Regex
--whitelist string Whitelist URL Regex
--whitelist-domain string Whitelist Domain
-t, --threads int Number of threads (Run sites in parallel) (default 1)
-c, --concurrent int The number of the maximum allowed concurrent requests of the matching domains (default 5)
-d, --depth int MaxDepth limits the recursion depth of visited URLs. (Set it to 0 for infinite recursion) (default 1)
-k, --delay int Delay is the duration to wait before creating a new request to the matching domains (second)
-K, --random-delay int RandomDelay is the extra randomized duration to wait added to Delay before creating a new request (second)
-m, --timeout int Request timeout (second) (default 10)
-B, --base Disable all and only use HTML content
--js Enable linkfinder in javascript file (default true)
--subs Include subdomains
--sitemap Try to crawl sitemap.xml
--robots Try to crawl robots.txt (default true)
-a, --other-source Find URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
-w, --include-subs Include subdomains crawled from 3rd party. Default is main domain
-r, --include-other-source Also include other-source's urls (still crawl and request)
--debug Turn on debug mode
--json Enable JSON output
-v, --verbose Turn on verbose
-l, --length Turn on length
-L, --filter-length Turn on length filter
-R, --raw Turn on raw
-q, --quiet Suppress all the output and only show URL
--no-redirect Disable redirect
--version Check version
-h, --help help for gospider
Example commands
Quiet output
gospider -q -s "https://google.com/"
Run with single site
gospider -s "https://google.com/" -o output -c 10 -d 1
Run with site list
gospider -S sites.txt -o output -c 10 -d 1
Run with 20 sites at the same time with 10 bot each site
gospider -S sites.txt -o output -c 10 -d 1 -t 20
Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source
Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com) and include subdomains
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs
Use custom header/cookies
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt
Blacklist URLs / file extensions.
P/s: gospider blacklists .(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico) by default.
gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)"
Show response length and filter out specific lengths.
gospider -s "https://google.com/" -o output -c 10 -d 1 --length --filter-length "6871,24432"
License
Gospider is made with ❤️ by @j3ssiejjj & @thebl4ckturtle and it is released under the MIT license.
Donation