Top Related Projects
A Chrome DevTools Protocol driver for web automation and scraping.
A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol.
Elegant Scraper and Crawler Framework for Golang
A little like that j-thing, only in Go.
Playwright for Go: a browser automation library to control Chromium, Firefox and WebKit with a single API.
Selenium/Webdriver client for Go
Quick Overview
The headzoo/surf project is a web browser automation library for the Go programming language. It provides a high-level API for interacting with web pages, allowing developers to automate tasks such as web scraping, form filling, and navigation.
Pros
- Powerful Automation: The library offers a comprehensive set of features for automating web interactions, making it a valuable tool for tasks like web scraping, testing, and data extraction.
- User Agent Spoofing: headzoo/surf can identify itself as Chrome, Firefox, Safari, or other browsers via its user agent builder, which helps when sites serve browser-specific content.
- Ease of Use: The library's API is designed to be intuitive and easy to use, with a focus on simplifying common web automation tasks.
- Pure Go: Surf implements a virtual browser entirely in Go, so it adds no external browser binaries or driver processes to a project.
Cons
- No JavaScript Execution: Surf does not run a real browser engine, so pages that build their content with client-side JavaScript cannot be fully scraped.
- Performance Overhead: Automating web interactions can be resource-intensive, and the library may introduce some performance overhead, especially for large-scale or complex tasks.
- Dependency on External Libraries: headzoo/surf relies on several external libraries (such as goquery), which can increase the complexity of project setup and maintenance.
- Lack of Detailed Documentation: The project's documentation, while generally helpful, could be more comprehensive, especially for advanced use cases or edge cases.
Code Examples
Here are a few examples of how to use the headzoo/surf library:
- Navigating to a Web Page and Extracting Text:
package main

import (
	"fmt"

	"github.com/headzoo/surf"
)

func main() {
	// Create a new web browser instance
	bow := surf.NewBrowser()

	// Navigate to a web page
	err := bow.Open("https://www.example.com")
	if err != nil {
		panic(err)
	}

	// Extract the page title
	fmt.Println("Page Title:", bow.Title())

	// Extract the page body text
	fmt.Println("Page Body:", bow.Body())
}
- Filling a Form and Submitting:
package main

import (
	"github.com/headzoo/surf"
)

func main() {
	// Create a new web browser instance
	bow := surf.NewBrowser()

	// Navigate to a web page with a form
	err := bow.Open("https://www.example.com/form")
	if err != nil {
		panic(err)
	}

	// Look up the form on the page and fill in its fields
	fm, err := bow.Form("form")
	if err != nil {
		panic(err)
	}
	fm.Input("name", "John Doe")
	fm.Input("email", "john.doe@example.com")

	// Submit the form
	if err := fm.Submit(); err != nil {
		panic(err)
	}
}
- Scraping Data from a Web Page:
package main

import (
	"fmt"

	"github.com/PuerkitoBio/goquery"
	"github.com/headzoo/surf"
)

func main() {
	// Create a new web browser instance
	bow := surf.NewBrowser()

	// Navigate to a web page
	err := bow.Open("https://www.example.com/products")
	if err != nil {
		panic(err)
	}

	// Surf exposes the parsed page as goquery selections,
	// so data can be extracted with CSS selectors directly
	bow.Find(".product").Each(func(i int, s *goquery.Selection) {
		name := s.Find(".name").Text()
		price := s.Find(".price").Text()
		fmt.Printf("Product: %s, Price: %s\n", name, price)
	})
}
Getting Started
To get started with the headzoo/surf library, install it with go get (see the Installation section below for the versioned gopkg.in path), import it into your project, and create a browser with surf.NewBrowser() as shown in the examples above.
Competitor Comparisons
A Chrome DevTools Protocol driver for web automation and scraping.
Pros of Rod
- Rod is a high-level, user-friendly web automation library that provides a simple and intuitive API for interacting with web pages.
- Rod offers a wide range of features, including support for headless and non-headless browsers, automatic retries, and built-in support for common web tasks like form filling, clicking, and scraping.
- Rod is highly performant and efficient, with a focus on speed and reliability.
Cons of Rod
- Rod may have a steeper learning curve compared to Surf, especially for developers who are new to web automation.
- Rod's focus on web automation may make it less suitable for general-purpose web development tasks compared to Surf.
- Rod's dependency on the Chromium browser may limit its compatibility with other browsers.
Code Comparison
Surf:
bow := surf.NewBrowser()
bow.SetUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3")
err := bow.Open("https://www.example.com")
if err != nil {
	// Handle error
}
fmt.Println("Page title:", bow.Title())
Rod:
browser := rod.New().MustConnect()
defer browser.MustClose()
page := browser.MustPage("https://www.example.com")
fmt.Println("Page title:", page.MustInfo().Title)
A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol.
Pros of chromedp/chromedp
- Supports a wide range of browser actions, including navigation, input, and screenshot capture.
- Provides a high-level API that abstracts away the complexity of interacting with the Chrome DevTools Protocol.
- Offers a flexible and extensible architecture, allowing users to customize and extend the functionality as needed.
Cons of chromedp/chromedp
- Requires the installation and configuration of a Chrome or Chromium browser, which can be a dependency for some users.
- May have a steeper learning curve compared to simpler web automation tools, especially for users new to the Chrome DevTools Protocol.
- Focuses primarily on Chrome/Chromium-based browsers, limiting its applicability to other browser environments.
Code Comparison
Here's a brief code comparison between chromedp/chromedp and headzoo/surf:
chromedp/chromedp (navigating to a website and capturing a screenshot):
ctx, cancel := chromedp.NewContext(context.Background())
defer cancel()

var buf []byte
err := chromedp.Run(ctx,
	chromedp.Navigate("https://www.example.com"),
	chromedp.CaptureScreenshot(&buf),
)
if err != nil {
	// handle error
}
headzoo/surf (navigating to a website and printing the page title):
browser := surf.NewBrowser()
err := browser.Open("https://www.example.com")
if err != nil {
	// handle error
}
fmt.Println(browser.Title())
Elegant Scraper and Crawler Framework for Golang
Pros of Colly
- Colly is a fast and efficient web scraping framework for Go, making it well-suited for large-scale web crawling projects.
- Colly provides a modular and extensible design, allowing developers to easily customize and extend its functionality.
- Colly has a strong focus on performance, with features like parallel request handling and automatic retries.
Cons of Colly
- Colly may have a steeper learning curve compared to Surf, as it requires a deeper understanding of Go and web scraping concepts.
- Colly's documentation, while comprehensive, may not be as beginner-friendly as Surf's.
- Colly may have a more complex setup process, as it requires the installation of additional dependencies.
Code Comparison
Surf (Go):
bow := surf.NewBrowser()
err := bow.Open("https://example.com")
if err != nil {
	// Handle error
}
fmt.Println(bow.Title())
Colly (Go):
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()
	c.OnHTML("title", func(e *colly.HTMLElement) {
		fmt.Println(e.Text)
	})
	c.Visit("https://example.com")
}
A little like that j-thing, only in Go.
Pros of goquery
- goquery is a focused HTML parsing and selection library, making it a more lightweight and efficient option than a full virtual browser like Surf.
- goquery provides a familiar jQuery-like syntax for querying and manipulating HTML documents, which can be more intuitive for developers already familiar with jQuery.
- goquery is actively maintained and has a larger community, with more contributors and more extensive documentation.
Cons of goquery
- Surf provides a more comprehensive set of features, including support for cookies, headers, and other advanced web browsing functionality.
- Surf has a more flexible and extensible architecture, allowing for easier customization and integration with other libraries.
- Surf may be a better choice for more complex web scraping tasks that require more advanced features and functionality.
Code Comparison
Surf (headzoo/surf):
browser := surf.NewBrowser()
err := browser.Open("https://example.com")
if err != nil {
	// Handle error
}
link, ok := browser.Find("a.my-link").Attr("href")
if !ok {
	// Attribute not found
}
fmt.Println(link)
goquery (PuerkitoBio/goquery):
doc, err := goquery.NewDocument("https://example.com")
if err != nil {
	// Handle error
}
link, _ := doc.Find("a.my-link").Attr("href")
fmt.Println(link)
Playwright for Go: a browser automation library to control Chromium, Firefox and WebKit with a single API.
Pros of Playwright-Go
- Playwright-Go provides a more comprehensive and feature-rich API for automating web browsers, including support for multiple browsers (Chromium, Firefox, and WebKit).
- The Playwright-Go library is actively maintained and has a larger community of contributors, ensuring regular updates and bug fixes.
- Playwright-Go offers better cross-browser compatibility and can handle more complex web interactions compared to Surf.
Cons of Playwright-Go
- Playwright-Go has a larger footprint and may require more system dependencies, which can make it more challenging to set up and deploy in certain environments.
- The Playwright-Go API may have a steeper learning curve for developers who are more familiar with simpler web automation libraries like Surf.
Code Comparison
Surf (headzoo/surf):
browser := surf.NewBrowser()
err := browser.Open("https://www.example.com")
if err != nil {
	// Handle error
}
fmt.Println(browser.Body())
Playwright-Go (playwright-community/playwright-go):
pw, err := playwright.Run()
if err != nil {
	// Handle error
}
defer pw.Stop()

browser, err := pw.Chromium.Launch()
if err != nil {
	// Handle error
}
defer browser.Close()

page, err := browser.NewPage()
if err != nil {
	// Handle error
}

if _, err = page.Goto("https://www.example.com"); err != nil {
	// Handle error
}

content, _ := page.Content()
fmt.Println(content)
Selenium/Webdriver client for Go
Pros of Selenium
- Selenium is a widely-used and well-established library for web automation, with a large community and extensive documentation.
- Selenium supports multiple programming languages, including Python, Java, and C#, making it a versatile choice.
- Selenium can interact with a wide range of web browsers, including Chrome, Firefox, and Safari, providing cross-browser testing capabilities.
Cons of Selenium
- Selenium can be more complex to set up and configure compared to simpler web automation libraries like Surf.
- Selenium may have a steeper learning curve, especially for developers new to web automation.
- Selenium can be more resource-intensive than some alternatives, as it requires managing browser instances and handling network communication.
Code Comparison
Surf:
bow := surf.NewBrowser()
err := bow.Open("https://www.example.com")
if err != nil {
	// Handle error
}
fmt.Println(bow.Title())
Selenium (tebeka/selenium):
caps := selenium.Capabilities{"browserName": "chrome"}
wd, err := selenium.NewRemote(caps, "http://localhost:4444/wd/hub")
if err != nil {
	// Handle error
}
defer wd.Quit()

wd.Get("https://www.example.com")
title, _ := wd.Title()
fmt.Println(title)
Both code snippets navigate to a website and retrieve the page title. The main difference is that Surf manages its virtual browser in-process, while the Selenium client must connect to a running WebDriver server, making Surf's API more concise for simple tasks.
README
Surf
Surf is a Go (golang) library that implements a virtual web browser that you control programmatically. Surf isn't just another Go solution for downloading content from the web; it is designed to behave like a web browser, and includes cookie management, history, bookmarking, user agent spoofing (with a nifty user agent builder), form submission, DOM selection and traversal via jQuery-style CSS selectors, downloading of assets like images and stylesheets, and other features.
Installation
Download the library using go.
go get gopkg.in/headzoo/surf.v1
Import the library into your project.
import "gopkg.in/headzoo/surf.v1"
Quick Start
package main

import (
	"fmt"

	"gopkg.in/headzoo/surf.v1"
)

func main() {
	bow := surf.NewBrowser()
	err := bow.Open("http://golang.org")
	if err != nil {
		panic(err)
	}

	// Outputs: "The Go Programming Language"
	fmt.Println(bow.Title())
}
Documentation
Complete documentation is available on Read the Docs.
Credits
Surf uses the awesome goquery by Martin Angers, and was written using IntelliJ and the golang plugin.
Contributions have been made to Surf by the following awesome developers:
- Sean Hickey
- Haruyama Seigo
- Tatsushi Demachi
- Charl Matthee
- Matt Holt
- lalyos
- lestrrat
- Carl Henrik Lunde
- cornerot
- lennyxc
- tlianza
- joshuamorris3
- sqs
- nicot
- Joseph Watson
- lxt2
The idea to create Surf was born in this Reddit thread.
Contributing
Issues and pull requests are always welcome.
See CONTRIBUTING.md for more information.
License
Surf is open source software released under The MIT License (MIT). See LICENSE.md for more information.