MontFerret/ferret

Declarative web scraping

Top Related Projects

  • Puppeteer (91,008): JavaScript API for Chrome and Firefox
  • Playwright: a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
  • Selenium (32,188): A browser automation framework and ecosystem.
  • Cypress (48,561): Fast, easy and reliable testing for anything that runs in a browser.
  • Appium (19,722): Cross-platform automation framework for all kinds of apps, built on top of the W3C WebDriver protocol

Quick Overview

Ferret is a web scraping system in which users describe scraping tasks declaratively in a custom query language, FQL (Ferret Query Language). Queries select elements with CSS selectors or XPath and can run against both static pages and dynamic, JavaScript-rendered pages, which makes Ferret suitable for a wide range of data extraction and analysis tasks.
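To make "static page" extraction concrete, the sketch below shows roughly the parse-and-select work that a one-line FQL query hides. It uses only the Python standard library and an inline HTML string; the class and function names are hypothetical, and this is an illustration of the idea, not how Ferret is implemented.

```python
from html.parser import HTMLParser


class FirstHeadingParser(HTMLParser):
    """Collects the text of the first <h1> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self._inside = False  # True while we are between <h1> and </h1>
        self._done = False    # True once the first <h1> has been closed
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        # Text of nested tags (for example <em>) is collected as well.
        if self._inside:
            self._chunks.append(data)

    @property
    def text(self):
        return "".join(self._chunks).strip()


def first_h1(html):
    """Return the text of the first <h1> in `html`, or "" if there is none."""
    parser = FirstHeadingParser()
    parser.feed(html)
    parser.close()
    return parser.text


# Inline sample document, so the sketch runs without network access.
SAMPLE_HTML = "<html><body><h1>Example <em>Domain</em></h1><h1>Second</h1></body></html>"
print(first_h1(SAMPLE_HTML))
```

A declarative query expresses the same intent (which page, which selector) and leaves the parsing and traversal to the engine.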

Pros

  • Declarative syntax makes it easy to define complex scraping tasks
  • Supports both static and dynamic web page scraping
  • Integrates well with other tools and databases
  • Provides a CLI for easy execution of scraping tasks

Cons

  • Learning curve for the custom FQL language
  • Limited community support compared to more established scraping tools
  • May require additional setup for handling complex JavaScript-heavy websites
  • Documentation could be more comprehensive for advanced use cases

Code Examples

  1. Basic HTML scraping:
LET doc = DOCUMENT("https://example.com")
RETURN ELEMENT(doc, "h1")

This example fetches the content of the first <h1> tag from the specified URL.

  2. Extracting multiple elements:
LET doc = DOCUMENT("https://example.com/products")
FOR product IN ELEMENTS(doc, ".product")
    RETURN {
        name: ELEMENT(product, ".name"),
        price: ELEMENT(product, ".price")
    }

This code extracts name and price information for all products on a page.

  3. Interacting with dynamic content:
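LET page = DOCUMENT("https://example.com/spa", { driver: "cdp" })
WAIT_ELEMENT(page, "#dynamic-content")
LET content = ELEMENT(page, "#dynamic-content")
RETURN content

This example opens a single-page application through the browser-backed "cdp" driver (which needs a running Chrome or Chromium instance), waits for a specific element to appear, and then returns it. In FQL, NAVIGATE takes an already opened page as its first argument, so the page is opened with DOCUMENT here.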
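The waiting step in the dynamic-content example can be pictured as polling with a timeout. The sketch below is a generic illustration in Python, not Ferret's implementation: `wait_for` and `FakePage` are hypothetical names, and the fake page simply reports the element as present on its third check.

```python
import time


def wait_for(predicate, timeout=5.0, interval=0.1):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a page whose element appears after a few checks."""

    def __init__(self, ready_after):
        self.checks = 0
        self.ready_after = ready_after

    def query(self, selector):
        self.checks += 1
        return f"<node {selector}>" if self.checks >= self.ready_after else None


page = FakePage(ready_after=3)
node = wait_for(lambda: page.query("#dynamic-content"), timeout=2.0, interval=0.01)
print(node, page.checks)
```

Bounding the wait with a deadline matters in practice: a selector that never matches should fail the query rather than hang it.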

Getting Started

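  1. Install Ferret (the command below fetches the Go library; since Go 1.17, go get no longer installs executables, so consult the CLI documentation linked from the README for the command-line tool):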

    go get -u github.com/MontFerret/ferret
    
  2. Create a simple FQL script (e.g., script.fql):

    LET doc = DOCUMENT("https://example.com")
    RETURN ELEMENT(doc, "title")
    
  3. Run the script using the Ferret CLI:

    ferret script.fql
    

This will output the title of the specified webpage.
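Query results are usually handed to another program. The sketch below assumes a result is available as JSON text and that a product query was written to return plain name and price strings; the sample data, its shape, and the `parse_products` helper are all illustrative assumptions, not output captured from a real Ferret run.

```python
import json
from decimal import Decimal

# Assumed example result: a JSON array of objects with text fields.
SAMPLE_OUTPUT = '[{"name": " Widget ", "price": "$19.99"}, {"name": "Gadget", "price": "$5.00"}]'


def parse_products(raw):
    """Turn the JSON text into dicts with a trimmed name and a Decimal price."""
    products = []
    for item in json.loads(raw):
        # Drop surrounding whitespace, a leading dollar sign, and thousands separators.
        price_text = item["price"].strip().lstrip("$").replace(",", "")
        products.append({"name": item["name"].strip(), "price": Decimal(price_text)})
    return products


products = parse_products(SAMPLE_OUTPUT)
total = sum(p["price"] for p in products)
print(products[0]["name"], total)
```

Decimal is used instead of float so that summed prices are not affected by binary rounding.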

Competitor Comparisons

91,008

JavaScript API for Chrome and Firefox

Pros of Puppeteer

  • Larger community and ecosystem, with more resources and third-party tools
  • More comprehensive API for browser automation and testing
  • Better integration with JavaScript and Node.js environments

Cons of Puppeteer

  • Steeper learning curve for beginners
  • Heavier resource usage, especially for large-scale scraping tasks
  • Limited to JavaScript/Node.js, which may not suit all project requirements

Code Comparison

Ferret (using FQL):

LET doc = DOCUMENT("https://example.com")
LET title = ELEMENT(doc, "h1")
RETURN title.innerText

Puppeteer:

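const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const title = await page.$eval('h1', el => el.innerText);
  console.log(title);
  await browser.close(); // release the browser process
})();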

Summary

Ferret is a declarative web scraping tool using its own query language (FQL), while Puppeteer is a Node.js library for controlling headless Chrome or Chromium. Puppeteer offers more extensive browser automation capabilities but requires more setup and JavaScript knowledge. Ferret provides a simpler, more focused approach to web scraping with its domain-specific language, which can be easier for certain tasks but may be less flexible for complex automation scenarios.

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

Pros of Playwright

  • Broader browser support (Chromium, Firefox, WebKit)
  • More comprehensive API for modern web automation
  • Stronger community support and active development

Cons of Playwright

  • Steeper learning curve for beginners
  • Requires a language runtime (Node.js, Python, Java, or .NET) plus downloaded browser binaries
  • Larger package size and dependencies

Code Comparison

Ferret (Web scraping):

LET doc = DOCUMENT("https://example.com")
LET title = ELEMENT(doc, "h1")
RETURN title.innerText

Playwright (Browser automation):

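const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const title = await page.$eval('h1', el => el.innerText);
  console.log(title);
  await browser.close(); // release the browser process
})();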

Summary

Ferret is a declarative web scraping language, while Playwright is a comprehensive browser automation tool. Ferret excels in simplicity for basic web scraping tasks, whereas Playwright offers more advanced features for complex web interactions and testing scenarios. The choice between them depends on the specific requirements of your project and your familiarity with their respective ecosystems.

32,188

A browser automation framework and ecosystem.

Pros of Selenium

  • Widely adopted and mature ecosystem with extensive documentation and community support
  • Supports multiple programming languages (Java, Python, C#, Ruby, etc.)
  • Integrates well with various testing frameworks and CI/CD pipelines

Cons of Selenium

  • Can be slower due to browser automation overhead
  • Requires more setup and configuration compared to Ferret
  • May struggle with complex, dynamic web applications

Code Comparison

Selenium (Python):

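from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")
# find_element_by_id was removed in Selenium 4; use find_element with a By locator
element = driver.find_element(By.ID, "my-element")
element.click()
driver.quit()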

Ferret:

LET doc = DOCUMENT("https://example.com", { driver: "cdp" })
LET element = ELEMENT(doc, "#my-element")
CLICK(element)
RETURN element.innerText

Ferret offers a more concise and declarative syntax for web scraping and automation tasks, while Selenium provides a more programmatic approach with greater flexibility across multiple programming languages. Ferret's query language is designed specifically for web interactions, making it easier to express complex operations in fewer lines of code. However, Selenium's broader language support and extensive ecosystem make it a more versatile choice for larger, more diverse projects.

48,561

Fast, easy and reliable testing for anything that runs in a browser.

Pros of Cypress

  • More mature and widely adopted project with extensive documentation and community support
  • Built-in test runner with a user-friendly interface for debugging and test execution
  • Automatic waiting and retry mechanisms for handling asynchronous operations

Cons of Cypress

  • Narrower browser coverage than Selenium: Chromium-family browsers and Firefox are supported, while WebKit support is experimental
  • Cannot interact with multiple browser tabs or windows in a single test
  • Slower test execution compared to some other testing frameworks

Code Comparison

Cypress (JavaScript):

describe('Login', () => {
  it('should log in successfully', () => {
    cy.visit('/login')
    cy.get('#username').type('user@example.com')
    cy.get('#password').type('password123')
    cy.get('button[type="submit"]').click()
    cy.url().should('include', '/dashboard')
  })
})

Ferret (FQL):

LET page = DOCUMENT("https://example.com/login", { driver: "cdp" })
INPUT(page, "#username", "user@example.com")
INPUT(page, "#password", "password123")
CLICK(page, 'button[type="submit"]')
WAIT_NAVIGATION(page)
RETURN page.url

While both projects automate a browser, they serve different purposes. Cypress is primarily a web application testing framework with assertions and a test runner, while Ferret is a web scraping and data extraction tool. The code examples reflect this: the Cypress test asserts on the resulting URL, whereas the Ferret query simply returns it as data.

19,722

Cross-platform automation framework for all kinds of apps, built on top of the W3C WebDriver protocol

Pros of Appium

  • Broader support for mobile platforms (iOS, Android) and web browsers
  • Larger community and ecosystem with more resources and plugins
  • More mature project with extensive documentation and enterprise adoption

Cons of Appium

  • Steeper learning curve due to complex setup and configuration
  • Slower test execution compared to native automation tools
  • Requires separate drivers for different platforms, increasing maintenance

Code Comparison

Ferret (declarative web scraping):

LET doc = DOCUMENT("https://example.com")
LET title = ELEMENT(doc, "h1")
RETURN title.innerText

Appium (mobile app testing):

const el = await driver.findElement(By.xpath("//android.widget.TextView[@text='Hello']"));
await el.click();
const text = await el.getText();

While both projects involve automation, Ferret focuses on web scraping with a declarative language, whereas Appium is designed for mobile app testing using programming languages like JavaScript. Ferret's syntax is more concise for web interactions, while Appium provides lower-level control for mobile app testing across multiple platforms.

README

Ferret

Go Report Status | Build Status | Discord Chat | Ferret release | Apache-2.0 License

Try it! | Docs | CLI | Test runner | Web worker

What is it?

ferret is a web scraping system. It aims to simplify data extraction from the web for UI testing, machine learning, analytics and more.
ferret allows users to focus on the data. It abstracts away the technical details and complexity of underlying technologies using its own declarative language. It is extremely portable, extensible, and fast.

Read the introductory blog post about Ferret here!

Features

  • Declarative language
  • Support of both static and dynamic web pages
  • Embeddable
  • Extensible

Documentation is available at our website.

Different languages

  • Ferret for Python: Pyfer