cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

28,557

1,638

Top Related Projects

jsdom

20,475

A JavaScript implementation of various web standards, for use with Node.js

puppeteer

88,205

JavaScript API for Chrome and Firefox

axios

105,172

Promise based HTTP client for the browser and node.js

request

25,681

🏊🏾 Simplified HTTP request client.

node-fetch

8,770

A light-weight module that brings the Fetch API to Node.js

Quick Overview

Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It parses markup and provides an API for traversing/manipulating the resulting data structure, making it ideal for web scraping and HTML parsing tasks in Node.js environments.

Pros

Lightweight and fast compared to full DOM implementations
Familiar jQuery-like syntax for easy adoption
Works well with static HTML, making it suitable for scraping tasks
No browser or DOM dependencies, making it efficient for server-side use

Cons

Lacks support for JavaScript rendering, limiting its use with dynamic content
Not suitable for tasks requiring full browser simulation
May require additional libraries for more complex web scraping scenarios
Limited functionality compared to full-featured browsers or headless browser solutions

Code Examples

Loading HTML and selecting elements:

const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

console.log($.html());
// Output: <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

Traversing and manipulating the DOM:

const cheerio = require('cheerio');
const $ = cheerio.load(`
  <ul id="fruits">
    <li class="apple">Apple</li>
    <li class="orange">Orange</li>
    <li class="pear">Pear</li>
  </ul>
`);

$('.pear').attr('id', 'favorite').html('Peach');
const fruits = $('#fruits > li').map((i, el) => $(el).text()).get();
console.log(fruits); // ['Apple', 'Orange', 'Peach']

Extracting data from a table:

const cheerio = require('cheerio');
const $ = cheerio.load(`
  <table>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>John</td><td>30</td></tr>
    <tr><td>Jane</td><td>25</td></tr>
  </table>
`);

const data = $('tr').slice(1).map((i, el) => {
  const tds = $(el).find('td');
  // Pass an explicit radix so the age is always parsed as a base-10 integer
  return { name: tds.eq(0).text(), age: parseInt(tds.eq(1).text(), 10) };
}).get();

console.log(data);
// Output: [{ name: 'John', age: 30 }, { name: 'Jane', age: 25 }]

Getting Started

To use Cheerio in your project, first install it via npm:

npm install cheerio

Then, in your JavaScript file:

const cheerio = require('cheerio');

// Load HTML
const $ = cheerio.load('<h1 class="title">Hello, Cheerio!</h1>');

// Manipulate the DOM
$('h1').addClass('welcome').text('Welcome to Cheerio!');

// Output the modified HTML
console.log($.html());

This basic example demonstrates loading HTML, manipulating it, and outputting the result. Cheerio's API closely resembles jQuery, making it intuitive for those familiar with client-side DOM manipulation.

Competitor Comparisons

jsdom

Pros of jsdom

Provides a full DOM implementation, including window and document objects
Supports running client-side JavaScript within the simulated environment
More closely mimics a real browser environment for testing purposes

Cons of jsdom

Slower performance compared to Cheerio due to its more comprehensive implementation
Higher memory usage, which can be a concern for large-scale parsing tasks
More complex setup and configuration required

Code Comparison

jsdom:

const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<p>Hello world</p>`);
console.log(dom.window.document.querySelector("p").textContent);

Cheerio:

const cheerio = require('cheerio');
const $ = cheerio.load('<p>Hello world</p>');
console.log($('p').text());

Summary

jsdom provides a more complete browser-like environment, making it ideal for complex web application testing and scenarios requiring client-side JavaScript execution. However, this comes at the cost of performance and resource usage. Cheerio, on the other hand, offers a lightweight and fast solution for simple HTML parsing and manipulation tasks, but lacks the ability to execute JavaScript or provide a full DOM environment. The choice between the two depends on the specific requirements of your project, balancing between functionality and performance needs.

puppeteer

Pros of Puppeteer

Full browser automation, including JavaScript execution and interaction
Supports screenshots, PDF generation, and performance analysis
Emulates mobile devices and geolocation

Cons of Puppeteer

Heavier resource usage and slower execution
Requires a complete browser environment
More complex setup and configuration

Code Comparison

Cheerio (HTML parsing):

const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
console.log($('.title').text());

Puppeteer (browser automation):

const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const title = await page.$eval('.title', el => el.textContent);
  console.log(title);
  await browser.close();
})();

Key Differences

Cheerio is a lightweight HTML parser that operates on static HTML, making it fast and efficient for simple scraping tasks. It's ideal for parsing and manipulating HTML without the need for a browser environment.

Puppeteer, on the other hand, provides full browser automation, allowing interaction with dynamic web pages, JavaScript execution, and more complex web scraping scenarios. It's better suited for tasks that require rendering JavaScript or interacting with web applications.

Choose Cheerio for simple, static HTML parsing and manipulation, and Puppeteer for more complex, dynamic web scraping and automation tasks.

axios

Pros of Axios

Supports both browser and Node.js environments
Built-in request and response interceptors
Automatic request and response transformations

Cons of Axios

Larger bundle size compared to Cheerio
More complex setup for simple HTTP requests
Not specialized for HTML parsing or manipulation

Code Comparison

Axios (making an HTTP request):

const axios = require('axios');

axios.get('https://example.com')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });

Cheerio (parsing HTML):

const cheerio = require('cheerio');

const $ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

Key Differences

Axios is primarily an HTTP client for making requests, while Cheerio is focused on parsing and manipulating HTML
Axios works in both browser and Node.js environments, whereas Cheerio is typically used server-side
Cheerio provides a jQuery-like API for DOM manipulation, which Axios does not offer

Use Cases

Use Axios for making HTTP requests, handling API interactions, and working with RESTful services
Choose Cheerio for server-side web scraping, HTML parsing, and DOM manipulation tasks

Both libraries serve different purposes and can be used together in projects that require both HTTP requests and HTML parsing capabilities.

request

Pros of Request

Designed for making HTTP requests, offering more flexibility for various types of web interactions
Supports streaming, which can be beneficial for handling large amounts of data
Provides a simpler API for handling complex HTTP scenarios like authentication and redirects

Cons of Request

Larger package size and more dependencies compared to Cheerio
Not specifically optimized for HTML parsing and manipulation
Deprecated and no longer maintained, which may lead to security vulnerabilities

Code Comparison

Request (making an HTTP GET request):

const request = require('request');

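request('http://www.example.com', (error, response, body) => {
  // Stop early on network errors instead of logging an undefined body
  if (error) {
    console.error('request failed:', error);
    return;
  }
  console.log('body:', body);
});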

Cheerio (parsing HTML):

const cheerio = require('cheerio');

const $ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
console.log($.html());

Summary

Request is a versatile HTTP client library, while Cheerio is focused on HTML parsing and manipulation. Request offers more flexibility for various web interactions but is no longer maintained. Cheerio is lightweight and actively maintained, making it a better choice for projects primarily focused on HTML parsing and scraping. The choice between the two depends on the specific requirements of your project.

node-fetch

Pros of node-fetch

Provides a lightweight, Promise-based HTTP client for making network requests
Supports both Node.js and browser environments, offering a consistent API
Implements the Fetch API, making it familiar for developers coming from browser-based JavaScript

Cons of node-fetch

Limited to HTTP requests and responses, lacking DOM parsing capabilities
Requires additional libraries for more complex operations like web scraping
May have a steeper learning curve for developers unfamiliar with Promises or async/await

Code Comparison

node-fetch:

import fetch from 'node-fetch';

const response = await fetch('https://example.com');
const body = await response.text();
console.log(body);

cheerio:

import * as cheerio from 'cheerio';
import axios from 'axios';

const { data } = await axios.get('https://example.com');
const $ = cheerio.load(data);
console.log($('h1').text());

Key Differences

node-fetch focuses on making HTTP requests and handling responses
cheerio specializes in parsing and manipulating HTML/XML documents
node-fetch is more versatile for general network operations
cheerio excels at web scraping and DOM manipulation tasks

Use Cases

node-fetch is ideal for:

API interactions
Downloading files
Simple data fetching

cheerio is better suited for:

Web scraping
HTML parsing and manipulation
Extracting specific data from web pages

README

cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

中文文档 (Chinese Readme)

import * as cheerio from 'cheerio';
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

$.html();
//=> <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

Installation

npm install cheerio

Features

❤ Proven syntax: Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API.

ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient.

❁ Incredibly flexible: Cheerio wraps around parse5 for parsing HTML and can optionally use the forgiving htmlparser2. Cheerio can parse nearly any HTML or XML document. Cheerio works in both browser and server environments.

API

Loading

First you need to load in the HTML. This step in jQuery is implicit, since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document.

// ESM or TypeScript:
import * as cheerio from 'cheerio';

// In other environments (use this instead of the import above):
// const cheerio = require('cheerio');

const $ = cheerio.load('<ul id="fruits">...</ul>');

$.html();
//=> <html><head></head><body><ul id="fruits">...</ul></body></html>

Selectors

Once you've loaded the HTML, you can use jQuery-style selectors to find elements within the document.

$( selector, [context], [root] )

selector searches within the context scope, which in turn searches within the root scope. selector and context can each be a string expression, a DOM element, an array of DOM elements, or a Cheerio object. root, if provided, is typically the HTML document string.

This selector method is the starting point for traversing and manipulating the document. Like in jQuery, it's the primary method for selecting elements in the document.

$('.apple', '#fruits').text();
//=> Apple

$('ul .pear').attr('class');
//=> pear

$('li[class=orange]').html();
//=> Orange

Rendering

When you're ready to render the document, you can call the html method on the "root" selection:

$.root().html();
//=>  <html>
//      <head></head>
//      <body>
//        <ul id="fruits">
//          <li class="apple">Apple</li>
//          <li class="orange">Orange</li>
//          <li class="pear">Pear</li>
//        </ul>
//      </body>
//    </html>

If you want to render the outerHTML of a selection, you can use the outerHTML prop:

$('.pear').prop('outerHTML');
//=> <li class="pear">Pear</li>

You may also render the text content of a Cheerio object using the text method:

const $ = cheerio.load('This is <em>content</em>.');
$('body').text();
//=> This is content.

The "DOM Node" object

Cheerio collections are made up of objects that bear some resemblance to browser-based DOM nodes. You can expect them to define the following properties:

tagName
parentNode
previousSibling
nextSibling
nodeValue
firstChild
childNodes
lastChild

Screencasts

https://vimeo.com/31950192

This video tutorial is a follow-up to Nettut's "How to Scrape Web Pages with Node.js and jQuery", using cheerio instead of JSDOM + jQuery. This video shows how easy it is to use cheerio and how much faster cheerio is than JSDOM + jQuery.