cheerio
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
Top Related Projects
jsdom: A JavaScript implementation of various web standards, for use with Node.js
Puppeteer: JavaScript API for Chrome and Firefox
Axios: Promise based HTTP client for the browser and node.js
Request: 🏊🏾 Simplified HTTP request client.
node-fetch: A light-weight module that brings the Fetch API to Node.js
Quick Overview
Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It parses markup and provides an API for traversing/manipulating the resulting data structure, making it ideal for web scraping and HTML parsing tasks in Node.js environments.
Pros
- Lightweight and fast compared to full DOM implementations
- Familiar jQuery-like syntax for easy adoption
- Works well with static HTML, making it suitable for scraping tasks
- No browser or DOM dependencies, making it efficient for server-side use
Cons
- Lacks support for JavaScript rendering, limiting its use with dynamic content
- Not suitable for tasks requiring full browser simulation
- May require additional libraries for more complex web scraping scenarios
- Limited functionality compared to full-featured browsers or headless browser solutions
Code Examples
Loading HTML and selecting elements:
const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');
console.log($.html());
// Output: <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>
Traversing and manipulating the DOM:
const cheerio = require('cheerio');
const $ = cheerio.load(`
<ul id="fruits">
<li class="apple">Apple</li>
<li class="orange">Orange</li>
<li class="pear">Pear</li>
</ul>
`);
$('.pear').attr('id', 'favorite').html('Peach');
const fruits = $('#fruits > li').map((i, el) => $(el).text()).get();
console.log(fruits); // ['Apple', 'Orange', 'Peach']
Extracting data from a table:
const cheerio = require('cheerio');
const $ = cheerio.load(`
<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>John</td><td>30</td></tr>
<tr><td>Jane</td><td>25</td></tr>
</table>
`);
const data = $('tr').slice(1).map((i, el) => {
const tds = $(el).find('td');
return { name: tds.eq(0).text(), age: parseInt(tds.eq(1).text(), 10) };
}).get();
console.log(data);
// Output: [{ name: 'John', age: 30 }, { name: 'Jane', age: 25 }]
Getting Started
To use Cheerio in your project, first install it via npm:
npm install cheerio
Then, in your JavaScript file:
const cheerio = require('cheerio');
// Load HTML
const $ = cheerio.load('<h1 class="title">Hello, Cheerio!</h1>');
// Manipulate the DOM
$('h1').addClass('welcome').text('Welcome to Cheerio!');
// Output the modified HTML
console.log($.html());
This basic example demonstrates loading HTML, manipulating it, and outputting the result. Cheerio's API closely resembles jQuery, making it intuitive for those familiar with client-side DOM manipulation.
Competitor Comparisons
jsdom: A JavaScript implementation of various web standards, for use with Node.js
Pros of jsdom
- Provides a full DOM implementation, including window and document objects
- Supports running client-side JavaScript within the simulated environment
- More closely mimics a real browser environment for testing purposes
Cons of jsdom
- Slower performance compared to Cheerio due to its more comprehensive implementation
- Higher memory usage, which can be a concern for large-scale parsing tasks
- More complex setup and configuration required
Code Comparison
jsdom:
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<p>Hello world</p>`);
console.log(dom.window.document.querySelector("p").textContent);
Cheerio:
const cheerio = require('cheerio');
const $ = cheerio.load('<p>Hello world</p>');
console.log($('p').text());
Summary
jsdom provides a more complete browser-like environment, making it ideal for complex web application testing and scenarios requiring client-side JavaScript execution. However, this comes at the cost of performance and resource usage. Cheerio, on the other hand, offers a lightweight and fast solution for simple HTML parsing and manipulation tasks, but lacks the ability to execute JavaScript or provide a full DOM environment. The choice between the two depends on the specific requirements of your project, balancing between functionality and performance needs.
Puppeteer: JavaScript API for Chrome and Firefox
Pros of Puppeteer
- Full browser automation, including JavaScript execution and interaction
- Supports screenshots, PDF generation, and performance analysis
- Emulates mobile devices and geolocation
Cons of Puppeteer
- Heavier resource usage and slower execution
- Requires a complete browser environment
- More complex setup and configuration
Code Comparison
Cheerio (HTML parsing):
const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
console.log($('.title').text());
Puppeteer (browser automation):
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
const title = await page.$eval('h1', el => el.textContent);
console.log(title);
await browser.close();
})();
Key Differences
Cheerio is a lightweight HTML parser that operates on static HTML, making it fast and efficient for simple scraping tasks. It's ideal for parsing and manipulating HTML without the need for a browser environment.
Puppeteer, on the other hand, provides full browser automation, allowing interaction with dynamic web pages, JavaScript execution, and more complex web scraping scenarios. It's better suited for tasks that require rendering JavaScript or interacting with web applications.
Choose Cheerio for simple, static HTML parsing and manipulation, and Puppeteer for more complex, dynamic web scraping and automation tasks.
Axios: Promise based HTTP client for the browser and node.js
Pros of Axios
- Supports both browser and Node.js environments
- Built-in request and response interceptors
- Automatic request and response transformations
Cons of Axios
- Larger bundle size compared to Cheerio
- More complex setup for simple HTTP requests
- Not specialized for HTML parsing or manipulation
Code Comparison
Axios (making an HTTP request):
const axios = require('axios');
axios.get('https://example.com')
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error(error);
});
Cheerio (parsing HTML):
const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');
Key Differences
- Axios is primarily an HTTP client for making requests, while Cheerio is focused on parsing and manipulating HTML
- Axios works in both browser and Node.js environments, whereas Cheerio is typically used server-side
- Cheerio provides a jQuery-like API for DOM manipulation, which Axios does not offer
Use Cases
- Use Axios for making HTTP requests, handling API interactions, and working with RESTful services
- Choose Cheerio for server-side web scraping, HTML parsing, and DOM manipulation tasks
Both libraries serve different purposes and can be used together in projects that require both HTTP requests and HTML parsing capabilities.
Request: 🏊🏾 Simplified HTTP request client.
Pros of Request
- Designed for making HTTP requests, offering more flexibility for various types of web interactions
- Supports streaming, which can be beneficial for handling large amounts of data
- Provides a simpler API for handling complex HTTP scenarios like authentication and redirects
Cons of Request
- Larger package size and more dependencies compared to Cheerio
- Not specifically optimized for HTML parsing and manipulation
- Deprecated and no longer maintained, which may lead to security vulnerabilities
Code Comparison
Request (making an HTTP GET request):
const request = require('request');
request('http://www.example.com', (error, response, body) => {
if (error) return console.error('error:', error);
console.log('statusCode:', response.statusCode);
console.log('body:', body);
});
Cheerio (parsing HTML):
const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
console.log($.html());
Summary
Request is a versatile HTTP client library, while Cheerio is focused on HTML parsing and manipulation. Request offers more flexibility for various web interactions but is no longer maintained. Cheerio is lightweight and actively maintained, making it a better choice for projects primarily focused on HTML parsing and scraping. The choice between the two depends on the specific requirements of your project.
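Because Request is deprecated, a common migration path is the global `fetch` in Node.js 18 and later. The sketch below wraps `fetch` in a callback with Request's `(error, response, body)` shape, so that existing call sites can move over gradually. `requestLike` is a hypothetical helper, not part of either API. Note that a fetch `Response` exposes `status` rather than Request's `statusCode`.

```javascript
// Hypothetical adapter: a callback-style request(url, cb) built on global fetch.
function requestLike(url, callback) {
  fetch(url)
    // Read the body first so the callback receives it as a string, like Request.
    .then((response) => response.text().then((body) => ({ response, body })))
    .then(
      ({ response, body }) => callback(null, response, body),
      // Only network or body-read failures reach this branch,
      // so the callback is never invoked twice.
      (error) => callback(error)
    );
}

// Usage, mirroring the Request example above:
// requestLike('http://www.example.com', (error, response, body) => {
//   if (error) return console.error('error:', error);
//   console.log('status:', response.status);
//   console.log('body:', body);
// });
```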
node-fetch: A light-weight module that brings the Fetch API to Node.js
Pros of node-fetch
- Provides a lightweight, Promise-based HTTP client for making network requests
- Supports both Node.js and browser environments, offering a consistent API
- Implements the Fetch API, making it familiar for developers coming from browser-based JavaScript
Cons of node-fetch
- Limited to HTTP requests and responses, lacking DOM parsing capabilities
- Requires additional libraries for more complex operations like web scraping
- May have a steeper learning curve for developers unfamiliar with Promises or async/await
Code Comparison
node-fetch:
import fetch from 'node-fetch';
const response = await fetch('https://example.com');
const body = await response.text();
console.log(body);
cheerio:
import * as cheerio from 'cheerio';
import axios from 'axios';
const { data } = await axios.get('https://example.com');
const $ = cheerio.load(data);
console.log($('h1').text());
Key Differences
- node-fetch focuses on making HTTP requests and handling responses
- cheerio specializes in parsing and manipulating HTML/XML documents
- node-fetch is more versatile for general network operations
- cheerio excels at web scraping and DOM manipulation tasks
Use Cases
node-fetch is ideal for:
- API interactions
- Downloading files
- Simple data fetching
cheerio is better suited for:
- Web scraping
- HTML parsing and manipulation
- Extracting specific data from web pages
README
cheerio
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
import * as cheerio from 'cheerio';
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');
$.html();
//=> <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>
Installation
npm install cheerio
Features
❤ Proven syntax: Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API.
ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result, parsing, manipulating, and rendering are incredibly efficient.
❁ Incredibly flexible: Cheerio wraps around parse5 for parsing HTML and can optionally use the forgiving htmlparser2. Cheerio can parse nearly any HTML or XML document. Cheerio works in both browser and server environments.
API
Loading
First you need to load in the HTML. This step in jQuery is implicit, since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document.
// ESM or TypeScript:
import * as cheerio from 'cheerio';
// In other environments:
const cheerio = require('cheerio');
const $ = cheerio.load('<ul id="fruits">...</ul>');
$.html();
//=> <html><head></head><body><ul id="fruits">...</ul></body></html>
Selectors
Once you've loaded the HTML, you can use jQuery-style selectors to find elements within the document.
$( selector, [context], [root] )
selector searches within the context scope, which searches within the root scope. selector and context can be a string expression, DOM Element, array of DOM elements, or cheerio object. root, if provided, is typically the HTML document string.
This selector method is the starting point for traversing and manipulating the document. Like in jQuery, it's the primary method for selecting elements in the document.
$('.apple', '#fruits').text();
//=> Apple
$('ul .pear').attr('class');
//=> pear
$('li[class=orange]').html();
//=> Orange
Rendering
When you're ready to render the document, you can call the html method on the "root" selection:
$.root().html();
//=> <html>
// <head></head>
// <body>
// <ul id="fruits">
// <li class="apple">Apple</li>
// <li class="orange">Orange</li>
// <li class="pear">Pear</li>
// </ul>
// </body>
// </html>
If you want to render the outerHTML of a selection, you can use the outerHTML prop:
$('.pear').prop('outerHTML');
//=> <li class="pear">Pear</li>
You may also render the text content of a Cheerio object using the text method:
const $ = cheerio.load('This is <em>content</em>.');
$('body').text();
//=> This is content.
The "DOM Node" object
Cheerio collections are made up of objects that bear some resemblance to browser-based DOM nodes. You can expect them to define the following properties:
tagName
parentNode
previousSibling
nextSibling
nodeValue
firstChild
childNodes
lastChild
Screencasts
This video tutorial is a follow-up to Nettut's "How to Scrape Web Pages with Node.js and jQuery", using cheerio instead of JSDOM + jQuery. This video shows how easy it is to use cheerio and how much faster cheerio is than JSDOM + jQuery.
Cheerio in the real world
Are you using cheerio in production? Add it to the wiki!
Sponsors
Does your company use Cheerio in production? Please consider sponsoring this project! Your help will allow maintainers to dedicate more time and resources to its development and support.
Backers
Become a backer to show your support for Cheerio and help us maintain and improve this open source project.
License
MIT