inikulin/parse5

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.

Top Related Projects

  • jsdom (20,377 stars): A JavaScript implementation of various web standards, for use with Node.js
  • Cheerio (28,388 stars): The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
  • htmlparser2: The fast & forgiving HTML and XML parser
  • node-html-parser: A very fast HTML parser, generating a simplified DOM, with basic element query support.
  • Readability: A standalone version of the readability lib

Quick Overview

parse5 is an HTML parsing and serialization library for Node.js. It implements the parsing algorithm of the WHATWG HTML Living Standard (HTML5) and provides a small API for parsing HTML documents and fragments and for serializing the resulting trees back to HTML.

Pros

  • Fully compliant with the HTML5 specification
  • Fast: its README describes it as the fastest spec-compliant HTML parser for Node
  • Supports both Node.js and browser environments
  • Produces a plain-object tree that is easy to traverse, with pluggable tree adapters for custom node formats

Cons

  • Steeper learning curve compared to simpler HTML parsers
  • Larger bundle size than some alternatives
  • May be overkill for simple HTML parsing tasks
  • No XML parsing mode; it targets HTML only

Code Examples

Parsing an HTML document:

const parse5 = require('parse5');

const document = parse5.parse('<html><body><h1>Hello, World!</h1></body></html>');
console.log(document.childNodes[0].tagName); // Output: html

Serializing a DOM tree:

const parse5 = require('parse5');

const document = parse5.parse('<html><body><h1>Hello, World!</h1></body></html>');
const serializedHTML = parse5.serialize(document);
console.log(serializedHTML);
// Output: <html><head></head><body><h1>Hello, World!</h1></body></html>

Parsing an HTML fragment:

const parse5 = require('parse5');

const fragment = parse5.parseFragment('<p>This is a <b>fragment</b>.</p>');
console.log(fragment.childNodes[0].tagName); // Output: p
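
The examples above reach into `childNodes` by index, which breaks as soon as the markup changes. A minimal sketch of a depth-first search over such a tree follows. It assumes nodes shaped like those produced by parse5's default tree adapter (`nodeName`, `tagName`, `attrs`, `childNodes`, and `value` on text nodes). `findAll` is a hypothetical helper, not part of parse5, and the hand-built `tree` stands in for the result of `parse5.parseFragment(...)` so that the snippet runs without dependencies.

```javascript
// Hypothetical helper (not part of parse5): collect every element whose
// tagName matches, using a depth-first walk over `childNodes`.
function findAll(node, tagName, found = []) {
  if (node.tagName === tagName) {
    found.push(node);
  }
  // Text and comment nodes have no childNodes, so default to an empty list.
  for (const child of node.childNodes || []) {
    findAll(child, tagName, found);
  }
  return found;
}

// Hand-built stand-in for
// parse5.parseFragment('<p>This is a <b>fragment</b>.</p>').
const tree = {
  nodeName: '#document-fragment',
  childNodes: [
    {
      nodeName: 'p',
      tagName: 'p',
      attrs: [],
      childNodes: [
        { nodeName: '#text', value: 'This is a ' },
        {
          nodeName: 'b',
          tagName: 'b',
          attrs: [],
          childNodes: [{ nodeName: '#text', value: 'fragment' }],
        },
        { nodeName: '#text', value: '.' },
      ],
    },
  ],
};

const boldElements = findAll(tree, 'b');
console.log(boldElements.length); // 1
console.log(boldElements[0].childNodes[0].value); // fragment
```

With parse5 installed, the same helper should work on a real tree, for example `findAll(parse5.parse(html), 'a')` to collect links.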

Getting Started

To use Parse5 in your Node.js project, follow these steps:

  1. Install Parse5 using npm:

    npm install parse5
    
  2. Import Parse5 in your JavaScript file:

    const parse5 = require('parse5');
    
  3. Parse an HTML document:

    const document = parse5.parse('<html><body><h1>Hello, Parse5!</h1></body></html>');
    
  4. Manipulate the DOM or serialize it back to HTML:

    const serializedHTML = parse5.serialize(document);
    console.log(serializedHTML);
    

Competitor Comparisons

jsdom (20,377 stars)

A JavaScript implementation of various web standards, for use with Node.js

Pros of jsdom

  • Provides a full DOM implementation, including JavaScript execution
  • Simulates a browser-like environment for testing and scraping
  • Supports a wider range of web APIs and features

Cons of jsdom

  • Heavier and slower than parse5 due to its comprehensive feature set
  • More complex setup and configuration required
  • May be overkill for simple HTML parsing tasks

Code Comparison

jsdom:

const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<p>Hello world</p>`);
console.log(dom.window.document.querySelector("p").textContent);

parse5:

const parse5 = require("parse5");
const document = parse5.parse("<p>Hello world</p>");
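// Path: html -> body -> p -> text node (parse5 inserts the implied html, head, and body)
console.log(document.childNodes[0].childNodes[1].childNodes[0].childNodes[0].value);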

Key Differences

  • jsdom provides a full DOM environment, while parse5 focuses on HTML parsing
  • parse5 is lighter and faster for basic HTML parsing tasks
  • jsdom offers more extensive features for web scraping and testing
  • parse5 has a simpler API for parsing HTML, while jsdom requires more setup
  • jsdom suits tests and scrapers that need browser-like behavior, while parse5 suits tools that only need to parse, inspect, or rewrite markup
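
Because parse5 returns a plain tree rather than a DOM, conveniences such as `textContent` have to be written by hand. A minimal sketch follows, assuming the node shape of parse5's default tree adapter (text nodes have `nodeName` equal to `'#text'` and carry their content in `value`). The `textContent` function is a hypothetical helper, and the hand-built `doc` stands in for `parse5.parse('<p>Hello world</p>')` so that the snippet runs without dependencies.

```javascript
// Hypothetical helper (not part of parse5): concatenate the `value` of every
// '#text' descendant, roughly what `textContent` returns for an element in a
// real DOM.
function textContent(node) {
  if (node.nodeName === '#text') {
    return node.value;
  }
  // Comment and doctype nodes have no childNodes and contribute nothing.
  return (node.childNodes || []).map(textContent).join('');
}

// Hand-built stand-in for parse5.parse('<p>Hello world</p>'); the parser
// inserts the implied <html>, <head>, and <body> elements.
const doc = {
  nodeName: '#document',
  childNodes: [
    {
      nodeName: 'html',
      tagName: 'html',
      attrs: [],
      childNodes: [
        { nodeName: 'head', tagName: 'head', attrs: [], childNodes: [] },
        {
          nodeName: 'body',
          tagName: 'body',
          attrs: [],
          childNodes: [
            {
              nodeName: 'p',
              tagName: 'p',
              attrs: [],
              childNodes: [{ nodeName: '#text', value: 'Hello world' }],
            },
          ],
        },
      ],
    },
  ],
};

console.log(textContent(doc)); // Hello world
```

This avoids hard-coding a chain of `childNodes` indexes, which depends on the exact structure the parser produces.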

Cheerio (28,388 stars)

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

Pros of Cheerio

  • Lightweight and fast compared with a full DOM implementation such as jsdom
  • jQuery-like syntax, making it familiar for many developers
  • Extensive API for DOM manipulation and traversal

Cons of Cheerio

  • Does not parse HTML itself: it wraps an underlying parser (parse5 by default, htmlparser2 optionally), so spec compliance depends on which one is in use
  • Lacks support for some modern web features and standards
  • Cannot handle JavaScript or render dynamic content

Code Comparison

Cheerio:

const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

console.log($.html());

Parse5:

const parse5 = require('parse5');
const document = parse5.parse('<h2 class="title">Hello world</h2>');

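// Path: html -> body -> h2 (parse5 inserts the implied html, head, and body)
const h2 = document.childNodes[0].childNodes[1].childNodes[0];
h2.childNodes[0].value = 'Hello there!';
// Update the existing class attribute instead of pushing a duplicate one
const classAttr = h2.attrs.find((attr) => attr.name === 'class');
classAttr.value += ' welcome';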

console.log(parse5.serialize(document));

The two libraries serve different purposes. Cheerio offers a jQuery-like API for querying and manipulating markup, while parse5 provides the lower-level, spec-compliant parser and serializer. Choose based on your project requirements and your familiarity with each API.
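
To illustrate the gap between the two APIs, here is a minimal sketch of what Cheerio's `addClass` has to do on a parse5-style element, where attributes are stored as an array of `{ name, value }` pairs. `addClass` below is a hypothetical helper, not part of parse5, and the hand-built `heading` object stands in for an element taken from a parsed tree.

```javascript
// Hypothetical helper (not part of parse5): add a class name to an element
// whose attributes are stored as [{ name, value }, ...].
function addClass(element, className) {
  const classAttr = element.attrs.find((attr) => attr.name === 'class');
  if (!classAttr) {
    // No class attribute yet, so create one.
    element.attrs.push({ name: 'class', value: className });
    return;
  }
  // Split on whitespace and skip the update if the class is already present.
  const classes = classAttr.value.split(/\s+/).filter(Boolean);
  if (!classes.includes(className)) {
    classes.push(className);
    classAttr.value = classes.join(' ');
  }
}

// Hand-built stand-in for the <h2 class="title"> element from the example.
const heading = {
  nodeName: 'h2',
  tagName: 'h2',
  attrs: [{ name: 'class', value: 'title' }],
  childNodes: [{ nodeName: '#text', value: 'Hello world' }],
};

addClass(heading, 'welcome');
addClass(heading, 'welcome'); // second call changes nothing
console.log(heading.attrs[0].value); // title welcome
```

Helpers like this are the kind of glue code Cheerio provides out of the box, which is a reasonable way to decide between the two.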

htmlparser2

The fast & forgiving HTML and XML parser

Pros of htmlparser2

  • Faster parsing speed, especially for large HTML documents
  • Lightweight, callback-driven core that builds a tree only when asked to
  • Supports streaming, allowing for parsing of partial chunks of HTML

Cons of htmlparser2

  • Less strict adherence to HTML5 parsing specification
  • Limited built-in DOM manipulation capabilities
  • May not handle some complex or malformed HTML structures as well as parse5

Code Comparison

parse5:

const parse5 = require('parse5');
const document = parse5.parse('<html><body><div>Hello, world!</div></body></html>');
console.log(document.childNodes[0].tagName); // 'html'

htmlparser2:

const htmlparser2 = require('htmlparser2');
const parser = new htmlparser2.Parser({
  onopentag: (name, attributes) => { console.log(`Open tag: ${name}`); },
  ontext: (text) => { console.log(`Text: ${text}`); }
});
parser.write('<html><body><div>Hello, world!</div></body></html>');
parser.end();

Both libraries are popular choices for HTML parsing in Node.js, with parse5 focusing on strict HTML5 compliance and htmlparser2 prioritizing speed and flexibility. The choice between them depends on specific project requirements, such as parsing accuracy, performance needs, and desired features like streaming or DOM manipulation.

node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.

Pros of node-html-parser

  • Lightweight and fast, with minimal dependencies
  • Simple API, easy to use for basic HTML parsing tasks
  • Supports both Node.js and browser environments

Cons of node-html-parser

  • Less comprehensive HTML5 parsing capabilities
  • May not handle complex or malformed HTML as well as parse5
  • Smaller community and fewer updates compared to parse5

Code Comparison

node-html-parser:

const { parse } = require('node-html-parser');
const root = parse('<ul id="list"><li>Hello World</li></ul>');
console.log(root.querySelector('#list').innerHTML);

parse5:

const parse5 = require('parse5');
const document = parse5.parse('<ul id="list"><li>Hello World</li></ul>');
const serialized = parse5.serialize(document);
console.log(serialized);

Summary

node-html-parser is a lightweight and simple HTML parsing library, suitable for basic parsing tasks and environments where minimal dependencies are preferred. It offers a straightforward API and works in both Node.js and browsers. However, it may not be as robust as parse5 for handling complex HTML structures or strictly adhering to HTML5 specifications. parse5, on the other hand, provides more comprehensive HTML5 parsing capabilities and has a larger community, but may be overkill for simpler use cases.

Readability

A standalone version of the readability lib

Pros of Readability

  • Focuses specifically on extracting readable content from web pages
  • Includes algorithms for content scoring and cleaning
  • Widely used and battle-tested in production environments

Cons of Readability

  • Limited to content extraction, not a full HTML parser
  • May struggle with complex or non-standard page layouts
  • Less actively maintained compared to Parse5

Code Comparison

Parse5:

const parse5 = require('parse5');
const document = parse5.parse('<html><body><div>Hello, world!</div></body></html>');
console.log(document.childNodes[0].tagName); // 'html'

Readability:

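const { Readability } = require('@mozilla/readability');
const { JSDOM } = require('jsdom');

// Sample input; in practice this would be the fetched page source
const htmlString = '<html><head><title>Example</title></head><body><article><h1>Example</h1><p>Hello, world!</p></article></body></html>';
const doc = new JSDOM(htmlString).window.document;
const reader = new Readability(doc);
const article = reader.parse(); // may be null when no article content is found
console.log(article ? article.textContent : 'No readable content found');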

Parse5 is a full HTML parser that creates a complete DOM tree, while Readability focuses on extracting and cleaning the main content from a web page. Parse5 is more suitable for general HTML parsing tasks, whereas Readability is specialized for content extraction and readability improvements.

README

parse5

parse5 provides nearly everything you may need when dealing with HTML. It's the fastest spec-compliant HTML parser for Node to date. It parses HTML the way the latest version of your browser does. It has proven itself reliable in such projects as jsdom, Angular, Lit, Cheerio, rehype and many more.


List of parse5 toolset packages

Online playground

Changelog
