Parsing HTML

How to:

To get started, install a library like node-html-parser. Here’s the terminal command:

npm install node-html-parser

Now, let’s parse some basic HTML in TypeScript:

import { parse } from 'node-html-parser';

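const html = `<ul class="fruits">
  <li>Apple</li>
  <li>Banana</li>
</ul>`;

const root = parse(html);
// querySelector returns null when nothing matches, so guard the access with ?.
console.log(root.querySelector('.fruits')?.textContent);  // "Apple" and "Banana", plus the whitespace between the tags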

And if you want to grab just the bananas:

const bananas = root.querySelectorAll('li')[1].textContent;
console.log(bananas);  // "Banana"
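
If you need every item rather than one by index, a minimal sketch along these lines may help. It assumes the same two-item list as above; the names listHtml, listRoot, fruitNames, and missing are made up for illustration, and '.vegetables' is a selector chosen because it matches nothing.

```typescript
import { parse } from 'node-html-parser';

// Same fruit list as above, written on one line (illustrative input).
const listHtml = '<ul class="fruits"><li>Apple</li><li>Banana</li></ul>';
const listRoot = parse(listHtml);

// querySelectorAll returns an array, so the usual array methods apply.
const fruitNames: string[] = listRoot
  .querySelectorAll('.fruits li')
  .map((item) => item.textContent.trim());

console.log(fruitNames);  // [ 'Apple', 'Banana' ]

// A selector with no match yields no element, so check before reading from it.
const missing = listRoot.querySelector('.vegetables');
console.log(missing ? missing.textContent : 'no match');  // "no match"
```

Trimming each item keeps stray whitespace from the markup out of the results.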

Deep Dive

Parsing HTML is as old as the web itself. Early on, developers often reached for regular expressions, but that got messy fast because HTML nests and is frequently malformed. The browser’s DOMParser API is stable, but it is browser-bound: Node.js does not ship it.
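
To see why regular expressions get messy, here is a small sketch in plain TypeScript with no libraries. The pattern and the helper innerOfFirstDiv are invented for this illustration; they are not taken from any library.

```typescript
// A naive pattern that tries to capture whatever sits inside a <div>.
const divContents = /<div>(.*?)<\/div>/;

// Hypothetical helper: returns the captured contents, or null if nothing matched.
function innerOfFirstDiv(html: string): string | null {
  const match = divContents.exec(html);
  return match?.[1] ?? null;
}

const flat = '<div>hello</div>';
const nested = '<div>outer <div>inner</div> tail</div>';

console.log(innerOfFirstDiv(flat));    // "hello"
// The lazy match stops at the first closing tag, so the nested markup is cut short.
console.log(innerOfFirstDiv(nested));  // "outer <div>inner"
```

The pattern has no notion of nesting, which is exactly what a real parser tracks for you.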

Libraries like node-html-parser abstract the pain away. They let you query HTML with CSS selectors, much as you would with jQuery, but server-side in Node.js. node-html-parser is fast, tolerant of malformed HTML, and DOM-friendly.

There’s also jsdom, simulating a whole browser environment. It’s heavier but more thorough, creating a full-blown Document Object Model (DOM) for manipulation and interaction.

Let’s not forget Cheerio, either. It blends speed with a jQuery-like syntax and a smaller footprint than jsdom, sitting happily between the two.

See Also

If you’re thirsty for more, dip into these: