TypeScript: Parsing HTML
How to:
To get started, install a library like node-html-parser. Here’s the terminal command:
npm install node-html-parser
Now, let’s parse some basic HTML in TypeScript:
import { parse } from 'node-html-parser';
const html = `<ul class="fruits">
<li>Apple</li>
<li>Banana</li>
</ul>`;
const root = parse(html);
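// querySelector returns null when nothing matches, hence the optional chaining.
console.log(root.querySelector('.fruits')?.textContent); // text of both items, "Apple" and "Banana"; whitespace from the source may be included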
And if you want to grab just the bananas:
const bananas = root.querySelectorAll('li')[1].textContent;
console.log(bananas); // "Banana"
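Real pages are less predictable than this snippet, and a selector that matches nothing gives you null rather than an element. Here is a minimal sketch of a null-safe version, assuming node-html-parser is installed as above; listItemTexts is a hypothetical helper name, not something the library provides.

```typescript
import { parse } from 'node-html-parser';

// Hypothetical helper: returns the trimmed text of every <li> inside the
// first element matching `selector`, or an empty array if nothing matches.
function listItemTexts(html: string, selector: string): string[] {
  const root = parse(html);
  const list = root.querySelector(selector);
  if (!list) {
    return []; // no match: nothing to read
  }
  return list.querySelectorAll('li').map((li) => li.textContent.trim());
}

const html = `<ul class="fruits">
<li>Apple</li>
<li>Banana</li>
</ul>`;

console.log(listItemTexts(html, '.fruits'));  // [ 'Apple', 'Banana' ]
console.log(listItemTexts(html, '.veggies')); // []
```

Returning an empty array keeps the calling code simple; throwing an error instead would be just as reasonable if a missing list means the page layout changed.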
Deep Dive
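Parsing HTML isn’t new; developers have needed it since the web’s early days. Early attempts often leaned on regular expressions, which get messy fast because HTML nests and tolerates many variations of the same markup. The browser’s built-in DOMParser is stable, but it is only available in the browser.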
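To see why regular expressions get messy, here is a small sketch with no dependencies. The extractItems helper is hypothetical and deliberately naive: it works on the tidy input, then quietly drops and mangles items once the markup gains an attribute and a nested tag.

```typescript
// Hypothetical, deliberately naive helper: grab list-item contents with a regex.
function extractItems(html: string): string[] {
  const pattern = /<li>(.*?)<\/li>/g;
  const items: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    items.push(match[1]);
  }
  return items;
}

const simple = '<ul><li>Apple</li><li>Banana</li></ul>';
console.log(extractItems(simple)); // [ 'Apple', 'Banana' ]

// Same content, slightly different markup: one attribute, one nested tag.
const realistic = '<ul><li class="fresh">Apple</li><li><b>Banana</b></li></ul>';
console.log(extractItems(realistic)); // [ '<b>Banana</b>' ]
```

The first item disappears because `<li class="fresh">` no longer matches the literal `<li>`, and the second comes back with its inner tag still attached. Patching the pattern for each new case is where the mess begins.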
Libraries like node-html-parser abstract the pain away. They let you query HTML with CSS selectors, much as you would with jQuery, but server-side in Node.js. node-html-parser is fast, tolerant of malformed HTML, and exposes a DOM-like API.
There’s also jsdom, which simulates a whole browser environment. It’s heavier but more thorough, creating a full-blown Document Object Model (DOM) for manipulation and interaction.
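For comparison, the same fruit list can be read with jsdom through the standard DOM API. This is a rough sketch, assuming the jsdom package and its type definitions (@types/jsdom) are installed; fruitNamesWithJsdom is a hypothetical helper name.

```typescript
import { JSDOM } from 'jsdom';

// Hypothetical helper: parse the markup into a simulated browser document
// and read every <li> using the standard DOM API.
function fruitNamesWithJsdom(html: string): string[] {
  const dom = new JSDOM(html);
  const { document } = dom.window;
  return Array.from(
    document.querySelectorAll('li'),
    (li) => (li.textContent ?? '').trim(), // textContent is typed as string | null
  );
}

const fruitHtml = `<ul class="fruits">
<li>Apple</li>
<li>Banana</li>
</ul>`;

console.log(fruitNamesWithJsdom(fruitHtml)); // [ 'Apple', 'Banana' ]
```

Because the code only touches `document.querySelectorAll` and `textContent`, the same logic would also run unchanged in a browser.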
Let’s not forget Cheerio, either. It blends speed with a jQuery-like syntax and a smaller footprint than jsdom, sitting happily between the two.
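Here is roughly what that jQuery-like syntax looks like on the same fruit list. Treat it as a sketch that assumes the cheerio package is installed; fruitNamesWithCheerio is a hypothetical helper name.

```typescript
import * as cheerio from 'cheerio';

// Hypothetical helper: load the markup, select with a jQuery-style call,
// and collect the trimmed text of each matched element.
function fruitNamesWithCheerio(html: string): string[] {
  const $ = cheerio.load(html);
  return $('li')
    .toArray()
    .map((el) => $(el).text().trim());
}

const listHtml = `<ul class="fruits">
<li>Apple</li>
<li>Banana</li>
</ul>`;

console.log(fruitNamesWithCheerio(listHtml)); // [ 'Apple', 'Banana' ]
```

An empty selection simply yields an empty array here, so no null check is needed before mapping.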