JavaScript: Parsing HTML

How to:

Let’s parse HTML with DOMParser, an API built into web browsers (Node.js does not provide it).

const parser = new DOMParser();
const htmlString = `<p>Hello, world!</p>`;
const doc = parser.parseFromString(htmlString, 'text/html');
console.log(doc.body.textContent); // Output: Hello, world!

Now, let’s grab something more specific, like an element with a class:

const nestedHtml = `<div><p class="greeting">Hello, again!</p></div>`;
// Reuses the parser created above; fresh names avoid redeclaring the consts.
const nestedDoc = parser.parseFromString(nestedHtml, 'text/html');
const greeting = nestedDoc.querySelector('.greeting').textContent;
console.log(greeting); // Output: Hello, again!
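querySelector returns only the first match, and null when nothing matches (so chaining .textContent would then throw a TypeError). To collect every match, querySelectorAll returns a static NodeList, which is simply empty when nothing matches. A minimal sketch, assuming a helper of our own named textOf (hypothetical, not part of any API):

```javascript
// Hypothetical helper: return the trimmed text of every element in `doc`
// that matches `selector`. Works on any object exposing querySelectorAll,
// such as the Document returned by DOMParser's parseFromString.
function textOf(doc, selector) {
  // querySelectorAll never returns null, so no null check is needed;
  // Array.from's second argument maps each element as it is copied.
  return Array.from(doc.querySelectorAll(selector), (el) => el.textContent.trim());
}
```

In a browser, textOf(new DOMParser().parseFromString('<ul><li>a</li><li>b</li></ul>', 'text/html'), 'li') returns ['a', 'b'].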

Deep Dive

Parsing HTML is as old as the web. At first it was purely a browser job: browsers parsed HTML to display web pages. Over time, programmers wanted access to that process, which led to APIs like DOMParser.

Alternatives? Sure. jQuery can parse HTML strings too, and outside JavaScript there are tools like BeautifulSoup for Python. But in the browser, DOMParser is built in, so you need no extra library.

Under the hood, DOMParser returns a Document object: a tree that models the hierarchy of your HTML. Once you have it, you can navigate and manipulate it with the same DOM methods you would use on a live page’s document, although scripts inside the parsed markup are not executed.
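Because the result is an ordinary Document, standard properties such as children and tagName are enough to walk it. A minimal sketch of a depth-first walk that builds an indented outline of element tags; the function name outline is hypothetical:

```javascript
// Hypothetical helper: depth-first walk over element nodes, returning one
// line per element, indented two spaces per level of depth.
function outline(element, depth = 0, lines = []) {
  // tagName is uppercase for HTML elements in an HTML document (e.g. "DIV").
  lines.push('  '.repeat(depth) + element.tagName.toLowerCase());
  // children holds only element children; text nodes are skipped.
  for (const child of element.children) {
    outline(child, depth + 1, lines);
  }
  return lines;
}
```

Called on the documentElement of the document parsed from the greeting example, it returns lines for html, head, body, div and p, in that order, because 'text/html' parsing always wraps content in html, head and body elements.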

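Here’s the thing: with 'text/html', DOMParser uses the same forgiving parser as the browser, so malformed HTML never throws. It is silently repaired instead, and the repaired tree may not be what you expected. Strictness applies only to XML types such as 'application/xml', where a syntax error yields a document containing a parsererror element rather than an exception. For messy HTML where you need more control over the cleanup, or for code running outside the browser, a third-party library may be the better fit.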
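You can observe the strict XML behaviour directly. A minimal sketch: parseXmlStrict (a hypothetical name) parses XML and throws when the returned document contains a parsererror element, which is how browsers report XML syntax errors from DOMParser instead of throwing. The optional parser argument is our own assumption, added so the function can be exercised with a stand-in object outside the browser.

```javascript
// Hypothetical helper: parse an XML string, throwing on syntax errors
// instead of returning a document that contains a <parsererror> element.
// `parser` defaults to a real DOMParser, created only when none is passed.
function parseXmlStrict(xmlString, parser = new DOMParser()) {
  const doc = parser.parseFromString(xmlString, 'application/xml');
  // Browsers signal XML errors by inserting a <parsererror> element.
  if (doc.querySelector('parsererror')) {
    throw new SyntaxError('Malformed XML');
  }
  return doc;
}
```

By contrast, broken input parsed as 'text/html' is repaired without complaint: new DOMParser().parseFromString('<p>unclosed', 'text/html').body.innerHTML gives '<p>unclosed</p>'.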

See Also