Parsing HTML

Clojure:
Parsing HTML

How to:

Clojure does not have built-in HTML parsing capabilities, but you can leverage Java libraries or Clojure wrappers such as enlive or hickory. Here’s how to use both:

Using Enlive:

Enlive is a popular choice for HTML parsing and web scraping. First, include it in your project dependencies:

[net.cgrand/enlive "1.1.6"]

Then, you can parse and navigate HTML like so:

(require '[net.cgrand.enlive-html :as html])

(let [doc (html/html-resource (java.net.URL. "http://example.com"))]
  (html/select doc [:div.some-class]))

This snippet fetches an HTML page and selects all <div> elements with the class some-class.

Output might look like:

({:tag :div, :attrs {:class "some-class"}, :content ["Here's some content."]})

Using Hickory:

Hickory provides a way to parse HTML into a format that is easier to work with in Clojure. Add Hickory to your project dependencies:

[hickory "0.7.1"]

Here’s a simple example:

(require '[hickory.core :as hickory]
         '[hickory.select :as select])

;; Parse the HTML into Hickory format
(let [doc (hickory/parse "<html><body><div id='main'>Hello, world!</div></body></html>")]
  ;; Select the div with id 'main'
  (select/select (select/id "main") doc))

This code parses a simple HTML string and uses a CSS selector to find a div with the ID main.

Sample output:

[{:type :element, :tag :div, :attrs {:id "main"}, :content ["Hello, world!"]}]

Both enlive and hickory offer robust solutions for HTML parsing in Clojure, with enlive focusing more on templating and hickory emphasizing data transformation.

Last updated on March 13, 2024

Downloading a web page Sending an HTTP request

Clojure:Parsing HTML

How to:

Using Enlive:

Using Hickory:

Clojure:
Parsing HTML