Lua:
Parsing HTML

How to:

Lua has no built-in library for parsing HTML, so you rely on third-party libraries. XML-oriented libraries such as LuaXML expect well-formed markup, so an HTML5-aware parser is usually the safer choice for real-world pages. A popular option is lua-gumbo, which provides an HTML5-compliant parser and a DOM-style API for navigating the result.

Installing lua-gumbo:

First, make sure lua-gumbo is installed. It is published on LuaRocks under the rock name gumbo:

luarocks install gumbo
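
To confirm the installation worked before writing any parsing code, you can try loading the module defensively. The following is a minimal sketch; try_require is a hypothetical helper name introduced here for illustration, not part of lua-gumbo.

```lua
-- Hypothetical helper: load a module without aborting the script if it is missing.
local function try_require(name)
    local ok, result = pcall(require, name)
    if ok then
        return result
    end
    return nil, result  -- on failure, result holds the error message from require
end

local gumbo, err = try_require("gumbo")
if gumbo then
    print("lua-gumbo is available")
else
    print("lua-gumbo could not be loaded: " .. tostring(err))
end
```

If the module cannot be loaded, the error message usually lists the paths Lua searched, which helps when LuaRocks installed the rock for a different Lua version.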

Basic Parsing with lua-gumbo:

Here’s how you can parse a simple HTML snippet and extract data from it using lua-gumbo:

local gumbo = require "gumbo"
local document = gumbo.parse[[<html><body><p>Hello, world!</p></body></html>]]

local p = document:getElementsByTagName("p")[1]
print(p.textContent)  -- Output: Hello, world!
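
Real pages do not always contain the element you are looking for, so it helps to check a lookup result before using it. The sketch below assumes the same gumbo module and uses getElementById; the markup and the id values are made up for illustration.

```lua
local gumbo = require "gumbo"

-- assert stops the script with the parser's error message if parsing fails
local document = assert(gumbo.parse('<div id="greeting">Hello, <b>world</b>!</div>'))

local greeting = document:getElementById("greeting")
local missing = document:getElementById("no-such-id")

if greeting then
    -- textContent concatenates the text of the element and its descendants
    print(greeting.textContent)
end

if not missing then
    print("no element with id 'no-such-id'")
end
```

Guarding each lookup this way avoids an "attempt to index a nil value" error when the expected element is absent.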

Advanced Example - Extracting Links:

To extract href attributes from all anchor tags (<a> elements) in an HTML document:

local gumbo = require "gumbo"
local document = gumbo.parse([[
<html>
<head><title>Sample Page</title></head>
<body>
  <a href="http://example.com/1">Link 1</a>
  <a href="http://example.com/2">Link 2</a>
  <a href="http://example.com/3">Link 3</a>
</body>
</html>
]])

-- document.links collects the document's hyperlink elements
for _, element in ipairs(document.links) do
    local href = element:getAttribute("href")
    if href then print(href) end
end

-- Sample Output:
-- http://example.com/1
-- http://example.com/2
-- http://example.com/3

This snippet iterates over every link in the document and prints its href attribute. Because lua-gumbo builds a DOM tree from the markup, you can select elements by tag or attribute instead of matching the raw HTML with string patterns.
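
When you need more than the URL, one option is to gather the links into plain Lua tables for later processing. This is a sketch rather than a prescribed approach: collect_links is a hypothetical helper name, and it selects anchors with getElementsByTagName("a") and skips any that lack an href.

```lua
local gumbo = require "gumbo"

-- Hypothetical helper: gather every <a> element that carries an href
-- into a list of {href = ..., text = ...} records.
local function collect_links(document)
    local links = {}
    for _, a in ipairs(document:getElementsByTagName("a")) do
        local href = a:getAttribute("href")
        if href then
            links[#links + 1] = {href = href, text = a.textContent}
        end
    end
    return links
end

local document = assert(gumbo.parse([[
<p><a href="http://example.com/1">Link 1</a> <a name="top">No href</a></p>
]]))

for _, link in ipairs(collect_links(document)) do
    print(link.text, link.href)
end
```

Returning ordinary tables keeps the rest of the program independent of the parser, so the records can be sorted, filtered, or serialized with standard Lua code.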