PowerShell: Parsing HTML

How to:

PowerShell has no dedicated HTML parser built in, but the Invoke-WebRequest cmdlet can fetch a page and expose some of its parsed content. For more complex parsing and manipulation, use HtmlAgilityPack, a popular .NET library.

Using `Invoke-WebRequest`:

# Simple example: fetch the title of a web page
$response = Invoke-WebRequest -Uri 'http://example.com'
# ParsedHtml exposes the DOM; it is available in Windows PowerShell 5.1 only
# (PowerShell 6+ does not provide this property)
$title = $response.ParsedHtml.title
Write-Output $title

Sample Output:

Example Domain
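
Because ParsedHtml depends on the Internet Explorer engine, a variant that relies only on the basic response object may be useful. The following is a minimal sketch, assuming the page contains at least one anchor tag; it reads the Links property, which Invoke-WebRequest populates in both Windows PowerShell 5.1 and PowerShell 7. The variable name $hrefs is chosen here for illustration.

```powershell
# -UseBasicParsing avoids the Internet Explorer engine on Windows PowerShell 5.1;
# on PowerShell 6+ the switch is accepted but has no effect
$response = Invoke-WebRequest -Uri 'http://example.com' -UseBasicParsing

# Links holds one object per <a> tag found in the page;
# collect the href attribute of each one
$hrefs = $response.Links | ForEach-Object { $_.href }
Write-Output $hrefs
```

This approach covers links, images ($response.Images), and form fields ($response.InputFields), but it does not give access to arbitrary elements the way a full DOM does.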

Using HtmlAgilityPack:

First, install HtmlAgilityPack. One way is the PackageManagement Install-Package cmdlet with the NuGet provider:

Install-Package HtmlAgilityPack -ProviderName NuGet

Then, you can use it in PowerShell to parse HTML:

# Load the HtmlAgilityPack assembly
Add-Type -Path "path\to\HtmlAgilityPack.dll"

# Create an HtmlDocument object
$doc = New-Object HtmlAgilityPack.HtmlDocument

# Load HTML from a file or a web request
$htmlContent = (Invoke-WebRequest -Uri "http://example.com").Content
$doc.LoadHtml($htmlContent)

# Use XPath or other query methods to extract elements
$node = $doc.DocumentNode.SelectSingleNode("//h1")

# Keep $null on the left so the comparison is never applied element-wise to a collection
if ($null -ne $node) {
    Write-Output $node.InnerText
}

Sample Output:

Example Domain
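
To pull several elements at once, SelectNodes can be used in place of SelectSingleNode. The sketch below is illustrative only: the inline markup and the $html, $anchors, and $links names are hypothetical, and the DLL path is the same placeholder used above, so point it at your own copy of the assembly.

```powershell
# Assumption: placeholder path, replace with the real location of the DLL
Add-Type -Path "path\to\HtmlAgilityPack.dll"

# Hypothetical inline markup so the example does not need a network call
$html = @'
<html><body>
  <ul>
    <li><a href="/docs">Docs</a></li>
    <li><a href="/blog">Blog</a></li>
    <li><a>No target</a></li>
  </ul>
</body></html>
'@

$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($html)

# Select every <a> that has an href attribute.
# SelectNodes returns $null (not an empty collection) when nothing matches.
$anchors = $doc.DocumentNode.SelectNodes('//a[@href]')

$links = @()
if ($null -ne $anchors) {
    $links = foreach ($a in $anchors) {
        [pscustomobject]@{
            Text = $a.InnerText.Trim()
            Href = $a.GetAttributeValue('href', '')
        }
    }
}

# Emit the collected link objects
$links
```

The null check matters here because piping or indexing a $null result would otherwise fail silently or throw, depending on how the result is used.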

In short, Invoke-WebRequest is sufficient for simple tasks, whereas HtmlAgilityPack offers a much richer feature set, such as XPath queries, for complex HTML parsing and manipulation.

Last updated on March 13, 2024
