PowerShell:
Parsing HTML
How to:
PowerShell has no dedicated HTML parser built in, but the Invoke-WebRequest cmdlet can fetch a page and expose parts of its content. For more complex parsing and manipulation, HtmlAgilityPack, a popular .NET library, can be used.
Using Invoke-WebRequest:
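# Simple example: fetch the title of a webpage
$response = Invoke-WebRequest -Uri 'http://example.com'
# ParsedHtml exposes the DOM, but only in Windows PowerShell (5.1 and earlier)
# and only when -UseBasicParsing is not specified; PowerShell 6+ does not provide it
$title = $response.ParsedHtml.title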
Write-Output $title
Sample Output:
Example Domain
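On PowerShell 6 and later the ParsedHtml property is not available, so the title has to be pulled from the raw markup instead. The following is a minimal sketch under stated assumptions: it swaps in a regular expression, which is not a real HTML parser and is only reasonable for something as simple as a single title element, and it uses a hypothetical inline $html string in place of (Invoke-WebRequest ...).Content so that it runs without network access.

```powershell
# Hypothetical sample markup standing in for (Invoke-WebRequest -Uri ...).Content
$html = '<html><head><title>Example Domain</title></head><body><h1>Example Domain</h1></body></html>'

$title = $null
# (?i) ignores case, (?s) lets "." span line breaks; .*? stops at the first closing tag
if ($html -match '(?is)<title[^>]*>(.*?)</title>') {
    # $Matches[1] holds the text captured between the title tags
    $title = $Matches[1].Trim()
}

Write-Output $title
```

For anything beyond a single, well-formed element, a proper parser such as HtmlAgilityPack is the safer choice.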
Using HtmlAgilityPack:
First, install HtmlAgilityPack. One way is the Install-Package cmdlet with the NuGet provider:
Install-Package HtmlAgilityPack -ProviderName NuGet
Then, you can use it in PowerShell to parse HTML:
# Load the HtmlAgilityPack assembly
Add-Type -Path "path\to\HtmlAgilityPack.dll"
# Create an HtmlDocument object
$doc = New-Object HtmlAgilityPack.HtmlDocument
# Load HTML from a file or a web request
$htmlContent = (Invoke-WebRequest -Uri "http://example.com").Content
$doc.LoadHtml($htmlContent)
# Use XPath or other query methods to extract elements
$node = $doc.DocumentNode.SelectSingleNode("//h1")
if ($null -ne $node) {
    Write-Output $node.InnerText
}
Sample Output:
Example Domain
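The same document object can also return several nodes at once. The sketch below is illustrative rather than definitive: it assumes HtmlAgilityPack.dll is already available (the path is the same placeholder as above), and it parses a hypothetical inline HTML string so that it does not depend on a live page. It lists each link's text and href.

```powershell
# Assumes the assembly has been installed; the path is a placeholder
Add-Type -Path "path\to\HtmlAgilityPack.dll"

# Hypothetical inline HTML used only for illustration
$html = @'
<html><body>
  <a href="/docs">Docs</a>
  <a href="https://example.com/about">About</a>
</body></html>
'@

$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($html)

# SelectNodes returns $null (not an empty collection) when nothing matches
$links = $doc.DocumentNode.SelectNodes("//a[@href]")
if ($null -ne $links) {
    foreach ($link in $links) {
        # GetAttributeValue takes the attribute name and a fallback default
        [pscustomobject]@{
            Text = $link.InnerText.Trim()
            Href = $link.GetAttributeValue("href", "")
        }
    }
}
```

Checking for $null before iterating matters here, because an XPath query with no matches would otherwise cause the loop body to run against a null reference.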
Of the two approaches, Invoke-WebRequest is best suited to simple tasks, whereas HtmlAgilityPack offers a much richer set of features for complex HTML parsing and manipulation.