Python:
Parsing HTML

How to:

Python’s ecosystem includes well-established third-party libraries for working with HTML: requests downloads pages over HTTP, and Beautiful Soup (the beautifulsoup4 package) parses them. Neither ships with Python, so install both if you haven’t already:

pip install beautifulsoup4 requests

Here’s a basic example using requests to fetch the HTML content of a webpage and BeautifulSoup to parse it:

import requests
from bs4 import BeautifulSoup

# Fetch the content of a webpage
URL = 'https://example.com'
page = requests.get(URL, timeout=10)
page.raise_for_status()  # Stop early on HTTP errors such as 404 or 500

# Parse the HTML content
soup = BeautifulSoup(page.content, 'html.parser')

# Extract the page title; soup.title is None when the page has no <title> tag
title = soup.title.get_text(strip=True) if soup.title else '(no title)'
print(f'Webpage Title: {title}')

Sample output:

Webpage Title: Example Domain

For more complex queries, like extracting all links from a webpage, you can use BeautifulSoup’s various methods for navigating and searching the parse tree:

# Extract all <a> tags that actually carry an href attribute
links = soup.find_all('a', href=True)

for link in links:
    print(link['href'])

Sample output:

https://www.iana.org/domains/example
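
On real pages, href values are often relative (for example /about), and an <a> tag without an href attribute yields None from link.get('href'). Here’s a minimal sketch of cleaning such values with the standard library’s urljoin. The helper name absolute_links, the base URL, and the sample hrefs are assumptions made up for illustration:

```python
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve hrefs against base_url, skipping missing or fragment-only values."""
    resolved = []
    for href in hrefs:
        # Skip None/empty values and in-page anchors such as '#top'
        if not href or href.startswith('#'):
            continue
        resolved.append(urljoin(base_url, href))
    return resolved


# Hypothetical href values, shaped like what link.get('href') might return
sample_hrefs = [
    '/about',
    'contact.html',
    None,
    '#top',
    'https://www.iana.org/domains/example',
]

resolved_links = absolute_links('https://example.com/docs/', sample_hrefs)
for url in resolved_links:
    print(url)
```

Absolute URLs pass through urljoin unchanged, so the same helper works for pages that mix relative and absolute links.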

Beyond find and find_all, Beautiful Soup lets you filter by attribute or CSS class and query the parse tree with CSS selectors through select, so you can narrow a search to exactly the elements you need.
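
Here’s a small sketch of narrowing a search in a few different ways. It parses an inline HTML string rather than a live page so it runs without network access; the markup, class names, and id are invented for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for illustration
html = """
<html><body>
  <div id="nav">
    <a class="internal" href="/home">Home</a>
    <a class="external" href="https://example.org">Partner</a>
  </div>
  <p class="intro">Welcome</p>
  <a href="/footer">Footer</a>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Filter by tag name and CSS class (class_ avoids the Python keyword)
external = [a['href'] for a in soup.find_all('a', class_='external')]

# CSS selector: only <a> tags nested inside the element with id="nav"
nav_links = [a['href'] for a in soup.select('#nav a')]

# Locate a single tag by class and read its text without extra whitespace
intro_text = soup.find('p', class_='intro').get_text(strip=True)

print(external)
print(nav_links)
print(intro_text)
```

find_all with keyword filters suits simple attribute matches, while select is often easier to read when the condition depends on where an element sits in the document.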