Fish Shell:
Parsing HTML

How to:

Fish shell is not designed to parse HTML directly. It excels, however, at gluing together Unix tools like curl, grep, sed, and awk, or at delegating to specialized tools like pup or a Python script using BeautifulSoup. The examples below show how to drive these tools from within Fish shell to parse HTML.
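For instance, a single sed substitution can pull the page title out of fetched HTML. This is a minimal sketch that assumes the <title> element sits on one line:

curl -s https://example.com | sed -n 's|.*<title>\(.*\)</title>.*|\1|p'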

Using curl and grep:

Fetch a page and extract the href values of its links:

curl -s https://example.com | grep -oP '(?<=href=")[^"]*'

Output:

/page1.html
/page2.html
...
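Note that -P (Perl-compatible regular expressions) is a GNU grep feature; the BSD grep shipped with macOS lacks it. A more portable pipeline gets the same result with a plain -o match plus a sed cleanup:

curl -s https://example.com | grep -o 'href="[^"]*"' | sed 's/href="//;s/"$//'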

Using pup (a command-line tool for parsing HTML):

First, ensure pup is installed. You can then use it to extract elements by tag, ID, class, and so on.

curl -s https://example.com | pup 'a attr{href}'

The output, like the grep example, lists the href attributes of the <a> tags.
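Because pup understands CSS selectors, you can also pull out just the text of specific elements, for instance the text of every <h1> heading:

curl -s https://example.com | pup 'h1 text{}'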

With a Python script and BeautifulSoup:

While Fish itself can't parse HTML natively, it integrates seamlessly with Python scripts. Below is a concise example that uses Python with BeautifulSoup to parse HTML and extract its title. Ensure you have beautifulsoup4 and requests installed in your Python environment.
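If they are missing, the usual pip invocation installs both:

pip install requests beautifulsoup4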

parse_html.fish

function parse_html -a url
    # Hand the URL to Python as sys.argv[1]; python3 is assumed to be on PATH
    python3 -c "
import sys

import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with the stdlib html.parser
response = requests.get(sys.argv[1])
soup = BeautifulSoup(response.text, 'html.parser')

# Print the text of every <title> element (usually just one)
for title in soup.find_all('title'):
    print(title.get_text())
" $url
end

Usage:

parse_html 'https://example.com'

Output:

Example Domain

Each of these methods suits a different use case and scale of complexity, from simple command-line text manipulation to the full parsing power of BeautifulSoup in a Python script. Depending on your needs and on the complexity of the HTML structure, choose either a straightforward Unix pipeline or a more powerful scripting approach.