Dart:
Parsing HTML

How to:

Dart does not provide built-in support for HTML parsing in its core libraries. However, you can use a third-party package like html to parse and manipulate HTML documents.

First, add the html package to your pubspec.yaml file:

dependencies:
  html: ^0.15.0

Then, import the package into your Dart file:

import 'package:html/parser.dart' show parse;
import 'package:html/dom.dart';

Here’s a basic example of parsing a string containing HTML and extracting data:

void main() {
  var htmlDocument = """
  <html>
    <body>
      <h1>Hello, Dart!</h1>
      <p>This is a paragraph in a sample HTML</p>
    </body>
  </html>
  """;

  // Parse the HTML string
  Document document = parse(htmlDocument);

  // Extracting data
  String title = document.querySelector('h1')?.text ?? "No title found";
  String paragraph = document.querySelector('p')?.text ?? "No paragraph found";

  print('Title: $title');
  print('Paragraph: $paragraph');
}

Output:

Title: Hello, Dart!
Paragraph: This is a paragraph in a sample HTML

To interact with real-world web pages, you might combine html parsing with HTTP requests (using http package to fetch web content). Here’s a quick example:

First, add the http package along with html:

dependencies:
  html: ^0.15.0
  http: ^0.13.3

Then, fetch and parse an HTML page from the web:

import 'package:http/http.dart' as http;
import 'package:html/parser.dart' show parse;

void main() async {
  var url = 'https://example.com';
  
  // Fetch the webpage
  var response = await http.get(Uri.parse(url));
  
  if (response.statusCode == 200) {
    var document = parse(response.body);

    // Assume the page has <h1> tags you're interested in
    var headlines = document.querySelectorAll('h1').map((e) => e.text).toList();
    
    print('Headlines: $headlines');
  } else {
    print('Request failed with status: ${response.statusCode}.');
  }
}

Note: The web scraping technique shown above should be used responsibly and in compliance with the website’s terms of service.