Arduino:
Parsing HTML
How to:
Parsing HTML on Arduino usually demands minimal footprint libraries due to limited device resources. A popular choice for web scraping and parsing is using the ESP8266HTTPClient
and ESP8266WiFi
libraries for ESP8266, or their ESP32 counterparts, given their native support for Wi-Fi capabilities and HTTP protocols. Here’s a basic example to fetch and parse HTML, assuming you’re working with an ESP8266 or ESP32:
First, include the necessary libraries:
#include <ESP8266WiFi.h> // For ESP8266
#include <ESP8266HTTPClient.h>
#include <WiFiClient.h>
// Use analogous ESP32 libraries if using an ESP32
const char* ssid = "yourSSID";
const char* password = "yourPASSWORD";
Connect to your Wi-Fi network:
void setup() {
Serial.begin(115200);
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(1000);
Serial.println("Connecting...");
}
}
Make an HTTP request and parse a simple piece of HTML:
void loop() {
if (WiFi.status() == WL_CONNECTED) { //Check WiFi connection status
HTTPClient http; //Declare an object of class HTTPClient
http.begin("http://example.com"); //Specify request destination
int httpCode = http.GET(); //Send the request
if (httpCode > 0) { //Check the returning code
String payload = http.getString(); //Get the request response payload
Serial.println(payload); //Print the response payload
// Parse a specific part, e.g., extracting title from payload
int titleStart = payload.indexOf("<title>") + 7; // +7 to move past the "<title>" tag
int titleEnd = payload.indexOf("</title>", titleStart);
String pageTitle = payload.substring(titleStart, titleEnd);
Serial.print("Page Title: ");
Serial.println(pageTitle);
}
http.end(); //Close connection
}
delay(10000); //Make a request every 10 seconds
}
Sample output (assuming http://example.com has a simple HTML structure):
Connecting...
...
Page Title: Example Domain
This example demonstrates fetching an HTML page and extracting the <title>
tag content. For more complex HTML parsing, consider using regular expressions (with caution due to memory constraints) or string manipulation functions to navigate through the HTML structure. Advanced parsing might require more sophisticated approaches, including custom parsing algorithms tailored to the specific structure of HTML you’re dealing with, as the standard Arduino environment does not include a built-in HTML parsing library.