C++:
Downloading a web page

How to:

Standard C++ has no built-in HTTP client, so a common approach is to use the libcurl library to download web content. Here’s a basic example:

#include <curl/curl.h>
#include <iostream>
#include <string>

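// libcurl calls this for each chunk of the response body; userp is the
// pointer that was passed via CURLOPT_WRITEDATA.
static size_t writeCallback(void* contents, size_t size, size_t nmemb, void* userp) {
    const size_t total = size * nmemb;
    static_cast<std::string*>(userp)->append(static_cast<char*>(contents), total);
    return total;  // reporting fewer bytes than received signals an error to libcurl
}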

int main() {
    std::string readBuffer;

    CURL* curl = curl_easy_init();
    if (!curl) {
        std::cerr << "Failed to initialize libcurl" << std::endl;
        return 1;
    }

    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writeCallback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);

    // Perform the transfer; this blocks until it completes or fails.
    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    if (res != CURLE_OK) {
        std::cerr << "CURL Error: " << curl_easy_strerror(res) << std::endl;
        return 1;
    }

    std::cout << readBuffer << std::endl;
    return 0;
}

Sample output:

<!doctype html>
<html>
<head>
    <title>Example Domain</title>
    ...
</head>
<body>
    <div>
        <h1>Example Domain</h1>
        <p>This domain is for use in illustrative examples in documents. You may use this domain ...</p>
    </div>
</body>
</html>
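
One detail worth knowing about the write callback: libcurl may call it several times during a single transfer, handing over one chunk of the body each time, which is why the example appends to the string rather than assigning to it. The sketch below imitates that behaviour without any network access. The names appendChunk and feedChunks are hypothetical, chosen only for this illustration, and the chunks are made-up data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same shape as a libcurl write callback: append the chunk to the
// std::string behind userp and report how many bytes were consumed.
std::size_t appendChunk(void* contents, std::size_t size, std::size_t nmemb, void* userp) {
    const std::size_t total = size * nmemb;
    static_cast<std::string*>(userp)->append(static_cast<char*>(contents), total);
    return total;
}

// Hypothetical stand-in for the library: delivers the body piece by piece,
// the way a real transfer might, and stops if the callback reports a short write.
bool feedChunks(std::vector<std::string> chunks, std::string& out) {
    for (std::string& chunk : chunks) {
        const std::size_t consumed = appendChunk(&chunk[0], 1, chunk.size(), &out);
        if (consumed != chunk.size()) {
            return false;
        }
    }
    return true;
}
```

Calling feedChunks with the pieces "<html>", "<body>" and "</body></html>" leaves the complete document in the output string, just as readBuffer holds the whole page once curl_easy_perform returns.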

Deep Dive

Standard C++ has never included a way to download web pages, so programmers have relied on platform-specific APIs or third-party libraries. Today, libcurl is a widely supported and versatile library for transferring data with URLs. To use it, include its header and link against the library when you build (for example, with the -lcurl flag on GCC or Clang).

Alternatives to libcurl include Poco’s HTTPClientSession and the C++ REST SDK (also known as Casablanca). While libcurl is C-based and about as low-level as you can comfortably go for HTTP requests, Poco and Casablanca offer more idiomatic C++ interfaces that some may prefer.

Under the hood, when you download a web page, the HTTP protocol kicks into action. A GET request is sent to the server, and assuming all goes well, the server responds with the content wrapped in an HTTP response.
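
To make that exchange concrete, here is a rough sketch of the text involved in an HTTP/1.1 GET: the request a client sends and the status line a server answers with. The helpers buildGetRequest and parseStatusCode are hypothetical, written for this illustration and not part of libcurl. Real clients such as libcurl send additional headers and handle many more cases.

```cpp
#include <string>

// Builds a minimal HTTP/1.1 GET request for the given host and path.
// HTTP/1.1 requires the Host header; each line ends with CRLF and a
// blank line terminates the header section.
std::string buildGetRequest(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Extracts the three-digit status code from a response that starts with a
// status line such as "HTTP/1.1 200 OK". Returns -1 if the text is malformed.
int parseStatusCode(const std::string& response) {
    if (response.compare(0, 5, "HTTP/") != 0) {
        return -1;
    }
    const std::string::size_type space = response.find(' ');
    if (space == std::string::npos || space + 4 > response.size()) {
        return -1;
    }
    int code = 0;
    for (std::string::size_type i = space + 1; i < space + 4; ++i) {
        const char c = response[i];
        if (c < '0' || c > '9') {
            return -1;
        }
        code = code * 10 + (c - '0');
    }
    return code;
}
```

A status code of 200 indicates success. In the libcurl example, the headers are consumed by the library, and only the body that follows them reaches writeCallback.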

See Also