C#:
Downloading a web page
How to:
C# makes it simple to download a web page with the HttpClient class. Here’s a quick example:
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        using (HttpClient client = new HttpClient())
        {
            try
            {
                string url = "http://example.com"; // Replace with the desired URL
                HttpResponseMessage response = await client.GetAsync(url);
                // Throws HttpRequestException if the status code is not a success code
                response.EnsureSuccessStatusCode();
                string responseBody = await response.Content.ReadAsStringAsync();
                Console.WriteLine(responseBody); // Outputs the raw HTML content
            }
            catch (HttpRequestException e)
            {
                Console.WriteLine("\nException Caught!");
                Console.WriteLine("Message: {0}", e.Message);
            }
        }
    }
}
This prints the raw HTML of the specified web page to the console.
Deep Dive
Before HttpClient, .NET developers used classes like WebClient and HttpWebRequest to download web content. HttpClient is the newest of the three. It is designed to be reused across requests, to be efficient, and to support asynchronous operations, which makes it the preferred choice for new applications.
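To illustrate the reuse point, here is a minimal sketch that assumes a long-running application sharing one HttpClient across all requests. The PageDownloader class and its DownloadAsync method are hypothetical names chosen for this illustration and are not part of .NET.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: holds one HttpClient for the lifetime of the application.
static class PageDownloader
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> DownloadAsync(string url)
    {
        // Dispose each response, but keep the client itself alive for later calls.
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}

class Program
{
    static async Task Main()
    {
        // Both calls go through the same HttpClient instance.
        string first = await PageDownloader.DownloadAsync("http://example.com");
        string second = await PageDownloader.DownloadAsync("http://example.com");
        Console.WriteLine(first.Length);
        Console.WriteLine(second.Length);
    }
}
```

Creating and disposing a client per request, as in the quick example, is fine for a short console program. A shared instance is usually the better fit once many requests are made. Error handling is left out here to keep the sketch short.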
Other tools can help with what comes after the download. For instance, third-party libraries such as HtmlAgilityPack can parse HTML, making it easier to navigate the DOM or extract specific pieces of information without working on raw HTML strings.
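As a rough sketch of what parsing with HtmlAgilityPack can look like, the example below pulls the title and the link targets out of an HTML string. It assumes the HtmlAgilityPack NuGet package is installed. The HtmlExtractor class, its method names, and the sample markup are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET

// Hypothetical helper for this illustration.
static class HtmlExtractor
{
    // Returns the text of the first <title> element, or null if there is none.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title?.InnerText;
    }

    // Returns the href value of every <a> element that has one.
    public static List<string> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<string>();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode link in links)
            {
                result.Add(link.GetAttributeValue("href", ""));
            }
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        // Sample markup standing in for a downloaded page.
        string html = "<html><head><title>Example</title></head>" +
                      "<body><a href=\"/a\">A</a><a href=\"/b\">B</a></body></html>";
        Console.WriteLine(HtmlExtractor.GetTitle(html));
        foreach (string href in HtmlExtractor.GetLinks(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

In practice the html string would be the responseBody returned by HttpClient rather than a literal.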
When downloading web pages, remember: respect robots.txt files, handle exceptions, and be mindful of the terms of use for websites.
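One way to act on the exception-handling advice is sketched below. It assumes a 10-second timeout and a made-up User-Agent string, both of which are placeholders to adjust. PoliteDownloader and TryDownloadAsync are hypothetical names. The sketch does not read robots.txt or check a site's terms of use. Those checks still need to happen separately.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for this illustration.
static class PoliteDownloader
{
    // Returns the page body, or null if the request failed or timed out.
    public static async Task<string> TryDownloadAsync(HttpClient client, string url)
    {
        try
        {
            // GetStringAsync throws HttpRequestException on a non-success status code.
            return await client.GetStringAsync(url);
        }
        catch (HttpRequestException e)
        {
            Console.WriteLine("Request failed: " + e.Message);
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports a timeout by canceling the pending task.
            Console.WriteLine("Request timed out.");
        }
        return null;
    }
}

class Program
{
    static async Task Main()
    {
        using (HttpClient client = new HttpClient())
        {
            client.Timeout = TimeSpan.FromSeconds(10); // assumed value
            // Placeholder User-Agent so the site can identify the client.
            client.DefaultRequestHeaders.UserAgent.ParseAdd("ExampleDownloader/1.0");

            string html = await PoliteDownloader.TryDownloadAsync(client, "http://example.com");
            Console.WriteLine(html ?? "(no content)");
        }
    }
}
```

Returning null instead of rethrowing keeps the caller simple. A real application might prefer to log the failure and retry, or let the exception propagate.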