# Bash: Downloading a web page

## How to:
The go-to tool for this job? `curl`. It's a powerful command-line utility that fetches data from the web. Here's the simplest use case:
```bash
curl https://example.com -o webpage.html
```
This command downloads the HTML of `example.com` and writes it to a file named `webpage.html`. Check out the output:
```
# Sample Output
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1256  100  1256    0     0   6458      0 --:--:-- --:--:-- --:--:--  6497
```
Want to see what you're downloading in real time? Drop the `-o` and the download prints right in your console:

```bash
curl https://example.com
```
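Printing to stdout also means the page can be piped straight into other tools. A small illustration (the `-s` flag just silences the progress meter so only the HTML flows through the pipe):

```bash
# Fetch the page silently and pull out its <title> tag
curl -s https://example.com | grep -i "<title>"
```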
## Deep Dive
`curl` has been around since 1997, carving out its niche for web operations. Why `curl` over browser downloads? Automation and script-friendliness. It's non-interactive and can be woven into bash scripts with ease.
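As a minimal sketch of that script-friendliness (the URL and filename below are just placeholders), `curl`'s `-f` flag turns HTTP errors into a non-zero exit status, so a script can react to failures:

```bash
#!/usr/bin/env bash
# -f: fail with a non-zero exit code on HTTP errors
# -s: silence the progress meter   -S: still show error messages
url="https://example.com"      # placeholder URL
out="webpage.html"             # placeholder output file

if curl -fsS "$url" -o "$out"; then
    echo "Saved $url to $out"
else
    echo "Download failed" >&2
    exit 1
fi
```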
Alternatives worth mentioning: `wget`, another command-line powerhouse that can recursively download web pages. For heavy-duty scraping, or when a real browser context is needed, programmers turn to tools like Selenium, Puppeteer, or Scrapy.
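For instance, a recursive `wget` fetch might look like this (one possible flag combination, not the only one):

```bash
# Mirror a page one level deep, rewriting links and grabbing
# images/CSS so the local copy works offline
wget --recursive --level=1 --page-requisites --convert-links https://example.com
```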
Getting into `curl`'s workings: it supports multiple protocols, from HTTP and HTTPS to FTP, and a slew of options (`--header`, `--cookie`, `--user-agent`, etc.) for fine-tuning requests. Plus, it usually comes pre-installed on Unix-based systems.
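Those options combine freely in a single call; the header, cookie, and user-agent values below are placeholders for illustration:

```bash
curl https://example.com \
  --header "Accept: text/html" \
  --cookie "session=abc123" \
  --user-agent "my-downloader/1.0" \
  -o webpage.html
```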
## See Also
- Curl Documentation: https://curl.haxx.se/docs/manpage.html
- Wget Manual: https://www.gnu.org/software/wget/manual/wget.html
- Introduction to web scraping with Python: https://realpython.com/python-web-scraping-practical-introduction/