Web scraping is a technique for extracting information from websites.
It usually happens in two parts:
First, the web scraper sends requests to the targeted website, often many of them. Sometimes it is easy for the web server to tell that these requests are coming from a bot. Other times, the web scraper intentionally disguises itself so that it appears to be a normal human visitor.
Second, the web scraper looks for patterns in the website's HTML and uses those patterns to extract the data it is looking for.
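The first part, sending requests, might be sketched as follows. This is only an illustration using Python's standard `urllib.request` module; the helper names `build_request` and `fetch_html`, the `BROWSER_USER_AGENT` string, and the example URL are all assumptions made for this sketch and do not come from any particular scraper. With no `User-Agent` header set, `urllib` identifies itself as `Python-urllib`, which is one way a server can tell that a request comes from a bot. Setting a browser-like value is one simple form of disguise.

```python
import urllib.request

# Hypothetical browser-like identifier, chosen only for illustration.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Return a GET request, optionally carrying a custom User-Agent header."""
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, user_agent=None, timeout=10.0):
    """Send the request and return the response body decoded as text."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://example.com/", BROWSER_USER_AGENT)` would return the page's HTML as a string, ready for the second part. A header is only one of the signals a server can inspect, so this alone does not guarantee that a scraper passes for a human visitor.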
For example, every result on a Google search results page is built from the same block of HTML, which renders like this:
I Don't Need No Stinking API: Web Scraping For Fun and Profit
Sometimes you need to pull data from a service that doesn't have an API. Not to fear! Here's how (and why) you should consider web scraping.
That code appears for each result listed on a Google results page. To scrape a list of results, a web scraper could pull the list of every <li class="g"> element on the page, and then pull the link from the <h3 class="r"> element.
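The extraction step just described could look roughly like the sketch below, which uses only Python's standard `html.parser` module. The `ResultLinkParser` class, the `extract_result_links` helper, and the `SAMPLE_HTML` markup are hypothetical: the sample only assumes that each result is an <li class="g"> containing an <h3 class="r"> with a link inside it, and a real results page may be structured differently.

```python
from html.parser import HTMLParser

# Assumed markup for illustration only, not copied from a real results page.
SAMPLE_HTML = """
<ol>
  <li class="g"><h3 class="r"><a href="https://example.com/first">First result</a></h3></li>
  <li class="g"><h3 class="r"><a href="https://example.com/second">Second result</a></h3></li>
  <li class="other"><h3 class="r"><a href="https://example.com/ignored">Not a result</a></h3></li>
</ol>
"""


class ResultLinkParser(HTMLParser):
    """Collect link targets found in <h3 class="r"> inside <li class="g">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_result = False  # currently inside an <li class="g">
        self._in_title = False   # currently inside an <h3 class="r">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "g" in classes:
            self._in_result = True
        elif tag == "h3" and "r" in classes and self._in_result:
            self._in_title = True
        elif tag == "a" and self._in_title and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        # Simplification: assumes results do not contain nested lists.
        if tag == "h3":
            self._in_title = False
        elif tag == "li":
            self._in_result = False


def extract_result_links(html):
    """Return the result links found in the given HTML string, in page order."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_result_links(SAMPLE_HTML)
# links == ["https://example.com/first", "https://example.com/second"]
```

The third list item is skipped because its class is not "g", which shows how the pattern, rather than the position on the page, decides what gets extracted. Many scrapers use a dedicated HTML parsing library with CSS selectors instead; the standard-library parser is used here only to keep the sketch self-contained.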
There is a lot of good information about web scraping to help you learn more. Check out some of these resources.
The Ultimate Guide to Web Scraping is basically the web scraping Bible. It examines the various ways that information is sent from a website to your computer, and how that information can be intercepted and parsed. It also looks at common traps and anti-scraping tactics and how you might be able to thwart them.