Webster's pieces from The Oldie
Webster's Webwatch

Screen Scraping

November 2012

The trouble with websites is that they have little mystery; once a site is published, all the information stored in it and the technical wizardry that makes it work are exposed and can be examined, copied, edited and regurgitated at will.  There are no secrets online.

To understand what a computer sees when it looks at a website, try this: right-click on any web page, and in the menu that appears, click on View Source or View Page Source.  You’ll find you are looking right under the bonnet of the website, to where the magic is created.  It will seem like gibberish, but your computer understands it.  Most of the information that sits in a website is just as easy to get at; that, after all, is the point of it being online.
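
For the curious, here is a rough sketch in Python of how a program might pick the readable words out of that gibberish.  The fragment of page source is made up for illustration, and the names TextCollector and visible_text are my own inventions; only the html.parser module comes with Python itself.

```python
from html.parser import HTMLParser

# A made-up fragment of page source, of the kind View Source would show.
PAGE_SOURCE = """
<html>
  <head><title>Cheap Flights</title></head>
  <body>
    <h1>Today's offers</h1>
    <p>London to Rome from <b>£49</b></p>
  </body>
</html>
"""


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps the readable text and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        text = data.strip()
        if text:  # skip the blank runs used only for indentation
            self.chunks.append(text)


def visible_text(source):
    """Return the pieces of human-readable text found in the page source."""
    collector = TextCollector()
    collector.feed(source)
    collector.close()
    return collector.chunks


if __name__ == "__main__":
    for chunk in visible_text(PAGE_SOURCE):
        print(chunk)
```

A real page is far longer and messier than this, but the principle is the same: the tags tell the program where each piece of information begins and ends.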

However, if we can read it, so can computers, and if they can read it, they can copy, collate and sort all the information they find and deliver it to us to do with as we wish.  It’s a process known colloquially as screen-scraping and some website owners don’t like it at all.

Why might you want to screen-scrape?  Well, consider sites like skyscanner.net, from Edinburgh, or the American kayak.co.uk; they help you to find the cheapest scheduled flights.  You tell them where you want to go and when; they then visit all the airline websites they can, extract (“scrape”) the information for you, sort it and show you the options.  It’s big business; skyscanner.net, for example, turned over £15m last year.  They are a little coy about how they make their money, and they wouldn’t answer my questions, but I imagine that their income comes mainly from advertising placed on the site and deals done with the airlines whose flights they show.
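
The visit, extract, sort and show routine can be sketched in a few lines of Python.  This is only an illustration under my own assumptions, not a description of how any real comparison site works: the two "airline pages" are invented fragments held in the program rather than fetched from the web, and the class="fare" and data-price markings, along with the names FareScraper and cheapest_first, are hypothetical.

```python
from html.parser import HTMLParser

# Invented page fragments standing in for two airline websites.
AIRLINE_PAGES = {
    "Airline A": '<ul><li class="fare" data-price="120">LHR-FCO 09:15</li>'
                 '<li class="fare" data-price="89">LHR-FCO 18:40</li></ul>',
    "Airline B": '<ul><li class="fare" data-price="95">LGW-FCO 07:05</li></ul>',
}


class FareScraper(HTMLParser):
    """Hypothetical scraper: collects (price, flight) pairs from one page."""

    def __init__(self):
        super().__init__()
        self.fares = []
        self._price = None  # price of the fare element currently being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "fare":
            self._price = int(attributes["data-price"])

    def handle_data(self, data):
        # The text straight after a fare tag is taken as the flight details.
        if self._price is not None:
            self.fares.append((self._price, data.strip()))
            self._price = None


def cheapest_first(pages):
    """Scrape every page, pool the fares and sort them by price."""
    results = []
    for airline, source in pages.items():
        scraper = FareScraper()
        scraper.feed(source)
        scraper.close()
        results.extend((price, airline, flight) for price, flight in scraper.fares)
    return sorted(results)


if __name__ == "__main__":
    for price, airline, flight in cheapest_first(AIRLINE_PAGES):
        print(f"£{price}  {airline}  {flight}")
```

In practice the hard part is that every airline lays out its pages differently, so a scraper needs a separate set of extraction rules for each site, and those rules break whenever a site is redesigned.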

So far so good.  All this seems like a perfectly sensible way of using the information that is being made freely available by other people – and if you don’t want people to use your information, why put it on the web? 

However, the practice undoubtedly rubs some website owners up the wrong way.  They feel irritated because screen-scraping sites take their information without paying for it and re-package it in a way that attracts advertisers, and hence money.  This may seem like sour grapes, but competition for data is fierce; some even claim that screen-scraping is theft, although the legal position is far from clear.

It’s a murky area.  Ryanair have fought a determined and public battle against scrapers who masquerade as price comparison sites but actually act as middle-men selling Ryanair flights after adding a commission.  Monster.com, a huge website that matches job vacancies to CVs lodged with it, is fighting screen-scrapers all the time, mainly to protect its clients’ privacy, but also to combat competitors.

There are also scrapers-for-hire – cyber private eyes of a sort – who will undertake individual scraping projects.  The job might be to extract the names of all the staff or customers of a rival, or to find a list of sales prospects, or to watch a competitor’s product list constantly to spot what is new.

You might think that if information is online, it is fair game, but the truth is that the law, where it exists, is different in every country.  In practical terms it’s a cat and mouse affair; big websites are always on the lookout for any unwelcome scraping activity, so that they can erect defences, with or without legal help.

Most of the time, however, we can benefit from this technical cleverness, if only because it helps us buy a cheaper plane ticket or insurance policy.  My advice, as usual, is to understand exactly who you are dealing with.  By all means use an ingenious site like skyscanner.net to find the best flight, but then navigate your way independently to the website of the airline concerned, and make the booking directly.  That way you can be sure you are buying from the right people, and they can be sure who you are.

It won’t make the business of flying any more attractive; that, I am afraid, is a lost cause, but you might save a few pounds.