
No Escape From The Rules of Scrape

What is scraping?

Scraping is the automated extraction or downloading of data from third-party websites in order to reuse the data for other purposes.

Common examples include price comparison websites that rely on prices scraped from other websites, or big data analytics companies that scrape large quantities of data in order to find patterns or send targeted information.

Can public information be reused for other purposes?

Many people think that publicly available information is free to reuse. However, such data may still be protected under several areas of law:

  • Data protection law. If the information being scraped contains personal data (e.g. email addresses, usernames), then its use falls within the scope of data protection law.
  • Intellectual property law. Creative content may be protected by copyright, and databases may be protected by database rights.
  • Contract law. If a company has terms of use, a disclaimer or API terms that prohibit scraping information from its website, scraping could, under certain circumstances, amount to a breach of contract.
  • Criminal law. Scraping can constitute a criminal offence if it involves intentionally and unlawfully entering a computer system.

How can I legally scrape websites?

Take the following guidelines into account:

  • Data protection law. Check whether the website’s privacy policy contains any restrictions on scraping. Even if it doesn’t, you still need to scrape in accordance with the GDPR. To do this, you have to determine: which legal basis is appropriate for further processing the scraped personal data; whether your intended use is compatible with the original purposes; and whether the data subjects have been informed of the collection and use of their personal data. You will also need to respect the data minimization principle (limit data collection to what is necessary for the intended purpose) and make sure individuals can exercise their rights as data subjects (e.g. the ‘right to erasure’).
  • Intellectual property law. Check whether the information you’re scraping is protected by IP law. If it is, ask the IP holder for permission before scraping. To avoid infringing someone’s database rights, never republish an entire database or a substantial part of it.
  • Contract law. Read the website’s Terms of Service (ToS), disclaimer and API terms to see whether scraping is explicitly prohibited. In addition, check whether you are actually bound by these terms (note that browse-wrap terms are not enough, see ECLI:NL:GHDHA:2018:61). If binding terms prohibit scraping and you do it anyway, you may expose yourself to a claim for breach of contract. It’s also important to respect the rules in the site’s robots.txt file. If the ToS or robots.txt prevent you from crawling or scraping, ask the site owner for permission before going any further.
  • Criminal law. If a website uses security measures to make data inaccessible, don’t circumvent them: doing so may constitute a criminal offence.
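Of the checks above, respecting robots.txt is the one that can be automated. The following is a minimal sketch using Python’s standard-library urllib.robotparser module. The robots.txt rules, the ExampleBot user agent, the example.com URLs and the helper name check_robots are all hypothetical, chosen only for illustration. The check only tells you what the robots.txt file permits; it does not answer the ToS, GDPR or IP questions.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_lines, user_agent, url):
    """Hypothetical helper: return (allowed, crawl_delay) for url under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from text instead of fetching them over the network
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay line applies to this agent
    return allowed, delay


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 10",
]

if __name__ == "__main__":
    print(check_robots(EXAMPLE_ROBOTS, "ExampleBot", "https://example.com/private/data"))  # (False, 10)
    print(check_robots(EXAMPLE_ROBOTS, "ExampleBot", "https://example.com/products"))  # (True, 10)
```

Against a live site you would typically call parser.set_url("https://example.com/robots.txt") followed by parser.read() instead of parse(). When crawl_delay() returns None, choosing a conservative delay between requests is left to you.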