No Escape From The Rules of Scrape
What is scraping?
Scraping is the automated extraction or downloading of data from third-party websites in order to reuse the data for other purposes.
Common examples include price comparison websites, which rely on prices scraped from other websites, and big data analytics companies, which scrape large quantities of data to find patterns or deliver targeted information.
Can public information be reused for other purposes?
Many people assume that information that is publicly available online is free to reuse. However, such data may still be protected by several areas of law:
- Data protection law. If the information being scraped contains personal data (e.g. email addresses, usernames), then its use falls within the scope of data protection law.
- Intellectual property law. Creative content may be protected by copyright, and collections of data may be protected by database rights.
- Criminal law. Scraping can constitute a criminal offence if it involves intentionally and unlawfully entering a computer system.
How can I legally scrape websites?
Take the following guidelines into account:
- Intellectual property law. Check whether the information you're scraping is protected by IP law. If it is, ask the IP holder for permission before scraping. To avoid infringing someone's database rights, never republish an entire database or a substantial part of it.
- Contract law. Read the website's Terms of Service (ToS), disclaimer and API terms to see whether scraping is explicitly prohibited. In addition, check whether you are actually bound by these terms (note that a browse-wrap agreement is not enough; see ECLI:NL:GHDHA:2018:61). If binding terms prohibit scraping and you scrape anyway, you may expose yourself to a claim for breach of contract. It's also important to respect the rules in the site's robots.txt file. If the ToS or robots.txt prohibit crawling or scraping, ask the site owner for permission before going any further.
- Criminal law. If a website uses security measures, they were put in place to keep data inaccessible, so don't circumvent them; doing so may be a criminal offence.
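Of the guidelines above, respecting robots.txt is the one that can be checked automatically. The following is a minimal sketch using Python's standard-library `urllib.robotparser` module. The robots.txt content, the user-agent string `ExampleScraper/1.0` and the `example.com` URLs are hypothetical placeholders. Passing this check only means robots.txt does not object to the request; it does not replace reading the ToS or assessing the IP and criminal law points.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only. In practice you would
# fetch the real file from https://<site>/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical user-agent string identifying the scraper.
USER_AGENT = "ExampleScraper/1.0"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of downloading them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_scrape(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if robots.txt allows user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/products", "https://example.com/private/data"):
        verdict = "allowed" if may_scrape(rp, USER_AGENT, url) else "disallowed by robots.txt"
        print(f"{url}: {verdict}")
```

In practice you would point the parser at the live file with `set_url()` and `read()` instead of `parse()`; the string-based version is shown so the example runs without a network connection.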