Technology advances at a pace much faster than legislation. One gap in legislation concerns website scraping. Website scraping has the potential to affect businesses in a number of ways, including search engine rankings and competitive pricing. Because of this gap, United States companies are left alleging violations of decades-old statutes whose intent did not encompass such advanced data collection programs. Typically, web hosts combat scraping by claiming violations of the Digital Millennium Copyright Act, state trespass law, breach of contract, and the Computer Fraud and Abuse Act. This article looks at the legality of web scraping and the potential causes of action and defenses available to companies regarding this practice.
What is Web Scraping?
Generally speaking, web scraping refers to the process of mining valuable data or information from a website. Websites hold data both in the underlying HTML code and in what is presented visually to the user. Web scraping programs access either type of data and compile it into easily downloadable files, typically an Excel spreadsheet. Web scraping is essentially content mining: the program looks for useful data on websites, including news, comments, company information, and pricing. Because scraping collects massive amounts of displayed data from a given website, its precision often depends on the layout of that website. A single change to the design or layout of a website can disrupt the web scraping operation.
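To make the description above concrete, the following is a minimal sketch of a scraper, not a description of any particular commercial tool. It assumes a hypothetical product listing whose markup uses the class names "product", "name", and "price" (none of these come from a real site), pulls the name and price out of the HTML, and compiles them into CSV text that a spreadsheet program can open. It also simulates a redesign that renames one class, to show how a single layout change can leave the scraper with nothing to collect.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup; the class names are assumptions for illustration.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$24.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects (name, price) pairs from spans with the expected class names."""

    def __init__(self):
        super().__init__()
        self.rows = []       # completed (name, price) rows
        self._field = None   # field held by the <span> currently open, if any
        self._current = {}   # fields gathered for the list item in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # Keep the row only if both expected fields were found.
            if len(self._current) == 2:
                self.rows.append((self._current["name"], self._current["price"]))
            self._current = {}


def scrape_to_csv(html):
    """Parse the markup and return the extracted rows as CSV text."""
    parser = ProductScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "price"])
    writer.writerows(parser.rows)
    return out.getvalue()


csv_text = scrape_to_csv(SAMPLE_HTML)

# Simulate a site redesign that renames one class the scraper depends on.
redesigned = SAMPLE_HTML.replace('class="price"', 'class="cost"')
csv_after_redesign = scrape_to_csv(redesigned)

print(csv_text)
print(csv_after_redesign)
```

In this sketch the first call yields a header plus two data rows, while the call against the redesigned markup yields only the header row, because the scraper no longer recognizes the price field.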
Web scraping is extremely cost effective and valuable for a number of purposes, including contract extraction, price comparison, social media data, business review data, and search engine optimization rankings. On December 7, 2018, Google’s Webmaster Trends Analyst John Mueller (@JohnMu) took to Twitter to downplay the importance of copied data in search engine rankings, stating, “Scrapers copy all kinds of content without judging the quality, so I don’t think that would be a really useful signal.”
However, many web hosts still fear that comprehensive, non-stop web scraping will hurt their search engine optimization or search engine rankings. Scrapers republish or repurpose content and compete with web hosts for long-tail and short-tail traffic. Many hosts rely on high rankings and long-term traffic to generate business, so any dip can be devastating.
Digital Millennium Copyright Act (“DMCA”)
Web hosts seeking monetary and injunctive relief for web scraping typically cite violations of the Digital Millennium Copyright Act (hereinafter the “DMCA”), specifically its anti-circumvention provision. In general, the DMCA protects web hosts from copyright liability so long as copyright owners can request that infringing material be taken down. Of note for web scraping is § 1201, which prohibits circumventing technological measures to gain illicit access to copyrighted content. To circumvent a technological measure means to “descramble a scrambled work, to decrypt an encrypted work, or otherwise to avoid, bypass, remove, deactivate, or impair a technological measure, without the authority of the copyright owner.” Courts typically look at security measures such as encryption, hashing, validation keys, password protection, employee training, or software initiation when determining whether a technological measure is sufficient.
Computer Fraud and Abuse Act (“CFAA”)
Web hosts often assert that scrapers violate the Computer Fraud and Abuse Act (“CFAA”) and its state law counterparts, including the California Comprehensive Computer Data Access and Fraud Act. Enacted in 1984, the CFAA originally sought to increase the government’s ability to regulate hackers and computer-related crimes. Specifically, the CFAA forbids certain computer crimes, namely those involving access to computers without authorization. It also reaches a range of online conduct, from obtaining information to damaging computer data. Modern CFAA claims arise when a web scraper avoids or ignores an “access restriction” when scraping data from a website. The broad language of the CFAA creates a low bar for potential plaintiffs: essentially, any violation of an access restriction can trigger a CFAA claim. The most controversial aspect of the statute is that it includes a private right of action. CFAA claims are extremely popular for deterring web scrapers because liability can result not just in injunctive relief or damages, but also in criminal charges. Typically, criminal liability under the CFAA requires that a government website or computer be involved, or that the web scraper use extortion.
The CFAA access restriction requirement is often triggered by computer bots that create fake accounts to accumulate data. LinkedIn takes an active approach to preventing this type of scraping by sending numerous cease and desist letters and filing lawsuits. One example is the Ninth Circuit case hiQ Labs Inc. v. LinkedIn Corporation. There, hiQ’s business model centered on scraping information from LinkedIn profiles and analyzing it, a practice hiQ had engaged in for years. HiQ would then provide its statistical analysis to other businesses. With respect to the CFAA, the court looked at whether hiQ violated the statute by continuing to access LinkedIn profiles after LinkedIn had revoked its permission. HiQ argued that there was no CFAA violation because the data is publicly available. Ultimately, the court granted hiQ’s request for an injunction barring LinkedIn from blocking its web scraping activities.
Congress did not intend the Computer Fraud and Abuse Act to combat the issue of web scraping when it enacted the statute in 1984. The Ninth Circuit and other courts have been reluctant to extend the civil and criminal statutes to web scraping of public information. Companies planning on using web scraping programs should keep a close eye on litigation like hiQ Labs Inc. v. LinkedIn Corporation and other similar cases as they make their way up through the appellate courts because they will have a tremendous effect on the business practice of web scraping.
State Law Trespass
Web hosts who are directly injured by scrapers may seek protection via a state law trespass claim. The current legal standard for electronic trespass to chattels claims requires a showing of unauthorized interference with a computer system resulting in actual damage. Web hosts often attempt to show actual damage by demonstrating interference with servers or similar technical difficulties. The threshold for these claims to survive a motion to dismiss has increased, requiring the web host to show that the scraper actually harmed it. The Ninth Circuit takes the stance that minor server or website problems do not constitute actionable harm. Actionable harm is likely limited to cases in which the web servers crash as a result of web scraping. However, veteran web scrapers rarely crash a web host’s server or site, making trespass claims increasingly unlikely to meet this requirement.
Breach of Contract
Best Practice Takeaways and Conclusion
Justin Banda earned his Bachelor’s degree from California Polytechnic State University in San Luis Obispo in 2017. Currently, Mr. Banda is a Juris Doctorate Candidate at Santa Clara University, School of Law. In addition, Mr. Banda is enrolled in the well-regarded Privacy Certificate Program and he has a strong interest in global privacy compliance, cybersecurity, information technology, and behavioral advertising.
Mr. Banda also works as a Privacy Evaluator for the non-profit Common Sense Media. Through his work as a Privacy Evaluator, Mr. Banda evaluates and grades privacy policies/terms of service for education technology companies so teachers can make smart choices about the learning tools they use with students/schools. Evaluation criteria are focused on best practices in the areas of information security, privacy, transparency and compliance.
Mr. Banda has had the pleasure of publishing research articles with the International Association of Privacy Professionals, Golden Data, and LawZam.