Website Scraping: Legal Causes of Action and Defenses


Technology advances at a pace much faster than legislation. One gap in legislation concerns website scraping. Website scraping has the potential to affect businesses in a number of different ways, including search engine rankings and competitive pricing. Nonetheless, due to the gap in legislation, United States companies are left alleging violations of decades-old statutes whose intent did not encompass such advanced data collection programs. Typically, web hosts are left claiming violations of the Digital Millennium Copyright Act, state trespass law, breach of contract, and the Computer Fraud and Abuse Act to combat this issue. This article looks at the legality of web scraping and the potential causes of action and defenses companies have regarding this practice.

What is Web Scraping?

Generally speaking, web scraping refers to a process of mining valuable data or information from a website. Websites have data both underlying the HTML code and presented visually to the user. Web scraping programs access either of these types of data and compile them into easily downloadable files, typically an Excel spreadsheet. Web scraping is essentially content mining; that is, the program looks for useful data on websites, including news, comments, company information, and pricing. Because scraping collects massive amounts of displayed data from any given website, the precision of the web scraping often depends on the layout of the website. One change to the design or layout of a website can disrupt the web scraping operation.
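To make the mechanics concrete, here is a minimal sketch of the process described above, using only Python's standard library. Everything in it is an assumption made for illustration: the sample page, the class names "product", "name" and "price", and the helper names ProductParser and scrape_to_csv are hypothetical and do not come from any real website or scraping product. The output is CSV text, a format that spreadsheet programs can open.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; the markup and class names are assumptions.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects one row per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self.rows.append({})
        elif tag == "span" and cls in ("name", "price") and self.rows:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside a recognized <span>.
        if self._field is not None:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_csv(html):
    """Parse the HTML and return the extracted rows as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)  # missing fields are written as empty cells
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))

# A small layout change (renaming one class) silently empties the price column.
print(scrape_to_csv(SAMPLE_HTML.replace('class="price"', 'class="cost"')))
```

The second call illustrates the fragility noted above: the scraper depends on the page's exact structure, so a single renamed class is enough to break the extraction without raising any error.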

Web scraping is extremely cost effective and valuable for a number of different purposes, including contract extraction, price comparison, social media data, business review data, and search engine optimization rankings. Recently, Google’s Webmaster Trends Analyst John Mueller took to Twitter to downplay the importance of copied data in search engine rankings, stating, “scrapers copy all kinds of content without judging the quality, so I don’t think that would be a really useful signal.”

However, many web hosts still fear that comprehensive, non-stop web scraping will hurt their search engine rankings. Scrapers republish or repurpose content and compete with web hosts for both long-tail and short-tail traffic. Many hosts rely on high rankings and long-term traffic to generate business, so any dip can be devastating.

Digital Millennium Copyright Act (“DMCA”)

Web hosts seeking monetary and injunctive relief for web scraping typically cite violations of the Digital Millennium Copyright Act (the “DMCA”), specifically its anti-circumvention provision. In general, the DMCA protects web hosts from copyright liability as long as copyright owners can request that the stolen material be taken down. Of note for web scraping is § 1201, which prohibits circumventing technological measures to gain illicit access to copyrighted content. To circumvent a technological measure means to “descramble a scrambled work, to decrypt an encrypted work, or otherwise to avoid, bypass, remove, deactivate, or impair a technological measure, without the authority of the copyright owner.” Courts typically look at security measures such as encryption, hashing, validation keys, password protection, employee training, or software initiation when determining whether sufficient technological measures were in place.

Courts place heavy emphasis on terms of use contracts. One example is Facebook, Inc. v. Power Ventures, Inc. With regard to the DMCA claim, Power Ventures relied on the same argument used by most defendants in DMCA actions: it argued that Facebook did not satisfy the unauthorized use element because consumers control access to their own pages on the website. The court ruled that Facebook’s terms of use negated this argument because users are barred from using automated programs to access the Facebook website. The court did not consider the data on the social media site to be public, because it is protected by password authentication. Ultimately, Facebook’s DMCA claim survived the motion to dismiss. Moving forward, companies making use of web scraping programs should carefully review the terms of use of the website they intend to scrape, because those terms can be the deciding point during litigation.

Computer Fraud and Abuse Act (“CFAA”)

Web hosts often assert that scrapers violate the Computer Fraud and Abuse Act (“CFAA”) and its state law counterparts, including the California Comprehensive Computer Data Access and Fraud Act. Enacted in 1984, the CFAA originally sought to increase the government’s ability to regulate hackers and computer-related crimes. Specifically, the CFAA forbids certain computer crimes, namely those involving accessing computers without authorization. It also targets a range of online practices, from obtaining information to damaging computer data. Modern CFAA violations occur when a web scraper avoids or ignores an “access restriction” when scraping data from a website. The broad language used in the CFAA creates a low bar for potential plaintiffs: essentially, any access restriction violation can trigger a CFAA claim. The most controversial aspect of this statute is that it includes a private right of action for violations. CFAA claims are extremely popular for deterring web scrapers because liability can result not just in injunctive relief or damages, but also in criminal charges. Typically, to give rise to criminal violations, the CFAA requires that a government website or computer be involved, or that the web scraper use extortion.

The CFAA access restriction requirement is often triggered by the use of computer bots that create fake accounts to accumulate data. LinkedIn takes an active approach to preventing this type of scraping by sending numerous cease and desist letters as well as filing lawsuits. One example is the Ninth Circuit case hiQ Labs Inc. v. LinkedIn Corporation. Here, hiQ’s business model centered on scraping information from LinkedIn profiles and analyzing it, a practice hiQ had engaged in for years. hiQ would then provide its statistical analysis to other businesses. With respect to the CFAA, the court looked at whether hiQ violated the statute by continuing to access LinkedIn profiles after permission had been revoked. hiQ argued there was no CFAA violation because the data is publicly available. Ultimately, the court granted hiQ’s request for an injunction to stop LinkedIn from blocking its web scraping activities.

Congress did not intend the Computer Fraud and Abuse Act to combat web scraping when it enacted the statute in 1984. The Ninth Circuit and other courts have been reluctant to extend the civil and criminal provisions to web scraping of public information. Companies planning to use web scraping programs should keep a close eye on hiQ Labs Inc. v. LinkedIn Corporation and similar cases as they make their way through the appellate courts, because the outcomes will have a tremendous effect on the business practice of web scraping.

State Law Trespass

Web hosts who are directly injured by scrapers may seek protection via a state law trespass claim. The current legal standard for electronic trespass to chattels claims requires a showing of unauthorized interference with a computer system resulting in actual damage. Web hosts often attempt to show actual damage by demonstrating interference with servers or similar technical difficulties. The threshold for these claims to survive a motion to dismiss has increased, requiring the web host to prove that the scraper harmed it. The Ninth Circuit takes the stance that minor server or website problems do not constitute actionable harm. Actionable harm is likely limited to cases in which the web servers crash as a result of web scraping. However, veteran web scrapers rarely crash a web host’s server or site, making trespass claims increasingly unlikely to satisfy this requirement.

Breach of Contract

Most, if not all, web hosts have a user agreement or terms of use contract conspicuously posted on their website. These agreements are enforceable; however, they must still satisfy contract law requirements. Web scrapers need to pay particular attention to these agreements because they outline the type of conduct permitted by users of the website. It is becoming increasingly common for user agreements or terms of use to prohibit any type of commercial web scraping. When a user breaches a user agreement or terms of use contract, a web host may ban the user from the particular product or service as well as seek civil damages. More commonly, the web host files a lawsuit and the parties settle before any jury verdict, which is perhaps why there is little case law on this issue. One example of a typical settlement is LinkedIn Corporation v. Robocog Inc. Here, LinkedIn’s user agreement specifically conditioned members’ right to access on agreeing not to use data scraping technologies. The parties reportedly settled this case for $40,000. Web scrapers should weigh the risk and reward before scraping data, because the typical settlement can be high and litigation is even more expensive.

Best Practice Takeaways and Conclusion

There is no blanket criminal or civil ban on web scraping for business purposes. However, there are numerous causes of action a website operator may rely on to try to combat this technique. From the perspective of a web scraper, the best practice is to review the terms of use of the given website before scraping data. Website operators need to strengthen their security measures and keep an up-to-date, conspicuous user agreement. Inevitably, legislation will catch up to technology and Congress will address this issue with a more modern statute.
