Protect Your Blog Content From Web Scraping

by Christy Rakoczy on November 8, 2013

It takes time, money, and effort to create content for your company’s blog. Unfortunately, some unscrupulous website owners don’t want to invest in original material and instead plagiarize other people’s posts through a practice called “web scraping.”

Web scraping typically involves using automated bots or scripts to copy and repost content. Why is it bad for small businesses? It can lower your website's search engine ranking, traffic, subscriber base, and sales and marketing revenue, while driving up your bandwidth, network, and legal costs.

Here’s how to help prevent this type of IP theft:

  • Understand and uphold copyright law. When you publish original content on a website, copyright law automatically protects your ownership of that material. For an added layer of protection, register your work and include a copyright notice (such as © 2013 Intuit, Inc.) at the bottom of each page of your site. Whether or not you've formally registered your copyright, if someone reposts your content without permission, you can file a Digital Millennium Copyright Act (DMCA) takedown notice asking that site's internet service provider or host to remove it.
  • Use anti-scraping software. ScrapeShield is a free app that tracks where your stolen content appears. It works with a network called Maze, which is building a database of scrapers to blacklist. When one of those scrapers tries to access any website on CloudFlare's content delivery network, Maze sends it bad data that ties up its resources and slows it down, making it harder for the scraper to copy content anywhere on the Internet. Distil Networks also offers a comprehensive scraping solution that costs $99 per month and up after a free trial.
  • Install WordPress plugins. Do you use WordPress? The popular blogging platform supports various third-party tools that identify and stop scraping. For example, the free digital fingerprint plugin by ©Feed hides a unique character string in your original HTML source code and then periodically searches for it elsewhere, producing a list of the sites it finds serving your stolen content. Meanwhile, the low-cost AntiScraper plugin ($10 a year after a free trial) is designed to block blacklisted scrapers from accessing your website in the first place.
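For readers curious how a "digital fingerprint" works, here is a minimal sketch of the general idea in Python. It is not the ©Feed plugin's code (WordPress plugins are written in PHP), and every name in it, including the secret key and the sample URLs, is hypothetical. The sketch derives a unique marker for each post, hides it in the post's HTML as a comment, and then checks pages found elsewhere for that marker. It assumes you already have the other sites' HTML in hand; fetching and searching the web is left out.

```python
import hashlib
import hmac

# Hypothetical site-wide secret; in practice, use a long random value and keep it private.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_fingerprint(post_id: str, secret: bytes = SECRET_KEY) -> str:
    """Derive a unique, hard-to-guess marker string for one post."""
    digest = hmac.new(secret, post_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "fp-" + digest[:16]  # "fp-" plus 16 hex characters


def embed_fingerprint(post_html: str, fingerprint: str) -> str:
    """Append the marker as an HTML comment, which readers never see on the page."""
    return f"{post_html}\n<!-- {fingerprint} -->"


def contains_fingerprint(page_html: str, fingerprint: str) -> bool:
    """Return True if a page's HTML still carries the marker."""
    return fingerprint in page_html


def find_copies(pages: dict[str, str], fingerprint: str) -> list[str]:
    """Given {url: html} for pages found elsewhere, list the URLs that reuse the post."""
    return [url for url, html in pages.items() if contains_fingerprint(html, fingerprint)]


# Usage with made-up pages: one scraped copy and one unrelated page.
fp = make_fingerprint("2013/11/protect-your-blog")
published = embed_fingerprint("<p>Original post text</p>", fp)
suspects = {
    "https://scraper.example/copy": "<div>" + published + "</div>",
    "https://other.example/page": "<p>Unrelated content</p>",
}
print(find_copies(suspects, fp))  # ['https://scraper.example/copy']
```

Using a keyed hash rather than a random string means the same post always gets the same marker without storing it anywhere. Note that a scraper that strips HTML comments would also strip this marker, so real tools may hide their string elsewhere.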
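Blocking blacklisted scrapers, the approach AntiScraper takes, boils down to checking each visitor against a list before serving a page. The Python sketch below illustrates that logic only; it is not AntiScraper's implementation, and the blocklist entries are hypothetical. The IP ranges are reserved documentation addresses, not known scrapers, and the User-Agent keywords are examples of strings that automated tools often send.

```python
import ipaddress

# Hypothetical blocklist using reserved documentation ranges (not real scrapers).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]

# Hypothetical substrings to reject in the User-Agent header (compared in lowercase).
BLOCKED_AGENT_KEYWORDS = ("scrapy", "python-requests", "curl")


def is_blocked(client_ip: str, user_agent: str) -> bool:
    """Return True if the visitor's IP or User-Agent matches the blocklist."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when IP versions differ, so IPv6 visitors are handled safely.
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    agent = user_agent.lower()
    return any(keyword in agent for keyword in BLOCKED_AGENT_KEYWORDS)


def handle_request(client_ip: str, user_agent: str) -> tuple[int, str]:
    """Refuse blocklisted visitors with HTTP 403; serve everyone else."""
    if is_blocked(client_ip, user_agent):
        return 403, "Forbidden"
    return 200, "OK: serve the post"


print(handle_request("192.0.2.55", "Mozilla/5.0"))       # (403, 'Forbidden')
print(handle_request("203.0.113.9", "Scrapy/2.11"))      # (403, 'Forbidden')
print(handle_request("203.0.113.9", "Mozilla/5.0"))      # (200, 'OK: serve the post')
```

A User-Agent header is easy to fake, which is one reason commercial services maintain shared blacklists of scraper addresses rather than relying on a single check like this.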

Christy Rakoczy is a business writer for Intuit and is passionate about solving small business problems.
