Every once in a while we’ll make several sites (or a lot of sites) for one client. The general subject or purpose is different for each site, which justifies the existence of multiple sites. But obviously, there’s crossover in interest. So when we write a blog post that pertains to more than one of those sites, it seems logical to share that content in more than one place. Since the client owns the content and both of the sites, it’s not copyright infringement, so there’s no reason not to share, right?

Well, yes and no.

If you’re keeping up on the latest SEO best practices, you’ve at least heard of the concerns about duplicate content and how it can damage your site’s search ranking. Even if you own the content and both sites it’s being shared on, Google may well treat one of those sites as scraping (essentially plagiarizing) the other.

Our dilemma was as follows:

About a year ago we built and launched a public-facing site for St.Vincent associates, SpiritOfCaring.org. It serves as an informational and social platform, but for the purposes of this conversation we’ll only discuss the health information provided through this site.

As one of the largest health systems in the state, St.Vincent covers a lot of ground and we’ve built a series of microsites to feature a variety of the major services such as cancer care, women’s health, orthopedics, etc. We maintain ongoing content for the majority of these sites, and we want to be sure the St.Vincent associates see and share all of this great information that’s available to them.

We’ve been including brief excerpts and links to the original articles on the microsites, but the hang-up was that we were losing associate traffic on SpiritOfCaring.org to the microsites. It doesn’t seem like a huge issue, but it makes it increasingly difficult for our clients to report on the success of SpiritOfCaring.org.

So as a part of a recent and standard annual reskin, we decided just to bring the whole microsite stories over instead of linking out. We’d only be copying a fraction of any given site and supplementing it with plenty of original content, but we were still uncomfortable with spitting in the eye of SEO best practice.

After some pretty painful research on duplicate content, we decided to create a file called robots.txt (see robotstxt.org/robotstxt.html) and place it in the main website directory, the site root. Basically, the file tells Google not to crawl the indicated category, in this case the Your Health category.
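For anyone who wants to see what that looks like, here is a rough sketch rather than our actual file. It assumes the Your Health category lives under a path like /your-health/ (a made-up path for illustration) and uses Python’s built-in robots.txt parser to check which URLs a crawler would be allowed to fetch. The example.org domain and the `is_crawlable` helper are placeholders too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The /your-health/ path is an assumption
# made for this example, not the real category URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /your-health/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/your-health/heart-tips",
        "https://example.org/about/",
    ):
        print(url, "->", "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Running a quick check like this before launch is a cheap way to confirm the rule blocks only the category you meant to block and nothing else.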

Our reasoning was this: Since the site is primarily for St.Vincent associates, we have a clearly defined audience, and the site consequently thrives on direct and referral traffic (meaning that the ever-coveted keywords aren’t quite so important for getting visitors to the site). And even though the content isn’t being indexed, we still get to track time on site and any video views or content shares that happen.

So, how does this help you? Well, it doesn’t exactly, because it’s unlikely you’ll ever run into an identical situation. But we’d like to share the extremely condensed version of our research on duplicate content.

Best practices and ways to think about duplicate content:

  1. Avoid it whenever possible—we’re aware that this isn’t actually helpful
  2. Never duplicate content within a site—internal duplication is the SEO pits
  3. Never plagiarize—it’s lazy and stupid
  4. Quoting a paragraph or so in a post isn’t going to kill your site, so don’t stress about sharing pieces of good information, but again, don’t take credit for what isn’t yours
  5. If you’ve got the approval to reuse or syndicate content:
  • Always give credit to the original source and link to it whenever possible
  • Supplement with your own lead-in and conclusion, or other original content where it makes sense
  • Consider using a noindex robots meta tag or a robots.txt rule on the duplicated content
  • Be sure there’s lots more to your site than just syndicated articles
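If you go the noindex route, the tag in question is a robots meta tag in the page’s head, along the lines of `<meta name="robots" content="noindex">`. As a minimal sketch (the `has_noindex` helper and the sample page are made up for illustration), here is one way to check whether a page’s HTML actually carries that tag, using only Python’s standard library:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical syndicated page, for illustration only.
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

One thing to keep in mind: a crawler has to be able to fetch a page to see its noindex tag, so a page that is also blocked in robots.txt never gets the chance to show it. In practice that means picking one approach per page rather than stacking both.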

So you’ve duplicated some content and Google knows about it. What happens?

As of April 2012, the story is that Google will only serve the version that it crawls first in search results; it’s almost like the second copy doesn’t exist. So, hopefully, the original is crawled before the duplicate. (Something to keep in mind when you’re sharing content among your own sites: which site do you want to come up in the search results?)

Now if you’ve got lots of internal duplication or you get caught plagiarizing, it’s a different story. So just steer clear of both and you won’t have to worry about it.

We think the most important thing to remember is why you’re sharing this duplicated content. It’s going to be a hassle to deal with properly and it’s not going to help your search rankings, but if it’s content that’s valuable to your visitors, then we think it’s worth the trouble. Search engines (and, by extension, SEO) exist to create the best online experience for the searcher, so build the best site with the best content first and the optimization should follow naturally. It’s the direction Google’s heading anyway, so you might as well be ready for it.

Here are some relevant and recent articles if you’re interested in knowing more about duplicate content:
Duplicate Content in a Post-Panda World
Duplicate Content Across Domains
Is Duplicate Content Really Bad for SEO?