Monday, October 14, 2013

Link Building 101: Competitor Analysis

Link building is something anyone can accomplish. There's no great secret, just hard work, creativity, and determination to get links that matter.
When you're looking for some practical link building opportunities that will help you find and acquire quick, yet quality, links, there are five "quick wins" you should explore at the beginning of a link building campaign:
  1. 404 Pages and Link Reclamation
  2. Competitor Analysis
  3. Fresh Web Explorer/Google Alerts
  4. Local Link Building
  5. Past/Current Relationships

Competitor Analysis/Backlink Profile

Competitor analysis is an integral step in any link building campaign. Why? Because running a backlink analysis on a competitor:
  • Teaches you about the industry:
    • Gives you a sense of which sites within the vertical are providing links
  • Helps you understand your competitors, including:
    • Their link profile, and why they're ranking
    • Their strategies used to acquire links
    • Their resources that didn't acquire many links
  • Gives you a list of obtainable links (if they can, why not you?)
Competitor backlink analysis is great – you get the initial research into the industry done, it helps you understand the competition, and it gives you a tidy list of high opportunity links.
So, let's dive into the how of competitor backlink analysis:
  1. Make a list of competitors
    • Direct
    • Indirect
    • Industry influencers
    • Those ranking for industry money keywords
    • Watch fluctuations – who's winning and who's losing
  2. Take those competitors and run their sites through one of the backlink tools previously mentioned (OSE, Majestic, Ahrefs, CognitiveSEO, etc.)
  3. Analyze their backlinks
  4. Download the top 3-4 competitors' backlinks into CSVs. Combine into a single Excel sheet, removing duplicates, and find obtainable quality links already secured by competitors.
Steps 2 and 3 were previously covered in "Link Building 101: How to Conduct a Backlink Analysis", and step 1 is pretty self-explanatory.
To recap the advice for these steps:
  • Don't phone-in the list of competitors. Spend time doing research and investigation, giving yourself a well thought out and understood list of potential competitors.
  • Information you should be examining in a backlink analysis:
    • Total number of links
    • Number of unique linking domains
    • Anchor Text usage and variance
    • Fresh/incoming links
    • Recently lost links
    • Page Performance (via top pages)
    • Link quality (via manual examination)
  • Additionally, think creatively while looking through competitors' backlinks. Think about:
    • Which resources/pages performed well
    • Which resources/pages performed poorly
    • Commonalities in competitors' link profiles
    • Differences in competitors' link profiles
    • Strategies likely used to acquire links

How to Find Obtainable Quality Links

So, that takes us to Step 4: downloading competitors links into CSVs, combining in Excel, and drilling down into the data to find worthwhile links and insights.
Honestly, SEER has done an amazing job of writing a very easy to follow guide for Competitor Backlink Analysis in Excel.
To summarize their steps, you:
  • Download CSVs of competitors' backlink portfolios ('Inbound Links' will give you a list of all the linking pages; 'Linking Domains' will give you only the domains).
    • Note: if you're unfamiliar with your own (or client's) backlink portfolio, you may wish to include their backlink portfolio in this process for reference.
    • Using OSE, don't forget to set the filter to 'pages on this root domain' before exporting to CSV.
  • Open the CSVs and combine (copy and paste) all the data into a single Excel sheet.
  • Filter down to clean URLs, keeping the originals intact.
    • Move Column J (target URL) to Column P (to be the last column)
    • Delete Column J (the now empty column)
    • Duplicate the URL and Target URL columns on either side
    • Remove http:// and www. from both column A and column P: select the column, press Ctrl+H (the find-and-replace shortcut), type in what you want to find (http://, then www. in a second pass), and replace it with nothing (by leaving the 'Replace with' field blank).
    • You might want to rename columns A and P at this point - call them bare URL and bare target URL, or whatever you desire (in the SEER article they were called 'clean').
  • Remove duplicates
    • Make sure it's only for columns A (bare URL) and P (bare target URL)
Notice the check mark on "My data has headers". This is important to keep your data from being jumbled up. Anytime you're removing duplicates make sure this box is checked.
This will give you a complete list of stripped URLs next to the full URL linking (along with the rest of the important information provided by OSE) and a list of full target URLs next to a complete list of stripped target URLs.
Note: you'll still likely have a lot of duplicate URLs in column A (the linking URLs) at this point. This is because there are multiple links on the same page going to different landing pages, which is potentially important information (it shows a competitor acquired multiple links from the same page).
If you'd like to delete these multiple-link pages/URLs to reduce data noise, highlight column A and run 'Remove Duplicates' again, making sure the 'My data has headers' box is checked.
Now, you'll be down to unique URLs (pages, not domains if you've used Inbound Links) linking to competitors. If you're looking for only referring domains, you should start back at step 1 and download a CSV of referring domains, as opposed to all links.
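If you're more comfortable in code than in Excel, the same combine/clean/dedupe pass can be scripted. Below is a minimal pandas sketch; the column names ("URL", "Target URL") and the backlinks/ folder are assumptions based on an OSE-style export, so adjust them to whatever your tool produces.

```python
# A minimal sketch of the combine, clean, and dedupe workflow described above.
# Assumes OSE-style CSV exports in a backlinks/ folder with "URL" and
# "Target URL" columns; adjust the names to match your own exports.
import glob

import pandas as pd

def bare(urls):
    """Strip http(s):// and www. prefixes, mirroring the Ctrl+H step in Excel."""
    return (urls.str.replace(r"^https?://", "", regex=True)
                .str.replace(r"^www\.", "", regex=True))

# Combine every competitor export into a single sheet.
frames = [pd.read_csv(path) for path in glob.glob("backlinks/*.csv")]
links = pd.concat(frames, ignore_index=True)

# Add "bare" columns while keeping the original URLs intact.
links["bare URL"] = bare(links["URL"])
links["bare target URL"] = bare(links["Target URL"])

# Remove duplicate linking page / target page pairs.
links = links.drop_duplicates(subset=["bare URL", "bare target URL"])

# Optionally collapse to one row per linking page, as described above.
links.drop_duplicates(subset=["bare URL"]).to_csv(
    "combined_competitor_links.csv", index=False)
```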
At this point, you're still dealing with a lot of data, so you'll want to narrow it further. I recommend sorting by Domain Authority to see the most authoritative links first.
This will make your list ordered from highest domain authority to lowest – pretty useful information. Keep in mind however that the domain authority is thrown off by any subdomains hosted on a popular site – example.wordpress.com, example.blogspot.com, etc.
So, don't take the domain authority as absolute – you'll need to verify.
There's also a few other filters you can use to find interesting data:
  • Page Authority (PA)
  • Anchor Text
  • Number of domains linking (shows best ranking pages - don't get stuck on home pages)
Take time and play around with the data. Look through the top DA's (manually excluding anything artificially inflated), then PA's, check out top performing pages via number of domains linking, and even play around with filtering the anchor text.
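If you built the combined sheet with the script above, the same sorting and filtering can be done there too. This is only a sketch: the "Domain Authority" and "Anchor Text" columns are assumed from an OSE-style export, and the list of hosted platforms is just an example.

```python
# A rough sketch of the sorting/filtering pass on the combined sheet.
import pandas as pd

links = pd.read_csv("combined_competitor_links.csv")

# Flag subdomains hosted on popular platforms, whose Domain Authority is
# inflated by the parent domain (example.wordpress.com, example.blogspot.com).
hosted = links["bare URL"].str.contains(
    r"\.(?:wordpress|blogspot|tumblr)\.com", regex=True, na=False)

# Most authoritative linking pages first, hosted platforms excluded.
by_da = links[~hosted].sort_values("Domain Authority", ascending=False)
print(by_da[["bare URL", "Domain Authority", "Anchor Text"]].head(25))

# Top-performing competitor pages by number of unique linking domains.
linking_domain = links["bare URL"].str.split("/").str[0]
top_pages = (links.assign(linking_domain=linking_domain)
                  .groupby("bare target URL")["linking_domain"]
                  .nunique()
                  .sort_values(ascending=False))
print(top_pages.head(10))
```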
This should be the fun part: the analysis. You've filtered the data down to a semi-digestible level, and you can start taking advantage of it to find insights and understand your competitors' links.
Remember, any link your competitor has should be considered fair game for you. Once you've identified quality links from domains you haven't secured, look into each link and pursue it appropriately.

More Insights

If you're looking for even deeper (and more advanced) data insights, you can move all this information into a pivot table. Simply select all rows, click over to the Insert tab, and select 'Pivot Table'.
Once there, you have the option to choose which fields you'd like to examine further.
Playing with this data should reveal potential insights, although we're getting a bit beyond Link Building 101.
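For those who prefer to stay in code, pandas offers an equivalent pivot-table view. As before, the column names are assumptions based on the combined export built earlier.

```python
# A pivot-table sketch: one row per competitor landing page, with the number
# of linking pages, the variety of anchor text, and the average Domain
# Authority of the links pointing at it.
import pandas as pd

links = pd.read_csv("combined_competitor_links.csv")

pivot = pd.pivot_table(
    links,
    index="bare target URL",
    values=["bare URL", "Anchor Text", "Domain Authority"],
    aggfunc={"bare URL": "count",
             "Anchor Text": pd.Series.nunique,
             "Domain Authority": "mean"},
)
print(pivot.sort_values("bare URL", ascending=False).head(20))
```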
Furthermore, if you want to really dive into pivot tables (or Excel in general), I can't recommend Annie Cushing enough. Check out her Moz article "How to Carve Out Marketing Strategies by Mining Your Competitors' Backlinks".

After '(Not Provided)' & Hummingbird, Where is Google Taking Us Next?

We've come a long way in a little over two decades of search. Archie, Veronica, Jughead, Excite, Wanderer, Aliweb, Altavista, WebCrawler, Yahoo, Lycos, LookSmart, Google, HotBot, Ask, dmoz, AllTheWeb, Goto (Overture), Snap, LiveSearch, Cuil, Bing, Blekko, DuckDuckGo, Yandex, Baidu... and too many other also-rans to name.
The earliest were simply a collection of resources, initially just in alphabetical order, then some introducing an internal search capability. Eventually, some began to crawl the web, while others contented themselves with using the indexes of others.
Among them all, Google now stands out as the giant. About two-thirds of all global searches happen on Google. That means those of us who want our sites to be found in Google's search results need to color inside the (webmaster guide)lines while trying to figure out what Google wants to see today and, hopefully, tomorrow.

Search Today

Figuring out what Google prefers to rank isn't really that complex. Pay attention, use some common sense, don't look for silver bullets, and provide quality and value. Get that down pat and you're in pretty good shape.
Most folks who find themselves crosswise of Google got there because they (or someone they hired) tried to take a shortcut. Do shortcuts still work? You bet! Do they still last? Not so much!
Google has gotten a lot better at detecting and handling manipulative tactics. No, they're not perfect, far from it. But the improvement is undeniable, and a couple of recent developments offer hope.
What happened?
Google unleashed a one-two punch recently, with two important changes that stirred up a lot of chatter in SEO and marketing communities. And I'm not convinced they're unrelated. They just mesh too well to be coincidence (not to be confused with correlation, my friends).

1. '(Not Provided)'

The recent extension of "(not provided)" to 100 percent of organic Google keywords in Google Analytics got a lot of people up in arms. It was called "sudden", even though it ramped up over a period of two years. I guess "it suddenly dawned on me" would be more accurate.
As my bud, Thom Craver, stated perfectly, if you're one of those who is saying that no keywords means SEO is dead or you can't do your job, then you shouldn't be doing SEO to begin with.
That sums it up pretty well. There are still ways to know what brought users to your pages. It's just not handed to you on a silver platter any more. You'll have to actually work for it.

2. Hummingbird

Now let's look at the other half of that double-tap: Hummingbird. Since Google's announcement of the new search algorithm, there have been a lot of statements that fall on the inaccurate end of the scale. One common theme seems to be referring to it as the biggest algo update since Caffeine.
Wrong on both counts, folks! First, Caffeine is the software infrastructure that manages crawling and indexing, not search ranking. As such, it's not an algorithm. It was also new, not updated, but we'll let that slide.
That second point, however, applies strongly to Hummingbird. There is no such thing as a Hummingbird update. It's a brand new search algorithm.
Jeez-Louise. If you're going to speak out, at least try not to misinform, OK?

Why Might they be Related?

Now understand, there's a bit of conjecture from here on out. I can't point to any evidence that supports this theory, but I think many of you will agree it makes some sense.
Killing the easy availability of keywords makes sense to me. People have focused on keywords to a degree that approaches (and often passes) ridiculous. Google has finally, however, achieved a sufficient level of semantic ability to allow them to ascertain, with a reasonable amount of accuracy, what a page is about, without having exact keywords to match to a query.
Methinks it's a good idea for the folks who are generating content to try the same.
So... we can no longer see the exact keywords that visitors used to find us in organic search. And we no longer need to use exact keywords to be able to rank in organic search.
Yeah, I know, pure correlation. But still, a pattern, no?
My theory is that there's no coincidence there. In fact, I think it runs deeper.
Think about it. If you're no longer targeting the keywords, you can actually *gasp* target the user. Radical concept for folks who are still stuck in a 2005 rut.
Bottom line: You need to start building your content with concept and context in mind. That'll result in better content, more directed to your visitors – then you can stop worrying about whether Google has a clue about the topic your page is focused on.
Just communicate. If you do it right, it'll come through, for both. Just think things, not strings.

Where is Search Heading Next?

Here's where I think the Knowledge Graph plays a major role. I've said many times that I thought Google+ was never intended to be a social media platform; it was intended to be an information harvester. I think that the data harvested was intended to help build out the Knowledge Graph, but that it goes still deeper.
Left to its own devices, Google could eventually build out the Knowledge Graph. But it would take time, and it would undoubtedly involve a lot of mistakes, as they dialed their algos in.
With easily verified data via Google+, Google has a database against which they can test their algos' independent findings. That would speed the development process tremendously, probably shaving two or three years off the process.
But my theory doesn't end there. Although I suspect it wasn't a primary motivation, the removal of keywords, coupled with the improved semantic ability of Hummingbird, puts a whole new level of pressure on people to implement structured data. As adoption cranks up, the Knowledge Graph will be built out even faster.
As I said, I doubt that motivating people to implement structured data markup was a primary focus of the recent changes. But I'll bet it was a major benefit that didn't go unnoticed at the 'Plex.
The last week has definitely brought some changes to the way we'll be handling our online marketing and SEO efforts. The Internet continues to evolve. Those who don't follow suit may soon be extinct.
For my part, I'm pleased to see the direction that Google seems to be moving in. It's a win-win.

5 Things We've Learned From Google's New War on Links

It's been 18 months now since Google's Penguin update launched and a similar amount of time since the first manual penalty messages were sent to unsuspecting webmasters.
That's a long time in the world of digital marketing. While most industries deal with a level of change, the rate of iteration across the web is unprecedented.
Such a level of change requires an agile approach to processes. Google practices a Kaizen approach to product development and penalties, so it's imperative that we consistently reexamine how and why we do everything.
The same rule applies to how penalties are dealt with. It's a given that the tolerances Google allows across metrics have changed since those penalties were first introduced. Industry opinions would certainly support that theory.
Strangely, for a content-led company, the digital marketing agency I run is now very experienced in penalty recovery, as a result of new clients coming to us looking for a different way to market their companies.
It means, in short, that I have lots of data to draw conclusions from. I want to share findings from recent real-world work, including a few key tips on areas you may be missing while cleanup is going on. Here are some top takeaways.

1. Link Classification

While Google has long been giving out examples of links that violate their guidelines, in recent weeks things have changed.
Until recently it was so easy to call out a "bad" link that you could spot them with your eyes closed. The classification was so easy it spawned a proliferation of "link classifier" tools. And while they prove useful as a general overview and for doing things at scale, the pace of Google's iteration has made manual classification an absolute must.
So what has changed?
We've always known that anchor text overuse is a key metric. Here are the results of a charting study we ran across those clients escaping either manual or algorithmic penalties:
[Chart: percent of suspect links in link profiles post-recovery]
It isn't perfect, but the data shows an irrefutable trend toward a less tolerant stance on "spam" by Google.
I don't want this to be seen as a definitive result or scientific study, because it isn't. It is simply some in-house data we have collated over time that gives a general picture of what's going on. Recovery, in this instance, is classed as either a manual revoke or a "significant" improvement in rankings and traffic sustained over more than a month.

2. The Link Types Being Classified as 'Unnatural' Are Changing

The view that things are indeed changing has been supported by example links coming through from Google in the past four weeks as part of its manual review communication.
Instead of the usual predictable irrelevant web directory or blog network, the search giant seems to be getting much more picky.
And while I can't share exact links due to client confidentiality, here are a couple of examples of specific link types that have been specifically highlighted as "unnatural":
  • A relevant forum post from a site with good TrustFlow (Majestic's measure of general domain "trust").
  • A Domain Authority (DA) 27 blog with relevant and well-written content (DA is a Moz.com metric measured out of 100).
Ordinarily these links would pass most classification tests, so it was surprising to see them listed as unnatural. Clearly we can't rule out mistakes by whoever reviewed the site in question, but let's assume for a moment this is correct.
In the case of the forum post, it had been added by a user with several posts, and the text used was relevant and part of the conversation. It looked natural.
The blog post was the same: natural by almost every metric.
The only factor that could have been put into question was the use of anchor text. It was an exact match phrase for a head term this site had been attempting to rank for in the past. That might be an obvious signal and is one of the first places to look for unnatural links, but it gives an interesting nod to where Google may be taking this.

3. Co-Citation and the End of Commercial Anchors?

A lot has been written about the changing face of anchor text use and the rise of co-citation and co-occurrence. In fact, I penned a piece a few months ago on the future of link building without links. It seems as though Google now wants to accelerate this by putting more pressure on those still using exact match tactics.
It is certainly my view now that links are playing a less significant role in general rankings. Yes, a site has to have a good core of links, but Google's algorithms are now much more complex. That means Google is looking at more and more metrics to define the search visibility of a domain, which leaves less room for "links" as a contributory factor.
Given that semantic search isn't reliant on links either, and that Google has made clear its intention to move toward this future, it's clear that brand mentions, social sharing, and great content that is produced regularly and on point are becoming more critical.
Links are by no means dead. Anyone who says that is crazy. But there is certainly more contributing to visibility now.

4. Check Your Page-Level Anchor Text

Penguin 2.0 has also changed the way we look at penalties in general. While it used to be enough to simply take a domain-wide view of link metrics such as quality, anchor text, and relevance, that's no longer the case.
The search giant has become much more targeted in its application of penalties, certainly since Penguin 2.0. As a result, we're now seeing partial penalties being reported in Webmaster Tools, as well as full manual actions and a plethora of other actions.
This means one thing: Google understands its data better than ever and is looking at the quality of links in a much deeper way, not just as those pointing directly to your site but even where those sites are getting their link juice from.

5. Look Out for Different Pages Ranking

One sure-fire sign of individual page over-optimization or penalization is when Google struggles to rank what you would consider the "right" page for a term. This is often because Google is ignoring the "right" page and instead surfacing other pages on your site.
If you see different pages ranking for a specific term over the course of a few weeks, it's worth checking the anchor text and the links pointing specifically to that page.
Often you may find just one or two links pointing to it, but if 50 percent or more of them use exact match anchors, that now seems to be enough to create issues.
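A quick way to run that page-level check across a whole site is to script it against a backlink export. The sketch below is an illustration only: the column names ("Target URL", "Anchor Text") and the head terms are hypothetical, so substitute your own export and keywords.

```python
# For each landing page, what share of inbound links use exact-match anchors?
import pandas as pd

EXACT_MATCH_TERMS = {"blue widgets", "cheap blue widgets"}  # hypothetical head terms

links = pd.read_csv("your_site_backlinks.csv")
links["is_exact"] = (links["Anchor Text"].str.strip().str.lower()
                                         .isin(EXACT_MATCH_TERMS))

per_page = links.groupby("Target URL")["is_exact"].agg(total="count", exact="sum")
per_page["exact_pct"] = 100 * per_page["exact"] / per_page["total"]

# Flag pages where half or more of the anchors are exact match.
flagged = per_page[per_page["exact_pct"] >= 50]
print(flagged.sort_values("exact_pct", ascending=False))
```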

What Now?

The key is to be informed. Invest in multiple data sources (your Webmaster Tools link download plus one or more third-party backlink tools) to ensure you have the full picture.
That combination allows you to take a full-picture view of every link pointing to your site and gives you a second opinion should you feel you need one. Removing links is a significant step, so it pays to have more than one data set backing up initial findings on things such as anchor text use, link quality, and trust.
Alongside that, it's worth running a check of every linked-to page on your site so you can review anchor text ratios for each one. That way you can reduce the impact of partial actions.
The key is to reduce the use of exact match anchors as much as humanly possible as tolerated percentages are only going one way!
Above all, it may be time to start thinking beyond links entirely and onto a world of "brand as publisher," creating great content from a clearly defined content strategy, and then supporting it with an informed distribution strategy. But that's a story for another day.

How to Build Links Using Expired Domains



Image Credit: Travis Isaacs/Flickr
Many people have had great success snapping up expired domains and using those sites for link building purposes. One of the main reasons for this was that it saved work, as you could grab a site that already had content and backlinks and at least a baseline established presence.
However, after the past year of Google changes that have made link building trickier than ever, this process is no longer as easy or as safe as it once was. It can still be valuable, but only if you think about what you're doing: don't just buy every domain that has your desired keyword in it, hastily 301 redirect it to your own site or trash the content with links to your main site, and expect miracles.
Affiliate marketers are also fond of using expired domains for their work, so while we won't go into detail on that, we will cover some topics relevant to that specific use.

How to Find Dropped/Expired/Expiring Domains?

Domain Tools is one of the main places I check, but there are many sites that list expired or about-to-expire domains that are up for grabs. Network Solutions has custom email alerts: you can put in a keyword and get an email when matching domains are expiring, which is a nice option for those of you who prefer a more passive approach.
Snap Names is also good, as is Drop Day. You may find that there are certain sites that are best for your purposes (whether it's keeping an eye on ones you want or getting ones that just expired) so look around and figure out what best suits you.
Want a domain that's at least 9 years old and has a listing in DMOZ? Domain Tools is where I'd go for that.
Of course if you come across a domain that you like and it's not set to expire any time soon, there's nothing wrong with emailing the owner and asking to buy it.

How to Vet Expired Domains

  • Check to see what domains 301 redirect to them. I use Link Research Tools for this, as you can run a backlink report on the domain in question and see the redirects. If you find a domain that has 50 spammy 301s pointing to it, it may be more trouble than it's worth. Preventing a 301 from coming through when you don't control the redirecting site is almost impossible. You can block it at the server level, but that won't stop your site from receiving bad link karma from Google. In that case, you may have to disavow those domains.
  • Check their backlinks using your link tool of choice. Is the profile full of nothing but spam that will take ages to clean up or will you have to spend time disavowing the links? If so, do you really want to bother with it? If you want to buy the domain to use for a 301 redirect and it's full of spammy links, at least wait until you've cleared that all up before you 301 it.
  • Check to see if they were ever anything questionable using the Wayback Machine (a quick way to script this check is sketched after this list). If the site simply wasn't well done 2 years ago, that's not nearly as big of a problem as if you're going to be using the site to educate people about the dangers of lead and it used to be a site that sold Viagra.
  • Check to see if the brand has a bad reputation. Do some digging upfront so you can save time disassociating yourself from something bad later. You know how sometimes you get a resume from a person and you ask an employee if they know this Susan who also used to work at the same place that your current employee worked years ago and your employee says "oh yes I remember her. She tried to burn the building down once"? Well, Susan might try to burn your building down, too.
  • Check to see if they were part of a link network. See what other sites were owned by the same person and check them out too.
  • Check to see if they have an existing audience. Is there an attached forum with active members, are there people generally commenting on posts and socializing them, etc.?
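For the Wayback Machine check in particular, the history review is easy to script. The sketch below uses the Internet Archive's public availability endpoint to pull the closest snapshot to a few past dates; the domain shown is hypothetical.

```python
# Pull the closest archived snapshots for a domain around a few past dates,
# so you can eyeball what it used to be before you buy it.
import json
import urllib.request

def wayback_snapshot(domain, timestamp):
    """Return the archived snapshot URL closest to a YYYYMMDD timestamp, if any."""
    api = ("https://archive.org/wayback/available?url={}&timestamp={}"
           .format(domain, timestamp))
    with urllib.request.urlopen(api, timeout=10) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

domain = "example-expired-domain.com"  # hypothetical domain you're vetting
for ts in ("20090101", "20110101", "20130101"):
    print(ts, wayback_snapshot(domain, ts) or "no snapshot")
```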

How Should You Use Expired Domains?

Many people 301 redirect these domains to their main sites or secondary sites in order to give them a boost. Others turn them into part of their legitimate online arsenal and use them as a proper standalone resource.
Some people add them to their existing blog network and interlink them. Some people keep them and use them to sell links. Some people keep them and try to resell them. Some people use them to try their hand at affiliate marketing.
However, that's how people do use them, not necessarily how they should use them. How you should use them is up to you.
I once worked with an account where we used tons of microsites. They were standalone sites that each linked to the main brand site and we built links to them. It worked for a while (and still works for many people according to what I see in forums) but as far as I can tell, most of those microsites are no longer in Google's index or no longer contain live links to the brand site. That's because in that case, it stopped working and became more of a danger than anything else. They served no purpose at all other than to host a link to the brand site, and since they gained no authority, it just wasn't worth the trouble of keeping them up.
I've also dealt with someone who successfully bought expired domains and redirected them to subdomains on his main site in order to split it up into a few niche subdomains. He didn't overdo it, and each expired domain had a good history with content relevant to what the subdomain was, so it all worked very well.
As mentioned early on, affiliate marketers also use expired domains. One big benefit of this is that if you plan to just use PPC for affiliate marketing, you don't have to be as concerned about the backlink profile of the domain as you might not care that much about its organic rankings.

Some Good Signs of Expired Domains

Some of these probably depend upon the purpose you have in mind, but here are a few things I like to see in an expired or expiring domain. Keep in mind that these aren't definitive features of a quality domain; they're simply signs that the domain might be a good one to use:
  • Authority links that will pass through some link benefits via a 301 redirect (if I'm going that route.)
  • An existing audience of people who regularly contribute, comment, and socialize the site's content (if I'm going to use it as a standalone site.) If I'm looking to buy a forum, for example, I'd want to make sure that there are contributing members with something to offer already there. If I want a site that I will be maintaining and adding to and plan to build it out further, seeing that there's an audience of people reading the content, commenting on it, and socializing it would make me very happy.
  • A decent (and legitimate) Toolbar PageRank (TBPR) that is in line with where I think it should be. If I see a site that is 7 months old and has a TBPR of 6, I'll obviously be suspicious, and if I found one that was 9 years old with a TBPR of 1, I would hesitate before using it. I also have to admit that while I don't rely on TBPR as a defining metric of quality, I'd be crazy to pretend it means nothing, so it's definitely something I look at.
  • A domain age of at least 2 years if I was going to do anything other than hold it and try to resell it.
  • Internal pages that have TBPR. If there are 5000 pages and only the homepage has any TBPR, I'd be a bit suspicious about why no internal pages had anything.

A Few Red Flags of Expired Domains

  • Suspicious TBPR as mentioned above.
  • The domain isn't indexed in Google. Even if you look at a recently expired site and see it has a TBPR of 4 with good Majestic flow metrics, is 5 years old, and was updated in some way until it expired (whether through new blog posts, comments, social shares, etc.), it's safe to assume it's not indexed for a good reason and you probably want to stay away from it.
  • Backlink profile is full of nothing but spam.
  • All comments on the site's posts are spammy ones and trackbacks.

Bottom Line: Is Using Expired Domains a Good Idea?

As with almost anything in SEO right now, some tactics aren't really great ideas for the long-term but since they work for the short-term, people still use them. Some tactics that won't work in one niche will still work well in certain other niches and some sites seem to be able to weather just about any algorithmic change in Google.
That's why it's hard to say that you shouldn't do this, or you should do that, because every case is different, every webmaster/site owner has a different idea about risk, and a lot of people have made a lot of money off doing things that I personally wouldn't do.
I don't have time to keep up the blogging on my own site so I would never expect that I could keep it up on five sites, each devoted to a specific area of my industry, but with the right manpower and the right people, this can be a successful strategy for many.
If you plan to use them for affiliate marketing and you're going to use PPC for that, you don't have to worry about some of the things that you would have to be concerned with if you planned to rank well.
In the end, it depends on what you want to do, how much time and effort you have to put into doing well, and how much risk you can handle, just like everything else.

Monday, September 2, 2013

Google Panda Update Coming Within Days; 'Next Generation' of Penguin in Works

You can expect another Google Panda update to roll out this Friday or Monday, according to Google’s Distinguished Engineer Matt Cutts.
Also, Cutts has revealed that Google is working on a significant change to the Penguin algorithm.

Google Panda Update Coming Within Days

Panda is Google’s algorithm aimed at surfacing high-quality sites higher in search results. It was first released in February 2011.
The next Panda update (or refresh) is due to arrive either Friday (March 15) or Monday (March 18), Cutts said, according to reports coming out of the SMX West conference. Google's last Panda refresh (and the only one so far in 2013) was January 22 and affected 1.2 percent of English queries.
Keep an eye on your analytics over the next few days. If you see an unexplainable swing in traffic, it could indicate that Panda has mauled (or rewarded) your website.

Google Penguin: The Next Generation

It isn't known when the next Penguin update will arrive, but Cutts revealed Google is working on a "new generation of Penguin." The Penguin algorithm, initially released last April, was designed to reduce web spam and also hit websites with link profiles that appeared unnatural. The most recent Penguin refresh was in October.
Also, Cutts said the update will be significant and one of the most talked about Google algorithm updates this year. Which would make that two years running.
Cutts also put out word that Google plans to target more link networks this year, including one or two big ones within the next few weeks.
Could the next generation of Penguin somehow be related to another big change Cutts had already announced Google is working on involving merchant quality? Hard to know at this point, but what’s clear is the next few months are likely to get pretty bumpy for many websites and merchants on Google.

Algorithm Updates, Duplicate Content & A Recovery

Like many in search engine optimization, I watch the major algorithmic updates or “boulders” roll down the hill from the Googleplex and see which of us will be knocked down. For those of us who get squashed, we stand back up, dust ourselves off, and try to assess why we were the unlucky few that got rolled. We wait for vague responses from Google or findings from the SEO community.
Panda taught us that "quality content" was the focus, and that even if you were in the clear, sites linking to you may have been devalued, thus affecting your overall authority. My overall perception of the Penguin update was that it was designed primarily to attack unnatural link practices and web spamming techniques, along with a host of less-discussed topics such as AdSense usage and internal linking cues.
Duplicate content was mentioned as part of meeting Google's quality guidelines, but my overall observation was that few considered it a major factor in the update.

The Head Scratching Period

After the Penguin update of late April 2012 hit, I noticed that one of my clients began to slowly lose rankings and traffic. At first, it didn't seem to be an overnight slam by Google, but more of a slow decrease in referrals.
I soon worked through my post-Penguin checklist and found that none of the major Penguin factors should have been negatively affecting the client site. That ultimately left me looking at content quality.
I reviewed the content of the affected site sections. It looked fine, was informational, wasn't keyword stuffed, and met Google's Quality Guidelines. Or so I thought.

The Research

Time progressed, and other recommendations were implemented on unaffected site areas while I tried to determine why rankings and referrals continued to fall in the aforementioned informational sections.
I quizzed the site owner as to who developed the content originally. He stated that he had put it together himself over the last few years, using content found on other sites.
First, I took a look at several years of organic data and noticed that the site was hit very hard at the Panda rollout. Shaking my head, but glad we had found the issue, we dug into the site to pinpoint how much duplication there was.
Using tools such as the Webconfs similar-copy checker and Copyscape, we found several pages with everything from a large percentage of cross-domain scraped duplication down to exact content snippets lifted from copy that originated on other sites.

The Resolution

A content writing resource worked quickly to rewrite unique copy for these pages to reduce the percentage of duplication. All of the affected pages were then released in their new unique state.
I had assumed that recovery might be a slow progression, since the penalty in this case had come about slowly. Surprisingly, our pre-Penguin rankings and traffic reappeared within a day.
[Screenshot: Google Analytics showing rankings and traffic coming back]

Questions

The rankings and traffic came back and are still there. After celebrating, it was time for an after-action review, which leads to many questions, including:
  • Are duplicate content, scraping, and everything else included in Google's Quality Guidelines more of a factor in the Penguin update than the SEO community considered?
  • If you get your pre-algorithmic update rankings back in a day, why weren’t they all lost in a day?
  • I also understand that there are multiple algorithmic updates on a daily basis, but it is interesting that the ranking and traffic decline happened right at Penguin. There have been other algorithmic changes and refreshes between then and now, but I'm used to refreshes loosening the leash on an algorithmic update, where you usually see a rankings improvement. Why did I continue to see a slow negative trend?
Ultimately, I think the above story shows that it is quite important to know where your SEO client’s or company’s content originated if it precedes your involvement with site SEO efforts.
A recent post by Danny Goodwin, "Google's Rating Guidelines Add Page Quality to Human Reviews," rang in my head for a while, as it reinforces that we need to be even more mindful of our site content. That means ensuring it is unique first and foremost, but also engaging, constantly refreshed, and meaningful, if we expect SEO improvement.
Unknown scraping efforts are, in my opinion, more dangerous than incidental on-site content duplication via dynamic parameter usage, session ID usage, or on-site copy spinning (e.g., copy variations on location pages). All of these practices, knowing or unknowing, fall into the realm of content quality. Showing devotion to your site content will allow you to provide the fresh, unique copy that the post-Caffeine Googlebot will enjoy crawling.

Insights From the Recent Penguin & Panda Updates

Google recently rolled out three major algorithmic updates that have left many websites reeling. In between two Google Panda refreshes (on April 17 and 27) was the April 24 launch of the Penguin Update.
The Panda update is more of a content-related update, targeting sites with duplicate content and spammers who scrape content. The first Panda update was over a year ago, and Google has been releasing periodic updates ever since.
The Penguin algorithm appears to be targeting many different factors, including low quality links. The purpose, per Google, was to catch excessive spammers, but it seems some legitimate sites and SEOs have been caught up in this latest algo change.

What Exactly Happened?

An analysis of six sites that have been affected in a big way by Google Penguin offers some helpful insights. The Penguin algo seems to be looking at three major factors:
  • If the majority of a website's backlinks are low quality or spammy looking (e.g., sponsored links, links in footers, links from directories, links from link exchange pages, links from low quality blog networks).
  • If the majority of a website's backlinks are from unrelated websites.
  • If too many links are pointing back to a website with exact match keywords in the anchor text.
Some of the affected sites had only directory-type and link-exchange-type backlinks. Others had a variety of different link types, including link buys.
Google must be looking at the overall percentage of low quality links as a factor. Penguin doesn't seem to have affected sites with a better mix of natural-looking links and low-quality links.
A few other websites lost search rankings on Google for specific keywords during the Panda and Penguin rollouts. It appears anchor text was to blame in these cases, as the links pointing to these sites concentrated on only one or a few keywords.
What’s it all mean? The impact of Penguin will vary depending on how heavily a site’s link profile is skewed in the direction of the above three factors. Some sites may have lost rankings for everything while some sites may have lost rankings on only specific keywords.
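If you want to put rough numbers on those three factors for your own site, a simple script over a hand-labelled backlink export does the job. Everything here is an assumption for illustration: the CSV name, the "link_type", "related", and "anchor_text" columns, and the brand terms.

```python
# Back-of-the-envelope percentages for the three Penguin factors above,
# computed from a CSV you've already hand-labelled link by link.
import csv
from collections import Counter

BRAND_TERMS = {"acme", "acme.com"}  # hypothetical brand/URL anchors
LOW_QUALITY = {"directory", "sponsored", "footer", "link exchange", "blog network"}

with open("labelled_backlinks.csv", newline="") as f:
    rows = list(csv.DictReader(f))
total = len(rows)

low_quality = sum(r["link_type"] in LOW_QUALITY for r in rows)
unrelated = sum(r["related"].strip().lower() == "no" for r in rows)
keyword_anchors = sum(not any(b in r["anchor_text"].lower() for b in BRAND_TERMS)
                      for r in rows)

print(f"low quality / spammy-looking links: {100 * low_quality / total:.0f}%")
print(f"links from unrelated sites:         {100 * unrelated / total:.0f}%")
print(f"keyword (non-brand) anchors:        {100 * keyword_anchors / total:.0f}%")
print(Counter(r["anchor_text"].lower() for r in rows).most_common(10))
```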

Specific Details About a Few Sites Affected by Penguin

We used some backlink analyzers to look at the below factors to try and figure out what may have caused the drops:
  • Presence of footer links
  • Links from unrelated sites
  • Consecutive sponsored links, with no text descriptions in between the different links
  • Site-wide links 
  • Exact match keyword links making up the majority of the anchor text
  • Whether the specific keywords that dropped in rankings appeared in the anchor text of more than 10 percent of links
We also looked at site SEO and duplicate content as a factor.
Two sites had done little link building other than manual directory submissions and link exchanges. Those sites had the following problems:
  • The majority of links were unrelated, due to the high number of directory-type links. Unrelated links were as high as 90 percent. By unrelated, I mean the sites linking to the impacted sites didn't have similar/related content or were too general.
  • More than 50 percent of links targeted keywords rather than the brand name or non-keyword phrases.
Four sites had a variety of different types of links such as directories, link exchange, articles published on different blogs, sponsored links, and social media links. Those sites exhibited these problems:
  • Between 50 and 70 percent unrelated links 
  • More than 50 percent of links targeting keywords vs. brand name or non-keywords 
  • Those that had sponsored links had some consecutive sponsored links (i.e., a bunch of links with no text descriptions in between) 
  • Those that had sponsored links had some footer links (i.e., the links coming from external sites to them were placed towards the bottom of the page; it could also be on the right panel, but if you view the source code, the links would be in the bottom 5 percent of the text content)
In addition, two sites out of the last four had duplicate content issues.
One affected site had too many doorway pages with city/state pages. Google specifically mentions that doorway pages, which are only built to attract search engine traffic, are against their webmaster guidelines. Regardless, many people still use this technique.
It seems these doorway pages may have affected this specific site’s ranking. From what we can tell the doorway page penalization was due to Panda, as the site started losing rankings on April 17. However, they lost further rankings on April 24, so the Penguin update also hit them.
A different site had some duplicate content issues caused by affiliates who copied its content. It's still unclear if this had an effect on the drop.
Another site owner was selling links in the footer area of his website. The links were relevant to the subject of the website. Two sponsored links were located on the main page. Some internal pages also had sponsored links, but no more than three on any given page. This also may have been an issue.
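The footer-link test described above (links sitting in the bottom 5 percent of the source) can also be roughed out in code. This is a crude heuristic, position in the markup is only a proxy for rendered position, and the URLs are hypothetical.

```python
# Check whether the first link to your domain sits in the bottom 5 percent
# of the linking page's HTML source.
import urllib.request

def is_footer_link(linking_page, your_domain, threshold=0.95):
    with urllib.request.urlopen(linking_page, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    pos = html.lower().find(your_domain.lower())
    if pos == -1:
        return None  # link not found in the source at all
    return pos / len(html) >= threshold

print(is_footer_link("http://example-linking-site.com/resources.html", "yoursite.com"))
```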
The majority of your links shouldn’t be from directories, as two sites learned. Many sites unaffected by Google Penguin also had directory links, but they escaped because they also had relevant and high quality links. The good news: if you do your own relevant link building, then you don’t need to worry about a competitor doing negative SEO to try to get you de-ranked.

What to do if Your Site was Affected by Penguin

Here are four suggestions to start cleaning up your site (a quick sanity-check sketch follows the list):
  • If you have links from too many unrelated sites, such as directories, either remove some or try to get more links from related sites. Related websites should make up at least 20 percent of your overall links.
  • If you have too many keyword links coming in, then vary your keywords and mix your brand name and URL into the links. Have at least 20 percent of your anchors be non-keyword or brand-based.
  • If you are doing sponsored links, be careful! Cancel or remove any links you have in footers. Remove any sponsored links that don't include a text description next to them. Contextual links are much better, meaning it's better to have links from within the text content of a website.
  • Make the above changes a few at a time and wait a few days to see if rankings come back, before proceeding. However, Penguin will only run periodically like Panda, so it could be weeks before any affected websites recover their rankings.
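As promised above, here is a tiny sanity check against those rough thresholds (at least 20 percent related links, at least 20 percent brand or non-keyword anchors). The numbers are the rules of thumb from this article, not anything official from Google.

```python
# Flag which of the suggested clean-up actions apply, given percentages you
# have already computed from your own backlink data.
def penguin_checklist(pct_related, pct_brand_anchors):
    advice = []
    if pct_related < 20:
        advice.append("Too few related links: prune directories or earn links from related sites.")
    if pct_brand_anchors < 20:
        advice.append("Anchors too keyword-heavy: mix in brand name and bare URL anchors.")
    return advice or ["Link profile is within the suggested tolerances."]

# Example with made-up percentages.
for line in penguin_checklist(pct_related=12, pct_brand_anchors=35):
    print(line)
```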
If you do your own SEO, then you probably have an idea of which links are low-quality and what you should do. However, if you are a newbie and don’t know how to analyze your backlinks, try SEOMoz or Majestic SEO. They both offer limited free analysis, but for a more detailed analysis or analyzing more than one website, you would need to get the premium version.
If you use an SEO firm, then you should ask for a detailed link report to see exactly what your SEO firm is doing. Many SEO companies keep their clients in the dark and never send link reports.
You need to make sure the company you are using discloses what they do and that they don't engage in tactics Google may not like. If the company refuses to release this information, they are either hiding something they don't want you to find out about, such as black hat tactics, or they really don't have much to show you. In that case, run and cancel their service ASAP.

After Panda & Penguin, is Google Living Up to Its Great Expectations?

It all starts with Google, doesn’t it? Not really – it’s all about Google today because Google is the most used search engine.
Google, like any other software, evolves and corrects its own bugs and conceptual failures. The goal of the engineers working at Google is to constantly improve its search algorithm, but that’s no easy job.
Google is a great search engine, but Google is still a teenager.
This article was inspired by my high expectations of the Google algorithm, expectations that have taken a beating over the last year as I've watched how Google's search results "evolved." If we look at some of the most competitive terms in Google, we will see a search engine filled with spam and hacked domains ranking in the top 10.
[Screenshot: Google vs. Blekko results for a competitive, spam-prone query]

Why Can Google Still Not Catch Up With the Spammers?

Panda, Penguin, and the EMD update did clear some of the clutter. All of these highly competitive terms have been repeatedly abused for years. I don’t think there was ever a time when these results were clean, in terms of what Google might expect from its own search engine.
Even weirder is that the techniques used to rank this spam are as old as (if not older than) Google itself. And this brings me back to the question in the heading: why can't Google catch up?
The only difference between now and then is how long a spam result will "survive" in the SERPs. That window has decreased from weeks to days, or even hours in some cases.
One of the side effects of Google's various updates is a new business model: ranking high on high revenue-generating keywords for a short amount of time. For those people involved in this practice, it scales very well.

How Google Ranks Sites Today: A Quick Overview

These are two of the main ranking signal categories:
  • On-page factors.
  • Off-page factors.
On-page and off-page have existed since the beginning of the search engine era. Now let’s take a deeper look at the most important factors that Google might use.
Regarding the on-page factors Google will try to understand and rate the following:
  • How often a site is updated. A site that isn't updated often isn't necessarily low quality; this mainly tells Google how often it should crawl the site, and Google will compare the update frequency to that of other sites in the same niche to determine a trend and pattern.
  • Whether the content is unique (duplicate content matches apply a negative score).
  • Whether the content interests users (bounce rate and traffic data mix on-page signals with off-page signals).
  • If the site is linking out to a bad neighborhood.
  • If the site is inter-linked with high-quality sites in the same niche.
  • If the site is over-optimized from an SEO point of view.
  • Various other, smaller on-page factors.
The off-page factors are mainly the links. Social signals are still in their infancy, and there is no study yet that clearly isolates the effect of social signals from the link signal. So far it is all speculation.
As for links, they can easily be classified into two big categories:
  • Natural. Link appeared as a result of:
    • The organic development of a page (meritocracy).
    • A result of a “pure” advertising campaign with no intent of directly changing the SERPs.
  • Unnatural. Link appeared:
    • With the purpose of influencing search engine rankings.
Unfortunately, the unnatural links represent a very large percentage of what the web is today. This is mainly due to Google’s (and the other search engines’) ranking models. The entire web got polluted because of this concept.
When your unnatural link ratio is way higher than your natural (organic) link ratio, it raises a red flag and Google starts watching your site more carefully.
Google tries to fight the unnatural link patterns with various algorithm updates. Some of the most popular updates, that targeted unnatural link patterns and low quality links, are the Penguin and EMD updates.
Google’s major focus today is on improving the way it handles link profiles. This is another difficult task, which is why Google is having a hard time making its way through the various techniques used by SEO pros (black hat or white hat) to influence positively or negatively the natural ranking of a site.

Google's Stunted Growth

Google is like a young teenager stuck on some difficult math problem. Google's learning process apparently involves trying to solve the problem of web spam by applying the same process in a different pattern – why can’t Google just break the pattern and evolve?
Is Google only struggling to maintain an acceptable ranking formula? Will Google evolve or stick with what it’s doing, just in a largely similar format?
Other search engines like Blekko have taken a different route and have tried to crowdsource content curation. While this works well in a variety of niches, the big problem is that this kind of curation isn't very "mainstream": it puts the burden of the algorithm on the shoulders of Blekko's own users. But power users appreciate it, and they make Blekko's results quite good.
In a perfect, non-biased scenario, Google’s ranked results should be:
  • Ranked by non-biased ranking signals.
  • Impossible to be affected by third parties (i.e., negative SEO or positive SEO).
  • Able to tell the difference between bad and good (remember the JCPenney scandal).
  • More diverse and impossible to manipulate.
  • Giving new quality sites a chance to rank near the “giant” old sites.
  • Maintaining transparency.
There is still a long way to go until Google’s technology evolves from the infancy we know today. Will we have to wait until Google is 18 or 21 years old – or even longer – before Google reaches this level of maturity that it dreams of?
Until then, the SEO community is left testing and benchmarking the way Google evolves, and maybe trying to create a book of best practices for search engine optimization.
Google created an entire ecosystem that started backfiring a long time ago. It basically opened the door to all the spam concepts it is fighting today.
Is this illegal or immoral, white or black? Who are we to decide? We are no regulatory entity!

Conclusion

Google is a complicated “piece” of software that is being updated constantly, with each update theoretically bringing new fixes and improvements.
None of us were born smart, but we have learned how to become smart as we’ve grown. We never stop learning.
The same applies to Google. We as human beings are imperfect. How could we create a perfect search engine? Are we able to?
I would love to talk with you more. Share your thoughts or ask questions in the comments below.

4 Steps to Panda-Proof Your Website (Before It’s Too Late!)

It may be a new year, but that hasn’t stopped Google from rolling out yet another Panda refresh.
Last year Google unleashed the most aggressive campaign of major algo updates ever in its crusade to battle rank spam. This year looks to be more of the same.
Since Panda first hit the scene two years ago, thousands of sites have been mauled. SEO forums are littered with site owners who have seen six figure revenue websites and their entire livelihoods evaporate overnight, largely because they didn’t take Panda seriously.
If your site is guilty of transgressions that might provoke the Panda and you haven’t been hit yet, consider yourself lucky. But understand that it’s only a matter of time before you do get mauled. No doubt about it: Panda is coming for you.
Over the past year, we’ve helped a number of site owners recover from Panda. We’ve also worked with existing clients to Panda-proof their websites and (knock on wood) haven’t had a single site fall victim to Panda.
Based on what we've learned saving and securing sites, I've pulled together a list of steps and actions to help site owners Panda-proof websites that may be at risk.

Step 1: Purge Duplicate Content

Duplicate content issues have always plagued websites and SEOs. But with Panda, Google has taken a dramatically different approach to how they view and treat sites with high degrees of duplicate content. Where dupe content issues pre-Panda might hurt a particular piece of content, now duplicate content will sink an entire website.
So with that shift in attitude, site owners need to take duplicate content seriously. You must be hawkish about cleaning up duplicate content issues to Panda-proof your site.
Screaming Frog is a good choice when you want to identify duplicate pages. This article by Ben Goodsell offers a great tutorial on locating duplicate content issues.
There are a number of well-established ways to fix dupe content issues once you've found them, and cleaning up existing duplicate content is critical. But it's just as important to take preventative measures: address the root cause of your duplicate content issues before they end up in the index. Yoast offers some great suggestions on how to avoid duplicate content issues altogether.
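At a small scale, you can also surface exact and near-duplicate pages with a short script: crawl a list of your own URLs, strip the markup, and compare the remaining text pairwise. The URL list and the 0.9 similarity cut-off below are assumptions; crawlers like Screaming Frog do this far more conveniently on large sites.

```python
# Pairwise near-duplicate check over a handful of your own pages.
import difflib
import re
import urllib.request

def page_text(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
                  flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)          # strip remaining tags
    return re.sub(r"\s+", " ", text).strip().lower()

urls = ["https://www.yoursite.com/page-a",       # hypothetical URLs
        "https://www.yoursite.com/page-b"]
texts = {u: page_text(u) for u in urls}

for i, a in enumerate(urls):
    for b in urls[i + 1:]:
        ratio = difflib.SequenceMatcher(None, texts[a], texts[b]).ratio()
        if ratio >= 0.9:
            print(f"Possible duplicates ({ratio:.0%}): {a} <-> {b}")
```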

Step 2: Eradicate Low Quality, Low Value Content

Google’s objective with Panda is to help users find "high-quality" sites by diminishing the visibility (ranking power) of low-quality content, all of which is accomplished at scale, algorithmically. So weeding out low value content should be mission critical for site owners.
But the million dollar question we hear all the time is “what constitutes ‘low quality’ content?”
Google has offered guidance on how to assess page-level quality, which is useful for guiding your editorial roadmap. But what about sites that host hundreds or thousands of pages, where evaluating every page by hand isn't even remotely practical or cost-effective?
A much more realistic approach for larger sites is to look at user engagement signals that Google is potentially using to identify low-quality content. These would include key behavioral metrics such as:
  • Low to no visits.
  • Anemic unique page views.
  • Short time on page.
  • High bounce rates.
Of course, these metrics can be somewhat noisy and susceptible to external factors, but they're the most efficient way to sniff out low value content at scale.
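One way to triage those signals at scale is to export page-level metrics from your analytics package and filter on them. The column names and thresholds below are assumptions; tune them to your own traffic levels.

```python
# Pull out candidate low-value pages from an analytics export (one row per page).
import pandas as pd

pages = pd.read_csv("analytics_pages_export.csv")

low_value = pages[
    (pages["Pageviews"] < 10)
    | (pages["Unique Pageviews"] < 5)
    | (pages["Avg. Time on Page"] < 10)   # seconds
    | (pages["Bounce Rate"] > 0.90)
]

# These are the candidates to delete, merge, or beef up.
low_value.sort_values("Pageviews").to_csv("low_value_candidates.csv", index=False)
```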
Some ways you can deal with these low value and poor performing pages include:
  • Deleting any content with low to no user engagement signals.
  • Consolidating the content of thin or shallow pages into thicker, more useful documents (i.e., "purge and merge").
  • Adding additional internal links to improve visitor engagement (and deeper indexation). Tip: make sure these internal links point to high-quality content on your site.
One additional type of low quality content that often gets overlooked is pagination. Proper pagination is highly effective at distributing link equity throughout your site, but high ratios of paginated archives, comment pages, and tag pages can also dilute your site's crawl budget, cause indexation cap issues, and tip the ratio of high- to low-value content on your site in the wrong direction.
Pagination therefore deserves the same Panda-proofing treatment: keep the link equity benefits while limiting how much thin, paginated content makes it into the index.

Step 3: Thicken-Up Thin Content

Google hates thin content. And this disdain isn’t reserved for spammy scraper sites or thin affiliates only. It’s also directed at sites with little or no original content (i.e., another form of “low value” content).
One of the riskiest content types we frequently see on client sites is thin directory-style pages. These are the aggregate feed pages you'd find on ecommerce sites (both page level and category level); sites with city, state, and ZIP code directory-type pages (think hotel and travel sites); and event location listings (think ticket brokers). Many sites host thousands of these page types, which, other than a big list of hyperlinks, have zero-to-no content.
Unlike other low-value content traps, these directory pages are often instrumental in site usability and helping users navigate to deeper content. So deleting them or merging them isn’t an option.
Instead, the best strategy here is to thicken up these thin directory pages with original content. Some recommendations include:
  • Drop a thousand words of original, value-add content on the page in an effort to treat each page as a comprehensive guide on a specific topic.
  • Pipe in API data and content mash-ups (excellent when you need to thicken hundreds or thousands of pages at scale).
  • Encourage user reviews.
  • Add images and videos.
  • Move thin pages off to subdomains, which Google has hinted at. We use this as more of a "stop gap" approach for sites that have been mauled by Panda and are trying to rebound quickly, rather than a long-term, sustainable strategy.
It’s worth noting that these recommendations can be applied to most types of thin content pages. I’m just using directory style pages as an example because we see them so often.
When it comes to discovering thin content issues at scale, take a look at word count. If you're running WordPress, there are plugins that will report word count for every document on your site, and a quick crawl script (sketched below) can run the same check on any platform.
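Here is a platform-agnostic version of that word-count check: fetch each URL, strip the markup, and flag anything under a few hundred words. The URL list and the 300-word threshold are assumptions, not a rule from Google.

```python
# Rough word counts for a list of pages, flagging potentially thin ones.
import re
import urllib.request

def word_count(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
                  flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return len(text.split())

for url in ["https://www.yoursite.com/some-directory-page"]:  # hypothetical URL
    count = word_count(url)
    print(f"{'THIN' if count < 300 else 'ok':4} {count:6d} words  {url}")
```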
All in all, we're seeing documents that have been thickened up get a nice boost in rankings and SERP visibility. And this boost isn't a temporary QDF bump. In the majority of cases, when thickening up thin pages, we're seeing lasting ranking improvements over competitor pages.

Step 4: Develop High-Quality Content

On the flipside of fixing low or no-value content issues, you must adopt an approach of only publishing the highest quality content on your site. For many sites, this is a total shift in mindset, but nonetheless raising your content publishing standards is essential to Panda-proofing your site.
Google describes "quality content" as "content that you can send to your child to learn something." That's a little vague, but to me it says two distinct things:
  • Your content should be highly informative.
  • Your content should be easy to understand (easy enough that a child can comprehend it).
For a really in-depth look at “What Google Considers Quality Content,” check out Brian Ussery’s excellent analysis.
When publishing content on our own sites, we ask ourselves a few simple quality control questions:
  • Does this content offer value?
  • Is this content you would share with others?
  • Would you link to this content as an informative resource?
If a piece of content doesn’t meet these basic criteria, we work to improve it until it does.
Now, when it comes to publishing quality content, many site owners don’t have the good fortune of having industry experts in house and internal writing resources at their disposal. In those cases, you should consider outsourcing your content generation to the pros.
Some of the most effective ways we use to find professional, authoritative authors include:
  • Placing an ad on Craigslist and conducting a "competition." Despite what the critics say, this method works really well, and you can find some excellent, cost-effective talent. "How to Find Quality Freelance Authors on Craigslist" will walk you through the process.
  • Reaching out to influential writers in your niche who have columns on high-profile publications. Most of these folks do freelance work and are eager to take on new projects. You can find them with search operators like [intitle:"your product niche" intext:"meet our bloggers"] or [intitle:"your product niche" intext:"meet our authors"], since many blogs publish an author profile page.
  • Targeting published authors on Amazon.com is a fantastic way to find influential authors who have experience writing on topics in your niche.
Apart from addressing writing resource deficiencies, hiring topic experts or published authors brings advantages of its own.
Finally, I wanted to address the issue of frequency and publishing quality content. Ask yourself this: are you publishing content everyday on your blog, sometimes twice a day? If so, ask yourself “why?”
Is it because you read on a popular marketing blog that cranking out blog posts each and every day is a good way to target trending topics and popular terms, and flood the index with content that will rank in hundreds of relevant mid-tail verticals?
If this is your approach, you might want to rethink it. In fact, I’d argue that 90 percent of sites that use this strategy should slow down and publish better, longer, meatier content less frequently.
In a race to “publish every day!!!” you’re potentially polluting the SERPs with quick, thin, low value posts and dragging down the overall quality score of your entire site. So if you fall into this camp, definitely stop and think about your approach. Test the efficacy of fewer, thicker posts vs short-form “keyword chasing” articles.

Panda-Proofing Wrap Up

Bottom line: get your site in shape before it's too late. Why risk being susceptible to every Panda update when this Armageddon is entirely avoidable?
The SEO and affiliate forums are littered with site owners who continue to practice the same low value tactics in spite of the clear dangers, because those tactics were cheap and they worked. But look at those sites now. Don't make the same mistake.