Monday, September 2, 2013

Algorithm Updates, Duplicate Content & A Recovery

Like many in search engine optimization, I watch the major algorithmic updates or “boulders” roll down the hill from the Googleplex and see which of us will be knocked down. For those of us who get squashed, we stand back up, dust ourselves off, and try to assess why we were the unlucky few that got rolled. We wait for vague responses from Google or findings from the SEO community.
Panda taught us that “quality content” was the focus, and that even if you were in the clear, sites linking to you may have been devalued, thus affecting your overall authority. My overall perception of the Penguin update was that it was designed primarily to attack unnatural link practices and web spamming techniques, along with a host of less focused topics such as AdSense usage and internal linking cues.
Duplicate content was mentioned here as a part of meeting Google’s quality guidelines but my overall observation was that it was not mentioned by many to be a major factor in the update.

The Head Scratching Period

After the Penguin update hit in late April 2012, I quickly noticed that one of my clients' sites began to slowly lose rankings and traffic. It didn't seem to be an overnight slam by Google, but rather a gradual decrease in referrals.
I soon ran through my post-Penguin checklist and found that none of the major Penguin factors appeared to be hurting the client's site. That ultimately left me with one suspect: content quality.
I reviewed the content of the affected site sections. It looked fine, was informational, not keyword stuffed, and met Google’s Quality Guidelines. Or so I thought.

The Research

Time progressed, and I made other recommendations for unaffected areas of the site while trying to determine why rankings and referrals continued to fall in the aforementioned informational sections.
I quizzed the site owner as to who had developed the content originally. He stated that he had built it himself over the previous few years, using content found on other sites.
First, I took a look at several years of organic data and noticed that the site was hit very hard at the Panda rollout. Shaking my head, but glad we had found the issue, we combed the site to pinpoint how much duplication there was.
Using tools such as the Webconfs similar-copy tool and Copyscape, we found several site pages whose copy ranged from a large percentage of cross-domain scraped duplication to exact content snippets originating from other sites.
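If you want a rough, tool-free sense of how similar two blocks of copy are before running them through a service like Copyscape, a quick similarity ratio can help triage pages. This is only an illustrative sketch: the sample strings and the 0.8 "likely duplicated" threshold are my assumptions, not values from either tool.

```python
# Rough stand-in for a duplicate-content spot check: compare the visible
# text of two pages and report a 0.0-1.0 similarity ratio.
from difflib import SequenceMatcher

def similarity(text_a: str, text_b: str) -> float:
    """Word-level similarity ratio between two blocks of copy."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()

original = "Our widgets are hand-built from aircraft-grade aluminum in Ohio."
scraped = "Our widgets are hand-built from aircraft-grade aluminum in Ohio, USA."

score = similarity(original, scraped)
print(f"similarity: {score:.2f}")
if score > 0.8:  # illustrative threshold for "likely duplicated"
    print("flag for rewrite")
```

In practice you would feed this the stripped text of your page and a suspected source page; anything scoring near 1.0 deserves the same rewrite treatment described above.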

The Resolution

A content writing resource worked quickly to rewrite unique copy for these pages to reduce the percentage of duplication. All of the affected pages were then released in their new unique state.
I had assumed the recovery might be a slow progression, since the penalty in this case had come on slowly. Surprisingly, our pre-Penguin rankings and traffic returned within a day.
[Screenshot: Google Analytics showing rankings and traffic coming back]

Questions

The rankings and traffic came back and are still there. After celebrating, it's time for some after-action review, which leads to many questions, including:
  • Are duplicate content, scraping, and everything else covered by Google's Quality Guidelines bigger factors in the Penguin update than the SEO community believed?
  • If you get your pre-algorithmic update rankings back in a day, why weren’t they all lost in a day?
  • I also understand that there are multiple algorithmic updates on a daily basis, but it is interesting that the ranking and traffic decline happened right at Penguin. There have been other algorithmic events and refreshes since then, and I'm used to refreshes loosening the leash on an update, usually bringing a rankings improvement. Why did I continue to see a slow negative trend?
Ultimately, I think the above story shows that it is quite important to know where your SEO client’s or company’s content originated if it precedes your involvement with site SEO efforts.
A recent post by Danny Goodwin, “Google's Rating Guidelines Add Page Quality to Human Reviews,” rang in my head for a while, as it reinforces that we need to be even more mindful of our site content. That means ensuring it is unique first and foremost, but also engaging, constantly refreshed, and meaningful, if we want it considered for SEO improvement.
Unknown scraping efforts are, in my opinion, more dangerous than incidental on-site content duplication via dynamic parameters, session IDs, or on-site copy spinning (e.g., copy variations on location pages). All of these practices, knowingly or unknowingly, fall into the realm of content quality, and devotion to your site content will let you provide the fresh, unique copy that the post-Caffeine Googlebot (Caffeine being Google's big indexing-infrastructure overhaul) will enjoy crawling.

Insights From the Recent Penguin & Panda Updates

Google recently rolled out three major algorithmic updates that have left many websites reeling. In between two Google Panda refreshes (on April 17 and 27) was the April 24 launch of the Penguin Update.
The Panda update is more of a content related update, targeting sites with duplicate content and targeting spammers who scrape content. The first Panda update was over a year ago and Google has been releasing periodic updates ever since.
The Penguin update algorithm appears to be targeting many different factors, including low quality links. The purpose per Google was to catch excessive spammers, but it seems some legit sites and SEOs have been caught with this latest algo change.

What Exactly Happened?

An analysis of six sites that have been affected in a big way by Google Penguin offers some helpful insights. The Penguin algo seems to be looking at three major factors:
  • If the majority of a website’s backlinks are low quality or spammy looking (e.g., sponsored links, links in the footers, links from directories, links from link exchange pages, links from low quality blog networks). 
  • If the majority of a website’s backlinks are from unrelated websites. 
  • If too many links are pointing back to a website with exact match keywords in the anchor text.
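The three factors above amount to simple ratio checks over a backlink profile. The sketch below is hypothetical: the `Backlink` fields, the sample profile, and the 50 percent red-flag threshold are assumptions for illustration, not anything Google has published.

```python
# Hypothetical sketch of the three Penguin-style checks described above.
from dataclasses import dataclass

@dataclass
class Backlink:
    low_quality: bool        # e.g., footer link, directory, link exchange
    related_site: bool       # linking site covers a related topic
    exact_match_anchor: bool # anchor text is an exact-match keyword

def red_flags(links, threshold=0.5):
    """Return which of the three ratio checks exceed the threshold."""
    n = len(links)
    pct = lambda pred: sum(pred(link) for link in links) / n
    return {
        "low_quality": pct(lambda l: l.low_quality) > threshold,
        "unrelated": pct(lambda l: not l.related_site) > threshold,
        "exact_match": pct(lambda l: l.exact_match_anchor) > threshold,
    }

# A profile where 70 percent of links are spammy, unrelated, exact-match:
profile = [Backlink(True, False, True)] * 7 + [Backlink(False, True, False)] * 3
print(red_flags(profile))  # all three checks trip
```

In a real audit you would populate the list from a backlink tool's export and judge "low quality" and "related" by hand or heuristics; the point is that the risk is about proportions, not any single link.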
Some of the affected sites had only directory-type and link-exchange-type backlinks. Other sites had a variety of link types, including link buys.
Google must be looking at the overall percentage of low quality links as a factor. Penguin doesn’t seem to have affected sites with a better mix of natural looking links and low-quality links.
A few other websites lost search rankings on Google for specific keywords during the Panda and Penguin rollouts. It appears anchor text was to blame in these cases, as the links pointing to these sites concentrated on only one or a few keywords.
What’s it all mean? The impact of Penguin will vary depending on how heavily a site’s link profile is skewed in the direction of the above three factors. Some sites may have lost rankings for everything while some sites may have lost rankings on only specific keywords.

Specific Details About a Few Sites Affected by Penguin

We used some backlink analyzers to look at the below factors to try and figure out what may have caused the drops:
  • Presence of footer links
  • Links from unrelated sites
  • Consecutive sponsored links, with no text descriptions in between the different links
  • Site-wide links 
  • Exact match keyword anchor text making up the majority of the links 
  • Specific keywords that had dropped in rankings accounting for more than 10 percent of all anchor text links
We also looked at site SEO and duplicate content as a factor.
Two sites had done little link building other than manual directory submissions and link exchanges. Those sites had the following problems:
  • Majority of links were unrelated due to high number of directory type links. The unrelated links were as high as 90 percent. By unrelated, I mean the subject of the sites linking to the impacted sites didn’t have similar/related content or were too general. 
  • More than 50 percent of links were targeting keywords vs. brand name or non-keywords.
Four sites had a variety of different types of links such as directories, link exchange, articles published on different blogs, sponsored links, and social media links. Those sites exhibited these problems:
  • Between 50 and 70 percent unrelated links 
  • More than 50 percent of links targeting keywords vs. brand name or non-keywords 
  • Those that had sponsored links had some consecutive sponsored links (i.e., a bunch of links with no text descriptions in between) 
  • Those that had sponsored links had some footer links (i.e., the links coming from external sites to them were placed towards the bottom of the page; it could also be on the right panel, but if you view the source code, the links would be in the bottom 5 percent of the text content)
In addition, two sites out of the last four had duplicate content issues.
One affected site had too many doorway pages built around city/state variations. Google specifically mentions that doorway pages, which are built only to attract search engine traffic, are against its webmaster guidelines. Regardless, many people still use this technique.
It seems these doorway pages may have affected this specific site’s ranking. From what we can tell the doorway page penalization was due to Panda, as the site started losing rankings on April 17. However, they lost further rankings on April 24, so the Penguin update also hit them.
A different site had some duplicate content issues from affiliates who copied their content. It’s still unclear if this had an effect on the drop.
Another site owner was selling links on his website in the footer area. The links were relevant to the subject of the website. Two sponsored links were located on the main page. Some internal pages also had sponsored links, but no more than three on any given page. This also may have been an issue.
The majority of your links shouldn’t be from directories, as two sites learned. Many sites unaffected by Google Penguin also had directory links, but they escaped because they also had relevant and high quality links. The good news: if you do your own relevant link building, then you don’t need to worry about a competitor doing negative SEO to try to get you de-ranked.

What to do if Your Site was Affected by Penguin

Here are four suggestions to start cleaning up your site:
  • If you have links from too many unrelated sites, such as directories, either remove some or try to get more links from related sites. Related websites should account for at least 20 percent of your overall links. 
  • If you have too many keyword links coming in, then vary your keywords and mix your brand name and URL in the links. Have at least 20 percent of your keyword links be non-keyword or brand-based. 
  • If you are doing sponsored links, be careful! Cancel or remove any links you have from footers. Remove any sponsored links that don’t include a text description next to them. Contextual links are much better, meaning it’s better to have links within a website’s text content. 
  • Make the above changes a few at a time and wait a few days to see if rankings come back, before proceeding. However, Penguin will only run periodically like Panda, so it could be weeks before any affected websites recover their rankings.
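The anchor text guideline above boils down to another ratio check. Here is a minimal sketch of it, assuming you have a flat list of anchor texts exported from a backlink tool; the brand terms and sample anchors are hypothetical.

```python
# Check what fraction of your anchors are brand/URL-based rather than
# keyword-targeted. The 20 percent floor mirrors the suggestion above.
BRAND_TERMS = {"acme", "acme.com", "www.acme.com"}  # hypothetical brand

def brand_ratio(anchors):
    """Fraction of anchors that are brand- or URL-based."""
    branded = sum(1 for a in anchors if a.lower().strip() in BRAND_TERMS)
    return branded / len(anchors)

anchors = ["cheap widgets", "buy widgets", "acme", "acme.com",
           "widget store", "best widgets online", "Acme", "discount widgets",
           "widgets", "www.acme.com"]

ratio = brand_ratio(anchors)
print(f"brand/URL anchors: {ratio:.0%}")
if ratio < 0.20:
    print("mix in more brand-name and URL anchors")
```

A fuzzier real-world version would also treat partial brand matches and bare URLs as branded; the exact matching here is kept simple for clarity.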
If you do your own SEO, then you probably have an idea of which links are low-quality and what you should do. However, if you are a newbie and don’t know how to analyze your backlinks, try SEOMoz or Majestic SEO. They both offer limited free analysis, but for a more detailed analysis or analyzing more than one website, you would need to get the premium version.
If you use an SEO firm, then you should make sure to ask for a detailed link report to see what exactly your SEO firm is doing. There are many SEO companies that keep their clients in the dark and never send link reports.
You need to make sure the company you are using discloses what it does and doesn’t engage in tactics Google may not like. If the company refuses to release this info, it is either hiding something it doesn’t want you to find out, such as black hat tactics, or it really doesn’t have much to show you. In that case, run and cancel the service ASAP.

After Panda & Penguin, is Google Living Up to Its Great Expectations?

[Image: the evolution of Penguin]
It all starts with Google, doesn’t it? Not really – it’s all about Google today because Google is the most used search engine.
Google, like any other software, evolves and corrects its own bugs and conceptual failures. The goal of the engineers working at Google is to constantly improve its search algorithm, but that’s no easy job.
Google is a great search engine, but Google is still a teenager.
This article was inspired by my high expectations of the Google algorithm that have been blown away in the last year, seeing how Google’s search results “evolved.” If we look at some of the most competitive terms in Google we will see a search engine filled with spam and hacked domains ranking in the top 10.
[Image: Google vs. Blekko spam comparison]

Why Can Google Still Not Catch Up With the Spammers?

Panda, Penguin, and the EMD update did clear some of the clutter. All of these highly competitive terms have been repeatedly abused for years. I don’t think there was ever a time when these results were clean, in terms of what Google might expect from its own search engine.
Even weirder is that the techniques used to rank this spam are as old as (if not older than) Google itself. And this brings me to a question: what has actually changed?
The only real difference between now and then is how long a spam result will “survive” in the SERPs. That window has decreased from weeks to days, or even hours in some cases.
One of the side effects of Google's various updates is a new business model: ranking high on high revenue-generating keywords for a short amount of time. For those people involved in this practice, it scales very well.

How Google Ranks Sites Today: A Quick Overview

These are two of the main ranking signal categories:
  • On-page factors.
  • Off-page factors.
On-page and off-page have existed since the beginning of the search engine era. Now let’s take a deeper look at the most important factors that Google might use.
Regarding the on-page factors Google will try to understand and rate the following:
  • How often a site is updated. A site that isn't updated often doesn't mean it's low quality. This just tells Google how often it should crawl the site and it will compare the update frequency to other sites’ update frequency in the same niche to determine a trend and pattern.
  • If the content is unique (duplicate content matching applies a negative score).
  • If the content provides interest to the users (bounce rate and traffic data mix on-page signals with off-page ones).
  • If the site is linking out to a bad neighborhood.
  • If the site is inter-linked with high-quality sites in the same niche.
  • If the site is over-optimized from an SEO point of view.
  • Other various smaller on page related factors.
The off-page factors are mainly links. Social signals are still in their infancy, and no study yet clearly shows a true correlation for social signals in isolation from the link signal. It is all speculation so far.
As for links, they can easily be classified into two big categories:
  • Natural. Link appeared as a result of:
    • The organic development of a page (meritocracy).
    • A result of a “pure” advertising campaign with no intent of directly changing the SERPs.
  • Unnatural. Link appeared:
    • With the purpose of influencing a search engine ranking.
Unfortunately, the unnatural links represent a very large percentage of what the web is today. This is mainly due to Google’s (and the other search engines’) ranking models. The entire web got polluted because of this concept.
When your unnatural link ratio is way higher than your natural (organic) link ratio, it raises a red flag and Google starts watching your site more carefully.
Google tries to fight the unnatural link patterns with various algorithm updates. Some of the most popular updates, that targeted unnatural link patterns and low quality links, are the Penguin and EMD updates.
Google’s major focus today is on improving the way it handles link profiles. This is another difficult task, which is why Google is having a hard time making its way through the various techniques used by SEO pros (black hat or white hat) to influence positively or negatively the natural ranking of a site.

Google's Stunted Growth

Google is like a young teenager stuck on some difficult math problem. Google's learning process apparently involves trying to solve the problem of web spam by applying the same process in a different pattern – why can’t Google just break the pattern and evolve?
Is Google only struggling to maintain an acceptable ranking formula? Will Google evolve or stick with what it’s doing, just in a largely similar format?
Other search engines like Blekko have taken a different route, crowdsourcing content curation. While this works well in a variety of niches, Blekko’s big problem is that such curation isn’t very “mainstream,” since it puts the burden of the algorithm on the shoulders of its own users. But the pro users appreciate it, and they make Blekko’s results quite good.
In a perfect, non-biased scenario, Google’s ranked results should be:
  • Ranked by non-biased ranking signals.
  • Impossible to be affected by third parties (i.e., negative SEO or positive SEO).
  • Able to tell the difference between bad and good (remember the JCPenney scandal).
  • More diverse and impossible to manipulate.
  • Giving new quality sites a chance to rank near the “giant” old sites.
  • Maintaining transparency.
There is still a long way to go until Google’s technology evolves from the infancy we know today. Will we have to wait until Google is 18 or 21 years old – or even longer – before Google reaches this level of maturity that it dreams of?
Until then, the SEO community is left to test and benchmark the way Google evolves – and maybe to create a book of best practices for search engine optimization.
Google created an entire ecosystem that started backfiring a long time ago. It basically opened the door to all the spam concepts it is fighting today.
Is this illegal or immoral, white or black? Who are we to decide? We are no regulatory entity!

Conclusion

Google is a complicated “piece” of software that is being updated constantly, with each update theoretically bringing new fixes and improvements.
None of us were born smart, but we have learned how to become smart as we’ve grown. We never stop learning.
The same applies to Google. We as human beings are imperfect. How could we create a perfect search engine? Are we able to?
I would love to talk with you more. Share your thoughts or ask questions in the comments below.

4 Steps to Panda-Proof Your Website (Before It’s Too Late!)

It may be a new year, but that hasn’t stopped Google from rolling out yet another Panda refresh.
Last year Google unleashed the most aggressive campaign of major algo updates ever in its crusade to battle rank spam. This year looks to be more of the same.
Since Panda first hit the scene two years ago, thousands of sites have been mauled. SEO forums are littered with site owners who have seen six figure revenue websites and their entire livelihoods evaporate overnight, largely because they didn’t take Panda seriously.
If your site is guilty of transgressions that might provoke the Panda and you haven’t been hit yet, consider yourself lucky. But understand that it’s only a matter of time before you do get mauled. No doubt about it: Panda is coming for you.
Over the past year, we’ve helped a number of site owners recover from Panda. We’ve also worked with existing clients to Panda-proof their websites and (knock on wood) haven’t had a single site fall victim to Panda.
Based on what we’ve learned saving and securing sites, I’ve pulled together a list of steps and actions to help site owners Panda-proof websites that may be at risk.

Step 1: Purge Duplicate Content

Duplicate content issues have always plagued websites and SEOs. But with Panda, Google has taken a dramatically different approach to how they view and treat sites with high degrees of duplicate content. Where dupe content issues pre-Panda might hurt a particular piece of content, now duplicate content will sink an entire website.
So with that shift in attitude, site owners need to take duplicate content seriously. You must be hawkish about cleaning up duplicate content issues to Panda-proof your site.
Screaming Frog is a good choice when you want to identify duplicate pages. This article by Ben Goodsell offers a great tutorial on locating duplicate content issues.
Some suggestions for fixing dupe content issues include 301 redirecting duplicate URLs to the canonical version, adding rel="canonical" tags, and noindexing pages that must exist but don’t need to rank.
Now, cleaning up existing duplicate content issues is critical. But it’s just as important to take preventative measures as well. This means addressing the root cause of your duplicate content issues before they end up in the index. Yoast offers some great suggestions on how to avoid duplicate content issues altogether.

Step 2: Eradicate Low Quality, Low Value Content

Google’s objective with Panda is to help users find "high-quality" sites by diminishing the visibility (ranking power) of low-quality content, all of which is accomplished at scale, algorithmically. So weeding out low value content should be mission critical for site owners.
But the million dollar question we hear all the time is “what constitutes ‘low quality’ content?”
Google offered guidance on how to assess page-level quality, which is useful for guiding your editorial roadmap. But what about sites that host hundreds or thousands of pages, where evaluating every page by hand isn’t even remotely practical or cost-effective?
A much more realistic approach for larger sites is to look at user engagement signals that Google is potentially using to identify low-quality content. These would include key behavioral metrics such as:
  • Low to no visits.
  • Anemic unique page views.
  • Short time on page.
  • High bounce rates.
Of course, these metrics can be somewhat noisy and susceptible to external factors, but they’re the most efficient way to sniff out low value content at scale.
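Screening by those behavioral metrics is easy to automate against an analytics export. The sketch below is a hedged example: the CSV column names and the cutoffs (50 visits, 10 seconds, 90 percent bounce) are assumptions you would tune to your own data.

```python
# Flag potentially low-value pages from an analytics CSV export.
import csv
import io

SAMPLE = """page,visits,avg_time_on_page,bounce_rate
/guides/widgets,1200,95,0.42
/tags/misc,8,5,0.97
/archive/2009/page/14,3,2,0.99
"""

def low_value_pages(csv_text, min_visits=50, min_time=10, max_bounce=0.9):
    """Return pages that fail all three engagement checks at once."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (int(row["visits"]) < min_visits
                and float(row["avg_time_on_page"]) < min_time
                and float(row["bounce_rate"]) > max_bounce):
            flagged.append(row["page"])
    return flagged

print(low_value_pages(SAMPLE))  # → ['/tags/misc', '/archive/2009/page/14']
```

Requiring all three signals to fail together keeps the noise down; a page with low visits but strong time-on-page is probably fine and shouldn’t be purged.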
Some ways you can deal with these low value and poor performing pages include:
  • Deleting any content with low to no user engagement signals.
  • Consolidating the content of thin or shallow pages into thicker, more useful documents (i.e., “purge and merge”).
  • Adding additional internal links to improve visitor engagement (and deeper indexation). Tip: make sure these internal links point to high-quality content on your site.
One additional type of low quality content that often gets overlooked is pagination. Proper pagination is highly effective at distributing link equity throughout your site. But high ratios of paginated archives, comments and tag pages can also dilute your site’s crawl budget, cause indexation cap issues and negatively tip the scales of high-to low-value content ratios on your site.
Tips for Panda-proofing pagination include using rel="next" and rel="prev" markup on paginated series, and noindexing thin tag, comment, and archive pages while still letting them be crawled.

Step 3: Thicken-Up Thin Content

Google hates thin content. And this disdain isn’t reserved for spammy scraper sites or thin affiliates only. It’s also directed at sites with little or no original content (i.e., another form of “low value” content).
One of the riskiest content types we see frequently on client sites is the thin directory-style page. These are aggregate feed pages you’d find on ecommerce product pages (both page level and category level); sites with city, state, and ZIP code directory-type pages (think hotel and travel sites); and event location listings (think ticket brokers). Many sites host thousands of these page types, which, other than a big list of hyperlinks, have little to no content.
Unlike other low-value content traps, these directory pages are often instrumental in site usability and helping users navigate to deeper content. So deleting them or merging them isn’t an option.
Instead, the best strategy here is to thicken up these thin directory pages with original content. Some recommendations include:
  • Drop a thousand words of original, value-add content on the page in an effort to treat each page as a comprehensive guide on a specific topic.
  • Pipe in API data and content mash-ups (excellent when you need to thicken hundreds or thousands of pages at scale).
  • Encourage user reviews.
  • Add images and videos.
  • Move thin pages off to subdomains, which Google hints at. We use this as more of a “stop gap” approach for sites that have been mauled by Panda and are trying to rebound quickly, rather than a long-term, sustainable strategy.
It’s worth noting that these recommendations can be applied to most types of thin content pages. I’m just using directory style pages as an example because we see them so often.
When it comes to discovering thin content issues at scale, take a look at word count. If you’re running WordPress, a couple of plugins can assess word count for every document on your site, and a number of all-purpose plugins can help in the war against Panda more generally.
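If you’re not on WordPress, a short script over saved HTML pages works just as well for a first pass. This is a rough, plugin-free sketch; the 300-word floor and the sample page are illustrative assumptions, and a real crawl would pull pages from your sitemap.

```python
# Strip tags from an HTML page and flag it if the word count is thin.
import re

def word_count(html: str) -> int:
    """Approximate visible word count: drop scripts, styles, then tags."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
                  flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)  # drop remaining tags
    return len(text.split())

page = ("<html><body><h1>Widgets in Springfield</h1>"
        "<p>Call us today.</p></body></html>")
count = word_count(page)
print(count)
if count < 300:  # illustrative thin-content floor
    print("thin page: consider thickening or consolidating")
```

Run it across a directory of crawled pages and sort ascending by count, and your thinnest candidates for "purge and merge" float to the top.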
All in all, we’re seeing documents that have been thickened up get a nice boost in rankings and SERP visibility. And this boost isn’t a temporary QDF bump. In the majority of cases, when thickening up thin pages, we’re seeing lasting ranking improvements over competitor pages.

Step 4: Develop High-Quality Content

On the flipside of fixing low or no-value content issues, you must adopt an approach of only publishing the highest quality content on your site. For many sites, this is a total shift in mindset, but nonetheless raising your content publishing standards is essential to Panda-proofing your site.
Google describes “quality content” as “content that you can send to your child to learn something.” That’s a little vague, but to me it says two distinct things:
  • Your content should be highly informative.
  • Your content should be easy to understand (easy enough that a child can comprehend it).
For a really in-depth look at “What Google Considers Quality Content,” check out Brian Ussery’s excellent analysis.
When publishing content on our own sites, we ask ourselves a few simple quality control questions:
  • Does this content offer value?
  • Is this content you would share with others?
  • Would you link to this content as an informative resource?
If a piece of content doesn’t meet these basic criteria, we work to improve it until it does.
Now, when it comes to publishing quality content, many site owners don’t have the good fortune of having industry experts in house and internal writing resources at their disposal. In those cases, you should consider outsourcing your content generation to the pros.
Some of the most effective ways we use to find professional, authoritative authors include:
  • Placing an ad on Craigslist and conducting a “competition.” Despite what the critics say, this method works really well, and you can find some excellent, cost-effective talent. “How to Find Quality Freelance Authors on Craigslist” will walk you through the process.
  • Reaching out to influential writers in your niche who have columns on high-profile pubs. Most of these folks do freelance work and are eager to take on new projects. You can find them with search operators like [intitle:“your product niche” intext:“meet our bloggers”] or [intitle:“your product niche” intext:“meet our authors”], since many blogs publish author profile pages.
  • Targeting published authors on Amazon.com is a fantastic way to find influential authors who have experience writing on topics in your niche.
Apart from addressing writing resource deficiencies, hiring topic experts or published authors brings built-in subject-matter authority, higher editorial quality, and often an existing audience the author can bring with them.
Finally, I wanted to address the issue of frequency and publishing quality content. Ask yourself this: are you publishing content everyday on your blog, sometimes twice a day? If so, ask yourself “why?”
Is it because you read on a popular marketing blog that cranking out blog posts each and every day is a good way to target trending topics and popular terms, and flood the index with content that will rank in hundreds of relevant mid-tail verticals?
If this is your approach, you might want to rethink it. In fact, I’d argue that 90 percent of sites that use this strategy should slow down and publish better, longer, meatier content less frequently.
In a race to “publish every day!!!” you’re potentially polluting the SERPs with quick, thin, low value posts and dragging down the overall quality score of your entire site. So if you fall into this camp, definitely stop and think about your approach. Test the efficacy of fewer, thicker posts vs short-form “keyword chasing” articles.

Panda-Proofing Wrap Up

Bottom line: get your site in shape before it’s too late. Why risk being susceptible to every Panda update, when Armageddon is entirely avoidable?
The SEO and affiliate forums are littered with site owners who continue to practice the same low value tactics in spite of the clear dangers because they were cheap and they worked. But look at those sites now. Don’t make the same mistake.

Google Rolling Out First Panda Refresh of 2013 Today

Beware the Panda. According to a tweet from the official @Google Twitter account this morning, a new data refresh is rolling out today.
This update, according to the notice, should only affect 1.2 percent of English language queries. No other information is available so far.
[Screenshot: Google’s Panda tweet, January 22, 2013]
This is the first Panda data refresh of 2013. It also marks the third consecutive month of Panda data updates.
The first Panda update was nearly two years ago in February 2011. Google's stated goal of Panda is to reward "high-quality sites."
While Google has never formally defined what a "high-quality site" is, Google has its own list of bullet points on their blog post from early 2011. The rationale has always been the same: to find more high-quality sites in search.

Google Goes Boom on Low-Quality Sites...So They Say

Chances are good that you or someone you know has seen ranking changes today, as Google rolled out a new algorithmic update. This follows recent announcements aimed at "low quality sites" (which many interpret to mean content farms); less than two weeks ago, Google stated it was exploring new methods to detect spam.
"This update is designed to reduce rankings for low-quality sites--sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites--sites with original content and information such as research, in-depth reports, thoughtful analysis and so on," Google announced last night.
No one can say this one came out of left field. Google launched an algorithm tweak in January to combat spam and scraper sites, though that affected a much smaller number of sites.
This IS a big one. We're talking 11.8% across the board. Now, the big question is did they do it right?
From the looks of it, Google is not simply devaluing sites serving duplicated content, they are going after sites here with specific types of backlinks, spying through Chrome extensions, and this is only within the first 24 hours! More will become clear once site owners see drastic changes in their traffic stats.
As with every major Google update, SEO forums are dedicating a thread to this and they are filling up fast with reactions and reports. Since BackLinkForum.com tends to have the skilled gray/blackhat crowd, and because this update is only happening in the U.S. (for now), BLF is a great place to see what is really happening down in the trenches.
Two possible things happening there are worth noting:
  1. Sites with the majority of their backlink profiles consisting of profile links could be a target.
  2. Not every content farm was red flagged. This may have been a response to the scraper update along with bigger content farm sites. Similar to the recent Blekko update.
ehow-and-wiki-ranking-high.jpg
Although coming to a conclusion about the update within 24 hours is extremely risky, I would be willing to bet that this is targeting self-service linking as much as content farms.
However, sites like eHow, Answers.com, and even low-level scraper sites still seem to be saturating the SERPs. That leaves me asking, "Who was penalized then?"
As with any Google algorithm changes, some innocent sites are going to be slammed. Some SEOs have reported seeing 40 percent traffic drops to their sites.
This latest update may just be more evidence that Google simply can't distinguish between "good" and "bad" content.
Let us know what you're seeing today -- the good, bad, and disastrous.

Blekko Removes Content Farms From Search Results

In an effort to combat web spam, Blekko will block 20 of the worst-offending, SERP-clogging content farms from its search results, including Demand Media's eHow and Answerbag, TechCrunch reports. The list of barred sites is as follows:

  • ehow.com
  • experts-exchange.com
  • naymz.com
  • activehotels.com
  • robtex.com
  • encyclopedia.com
  • fixya.com
  • chacha.com
  • 123people.com
  • download3k.com
  • petitionspot.com
  • thefreedictionary.com
  • networkedblogs.com
  • buzzillions.com
  • shopwiki.com
  • wowxos.com
  • answerbag.com
  • allexperts.com
  • freewebs.com
  • copygator.com
blekko-spam-clock.png
Blekko seems to be taking spam seriously. Last month, the newest search engine introduced the spam clock, which announced that 1 million new spam pages are created every hour. As of this morning, the total number of spam pages was at 750 million and counting (though Blekko admits the figure is "illustrative more than scientifically accurate").
The reasoning for the spam clock, according to Blekko CEO Rich Skrenta:
"Millions upon millions of pages of junk are being unleashed on the web, a virtual torrent of pages designed solely to generate a few pennies in ad revenue for its creator. I fear that we are approaching a tipping point, where the volume of garbage soars beyond and overwhelms the value of what is on the web."
So Blekko seems to be doing its small part for cleaning up its own search results.
Meanwhile, Google has also announced an algorithm change to combat spam. But as Mike Grehan notes in his column today "The Google Spam-Jam," spam "is a problem that Google has had from day one and it's not likely to go away anytime soon" with its current search model.

Google's War on Spam Begins: New Algorithm Live

Google's Matt Cutts today announced the launch of a new algorithm that is intended to better detect and reduce spam in Google's search results and lower the rankings of scraper sites and sites with little original content. Google's main target is sites that copy content from other sites and offer little useful, original content of their own.
Posting on Hacker News, Cutts wrote:
"The net effect is that searchers are more likely to see the sites that wrote the original content. An example would be that stackoverflow.com will tend to rank higher than sites that just reuse stackoverflow.com's content. Note that the algorithmic change isn't specific to stackoverflow.com though."
On his blog, Cutts wrote:
This was a pretty targeted launch: slightly over 2% of queries change in some way, but less than half a percent of search results change enough that someone might really notice. The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site's content.
Cutts said the change was approved last Thursday and launched earlier this week. Cutts announced Google's intention to up the fight against spam in an Official Google Blog post last Friday.
In response to criticism that Google's results were deteriorating and seeing more spam in recent months, Cutts said a newly redesigned document-level classifier will better detect repeated spammy words, such as those found in "junky" automated, self-promoting blog comments. He also said that spam levels today are much better than five years ago.
At Webmaster World, there is discussion about big drops in traffic. Are you seeing any changes as a result of this change?


Negative SEO Case Study: How to Uncover an Attack Using a Backlink Audit

negative-seo
Ever since Google launched the Penguin update back in April 2012, the SEO community has debated the impact of negative SEO, a practice whereby competitors can point hundreds or thousands of negative backlinks at a site with the intention of causing harm to organic search rankings or even completely removing a site from Google's index. Just jump over to Fiverr and you can find many gigs offering thousands of wiki links, or directory links, or many other types of low-quality links for $5.
By creating the Disavow Links tool, Google acknowledged this very real danger and gave webmasters a tool to protect their sites. Unfortunately, most people wait until it's too late to use the Disavow tool; they look at their backlink profile and disavow links after they've been penalized by Google. In reality, the Disavow Links tool should be used before your website suffers in the SERPs.
Backlink audits have to be added to every SEO professional's repertoire. These are as integral to SEO as keyword research, on-page optimization, and link building. In the same way that a site owner builds links to create organic rankings, now webmasters also have to monitor their backlink profile to identify low quality links as they appear and disavow them as quickly as they are identified.
Backlink audits are simple: download your backlinks from your Google Webmaster account, or from a backlink tool, and keep an eye on the links pointing to your site. What is the quality of those links? Do any of the links look fishy?
As soon as you identify fishy links, you can then try to remove the links by emailing the webmaster. If that doesn't work, head to Google's disavow tool and disavow those links. For people looking to protect their sites from algorithmic updates or penalties, backlink audits are now a webmaster's best friend.
If your website has suffered from lost rankings and search traffic, here's a method to determine whether negative SEO is to blame.

A Victim of Negative SEO?

Google Analytics 2012 vs 2013 Traffic
A few weeks ago I received an email from a webmaster whose Google organic traffic dropped by almost 50 percent within days of Penguin 2.0. He couldn't understand why, given that he'd never engaged in SEO practices or link building. What could've caused such a massive decrease in traffic and rankings?
The site is a 15-year-old finance magazine with thousands of news stories and analysis, evergreen articles, and nothing but organic links. For over a decade it has ranked quite highly for very generic informational financial keywords – everything from information about the economies of different countries, to very detailed specifics about large corporations.
With a long tail of over 70,000 keywords, it's a site that truly adds value to the search engine results and has always used content to attract links and high search engine rankings.
The site received no notifications from Google. They simply saw a massive decrease in organic traffic starting May 22, which leads me to believe they were impacted by Penguin 2.0.
In short, he did exactly what Google preaches as safe SEO. Great content, great user experience, no manipulative link practices, and nothing but value.
So what happened to this site? Why did it lose 50 percent of its organic traffic from Google?

Backlink Audit

I started by running a LinkDetox report to analyze the backlinks. Immediately I knew something was wrong:
Your Average Link Detox Risk 1251 Deadly Risk
Upon further investigation, 55 percent of his links were suspicious, while 7 percent (almost 500) of the links were toxic:
Toxic Suspicious Healthy Links
So the first step was to research those 7 percent toxic links, how they were acquired, and what types of links they were.
In LinkDetox, you can segment by Link Type, so I was able to first view only the links that were considered toxic. According to Link Detox, toxic links are links from domains that aren't indexed in Google, as well as links from domains whose theme is listed as malware, malicious, or having a virus.
Immediately I noticed that he had many links from sites that ended in .pl. The anchor text of the links was the title of the page that they linked to.
It seemed that the sites targeted "credit cards", which is very loosely in this site's niche. It was easy to see that these were scraped links to be spun and dropped on spam URLs. I also saw many domains that had expired and were re-registered for the purpose of creating content sites for link farms.
Also, check out the spike in backlinks:
Backlink Spike
From this I knew that most of the toxic links were spam, and not links generated by the target site. I also saw many links to other authority sites, including entrepreneur.com and venturebeat.com. It seems this site was classified as an "authority site" and was being used as part of spammers' strategy of adding authority links to their outbound link profiles.

Did Penguin Cause the Massive Traffic Loss?

I further investigated the backlink profile, checking for other red flags.
His Money vs Brand ratio looked perfectly healthy:
Money vs Brand Keywords
His ratio of "Follow" links was a little high, but this was to be expected given the source of his negative backlinks:
Follow vs Nofollow Links
Again, he had a slightly elevated number of text links as compared to competitors, which was another minor red flag:
Text Links
One finding that was quite significant was his Deep Link Ratio, which was much too high when compared with others in his industry:
Deep Link Ratio
In terms of authority, his link distribution by SEMrush keyword rankings was average when compared to competitors:
SEMrush Keyword Rankings
Surprisingly, his backlinks had better TitleRank than competitors, meaning that the target site's backlinks ranked for their exact match title in Google – an indication of trust:
metric-comparison-titlerank
Penalized sites don't rank for their exact match title.
The final area of analysis was the PageRank distribution of the backlinks:
Link Profile by Google PageRank
Even though he has a great number of high-quality links, the percentage of links that aren't indexed in Google is substantial: close to 65 percent of the site's backlinks aren't indexed.
In most cases, this indicates poor link building strategies, and is a typical profile for sites that employ spam link building tactics.
In this case, the high quantity of links from pages that are penalized, or not indexed in Google, was a case of automatic links built by spammers!
As a result of having a prominent site that was considered by spammers to be an authority in the finance field, this site suffered a massive decrease in traffic from Google.

Avoid Penguin & Unnatural Link Penalties

A backlink audit could've prevented this site from being penalized by Google and losing close to 50 percent of its traffic. If a backlink audit had been conducted, the site owner could've disavowed these spam links, performed outreach to get the links removed, and documented his efforts in case of future problems.
If the toxic links had been disavowed, all of the ratios would've been normalized and this site would've never been pegged as spam and penalized by Penguin.

Backlink Audits

Whatever tool you use - whether it's Ahrefs, LinkDetox, or OpenSiteExplorer – it's important that you run and evaluate your links on a monthly basis. Once you have the links, make sure you have metrics for each of the links in order to evaluate their health.
Here's what to do:
  • Identify all the backlinks from sites that aren't indexed in Google. If they aren't indexed in Google, there's a good chance they are penalized. Take a manual look at a few to make sure nothing else is going on (e.g., perhaps they just moved to a new domain, or there's an error in reporting). Add all the N/A sites to your file.
  • Look for backlinks from link or article directories. These are fairly easy to identify. LinkDetox will categorize those automatically and allow you to filter them out. Scan each of these to make sure you don't throw out the baby with the bathwater, as perhaps a few of these might be healthy.
  • Identify links from sites that may be virus infected or have malware. These are identified as Toxic 2 in LinkDetox.
  • Look for paid links. Google has long been at war with link buying and it's an obvious target. Find any links that have been paid and add them to the list. You can find these by sorting the results by PageRank descending. Evaluate all the high PR links as those are likely the ones that were purchased. Look at each and every one of the high quality links to assess how they were acquired. It's almost always pretty obvious if the link was organic or purchased.
  • Take the list of backlinks and run it through the Juice Tool to scan for other red flags. One of my favorite metrics to evaluate is TitleRank. Generally, pages that aren't ranking for their exact match title have a good chance of having a functional penalty or not having enough authority. In the Juice report, you can see the exact title to determine if it's a valid title (for example, if the title is "Home", of course they won't rank for it, whether or not they have a penalty). If the TitleRank is 30+, do a quick check of that link, and if the site looks spammy, add it to your "Bad Links" file. Also scan other factors, such as PageRank and DomainAuthority, to see if anything else seems out of place.
By the end of this stage, you'll have a spreadsheet with the most harmful backlinks to a site.
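The filtering steps above can be sketched in code. This is a minimal illustration only: the row fields (`link_type`, `indexed_in_google`, `toxicity`) are hypothetical column names for a backlink export, not the actual schema of LinkDetox or any other tool.

```python
# Hypothetical backlink-export rows; adjust field names to your tool's CSV.
def flag_bad_links(rows):
    """Apply the audit rules above and return links worth reviewing."""
    bad = []
    for row in rows:
        reasons = []
        if row["indexed_in_google"] == "no":
            reasons.append("source page/domain not indexed in Google")
        if row["link_type"] in ("link_directory", "article_directory"):
            reasons.append("link/article directory")
        if row["toxicity"] == "toxic":
            reasons.append("toxic (malware/virus theme or deindexed domain)")
        if reasons:
            bad.append({"url": row["url"], "reasons": "; ".join(reasons)})
    return bad

rows = [
    {"url": "http://spam.example.pl/page", "link_type": "other",
     "indexed_in_google": "no", "toxicity": "toxic"},
    {"url": "http://news.example.com/story", "link_type": "editorial",
     "indexed_in_google": "yes", "toxicity": "healthy"},
]
for link in flag_bad_links(rows):
    print(link["url"], "->", link["reasons"])
```

Each flagged entry keeps its reasons, so the resulting spreadsheet documents why a link was marked bad, which is useful evidence if you later file a reconsideration request.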
Upload this disavow file to make sure the worst of your backlinks aren't harming your site. Then make sure you upload the same disavow file when performing further tests in Link Detox, as excluding these domains will affect your ratios.
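For reference, a disavow file is plain text with one entry per line: a `domain:` prefix disavows every link from that domain, a bare URL disavows a single page, and lines beginning with `#` are comments. The domains below are placeholders:

```
# Contacted webmaster on 6/1/2013, no response
domain:spam-directory.example.pl
domain:link-farm.example.com
# A single bad URL can also be listed on its own
http://forum.example.net/profile/12345
```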

Don't be a Victim of Negative SEO!

Negative SEO works; it's a very real threat to all webmasters. Why spend the time, money, and resources building high quality links and content assets when you can work your way to the top by penalizing your competitors?
There are many unethical people out there; don't let them cause you to lose your site's visibility. Add backlink audits and link profile protection as part of your monthly SEO tasks to keep your site's traffic safe. It's no longer optional.

To Be Continued...

At this point, we're still working on link removals, so there is nothing conclusive to report yet on a recovery. However, once the process is complete, I plan to write a follow-up post here on SEW to share additional learnings and insights from this case.

Google Penguin 2.0 Update is Live

google-penguin-watch-out-webspam
Webmasters have been watching for Penguin 2.0 to hit the Google search results since Google's Distinguished Engineer Matt Cutts first announced that there would be the next generation of Penguin in March. Cutts officially announced that Penguin 2.0 is rolling out late Wednesday afternoon on "This Week in Google".
"It's gonna have a pretty big impact on web spam," Cutts said on the show. "It's a brand new generation of algorithms. The previous iteration of Penguin would essentially only look at the home page of a site. The newer generation of Penguin goes much deeper and has a really big impact in certain small areas."
In a new blog post, Cutts added more details on Penguin 2.0, saying that the rollout is now complete and affects 2.3 percent of English-U.S. queries, and that it affects non-English queries as well. Cutts wrote:
We started rolling out the next generation of the Penguin webspam algorithm this afternoon (May 22, 2013), and the rollout is now complete. About 2.3% of English-US queries are affected to the degree that a regular user might notice. The change has also finished rolling out for other languages world-wide. The scope of Penguin varies by language, e.g. languages with more webspam will see more impact.
This is the fourth Penguin-related launch Google has done, but because this is an updated algorithm (not just a data refresh), we’ve been referring to this change as Penguin 2.0 internally. For more information on what SEOs should expect in the coming months, see the video that we recently released.
Webmasters first got a hint that the next generation of Penguin was imminent when back on May 10 Cutts said on Twitter, “we do expect to roll out Penguin 2.0 (next generation of Penguin) sometime in the next few weeks though.”
Matt Cutts Tweets About Google Penguin
Then in a Google Webmaster Help video, Cutts went into more detail on what Penguin 2.0 would bring, along with what new changes webmasters can expect over the coming months with regards to Google search results.
He detailed that the new Penguin was specifically going to target black hat spam, and would have a significantly larger impact on spam than the original Penguin and subsequent Penguin updates have had.
Google's initial Penguin update originally rolled out in April 2012, and was followed by two data refreshes of the algorithm last year – in May and October.
Twitter is full of people commenting on the new Penguin 2.0, and there should be more information in the coming hours and days as webmasters compare SERPs that have been affected and what kinds of spam specifically got targeted by this new update.
Let us know if you've seen any significant changes, or if the update has helped or hurt your traffic/rankings in the comments.
UPDATE: Google has set up a Penguin Spam Report form.

Google Penguin Tightens the Noose on Manipulative Link Profiles [Report]

Portent, a Seattle-based Internet marketing agency, has released a report offering new insight into Google’s Penguin algorithm. The report, based on primary data gathered by the agency, suggested that Google has been “applying a stricter standard over time.”
penguin-links
In part, the report reads:

In the initial Penguin update, the only sites we saw penalized had link profiles comprised of more than 80 percent manipulative links. Within two months, Google lowered the bar to 65 percent. Then in October 2012, the net got much wider. Google began automatically and manually penalizing sites with 50 percent manipulative links.
Although the report refers to Penguin as a penalty, Penguin isn't a penalty. A penalty is a manual action taken against a site.
Yes, the Penguin update has demoted the rankings of sites, but as Google's Distinguished Engineer Matt Cutts has explained, Penguin is an algorithmic change, not a penalty. We explain this more in "Google Penalty or Algorithm Change: Dealing With Lost Traffic."
If Portent's findings are correct, then Google is likely becoming more confident with the accuracy of its Penguin algorithm in terms of minimizing false positives.
What does this mean for webmasters and SEO professionals? Continue to diligently clean up your inbound link profile.
Identify bad inbound links, then remove them or disavow them. Google’s next iteration of Penguin could lower the tolerance level for spammy inbound links even further; this might even be what Matt Cutts was referring to when he stated at this year’s SXSW that the next Penguin release would be significant and one of the more talked about Google algorithm updates this year.

Google Penguin, the Second (Major) Coming: How to Prepare

Unless you've had your head under a rock, you've undoubtedly heard the rumblings of a coming Google Penguin update of significant proportions.
To paraphrase Google’s web spam lead Matt Cutts, the algorithm filter has "iterated" to date, but a "next generation" is coming that will have a major impact on SERPs.
Having watched the initial rollout take many by surprise, it makes sense this time to at least attempt to prepare for what may be lurking around the corner.

Google Penguin: What We Know So Far

We know that Penguin is purely a link quality filter that sits on top of the core algorithm, runs sporadically (the last official update was in October 2012), and is designed to take out sites that use manipulative techniques to improve search visibility.
And while there have been many examples of this being badly executed, with lots of site owners and SEO professionals complaining of injustice, it is clear that web spam engineers have collected a lot of information over recent months and have improved results in many verticals.
That means Google's team is now on top of the existing data pile and is testing output; as a result, they are hungry for another major structural change to the way the filter works.
We also know that months of manual resubmissions and disavows have helped the Silicon Valley giant collect an unprecedented amount of data about the "bad neighborhoods" of links that had powered rankings until very recently, for thousands of high profile sites.
They have even been involved in specific and high profile web spam actions against sites like Interflora, working closely with internal teams to understand where links came from and watch closely as they were removed.
In short, Google’s new data pot makes most big data projects look like a school register! All the signs therefore point towards something much more intelligent and all-encompassing.
The question is how can you profile your links and understand the probability of being impacted as a result when Penguin hits within the next few weeks or months?
Let’s look at several evidence-based theories.

The Link Graph – Bad Neighborhoods

Google knows a lot about what bad links look like now. They know where a lot of them live and they also understand their DNA.
And once they start looking it becomes pretty easy to spot the links muddying the waters.
The link graph is a kind of network graph and is made up of a series of "nodes" or clusters. Clusters form around IPs and as a result it becomes relatively easy to start to build a picture of ownership, or association. An illustrative example of this can be seen below:
node-illustration
Google assigns weight or authority to links using its own PageRank currency, but like any currency it is limited and that means that we all have to work hard to earn it from sites that have, over time, built up enough to go around.
This means that almost all sites that use "manipulative" authority to rank higher will be getting it from an area or areas of the link graph associated with other sites doing the same. PageRank isn't limitless.
These "bad neighborhoods" can be "extracted" by Google, analyzed and dumped relatively easily to leave a graph that looks a little like this:
graph-extracted-bad-neighborhoods
They won’t disappear, but Google will devalue them and remove them from the PageRank picture, rendering them useless.
Expect this process to accelerate now that the search giant has so much data on "spammy links", with swathes of link profiles getting knocked out overnight.
The concern of course is that there will be collateral damage, but with any currency rebalancing, which is really what this process is, there will be winners and losers.

Link Velocity

Another area of interest at present is the rate at which sites acquire links. In recent months there has been a noticeable change in how new links are treated. While this is very much theory, my view is that Google has become very good at spotting link velocity "spikes", and anything out of the ordinary is immediately devalued.
Whether the devaluation is indefinite or limited by time (in the same way the "sandbox" works) I am not sure, but there are definite correlations between sites that earn links consistently and good ranking increases. Those that earn lots of links quickly do not get the same relative effect.
It would be relatively straightforward to move this into the Penguin model, if it isn't there already. The chart below shows an example of a "bumpy" link acquisition profile; anything above the "normalized" line could be devalued.
chart-ignore-links-above-this-line
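As a rough illustration of the idea (not Google's actual model), a spike detector could compare each month's new links against a trailing average, which plays the role of the "normalized" line; the window size and multiplier here are arbitrary assumptions:

```python
def velocity_spikes(monthly_new_links, window=3, multiplier=2.0):
    """Flag month indices whose new-link count exceeds `multiplier` times
    the trailing `window`-month average (the "normalized" baseline)."""
    spikes = []
    for i in range(window, len(monthly_new_links)):
        baseline = sum(monthly_new_links[i - window:i]) / window
        if baseline > 0 and monthly_new_links[i] > multiplier * baseline:
            spikes.append(i)
    return spikes

# Illustrative history: a steady ~50 links/month, then one burst of 300.
history = [40, 45, 50, 48, 52, 300, 55, 50]
print(velocity_spikes(history))  # -> [5]
```

A consistent profile produces no flags; only the months far above their own recent baseline stand out, which matches the observation that steadily earned links fare better than sudden bursts.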

Link Trust

The "trust" of a link is also something of interest to Google. Quality is one thing (how much juice the link carries), but trust is entirely another thing.
Majestic SEO has captured this reality best with the launch of its new Citation and Trust flow metrics to help identify untrusted links.
How is trust measured? In simple terms it is about good and bad neighborhoods again.
In my view Google uses its Hilltop algorithm, which identifies so-called "expert documents" (websites) across the web that are seen as shining beacons of trust and delight! The closer your site is to those documents, the better the neighborhood. It’s a little like living on the "right" road.
If your link profile contains a good proportion of links from trusted sites then that will act as a "shield" from future updates and allow some slack for other links that are less trustworthy.

Social Signals

Many SEO pros believe that social signals will play a more significant role in the next iteration of Penguin.
While social authority, as it is becoming known, makes a lot of sense in some markets, it also has limitations. Many verticals see little to no social interaction, and without big pots of social data, a system that qualifies link quality by the number of social shares across a site or piece of content can't work effectively.
In the digital marketing industry it would work like a dream but for others it is a non-starter, for now. Google+ is Google’s attempt to fill that void and by forcing as many people as possible to work logged in they are getting everyone closer to Plus and the handing over of that missing data.
In principle it is possible though that social sharing and other signals may well be used in a small way to qualify link quality.

Anchor Text

Most SEO professionals will point to anchor text as the key telltale metric when it comes to identifying spammy link profiles. The first Penguin rollout would undoubtedly have used this data to begin drilling down into link quality.
I asked a few prominent SEO professionals their opinions on what the key indicator of spam was in researching this post and almost all pointed to anchor text.
“When I look for spam the first place I look is around exact match anchor text from websites with a DA (domain authority) of 30 or less," said Distilled’s John Doherty. "That’s where most of it is hiding.”
His thoughts were backed up by Zazzle’s own head of search Adam Mason.
“Undoubtedly low value websites linking back with commercial anchors will be under scrutiny and I also always look closely at link trust,” Mason said.
The key is the relationship between branded and non-branded anchor text. Any natural profile would be heavily led by branded anchors (e.g., "www.example.com" or the brand name) and "white noise" anchors (e.g., "click here", "website", etc.).
The allowable percentage is tightening. A recent study by Portent found that the percentage of "allowable" spammy links has been reducing for months now, standing at around 80 percent pre-Penguin and 50 percent by the end of last year. The same is true of exact match anchor text ratios.
Expect this to tighten even more as Google’s understanding of what natural "looks like" improves.
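To see where your own profile stands, a simple classifier can bucket anchors into branded, "white noise", and money terms. This is a sketch only; the brand token and the white-noise phrase list are assumptions you would tailor to your own site:

```python
# Phrases commonly treated as "white noise" anchors (illustrative list).
WHITE_NOISE = {"click here", "website", "here", "read more", "this site"}

def anchor_profile(anchors, brand="example"):
    """Count branded, white-noise, and (by elimination) money anchors."""
    counts = {"branded": 0, "white_noise": 0, "money": 0}
    for a in anchors:
        a_lower = a.lower().strip()
        if brand in a_lower:
            counts["branded"] += 1
        elif a_lower in WHITE_NOISE:
            counts["white_noise"] += 1
        else:
            counts["money"] += 1  # likely exact/phrase-match commercial anchor
    return counts

anchors = ["Example.com", "click here", "cheap credit cards",
           "best credit cards online", "example", "website"]
print(anchor_profile(anchors))
```

A profile where the "money" bucket dominates is the telltale pattern the SEOs quoted above look for first; a natural profile skews heavily branded and white-noise.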

Relevancy

One area that will certainly be under the microscope as Google looks to improve its semantic understanding is relevancy. As it builds up a picture of relevant associations that data can be used to assign more weight to relevant links. Penguin will certainly be targeting links with no relevance in future.

Traffic Metrics

While traffic metrics probably fall more under Panda than Penguin, the lines between the two are increasingly blurring to a point where the two will shortly become indistinguishable. Panda has already been subsumed into the core algorithm and Penguin will follow.
On that basis Google could well look at traffic metrics such as visits from links and the quality of those visits based on user data.

Takeaways

No one is in a position to accurately predict what the next coming will look like, but what we can be certain of is that Google will turn the knife a little more, making link building in its former sense a riskier tactic than ever. As numerous posts have pointed out in recent months, it is now about earning links by contributing and adding value via content.
If I were asked what my money was on, I would say we will see a further tightening of the allowable level of spam, some attempt to begin measuring link authority by the neighborhood it comes from, and any associated social signals that come with it. The rate at which links are earned will also come under more scrutiny, and that means you should think about:
  • Understanding your link profile in much greater detail. Tools and data from companies such as Majestic, Ahrefs, CognitiveSEO, and others will become more necessary to mitigate risk.
  • Where your links come from, not just what level of apparent "quality" they have. Link trust is now a key metric.
  • Increasing the use of brand and "white noise" anchor text to remove obvious exact and phrase match anchor text problems.
  • Looking for sites that receive a lot of social sharing relative to your niche, and building those relationships.
  • Running backlink checks on the sites you get links from, to ensure their equity isn’t coming from bad neighborhoods, as that could pass to you.