Monday, September 2, 2013

4 Steps to Panda-Proof Your Website (Before It’s Too Late!)

It may be a new year, but that hasn’t stopped Google from rolling out yet another Panda refresh.
Last year Google unleashed the most aggressive campaign of major algo updates ever in its crusade to battle rank spam. This year looks to be more of the same.
Since Panda first hit the scene two years ago, thousands of sites have been mauled. SEO forums are littered with site owners who have seen six figure revenue websites and their entire livelihoods evaporate overnight, largely because they didn’t take Panda seriously.
If your site is guilty of transgressions that might provoke the Panda and you haven’t been hit yet, consider yourself lucky. But understand that it’s only a matter of time before you do get mauled. No doubt about it: Panda is coming for you.
Over the past year, we’ve helped a number of site owners recover from Panda. We’ve also worked with existing clients to Panda-proof their websites and (knock on wood) haven’t had a single site fall victim to Panda.
Based on what we’ve learned saving and securing sites, I’ve pulled together a list of steps and actions to help site owners Panda-proof websites that may be at risk.

Step 1: Purge Duplicate Content

Duplicate content issues have always plagued websites and SEOs. But with Panda, Google has taken a dramatically different approach to how they view and treat sites with high degrees of duplicate content. Where dupe content issues pre-Panda might hurt a particular piece of content, now duplicate content will sink an entire website.
So with that shift in attitude, site owners need to take duplicate content seriously. You must be hawkish about cleaning up duplicate content issues to Panda-proof your site.
Screaming Frog is a good choice when you want to identify duplicate pages. This article by Ben Goodsell offers a great tutorial on locating duplicate content issues.
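If you'd rather script a first pass yourself, the same idea can be sketched in a few lines of Python: normalize each page's visible text, hash it, and group URLs that share a fingerprint. This is a deliberately crude illustration (the tag-stripping regex and the `find_duplicates` helper are my own, not part of any tool mentioned here):

```python
import hashlib
import re
from collections import defaultdict

def normalized_fingerprint(html):
    """Strip tags, collapse whitespace, lowercase, then hash the text."""
    text = re.sub(r"<[^>]+>", " ", html)          # crude tag removal
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def find_duplicates(pages):
    """pages: {url: html}. Returns {fingerprint: [urls]} for dupe groups."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[normalized_fingerprint(html)].append(url)
    return {h: urls for h, urls in groups.items() if len(urls) > 1}
```

Any group the function returns is a set of URLs serving effectively identical body text, and a candidate for consolidation or canonicalization.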
Standard fixes for dupe content issues include 301 redirecting duplicate URLs to the preferred version, adding rel="canonical" tags, and noindexing pages that don’t belong in the index.
Now, cleaning up existing duplicate content issues is critical. But it’s just as important to take preventative measures as well. This means addressing the root cause of your duplicate content issues before they end up in the index. Yoast offers some great suggestions on how to avoid duplicate content issues altogether.

Step 2: Eradicate Low Quality, Low Value Content

Google’s objective with Panda is to help users find "high-quality" sites by diminishing the visibility (ranking power) of low-quality content, all of which is accomplished at scale, algorithmically. So weeding out low value content should be mission critical for site owners.
But the million dollar question we hear all the time is “what constitutes ‘low quality’ content?”
Google offered guidance on how to assess page-level quality, which is useful to help guide your editorial roadmap. But what about sites that host hundreds or thousands of pages, where evaluating every page by hand isn’t even remotely practical or cost-effective?
A much more realistic approach for larger sites is to look at user engagement signals that Google is potentially using to identify low-quality content. These would include key behavioral metrics such as:
  • Low to no visits.
  • Anemic unique page views.
  • Short time on page.
  • High bounce rates.
Of course, these metrics can be somewhat noisy and susceptible to external factors, but they’re the most efficient way to sniff out low value content at scale.
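As a rough illustration, screening an analytics export against thresholds like these takes only a few lines of Python. The column names and cutoffs below are hypothetical; adjust them to match your own export and traffic levels:

```python
import csv

# Illustrative thresholds only; tune them for your own site.
MIN_PAGEVIEWS = 10
MIN_TIME_ON_PAGE = 15.0   # seconds
MAX_BOUNCE_RATE = 0.90

def low_value_pages(rows):
    """rows: dicts with 'url', 'pageviews', 'avg_time', 'bounce_rate'.
    Returns URLs failing any engagement threshold."""
    flagged = []
    for row in rows:
        if (int(row["pageviews"]) < MIN_PAGEVIEWS
                or float(row["avg_time"]) < MIN_TIME_ON_PAGE
                or float(row["bounce_rate"]) > MAX_BOUNCE_RATE):
            flagged.append(row["url"])
    return flagged

def low_value_from_csv(path):
    """Run the screen over an analytics CSV export."""
    with open(path, newline="") as f:
        return low_value_pages(csv.DictReader(f))
```

The output is simply a shortlist of candidate pages for the purge-or-merge review below, not a verdict on its own.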
Some ways you can deal with these low value and poor performing pages include:
  • Deleting any content with low to no user engagement signals.
  • Consolidating the content of thin or shallow pages into thicker, more useful documents (i.e., “purge and merge”).
  • Adding additional internal links to improve visitor engagement (and deeper indexation). Tip: make sure these internal links point to high-quality content on your site.
One additional type of low quality content that often gets overlooked is pagination. Proper pagination is highly effective at distributing link equity throughout your site. But high ratios of paginated archives, comments and tag pages can also dilute your site’s crawl budget, cause indexation cap issues and negatively tip the scales of high- to low-value content ratios on your site.
Tips for Panda-proofing pagination include implementing rel="next" and rel="prev" markup, offering a canonicalized "view all" page, and noindexing low-value paginated archives such as comment and tag pages.

Step 3: Thicken-Up Thin Content

Google hates thin content. And this disdain isn’t reserved for spammy scraper sites or thin affiliates only. It’s also directed at sites with little or no original content (i.e., another form of “low value” content).
One of the riskiest content types we see frequently on client sites is thin directory-style pages. These are aggregate feed pages you’d find on ecommerce sites (at both page and category level); sites with city, state and ZIP code directory pages (think hotel and travel sites); and event location listings (think ticket brokers). Many sites host thousands of these page types, which, other than a big list of hyperlinks, have little to no content.
Unlike other low-value content traps, these directory pages are often instrumental in site usability and helping users navigate to deeper content. So deleting them or merging them isn’t an option.
Instead, the best strategy here is to thicken up these thin directory pages with original content. Some recommendations include:
  • Drop a thousand words of original, value-add content on the page in an effort to treat each page as a comprehensive guide on a specific topic.
  • Pipe in API data and content mash-ups (excellent when you need to thicken hundreds or thousands of pages at scale).
  • Encourage user reviews.
  • Add images and videos.
  • Move thin pages off to subdomains, which Google hints at. Though we use this as more of a “stop gap” approach for sites that have been mauled by Panda and are trying to rebound quickly, rather than a long-term, sustainable strategy.
It’s worth noting that these recommendations can be applied to most types of thin content pages. I’m just using directory style pages as an example because we see them so often.
When it comes to discovering thin content issues at scale, take a look at word count. If you’re running WordPress, there are plugins that will assess word count for every document on your site, and a number of all-purpose SEO plugins can help in the war against Panda.
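Outside of WordPress, a rough word-count screen is easy to script yourself. The sketch below strips markup and flags pages under an arbitrary threshold; the 300-word cutoff is purely illustrative, not a number from Google:

```python
import re

THIN_THRESHOLD = 300  # words; an illustrative cutoff, not a Google number

def word_count(html):
    """Count words in the visible text of a page (scripts/styles removed)."""
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html,
                  flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", text)
    return len(text.split())

def thin_pages(pages):
    """pages: {url: html}. Return URLs under the word-count threshold."""
    return [url for url, html in pages.items()
            if word_count(html) < THIN_THRESHOLD]
```

As with the engagement metrics earlier, treat the output as a review queue: a short page isn't automatically low quality, but it's worth a look.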
All in all, we’re seeing documents that have been thickened up get a nice boost in rankings and SERP visibility. And this boost isn’t a temporary QDF bump. In the majority of cases, when thickening up thin pages, we’re seeing permanent ranking improvements over competitor pages.

Step 4: Develop High-Quality Content

On the flipside of fixing low or no-value content issues, you must adopt an approach of only publishing the highest quality content on your site. For many sites, this is a total shift in mindset, but nonetheless raising your content publishing standards is essential to Panda-proofing your site.
Google describes “quality content” as “content that you can send to your child to learn something.” That’s a little vague, but to me it says two distinct things:
  • Your content should be highly informative.
  • Your content should be easy to understand (easy enough that a child can comprehend it).
For a really in-depth look at “What Google Considers Quality Content,” check out Brian Ussery’s excellent analysis.
When publishing content on our own sites, we ask ourselves a few simple quality control questions:
  • Does this content offer value?
  • Is this content you would share with others?
  • Would you link to this content as an informative resource?
If a piece of content doesn’t meet these basic criteria, we work to improve it until it does.
Now, when it comes to publishing quality content, many site owners don’t have the good fortune of having industry experts in house and internal writing resources at their disposal. In those cases, you should consider outsourcing your content generation to the pros.
Some of the most effective ways we use to find professional, authoritative authors include:
  • Placing an ad on Craigslist and conducting a “competition.” Despite what the critics say, this method works really well and you can find some excellent, cost-effective talent. “How to Find Quality Freelance Authors on Craigslist” will walk you through the process.
  • Reaching out to influential writers in your niche with columns on high profile pubs. Most of these folks do freelance work and are eager to take on new projects. You can find these folks with search operators like [intitle:“your product niche” intext:“meet our bloggers”] or [intitle:“your product niche” intext:“meet our authors”] since many blogs publish an author’s profile page.
  • Targeting published authors on Amazon.com is a fantastic way to find influential authors who have experience writing on topics in your niche.
Apart from addressing writing resource deficiencies, hiring topic experts or published authors brings instant subject-matter authority and credibility to your content.
Finally, I wanted to address the issue of frequency in publishing quality content. Ask yourself this: are you publishing content every day on your blog, sometimes twice a day? If so, ask yourself “why?”
Is it because you read on a popular marketing blog that cranking out blog posts each and every day is a good way to target trending topics and popular terms, and flood the index with content that will rank in hundreds of relevant mid-tail verticals?
If this is your approach, you might want to rethink it. In fact, I’d argue that 90 percent of sites that use this strategy should slow down and publish better, longer, meatier content less frequently.
In a race to “publish every day!!!” you’re potentially polluting the SERPs with quick, thin, low value posts and dragging down the overall quality score of your entire site. So if you fall into this camp, definitely stop and think about your approach. Test the efficacy of fewer, thicker posts vs short-form “keyword chasing” articles.

Panda-Proofing Wrap Up

Bottom line: get your site in shape before it’s too late. Why risk being susceptible to every Panda update when Armageddon is entirely avoidable?
The SEO and affiliate forums are littered with site owners who continue to practice the same low value tactics in spite of the clear dangers because they were cheap and they worked. But look at those sites now. Don’t make the same mistake.

Google Rolling Out First Panda Refresh of 2013 Today

Beware the Panda. According to a tweet from the official @Google Twitter account this morning, a new data refresh is rolling out today.
This update, according to the notice, should only affect 1.2 percent of English language queries. No other information is available so far.
[Image: Google's Panda tweet, January 22, 2013]
This is the first Panda data refresh of 2013. It also marks the third consecutive month of Panda data updates.
The first Panda update was nearly two years ago in February 2011. Google's stated goal of Panda is to reward "high-quality sites."
While Google has never formally defined what a "high-quality site" is, Google has its own list of bullet points on their blog post from early 2011. The rationale has always been the same: to find more high-quality sites in search.

Google Goes Boom on Low-Quality Sites...So They Say

Chances are good that you or someone you know has seen some ranking changes today as Google rolled out a new algorithmic update. This follows recent announcements aimed at "low quality sites" (which many interpret to mean content farms); less than two weeks ago, Google stated it was exploring new methods to detect spam.
"This update is designed to reduce rankings for low-quality sites--sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites--sites with original content and information such as research, in-depth reports, thoughtful analysis and so on," Google announced last night.
No one can say this one came out of left field. Google launched an algorithm tweak in January to combat spam and scraper sites, though that affected a much smaller number of sites.
This IS a big one. We're talking 11.8% across the board. Now, the big question is did they do it right?
From the looks of it, Google is not simply devaluing sites serving duplicated content; they are going after sites with specific types of backlinks and spying through Chrome extensions, and this is only within the first 24 hours! More will become clear once site owners see drastic changes in their traffic stats.
As with every major Google update, SEO forums are dedicating a thread to this and they are filling up fast with reactions and reports. Since BackLinkForum.com tends to have the skilled gray/blackhat crowd, and because this update is only happening in the U.S. (for now), BLF is a great place to see what is really happening down in the trenches.
Two possible things happening there worth noting:
  1. Sites with the majority of their backlink profiles consisting of profile links could be a target.
  2. Not every content farm was red-flagged. This may have been a follow-up to the scraper update, extended to bigger content farm sites. Similar to the recent Blekko update.
[Screenshot: eHow and wiki pages still ranking high]
Although coming to a conclusion about the update within 24 hours is extremely risky, I would be willing to bet that this is targeting self-service linking as much as content farms.
However, sites like eHow, Answers.com, and even low-level scraper sites still seem to be saturating the SERPs. That leaves me asking, "Who was penalized then?"
As with any Google algorithm changes, some innocent sites are going to be slammed. Some SEOs have reported seeing 40 percent traffic drops to their sites.
This latest update may just be more evidence that Google simply can't distinguish between "good" and "bad" content.
Let us know what you're seeing today -- the good, bad, and disastrous.

Blekko Removes Content Farms From Search Results

In an effort to combat web spam, Blekko will block from its search results 20 of the worst-offending, SERP-clogging content farms, including Demand Media's eHow and Answerbag, TechCrunch reports. The list of barred sites is as follows:

  • ehow.com
  • experts-exchange.com
  • naymz.com
  • activehotels.com
  • robtex.com
  • encyclopedia.com
  • fixya.com
  • chacha.com
  • 123people.com
  • download3k.com
  • petitionspot.com
  • thefreedictionary.com
  • networkedblogs.com
  • buzzillions.com
  • shopwiki.com
  • wowxos.com
  • answerbag.com
  • allexperts.com
  • freewebs.com
  • copygator.com
[Image: Blekko's spam clock]
Blekko seems to be taking spam seriously. Last month, the newest search engine introduced the spam clock, which announced that 1 million new spam pages are created every hour. As of this morning, the total number of spam pages was at 750 million and counting (though Blekko admits the clock is "illustrative more than scientifically accurate").
The reasoning for the spam clock, according to Blekko CEO Rich Skrenta:
"Millions upon millions of pages of junk are being unleashed on the web, a virtual torrent of pages designed solely to generate a few pennies in ad revenue for its creator. I fear that we are approaching a tipping point, where the volume of garbage soars beyond and overwhelms the value of what is on the web."
So Blekko seems to be doing its small part for cleaning up its own search results.
Meanwhile, Google has also announced an algorithm change to combat spam. But as Mike Grehan notes in his column today "The Google Spam-Jam," spam "is a problem that Google has had from day one and it's not likely to go away anytime soon" with its current search model.

Google's War on Spam Begins: New Algorithm Live

Google's Matt Cutts today announced the launch of a new algorithm that is intended to better detect and reduce spam in Google's search results and lower the rankings of scraper sites and sites with little original content. Google's main target is sites that copy content from other sites and offer little useful, original content of their own.
Posting on Hacker News, Cutts wrote:
"The net effect is that searchers are more likely to see the sites that wrote the original content. An example would be that stackoverflow.com will tend to rank higher than sites that just reuse stackoverflow.com's content. Note that the algorithmic change isn't specific to stackoverflow.com though."
On his blog, Cutts wrote:
This was a pretty targeted launch: slightly over 2% of queries change in some way, but less than half a percent of search results change enough that someone might really notice. The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site's content.
Cutts said the change was approved last Thursday and launched earlier this week. Cutts announced Google's intention to up the fight against spam in an Official Google Blog post last Friday.
In response to criticism that Google's results were deteriorating and seeing more spam in recent months, Cutts said a newly redesigned document-level classifier will better detect repeated spammy words, such as those found in "junky" automated, self-promoting blog comments. He also said that spam levels today are much better than five years ago.
At Webmaster World, there is discussion about big drops in traffic. Are you seeing any changes as a result of this change?


Negative SEO Case Study: How to Uncover an Attack Using a Backlink Audit

Ever since Google launched the Penguin update back in April 2012, the SEO community has debated the impact of negative SEO, a practice whereby competitors can point hundreds or thousands of negative backlinks at a site with the intention of causing harm to organic search rankings or even completely removing a site from Google's index. Just jump over to Fiverr and you can find many gigs offering thousands of wiki links, or directory links, or many other types of low-quality links for $5.
By creating the Disavow Links tool, Google acknowledged this very real danger and gave webmasters a tool to protect their sites. Unfortunately, most people wait until it's too late to use the Disavow tool; they look at their backlink profile and disavow links after they've been penalized by Google. In reality, the Disavow Links tool should be used before your website suffers in the SERPs.
Backlink audits have to be added to every SEO professional's repertoire. These are as integral to SEO as keyword research, on-page optimization, and link building. In the same way that a site owner builds links to create organic rankings, now webmasters also have to monitor their backlink profile to identify low quality links as they appear and disavow them as quickly as they are identified.
Backlink audits are simple: download your backlinks from your Google Webmaster account, or from a backlink tool, and keep an eye on the links pointing to your site. What is the quality of those links? Do any of the links look fishy?
As soon as you identify fishy links, you can then try to remove the links by emailing the webmaster. If that doesn't work, head to Google's disavow tool and disavow those links. For people looking to protect their sites from algorithmic updates or penalties, backlink audits are now a webmaster's best friend.
If your website has suffered from lost rankings and search traffic, here's a method to determine whether negative SEO is to blame.

A Victim of Negative SEO?

[Chart: Google Analytics traffic, 2012 vs. 2013]
A few weeks ago I received an email from a webmaster whose Google organic traffic dropped by almost 50 percent within days of Penguin 2.0. He couldn't understand why, given that he'd never engaged in SEO practices or link building. What could've caused such a massive decrease in traffic and rankings?
The site is a 15-year-old finance magazine with thousands of news stories and analysis, evergreen articles, and nothing but organic links. For over a decade it has ranked quite highly for very generic informational financial keywords – everything from information about the economies of different countries, to very detailed specifics about large corporations.
With a long tail of over 70,000 keywords, it's a site that truly adds value to the search engine results and has always used content to attract links and high search engine rankings.
The site received no notifications from Google. They simply saw a massive decrease in organic traffic starting May 22, which leads me to believe they were impacted by Penguin 2.0.
In short, he did exactly what Google preaches as safe SEO. Great content, great user experience, no manipulative link practices, and nothing but value.
So what happened to this site? Why did it lose 50 percent of its organic traffic from Google?

Backlink Audit

I started by running a LinkDetox report to analyze the backlinks. Immediately I knew something was wrong:
[Screenshot: Link Detox average risk score of 1,251 ("Deadly Risk")]
Upon further investigation, 55 percent of his links were suspicious, while 7 percent (almost 500) of the links were toxic:
[Chart: toxic, suspicious, and healthy links]
So the first step was to research those 7 percent toxic links, how they were acquired, and what types of links they were.
In LinkDetox, you can segment by Link Type, so I was able to first view only the links that were considered toxic. According to Link Detox, toxic links are links from domains that aren't indexed in Google, as well as links from domains whose theme is listed as malware, malicious, or having a virus.
Immediately I noticed that he had many links from sites that ended in .pl. The anchor text of the links was the title of the page that they linked to.
It seemed that the sites targeted "credit cards", which is very loosely in this site's niche. It was easy to see that these were scraped links to be spun and dropped on spam URLs. I also saw many domains that had expired and were re-registered for the purpose of creating content sites for link farms.
Also, check out the spike in backlinks:
[Chart: spike in backlinks]
From this I knew that most of the toxic links were spam, and links that were not generated by the target site. I also saw many links to other authority sites, including entrepreneur.com and venturebeat.com. It seems that this site was classified as an "authority site" and was being used by spammers as a way of adding authority links to their outbound link profiles.

Did Penguin Cause the Massive Traffic Loss?

I further investigated the backlink profile, checking for other red flags.
His Money vs Brand ratio looked perfectly healthy:
[Chart: money vs. brand keywords]
His ratio of "Follow" links was a little high, but this was to be expected given the source of his negative backlinks:
[Chart: follow vs. nofollow links]
Again, he had a slightly elevated number of text links as compared to competitors, which was another minor red flag:
[Chart: text links vs. competitors]
One finding that was quite significant was his Deep Link Ratio, which was much too high when compared with others in his industry:
[Chart: deep link ratio]
In terms of authority, his link distribution by SEMrush keyword rankings was average when compared to competitors:
[Chart: link distribution by SEMrush keyword rankings]
Surprisingly, his backlinks had better TitleRank than competitors, meaning that the target site's backlinks ranked for their exact match title in Google – an indication of trust:
[Chart: TitleRank metric comparison]
Penalized sites don't rank for their exact match title.
The final area of analysis was the PageRank distribution of the backlinks:
[Chart: link profile by Google PageRank]
Even though he has a great number of high quality links, the percentage of links that aren't indexed in Google is substantial: close to 65 percent of the site's backlinks aren't indexed in Google.
In most cases, this indicates poor link building strategies, and is a typical profile for sites that employ spam link building tactics.
In this case, the high quantity of links from pages that are penalized, or not indexed in Google, was a case of automatic links built by spammers!
As a result of having a prominent site that was considered by spammers to be an authority in the finance field, this site suffered a massive decrease in traffic from Google.

Avoid Penguin & Unnatural Link Penalties

A backlink audit could've prevented this site from being penalized by Google and losing close to 50 percent of its traffic. If a backlink audit had been conducted, the site owner could've disavowed these spam links, performed outreach to get them removed, and documented his efforts in case of future problems.
If the toxic links had been disavowed, all of the ratios would've been normalized and this site would've never been pegged as spam and penalized by Penguin.

Backlink Audits

Whatever tool you use (whether it's Ahrefs, LinkDetox, or OpenSiteExplorer), it's important that you run and evaluate your links on a monthly basis. Once you have the links, make sure you have metrics for each of the links in order to evaluate their health.
Here's what to do:
  • Identify all the backlinks from sites that aren't indexed in Google. If they aren't indexed in Google, there's a good chance they are penalized. Take a manual look at a few to make sure nothing else is going on (e.g., perhaps they just moved to a new domain, or there's an error in reporting). Add all the N/A sites to your file.
  • Look for backlinks from link or article directories. These are fairly easy to identify. LinkDetox will categorize those automatically and allow you to filter them out. Scan each of these to make sure you don't throw out the baby with the bathwater, as perhaps a few of these might be healthy.
  • Identify links from sites that may be virus infected or have malware. These are identified as Toxic 2 in LinkDetox.
  • Look for paid links. Google has long been at war with link buying and it's an obvious target. Find any links that have been paid and add them to the list. You can find these by sorting the results by PageRank descending. Evaluate all the high PR links as those are likely the ones that were purchased. Look at each and every one of the high quality links to assess how they were acquired. It's almost always pretty obvious if the link was organic or purchased.
  • Take the list of backlinks and run it through the Juice Tool to scan for other red flags. One of my favorite metrics to evaluate is TitleRank. Generally, pages that aren't ranking for their exact match title have a good chance of carrying a functional penalty or lacking authority. In the Juice report, you can see the exact title to determine if it's a valid title (for example, if the title is "Home", of course they won't rank for it, penalty or not). If the TitleRank is 30+, review that link with a quick check, and if the site looks spammy, add it to your "Bad Links" file. Do a quick scan for other factors, such as PageRank and DomainAuthority, to see if anything else seems out of place.
By the end of this stage, you'll have a spreadsheet with the most harmful backlinks to a site.
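As a sketch of how that spreadsheet might be assembled, the snippet below reads a hypothetical backlink export and writes out only the flagged rows. The column names (`indexed`, `link_type`, `toxicity`, `paid`) are stand-ins for whatever fields your audit tool actually exports:

```python
import csv

def flag_backlink(row):
    """Apply the audit checks above to one backlink record.
    Returns a list of reasons the link was flagged (empty = healthy)."""
    reasons = []
    if row.get("indexed") == "no":
        reasons.append("source domain not indexed in Google")
    if row.get("link_type") in ("link_directory", "article_directory"):
        reasons.append("directory link")
    if row.get("toxicity") == "toxic":
        reasons.append("malware/virus-flagged domain")
    if row.get("paid") == "yes":
        reasons.append("suspected paid link")
    return reasons

def write_bad_links(in_path, out_path):
    """Read an exported backlink CSV and write flagged rows to a new file."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["url", "reasons"])
        for row in reader:
            reasons = flag_backlink(row)
            if reasons:
                writer.writerow([row["url"], "; ".join(reasons)])
```

The manual review steps (spot-checking N/A domains, eyeballing high-PR links) still matter; a script like this only narrows the pile.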
Upload this disavow file to make sure the worst of your backlinks aren't harming your site. Then be sure to upload the same disavow file when performing further tests in Link Detox, as excluding these domains will affect your ratios.
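Generating the file itself is straightforward, since Google's disavow format is plain text: lines starting with `#` are comments, `domain:` lines disavow an entire domain, and bare URLs disavow individual pages. A minimal helper (my own sketch, not part of any audit tool):

```python
def build_disavow_file(bad_urls, bad_domains):
    """Emit text in Google's disavow format: '#' comment lines,
    'domain:' entries for whole domains, bare URLs for single pages."""
    lines = ["# Disavow file generated from backlink audit"]
    lines += ["domain:{}".format(d) for d in sorted(bad_domains)]
    lines += sorted(bad_urls)
    return "\n".join(lines) + "\n"
```

Prefer `domain:` entries for link farms and spam networks; disavowing individual URLs leaves the rest of the offending domain free to keep linking.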

Don't be a Victim of Negative SEO!

Negative SEO works; it's a very real threat to all webmasters. Why spend the time, money, and resources building high quality links and content assets when you can work your way to the top by penalizing your competitors?
There are many unethical people out there; don't let them cause you to lose your site's visibility. Add backlink audits and link profile protection as part of your monthly SEO tasks to keep your site's traffic safe. It's no longer optional.

To Be Continued...

At this point, we're still working on link removals, so there is nothing conclusive to report yet on a recovery. However, once the process is complete, I plan to write a follow-up post here on SEW to share additional learnings and insights from this case.

Google Penguin 2.0 Update is Live

Webmasters have been watching for Penguin 2.0 to hit the Google search results since Google's Distinguished Engineer Matt Cutts first announced in March that a next generation of Penguin was coming. Cutts officially announced that Penguin 2.0 was rolling out late Wednesday afternoon on "This Week in Google".
"It's gonna have a pretty big impact on web spam," Cutts said on the show. "It's a brand new generation of algorithms. The previous iteration of Penguin would essentially only look at the home page of a site. The newer generation of Penguin goes much deeper and has a really big impact in certain small areas."
In a new blog post, Cutts added more details on Penguin 2.0, saying that the rollout is now complete and affects 2.3 percent of English-U.S. queries, and that it affects non-English queries as well. Cutts wrote:
We started rolling out the next generation of the Penguin webspam algorithm this afternoon (May 22, 2013), and the rollout is now complete. About 2.3% of English-US queries are affected to the degree that a regular user might notice. The change has also finished rolling out for other languages world-wide. The scope of Penguin varies by language, e.g. languages with more webspam will see more impact.
This is the fourth Penguin-related launch Google has done, but because this is an updated algorithm (not just a data refresh), we’ve been referring to this change as Penguin 2.0 internally. For more information on what SEOs should expect in the coming months, see the video that we recently released.
Webmasters first got a hint that the next generation of Penguin was imminent when back on May 10 Cutts said on Twitter, “we do expect to roll out Penguin 2.0 (next generation of Penguin) sometime in the next few weeks though.”
[Image: Matt Cutts tweet about Google Penguin]
Then in a Google Webmaster Help video, Cutts went into more detail on what Penguin 2.0 would bring, along with what new changes webmasters can expect over the coming months with regards to Google search results.
He detailed that the new Penguin was specifically going to target black hat spam, and would have a significantly larger impact on spam than the original Penguin and subsequent Penguin updates have had.
Google's initial Penguin update originally rolled out in April 2012, and was followed by two data refreshes of the algorithm last year – in May and October.
Twitter is full of people commenting on the new Penguin 2.0, and there should be more information in the coming hours and days as webmasters compare SERPs that have been affected and what kinds of spam specifically got targeted by this new update.
Let us know if you've seen any significant changes, or if the update has helped or hurt your traffic/rankings in the comments.
UPDATE: Google has set up a Penguin Spam Report form.

Google Penguin Tightens the Noose on Manipulative Link Profiles [Report]

Portent, a Seattle-based Internet marketing agency, has released a report offering new insight into Google’s Penguin algorithm. The report, based on primary data gathered by the agency, suggested that Google has been “applying a stricter standard over time.”
In part, the report reads:

In the initial Penguin update, the only sites we saw penalized had link profiles comprised of more than 80 percent manipulative links. Within two months, Google lowered the bar to 65 percent. Then in October 2012, the net got much wider. Google began automatically and manually penalizing sites with 50 percent manipulative links.
Although the report refers to Penguin as a penalty, Penguin isn't a penalty. A penalty is a manual action taken against a site.
Yes, the Penguin update has demoted the rankings of sites, but as Google's Distinguished Engineer Matt Cutts has explained, Penguin is an algorithmic change, not a penalty. We explain this more in "Google Penalty or Algorithm Change: Dealing With Lost Traffic."
If Portent's findings are correct, then Google is likely becoming more confident with the accuracy of its Penguin algorithm in terms of minimizing false positives.
What does this mean for webmasters and SEO professionals? Continue to diligently clean up your inbound link profile.
Identify bad inbound links, then remove them or disavow them. Google’s next iteration of Penguin could lower the tolerance level for spammy inbound links even further; this might even be what Matt Cutts was referring to when he stated at this year’s SXSW that the next Penguin release would be significant and one of the more talked about Google algorithm updates this year.
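To make the Portent numbers concrete, here is a minimal sketch (with an invented link profile and an invented "manipulative" flag; Google's actual scoring is unknown) of comparing a profile's manipulative-link share against the tolerance levels the report describes:

```python
# A toy sketch, not Google's actual scoring: compute what share of a link
# profile is flagged as manipulative, then compare it against the tolerance
# levels Portent reports for each Penguin iteration.

def manipulative_share(links):
    """Fraction of links flagged manipulative, from 0.0 to 1.0."""
    if not links:
        return 0.0
    return sum(1 for link in links if link["manipulative"]) / len(links)

# Tolerance per update, per the Portent report
THRESHOLDS = {"Penguin 1.0": 0.80, "mid-2012 refresh": 0.65, "Oct 2012": 0.50}

# Invented profile: 6 of 10 links flagged as manipulative
profile = [{"url": "http://site%d.example" % i, "manipulative": i < 6}
           for i in range(10)]

share = manipulative_share(profile)                        # 0.6
at_risk = {name: share > bar for name, bar in THRESHOLDS.items()}
# 60% manipulative: under the early 80%/65% bars, over the October 50% bar
```

Under the report's reading, the same profile that survived Penguin 1.0 untouched would now put a site at risk.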

Google Penguin, the Second (Major) Coming: How to Prepare

Unless you've had your head under a rock, you've undoubtedly heard the rumblings of a coming Google Penguin update of significant proportions.
To paraphrase Google’s web spam lead Matt Cutts, the algorithm filter has "iterated" to date, but a "next generation" is coming that will have a major impact on SERPs.
Having watched the initial rollout take many by surprise, it makes sense this time to at least attempt to prepare for what may be lurking around the corner.

Google Penguin: What We Know So Far

We know that Penguin is purely a link quality filter that sits on top of the core algorithm, runs sporadically (the last official update was in October 2012), and is designed to take out sites that use manipulative techniques to improve search visibility.
And while there have been many examples of this being badly executed, with lots of site owners and SEO professionals complaining of injustice, it is clear that web spam engineers have collected a lot of information over recent months and have improved results in many verticals.
That means Google's team is now on top of the existing data pile and testing its output, and as a result is hungry for another major structural change to the way the filter works.
We also know that months of manual resubmissions and disavows have helped the Silicon Valley giant collect an unprecedented amount of data about the "bad neighborhoods" of links that had powered rankings until very recently, for thousands of high profile sites.
They have even been involved in specific and high profile web spam actions against sites like Interflora, working closely with internal teams to understand where links came from and watch closely as they were removed.
In short, Google’s new data pot makes most big data projects look like a school register! All the signs therefore point toward something much more intelligent and all-encompassing.
The question is how can you profile your links and understand the probability of being impacted as a result when Penguin hits within the next few weeks or months?
Let’s look at several evidence-based theories.

The Link Graph – Bad Neighborhoods

Google knows a lot about what bad links look like now. They know where a lot of them live and they also understand their DNA.
And once they start looking it becomes pretty easy to spot the links muddying the waters.
The link graph is a kind of network graph and is made up of a series of "nodes" or clusters. Clusters form around IPs and as a result it becomes relatively easy to start to build a picture of ownership, or association. An illustrative example of this can be seen below:
[Image: node-illustration]
Google assigns weight or authority to links using its own PageRank currency, but like any currency it is limited and that means that we all have to work hard to earn it from sites that have, over time, built up enough to go around.
This means that almost all sites that use "manipulative" authority to rank higher will be getting it from an area or areas of the link graph associated with other sites doing the same. PageRank isn't limitless.
These "bad neighborhoods" can be "extracted" by Google, analyzed and dumped relatively easily to leave a graph that looks a little like this:
[Image: graph-extracted-bad-neighborhoods]
They won’t disappear, but Google will devalue them and remove them from the PageRank picture, rendering them useless.
Expect this process to accelerate now that the search giant has so much data on "spammy links", with swathes of link profiles getting knocked out overnight.
The concern of course is that there will be collateral damage, but with any currency rebalancing, which is really what this process is, there will be winners and losers.
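As a toy illustration of the neighborhood idea, you can group the domains linking to you by the C-class of their hosting IP; dense clusters on a single C-class are the kind of ownership or association pattern described above. All domains, IPs, and the three-domain cutoff below are invented for the example:

```python
# Group linking domains by C-class (first three octets of the IP) and flag
# any C-class hosting several of them as a suspicious "neighborhood".
from collections import defaultdict

backlinks = [
    ("blog-a.example", "192.0.2.10"),
    ("blog-b.example", "192.0.2.11"),
    ("blog-c.example", "192.0.2.12"),
    ("news-site.example", "203.0.113.5"),
]

def c_class(ip):
    """'192.0.2.10' -> '192.0.2'"""
    return ".".join(ip.split(".")[:3])

clusters = defaultdict(list)
for domain, ip in backlinks:
    clusters[c_class(ip)].append(domain)

# Three or more linking domains on one C-class looks like a cluster
suspicious = {net: doms for net, doms in clusters.items() if len(doms) >= 3}
```

Real link-profile tools do far more than this, but even a crude grouping like the above will surface networks of links living at the same address.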

Link Velocity

Another area of interest at present is the rate at which sites acquire links. In recent months there has definitely been a noticeable change in how new links are treated. While this is very much theory, my view is that Google has become very good at spotting link velocity "spikes", and anything out of the ordinary is immediately devalued.
Whether this devaluation is indefinite or limited by time (in the same way the "sandbox" works) I am not sure, but there are definite correlations between sites that earn links consistently and good ranking increases. Those that earn lots of links quickly do not get the same relative effect.
And it would be relatively straightforward to move this into the Penguin model, if it isn't there already. The chart below shows an example of a "bumpy" link acquisition profile; anything above the "normalized" line could be devalued.
[Image: chart-ignore-links-above-this-line]
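The "normalized" line idea can be sketched as a simple spike detector: compare each month's new-link count to a trailing average and flag anything far above it. The counts, window size, and 2x multiplier below are assumptions for illustration, not anything Google has published:

```python
# Sketch of a link velocity "spike" detector: compare each month's new-link
# count to a trailing average (the "normalized" line) and flag months that
# jump far above it.

def flag_spikes(monthly_counts, window=3, multiplier=2.0):
    """Return indices of months whose count exceeds multiplier x trailing mean."""
    spikes = []
    for i, count in enumerate(monthly_counts):
        history = monthly_counts[max(0, i - window):i]
        if not history:
            continue  # no baseline yet for the first month
        baseline = sum(history) / len(history)
        if count > baseline * multiplier:
            spikes.append(i)
    return spikes

counts = [40, 45, 42, 300, 50, 48]   # month 3 is a sudden burst of links
spike_months = flag_spikes(counts)   # only month 3 gets flagged
```

A steady earner never trips the threshold; a burst of 300 links after months of 40-45 stands out immediately, which is exactly the pattern the theory says gets devalued.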

Link Trust

The "trust" of a link is also something of interest to Google. Quality is one thing (how much juice the link carries), but trust is entirely another thing.
Majestic SEO has captured this reality best with the launch of its new Citation and Trust flow metrics to help identify untrusted links.
How is trust measured? In simple terms it is about good and bad neighborhoods again.
In my view Google uses its Hilltop algorithm, which identifies so-called "expert documents" (websites) across the web, which are seen as shining beacons of trust and delight! The closer your site is to those documents the better the neighborhood. It’s a little like living on the "right" road.
If your link profile contains a good proportion of links from trusted sites then that will act as a "shield" from future updates and allow some slack for other links that are less trustworthy.

Social Signals

Many SEO pros believe that social signals will play a more significant role in the next iteration of Penguin.
While social authority, as it is becoming known, makes a lot of sense in some markets, it also has limitations. Many verticals see little to no social interaction, and without big pots of social data a system that qualifies link quality by the number of social shares across a site or piece of content can't work effectively.
In the digital marketing industry it would work like a dream, but for others it is a non-starter, for now. Google+ is Google’s attempt to fill that void: by forcing as many people as possible to work logged in, it brings everyone closer to Plus and to handing over that missing data.
In principle, though, it is possible that social sharing and other signals may well be used in a small way to qualify link quality.

Anchor Text

Most SEO professionals will point to anchor text as the key telltale metric when it comes to identifying spammy link profiles. The first Penguin rollout would undoubtedly have used this data to begin drilling down into link quality.
I asked a few prominent SEO professionals their opinions on what the key indicator of spam was in researching this post and almost all pointed to anchor text.
“When I look for spam the first place I look is around exact match anchor text from websites with a DA (domain authority) of 30 or less," said Distilled’s John Doherty. "That’s where most of it is hiding.”
His thoughts were backed up by Zazzle’s own head of search Adam Mason.
“Undoubtedly low value websites linking back with commercial anchors will be under scrutiny and I also always look closely at link trust,” Mason said.
The key is the relationship between branded and non-branded anchor text. Any natural profile would be heavily led by branded anchors (e.g., "www.example.com" or the brand name) and "white noise" anchors (e.g., "click here", "website", etc.).
The allowable percentage is tightening. A recent study by Portent found that the percentage of "allowable" spammy links has been reducing for months now, standing at around 80 percent pre-Penguin and 50 percent by the end of last year. The same is true of exact match anchor text ratios.
Expect this to tighten even more as Google’s understanding of what natural "looks like" improves.
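A rough way to profile your own anchors along these lines is to bucket them as branded, "white noise", or commercial, then compute the commercial share. The brand terms, noise phrases, and anchor list below are invented for illustration:

```python
# Bucket anchor texts into branded / white-noise / commercial and compute
# the commercial share, the ratio the article says Google is tightening.

BRAND_TERMS = {"example", "example.com", "www.example.com"}
WHITE_NOISE = {"click here", "website", "here", "read more", "this site"}

def anchor_profile(anchors):
    counts = {"branded": 0, "white_noise": 0, "commercial": 0}
    for anchor in anchors:
        a = anchor.lower().strip()
        if any(term in a for term in BRAND_TERMS):
            counts["branded"] += 1
        elif a in WHITE_NOISE:
            counts["white_noise"] += 1
        else:
            counts["commercial"] += 1  # keyword-rich, money anchors
    return counts

anchors = ["Example.com", "click here", "cheap blue widgets",
           "buy widgets online", "website", "www.example.com"]
profile = anchor_profile(anchors)
commercial_share = profile["commercial"] / len(anchors)  # 2 of 6
```

A profile led by branded and white-noise anchors looks natural; one dominated by "cheap blue widgets"-style commercial anchors is the telltale pattern the SEO professionals quoted above look for first.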

Relevancy

One area that will certainly be under the microscope as Google looks to improve its semantic understanding is relevancy. As Google builds up a picture of relevant associations, that data can be used to assign more weight to relevant links. Penguin will certainly be targeting links with no relevance in the future.

Traffic Metrics

While traffic metrics probably fall more under Panda than Penguin, the lines between the two are increasingly blurring to a point where the two will shortly become indistinguishable. Panda has already been subsumed into the core algorithm and Penguin will follow.
On that basis Google could well look at traffic metrics such as visits from links and the quality of those visits based on user data.

Takeaways

No one is in a position to accurately predict what the next coming will look like, but we can be certain that Google will turn the knife a little more, making link building in its former sense a riskier tactic than ever. As numerous posts have pointed out in recent months, it is now about earning links by contributing and adding value via content.
If I were asked where my money was, I would say we will see a further tightening of the allowable level of spam, along with some attempt to begin measuring link authority by the neighborhood it comes from and any associated social signals. The rate at which links are earned will also come under more scrutiny, and that means you should think about:
  • Understanding your link profile in much greater detail. Tools and data from companies such as Majestic, Ahrefs, CognitiveSEO, and others will become more necessary to mitigate risk.
  • Where your links come from, not just what level of apparent "quality" they have. Link trust is now a key metric.
  • Increasing the use of brand and "white noise" anchor text to remove obvious exact and phrase match anchor text problems.
  • Looking for sites that receive a lot of social sharing relative to your niche and building those relationships.
  • Running backlink checks on the sites you get links from to ensure their equity isn’t coming from bad neighborhoods, as that could pass to you.

Penguin 2.0 Forewarning: The Google Perspective on Links

First and foremost, I don't work for Google. This article represents my opinions, but my company has worked on helping large numbers of sites get Google penalties removed.
The hardest part of these projects is always to get the client to understand what constitutes a bad link. This starts at the very core of how they think about online marketing and search engine optimization (SEO).
There are many who argue that this problem is of Google's own making. They created a world in which abuse wasn't just possible, but actually easy in the beginning. As some would say, they were talking the talk, but not walking the walk.
Some people even got mad. They would yell at Google that they couldn't follow their guidelines because it put them in a situation that was like bringing a knife to a gunfight.
But, as Greg Boser said on stage at SMX Advanced in 2012, Google is now not just talking the talk, but they are walking the walk as well. Penguin and their various attacks on unnatural links have dramatically reshaped their ability to detect and act on link building practices they consider detrimental to their algorithms.
[Image: walk-the-walk]
These will continue to see dramatic improvements. Penguin 2.0 is just around the corner, and I expect it to have a bigger impact than Penguin 1.0. So let's step back and discuss what Google wants a link to represent.

Links Must Be Citations

This is the core concept. Just like the professor's research paper, which lists other research papers referenced by the professor in creating their paper.
The professor only lists (links to) the other papers most relevant and most important to their own paper. You can't buy that, and it never occurred to researchers to try to game it with each other. This system was pure at its heart.
[Image: sydney-reference-list]
This notion is at the very core of the original PageRank thesis. Any deviation from it at all is a problem. In fact, here's what Google's Distinguished Engineer Matt Cutts said about it in my last interview with him, when I asked him if he felt the concept of link building was itself problematic:
It segments you into a mindset, and people get focused on the wrong things.
I always wondered why people who read the interview didn't pick up on that a lot more. There was a lot of buzz about the comments he made on infographics and boilerplate content on web sites, but nothing on this comment, which I thought was the most telling statement in the entire interview.
Later on, when I asked him about how publishers can help themselves he said:
By doing things that help build your own reputation, you are focusing on the right types of activity. Those are the signals we want to find and value the most anyway.
With this in mind, let's look at four link building practices that are still common today:
Infographics
This was also featured in my interview with Cutts. The biggest problem these face is that the sex appeal of the infographic is so high that many publishing sites don't care what they need to do to get it.
On top of that, many infographics are inaccurate or topically unrelated to the page receiving the link. Even without these problems, it is likely that the great majority of people republishing infographics aren't thoughtfully endorsing the page they end up linking to.
Including rich anchor text links inside a guest post
If the New York Times accepted a guest post from you, what are the chances that they would let you load rich anchor text links inside your post back to the blatant money-making page on your site? Not a chance.
So when Google sees these rich anchor text links inside a guest post, it is a clear signal of a lack of editorial standards at the site publishing the content. This could even hurt the publisher of the content. Note that rich anchor text to other content that is a source is a very different matter.
[Image: sydney-great-deals-on-rental-cars]
Guest posts that are only loosely related to the topic of the page receiving a link
Let's say you run a business selling golf carts. So you write a decent article on the best golf courses in Bermuda. You don't put rich anchor text in the body of the article, but in the attribution at the bottom you include a link with the anchor text "premium golf carts" to your site.
As before, these are links where the citation value is weak, and the editorial standards of the site are questionable. A link like this smells more like "payment" than a legitimate endorsement.
Award Badges
This is an oldie but (not) goodie that is sadly still being promoted by a few companies. What really makes these programs trivial for Google or Bing to flag is when the award badges seem to appear only on the lesser authoritative sites of a market segment. Like waving the red flag at the bull in the bullfighting ring, you're going to attract some attention!
These are just four examples. Note that I did not even bother with sites that have lots of footer links, lots of links on the right rail of pages, links from foreign language sites, links from markets where you don't sell/promote your stuff, etc. Those things are already being actively attacked by Google.
There are many other examples similar to the four above that can be constructed with a little thought – add yours to the comments if you like! Or, send some examples and I will give you my opinion.

Some Closing Questions For Qualifying Your Links

These questions were included in my recent article on 10 Common Link Building Problems, but I am going to expand upon them here:
1. Would you build the link if Google and Bing did not exist?
Any good link is something that has value even without search engines. Treat this question seriously, as it mirrors the behavior that Google and Bing want to see.
For example, would you spend 2 hours of a marketing person's time and $200 in expenses writing an article for Nameless Blog 23 just because they let you put that rich anchor text link in the article?
Oh, right. Without search engines that rich anchor text notion might not even be in your vocabulary.
2. If you had two minutes with a prospective customer, and the law required that you show them a random sampling of your links, would you happily show those links? Or would they embarrass you?
This supports the notion that every link should be brand building in nature. A nice variant of this question is - would you proudly show it to your children?
3. Did the person giving you the link intend it as a genuine endorsement?
If not, Google wants to torch it, and so should you. This relates to the infographics and badge examples above, for sure, but it also relates to the blog examples. As soon as proper attribution slips into a model that looks a bit like "payment" you are no longer looking at a citation.
4. Do you have to make an argument to justify that it's a good link?
This is my favorite one. A good link shouldn't be the subject of an argument.
No argument is required with good links. When you see a good link, you know it right away. Sometimes I simplify this statement (for fun) by saying, "If you have to argue it is, it isn't."

Summary

If you've been building links that would be exposed by these questions, the best thing you can do is get in front of it now. Don't wait for Penguin 2.0, or the next wave of unnatural link messages and penalties, to come out. Start getting your business on a sound long-term footing now.
Start actively building unimpeachable links, and start working on eliminating the bad ones. I am not saying that you need to stand on the rooftops and yell out "hey Google I sinned come punish me", but you can begin asking sites that are the source of dangerous links to remove them.
I know that this is a hard decision to make. You have revenue, and you're paying people salaries, etc. See if you can find a way to add the good ones fast enough to make up for removals and keep your business moving forward.
And, for the record and disclaimer purposes, your mileage may vary. I can't project the exact best strategy for sites I haven't even looked at, but I do know that Google is making large investments in fighting bad links, and I do know that Cutts let us know a new Penguin update is coming, one he referred to as a "major update".
The last time Cutts made a similar statement, at SXSW last year, we got Penguin 1.0. You can trust that this new update is coming. You can also trust that it is not the last thing that Google will do.
Even if you get by this next update, learn to truly appreciate the meaning of a true citation and adjust your marketing strategy accordingly.

Google Penguin 2013: How to Evolve Link Building into Real SEO

Google has just rolled out Penguin 2.0, a large algorithmic update promising to go “deeper” than the 2012 Penguin release, which put a hurting on websites with a high number of manipulative links in their profiles.
This prospect creates fear for many small businesses who depend on search engine optimization (SEO) for their livelihoods. But there is also a sense of confusion, as the line often shifts and the message from Google is frequently contradictory.

Sorting out Panda, Penguin, and Manual Actions

Google's Panda update is a different release than Penguin. Panda is geared toward duplicative, thin, or spun content on websites.
Google's Distinguished Engineer Matt Cutts recently stated that Google is actually pulling back on Panda because of too many false positives. This is good for news aggregators and other sites that reuse content appropriately and have been hit hard by the Panda filter.
Penguin is much harder to understand, focusing on backlink patterns, anchor text, and manipulative linking tactics that provide little value to end users. To make matters worse, Google likes to take large manual actions just prior to major algorithm updates. In 2012 we saw the removal of BuildMyRank from the index just prior to Penguin.
Earlier this year we saw major manual action taken against advertorials. Last week Google announced the removal of thousands of link selling websites and we are hearing of a manual spam penalty against Sprint this week.
The proximity of these manual actions with major algorithmic updates is brilliant PR as it associates them together in our memories, discussions and debates - but they are very different things.

Is SEO Enough?

As small business owners move past the "here we go again" feeling to actually decide what to do in response to Penguin 2013, sorting out the truth is paramount. Google is clearly beating the familiar drum with the same core messages:
  1. Build a great website.
  2. Make awesome content with high end-user value.
  3. Visitors will magically appear.
But the reality is that visitors don’t magically come, at least on any reasonable scale, without organized promotional activities. Many excellent websites have died a slow death due to lack of promotion. And this is where the contradictions emerge in SEO, which has demonstrated extremely high ROI compared to other marketing channels.

Long Live Online Marketing

While discussed many times, webmasters still struggle with shifting their link building activities to real SEO strategy. They fail to see that SEO in 2013 is now integral to online marketing and no longer a standalone activity.
Whereas SEO used to be about tuning a website for optimal consumption by spiders, today’s SEO is about earning recognition, social spread, and backlinks through excellent content marketing. This means SEO is now ongoing, integrated, and strategic – whereas it used to be one-time, isolated, and technical.

Real SEO

Real SEO is the prescription for those who fear Penguin 2013. Here are practical activities that need to be done every month to achieve real SEO:
  • Continually Identify Audience Demand: Your SEO won't be successful if it isn't useful. To serve a need, webmasters must understand what the audience is seeking. Keyword research, as always, is critical. While doing keyword research don’t over-emphasize head terms or money keywords. Focusing on long-tail keywords renders more immediate results, increases the breadth of a website (remember Panda), and builds authority that will ultimately help the head term.
  • Content marketing: In my opinion, content marketing is the new link building. Earn recognition, social spread, and backlinks by giving away valuable information for free. Excellent content has high audience value and points readers to other resources via cocitation. Video is an excellent form of content marketing that is still under-utilized by small businesses. And newsjacking is an emerging form of content marketing that specifically targets hot news topics for viral spread.
  • Work on brand: There is increasing evidence that branded mentions are an important legitimacy signal to Google. Promoting the brand has traditional marketing benefits and also now helps SEO. But be careful not to turn SEO content marketing into an endorsement, as this crosses the line. Find traditional marketing tactics, such as press releases, to drive branding while announcing news-worthy events.
  • Syndicate: The "build it and they will come" philosophy doesn't work on an Internet with more than 500 million active domain names. This is why even excellent content needs to be promoted. Email marketing, social media, community engagement in forums, and guest blog posting are efficient mechanisms for spreading the word about engaging content. Interviews, PPC ads, and local event sponsorship will also get your name and content noticed. Any activity that broadcasts your message, your brand, and builds real community discussion will ultimately support SEO, and should be considered part of the SEO process.

Conclusions

The arrival of Penguin 2013 has many small business owners scared and confused. But SEO remains one of the best online marketing channels.
Real SEO is the path forward for those who wish to make a long-term investment in online marketing. Forward-looking webmasters can prepare their sites for Penguin 2014, 2015, and beyond with well-researched, end-user focused content marketing that provides strong audience value.
Using modern syndication tactics, they can broadcast their message, gain audience mind-share, and earn recognition. By spreading valuable content, small businesses can build their brands and earn bulletproof backlinks.

The Myth of Content Marketing, the New SEO & Penguin 2.0

"What Should Lead Your Online Marketing Strategy: SEO or Content". "Why Content Marketing is the New SEO". "Is Google's love affair with content marketing usurping SEO?" "Content Marketing is the New SEO".
These are actual titles from articles in the top search results for [content marketing and SEO].

Content Marketing Isn't New

[Image: nyan-cat-matt-cutts]
CONTENT MARKETING! DO IT! It's the NEW SEO!
In fact, it's so awesome you don't need anything else! Just produce awesome content and you will be in SEO nirvana! It's like double rainbows and Matt Cutts got together and had baby NyanCats!
Old SEO is dead. This is the new SEO and it's beautiful!
Sound too good to be true? That's because it is.
Content marketing isn't new. It's just a new buzzword picked up by other industries that suddenly found out they could "do SEO", but they didn't want to "do SEO", so they tried to make it sound more special. It isn't.
Content marketing has been around since SEO on Google has been called SEO. To not understand this is to not understand what Google and its algorithms measure and how this might affect your site.
Now with the arrival of Penguin 2.0, you might just be setting yourself up for a fall – right out of the rankings. And yes, despite all our talk of rankings not mattering, they do, because if you go from somewhere on page 1 (with personalization) to nowhere on page 51, you will suddenly say, "Oh no! My rankings!"
Rankings matter. SEO matters. And content marketing is SEO. It always has been, and always will be – well, at least until the search engines don't use algorithms and content, but that's a long way off.
Need more proof of the power of content? Back in 2008, I ranked a website in the top 15 for a one-word term in a competitive vertical with no links, a domain that was less than a year old, four weeks from launch, with 1,500 pages of unique, solid, quality content. Every word on the site was original, even the Contact Us page.
How do I know content was the reason for getting the site ranked in the top 15? Content! To be fair, I can only be 99 percent sure it was the content, thanks to a Google engineer at a party at an SES Conference who confirmed it was "most likely the reason".
Like I said, the importance of unique, quality content isn't a new concept.

Just What is "Content Marketing"?

[Image: content-marketing-2013-buzzword]
If you want the best literal explanation, this quote from Quora (found via Ann Smarty and Authority Labs) works very well:
"Content marketing is the umbrella of all techniques that are used to generate traffic, leads, online visibility, and brand awareness/fidelity."
If you want the one that really gets it, then this one from Sugar Rae says it best:
"Content marketing isn't a new strategy, it's merely a new word.
Why ... do we as an industry feel the need to invent a new buzzword for the same services every few years? We've been doing "content marketing" forever.
  • Website = content
  • Promotion of that website = marketing
Website + promotion of said website = content marketing."
And there you go. It is content that you put on your website and promote. That can be text, video, infographics, images, whatever you think of and put on your site. When you release it as part of your site's marketed materials, then it is "content marketing".
It's really that simple. Again, it's not new, it's just a new buzzword.
[Image: content-marketing-google-trends]
Now that we have that straightened out, what does content marketing have to do with Penguin, SEO, future penalties, and you?
Content marketing is not the new SEO. It is SEO and so are a lot of other things.

It's All SEO Now

One client's site I recently reviewed was brilliant. The company had never bought a link, was completely legit, and worked feverishly on their content marketing – yet they had 16 warnings and penalties. Why? Because while content is great and certainly a very important part of any SEO strategy, it isn't all or even most of what you need to be concerned about when thinking about the algorithm.

Taking Your Eyes Off The Ball

[Image: black-hat-flags]
So while you were spending all that time concentrating on your content marketing, what were you doing about making sure you met the rest of the 200+ points on the algorithm? What about the other things that Penguin was meant to control?
How is your internal linking? Your anchor text either coming in or internally?
How about where your sites are linking externally? Where are you linking to and are you linking to other sites you own? (triangulation - crosslinking)
What about the other changes Google announced are coming this summer (which, for lack of a better term, I'll call the "no one is home" penalties)? You know, like spam comments in your forums or blogs? Or your page speed and usability?
How about your page crawls? Sitemaps? Are you showing Google no one is at the helm while you spend all your time focused on cultivating the latest viral video or super infographic?
Starting to see the issue?
Content marketing isn't separate from SEO and isn't the new SEO. It doesn't replace SEO. It is SEO just like all the other items mentioned are SEO.
[Image: matt-cutts-over-optimization]
SEO isn't just search engine optimization anymore. It is, as Cutts suggested a little while back, search experience optimization and it covers everything on the website, either directly or relationally.
Once you realize that "content marketing" is just using good content practices and that you might have been neglecting the rest of your site SEO, what should you do?

12 Other Google Update Checks (Penguin Included)

1. Titles and Descriptions

Titles and descriptions still remain among the most misunderstood items on any site, and they are still as important as ever. Know what each means and how to write it properly. Make sure you don't have duplicate tags, tags that are too long, or over-optimized tags.
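As a sketch of that audit, the duplicate and length checks can be run over a crawl export of (URL, title) pairs. The 60-character budget and the page data below are assumptions for the example:

```python
# Flag duplicate titles and titles over a length budget from a crawl export.
from collections import Counter

MAX_TITLE_LEN = 60  # assumed budget; display limits vary

pages = [
    ("/", "Acme Widgets - Quality Widgets Since 1999"),
    ("/blue", "Acme Widgets - Quality Widgets Since 1999"),   # duplicate title
    ("/red", "Buy Red Widgets Cheap Red Widgets Best Red Widgets Red Widget Store Online"),
]

title_counts = Counter(title for _, title in pages)
duplicates = [url for url, title in pages if title_counts[title] > 1]
too_long = [url for url, title in pages if len(title) > MAX_TITLE_LEN]
```

Two minutes with a crawler export and a script like this will surface most title problems before Google does.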

2. Anchor Text

Is your anchor text over-optimized with keywords? Are you using keywords when domain names should be used? What is the natural way someone would link to your site? This counts with inbound links as well as internally. Beware of over-optimized and overused keyword anchor text.

3. Links – Inbound & Outbound

Run a link check. How do your inbound links look? The threshold for spammy links was about 80 percent; it is now down to about 50 percent. That means 50 percent questionable links can keep your site or a page out of the index.
Know your link profile.
With outbound links, make sure you are not sending out link juice on ad links, but do make sure you link offsite to some degree. Google doesn't like it when you hoard that link power all for yourself. Share with worthwhile sites, but never with ad links.

4. Links Cross or Triangulate

Sometimes, even by accident, sites cross-link to other sites they own or partner with while sitting on the same IP addresses or C-classes. Do you know if yours do? If they do, delink your sites or put rel="nofollow" on those links, or Google may think you are running a link network of your own.
Remember, Google can't discern intent, so the appearance of impropriety is all it takes to earn yourself a penalty.
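If you are not sure whether your sites share hosting, you can resolve them yourself. Here's a rough Python sketch; the domain names are placeholders, and note it only checks shared IP addresses, not whole C-classes.

```python
import socket

def shared_hosts(domains):
    """Resolve each domain and group together any that share an IP address."""
    by_ip = {}
    for domain in domains:
        try:
            ip = socket.gethostbyname(domain)
        except socket.gaierror:
            continue  # domain didn't resolve; skip it
        by_ip.setdefault(ip, []).append(domain)
    return {ip: ds for ip, ds in by_ip.items() if len(ds) > 1}

# Hypothetical portfolio of sites you own or partner with
print(shared_hosts(["example.com", "example.org", "example.net"]))
```

Any group that comes back sharing an IP is a candidate for delinking or rel="nofollow".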

5. Page Speed

Google likes to say page speed is a small factor, and maybe for some industries that's the case, but in others our experience shows it isn't. This only makes sense: faster-loading sites lower the load on Google's end. So take the PageSpeed tool, check your site, and get your score above 90 if you can. That seems to be the magic threshold for most.

6. User-Generated Content Spam

User-generated content spam on your site is now directly linked to a penalty at Google. (Heard about Sprint's latest fiasco?)
It doesn't take much to signal to Google that "no one is home" keeping an eye on things.
Check your blogs and comment areas for things like strings of http links or phrases such as "free shipping", either with a database crawler or in Google with site:domain.com "words go here". Is someone spamming you?
Note: If the spammers are very good you may not be able to see it without a Google search.
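A first-pass scan is easy to automate before you resort to Google searches. In this Python sketch, the marker phrases and the link-count threshold are arbitrary assumptions; tune them to the spam you actually see.

```python
# Hypothetical marker phrases; adjust to the spam you actually see
SPAM_MARKERS = ["free shipping", "viagra", "payday loan"]

def flag_spammy_comments(comments, markers=None, max_links=2):
    """Flag comments containing spam phrases or an unusual number of links."""
    markers = SPAM_MARKERS if markers is None else markers
    flagged = []
    for comment in comments:
        lower = comment.lower()
        hits = [m for m in markers if m in lower]
        if hits or lower.count("http") > max_links:
            flagged.append((comment, hits))
    return flagged

comments = ["Nice article!", "FREE SHIPPING on all orders"]
print(flag_spammy_comments(comments))  # only the second comment is flagged
```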

7. Redirects

Get a tool like Screaming Frog and check your site's pages for redirects, then make sure those redirected pages use a 301 permanent redirect, which tells Google the page has permanently moved and that it should keep following the link.
It's rare that you need a different type of redirect; if you do, remove the page from the index with a noindex tag in the header. (There are rare cases where this won't apply; it is just the general rule.)
Also make sure your canonicals are in place and correct. This should go without saying, but not all sites do it.
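Beyond a crawler's report, you can spot-check individual URLs yourself. This Python sketch (standard library only; the URL in the usage comment is a placeholder) fetches only the first hop, so a temporary 302 can't hide behind a later 301.

```python
import urllib.error
import urllib.request

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # refuse to follow; we want the first hop's status code

def first_hop_status(url):
    """Return the HTTP status of the first response, redirects not followed."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        return opener.open(url, timeout=10).getcode()
    except urllib.error.HTTPError as err:
        return err.code  # a 3xx surfaces as HTTPError once we refuse to follow

def classify_redirect(status):
    """301 is the permanent redirect you want Google to see."""
    if status == 301:
        return "permanent"
    if status in (302, 303, 307):
        return "temporary"
    return "not a redirect"

# Hypothetical usage:
# print(classify_redirect(first_hop_status("http://example.com/old-page")))
```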

8. Over-Optimization on Non-Content Items

A common type of over-optimization happens in the navigation, the header, or the footer.
This is where someone adds a keyword to every (or almost every) navigation item to try to rank for the term, or adds an overabundance of header or footer links to "help" a site position for known keywords. This won't help, and it is likely to earn the site a penalty.

9. Alt Attributes

How are you using the alt attribute on your images? Don't stuff keywords into this text. Using good alt text, especially when images are replacing text in links, can be very good for a site. In fact, Google will treat this alt text as actual text in these cases.
Go to http://webaim.org to learn the rules for writing good alt text.
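Auditing alt text across a page is also scriptable. This sketch uses Python's standard html.parser; the eight-word "stuffing" threshold is an arbitrary assumption for illustration, not a Google rule.

```python
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    """Flag <img> tags with missing, empty, or suspiciously long alt text."""
    def __init__(self, max_words=8):
        super().__init__()
        self.max_words = max_words
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        alt = attrs.get("alt")
        src = attrs.get("src", "?")
        if alt is None or not alt.strip():
            self.problems.append((src, "missing or empty alt"))
        elif len(alt.split()) > self.max_words:
            self.problems.append((src, "alt text looks stuffed"))

audit = AltAudit()
audit.feed('<img src="logo.png"><img src="hero.jpg" alt="Example Co home page">')
print(audit.problems)  # only the logo image is flagged
```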

10. Ad Issues

Google doesn't like it when a site seems to exist only to support the ads on it, so an overabundance of above-the-fold ads can earn the site a penalty.
What is too much? Google is vague about this, but find out what is above the fold at your site's target screen size (not your own screen), then hold up a Post-it note; if an ad block takes up more space than the note, it is probably too large.

11. Crawl Issues

When did you last get into Webmaster Tools and check how your crawls were going? How is your crawl rate? Are the spiders hitting any crawl issues?
We once had a client with 28,000 crawl errors. Errors like these drag down your site's strength and authority through the same "No One Is Home" devaluations. Keep an eye on your crawl rate, and if the site is not crawling well, find out why as quickly as possible and fix it!

12. Malware or Rogue Sites

For the most part we're fortunate that Google will email you to say there is malware on your site – but be careful: this isn't always the case. Periodically search for your site and see whether you trigger malware warnings in a site search or on mobile, then check your analytics to make sure no one is running anything untoward on your site, like, say, a rogue Viagra store. If you want to see how prevalent this is, go to Google and search for ".gov" Viagra.
Not only can these sites be doing things that cause "hack" issues for you, they can also be sending links to their pages on your site, damaging your link profile.

What Else?

This was just a partial list to get you started. We haven't touched authorship, structured data, URL construction, or a whole host of other things you should be doing; these are just some of the things you need to be checking. Hopefully you get the idea that myopic SEO is not SEO at all.
If you haven't been doing much more than content marketing and thought there was a "new SEO" that made the "old SEO" dead, my best advice, with the arrival of Penguin 2.0 and several other changes still on the horizon, is to conduct a full site audit.
This is going to be a summer of change at Google, and this article has only touched on some of the items known to be part of the Penguin and Panda algorithms, plus the coming attractions. Don't get caught with your proverbial pants down, wondering what happened.
With SEO, proactive is always better than reactive, because only a small percentage of sites hit by the first Penguin have ever fully recovered. If (or when) your site gets hit, sometimes all you can do is start again.