Google Webmaster Team on SEO

We’ve seen this a lot: keyword stuffing, misused strong tags and, of course, misused heading tags. I’m deeply sorry, folks, but Google is right: for a good, long-term relationship with the search giant, you should put all your effort into content. The rest is just good practice. And if you haven’t seen it yet, you should read the Search Engine Optimization Starter Guide.

Snippet: Nofollow for Tag Cloud in WordPress

Tag clouds are good, but in a previous post called WordPress & Google Analytics: Tracking Your Tag Cloud, I admitted that they’re not very useful, especially for search engines, which is the topic I’ll discuss today.

I’m not an SEO junkie or anything, but I do know some of the basic rules, and one of them is: avoid duplicate content. Now fire up your blog’s homepage: you’ve got a list of your latest posts. Open the archive of some popular tag: you’ve got that very same list there. So we definitely have to ask Google not to index the tag archives, and we also have to ask it not to follow the links to those archives that appear on every page.

The first step is done in robots.txt, and the exact rule depends on your permalink structure. Mine, for instance, is /tag/tag-name, so to keep Google and other search engines out of those archives, I add the following to my robots.txt file:

User-agent: *
Disallow: /tag/

Please note that this is not a guide to the best robots.txt for your WordPress blog; in practice you’ll want to disallow more than just the tag directory (archives, search results, etc.).
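If you want to double-check a rule like this before uploading it, Python’s standard urllib.robotparser module can evaluate it for you. This is a minimal sketch assuming the /tag/ rule above; the is_crawlable helper and the sample paths are made up for illustration, and it only tells you what a well-behaved crawler is allowed to fetch.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post; /tag/ matches a /tag/tag-name permalink structure.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
"""

def is_crawlable(url_path, user_agent="Googlebot"):
    """Return True if a compliant crawler may fetch url_path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url_path)

# Hypothetical paths: one tag archive, one post, the homepage.
for path in ["/tag/wordpress/", "/2009/03/some-post/", "/"]:
    print(path, "allowed" if is_crawlable(path) else "blocked")
```

Only the tag archive should come out as blocked; posts and the homepage stay crawlable.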

The second step, which is the main reason I wrote this post, is a short snippet for your theme’s functions.php file that adds a rel attribute with a nofollow value to every link produced by the built-in wp_tag_cloud function. It looks very much like the code snippet I shared in my last post, only with a different regular expression, and the two can be combined. Here’s the snippet:

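// Insert rel="nofollow" right after every "<a " in the tag cloud markup.
function my_wp_tag_cloud($return) {
	// $1 puts the captured "<a " back in front of the new attribute.
	$return = preg_replace('/(<a )/', '$1rel="nofollow" ', $return);
	return $return;
}
add_filter('wp_tag_cloud', 'my_wp_tag_cloud');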

It’s not the best regular expression ever, but it should work with the major themes out there. It looks for the opening of each anchor tag and simply inserts rel="nofollow", as easy as that ;)
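To see what that regular expression actually does, here’s the same replacement ported to Python’s re module (PHP’s preg_replace and Python’s re.sub behave alike for this pattern). The add_nofollow name and the sample markup are made up; real tag cloud markup depends on your WordPress version and theme.

```python
import re

def add_nofollow(html):
    """Same idea as the PHP snippet: re-insert the captured "<a " followed by rel="nofollow"."""
    return re.sub(r'(<a )', r'\1rel="nofollow" ', html)

# Hypothetical tag cloud markup.
cloud = ('<a href="/tag/seo/" class="tag-link-1">SEO</a> '
         '<a href="/tag/wordpress/" class="tag-link-2">WordPress</a>')
print(add_nofollow(cloud))
```

It also shows the two weak spots: an anchor that already has a rel attribute ends up with two of them, and an anchor where "<a" is followed by a tab or newline instead of a space is skipped.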

Now save the file, go back to your blog, view the source (Ctrl+U) and verify that nothing broke. All the links in your tag cloud are now marked as nofollow, so each page containing the tag cloud will pass much less link weight to your tag archive pages (according to Google’s PageRank algorithm). Oh, and by the way, Larry Page is now CEO ;)
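Viewing the source works, but with a big tag cloud you might prefer to let a script do the checking. This is a small sketch using Python’s built-in html.parser; links_missing_nofollow is a hypothetical helper, and you’d feed it just the tag cloud part of a page saved from your blog (otherwise every ordinary link gets reported too).

```python
from html.parser import HTMLParser

class LinkRelCollector(HTMLParser):
    """Records the href and rel tokens of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            # rel may be missing or valueless; treat both as "no tokens".
            rel_tokens = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs.get("href"), rel_tokens))

def links_missing_nofollow(html):
    """Return the hrefs of all links that do not carry rel="nofollow"."""
    collector = LinkRelCollector()
    collector.feed(html)
    collector.close()
    return [href for href, rel in collector.links if "nofollow" not in rel]

# Hypothetical saved markup: the second link was missed by the filter.
sample = ('<a rel="nofollow" href="/tag/seo/">SEO</a> '
          '<a href="/tag/wordpress/">WordPress</a>')
print(links_missing_nofollow(sample))  # ['/tag/wordpress/']
```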

Update: Also, as Nathan and Andrew mentioned in the comments, there’s a wp_rel_nofollow function available since WordPress 1.5.0, so you might as well use that directly:

add_filter('wp_tag_cloud', 'wp_rel_nofollow');

This works like a charm, but the manual technique I showed earlier lets you add other relations to the links in the tag cloud (archives, for instance) or attach Google Analytics tracking code to them (a snippet introduced in an earlier post).
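As an illustration of that flexibility, here’s a slightly more careful variant, again sketched in Python rather than PHP: it adds whatever rel values you pass (nofollow, archives and so on) and merges them into an existing rel attribute instead of adding a second one. The add_rel name is hypothetical, and it is still regex-based, so treat it as a sketch rather than a real HTML rewriter; in WordPress the callback would have to be ported to PHP’s preg_replace_callback.

```python
import re

# Opening <a ...> tags; group 1 holds everything between "<a" and ">".
ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# An existing rel="..." or rel='...' attribute (not data-rel and the like).
REL_ATTR = re.compile(r"(?<![\w-])rel\s*=\s*([\"'])(.*?)\1", re.IGNORECASE)

def add_rel(html, *values):
    """Add the given rel values to every <a> tag, merging with any existing rel."""
    def fix_anchor(match):
        attrs = match.group(1)
        existing = REL_ATTR.search(attrs)
        if existing:
            # Keep the current tokens and append only the values not already present.
            tokens = existing.group(2).split()
            tokens += [v for v in values if v not in tokens]
            merged = 'rel="%s"' % " ".join(tokens)
            attrs = attrs[:existing.start()] + merged + attrs[existing.end():]
        else:
            attrs = ' rel="%s"' % " ".join(values) + attrs
        return "<a" + attrs + ">"
    return ANCHOR_OPEN.sub(fix_anchor, html)

print(add_rel('<a href="/x" rel="tag">x</a>', "nofollow", "archives"))
# <a href="/x" rel="tag nofollow archives">x</a>
```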

Hope that helps you out in your SEO journey.

URL Shorteners and the Linkrot Apocalypse

First of all, I’d like to thank you all for your support of the Twitter Friendly Links plugin for WordPress. It started out as a fairly simple URL shortener; now it has loads of new options and some compatibility fixes. Keep the suggestions coming :)

Today I came across a bunch of articles about link relations: the way Google and other search engines treat them, and the way a variety of scripts, plugins, tools, etc. work with them. It seems that there isn’t a strong standard yet (perhaps everybody’s waiting for the W3C), so different clients support different styles.

I’m talking about the shortlink, short_url, short-url and other relations in HTML and HTTP responses. In the Twitter Friendly Links plugin I went with the shortlink specification, although there’s a competing alternative that looks much alike. I had written about extending the plugin with a little API that would transform long permalinks into short ones within the blog, but it seems there’s no need for that. If Twitter could access the page we link to, look for the shortlink in the head section (or perhaps in the HTTP response) and then use THAT short link instead of the old-fashioned trimmed one, that’d be great, right?
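For the “HTTP response” variant, the short link would travel in a Link header. Here’s a rough Python sketch of how a client might pull a rel=shortlink URL out of such a header value; shortlink_from_header is a hypothetical name, the header is made up, and the parser is deliberately simplified (it doesn’t handle commas or semicolons inside quoted parameters).

```python
import re

# One link-value: <URI> followed by zero or more ";param" parts, up to the next comma.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")

def shortlink_from_header(link_header):
    """Return the first URI whose rel parameter includes "shortlink", or None."""
    for match in LINK_VALUE.finditer(link_header):
        uri, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # rel may be quoted and may hold several space-separated values.
                rel_tokens = value.strip().strip('"').lower().split()
                if "shortlink" in rel_tokens:
                    return uri
    return None

header = ('<http://example.com/long-post/>; rel="canonical", '
          '<http://example.com/?p=42>; rel=shortlink')
print(shortlink_from_header(header))  # http://example.com/?p=42
```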

There’s also one called rev=canonical, which does pretty much the same thing as rel=shortlink. Many sites use it right now, though I’m not sure Google reads it at all. Webmasters often confuse it with rel=canonical, which, on the other hand, got Google support in February this year. The idea behind rev=canonical is to specify the reverse canonical, i.e. (perhaps) a shorter link to the same page, but it raised a bunch of security issues with cross-domain linking (for example, when specifying a short link generated via TinyURL). Also, the rev attribute is being dropped in HTML5, but until then we’re free to use it, which is why I included this option in the latest (0.3.4) release of Twitter Friendly Links.
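Put together, a client looking for a page’s short URL in the head section might check the relations mentioned above in some order of preference and deliberately ignore rel=canonical, which points to the preferred long URL. The sketch below uses Python’s html.parser; the preference order (rel=shortlink first, rev=canonical last), the find_short_link name and the sample URLs are assumptions made for this sketch, not part of any spec.

```python
from html.parser import HTMLParser

# Relations that may point at a short URL, in an assumed order of preference.
PREFERENCE = ["rel=shortlink", "rel=short_url", "rel=short-url", "rev=canonical"]

class ShortLinkFinder(HTMLParser):
    """Collects (relation, href) pairs from <link> tags that may carry a short URL."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rel = (attrs.get("rel") or "").lower().split()
        rev = (attrs.get("rev") or "").lower().split()
        # rel="canonical" is skipped on purpose: it names the preferred long URL.
        for name in ("shortlink", "short_url", "short-url"):
            if name in rel:
                self.found.append(("rel=" + name, href))
        if "canonical" in rev:
            self.found.append(("rev=canonical", href))

def find_short_link(html):
    """Return the most preferred short URL declared in the page, or None."""
    finder = ShortLinkFinder()
    finder.feed(html)
    finder.close()
    if not finder.found:
        return None
    return min(finder.found, key=lambda item: PREFERENCE.index(item[0]))[1]

page = """<head>
<link rel="canonical" href="http://example.com/2009/05/long-post-title/">
<link rev="canonical" href="http://short.example/a1">
<link rel="shortlink" href="http://example.com/?p=42" />
</head>"""
print(find_short_link(page))  # http://example.com/?p=42
```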

If you’re interested in linking relations, you might want to read:

Note that these are just thoughts; the standards are yet to come :) We DO have to get rid of those ugly TinyURL links somehow, though. Good luck!

P.S. Twitter Friendly Links is now compatible with AskApache Google 404.

More Google SEO Tips

“What are some simple ways that I can improve my website’s performance in Google?”

Yeah, they are simple enough… I bet you’ve been following the Google Webmaster Central blog and have seen the post about their new SEO starter guide. Well, as they said, everybody knows this stuff already, but they just wanted to remind us all with one handy little guide. That’s okay: there are some pretty interesting facts in the guide, especially in the “avoid” sections.

You see, when you’re creating a webpage from scratch, it’s pretty easy to stick to the rules: title tags, meta description tags, link structure, site navigation. The real pain in the ass is when you have to edit a complete website with around 300 to 400 pages that all have the same titles and the same description and keywords tags, all done in plain HTML with a table-based layout. Oh, and it has a stupid JavaScript menu as well, like in the ’90s ;)
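On a site like that, a useful first step is finding out which pages share a title. Here’s a minimal Python sketch, assuming you’ve already loaded each page’s HTML into a dict keyed by path; duplicate_titles and the sample pages are hypothetical, and the same approach would work for meta descriptions.

```python
import re
from collections import defaultdict
from html import unescape

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def duplicate_titles(pages):
    """pages maps a path to its HTML; returns {title: [paths]} for titles used more than once.

    Pages without a <title> are grouped under the empty string.
    """
    by_title = defaultdict(list)
    for path, html in pages.items():
        match = TITLE_RE.search(html)
        # Normalize whitespace so "Acme  Corp" and "Acme Corp" count as the same title.
        title = " ".join(unescape(match.group(1)).split()) if match else ""
        by_title[title].append(path)
    return {title: sorted(paths) for title, paths in by_title.items() if len(paths) > 1}

pages = {
    "/index.html": "<html><head><title>Acme Corp</title></head></html>",
    "/about.html": "<HTML><HEAD><TITLE>Acme  Corp</TITLE></HEAD></HTML>",
    "/contact.html": "<html><head><title>Contact Acme Corp</title></head></html>",
}
print(duplicate_titles(pages))  # {'Acme Corp': ['/about.html', '/index.html']}
```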

I’ve also noticed that stuffing keywords into the page title makes absolutely no sense. Google’s first results page (for almost any keyword or key phrase) contains high-ranking pages with short, descriptive titles, so there’s really no need to repeat your keywords five or six times in the title; I’d say doing so is rather harmful.

Almost forgot. Have you seen the new Google Analytics overview pages? The new design rocks, and so does the advanced segmentation: you can get up to four factors onto your dashboard line chart, so if you’re using Google Analytics as a stats program, you won’t have to click anywhere besides your dashboard. Still, I encourage you to stop wasting time (and money) and start using Google Analytics as more than just a stats program. There’s plenty of information on the web about how to take advantage of all of Google Analytics’ features.

Some Handy SEO Tricks

Today we’ll talk about Google and search engine optimization techniques. I chose this subject because I’m having some trouble achieving good results in the Russian blogging community and with some other minor projects. We’ll start with a brief overview of the top 10 Google myths.

The top 10 Google myths were discussed at the “Tricks and Treats” webmaster event a couple of weeks ago, so if you’re interested, you can find detailed information on the Google Webmaster Central Blog. Anyway, I’ve boiled the myths down to a short list:

  1. Don’t worry about duplicate content: it will not get your site penalized.
  2. Webmaster Tools validation doesn’t care whether your page is HTML or XHTML, or which doctype you picked.
  3. Getting listed in thousands of search engines and directories makes no sense.
  4. Google AdWords, AdSense and Analytics are not evil.
  5. Keyword stuffing is bad, very bad.
  6. XML sitemaps are good, very good.
  7. PageRank is only one of more than 200 factors Google uses to rank sites.
  8. Resubmitting your site to Google won’t harm it.
  9. Don’t stop working on your site, even if you rank first.
  10. Valid HTML/XHTML doesn’t affect rankings.
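Since XML sitemaps made the “good, very good” list, here’s a minimal sketch of one built with Python’s xml.etree.ElementTree, following the sitemaps.org 0.9 format (a urlset of url entries, each with a loc and an optional lastmod). The build_sitemap name and the example URLs are made up; a real sitemap would list your site’s actual pages.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: iterable of (loc, lastmod) pairs; lastmod is a date like '2008-11-20' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < for us
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")

print(build_sitemap([
    ("http://example.com/", "2008-11-20"),
    ("http://example.com/?p=1&cat=2", None),
]))
```

Once generated, the file is typically uploaded to the site root and submitted through Webmaster Tools.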

I think that’s enough for a quick start. You see, the main problem in SEO is that people build websites for search engines rather than for people, and that’s why most of those sites don’t rank high in search results. Google works on returning good results, the ones people actually want, and that’s basically why sites built “for search engines” don’t show up on the first results page.

That said, keyword stuffing, doorway pages and Google bombing may still work to some extent. It’s not like the late ’90s or mid-2000s anymore, but keeping your anchor text clean and nice when posting inbound links to your site is very important, just like in “Google bombing” ;)