scratch that niche!

SEO and Query Strings

Sorry, I’m going to rant a bit here. I’ve never been a fan of SEO consultants. I think 99% of them run a barely legal racket. The good ones, the ones in the 1%, agree with me. The industry is rife with fear-mongering snake-oil salesmen who use all kinds of crappy tactics to part marketing managers and web site owners from their hard-earned cash.

One tactic, which Bill Leake of LCG covered at InnoTech 2005 in Austin, TX, is to focus on the 20% of SEO and ignore the 80% that matters. Instead of building the quality of inbound links to their clients’ web sites, SEO consultants do the SEO copywriting/keyword churn. Why? Because they can bill more for it, naturally.

Another thing I keep hearing all the time, and it makes me want to scream, “Liar! Liar!” (think of that scene from The Princess Bride) is that search engine spiders have trouble negotiating query strings.

You know, these are URLs that look like:

http://blog.tripledogs.com/index.php?cat=4

So do me a favor right now: open a new window and do a Google search for No Nonsense XML Web Development with PHP. Go ahead, I’ll wait for you.

Back already? That’s right, that’s my book. Notice what the first entry is? A page on amazon.com. The link looks like this:

www.amazon.com/exec/obidos/tg/detail/-/097524020X?v=glance

Looks like a query string to me. In fact, this amazon.com page is ranked higher than the page for the same book at SitePoint (i.e., the book’s publisher). Why? Probably because tons more web sites out there link back to amazon.com’s page than to sitepoint.com’s.

Look, if Google and other search engines couldn’t traverse dynamic sites, then huge swaths of the internet, like online databases, blogs, threaded discussion forums, and ecommerce sites (to name a few) would go unlisted.

Here’s what Google has to say on the subject of query strings:

  • If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site.
  • Don’t use “&id=” as a parameter in your URLs, as we don’t include these pages in our index.

Wanna learn more? Read Google’s Webmaster Guidelines.

BTW, TopDog, our CMS, allows the use of friendly URLs, such as /aboutus.html, instead of the CMS-generated URL, which might be more like /innerpage.php?pageid=17. But notice the use of pageid instead of id. (As Chris Beasley points out below, you should also avoid the use of session variables in your query strings, as this creates constantly changing URLs for the same content, and this could be a bad thing.)
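If you want to check your own URLs against that advice, here is a minimal sketch in Python. It only flags a parameter literally named id and a handful of session-style parameter names. The function name audit_url and the SESSION_PARAMS list are my own assumptions for illustration, not anything Google or TopDog publishes.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of session-style parameter names; adjust it for your own software.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid", "session_id"}

def audit_url(url):
    """Return a list of warnings for query-string parameters worth avoiding."""
    query = urlsplit(url).query
    params = parse_qs(query, keep_blank_values=True)
    warnings = []
    for name in params:
        lowered = name.lower()
        if lowered == "id":
            warnings.append("uses 'id' as a parameter name")
        elif lowered in SESSION_PARAMS:
            warnings.append(f"carries session parameter '{name}'")
    return warnings

print(audit_url("/innerpage.php?pageid=17"))               # []
print(audit_url("/innerpage.php?id=17&PHPSESSID=abc123"))  # two warnings
```

Note that pageid passes cleanly, since only the exact name id is flagged.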

Also, please note that friendly URLs are friendly to PEOPLE, who find it easier to remember aboutus.html than a query string. The point is brought up in an article about the query string myth.

(An article, by the way, whose URL contains a query string [?itemid=518] but which scored very high on a Google search for “common seo myths query strings”.)

Before I go, I gotta quote one more source:

Myth #8: Google Will Not Index Dynamic Pages

Some search engines have, in the past, had problems with dynamic pages, that is, pages that use a query string [14]. This was not due to any technical limitation, but rather, because search engines knew that it was possible to create a set of an infinite amount of dynamic pages, or they could create an endless loop. In either case, the search engines did not want their crawlers to be caught spidering endless numbers of dynamically generated pages.

Google is a newer search engine, and has never had a problem with query strings. However, some dynamic pages can still throw Google for a loop.

Some shopping carts or forums store session information in the URL when cookies [15] are unable to be written. This effectively kills search engines like Google because search engines key their indexes with URLs, and when you put session information in the URL, that URL will change constantly. This is especially true as Google uses multiple IP addresses to crawl the Web, so each crawler will see a different URL on your site, which basically results in those pages not being listed. It is important that if you use such software, you amend it so that if cookies are unable to be written, the software simply does not track session information.

So, you don’t need to use search engine-friendly URLs [16] to be listed in Google. However, these URLs do have other benefits, such as hiding what server side technology you use (so that you may change it seamlessly later), and they are more people-friendly. Additionally, while Google can spider dynamic pages, it may limit the amount of dynamic pages it spiders from one particular site. Your best bet for a good ranking is to use search-engine friendly URLs.

This is from Chris Beasley, writing for SitePoint. Is SitePoint the publisher of my book? Of course. Does that make this information any less true? Of course not. Please note that Chris seems to advocate the “safer is better” approach, in that a search engine friendly URL might get more of your pages spidered.
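To see why session IDs in the URL are a problem for anything that keys its index by URL, here is a rough illustration in Python. It is not how Google works internally, just a sketch of the effect: two visits to the same page produce two different URLs until the session parameter is removed. The parameter names in SESSION_PARAMS, the cart.php URLs, and the function strip_session are all made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session-style parameter names; real software may use others.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid", "session_id"}

def strip_session(url):
    """Rebuild the URL without any session-style query parameters."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

# The same page as seen by two different crawler visits.
a = "http://www.example.com/cart.php?item=5&PHPSESSID=aaa111"
b = "http://www.example.com/cart.php?item=5&PHPSESSID=bbb222"

print(a == b)                                # False
print(strip_session(a) == strip_session(b))  # True
print(strip_session(a))                      # http://www.example.com/cart.php?item=5
```

Chris’s actual advice is simpler: have the software skip session tracking when it can’t write a cookie, so the session ID never lands in the URL in the first place.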

So, is it okay to use dynamically generated content with query strings in the URL? Sure. Just avoid the id= keyword and don’t insert session variables. If you’re feeling paranoid, then use Apache’s mod_rewrite to change your dynamic URLs into something more human-readable (i.e., from /innerpage.php?pageid=7&category=SEO to /topics/seo/page/7), or use a friendly URL module like the one we wrote for TopDog.
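For the curious, the mapping behind a friendly URL is not magic. The sketch below is plain Python, not mod_rewrite and not TopDog’s module; it just shows the kind of translation such a rule performs, turning the friendly path back into the dynamic one the script expects. The pattern and the function name to_dynamic are assumptions of mine, and the category comes through in whatever case the friendly URL uses.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern for the friendly form /topics/<category>/page/<number>
FRIENDLY = re.compile(
    r"^/topics/(?P<category>[A-Za-z0-9_-]+)/page/(?P<pageid>\d+)/?$"
)

def to_dynamic(path):
    """Translate a friendly path into the dynamic URL, or None if it doesn't match."""
    match = FRIENDLY.match(path)
    if match is None:
        return None
    query = urlencode({
        "pageid": match.group("pageid"),
        "category": match.group("category"),
    })
    return "/innerpage.php?" + query

print(to_dynamic("/topics/seo/page/7"))  # /innerpage.php?pageid=7&category=seo
print(to_dynamic("/aboutus.html"))       # None
```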

Here’s Google’s FAQ on what to watch out for when it comes to SEO scams. Don’t get taken in by these guys.

Comments

  1. jesse pakin
    October 13th, 2007 | 12:41 am

    look at the link now… what does that mean? that you are incorrect?

    http://www.amazon.com/Nonsense-XML-Web-Development-PHP/dp/097524020X

  2. October 18th, 2007 | 12:26 pm

    Good question! I don’t think that moving to this kind of URL structure (i.e., in your example, amazon.com has moved away from query strings in their URLs) has all that much to do with Google.

    I think it has more to do with making URLs a bit easier to read (we’re accustomed to looking at complex paths); also, I suspect that Amazon is trying to do a better job of hiding their underlying technology.

    For example, when I do work in CodeIgniter or another MVC framework, the URLs tend to look like http://www.example.com/pages/16 or http://www.example.com/post/5. Underneath exists a PHP (or Ruby or Python) controller, but there really is no need to tell the end user that.

  3. October 18th, 2007 | 12:30 pm

    BTW, in our latest TopDog rollout, our URLs look more like /pages/16 than innerpage.php?pageid=16. Why? Mostly because for content editors, /pages/16 is way easier to remember than the other string!

  4. June 25th, 2008 | 2:28 pm

    Now that Amazon has changed the URL, won’t Google see that as two different URLs pointing to the same content?

    And if you were a site owner, would that not hurt you because you were watering down your page rank?

  5. November 7th, 2008 | 3:23 pm

    Google recommends that you, wherever possible, shorten your URLs by eliminating unnecessary parameters.

    http://www.google.com/support/webmasters/bin/answer.py?answer=76329&hl=en

    Of course Google can crawl these pages. It does, however, save Google a lot of trouble when it doesn’t need to crawl a site that can generate so many more pages through the combination of parameters selected through the site’s menus.

  6. sylvia
    December 8th, 2008 | 1:55 am

    Trying to figure out query strings, Google sites do not seem to like them. New to web. I am a graphic artist doing web design, learning the lingo. Have the graphics down, just need to know the programming??? I work on clear cart. Does anybody have feedback?

  7. tkane2000
    April 6th, 2009 | 12:42 pm

    When you create Apache mod_rewrite rules, do those throw a 301 redirect?
