May 03

Apparently everyone needs to think about Search Engine Optimization (SEO) and PageRank (PR). For the longest while, I was quite happy just blogging my thoughts away without thinking about SEO, PR, and all those other acronyms. It turns out I was wrong.

Josh Spaulding’s post on robots.txt showed me the light. Here’s what I learned about robots.txt from Josh’s post and by looking at his robots.txt file:

  • Use it to keep search engine crawlers away from pages that don’t matter, so that they can concentrate on your content (e.g. your wp-admin page, your contact-us page, etc.)
  • The robots.txt file can be read by anyone, so the pages that you are trying to hide from the search engines can still be found by humans. (e.g. don’t hide your ebook link in a robots.txt file)
  • Be careful. If you make a mistake, it might tell search engines to avoid pages you want crawled. (But you can see what Google sees in your Google Webmaster Tools account.)
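
To make the points above concrete, here is a minimal sketch of how a crawler might read such a file, using Python’s standard urllib.robotparser module. The robots.txt contents and the paths are my own illustrative assumptions, not Josh’s actual file, and real search engines may interpret rules with more nuance than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a WordPress blog. The paths are
# illustrative assumptions, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /contact-us/
"""


def build_parser(robots_txt):
    """Parse robots.txt text the way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)


def is_crawlable(path, agent="Googlebot"):
    """Return True if the rules allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


for path in ("/wp-admin/options.php", "/contact-us/", "/my-first-post/"):
    print(path, "crawlable" if is_crawlable(path) else "blocked")
```

Running a check like this before uploading the file is a cheap way to catch the “be careful” mistake: if a page you want crawled comes back as blocked, the rule is too broad.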

Josh’s post pointed us to some information from Matt Cutts, a software engineer at Google. Matt was interviewed by Eric Enge of Stone Temple Consulting and said the following:

Now, robots.txt says you are not allowed to crawl a page, and Google therefore does not crawl pages that are forbidden in robots.txt. However, they can accrue PageRank, and they can be returned in our search results… we wouldn’t crawl you from robots.txt, but we could return that URL reference that we saw. [Based on the links from other sites to those pages]… we would return the un-crawled reference.

Here’s what I learned from Matt’s interview:

  • You can use robots.txt to keep Google from crawling certain pages, but they can still turn up in the search results. (Google can use links from other sites to that page to get information, even if it doesn’t go there itself.)
  • You can use a robots meta tag (noindex) on any page to keep Google from showing it in the search results. (Google won’t return a noindex page in a search result, but Yahoo and the others might do it differently.)
  • A noindex page can still build up and pass on PageRank. (Google will still follow links in and out of a noindex page. It just won’t show that page in the results. Matt’s example was to noindex a login page. A login page has very little content, but links out to lots of good places and receives lots of good links.)
  • A robots meta tag (nofollow) on a page will keep Google from following that page’s outgoing links, so the page will keep its PageRank juice.
  • A rel attribute on an anchor tag (<a rel="nofollow" href="site.com">) will keep Google from sending PageRank juice to that link.
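
As a rough illustration of where these directives live in a page, the sketch below uses Python’s standard html.parser to pull out the robots meta tag and any rel="nofollow" links. The sample page and its URLs are made up for the example, and this only reads the markup; it says nothing about how Google actually weighs these signals. For simplicity it also ignores the case where a page-level nofollow overrides individual links.

```python
from html.parser import HTMLParser


class RobotsDirectiveParser(HTMLParser):
    """Collect page-level robots directives and per-link nofollow hints."""

    def __init__(self):
        super().__init__()
        self.page_directives = set()  # e.g. {"noindex", "follow"}
        self.links = []               # (href, followed) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # <meta name="robots" content="noindex, follow">
            content = attrs.get("content") or ""
            for token in content.split(","):
                if token.strip():
                    self.page_directives.add(token.strip().lower())
        elif tag == "a" and attrs.get("href"):
            # <a rel="nofollow" href="..."> marks a single link.
            rel_tokens = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" not in rel_tokens))


# Hypothetical login-style page: hidden from results, links still followed.
SAMPLE_PAGE = """
<html><head>
<meta name="robots" content="noindex, follow">
</head><body>
<a href="https://example.com/good-place">a link that is followed</a>
<a rel="nofollow" href="https://example.com/untrusted">a link that is not</a>
</body></html>
"""


def inspect_page(html):
    """Return (page_directives, links) found in the given HTML."""
    parser = RobotsDirectiveParser()
    parser.feed(html)
    return parser.page_directives, parser.links


directives, links = inspect_page(SAMPLE_PAGE)
print(sorted(directives))
for href, followed in links:
    print(href, "followed" if followed else "nofollow")
```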

Is any of this important for beginning websites? I don’t know.

Matt said these were things to “sculpt where you want your PageRank to flow, or where you want Googlebot to spend more time and attention.” We have a PageRank of 0. Does it really matter? (As long as we turn off the PageRank leakage.)

Bottom Line: We looked at Josh’s robots.txt post, as well as ShoeMoney’s robots.txt post, to figure out what we want our robots.txt file to look like.

Question: Have you changed your robots.txt file to build PageRank by persuading the search engines to hang out in specific areas of your site?


3 Responses to “Using Robots.txt to Improve your PageRank”

  1. Dennis Edell Says:

    Andy Beard also writes about such things a lot. My totally tech challenged brain keeps getting in the way.

    Dennis Edell’s last blog post..Attention Regular Readers AND Those That Have Linked To Me – PLEASE READ

  2. Webhosting Reality Says:

    I was using robots.txt in my SMF-powered forum sites. The punishing consequence was that Google gave me more than 2,000 indexed pages. I removed the robots.txt (only some parameters) and Google gave me 23,000 indexed pages. And I still rank high. I have PR4 for a one-year old forum.

    Robots.txt, if used incorrectly, could irk the sensitive Googlebot.

    Webhosting Reality’s last blog post..One More Secret of John Chow’s Popularity Is Finally Revealed

  3. Blogueiro Says:

    Hi,

    Yeah, robots.txt is the best way to improve PR, by blocking pages.
    We should only permit Googlebot to find the posts, and block all other pages.

    Blogueiro’s last blog post..Primeiro Lugar no Google
