There are lots of different opinions on what causes Google to index a site or page. Using a new test subdomain on a brand new domain, I tried to bust a couple of these myths.
1) Google indexes a website if you add a Google Analytics code. Busted!
Nope. The first thing I did was add Google Analytics to all pages on this subdomain. I even visited some of the pages from around ten different IPs. Even after a month, Google hadn’t indexed the website.
2) Google indexes a website if you use Google AdWords. Busted!
The second thing I did was open a brand new Google AdWords account. By bidding on a few keywords (with low, lower and really, really low search volumes), I hoped to see whether this could trigger Google to index the subdomain. I even visited some of the pages by clicking on a few ads from around five different IPs. Three weeks later, the website still wasn’t indexed.
3) Google indexes a website if you add Google AdSense. Busted!
Unlike with the previous tests, I already assumed that this one wouldn’t work. Adding AdSense and even clicking a few of the ads (don’t worry, they were PSAs) didn’t get the subdomain indexed.
4) Google indexes a page that can only be reached through nofollow links. Busted!
As you can see on the first page of the test subdomain, one of the links is nofollowed. As Matt has mentioned a few times, Google shouldn’t index nofollowed pages (pages accessible by nofollow links only) or give weight to nofollowed anchor text. After doing a site: query, I noticed that the page isn’t indexed. Google doesn’t even return the URL of the page. But here comes the weird part.
5) Google indexes a page that is excluded by robots.txt. Plausible!?!
I always thought that adding a noindex meta tag to a page would mean about the same as excluding it with robots.txt. If that were the case, Google wouldn’t show the excluded page in its index. As you can see in the screenshot below, however, Google actually shows an excluded URL when you do a site: search.
I don’t care if it’s a frikkin’ supplemental page, you shouldn’t show this page if I explicitly tell you not to access the page!
So if you don’t want Google to show a particular URL, I guess you’d better use nofollow instead of robots.txt…
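If you want to check which of these exclusion mechanisms applies to one of your own pages, here is a minimal Python sketch using only the standard library. It tests a URL against a robots.txt file and looks for a noindex directive in a robots meta tag. The robots.txt contents, the page HTML, the example.com URLs and the helper names are all made-up placeholders for illustration, not the files from this test.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test subdomain (placeholder, not the real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /excluded-page.html
"""

# Hypothetical page that carries a noindex meta tag.
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex, follow">
</head><body>Test page</body></html>
"""


def is_crawl_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    print(is_crawl_blocked(ROBOTS_TXT, "http://test.example.com/excluded-page.html"))
    print(is_crawl_blocked(ROBOTS_TXT, "http://test.example.com/index.html"))
    print(has_noindex(PAGE_HTML))
```

Note that the two checks look at different things: the first only reads robots.txt, while the second needs the HTML of the page itself.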
This was just part of a first test. I hope I will be able to publish some more results and bust a few more myths soon. If you want to suggest an SEO myth, or if you have test design suggestions or other things, please leave a comment or send an email. Seriously, I’d appreciate it.
Oh, and the credits for the personification of Googlebot go to SEOmoz (like you didn’t know…).