Google Delisting: Lessons Learned

Running a hosted blog is supposed to take a lot of the hassle out of website maintenance. But as I have twice found out, there is still a lot each blog owner has to be on the lookout for. The Disney Blog was launched in June of 2004. Its traffic, community, and reputation grew steadily every month until it was ranked in the top 5000 blogs on Technorati. Then I ran into a website owner’s worst nightmare: my site was delisted from the Google Index in August 2006.

In the course of a couple days I researched what I thought had caused
the delisting, posted a plea for help, and took corrective action.
Thankfully a few very helpful people pointed me in the right direction
and I figured out the rest from there. I opened up a Google Sitemaps
account, uploaded a sitemap, and submitted a reinclusion plea. At the
time I thought the main cause was a JavaScript redirect I had set up
that pointed a closed blog of mine to The Disney Blog. The blog host
service I use, Typepad, doesn’t support 301 redirects, so I used the
JavaScript redirect instead. To solve that problem, I turned off the
redirect and password protected the archived blog.
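For illustration, the client-side redirect was something like the sketch below (the old hostname and the helper name are my stand-ins, not Typepad's or my actual code). Unlike a server-side 301, a JavaScript redirect runs only in the visitor's browser, passes along no link credit, and can look to a crawler like a sneaky-redirect trick:

```javascript
// Hypothetical sketch of the kind of JavaScript redirect described above.
// Crawlers may ignore it entirely or, worse, flag it as a spam technique.

// Build the equivalent URL on the new domain, keeping path and query string.
function redirectTarget(loc, newHost) {
  return "http://" + newHost + loc.pathname + loc.search;
}

// In the closed blog's page template, something like:
// if (location.hostname === "oldblog.example.com") {
//   location.replace(redirectTarget(location, "www.thedisneyblog.com"));
// }
```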

I also took the delisting as an opportunity to domain map my old domain
thedisneyblog.typepad.com to the shorter www.thedisneyblog.com domain.
That’s something I should have done a lot earlier, but since Typepad
doesn’t allow 301 redirects, I was worried about the damage it could
cause to my PageRank/quality rank. Once you’re delisted, that’s no
longer a problem. I also asked as many people as I could to update
their links to reflect the new shorter domain.

As it turns out, Typepad does something worse, in my opinion: it allows
both URLs (name.typepad.com and www.name.com) to resolve rather than
redirecting one to the other. So a visitor or bot following a link to
the old domain still sees a live site there (and indexes it), rather
than being transferred to the new domain.
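On a host that supports it, the fix is a server-side 301 so that only one hostname ever resolves. In Apache terms it would be roughly the following (illustrative only; Typepad exposed nothing like this at the time):

```apache
# Hypothetical .htaccess rule canonicalizing the old hostname to the new one.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^thedisneyblog\.typepad\.com$ [NC]
RewriteRule ^(.*)$ http://www.thedisneyblog.com/$1 [R=301,L]
```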

[Aside: another problem with Typepad is that you have to host your blog
in a subfolder. So Technorati and others end up tracking ‘two’ versions
of your blog depending on how people link to you (full URL or short):
one at the short domain (www.thedisneyblog.com) and one at
(www.thedisneyblog.com/tdb/). There appears to be no way of resolving
the two into one on Technorati’s end, so it’s up to Typepad to figure
it out. Perhaps they could allow their ‘Pro’ level users to use
subdomains for their domain-mapped blogs instead of folders, or
root-level blogs with multiple domain names (as well as the way they
have it now).]

About 28 days after I was delisted, and about 25 days after I submitted
the plea for reinclusion, I was magically back in the Google Index
(with both URLs showing in searches). I must have done something right.
But there was no word from Google on what had gone wrong in the first
place, so the delisting could have just been an error.

The traffic I received was still lower than what I had been getting. It
took submitting a few more complete sitemaps, a return of my previous
pagerank (actually, I think I still might be down one number from
pre-August), and some popular posts to finally return traffic to the
previous levels. That was sometime in November 2006.

December ’06 was a very good month, thanks in large part to the Scary
Mary video. Then last week, on Friday, January 5th, I checked my stats
and noticed a sudden traffic drop-off. I logged in immediately to
Google Webmaster Tools and my suspicions were correct: I was out of the
index. But strangely there was no notice explaining that I had been
dropped, just a generic message saying the site was not in the index.
It took a few days for that notice to change to something that said I’d
actually done something wrong (but didn’t specify what).

In early December I had closed one blog (Movieland) and merged those
posts into The Disney Blog. But I left the old blog up as a reference
until I could get to manually forwarding each post. So my mind
immediately turned to that as the culprit based on my experience in
August. I took swift corrective action and submitted a plea for
reinclusion on Friday night. I also posted a heads-up to my readers
that The Disney Blog had been delisted again.

It was a good thing I made that post, because what I thought was wrong
wasn’t it. I suspect The Disney Blog made it back into the index in
August due to a combination of factors that included the domain mapping
and time. For whatever reason Google only delists you for 30 days or so
before you re-enter the index. Since I had submitted a sitemap with a
new domain that already had people linking to it, I was reindexed no
questions asked.

This time, when I made the plea for help a few SEO experts were
listening and posted about my plight on their blogs. In turn, someone
from Google read those posts and then made the extra effort to post a
comment on my blog explaining exactly what was happening.

It turns out that the real problem was hidden text that Googlebot
considered spammy. The truth is, I knew about this hidden text but
considered it a feature of Typepad. Since meta tags are often ignored
by indexing bots, having a div that holds a description of your
website in an H-level heading seemed like a great idea. It was only
hidden because I had chosen an option in Typepad that let me use an
image instead of text for my header. It was only two lines of text,
and it wasn’t trying to spam keywords or anything; it was the actual
description of the website. Useful information for an indexing bot, as
far as I’m concerned.
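The pattern, as I understand it, was roughly the following (a reconstruction from the description above; the IDs and the description text are illustrative, not Typepad's actual markup):

```html
<!-- Hypothetical reconstruction of the Typepad header pattern. -->
<style>
  /* When the image-banner option is on, the text banner is hidden. */
  #banner-text { display: none; }
</style>

<div id="banner-text">
  <h1>The Disney Blog</h1>
  <!-- The site description lives in an h2. Once the style above applies,
       this is "hidden text" as far as Googlebot is concerned. -->
  <h2>News and commentary about all things Disney.</h2>
</div>
```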

Anyway, I removed that hidden text from the blog, emailed Six
Apart/Typepad to let them know that Google had cracked down on me for
something they offer as a standard feature, and let Google know that
they might end up delisting 400,000 or so blogs for the same reason. As
of the morning of Tuesday, January 9th, The Disney Blog was back in
Google’s index with numbers very similar to what I had prior to the
delisting. Whew.


So what lessons have I learned from this whole mess?

I have never deliberately used any SEO tricks for The Disney Blog
(being a blog that covers a very specific topic gets me ranked high
enough in Google as it is) and since I’m not using the ‘advanced
templates’ on Typepad, I can’t add any meta tags. I was trusting that
the hidden div & h2 tag in Typepad’s code met Google Quality
Standards. I don’t think even Typepad was aware that it didn’t. So this
tells me it was possibly some recent change in Googlebot’s behavior (as
some have suggested).

Perhaps Webmaster Tools should allow you to add a site description and
prohibit Googlebot from accessing certain directories there, since
not all website owners have access to the robots.txt file, even on
Google’s own Google Pages tool. In the meantime, know what’s in your
template/raw HTML. Don’t allow hidden text to sneak through. Google
will find it.
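For owners who do control their robots.txt, keeping Googlebot out of a directory is a short stanza like this (the paths are illustrative):

```text
# Hypothetical robots.txt at the site root.
User-agent: Googlebot
Disallow: /archived-blog/

# All other crawlers: block a private directory.
User-agent: *
Disallow: /private/
```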

One of the points in the comment left by Google was that they did try
to contact me via email a week before they delisted the site. If I had
received that email I would have taken care of it immediately. Alas,
the only addresses they emailed were su, webmaster, owner, help, and
info @ thedisneyblog.com. At no point did they look for contact info on
the website itself, or use contact info I had provided through Google
Webmaster Tools or Google Analytics. So I’ve added a webmaster@ address
on my domain, and I would recommend every blogger do the same.
It would be nice if at the same time Google sends out the one-week
warning, they also highlight the offending blog in the Webmaster Tools
(and Google Analytics if that’s being used too) and then also send an
email to the account used to configure either of those tools. I think
that would cover all the bases.

I still don’t know if the duplicate content was causing problems with
the indexing. Perhaps that’s not as big a sin as hidden text. I hope
Typepad, Technorati, and Google can get together and figure out how to
have just one site from a blog show up in their respective indexes,
whether it’s domain mapped, forwarded, hosted in a folder or at the
root, or via RSS.

Conclusion: Turns out that on the internet, what you don’t know can
hurt you. But thankfully, if you can make yourself heard via your blog,
then you can attract the support of others and find solutions.

I want to thank everyone who has contacted me or offered to lend a
hand. You are very much appreciated. Please let me know if you have any
further questions and I’ll try and answer them.


One thought on “Google Delisting: Lessons Learned”

  1. Anil Dash

    John, I apologize again for the difficulties you’ve had with your site being delisted. The search quality team at Google’s been very responsive, and we’ve been working with them to make sure this doesn’t happen to you or our other customers again.

    It seems, right now, that a design choice we made back in 2003 for TypePad, which is completely valid and standard HTML/CSS, is now something being abused by spammers trying to game Google. That escalating arms race has claimed a lot of casualties along the way: your site’s ranking, as well as the development and troubleshooting time on our part and at Google.

    We’ll keep working on it, and I appreciate your patience and your willingness to share what you’ve learned with the community.
