21 Tactics to Increase Blog Traffic (Updated 2012)

It’s easy to build a blog, but hard to build a successful blog with significant traffic. Over the years, we’ve grown the Moz blog to nearly a million visits each month and helped lots of other blogs, too. I launched a personal blog late last year and was amazed to see how quickly it gained thousands of visits to each post. There’s an art to increasing a blog’s traffic, and given that we seem to have stumbled on some of that knowledge, I felt it compulsory to give back by sharing what we’ve observed.

#1 – Target Your Content to an Audience Likely to Share

When strategizing about who you’re writing for, consider that audience’s ability to help spread the word. Some readers will naturally be more or less active in evangelizing the work you do, but particular communities, topics, writing styles and content types regularly play better than others on the web. For example, great infographics that strike a chord (like this one), beautiful videos that tell a story (like this one) and remarkable collections of facts that challenge common assumptions (like this one) are all targeted at audiences likely to share (geeks with facial hair, those interested in weight loss and those with political thoughts about macroeconomics respectively).

A Blog's Target Audience

If you can identify groups that have high concentrations of the blue and orange circles in the diagram above, you dramatically improve the chances of reaching larger audiences and growing your traffic numbers. Targeting blog content at less-share-likely groups may not be a terrible decision (particularly if that’s where your passion or your target audience lies), but it will decrease the propensity for your blog’s work to spread like wildfire across the web.

#2 – Participate in the Communities Where Your Audience Already Gathers

Advertisers on Madison Avenue have spent billions researching and determining where consumers with various characteristics gather and what they spend their time doing so they can better target their messages. They do it because reaching a group of 65+ year old women with commercials for extreme sports equipment is known to be a waste of money, while reaching an 18-30 year old male demographic that attends rock-climbing gyms is likely to have a much higher ROI.

Thankfully, you don’t need to spend a dime to figure out where a large portion of your audience can be found on the web. In fact, you probably already know a few blogs, forums, websites and social media communities where discussions and content are being posted on your topic (and if you don’t, a Google search will take you much of the way). From that list, you can do some easy expansion using a web-based tool like DoubleClick’s Ad Planner:

Sites Also Visited via DoubleClick

Once you’ve determined the communities where your soon-to-be-readers gather, you can start participating. Create an account, read what others have written and don’t jump in the conversation until you’ve got a good feel for what’s appropriate and what’s not. I’ve written a post here about rules for comment marketing, and all of them apply. Be a good web citizen and you’ll be rewarded with traffic, trust and fans. Link-drop, spam or troll and you’ll get a quick boot, or worse, a reputation as a blogger no one wants to associate with.

#3 – Make Your Blog’s Content SEO-Friendly

Search engines are a massive opportunity for traffic, yet many bloggers ignore this channel for a variety of reasons that usually have more to do with fear and misunderstanding than true problems. As I’ve written before, “SEO, when done right, should never interfere with great writing.” In 2011, Google received over 3 billion daily searches from around the world, and that number is only growing:

Daily Google Searches 2004-2011
sources: Comscore + Google

Taking advantage of this massive traffic opportunity is of tremendous value to bloggers, who often find that much of the business side of blogging, from inquiries for advertising to guest posting opportunities to press and discovery by major media entities, comes via search.

SEO for blogs is both simple and easy to set up, particularly if you’re using an SEO-friendly platform like WordPress, Drupal or Joomla. For more information on how to execute on great SEO for blogs, check out the following resources:

  • Blogger’s Guide to SEO (from SEOBook)
  • The Beginner’s Guide to SEO (from Moz)
  • WordPress Blog SEO Tutorial (from Yoast)
  • SEO for Travel Bloggers (but applicable to nearly any type of blog – from Moz)

Don’t let bad press or poor experiences with spammers (spam is not SEO) taint the amazing power and valuable contributions SEO can make to your blog’s traffic and overall success. 20% of the effort and tactics to make your content optimized for search engines will yield 80% of the value possible; embrace it and thousands of visitors seeking exactly what you’ve posted will be the reward.

#4 – Use Twitter, Facebook and Google+ to Share Your Posts & Find New Connections

Twitter just topped 465 million registered accounts. Facebook has over 850 million active users. Google+ has nearly 100 million. LinkedIn is over 130 million. Together, these networks are attracting vast amounts of time and interest from Internet users around the world, and those that participate on these services fit into the “content distributors” description above, meaning they’re likely to help spread the word about your blog.

Leveraging these networks to attract traffic requires patience, study, attention to changes by the social sites and consideration in what content to share and how to do it. My advice is to use the following process:

  • If you haven’t already, register a personal account and a brand account at each of the following: Twitter, Facebook, Google+ and LinkedIn (those links will take you directly to the registration pages for brand pages). For example, my friend Dharmesh has a personal account for Twitter and a brand account for OnStartups (one of his blog projects). He also maintains brand pages on Facebook, LinkedIn and Google+.
  • Fill out each of those profiles to the fullest possible extent – use photos, write compelling descriptions and make each one as useful and credible as possible. Research shows that profiles with more information have a significant correlation with more successful accounts (and there’s a lot of common sense here, too, given that spammy profiles frequently feature little to no profile work).
  • Connect with users on those sites with whom you already share a personal or professional relationship, and start following industry luminaries, influencers and connectors. Services like FollowerWonk and FindPeopleonPlus can be incredible for this:

Followerwonk Search for "Seattle Chef"

  • Start sharing content – your own blog posts, those of peers in your industry who’ve impressed you and anything that you feel has a chance to go “viral” and earn sharing from others.
  • Interact with the community – use hash tags, searches and those you follow to find interesting conversations and content and jump in! Social networks are an amazing environment for building a brand, familiarizing yourself with a topic and the people around it, and earning the trust of others through high quality, authentic participation and sharing.

If you consistently employ a strategy of participation, share great stuff and make a positive, memorable impression on those who see your interactions on these sites, your followers and fans will grow and your ability to drive traffic back to your blog by sharing content will be tremendous. For many bloggers, social media is the single largest source of traffic, particularly in the early months after launch, when SEO is a less consistent driver.

#5 – Install Analytics and Pay Attention to the Results

At the very least, I’d recommend most bloggers install Google Analytics (which is free), and watch to see where visits originate, which sources drive quality traffic and what others might be saying about you and your content when they link over. If you want to get more advanced, check out this post on 18 Steps to Successful Metrics and Marketing.

Here’s a screenshot from the analytics of my wife’s travel blog, the Everywhereist:

Traffic Sources to Everywhereist from Google Analytics

As you can see, there’s all sorts of great insights to be gleaned by looking at where visits originate and analyzing how they were earned. You can then try to repeat the successes, focus on the high quality and high traffic sources, and put less effort into marketing paths that may not be effective. In this example, it’s pretty clear that Facebook and Twitter are both excellent channels. StumbleUpon sends a lot of traffic, but those visitors don’t stay very long (averaging only 36 seconds vs. the general average of 4 minutes!).
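If you export those numbers from your analytics tool, the same comparison can be scripted. The Python sketch below is only an illustration: the visit counts and times are made-up placeholders (not Everywhereist's data), and the `Source` class, the `low_engagement_sources` helper and its 25% threshold are all assumptions rather than anything Google Analytics provides.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    visits: int
    avg_seconds: float  # average time on site per visit, in seconds


def site_average(sources):
    """Visit-weighted average time on site across all sources."""
    total_visits = sum(s.visits for s in sources)
    total_seconds = sum(s.visits * s.avg_seconds for s in sources)
    return total_seconds / total_visits


def low_engagement_sources(sources, threshold=0.25):
    """Names of sources whose visitors stay less than `threshold` times the site average."""
    cutoff = site_average(sources) * threshold
    return [s.name for s in sources if s.avg_seconds < cutoff]


# Hypothetical figures for illustration only.
sources = [
    Source("facebook", 1000, 300),
    Source("twitter", 1000, 300),
    Source("stumbleupon", 2000, 36),
]

print(low_engagement_sources(sources))
```

A source flagged this way isn't necessarily worthless, but it's a hint that the raw visit count overstates how much that channel is really doing for you.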

Employing analytics is critical to knowing where you’re succeeding, and where you have more opportunity. Don’t ignore it, or you’ll be doomed to never learn from mistakes or execute on potential.

#6 – Add Graphics, Photos and Illustrations (with link-back licensing)

If you’re someone who can produce graphics, take photos, illustrate or even just create funny doodles in MS Paint, you should leverage that talent on your blog. By uploading and hosting images (or using a third-party service like Flickr to embed your images with licensing requirements on that site), you create another traffic source for yourself via Image Search, and often massively improve the engagement and enjoyment of your visitors.

When using images, I highly recommend creating a way for others to use them on their own sites legally and with permission, but in such a way that benefits you as the content creator. For example, you could have a consistent notice under your images indicating that re-using is fine, but that those who do should link back to this post. You can also post that as a sidebar link, include it in your terms of use, or note it however you think will get the most adoption.
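As one way to keep that notice consistent from post to post, here is a small Python sketch that builds the HTML for it. The function name, the wording of the notice and the `image-license` class are all hypothetical; adapt them to your own template and terms of use.

```python
from html import escape


def attribution_notice(post_url, post_title):
    """Build a reuse notice asking people to link back to the original post.

    Both arguments are escaped so that characters such as & or < in a
    URL or title cannot break the surrounding HTML.
    """
    return (
        '<p class="image-license">Feel free to reuse this image; '
        "please link back to "
        f'<a href="{escape(post_url, quote=True)}">{escape(post_title)}</a>.</p>'
    )


# Hypothetical post URL and title.
print(attribution_notice("https://example.com/my-post", "My Post"))
```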

Some people will use your images without linking back, which sucks. However, you can find them by employing the Image Search function of “similar images,” shown below:

Google's "Visually Similar" Search

Clicking the “similar” link on any given image will show you other images that Google thinks look alike, which can often uncover new sources of traffic. Just reach out and ask if you can get a link, nicely. Much of the time, you’ll not only get your link, but make a valuable contact or new friend, too!

#7 – Conduct Keyword Research While Writing Your Posts

Not surprisingly, a big part of showing up in search engines is targeting the terms and phrases your audience are actually typing into a search engine. It’s hard to know what these words will be unless you do some research, and luckily, there’s a free tool from Google to help called the AdWords Keyword Tool.

Type some words at the top, hit search and AdWords will show you phrases that match the intent and/or terms you’ve employed. There’s lots to play around with here, but watch out in particular for the “match types” options I’ve highlighted below:

Google AdWords Tool

When you choose “exact match” AdWords will show you only the quantity of searches estimated for that precise phrase. If you use broad match, they’ll include any search phrases that use related/similar words in a pattern they think could have overlap with your keyword intent (which can get pretty darn broad). “Phrase match” will give you only those phrases that include the word or words in your search – still fairly wide-ranging, but between “exact” and “broad.”
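To make the difference between the three match types concrete, here is a rough Python sketch. It is a simplification under stated assumptions, not Google's actual logic: phrase match is modeled as the keyword's words appearing together and in order, and broad match is approximated as "shares at least one word," whereas the real thing also pulls in related and similar terms. The function names are made up for this example.

```python
def _words(text):
    """Lowercase a string and split it into words."""
    return text.lower().split()


def exact_match(keyword, query):
    """True only when the query is precisely the keyword phrase."""
    return _words(query) == _words(keyword)


def phrase_match(keyword, query):
    """True when the keyword's words appear in the query, together and in order."""
    kw, q = _words(keyword), _words(query)
    n = len(kw)
    return any(q[i:i + n] == kw for i in range(len(q) - n + 1))


def broad_match(keyword, query):
    """Crude stand-in: True when the query shares any word with the keyword."""
    return bool(set(_words(keyword)) & set(_words(query)))


keyword = "blog post ideas"
for query in [
    "blog post ideas",
    "good blog post ideas for beginners",
    "ideas for a travel blog",
]:
    print(query, exact_match(keyword, query), phrase_match(keyword, query), broad_match(keyword, query))
```

Each step down the list of queries matches fewer of the three types, which is why the search volume you see for exact match is always the smallest of the three.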

When you’re writing a blog post, keyword research is best utilized for the title and headline of the post. For example, if I wanted to write a post here on Moz about how to generate good ideas for bloggers, I might craft something that uses the phrase “blog post ideas” or “blogging ideas” near the front of my title and headline, as in “Blog Post Ideas for When You’re Truly Stuck,” or “Blogging Ideas that Will Help You Clear Writer’s Block.”

Optimizing a post to target a specific keyword isn’t nearly as hard as it sounds. 80% of the value comes from merely using the phrase effectively in the title of the blog post, and writing high quality content about the subject. If you’re interested in more, read Perfecting Keyword Targeting and On-Page Optimization (a slightly older resource, but just as relevant today as when it was written).

#8 – Frequently Reference Your Own Posts and Those of Others

The web was not made for static, text-only content! Readers appreciate links, as do other bloggers, site owners and even search engines. When you reference your own material in-context and in a way that’s not manipulative (watch out for over-optimizing by linking to a category, post or page every time a phrase is used – this is almost certainly discounted by search engines and looks terrible to those who want to read your posts), you potentially draw visitors to your other content AND give search engines a nice signal about those previous posts.

Perhaps even more valuable is referencing the content of others. The biblical expression “give and ye shall receive,” perfectly applies on the web. Other site owners will often receive Google Alerts or look through their incoming referrers (as I showed above in tip #5) to see who’s talking about them and what they’re saying. Linking out is a direct line to earning links, social mentions, friendly emails and new relationships with those you reference. In the early days of the SEOmoz blog, this tactic was one of the best ways we earned recognition and traffic, and the power continues to this day.

#9 – Participate in Social Sharing Communities Like Reddit + StumbleUpon

The major social networking sites aren’t alone in their power to send traffic to a blog. Social community sites can send plenty of visitors, too: Reddit (which now receives more than 2 billion! with a “B”! views each month), StumbleUpon, Pinterest, Tumblr, Care2 (for nonprofits and causes), GoodReads (books), Ravelry (knitting), Newsvine (news/politics) and many, many more (Wikipedia maintains a decent, though not comprehensive list here).

Each of these sites has different rules, formats and ways of participating and sharing content. As with participation in blog or forum communities described above in tactic #2, you need to add value to these communities to see value back. Simply drive-by spamming or leaving your link won’t get you very far, and could even cause a backlash. Instead, learn the ropes, engage authentically and you’ll find that fans, links and traffic can develop.

These communities are also excellent sources of inspiration for posts on your blog. By observing what performs well and earns recognition, you can tailor your content to meet those guidelines and reap the rewards in visits and awareness. My top recommendation for most bloggers is to at least check whether there’s an appropriate subreddit in which you should be participating. Subreddits and their search function can help with that.

#10 – Guest Blog (and Accept the Guest Posts of Others)

When you’re first starting out, it can be tough to convince other bloggers to allow you to post on their sites OR have an audience large enough to inspire others to want to contribute to your site. This is when friends and professional connections are critical. When you don’t have a compelling marketing message, leverage your relationships – find the folks who know you, like you and trust you and ask those who have blogs to let you take a shot at authoring something, then ask them to return the favor.

Guest blogging is a fantastic way to spread your brand to new folks who’ve never seen your work before, and it can be useful in earning early links and references back to your site, which will drive direct traffic and help your search rankings (diverse, external links are a key part of how search engines rank sites and pages). Several recommendations for those who engage in guest blogging:

  • Find sites that have a relevant audience – it sucks to pour your time into writing a post, only to see it fizzle because the readers weren’t interested. Spend a bit more time researching the posts that succeed on your target site, the makeup of the audience, what types of comments they leave and you’ll earn a much higher return with each post.
  • Don’t be discouraged if you ask and get a “no” or a “no response.” As your profile grows in your niche, you’ll have more opportunities, requests and an easier time getting a “yes,” so don’t take early rejections too hard and watch out – in many marketing practices, persistence pays, but pestering a blogger to write for them is not one of these (and may get your email address permanently banned from their inbox).
  • When pitching your guest post make it as easy as possible for the other party. When requesting to post, have a phenomenal piece of writing all set to publish that’s never been shared before and give them the ability to read it. These requests get far more “yes” replies than asking for the chance to write with no evidence of what you’ll contribute. At the very least, make an outline and write a title + snippet.
  • Likewise, when requesting a contribution, especially from someone with a significant industry profile, asking for a very specific piece of writing is much easier than getting them to write an entire piece from scratch of their own design. You should also present statistics that highlight the value of posting on your site – traffic data, social followers, RSS subscribers, etc. can all be very persuasive to a skeptical writer.

A great tool for frequent guest bloggers is Ann Smarty’s MyBlogGuest, which offers the ability to connect writers with those seeking guest contributions (and the reverse).

MyBlogGuest

Twitter, Facebook, LinkedIn and Google+ are also great places to find guest blogging opportunities. In particular, check out the profiles of those you’re connected with to see if they run blogs of their own that might be a good fit. Google’s Blog Search function and Google Reader’s Search are also solid tools for discovery.

#11 – Incorporate Great Design Into Your Site

The power of beautiful, usable, professional design can’t be overstated. When readers look at a blog, the first thing they judge is how it “feels” from a design and UX perspective. Sites that use default templates or have horrifying, 1990’s design will receive less trust, a lower time-on-page, fewer pages per visit and a lower likelihood of being shared. Those that feature stunning design that clearly indicates quality work will experience the reverse – and reap amazing benefits.

Blog Design Inspiration
These threads – 1, 2, 3 and 4 – feature some remarkable blog designs for inspiration

If you’re looking for a designer to help upgrade the quality of your blog, there’s a few resources I recommend:

  • Dribbble – great for finding high quality professional designers
  • Forrst – another excellent design profile community
  • Behance – featuring galleries from a wide range of visual professionals
  • Sortfolio – an awesome tool to ID designers by region, skill and budget
  • 99 Designs – a controversial site that provides designs on spec via contests (I have mixed feelings on this one, but many people find it useful, particularly for budget-conscious projects)

This is one area where budgeting a couple thousand dollars (if you can afford it) or even a few hundred (if you’re low on cash) can make a big difference in the traffic, sharing and viral-impact of every post you write.

#12 – Interact on Other Blogs’ Comments

As bloggers, we see a lot of comments. Many are spam, only a few add real value, and even fewer are truly fascinating and remarkable. If you can be in this final category consistently, in ways that make a blogger sit up and think “man, I wish that person commented here more often!” you can achieve great things for your own site’s visibility through participation in the comments of other blogs.

Combine the tools presented in #10 (particularly Google Reader/Blog Search) and #4 (especially FollowerWonk) for discovery. The feed subscriber counts in Google Reader can be particularly helpful for identifying good blogs for participation. Then apply the principles covered in this post on comment marketing.

Google Reader Subscriber Counts

Do be conscious of the name you use when commenting and the URL(s) you point back to. Consistency matters, particularly on naming, and linking to internal pages or using a name that’s clearly made for keyword-spamming rather than true conversation will kill your efforts before they begin.

#13 – Participate in Q+A Sites

Every day, thousands of people ask questions on the web. Popular services like Yahoo! Answers, Answers.com, Quora, StackExchange, Formspring and more serve those hungry for information whose web searches couldn’t track down the responses they needed.

The best strategy I’ve seen for engaging on Q+A sites isn’t to answer every question that comes along, but rather, to strategically provide high value to a Q+A community by engaging in those places where:

  • The question quality is high, and responses thus far have been thin
  • The question receives high visibility (either by ranking well for search queries, being featured on the site or getting social traffic/referrals). Most of the Q+A sites will show some stats around the traffic of a question
  • The question is something you can answer in a way that provides remarkable value to anyone who’s curious and drops by

I also find great value in answering a few questions in-depth by producing an actual blog post to tackle them, then linking back. This is also a way I personally find blog post topics – if people are interested in the answer on a Q+A site, chances are good that lots of folks would want to read it on my blog, too!

Just be authentic in your answer, particularly if you’re linking. If you’d like to see some examples, I answer a lot of questions at Quora, frequently include relevant links, but am rarely accused of spamming or link dropping because it’s clearly about providing relevant value, not just getting a link for SEO (links on most user-contributed sites are “nofollow” anyway, meaning they shouldn’t pass search-engine value). There’s a dangerous line to walk here, but if you do so with tact and candor, you can earn a great audience from your participation.

#14 – Enable Subscriptions via Feed + Email (and track them!)

If someone drops by your site, has a good experience and thinks “I should come back here and check this out again when they have more posts,” chances are pretty high (I’d estimate 90%+) that you’ll never see them again. That sucks! It shouldn’t be the case, but we have busy lives and the Internet’s filled with animated gifs of cats.

In order to pull back some of these would-be fans, I highly recommend creating an RSS feed using Feedburner and putting visible buttons on the sidebar, top or bottom of your blog posts encouraging those who enjoy your content to sign up (either via feed, or via email, both of which are popular options).

RSS Feeds with Feedburner

If you’re using WordPress, there’s some easy plugins for this, too.

Once you’ve set things up, visit every few weeks and check on your subscribers – are they clicking on posts? If so, which ones? Learning what plays well for those who subscribe to your content can help make you a better blogger, and earn more visits from RSS, too.

#15 – Attend and Host Events

Despite the immense power of the web to connect us all regardless of geography, in-person meetings are still remarkably useful for bloggers seeking to grow their traffic and influence. The people you meet and connect with in real-world settings are far more likely to naturally lead to discussions about your blog and ways you can help each other. This yields guest posts, links, tweets, shares, blogroll inclusion and general business development like nothing else.

Lanyrd Suggested Events

I’m a big advocate of Lanyrd, an event directory service that connects with your social networks to see who among your contacts will be at which events in which geographies. This can be phenomenally useful for identifying which meetups, conferences or gatherings are worth attending (and who you can carpool with).

The founder of Lanyrd also contributed this great answer on Quora about other search engines/directories for events (which makes me like them even more).

#16 – Use Your Email Connections (and Signature) to Promote Your Blog

As a blogger, you’re likely to be sending a lot of email out to others who use the web and have the power to help spread your work. Make sure you’re not ignoring email as a channel, one-to-one though it may be. When given an opportunity in a conversation that’s relevant, feel free to bring up your blog, a specific post or a topic you’ve written about. I find myself using blogging as a way to scalably answer questions – if I receive the same question many times, I’ll try to make a blog post that answers it so I can simply link to that in the future.

Email Footer Link

I also like to use my email signature to promote the content I share online. If I was really sharp, I’d do link tracking using a service like Bit.ly so I could see how many clicks email footers really earn. I suspect it’s not high, but it’s also not 0.
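If you already run Google Analytics, an alternative to a link shortener is to tag the signature link with campaign parameters so those visits show up as their own source in your reports. The sketch below swaps in that technique in place of Bit.ly; the `tag_link` helper and the example values ("email", "signature", "footer-link") are hypothetical, while `utm_source`, `utm_medium` and `utm_campaign` are the standard campaign parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_link(url, source, medium, campaign):
    """Append campaign parameters to a URL, keeping any query string it already has."""
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical blog URL and campaign labels.
print(tag_link("https://example.com/blog/", "email", "signature", "footer-link"))
```

Paste the resulting URL into your signature, and clicks from it can be separated from the rest of your direct traffic.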

#17 – Survey Your Readers

Web surveys are easy to run and often produce high engagement and great topics for conversation. If there’s a subject or discussion that’s particularly contested, or where you suspect showing the distribution of beliefs, usage or opinions can be revealing, check out a tool like SurveyMonkey (they have a small free version) or PollDaddy. Google Docs also offers a survey tool that’s totally free, but not yet great in my view.

#18 – Add Value to a Popular Conversation

Numerous niches in the blogosphere have a few “big sites” where key issues arise, get discussed and spawn conversations on other blogs and sites. Getting into the fray can be a great way to present your point-of-view, earn attention from those interested in the discussion and potentially get links and traffic from the industry leaders as part of the process.

You can see me trying this out with Fred Wilson’s AVC blog last year (an incredibly popular and well-respected blog in the VC world). Fred wrote a post about Marketing that I disagreed with strongly and publicly, and a day later, he wrote a follow-up where he included a graphic I made AND a link to my post.

If you’re seeking sources to find these “popular conversations,” Alltop, Topsy, Techmeme (in the tech world) and their sister sites MediaGazer, Memeorandum and WeSmirch, as well as PopURLs can all be useful.

#19 – Aggregate the Best of Your Niche

Bloggers, publishers and site owners of every variety in the web world love and hate to be compared and ranked against one another. It incites endless intrigue, discussion, methodology arguments and competitive behavior – but, it’s amazing for earning attention. When a blogger publishes a list of “the best X” or “the top X” in their field, most everyone who’s ranked highly praises the list, shares it and links to it. Here’s an example from the world of marketing itself:

AdAge Power 150

That’s a screenshot of the AdAge Power 150, a list that’s been maintained for years in the marketing world and receives an endless amount of discussion by those listed (and not listed). For example, why is SEOmoz’s Twitter score only a “13” when we have so many more followers, interactions and retweets than many of those with higher scores? Who knows. But I know it’s good for AdAge. 🙂

Now, obviously, I would encourage anyone building something like this to be as transparent, accurate and authentic as possible. A high quality resource that lists a “best and brightest” in your niche – be they blogs, Twitter accounts, Facebook pages, individual posts, people, conferences or whatever else you can think to rank – is an excellent piece of content for earning traffic and becoming a known quantity in your field.

Oh, and once you do produce it – make sure to let those featured know they’ve been listed. Tweeting at them with a link is a good way to do this, but if you have email addresses, by all means, reach out. It can often be the start of a great relationship!

#20 – Connect Your Web Profiles and Content to Your Blog

Many of you likely have profiles on services like YouTube, Slideshare, Yahoo!, DeviantArt and dozens of other social and Web 1.0 sites. You might be uploading content to Flickr, to Facebook, to Picasa or even something more esoteric like Prezi. Whatever you’re producing on the web and wherever you’re doing it, tie it back to your blog.

Including your blog’s link on your actual profile pages is among the most obvious, but it’s also incredibly valuable. On any service where interaction takes place, those interested in who you are and what you have to share will follow those links, and if they lead back to your blog, they become opportunities for capturing a loyal visitor or earning a share (or both!). But don’t just do this with profiles – do it with content, too! If you’ve created a video for YouTube, make your blog’s URL appear at the start or end of the video. Include it in the description of the video and on the uploading profile’s page. If you’re sharing photos on any of the dozens of photo services, use a watermark or even just some text with your domain name so interested users can find you.

If you’re having trouble finding and updating all those old profiles (or figuring out where you might want to create/share some new ones), KnowEm is a great tool for discovering your own profiles (by searching for your name or pseudonyms you’ve used) and claiming profiles on sites you may not yet have participated in.

I’d also strongly recommend leveraging Google’s relatively new protocol for rel=author. AJ Kohn wrote a great post on how to set it up here, and Yoast has another good one on building it into WordPress sites. The benefit for bloggers who do build large enough audiences to gain Google’s trust is earning your profile photo next to all the content you author – a powerful markup advantage that likely drives extra clicks from the search results and creates great, memorable branding, too.

#21 – Uncover the Links of Your Fellow Bloggers (and Nab ’em!)

If other blogs in your niche have earned references from sites around the web, there’s a decent chance that they’ll link to you as well. Conducting competitive link research can also show you what content from your competition has performed well and the strategies they may be using to market their work. To uncover these links, you’ll need to use some tools.

OpenSiteExplorer is my favorite, but I’m biased (it’s made by Moz). However, it is free to use – if you create a registered account here, you can get unlimited use of the tool showing up to 1,000 links per page or site in perpetuity.

OpenSiteExplorer from Moz

There are other good tools for link research as well, including Blekko, Majestic, Ahrefs and, I’ve heard that in the near-future, SearchMetrics.

Finding a link is great, but it’s through the exhaustive research of looking through dozens or hundreds that you can identify patterns and strategies. You’re also likely to find a lot of guest blogging opportunities and other chances for outreach. If you maintain a great persona and brand in your niche, your ability to earn these will rise dramatically.

Bonus #22 – Be Consistent and Don’t Give Up

If there’s one piece of advice I wish I could share with every blogger, it’s this:

Why Bloggers Give Up Traffic Graph

The above image comes from Everywhereist’s analytics. Geraldine could have given up 18 months into her daily blogging. After all, she was putting in 3-5 hours each day writing content, taking photos, visiting sites, coming up with topics, trying to guest blog and grow her Twitter followers and never doing any SEO (don’t ask, it’s a running joke between us). And then, almost two years after her blog began, and more than 500 posts in, things finally got going. She got some nice guest blogging gigs, had some posts of hers go “hot” in the social sphere, earned mentions on some bigger sites, then got really big press from Time’s Best Blogs of 2011.

I’d guess there are hundreds of new bloggers on the web each day who have all the opportunity Geraldine had, but after months (maybe only weeks) of slogging away, they give up.

When I started the SEOmoz blog in 2004, I had some advantages (mostly a good deal of marketing and SEO knowledge), but it was nearly 2 years before the blog could be called anything like a success. Earning traffic isn’t rocket science, but it does take time, perseverance and consistency. Don’t give up. Stick to your schedule. Remember that everyone has a few posts that suck, and it’s only by writing and publishing those sucky posts that you get into the habit necessary to eventually transform your blog into something remarkable.

Good luck!

Source: SEOmoz.org

Web Site Migration Guide – 2012 Top Tips For SEOs

Site migrations occur now and again for various reasons, but they are arguably one of those areas many SEOs and site owners alike do not feel very comfortable with. Typically, site owners want to know in advance what the impact would be, often asking for information like potential traffic loss, or even revenue loss. On the other hand, SEOs need to make sure they follow best practice and avoid common pitfalls in order to make sure traffic loss will be kept to a minimum.

Disclaimer: The suggested site migration process isn’t exhaustive and certainly there are several alternative or complementary activities, depending on the size of the web site as well as the nature of the undertaken migration. I hope that despite its length, the post will be useful to SEOs and web masters alike.

Phase 1: Establishing Objectives, Effort & Process

This is where the whole migration plan will be established, taking into account the main objectives, time constraints, effort, and available resources. This phase is fundamental because if essential business objectives or required resources fail to get appropriately defined, problems may arise in the following phases. Therefore, a considerable amount of time and effort needs to be allocated to this stage.

1.1 Agree on the objectives

This is necessary because it will allow for success to be measured at a later stage on the agreed objectives. Typical objectives include:

  • Minimum traffic loss
  • Minimum ranking drops
  • Key rankings maintenance
  • Head traffic maintenance
  • All the above

1.2 Estimate time and effort

It is really important to have enough time on your hands, otherwise you may have to work day and night to recover those great rankings that have plummeted. Therefore, it is important to make sure that the site owners understand the challenges and the risks. Once they understand that, it is more likely they will happily allocate the necessary time for a thorough migration.

1.3 Be honest (…and confident)

Every site migration is different. Hence previous success does not guarantee that the forthcoming migration will also be successful. It is important to make your client aware that search engines do not provide any detailed or step-by-step documentation on this topic, as otherwise they would expose their algorithms. Therefore, best practice is based on one’s own and other people’s experiences. Being confident is important because clients tend to have more respect for an expert’s authoritative opinion. This is also important because it can affect how much the client will trust and follow the SEO’s suggestions and recommendations. Be careful not to overdo it though, because if things later go wrong there will be no excuses.

1.4 Devise a thorough migration process

Although there are some general guidelines, the cornerstone is to devise a flawless process. That needs to take into consideration:

  • Legacy site architecture
  • New Site architecture
  • Technical limitations of both platforms

1.5 Communicate the migration plan

Once the migration process has been established it needs to be communicated to the site owner as well as to those that will implement the recommendations, usually a web development team. Each party needs to understand what they are expected to do, as there is no room for mistakes, and misunderstandings could be catastrophic.

Most development agencies tend to underestimate site migrations simply because they focus almost exclusively on getting the new site up and running. Often, they do not allocate the necessary resources required to implement and test the URL redirects from the old to the new site. It is the SEO’s responsibility to make them realise the amount of work involved, as well as to strongly request that the new site be moved first to a test server (staging environment) so implementation can be tested in advance. No matter how well you may have planned the migration steps, some extra allocated time would always be useful as things do not always go as planned.

In order for a website migration to be successful, all involved parties need to collaborate in a timely manner, simply because certain actions need to be taken at certain times. If things do not seem to go the desired way, just explain the risks, ranging from ranking drops to potential revenue loss. This is certainly something no site owner wants to hear about, so play it as your last card and things are very likely to turn around.

1.6 Find the ideal time

No matter how proactive and organised you are, things can always go wrong. Therefore, the migration shouldn’t take place during busy times for the business or when time or resources are too tight. If you’re migrating a retail site, you shouldn’t be taking any risks a couple of months before Christmas. Wait until January when things get really quiet. If the site falls into the travel sector, you should avoid the spring and summer months as this is when most traffic and revenue is being generated. All that needs to be communicated to the client so they make an ideal business decision. A rushed migration is not a good idea, thus if there isn’t enough time to fit everything in, better (try to) postpone it for a later time.

Phase 2: Actions On The Legacy Site

There are several types of site migrations depending on what exactly changes, which usually falls under one or more of the following elements:

  • Hosting / IP Address
  • Domain name
  • URL structure
  • Site Architecture
  • Content
  • Design

The most challenging site migrations involve changes in most (or all) of the above elements. However, for the purposes of this post we will only look at one of the most common and complicated cases, where a web site has undergone a radical redesign resulting in URL, site architecture and content changes. If the hosting environment is going to change, the new hosting location needs to be checked for potential issues. Whoishostingthis and Spy On Web can provide some really useful information. Attention also needs to be paid to the geographic location of the host. If that is going to change, you may need to assess the advantages/disadvantages and decide whether there is a real need for that. Moving a .co.uk web site from a UK-based server to a US one wouldn’t make much sense from a performance point of view.

In case the domain name is changing you may need to consider:

  • Does the previous/new domain contain more/less keywords?
  • Are both domains on the same ccTLD? Would changing that affect rankings?

2.1: Crawl the legacy site

Using a crawler application (e.g. Xenu Link Sleuth, Screaming Frog, Integrity for Mac) crawl the legacy site making sure that redirects are being identified and reported. This is important in order to avoid redirect chains later. My favourite crawling app is Xenu Link Sleuth because it is very simple to set up and does a seamless job. All crawled URLs need to be exported because they will be processed in Excel later. The following Xenu configuration is recommended because:

  • The number of parallel threads is very low to avoid time outs
  • The high maximum depth value allows for a deep crawl of the site
  • Existing redirections will be captured and reported

Custom settings for site crawling with Xenu Link Sleuth
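
Once the crawl has been exported, redirect chains can be spotted with a few lines of code. The following is only a minimal sketch: it assumes the crawler export has already been reduced to a mapping from each redirecting URL to its target, and the function names and sample paths are hypothetical.

```python
def resolve_redirect(url, redirects, max_hops=10):
    """Follow url through the redirects map and return (final_url, hops)."""
    hops = 0
    seen = {url}
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or overly long chain at {url}")
        seen.add(url)
    return url, hops


def find_chains(redirects):
    """Return {source: (final_url, hops)} for sources that need more than one hop."""
    chains = {}
    for source in redirects:
        final_url, hops = resolve_redirect(source, redirects)
        if hops > 1:
            chains[source] = (final_url, hops)
    return chains


# Hypothetical redirects reported by the crawl of the legacy site.
crawl = {
    "/old-a": "/old-b",
    "/old-b": "/new-b",
    "/old-c": "/new-c",
}
print(find_chains(crawl))
```

Any URL reported here should be pointed straight at its final destination when the new redirects are written.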

2.2 Export top pages

Exporting all URLs that have received inbound links is vital. This is where the largest part of the site’s link juice is to be found, or in other words, the site’s ability to rank well in the SERPs. What you do with the link juice is another question, but you certainly need to keep it in one place (file).

Open site explorer

Open Site Explorer offers a great deal of information about a site’s top pages such as:

  • Page Authority (PA)
  • Linking Root Domains
  • Social Signals (Facebook likes, Tweets etc.)

In the following screenshot, a few, powerful 404 pages have been detected which ideally should be 301 redirected to a relevant page on the site.

Majestic SEO

Because Open Site Explorer may not have crawled/discovered some recent pages, it is always worth carrying out the same exercise using Majestic SEO, either on the whole domain or the www subdomain, depending on what exactly is being migrated. Pay attention to ACRank values: pages with higher ACRank values are the juiciest ones. Downloading a CSV file with all that data is strongly recommended.

Webmaster tools

In case you don’t have a subscription to Open Site Explorer or Majestic SEO you could use Google’s Web Master Tools. Under Your Site on the Web -> Links to your site you will find Your Most Linked Content. Click on ‘More’ and Download the whole table into a CSV file. In terms of volume, WMT data aren’t anywhere near OSE or Majestic SEO but it is better than nothing. There are several other paid or free backlinks information services that could be used to add more depth into this activity.

Google analytics

Exporting all URLs that received at least one visit over the last 12 months through Google Analytics is an alternative way to pick up a big set of valuable indexed pages. If you are not 100% sure about how to do that, read this post Rand wrote a while ago.

Indexed pages in Google

Scraping the top 500 or top 1000 indexed pages in Google for the legacy site may seem like an odd task but it does have its benefits. Using Scrapebox or the scraper extension for Chrome, perform a Google search for site:www.yoursite.com and scrape the top indexed URLs. This step can identify:

  • 404 pages that are still indexed by Google
  • URLs that weren’t harvested in the previous steps

Again, save all these URLs in another spreadsheet.
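
By this point there are several URL lists (link tools, Webmaster Tools, analytics, scraped results) that overlap heavily. As a rough sketch, assuming each export has already been read into a plain list of URL strings, they could be combined into one de-duplicated list like this; the function name and sample URLs are made up for illustration.

```python
def merge_url_lists(*url_lists):
    """Combine several URL lists, dropping blanks and duplicates, keeping first-seen order."""
    seen = set()
    merged = []
    for urls in url_lists:
        for url in urls:
            cleaned = url.strip()
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                merged.append(cleaned)
    return merged


# Hypothetical exports from two different tools.
from_link_tool = ["http://www.yoursite.com/a", "http://www.yoursite.com/b "]
from_analytics = ["http://www.yoursite.com/b", "", "http://www.yoursite.com/c"]
print(merge_url_lists(from_link_tool, from_analytics))
```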

2.3 Export 404 pages

Site migrations are great opportunities to tidy things up and do some good housekeeping work. Especially with big sites, there is enormous potential to put things in order again; otherwise hundreds or even thousands of 404 pages will be reported again once the new site goes live. Some of those 404 pages may have quality links pointing to them.

These can be exported directly from Webmaster Tools under Diagnostics->Crawl Errors. Simply download the entire table as a CSV file. OSE also reports 404 pages, so exporting them may also be worthwhile. Using the SEO Moz Free API with Excel, we can figure out which of those 404 pages are worth redirecting based on metrics such as high PA, DA, mozRank and number of external links/root domains. Figuring out where to redirect each of these 404 pages can be tricky, as ideally each URL should be redirected to the most relevant page. Sometimes, this can be “guessed” by looking for keywords in the URL. Where that is not possible, it is worth sending an email to the development team or the web master of the site, as they may be able to assist further.
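
For those who prefer a script to Excel, the filtering could look roughly like the sketch below. The column names, sample rows and thresholds are all assumptions made for illustration; a real export will use whatever headers the tool provides, and the cut-off values are a judgment call.

```python
import csv
import io

# Hypothetical export of 404 pages with link metrics attached.
SAMPLE_EXPORT = """url,page_authority,linking_root_domains
/old-guide,48,35
/tmp-test,1,0
/press-2009,31,4
"""


def worthwhile_404s(csv_text, min_pa=30, min_domains=3):
    """Return 404 URLs whose metrics suggest they are worth a 301 redirect."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row["url"]
        for row in rows
        if int(row["page_authority"]) >= min_pa
        or int(row["linking_root_domains"]) >= min_domains
    ]


print(worthwhile_404s(SAMPLE_EXPORT))
```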

2.4 Measure site performance

This step is necessary when there is an environment or platform change. It is often the case that a new CMS, although it does a great job in terms of managing the site’s content, affects site performance in a negative way. Therefore, it is crucial to make some measurements before the legacy site gets switched off. If site performance deteriorates, crawling may get affected, which could then affect indexation. With some evidence in place, it will be much easier to build a case later, if necessary. Although there are several tools, Pingdom seems to be a reliable one.

The most interesting stuff appears on the summary info box as well as on the Page Analysis Tab. Exporting the data, or even just getting a screenshot of the page could be valuable later. It would be worth running a performance test on some of the most typical pages e.g. a category page, a product page as well as the homepage.

Pingdom Tools Summary

Keep a record of typical loading times as well as the page size. If loading times increase whilst the size of the page remains the same, something must have gone wrong.
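
That rule of thumb is easy to express in code once the before and after figures have been noted down. This is only an illustrative sketch: the dictionary keys, the tolerance values and the sample measurements are assumptions, not figures from any tool.

```python
def performance_regression(before, after, time_tolerance=0.20, size_tolerance=0.10):
    """Return True when the page got noticeably slower while its size barely changed."""
    time_change = (after["load_time_s"] - before["load_time_s"]) / before["load_time_s"]
    size_change = (after["page_size_kb"] - before["page_size_kb"]) / before["page_size_kb"]
    slower = time_change > time_tolerance
    same_size = abs(size_change) <= size_tolerance
    return slower and same_size


# Hypothetical measurements of the same page type on the legacy and the new site.
legacy_home = {"load_time_s": 2.0, "page_size_kb": 800}
new_home = {"load_time_s": 3.1, "page_size_kb": 810}
print(performance_regression(legacy_home, new_home))
```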

Pingdom Page Analysis Tab

Running a Web Page Test would also be wise so site performance data are cross-referenced across two services just to make sure the results are consistent.

The same exercises should be repeated once the new site is on the test server as well as when it finally goes live. Any serious performance issues need to be reported back to the client so they get resolved.

2.5 Measure rankings

This step should ideally take place just before the new site goes live. Saving a detailed rankings report, which contains as many keywords as possible, is very important so it can be used as a benchmark for later comparisons. Apart from current positions it would be wise to keep a record of the ranking URLs too. Measuring rankings can be tricky though, and a reliable method needs to be followed. Chrome’s Google Global extension and SEO SERP are two handy extensions for checking a few core keywords. With the former, you can see how rankings appear in different countries and cities, whilst the latter is quicker and does keep historical records. For a large number of keywords, proprietary or paid automated services should be used in order to save time. Some of the most popular commercial rank checkers include Advanced Web Ranking, Web CEO and SEO Powersuite, to name a few.

With Google Global extension for Chrome you can monitor how results appear in different countries, regions and cities.

Phase 3: URL Redirect Mapping

During this phase, pages (URLs) of the legacy site need to be mapped to pages (URLs) on the new site. For those pages where the URL remains the same there is nothing to worry about, provided that the amount of content on the new page hasn’t been significantly changed or reduced. This activity requires a great deal of attention, otherwise things can go terribly wrong. Depending on the size of the site, the URL mapping process can be done manually, which can be very time consuming, or automation can often be introduced to speed things up. However, saving up on time should not affect the quality of the work.

Even though there isn’t any magic recipe, the main principle is that ALL unique, useful or authoritative pages (URLs) of the legacy site should redirect to pages with the same or very relevant content on the new site, using 301 redirects. Always make sure that redirects are implemented using 301 (permanent) redirects that pass most link equity from the old to the new page (site). The use of 302 (temporary) redirects IS NOT recommended because search engines treat them inconsistently and in most cases do not pass link equity, often resulting in drastic ranking drops.

It’s worth stressing that pages with high traffic need extra attention but the bottom line is that every URL matters. By redirecting only a percentage of the URLs of the legacy site you may jeopardise the new domain’s authority as a whole, because it may appear to search engines as a weaker domain in terms of link equity.

URL Mapping Process (Step-by-step)

  1. Drop all legacy URLs, which were identified and saved in the CSV files earlier (during phase 2), into a new spreadsheet (let’s call it SpreadSheet1).
  2. Remove all duplicate URLs using Excel.
  3. Populate the page titles using the SEO for excel tool.
  4. Using SEO for Excel, check the server response headers. All 404 pages should be kept in a different tab so all remaining URLs are those with a 200 server response.
  5. In a new Excel spreadsheet (let’s call it SpreadSheet2) drop all URLs of the new site (using a crawler application).
  6. Pull in the page titles for all these URLs as in step 3.
  7. Using the VLOOKUP Excel function, match URLs between the two spreadsheets
  8. Matched URLs (if any) should be removed from SpreadSheet1 as they already exist on the new site and do not need to be redirected.
  9. The 404 pages which were moved into a separate worksheet in step 4, need to be evaluated for potential link juice. There are several ways to make this assessment but the most reliable ones are:
    • SEO Moz API (e.g. using the handy Excel extension SEO Moz Free API)
    • Majestic SEO API
  10. Depending on how many “juicy” URLs were identified in the previous step, a reasonable part of them needs to be added into Spreadsheet1.
  11. Ideally, all remaining URLs in SpreadSheet1 need to be 301 redirected. A new column (e.g. Destination URLs) needs to be added in SpreadSheet 1 and populated with URLs from the new site. Depending on the number of URLs to be mapped this can be done:
    • Manually – By looking at the content of the old URL, the equivalent page on the new site needs to be found so the URL gets added in the Destination URLs column.
      1. If no identical page can be found, just choose the most relevant one (e.g. similar product page, parent page etc.)
      2. If the page has no content pay attention to its page title (if known or still cached by Google) or/and URL for keywords which should give you a clue about its previous content. Then, try to find a relevant page on the new site; that would be the mapping URL.
      3. If there is no content, no keywords in the URL and no descriptive page title, try to find out from the site owners what those URLs used to be about.
    • Automatically – By writing a script that maps URLs based on page titles, meta description or URL patterns matching.
  12. Search for duplicate entries again in the ‘old URLs’ column and remove the entire row for each duplicate.
  13. Where patterns can be identified, pattern matching rules using regular expressions are always preferable because they reduce the web server’s load. Ending up with thousands of one-to-one redirects is not ideal and should be avoided, especially if there is a better solution.
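
The automatic option in step 11 can be sketched in a few lines. This is only one possible approach, assuming both crawls have been loaded as dictionaries of URL path to page title; it uses fuzzy title matching from Python’s standard difflib module, and the function name, the similarity cutoff and the sample pages are hypothetical. Anything left unmapped still needs the manual treatment described above.

```python
from difflib import get_close_matches


def build_redirect_map(legacy_pages, new_pages, cutoff=0.6):
    """Map legacy URLs to new URLs by comparing page titles.

    legacy_pages / new_pages: {url_path: page_title}
    Returns (mapping, unmapped), where unmapped lists URLs needing a manual decision.
    """
    new_by_title = {}
    for url, title in new_pages.items():
        new_by_title.setdefault(title.lower(), url)

    mapping = {}
    unmapped = []
    for old_url, title in legacy_pages.items():
        if old_url in new_pages:
            continue  # the URL also exists on the new site, so no redirect is needed
        match = get_close_matches(title.lower(), list(new_by_title), n=1, cutoff=cutoff)
        if match:
            mapping[old_url] = new_by_title[match[0]]
        else:
            unmapped.append(old_url)
    return mapping, unmapped


# Hypothetical crawl results for the old and the new site.
legacy = {
    "/about.html": "About Us",
    "/products/red-widget.php": "Red Widget - Acme",
    "/contact": "Contact",
    "/old-promo": "Summer Sale 2010",
}
new = {
    "/about-us/": "About Us",
    "/shop/red-widget/": "Red Widget | Acme",
    "/contact": "Contact",
}
mapping, unmapped = build_redirect_map(legacy, new)
print(mapping)
print(unmapped)
```

Title matching will produce some wrong pairs on real sites, so the output is best treated as a first draft of the Destination URLs column rather than a finished mapping.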

Phase 4: New Site On Test Server

Because human errors do occur, testing that everything has gone as planned is extremely important. Unfortunately, because the migration responsibility falls mainly on the shoulders of the SEO, several checks need to be carried out.

4.1 Block crawler access

The first and foremost thing to do is to make sure that the test environment is not accessible to any search engine crawler. There are several ways to achieve that but some are better than others.

  • Block access in robots.txt (not recommended)

This is not recommended because Google would still crawl the site and possibly index the URLs (but not the content). This implementation also runs the risk of going live if all files on the test server are going to be mirrored on the live one. The following two lines of code will restrict search engines access to the website:

User-Agent: *
Disallow: /

  • Add a meta robots noindex to all pages (not recommended)

This is recommended by Google as a way to entirely prevent a page’s contents from being indexed.

<html>
<head>
<title>…</title>
<meta name="robots" content="noindex">
</head>

The main reason this is not recommended is that it runs the risk of being pushed to the live environment and removing all pages from the search engines’ index. Unfortunately, web developers’ focus is on other things when a new site goes live and by the time you notice such a mistake, it may be a bit late. In many cases, removing the noindex after the site has gone live can take several days, or even weeks, depending on how quickly technical issues are resolved within an organisation. Usually, the bigger the business, the longer it takes, as several people would be involved.

  • Password-protect the test environment (recommended)

This is a very efficient solution but it may cause some issues. Trying to crawl a password-protected website is a challenge and not many crawler applications have the ability to achieve this. Xenu Link Sleuth can crawl password-protected sites.

  • Allow access to certain IP addresses (recommended)

This way, the web server allows access to specific external IP addresses e.g. that of the SEO agency. Access to search engine crawlers is restricted and there are no indexation risks.

4.2 Prepare a Robots.txt file

That could be a fairly basic one, allowing access to all crawlers and indicating the path to the XML sitemap such as:

User-agent: *
Allow: /
Sitemap: http://www.yoursite.com/sitemap.xml

However, certain parts of the site could be excluded, particularly if the legacy site has duplicate content issues. For instance, internal search, pagination, or faceted navigation often generate multiple URLs with the same content. This is a great opportunity to deal with legacy issues, so search engine crawling of the website can become more efficient. Saving on crawl bandwidth will allow search engines to crawl only those URLs which are worthy of being indexed. That means that deep pages would stand a better chance of being found and ranking more quickly.
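
Before the file goes live, it is worth confirming that it blocks and allows what you expect. A minimal sketch using Python’s standard urllib.robotparser module follows; the /search rule and the sample URLs are assumptions for illustration. Note that Python’s parser applies the first matching rule, so the Disallow line is deliberately placed before the Allow line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://www.yoursite.com/sitemap.xml
"""


def crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(crawlable(ROBOTS_TXT, "http://www.yoursite.com/products/shoes"))
print(crawlable(ROBOTS_TXT, "http://www.yoursite.com/search?q=shoes"))
```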

4.3 Prepare XML sitemap(s)

Using your favourite tool, generate an XML sitemap, ideally containing HTML pages only. Xenu again does a great job because it easily generates XML sitemaps containing only HTML pages. For large web sites, generating multiple XML sitemaps for the different parts of the site would be a much better option, so indexation issues can be identified more easily later. The XML sitemap(s) should then be tested again for broken links before the site goes live.

Source: blogstorm.co.uk

Google Webmaster Tools allow users to test XML sitemaps before they get submitted. This is something worth doing in order to identify errors.

4.4 Prepare HTML sitemap

Even though the XML sitemap alone should be enough to let search engines know about the URLs on the new site, implementing an HTML sitemap could help search engine spiders make a deep crawl of the site. The sooner the new URLs get crawled, the better. Again, check the HTML sitemap for broken links using Check My Links (Chrome) or Simple Links Counter (Firefox).

4.5 Fix broken links

Run the crawler application again as more internal/external broken links, (never trust a) 302 redirects, or other issues may get detected.

4.6 Check 301 redirects

This is the most important step of this phase and it may need to be repeated more than once. All URLs to be redirected should be checked. If you do not have direct access to the server one way to check the 301 redirects is by using Xenu’s Check URL List feature. Alternatively, Screaming Frog’s list view can be used in a similar manner. These applications will report whether 301s are in place or not, but not if the destination URL is the correct one. That could only be done in Excel using the VLOOKUP function.
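
The same comparison can be scripted instead of done with VLOOKUP. The sketch below assumes the redirect mapping and the crawler’s report have both been loaded into dictionaries; the data structures, function name and sample URLs are hypothetical.

```python
def check_redirects(expected, observed):
    """Compare planned redirects with what the crawl actually saw.

    expected: {old_url: new_url}
    observed: {old_url: (status_code, location)} as reported by the crawler
    Returns a list of (old_url, problem) tuples.
    """
    problems = []
    for old_url, new_url in expected.items():
        status, location = observed.get(old_url, (None, None))
        if status != 301:
            problems.append((old_url, f"expected 301, got {status}"))
        elif location != new_url:
            problems.append((old_url, f"redirects to {location}, expected {new_url}"))
    return problems


# Hypothetical mapping and crawl results.
planned = {"/a.html": "/a/", "/b.html": "/b/", "/c.html": "/c/"}
crawled = {
    "/a.html": (301, "/a/"),
    "/b.html": (302, "/b/"),
    "/c.html": (301, "/"),
}
for url, problem in check_redirects(planned, crawled):
    print(url, problem)
```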

4.7 Optimise redirects

If time allows, the list of redirects needs to be optimised for optimal performance. Because the redirects are loaded into the web server’s memory when the server starts, a high number of redirects can have a negative impact on performance. Similarly, each time a page request is being made, the web server will compare that against the redirects list. Thus, the shorter the list, the quicker the web server will respond. Even though such performance issues can be compensated by increasing the web server’s resources, it is always best practice to work out pattern matching rules using regular expressions, which can cover hundreds or even thousands of possible requests.
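
To illustrate how a couple of pattern rules can stand in for a long list of one-to-one redirects, here is a small sketch using Python’s re module. The URL patterns are invented for the example, and on a real server the rules would be written in that server’s own rewrite syntax rather than in Python.

```python
import re

# Hypothetical rules: each pattern covers a whole family of legacy URLs.
RULES = [
    (re.compile(r"^/products/(?P<slug>[a-z0-9-]+)\.php$"), r"/shop/\g<slug>/"),
    (re.compile(r"^/news/(\d{4})/(\d{2})/(.+)\.html$"), r"/blog/\1-\2/\3/"),
]


def redirect_target(path):
    """Return the new path for a legacy path, or None if no rule applies."""
    for pattern, replacement in RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None


print(redirect_target("/products/red-widget.php"))
print(redirect_target("/news/2011/07/launch.html"))
print(redirect_target("/about"))
```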

4.8 Resolve duplicate content issues

Duplicate content issues should be identified and resolved as early as possible. A few common cases of duplicate content may occur, regardless of what was happening previously on the legacy web site. URL normalisation at this stage will allow for optimal site crawling, as search engines will come across as many unique pages as possible. Such cases include:

  • Directories with and without a trailing slash (e.g. this URL should redirect to that).
  • Default directory indexes (e.g. this URL should redirect to that).
  • Http and https URLs.
  • Case in URLs. (e.g. this URL should redirect to that, or just return the 404 error page like this as opposed to that, which is the canonical one).
  • URLs on different host domains e.g. http://www.examplesite.com and examplesite.com (e.g. this URL should redirect to that).
  • Internal search generating duplicate pages under different URLs.
  • URLs with added parameters after the ? character.

In all the above examples, poor URL normalisation results in duplicate pages that will have a negative impact on:

  • Crawl bandwidth (search engine crawlers will be crawling redundant pages).
  • Indexation (as search engines try to remove duplicate pages from their indexes).
  • Link equity (as it will be diluted amongst the duplicate pages).
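
Several of the cases above (host, protocol, case, directory indexes and trailing slashes) come down to choosing one canonical form per URL. A rough sketch of such a normalisation with Python’s urllib.parse follows. The canonical host, the choice of http, the lower-casing of paths and the list of index file names are all assumptions; query strings are left untouched here because which parameters are safe to drop depends on the site.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_FILES = ("index.html", "index.php", "default.aspx")  # assumed index names


def normalise(url, canonical_host="www.examplesite.com", scheme="http"):
    """Return the canonical form of a URL under the assumptions described above."""
    parts = urlsplit(url)
    path = parts.path.lower() or "/"
    for index_name in INDEX_FILES:
        if path.endswith("/" + index_name):
            path = path[: -len(index_name)]  # strip the default directory index
            break
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        path += "/"  # directories always get a trailing slash
    return urlunsplit((scheme, canonical_host, path, parts.query, ""))


print(normalise("https://examplesite.com/Shoes"))
print(normalise("http://www.examplesite.com/shoes/index.html"))
```

Any requested URL that differs from its normalised form is a candidate for a 301 redirect to that form.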

4.9 Site & Robots.txt monitoring

Make sure the URL of the new site is monitored using a service like Uptime Robot. Each time the site is down for whatever reason, Uptime Robot will notify you by email, Twitter DM, or even SMS. It is also useful to set up a robots.txt monitoring service such as Robotto. Each time the robots.txt file gets updated you get notified, which is really handy.

Uptime Robot logs all server up/down time events

Phase 5: New Site Goes Live

Finally the new site has gone live. Depending on the authority, link equity and size of the site Google should start crawling the site fairly quickly. However, do not expect the SERPs to be updated instantly. The new pages and URLs will be updated in the SERPs over a period of time, which typically can take from two to four weeks. For pages that seem to take ages to get indexed it may be worth using a ping service like Pingler.

5.1 Notify Google via Webmaster Tools

If the domain name changes, you need to notify Google via the Webmaster Tools account of the old site, as soon as the new site goes live. In order to do that, the new domain needs to be added and verified. If the domain name remains the same, Google will find its way to the new URLs sooner or later. That mainly depends on the domain authority of the site and how frequently Google visits it. It would also be a very good idea to upload the XML sitemap via Webmaster Tools so the indexation process can be monitored (see phase 6).

5.2 Manual checks

No matter how well everything appeared on the test server, several checks need to be carried out and running the crawler application again is the first thing to do. Pay attention to:

  • Anomalies in the robots.txt file
  • Meta robots noindex tags in the <head> section of the HTML source code
  • Meta robots nofollow tags in the source code
  • 302 redirects. 301 redirects should be used instead as 302s are treated inconsistently by search engines and do not pass link equity
  • Check Webmaster Tools for error messages
  • Check XML sitemap for errors (e.g. broken links, internal 301s)
  • Check HTML sitemap for similar errors (e.g. using Simple Links Counter or Check My Links)
  • Missing or not properly migrated page titles
  • Missing or not properly migrated meta descriptions
  • Make sure that the 404 page returns a 404 server response
  • Make sure the analytics tracking code is present on all pages and is tracking correctly
  • Measure new site performance and compare it with that of the previous site

Using Httpfox, a 302 redirect has been detected
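
The meta robots checks in the list above lend themselves to a quick script. The following sketch uses Python’s standard html.parser module and assumes the page source has already been fetched; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives found in the HTML source."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical page source left over from the staging environment.
page = '<html><head><title>Test</title><meta name="robots" content="noindex, nofollow"></head></html>'
print(robots_directives(page))
```

A live page that returns noindex or nofollow here is exactly the kind of staging leftover this step is meant to catch.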

5.3 Monitor crawl errors

Google Webmaster Tools, Bing Webmaster Tools and Yandex Webmaster all report crawl errors and are certainly worth checking often during the first days or even weeks. Pay attention to reported errors and dates, and always try to figure out whether each error has been caused by the new site or the legacy one.

5.4 Update most valuable inbound links

From the CSV files created in step 2.2, figure out which are the most valuable inbound links (using Majestic or OSE data) and then try to contact the web masters of those sites, requesting a URL update. Direct links pass more value than 301 redirects and this time-consuming task will eventually pay off. On the new site, check the inbound links and top pages tabs of OSE and try to identify new opportunities such as:

  1. Links from high authority sites which are being redirected.
  2. High authority 404 pages which should be redirected so the link juice flows to the site.

In the following example, followed and 301 external links have been downloaded in a CSV file.

Pay attention to the ‘301’ columns for cells with the Yes value. Try to update as many of these URLs as possible so they point directly to the site, which would pass more link equity to the site:

Identify the most authoritative links and contact website owners to update them so they point to the new URL
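
Picking the outreach targets out of a large CSV can also be scripted. The sketch below assumes a simplified export; the column names, the sample rows and the authority threshold are hypothetical and will differ from the real OSE or Majestic files.

```python
import csv
import io

# Hypothetical, simplified export of external links.
SAMPLE = """source_url,target_url,domain_authority,301
http://bigsite.example/review,http://www.yoursite.com/old-page,82,Yes
http://smallblog.example/post,http://www.yoursite.com/old-page,23,Yes
http://news.example/story,http://www.yoursite.com/new-page/,67,No
"""


def outreach_candidates(csv_text, min_authority=40):
    """Return redirected links from authoritative domains, strongest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    redirected = [
        row for row in rows
        if row["301"] == "Yes" and int(row["domain_authority"]) >= min_authority
    ]
    return sorted(redirected, key=lambda row: int(row["domain_authority"]), reverse=True)


for row in outreach_candidates(SAMPLE):
    print(row["source_url"], "links to", row["target_url"])
```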

5.5 Build fresh links

Generating new, fresh links to the homepage, category and sub-category pages is a good idea because:

  1. With 301 redirects some link juice may get lost, thus new links can compensate for that.
  2. They can act as extra paths for search engine spiders to crawl the site.

5.6 Eliminate internal 301 redirects

Although Web masters are quite keen on implementing 301 redirects, they often do not show the same interest updating the onsite URLs so internal redirects do not occur. Depending on the volume and frequency of internal 301 redirects, some link juice may evaporate, whilst the redirects will unnecessarily add an extra load to the web server. Again, in order to detect internal 301 redirects, crawling the site would be handy.
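
Given a crawl of the new site and the redirect mapping, the internal links that still need updating can be listed with a short script. This is a sketch under the assumption that the crawl has been reduced to a dictionary of page URL to the links found on it; the names and sample data are hypothetical.

```python
def internal_links_to_update(page_links, redirects):
    """List (page, old_link, new_link) for internal links that hit a 301.

    page_links: {page_url: [urls linked from that page]}
    redirects:  {old_url: new_url}
    """
    updates = []
    for page, links in page_links.items():
        for link in links:
            if link in redirects:
                updates.append((page, link, redirects[link]))
    return updates


# Hypothetical crawl of the new site and its redirect map.
site_links = {
    "/": ["/shop/", "/about.html"],
    "/shop/": ["/products/red-widget.php", "/shop/blue-widget/"],
}
redirect_map = {"/about.html": "/about-us/", "/products/red-widget.php": "/shop/red-widget/"}
for page, old, new in internal_links_to_update(site_links, redirect_map):
    print(f"On {page}: change {old} to {new}")
```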

Phase 6: Measure Impact/Success

Once the new site has finally gone live, the impact of all the previous hard work needs to be monitored. It may be a good idea to monitor rankings and indexation on a weekly basis, but in general no conclusions should be drawn earlier than 3-4 weeks. No matter how good or bad rankings and traffic appear to be, you need to be patient. A deep crawl can take time, depending on the site’s size, architecture and internal linking. Things to be looking at:

  • Indexation. Submitted and indexed number of URLs reported by Webmaster Tools (see below)
  • Rankings. They usually fluctuate for 1-3 weeks and initially they may drop. Eventually, they should recover to around the same positions they held previously (or just about).
  • Open site explorer metrics. Although they do not get updated daily, it is worth keeping an eye on reported figures for Domain Authority, Page Authority and MozRank on a monthly basis. Ideally, the figures should be as close as possible to those of the old site within a couple of months. If not, that is not a very good indication and you may have lost some link equity along the way.
  • Google cache. Check the timestamps of cached pages for different page types e.g. homepage, category pages, product pages.
  • Site performance in Webmaster Tools. This one may take a few weeks until it gets updated but it is very useful to know how Google perceives site performance before and after the migration. Any spikes that stand out should alarm the web master and several suggestions can be made e.g. using Yslow and Page Speed in Firefox or Page Speed and Speed Tracer in Chrome.

Check site performance in Webmaster Tools for unusual post migration anomalies

Indexation of web pages, images and videos can be monitored in Google Webmaster Tools
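
The rankings benchmark saved in step 2.5 can be compared against fresh data with a few lines of code. The sketch below assumes both reports have been reduced to a dictionary of keyword to position; the threshold of three positions and the sample keywords are arbitrary choices for illustration.

```python
def ranking_drops(before, after, threshold=3):
    """Return (keyword, old_position, new_position) for notable drops.

    before / after: {keyword: position}. A keyword that no longer ranks
    is reported with new_position set to None.
    """
    drops = []
    for keyword, old_position in before.items():
        new_position = after.get(keyword)
        if new_position is None or new_position - old_position >= threshold:
            drops.append((keyword, old_position, new_position))
    return drops


# Hypothetical benchmark and post-migration rankings.
benchmark = {"red widgets": 3, "blue widgets": 5, "widget repair": 8}
week_four = {"red widgets": 4, "blue widgets": 12}
print(ranking_drops(benchmark, week_four))
```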

Appendix: Site Migration & SEO Useful Tools

Some of the following tools would be very handy during the migration process, for different reasons.

Crawler applications

Xenu Link Sleuth (free)
Analog X Link Examiner (free)
Screaming Frog (paid)
Integrity (For MAC – free)

Scraper applications

Scraper Extension for Chrome
Scrapebox (paid)

Link Intelligence software

Open Site Explorer (free & paid)
Majestic SEO (free & paid)

HTTP Analysers

HTTP Fox (Firefox)
Live HTTP Headers (Firefox)

IP checkers

Show IP (Firefox)
WorldIP (Firefox)
Website IP (Chrome)

Link checkers

Simple Links Counter (Firefox)
Check My Links (Chrome)

Monitoring tools

Uptime Robot (monitors domains for downtime)
Robotto (monitors robots.txt)

Rank checkers

Google Global (Chrome)
SEO SERP (Chrome)
SEO Book Rank Checker (Firefox)

Site performance analysis

Yslow (Firefox)
Page Speed (for Firefox)
Page Speed (for Chrome)
Speed Tracer (Chrome)

About the author

Modesto Siotos (@macmodi) works as a Senior Natural Search Analyst for iCrossing UK, where he focuses on technical SEO issues, link tactics and content strategy. His move from web development into SEO was a trip with no return, and he is grateful to have worked with some SEO legends. Modesto is happy to share his experiences with others and writes regularly for a digital marketing blog.

Source: seomoz blog

6 Ways to Recover from Bad Links

It’s a story we hear too often: someone hires a bad SEO, that SEO builds a bunch of spammy links, he/she cashes their check, and then bam – penalty! Whether you got bad advice, “your friend” built those links, or you’ve got the guts to admit you did it yourself, undoing the damage isn’t easy. If you’ve sincerely repented, I’d like to offer you 6 ways to recover and hopefully get back on Google’s Nice list in time for the holidays.

This is a diagram of a theoretical situation that I’ll use throughout the post. Here’s a page that has tipped the balance and has too many bad (B) links – of course, each (B) and (G) could represent 100s or 1000s of links, and the 50/50 split is just for the visual:

Hypothetical link graph

Be Sure It’s Links

Before you do anything radical (one of these solutions is last-ditch), make sure it’s bad links that got you into trouble. Separating out a link-based penalty from a devaluation, technical issue, Panda “penalty”, etc. isn’t easy. I created a 10 minute audit a while back, but that’s only the tip of the iceberg. In most cases, Google will only devalue bad links, essentially turning down the volume knob on their ability to pass link-juice. Here are some other potential culprits:

  1. You’ve got severe down-time or latency issues.
  2. You’re blocking your site (Robots.txt, Meta Robots, etc.).
  3. You’ve set up bad canonicals or redirects.
  4. Your site has massive duplicate content.
  5. You’ve been hacked or hit with malware.
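The second culprit in the list, accidentally blocking your own site, is one of the easier ones to rule out. Here is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the example.com URL are made up for illustration, and it only covers Robots.txt, not Meta Robots tags:

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that accidentally blocks the whole site.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /
"""

print(is_blocked(EXAMPLE_ROBOTS, "http://www.example.com/products/"))
```

If this reports that your important pages are blocked, fix that before you start cutting links.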

Diagnosing these issues is beyond the scope of this post, but just make sure the links are the problem before you start taking a machete to your site. Let’s assume you’ve done your homework, though, and you know you’ve got link problems…

1. Wait It Out

In some cases, you could just wait it out. Let’s say, for example, that someone launched an SQL injection attack on multiple sites, pointing 1000s of spammy links at you. In many cases, those links will be quickly removed by webmasters, and/or Google will spot the problem. If it’s obvious the links aren’t your fault, Google will often resolve it (if not, see #5).

Even if the links are your responsibility (whether you built them or hired someone who did), links tend to devalue over time. If the problem isn’t too severe and if the penalty is algorithmic, a small percentage of bad links falling off the link graph could tip the balance back in your favor:

Link graph with bad links removed

That’s not to say that old links have no power, but just that low-value links naturally fall off the link-graph over time. For example, if someone builds a ton of spammy blog comment links to your site, those blog posts will eventually be archived and may even drop out of the index. That cuts both ways – if those links are harming you, their ability to harm will fade over time, too.

2. Cut the Links

Unfortunately, you can’t usually afford to wait. So, why not just remove the bad links?

Link graph with all bad links cut

Well, that’s the obvious solution, but there are two major, practical issues:

(a) What if you can’t?

This is the usual problem. In many cases, you won’t have control over the sites in question or won’t have login credentials (because your SEO didn’t give them to you). You could contact the webmasters, but if you’re talking about 100s of bad links, that’s just not practical. The kind of site that’s easy to spam isn’t typically the kind of site that’s going to hand remove a link, either.

(b) Which links do you cut?

If you thought (a) was annoying, there’s an even bigger problem. What if some of those bad links are actually helping you? Google penalizes links based on patterns, in most cases, and it’s the behavior as a whole that got you into trouble. That doesn’t mean that every spammy link is hurting you. Unfortunately, separating the bad from the merely suspicious is incredibly tough.

For the rest of this post, let’s assume that you’re primarily dealing with (a) – you have a pretty good idea which links are the worst offenders, but you just can’t get access to remove them. Sadly, there’s no way to surgically remove the link from the receiving end (this is actually a bit of an obsession of mine), but you do have a couple of options.

3. Cut the Page

If the links are all (or mostly) targeted at deep, low-value pages, you could pull a disappearing act:

Link graph with page removed

In most cases, you’ll need to remove the page completely (and return a 404). This can neuter the links at the target. In some cases, if the penalty isn’t too severe, you may be able to 301-redirect the page to another, relevant page and shake the bad links loose.

If all of your bad links are hitting a deep page, count yourself lucky. In most cases, the majority of bad links are targeted at a site’s home-page (like the majority of any links), so the situation gets a bit uglier.
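To make the two choices concrete, here is a small sketch of the decision a server would make for each request: 404 for a page you have cut, 301 for a page you are moving. The paths are hypothetical, and in practice this logic usually lives in your web server or CMS configuration rather than in a script like this:

```python
from http import HTTPStatus

# Hypothetical paths, for illustration only.
REMOVED_PAGES = {"/old-spammy-landing-page"}
REDIRECTED_PAGES = {"/old-widgets": "/widgets"}


def resolve(path):
    """Return (status, location) for a requested path."""
    if path in REMOVED_PAGES:
        # Page cut completely: links pointing here hit a dead end.
        return HTTPStatus.NOT_FOUND, None
    if path in REDIRECTED_PAGES:
        # Page moved permanently to another, relevant page.
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTED_PAGES[path]
    return HTTPStatus.OK, None


for path in ("/old-spammy-landing-page", "/old-widgets", "/about"):
    status, location = resolve(path)
    print(path, int(status), location)
```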

4. Build Good Links

In some sense, this is the active version of #1. Instead of waiting for bad links to fade, build up more good links to tip the balance back in your favor:

Link graph with good links added

By “good”, I mean relevant, high-authority links – if your link profile is borderline, focus on quality over quantity for a while. Rand has a great post on link valuation that I highly recommend – it’s not nearly as simple as we sometimes try to make it.

This approach is for cases where you may be on the border of a penalty or the penalty isn’t very severe. Fair warning: it will take time. If you can’t afford that time, have been hit hard, or suspect a manual penalty, you may have to resort to one of the next two options…

5. Appeal to Google

If you’ve done your best to address the bad links, but either hit a wall or don’t see your rankings improve, you may have to appeal to Google directly. Specifically, this means filing a reconsideration request through Google Webmaster Tools. Rhea at Outspoken had an excellent post recently on how to file for reconsideration, but a couple of key points:

  • Be honest, specific and detailed.
  • Show that you’ve made an effort.
  • Act like you mean it (better yet: mean it).

If Google determines that your situation is relevant for reconsideration (a process which is probably semi-automated), then it’s going to fall into the hands of a Google employee. They have to review 1000s of these requests, so if you rant, provide no details, or don’t do your homework, they’ll toss your request and move on. No matter how wronged you may feel, suck it up and play nice.

6. Find a New Home

If all else fails, and you’ve really burned your home to the ground and salted the earth around it, you may have to move:

Link graph with site moved

Of course, you could just buy a new domain, move the site, and start over, but then you’ll lose all of your inbound links and off-page ranking factors, at least until you can rebuild some of them. The other option is to 301-redirect to a new domain. It’s not risk-free, but in many cases a site-to-site redirect does seem to neuter bad links. Of course, it will very likely also devalue some of your good links.

I’d recommend the 301-redirect if the bad links are old and spammy. In other words, if you engaged in low-value tactics in the past but have moved on, a 301 to a new domain may very well lift the penalty. If you’ve got a ton of paid links or you’ve obviously built an active link farm (that’s still in play), you may find the penalty comes back and all your efforts were pointless.
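A site-to-site redirect normally keeps the path and query string and only swaps the domain. The sketch below shows that mapping with Python's standard `urllib.parse`; the domain names are placeholders, and the redirect itself would still have to be configured on your web server:

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(old_url: str, new_host: str) -> str:
    """Map a URL on the old domain to the same path and query on new_host."""
    parts = urlsplit(old_url)
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


# Placeholder domains, for illustration only.
print(redirect_target("http://www.old-example.com/widgets?color=blue", "www.new-example.com"))
# prints http://www.new-example.com/widgets?color=blue
```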

A Modest Proposal

I’d like to end this by making a suggestion to Google. Sometimes, people inherit a bad situation (like a former SEO’s black-hat tactics) or are targeted with bad links maliciously. Currently, there is no mechanism to remove a link from the target side. If you point a link at me, I can’t say: “No, I don’t want it.” Search engines understand this and adjust for it to a point, but I really believe that there should be an equivalent of nofollow for the receiving end of a link.

Of course, a link-based attribute is impossible from the receiving end, and a page-based directive (like Meta Robots) is probably impractical. My proposal is to create a new Robots.txt directive called “Disconnect”. I imagine it looking something like this:

Disconnect: www.badsite.com

Essentially, this would tell search engines to block any links to the target site coming from “www.badsite.com” and not consider them as part of the link-graph. I’d also recommend a wild-card version to cover all sub-domains:

Disconnect: *.badsite.com
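To show how little parsing this would take, here is a toy sketch of how a crawler might read the proposed directive. To be clear, “Disconnect” is purely my suggestion and no search engine supports it; the function names are invented, and the shell-style wildcard matching (via Python's `fnmatch`) is just one assumption about how the wild-card version could behave:

```python
from fnmatch import fnmatch


def parse_disconnect(robots_txt):
    """Collect host patterns from hypothetical 'Disconnect:' lines."""
    patterns = []
    for line in robots_txt.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "disconnect" and value.strip():
            patterns.append(value.strip().lower())
    return patterns


def is_disconnected(source_host, patterns):
    """True if links coming from source_host should be left out of the link graph."""
    return any(fnmatch(source_host.lower(), pattern) for pattern in patterns)


rules = parse_disconnect("Disconnect: www.badsite.com\nDisconnect: *.badsite.com\n")
print(is_disconnected("forum.badsite.com", rules))  # covered by the wildcard rule
print(is_disconnected("www.goodsite.com", rules))   # not listed, so links still count
```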

Is this computationally possible, given the way Google and Bing process the link-graph? I honestly don’t know. I believe, though, that the Robots.txt level would probably be the easiest to implement and would cover most cases I’ve encountered.

While I recognize that Google and Bing treat bad links with wide latitude and recognize that site owners can’t fully control incoming links, I’ve seen too many cases at this point of people who have been harmed by links they don’t have control over (sometimes, through no fault of their own). If links are going to continue to be the primary currency of ranking (and that is debatable), then I think it’s time the search engines gave us a way to cut links from both ends.

Update (December 15th)

From the comments, I wanted to clarify a couple of things regarding the “Disconnect” directive. First off, this is NOT an existing Robots.txt option. This is just my suggestion (apparently, a few people got the wrong idea). Second, I really did intend this as more of a platform for discussion. I don’t believe Google or Bing are likely to support the change.

One common argument in the comments was that adding a “Disconnect” option would allow black-hats to game the system by placing risky links, knowing they could be easily cut. While this is a good point, theoretically, I don’t think it’s a big practical concern. The reality is that black-hats can already do this. It’s easy to create paid links, link farms, etc. that you control, and then cut them if you run into trouble. Some SEO firms have even built up spammy links to get a short-term boost, and then cut them before Google catches on (I think that was part of the JC Penney scheme, actually).

Almost by definition, the “Disconnect” directive (or any similar tool) would be more for people who can’t control the links. In some cases, these may be malicious links, but most of the time, it would be links that other people created on their behalf that they no longer have control over.

Cheers!!!

Search Engine Optimization for Beginners

 

You’ve finished your web design, uploaded your files, and set up your blog, but you’re still not getting as many visitors as you hoped for. What gives? Chances are you haven’t started working on one of the most important ways to market your site, Search Engine Optimization (SEO).

What Is SEO?

Search Engine Optimization refers to the collection of techniques and practices that allow a site to get more traffic from search engines (Google, Yahoo, Microsoft). SEO can be divided into two main areas: off-page SEO (work that takes place separate from the website) and on-page SEO (website changes to make your website rank better). This tutorial will cover both areas in detail! Remember, a website is not fully optimized for search engines unless it employs both on and off-page SEO.

What SEO Is Not

SEO is not purchasing the number #1 sponsored link through Google Adwords and proclaiming that you have a #1 ranking on Google. Purchasing paid placements on search engines is a type of Search Engine Marketing (SEM), and is not covered in this tutorial.

SEO is not ranking #1 for your company’s name. If you’re reading this tutorial, you probably already know that ranking for popular terms is darn near impossible, but specific terms, such as a company name, are a freebie. The search engines usually are smart enough to award you that rank by default (unless you are being penalized).

Who Uses SEO?

If a website is currently ranked #10 on Google for the search phrase, “how to make egg rolls,” but wants to rise to #1, this website needs to consider SEO. Because search engines have become more and more popular on the web, nearly anyone trying to get seen on the web can benefit from a little SEO loving. 🙂

Do Not Forget Social Media Marketing

I must say that social media marketing has a great impact on your online marketing campaign. It’s ‘word of mouth’ marketing where you have the opportunity to promote your website to millions of users. It greatly helps in branding your product. In fact, it helps you succeed in today’s fast-moving competition.

Wake Up SEOs, the New Google is Here

I must admit that lately Google is the cause of my headaches.

No, not just because it decided I was not going to be not provided with useful information about my sites. And not because it is changing practically every tool I have gotten used to since my first days as an SEO (Google Analytics, Webmaster Tools, Gmail…). And, honestly, not only because it released a ravenous Panda.

No, the real question that is causing my headaches is: Where the hell does Google want to go with all these changes?

Let me start by quoting the definition of SEO Google gives in its Guidelines:

Search engine optimization is about putting your site’s best foot forward when it comes to visibility in search engines, but your ultimate consumers are your users, not search engines.

Technical SEO still matters, a lot!

If you want to put your site’s best foot forward and make it the most visible possible in search engines, then you have to be a master in technical SEO.

We all know that if we do not pay attention to the navigation architecture of our site, if we don’t care about on-page optimization, if we mess up the rel=”canonical” tag, the pagination and the faceted navigation of our website, and if we don’t pay attention to internal content duplication, etc. etc., well, we are not going to go that far with Search.

Is all this obvious? Yes, it is. But people in our circle tend to pay attention just to the latest bright shining object and forget what one of the basic pillars of our discipline is: making a site optimized to be visible in the search engines.

The next time you hear someone saying “Content is King” or “Social is the new link building”, snap her face and ask her when was the last time she logged into Google Webmaster Tools.

Go fix your site, make it indexable and solve all the technical problems it may have. Only after having done that can you start doing all the rest.

User is king

Technical SEO still matters, but that does not mean that it is a synonym for SEO. So, if you hear someone affirming it, please snap her face too.

No... content is not the only King. User is the King! Image by Jeff Gregory

User and useful have the same root: use. And a user finds a website useful when it offers an answer to her needs, and when using it is easy and fast.

From Google’s point of view of the User, that means that, to rank, a site:

  1. must be fast;
  2. must have content that is useful and related to what it claims to be about;
  3. must be presented to Google so that it can understand as well as possible what it is about.

The first point explains the emphasis Google gives to site speed, because it is really highly correlated to a better user experience.

The second is related to the quality of the content of a site, and it is substantially what Panda is all about. Panda, if we want to reduce it to its simplest terms, is Google’s attempt to clean its SERPs of any content it does not consider useful for the end users.

The third explains the Schema.org adoption and why Google (and the other Search Engines) are definitely moving to the Semantic Web: because it helps search engines organize the bazillion pieces of content they index every second. And the better they understand what your content is really about, the better they will deliver it in the SERPs.

The link graph mapped

The decline of Link graph

We all know that just with on-site optimization we cannot win the SERPs war, and that we need links to our site to make it authoritative. But we all know how much the link graph can be gamed.

Even though we still have tons of reasons to complain to Google about the quality of SERPs, especially due to sites that rank thanks to manipulative link building tactics, it is hard for me to believe that Google is doing nothing in order to counteract this situation. What I believe is that Google has decided to solve the problem not with patches but with a totally new kind of graph.

That does not mean that links are not needed anymore, not at all, as link-related factors still represent (and will represent) a great portion of all the ranking factors, but other factors are now cooked in the ranking pot.

Be Social and become a trusted seed

In a Social-Caffeinated era, the fastest way to understand whether a piece of content is popular is to check its “relative” popularity in the social media environment. I say “relative” because not all content is the same: a meme may need many tweets, +1s and likes/shares to be considered more popular than others, but that is not so for more niche kinds of content. Combining social signals with the traditional link graph, Google can understand the real popularity of a page.

The problem, as many have been saying for almost a year, is that it is quite easy to spam in Social Media.

The Facebook Social Graph from Silicon Angle

For this reason Google introduced the concepts of Author and Publisher and, even more important, Google linked them to the Google Profiles and is pushing Google Plus, which is not just another Social Media, but what Google aims to be in the future: a social search engine.

Rel=”author” and Rel=”publisher” are the solution Google is adopting in order to better control, among other things, the spam pollution of the SERPs.

If you are a blogger, you are incentivized to mark your content with Author and link it to your G+ Profile, and as a Site, you are incentivized to create your G+ Business page and to promote it with a badge on your site that has rel=”publisher” in its code.
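If you want to check whether a page already carries this markup, a quick sketch with Python's standard `html.parser` could look like the one below. The sample HTML and the profile URLs in it are placeholders, not real G+ addresses, and the check only looks at the rel attribute of links, nothing more:

```python
from html.parser import HTMLParser


class RelFinder(HTMLParser):
    """Collect href values of <a>/<link> tags whose rel includes author or publisher."""

    def __init__(self):
        super().__init__()
        self.found = {"author": [], "publisher": []}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        for rel in (attr.get("rel") or "").lower().split():
            if rel in self.found and attr.get("href"):
                self.found[rel].append(attr["href"])


def find_rel_links(html):
    finder = RelFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Placeholder markup, for illustration only.
SAMPLE = """
<head><link rel="publisher" href="https://plus.google.com/EXAMPLE_PAGE_ID"/></head>
<body><a rel="author" href="https://plus.google.com/EXAMPLE_PROFILE_ID">Jane Doe</a></body>
"""

print(find_rel_links(SAMPLE))
```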

Trusted seeds are no longer only sites, but can also be persons (e.g. Rand or Danny Sullivan) or social facets of an entity… so, the closer I am in the Social Graph to those persons/entities, the more trusted I am in Google’s eyes.

The new Google graph

As we can see, Google is not trying to rely only on the link graph, as it is quite easy to game, but it is not simply adding the social signals to the link graph either, because they too can be gamed. What Google is doing is creating and refining a new graph in which the Link graph, Social graph and Trust graph cooperate, and which is possibly harder to game. It can still be gamed, but – hopefully – only with so much effort that it may become non-viable as a practice.

Wake up SEOs, the new Google is here

As a conclusion, let me borrow what Larry Page wrote on Google+ (bold is mine):

Our ultimate ambition is to transform the overall Google experience […] because we understand what you want and can deliver it instantly.

This means baking identity and sharing into all of our products so that we build a real relationship with our users. Sharing on the web will be like sharing in real life across all your stuff. You’ll have better, more relevant search results and ads.

Think about it this way … last quarter, we’ve shipped the +, and now we’re going to ship the Google part.

I think that says it all, and what we have lived through for a year now is clearly explained by Larry Page’s words.

What can we do as SEOs? Evolve, because SEO is not dying, but SEOs can if they don’t accept that winter – oops – the change of Google is coming.

The New SEO graph

 

How to Speed up Search Engine Indexing

It’s common knowledge that nowadays users search not only for trusted sources of information but also for fresh content. That’s why, over the last couple of years, the search engines have been working on speeding up their indexing process. A few months ago, Google announced the completion of its new indexing system, called Caffeine, which promises fresher results and faster indexation.

The truth is that, compared to the past, the indexing process has become much faster. Nevertheless, lots of webmasters still face indexing problems either when they launch a new website or when they add new pages. In this article we will discuss 5 simple SEO techniques that can help you speed up the indexation of your website.

1. Add links on high traffic websites

The best thing you can do in such situations is to increase the number of links that point to your homepage or to the page that you want to index. The number of incoming links and the PageRank of the domain, affect directly both the total number of indexed pages of the website and the speed of indexation.

As a result, by adding links from high-traffic websites you can reduce the indexing time, because the more links a page receives, the greater the probability that it will be indexed. So if you face indexing problems, make sure you add your link to your blog, post a thread in a relevant forum, or write press releases or articles that contain the link and submit them to several websites. Social media can also be handy tools in such situations, despite the fact that in most cases their links are nofollowed. Keep in mind that even though the major search engines claim that they do not follow nofollowed links, experiments have shown that not only do they follow them but they also index the pages faster (note that the fact that they follow them does not mean that they pass any link juice to them).

2. Use XML and HTML sitemaps

Theoretically, search engines are able to extract the links of a page and follow them without needing your help. Nevertheless, it is highly recommended to use XML or HTML sitemaps, since they can help the indexation process. After creating the XML sitemaps, make sure you submit them to the Webmaster Consoles of the various search engines and include them in robots.txt. Also make sure you keep your sitemaps up-to-date and resubmit them when you make major changes to your website.
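As an illustration, a minimal XML sitemap can be produced with a few lines of Python's standard `xml.etree.ElementTree`. The URLs and dates below are placeholders; the namespace is the one defined by the sitemaps.org protocol, and the `Sitemap:` line is what you would add to robots.txt to reference the file:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string; urls is a list of (loc, lastmod) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages, for illustration only.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2012-01-15"),
    ("http://www.example.com/new-page", "2012-01-20"),
])
print(sitemap_xml)

# Line to add to robots.txt so crawlers can discover the sitemap.
ROBOTS_LINE = "Sitemap: http://www.example.com/sitemap.xml"
```

Regenerating the file whenever pages are added keeps the lastmod dates honest, which is the point of resubmitting it.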

3. Work on your Link Structure

As we saw in previous articles, link structure is extremely important for SEO because it can affect your rankings, the PageRank distribution and the indexation. Thus if you face indexing problems check your link structure and ensure that the not-indexed pages are linked properly from webpages that are as close as possible to the root (homepage). Also make sure that your site does not have duplicate content problems that could affect both the number of pages that get indexed and the average crawl period.

A good method to achieve the faster indexation of a new page is to add a link directly from your homepage. Finally if you want to increase the number of indexed pages, make sure you have a tree-like link structure in your website and that your important pages are no more than 3 clicks away from the home page (Three-click rule).
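The three-click rule is easy to check once you have a crawl of your internal links. The sketch below runs a breadth-first search from the homepage over a small, made-up link map and lists pages that sit more than 3 clicks deep; in practice the map would come from one of the crawler tools rather than being typed by hand:

```python
from collections import deque


def click_depths(links, home="/"):
    """Return the minimum number of clicks from home to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
SITE = {
    "/": ["/category", "/about"],
    "/category": ["/category/page-2"],
    "/category/page-2": ["/category/page-3"],
    "/category/page-3": ["/product-42"],
}

depths = click_depths(SITE)
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
print(too_deep)  # prints ['/product-42']
```

Adding a link to /product-42 from the homepage or the category page would bring it back within three clicks.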

4. Change the crawl rate

Another way to decrease the indexing time in Google is to change the crawl rate from the Google Webmaster Tools Console. Setting the crawl rate to “faster” will allow Googlebot to crawl more pages but unfortunately it will also increase the generated traffic on your server. Of course since the maximum allowed crawl rate that you can set is roughly 1 request every 3-4 seconds (actually 0.5 requests per second + 2 seconds pause between requests), this should not cause serious problems for your server.
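As a back-of-the-envelope check of those figures, the snippet below turns the maximum setting into a daily ceiling. It assumes one particular reading of the numbers above (2 seconds per request at 0.5 requests per second, plus the 2 second pause, so one request every 4 seconds) and that Googlebot crawls at that rate around the clock, which it normally will not:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_pages_per_day(requests_per_second, pause_seconds):
    """Upper bound on requests per day for a given rate and pause between requests."""
    seconds_per_request = 1 / requests_per_second + pause_seconds
    return SECONDS_PER_DAY / seconds_per_request


print(max_pages_per_day(0.5, 2))  # prints 21600.0
```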

crawl-rate

5. Use the available tools

The major search engines provide various tools that can help you manage your website. Bing provides you with the Bing Toolbox, Google supports the Google Webmaster Tools and Yahoo offers the Yahoo Site Explorer. In all the above consoles you can manage the indexation settings of your website and your submitted sitemaps. Make sure that you use all of them and that you regularly monitor your websites for warnings and errors. Also resubmit or ping search engine sitemap services when you make a significant amount of changes on your website. A good tool that can help you speed up this pinging process is the Site Submitter, nevertheless it is highly recommended that you use also the official tools of every search engine.

If you follow all the above tips and you still face indexing problems, then you should check whether your website is banned from the search engines, whether it is developed with search engine friendly techniques, whether you have enough domain authority to index the particular amount of pages, or whether you have made a serious SEO mistake (for example, blocking the search engines by using robots.txt or meta-robots etc). A good way to detect such mistakes is to use the Web SEO Analysis tool, which provides detailed diagnostics. Finally, most of the major search engines have special groups and forums where you can seek help, so make sure you visit them and post your questions.

source: webseoanalytics.com

Search Engine Algorithm Basics

A good search engine does not attempt to return the pages that best match the input query. A good search engine tries to answer the underlying question. If you become aware of this you’ll understand why Google (and other search engines) use a complex algorithm to determine what results they should return. The factors in the algorithm consist of “hard factors” such as the number of backlinks to a page and perhaps some social recommendations through likes and +1’s. These are usually external influences. You also have the factors on the page itself: the way a page is built and various page elements play a role in the algorithm. But only by analyzing both the on-site and off-site factors is it possible for Google to determine which pages will answer the question behind the query. For this, Google will have to analyze the text on a page.

In this article I will elaborate on the problems of a search engine and possible solutions. At the end of this article we won’t have revealed Google’s algorithm (unfortunately), but we’ll be one step closer to understanding some of the advice we often give as SEOs. There will be some formulas, but do not panic. This article isn’t just about those formulas. The article contains an Excel file. Oh and the best thing: I will use some Dutch delights to illustrate the problems.

Croquets and Bitterballen
Behold: Croquets are the elongated and bitterballen are the round ones 😉

True OR False
Search engines have evolved tremendously in recent years, but at first they could only deal with Boolean operators. In simple terms, a term was either included in a document or not. Something was true or false, 1 or 0. Additionally, you could use operators such as AND, OR and NOT to search for documents that contain multiple terms or to exclude terms. This sounds fairly simple, but it does have some problems. Suppose we have two documents, which consist of the following texts:

Doc1:
“And our restaurant in New York serves croquets and bitterballen.”

Doc2:
“In the Netherlands you retrieve croquets and frikandellen from the wall.”
Frikandellen
Oops, almost forgot to show you the frikandellen 😉

If we were to build a search engine, the first step would be tokenization of the text. We want to be able to quickly determine which documents contain a term. This is easier if we put all tokens in a database. A token is any single term in a text, so how many tokens does Doc1 contain?

The moment you started to answer this question for yourself, you probably thought about the definition of a “term”. Actually, in the example “New York” should be recognized as one term. How we can determine that the two individual words are actually one term is outside the scope of this article, so for the moment we treat each separate word as a separate token. So we have 10 tokens in Doc1 and 11 tokens in Doc2. To avoid duplication of information in our database, we will store types and not the tokens.

Types are the unique tokens in a text. In the example, Doc1 contains the token “and” twice. In this example I ignore the fact that “and” appears once with and once without being capitalized. As with the determination of a term, there are techniques to determine whether something actually needs to be capitalized. In this case, we assume that we can store it without a capital and that “And” & “and” are the same type.
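If you want to verify the counts yourself, here is a minimal sketch. It uses the same simplifications as above: every separate word is a token (so “New York” counts as two), and types are stored in lowercase. The regular expression is just one simple way to split words and is not how a real search engine tokenizes:

```python
import re

DOC1 = "And our restaurant in New York serves croquets and bitterballen."
DOC2 = "In the Netherlands you retrieve croquets and frikandellen from the wall."


def tokenize(text):
    """Split text into word tokens, keeping the original capitalization."""
    return re.findall(r"\w+", text)


def types(text):
    """Return the unique tokens of a text, stored without capitals."""
    return {token.lower() for token in tokenize(text)}


print(len(tokenize(DOC1)), len(tokenize(DOC2)))  # prints 10 11
print(len(types(DOC1)), len(types(DOC2)))        # prints 9 10
```

Doc1 drops from 10 tokens to 9 types because “And” and “and” collapse into one, and Doc2 drops from 11 to 10 because “the” appears twice.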

By storing all the types in the database with the documents where we can find them, we’re able to search within the database with the help of Booleans. The search “croquets” will result in both Doc1 and Doc2. The search for “croquets AND bitterballen” will only return Doc1 as a result. The problem with this method is that you are likely to get too many or too few results. In addition, it lacks the ability to rank the results. If we want to improve our method, we have to determine what we can use other than the presence / absence of a term in a document. Which on-page factors would you use to organize the results if you were Google?
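The “database” of types and the documents they occur in is usually called an inverted index. Below is a small sketch of one for the two example documents, with AND, OR and NOT expressed as set operations. The function names are mine, and a real engine stores far more than a set of document names per type:

```python
import re
from collections import defaultdict

DOCS = {
    "Doc1": "And our restaurant in New York serves croquets and bitterballen.",
    "Doc2": "In the Netherlands you retrieve croquets and frikandellen from the wall.",
}


def build_index(docs):
    """Map each lowercase type to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


INDEX = build_index(DOCS)


def postings(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return INDEX.get(term.lower(), set())


print(sorted(postings("croquets")))                                # ['Doc1', 'Doc2']
print(sorted(postings("croquets") & postings("bitterballen")))     # AND -> ['Doc1']
print(sorted(postings("bitterballen") | postings("frikandellen"))) # OR  -> ['Doc1', 'Doc2']
print(sorted(postings("croquets") - postings("bitterballen")))     # NOT -> ['Doc2']
```

Notice that every result is just a set: a document is either in or out, and nothing tells us which of the matches is the better one.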

Zone Indexes
A relatively simple method is to use zone indexes. A web page can be divided into different zones. Think of a title, description, author and body. By adding a weight to each zone in a document, we’re able to calculate a simple score for each document. This is one of the first on page methods search engines used to determine the subject of a page. The operation of scores by zone indexes is as follows:

Suppose we assign the following weights to each zone:

Zone | Weight
title | 0.4
description | 0.1
content | 0.5

We perform the following search query:
“croquets AND bitterballen”

And we have a document with the following zones:

Zone | Content | Boolean | Score
title | New York Café | 0 | 0
description | Café with delicious croquets and bitterballen | 1 | 0.1
content | Our restaurant in New York serves croquets and bitterballen | 1 | 0.5
Total | | | 0.6
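The calculation in the table can be written out as a short sketch: each zone gets a Boolean match (does it contain all query terms?), and the weights of the matching zones are added up. The weights and texts are the ones from the example; the function name and the simple word splitting are my own assumptions:

```python
import re

ZONE_WEIGHTS = {"title": 0.4, "description": 0.1, "content": 0.5}

DOCUMENT = {
    "title": "New York Café",
    "description": "Café with delicious croquets and bitterballen",
    "content": "Our restaurant in New York serves croquets and bitterballen",
}


def zone_score(zones, query_terms, weights=ZONE_WEIGHTS):
    """Sum the weights of all zones that contain every query term."""
    score = 0.0
    for zone, text in zones.items():
        words = set(re.findall(r"\w+", text.lower()))
        if all(term.lower() in words for term in query_terms):
            score += weights[zone]
    return score


# Query: croquets AND bitterballen
print(round(zone_score(DOCUMENT, ["croquets", "bitterballen"]), 2))  # prints 0.6
```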

Because at some point everyone started abusing the weights assigned to for example the description, it became more important for Google to split the body in different zones and assign a different weight to each individual zone in the body.

This is quite difficult because the web contains a variety of documents with different structures. The interpretation of an XML document by such a machine is quite simple. When interpreting an HTML document it becomes harder for a machine. The structure and tags are much more limited, which makes the analysis more difficult. Of course there will be HTML5 in the near future and Google supports microformats, but it still has its limitations. For example if you know that Google assigns more weight to content within the <content> tag and less to content in the <footer> tag, you’ll never use the <footer> tag.

To determine the context of a page, Google will have to divide a web page into blocks. This way Google can judge which blocks on a page are important and which are not. One of the methods that can be used is the text / code ratio. A block on a page that contains much more text than HTML code probably contains the main content of the page. A block that contains many links / HTML code and little content is probably the menu. This is why choosing the right WYSIWYG editor is very important. Some of these editors use a lot of unnecessary HTML code.

The use of text / code ratio is just one of the methods which a search engine can use to divide a page into blocks. Bill Slawski talked about identifying blocks earlier this year.

The advantage of the zone indexes method is that you can calculate a score for each document quite simply. A disadvantage, of course, is that many documents can get the same score.

Term frequency
When I asked you to think of on-page factors you would use to determine the relevance of a document, you probably thought about the frequency of the query terms. It is a logical step to give more weight to documents that use the search terms more often.

Some SEO agencies stick to the story that keywords should make up a certain percentage of the text. We all know that isn’t true, but let me show you why. I’ll try to explain it with the following examples. Some formulas will come up, but as I said, it is the outline of the story that matters.

The numbers in the table below are the number of occurrences of a word in the document (also called term frequency or tf). So which document has a better score for the query: croquets and bitterballen?

       croquets  and  café  bitterballen  Amsterdam
Doc1   8         10   3     2             0
Doc2   1         20   3     9             2
DocN
Query  1         1    0     1             0

The score for both documents would be as follows:
score(“croquets and bitterballen”, Doc1) = 8 + 10 + 2 = 20
score(“croquets and bitterballen”, Doc2) = 1 + 20 + 9 = 30
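
The same sums, as a small Python sketch. The dictionary layout and the name `tf_score` are just choices for this example; the numbers are the ones from the table.

```python
# Term frequencies from the table above.
TERM_FREQ = {
    "Doc1": {"croquets": 8, "and": 10, "café": 3, "bitterballen": 2, "Amsterdam": 0},
    "Doc2": {"croquets": 1, "and": 20, "café": 3, "bitterballen": 9, "Amsterdam": 2},
}


def tf_score(query_terms, term_freq):
    """Score a document by adding up the raw frequency of each query term."""
    return sum(term_freq.get(term, 0) for term in query_terms)


query = ["croquets", "and", "bitterballen"]
for name, term_freq in TERM_FREQ.items():
    print(name, tf_score(query, term_freq))
# Doc1 20
# Doc2 30
```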

Document 2 is in this case more closely related to the query. In this example the term “and” gains the most weight, but is this fair? It is a stop word, and we would like to give it only a little value. We can achieve this by using inverse document frequency (idf), which is the opposite of document frequency (df); the combined weighting is known as tf-idf. Document frequency is the number of documents in which a term occurs. Inverse document frequency is, well, the opposite. As the number of documents in which a term occurs grows, idf will shrink.

You can calculate idf by dividing the total number of documents in your corpus by the number of documents containing the term and then taking the logarithm of that quotient.
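
As a quick sketch of that calculation: the base-10 logarithm is an assumption here (the text does not name a base), and the corpus sizes in the example are made-up numbers chosen to keep the arithmetic easy.

```python
import math


def idf(total_docs, docs_with_term):
    """Inverse document frequency: log of (corpus size / documents containing the term)."""
    return math.log10(total_docs / docs_with_term)


# Hypothetical corpus of 1,000 documents.
print(idf(1000, 10))    # 2.0 -> a rare term gets a high weight
print(idf(1000, 1000))  # 0.0 -> a term found in every document gets no weight
```

A stop word like “and” appears in almost every document, so its idf ends up close to zero.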

Suppose the idf values of our query terms are as follows:
idf(croquets)     = 5
idf(and)          = 0.01
idf(bitterballen) = 2

Then you get the following scores:
score(“croquets and bitterballen”, Doc1) = 8*5  + 10*0.01 + 2*2 = 44.1
score(“croquets and bitterballen”, Doc2) = 1*5 + 20*0.01 + 9*2 = 23.2
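
Here is the weighted version as a Python sketch. The function name `tf_idf_score` is made up for the example; the term frequencies and idf values are the ones given above.

```python
# Term frequencies and idf values from the examples above.
TERM_FREQ = {
    "Doc1": {"croquets": 8, "and": 10, "café": 3, "bitterballen": 2, "Amsterdam": 0},
    "Doc2": {"croquets": 1, "and": 20, "café": 3, "bitterballen": 9, "Amsterdam": 2},
}
IDF = {"croquets": 5, "and": 0.01, "bitterballen": 2}


def tf_idf_score(query_terms, term_freq, idf):
    """Multiply each query term's frequency by its idf and add up the results."""
    return sum(term_freq.get(term, 0) * idf.get(term, 0.0) for term in query_terms)


query = ["croquets", "and", "bitterballen"]
for name, term_freq in TERM_FREQ.items():
    print(name, round(tf_idf_score(query, term_freq, IDF), 1))
# Doc1 44.1
# Doc2 23.2
```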

Now Doc1 has a better score. But we still don’t take document length into account. One document can contain much more content than another document without being more relevant. A long document gains a higher score quite easily with this method.

Vector model
We can solve this by looking at the cosine similarity of a document. An exact explanation of the theory behind this method is outside the scope of this article, but you can think about it as a kind of harmonic mean between the query terms in the document. I made an Excel file, so you can play with it yourself. There is an explanation in the file itself. You need the following metrics:

  • Query terms – each separate term in the query.
  • Document frequency – how many documents Google knows that contain that term.
  • Term frequency – the frequency for each separate query term in the document (add this Focus Keyword widget made by Sander Tamaëla to your bookmarks, very helpful for this part)

Here’s an example where I actually used the model. The website had a page that was designed to rank for “fiets kopen” which is Dutch for “buying bikes”. The problem was that the wrong page (the homepage) was ranking for the query.

For the formula, we include the previously mentioned inverse document frequency (idf). For this we need the total number of documents in Google’s index; we assume N = 10.4 billion.

An explanation of the table below:

  • tf = term frequency
  • df = document frequency
  • idf = inverse document frequency
  • Wt,q = weight for term in query
  • Wt,d = weight for term in document
  • Product = Wt,q * Wt,d
  • Score = Sum of the products

The main page, which was ranking: http://www.fietsentoko.nl/

       Query                                      Document
term   tf  df           idf          Wt,q         tf  Wf   Wt,d     Product
Fiets  1   25,500,000   3.610493159  3.610493159  21  441  0.70711  2.55302
Kopen  1   118,000,000  2.945151332  2.9452       21  441  0.70711  2.08258
Score: 4.6356

The page I wanted to rank: http://www.fietsentoko.nl/fietsen/

       Query                                      Document
term   tf  df           idf          Wt,q         tf  Wf   Wt,d     Product
Fiets  1   25,500,000   3.610493159  3.610493159  22  484  0.61782  2.23063
Kopen  1   118,000,000  2.945151332  2.945151332  28  784  0.78631  2.31584
Score: 4.54647

Although the second document contains the query terms more often, its score for the query was lower (higher is better). This was because of the lack of balance between the query terms. Following this calculation, I changed the text on the page: I increased the use of the term “fietsen” and decreased the use of “kopen”, which is a more generic term in the search engine and has less weight. This changed the score as follows:

       Query                                      Document
term   tf  df           idf          Wt,q         tf  Wf   Wt,d     Product
Fiets  1   25,500,000   3.610493159  3.610493159  28  784  0.78631  2.83897
Kopen  1   118,000,000  2.945151332  2.945151332  22  484  0.61782  1.81960
Score: 4.6586

After a few days, Google crawled the page and the document I changed started to rank for the term. We can conclude that the number of times you use a term is not necessarily important. It is important to find the right balance between the terms you want to rank for.
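
The three tables can be reproduced with a short Python sketch. It follows the columns as they appear in the tables: Wf is the squared term frequency, Wt,d is the term frequency divided by the square root of the summed Wf values, and Wt,q is simply the idf because each term occurs once in the query. The function name `vector_score` is made up, the idf values are copied from the tables rather than recomputed, and the normalization only covers the two query terms, just as in the tables.

```python
import math

# idf values copied from the tables above.
IDF = {"fiets": 3.610493159, "kopen": 2.945151332}


def vector_score(term_freqs, idf):
    """Sum of Wt,q * Wt,d over the query terms."""
    # Length of the document vector, built from the query terms only.
    length = math.sqrt(sum(tf * tf for tf in term_freqs.values()))
    return sum(idf[term] * tf / length for term, tf in term_freqs.items())


homepage = {"fiets": 21, "kopen": 21}
target_before = {"fiets": 22, "kopen": 28}
target_after = {"fiets": 28, "kopen": 22}

for name, page in [("homepage", homepage),
                   ("target before", target_before),
                   ("target after", target_after)]:
    print(name, round(vector_score(page, IDF), 2))
# homepage 4.64
# target before 4.55
# target after 4.66
```

The last decimals differ slightly from the tables because the tables round the intermediate values.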

Speed up the process
Performing this calculation for each document that matches the search query costs a lot of processing power. You can fix this by adding some static values to determine for which documents you want to calculate the score. PageRank, for example, is a good static value. When you first calculate the score for the pages that match the query and have a high PageRank, you have a good chance of finding some documents that would end up in the top 10 of the results anyway.

Another possibility is the use of champion lists. For each term, take only the top N documents with the best score for that term. If you then have a multi-term query, you can intersect those lists to find documents that contain all query terms and probably have a high score. Only if there are too few documents containing all terms do you search in all documents. So you’re not going to rank by only having the best vector score; you have to have your static scores right as well.
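
A champion list lookup could look roughly like this in Python. Everything here is hypothetical: the function names, the per-term scores and the document names are invented for the illustration.

```python
def build_champion_lists(term_scores, n):
    """Keep only the n best-scoring documents for each term."""
    return {
        term: set(sorted(scores, key=scores.get, reverse=True)[:n])
        for term, scores in term_scores.items()
    }


def candidates(query_terms, champions, term_scores, min_results):
    """Intersect the champion lists; fall back to all documents if too few match."""
    champion_sets = [champions.get(term, set()) for term in query_terms]
    shortlist = set.intersection(*champion_sets) if champion_sets else set()
    if len(shortlist) >= min_results:
        return shortlist
    # Too few candidates: search every document that contains all terms.
    full_sets = [set(term_scores.get(term, {})) for term in query_terms]
    return set.intersection(*full_sets) if full_sets else set()


# Hypothetical per-term scores for four documents.
term_scores = {
    "croquets": {"doc1": 0.9, "doc2": 0.8, "doc3": 0.4, "doc4": 0.1},
    "bitterballen": {"doc1": 0.2, "doc2": 0.7, "doc3": 0.9, "doc4": 0.6},
}
champions = build_champion_lists(term_scores, n=2)
query = ["croquets", "bitterballen"]

print(sorted(candidates(query, champions, term_scores, min_results=1)))
# ['doc2']
print(sorted(candidates(query, champions, term_scores, min_results=2)))
# ['doc1', 'doc2', 'doc3', 'doc4']
```

The expensive vector score then only has to be calculated for the candidates that come back.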

Relevance feedback
Relevance feedback is assigning more or less value to a term in a query, based on the relevance of a document. Using relevance feedback, a search engine can change the user query without telling the user.

The first step here is to determine whether a document is relevant or not. Although there are search engines where you can specify whether a result or a document is relevant, for a long time Google had no such function. Their first attempt was adding the favorite star to the search results. Now they are trying it with the Google+ button. If enough people start pushing the button for a certain result, Google will start considering the document relevant for that query.

Another method is to look at the current pages that rank well. These will be considered relevant. The danger of this method is topic drift. If you’re looking for bitterballen and croquettes, and the best ranking pages are all snack bars in Amsterdam, the danger is that you will assign value to Amsterdam and end up with just snack bars in Amsterdam in the results.

Another way for Google is simply to use data mining. They can also look at the CTR of different pages. Pages that have a higher CTR and a lower bounce rate than average can be considered relevant. Pages with a very high bounce rate will be considered irrelevant.

An example of how we can use this data for adjusting the query term weights is Rocchio’s feedback formula. It comes down to adjusting the value of each term in the query and possibly adding additional query terms. The formula for this is as follows:
Rocchio feedback formula
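
In its commonly cited textbook form, which is assumed here to be the version meant above, the formula can be written as follows. The symbols are not taken from this article: q_0 is the original query vector, q_m the modified query, D_r the set of relevant documents, D_nr the set of non-relevant documents, and alpha, beta and gamma are the weights.

```latex
\vec{q}_{m} = \alpha\,\vec{q}_{0}
  + \beta\,\frac{1}{|D_r|}\sum_{\vec{d}_j \in D_r}\vec{d}_j
  - \gamma\,\frac{1}{|D_{nr}|}\sum_{\vec{d}_j \in D_{nr}}\vec{d}_j
```

In words: keep the original query, add the average of the relevant documents, and subtract the average of the non-relevant ones.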

The table below is a visual representation of this formula. Suppose we apply the following values:
Query terms: +1 (alpha)
Relevant terms: +1 (beta)
Irrelevant terms: -0.5 (gamma)

We have the following query:
“croquets and bitterballen”

The relevance of the following documents is as follows:
Doc1   : relevant
Doc2   : relevant
Doc3   : not relevant

Terms         Q  Doc1  Doc2  Doc3  Weight new query
croquets      1  1     1     0     1 + 1 - 0 = 2
and           1  1     0     1     1 + 0.5 - 0.5 = 1
bitterballen  1  0     0     0     1 + 0 - 0 = 1
café          0  0     1     0     0 + 0.5 - 0 = 0.5
Amsterdam     0  0     0     1     0 + 0 - 0.5 = -0.5, adjusted to 0

The new query is as follows:
croquets(2) and(1) bitterballen(1) café(0.5)

The value for each term is the weight that it gets in your query. We can use those weights in our vector calculations. Although the term Amsterdam was given a score of -0.5, we adjust negative values back to 0. In this way we do not exclude terms from the search results. And although café did not appear in the original query, it was added and was given a weight in the new query.
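
The table can be recalculated with a few lines of Python. This is a sketch of the averaging form of the formula with the weights used above; the function name `rocchio` and the dictionary representation of the query and documents are choices made for this example.

```python
def rocchio(query, relevant, irrelevant, alpha=1.0, beta=1.0, gamma=0.5):
    """Reweight the query terms using relevant and irrelevant documents."""
    terms = set(query)
    for doc in relevant + irrelevant:
        terms.update(doc)

    new_query = {}
    for term in terms:
        # Average weight of the term in each group of documents.
        rel = sum(d.get(term, 0) for d in relevant) / len(relevant) if relevant else 0.0
        irr = sum(d.get(term, 0) for d in irrelevant) / len(irrelevant) if irrelevant else 0.0
        weight = alpha * query.get(term, 0) + beta * rel - gamma * irr
        # Negative weights are adjusted back to 0.
        new_query[term] = max(weight, 0.0)
    return new_query


query = {"croquets": 1, "and": 1, "bitterballen": 1}
doc1 = {"croquets": 1, "and": 1}   # relevant
doc2 = {"croquets": 1, "café": 1}  # relevant
doc3 = {"and": 1, "Amsterdam": 1}  # not relevant

new_query = rocchio(query, relevant=[doc1, doc2], irrelevant=[doc3])
for term in ["croquets", "and", "bitterballen", "café", "Amsterdam"]:
    print(term, new_query[term])
# croquets 2.0
# and 1.0
# bitterballen 1.0
# café 0.5
# Amsterdam 0.0
```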

If Google uses this kind of relevance feedback, you could look at pages that already rank for a particular query. By using the same vocabulary, you can make sure you get the most out of it.

Takeaways
In short, we’ve considered one of the options for assigning a value to a document based on the content of the page. Although the vector method is fairly accurate, it is certainly not the only method to calculate relevance. There are many adjustments to the model, and it remains only a part of the complete algorithm of search engines like Google. We have taken a look at relevance feedback as well. *cough* panda *cough*. I hope I’ve given you some insight into the methods search engines can use other than external factors. Now it’s time to discuss this and to go play with the Excel file 🙂

Have a good day!!!

source: http://www.seomoz.org