I’m going to try to post here more often from now on; I’ve just been busy with all sorts of projects, and everything I write about SEO I typically post on other blogs, such as my recent Technical SEO Chrome Tools post.
There has been a resurgence of people talking about best-practice backlink analysis methods and other satellite topics. It is certainly still an important topic, and one that a lot of people don’t seem to have gotten their heads around properly. Countless threads have been started on the SEO subreddit about it, and it’s come (shockingly) to my attention that people still aren’t doing any pre-analysis preparation. By which I mean: doing what you can to cut down the massive list of backlinks to a more approachable number.
Now, I must never assume everyone is at the same stage in their experience with this, so a bit of back story: as search engine algorithms have evolved, a link is no longer simply a ‘vote of confidence’ – links can now carry negative weight, rather than the options being positive or neutral. So not only can your website be manually penalised by Google’s web spam (Search Quality) team, it can also be penalised algorithmically via the Google Penguin updates. I’ve now successfully completed backlink analysis for 20 different websites, with only 2 of them requiring a second reconsideration request – and for those two, we were conservative with the links we disavowed in the first iteration of the file.
So, you’re in charge of cleaning up a site’s backlinks – perhaps because of a manual penalty, or perhaps pre-empting one after taking over from a previous SEO agency whose messy work you want to tidy up. What’s the first step?
Undoubtedly it is to pull off the backlinks for the site; and the error I’m seeing people perpetuate is the “just go through every backlink individually” step directly after this. They advocate not manipulating the data at all. This is simply a bad idea. While I certainly agree the backlinks need human eyes on them, and services such as Link Risk are no guarantee of a site’s quality, not first reducing the list to the backlinks that still exist is just plain silly.
So, let’s look at some numbers: with a backlink profile of 170,000 URLs and an estimate of 2 minutes per URL, you’ll spend an utterly boring 5,666 hours looking at webpages – likely poor ones – to check their quality. That would undoubtedly be soul-destroying, and I wouldn’t wish it on anyone. Quality is very much in the eye of the beholder, so while it’s a task you may want to farm out to the interns or overseas, it’s a difficult one to delegate, as everyone has a different threshold for whether to disavow or not.
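For transparency, here’s the back-of-the-envelope arithmetic behind that figure (the 2-minutes-per-URL number is just my rough estimate of human review time):

```python
# Rough estimate of manual review time for a backlink profile,
# assuming 2 minutes of human review per URL.
urls = 170_000
minutes_per_url = 2

total_minutes = urls * minutes_per_url   # 340,000 minutes
total_hours = total_minutes // 60        # -> 5,666 hours
print(total_hours)
```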
Now, there are some things that can be done programmatically without compromising the integrity of the backlink analysis work. I’ll be using a few different tools, all of them paid (sorry about that), but you should have access to all of them anyway, as they are very useful for all manner of things and worth every penny.
Whatever your chosen backlink aggregator is – Ahrefs, MajesticSEO or another – ensure you import the Webmaster Tools reported backlinks as well and de-duplicate. Something these aggregators do not do very well is cull deleted links.
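If you’d rather script the merge than do it in a spreadsheet, a minimal sketch looks like this – I’m assuming plain-text exports with one URL per line, which is how most aggregators let you export:

```python
# Merge URL exports from your aggregator and Webmaster Tools,
# de-duplicating while preserving first-seen order.
def merge_backlinks(*paths):
    seen = set()
    merged = []
    for path in paths:
        with open(path) as f:
            for line in f:
                url = line.strip()
                if url and url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged
```

Call it with each export’s path, e.g. `merge_backlinks("ahrefs_export.txt", "wmt_links.txt")` (hypothetical file names), and write the combined list back out for the next step.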
My preferred method would be to go straight to Screaming Frog and crawl the entire backlink profile. Sadly, the Frog is very resource intensive, and without modifications to its memory settings it won’t happily go above 14,000 URLs, let alone 170,000. What will handle that number happily, however, is Scrapebox’s “Alive Check” addon! This lets us check the HTTP status code every single URL returns, ensuring we only look at pages that haven’t 404’d or redirected.
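If you don’t have Scrapebox handy, the same status-code triage can be sketched in Python with just the standard library. Bear in mind this is a single-threaded sketch – painfully slow at 170,000 URLs, which is exactly why Scrapebox’s multithreaded addon earns its keep – and the redirect handler exists so 301s/302s are reported as redirects rather than silently followed:

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so 301/302 are reported, not hidden."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)

def status_of(url, timeout=10):
    """Return the HTTP status code for a URL, or None if unreachable."""
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code   # redirect and error statuses land here
    except urllib.error.URLError:
        return None     # DNS failure, timeout, connection refused

def only_alive(urls):
    """Keep just the URLs that return a straight 200 OK."""
    return [u for u in urls if status_of(u) == 200]
```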
In my example backlink analysis of 170,000 URLs, crawled for a household-name brand site, only 78,000 of the pages actually returned a 200 OK status code.
Now for the good bit with Screaming Frog, which can just about handle 78,000 URLs if we use a powerful computer – such as my home machine with 16 GB of RAM – and alter the Frog’s settings to allow increased memory usage (by default it only utilises 1 GB of RAM, if I recall correctly…). The good bit comes with the following setting: set a custom filter to search each page for your domain + TLD as a string. This ensures not only that the page still exists, but that the link in question is still there.
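The same check can be approximated outside the Frog: fetch each referring page and look for your domain + TLD in the raw HTML. Like Screaming Frog’s custom filter, this is a plain string match, so it will also count unlinked mentions of your domain – the `example.com` default below is a placeholder for the site you’re auditing:

```python
import urllib.request

def still_mentions(page_url, domain="example.com", timeout=10):
    """True if the referring page's HTML still contains the domain string."""
    try:
        with urllib.request.urlopen(page_url, timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError/HTTPError both subclass OSError
        return False
    return domain in html
```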
Again, in the example used, this cut the total number of URLs down to 36,000. Using the same formula for time per page, we’re now down to 1,200 hours – roughly a fifth of the original task – and we don’t have to stop there either.
There is far more that can be done from this stage, but already we’re slicing the time required down to a fraction of the original. There are many more steps in the life-cycle of the task, and many clever people talking about it, including Tim Grice’s article “Should you spend time on link removals?”
“I’ll always hire a lazy person to do a hard job. Because a lazy person will find the easiest way to do it.”― Bill Gates
Work smart, not hard.