
In Search Podcast: 5 Steps to Optimizing Your Crawl Budget With Alina Ghost


How effective is your crawl budget optimization? That’s what we’re exploring today with Alina Ghost. Alina is the SEO manager at Debenhams (part of the Boohoo group) and has over a decade of SEO experience, having worked for brands such as Tesco and Amara.

In this episode, we get into:

  • Why crawl budget optimization is important
  • Why robots.txt is important for crawl budget optimization
  • How to improve your crawl budget through internal linking
  • Checking for errors and redirects
  • Using log file data to improve your SEO

Why Is Crawl Budget Optimization Important?

David: Alina, why is it so important to focus on crawl budget optimization?

Alina: If you think about the resources that Google or other search engines have, they’re not infinite. Because you have to make the most of the time they spend on your website, crawl budget optimization is one of the most valuable things you can do. You can make sure they don’t look at the pages you don’t want them to see, and instead pick up, rank, and showcase the pages you do want your customers and users to find.

1. robots.txt for Crawl Budget Optimization

D: So you need to funnel Google to all the right places. So today, we’re looking at the five steps to optimize your crawl budget. So starting off with robots.txt as your number one step. Why is that important?

A: So robots.txt is one of the fundamental things that you work with as a technical SEO or web developer these days. In basic terms, it’s adding rules for search engines via this one file on your site, to tell them whether or not they can visit particular areas of it.

For example, if you have a customer area, like a login page, you can add that into that file to say that you don’t want the search engine to have a look at it because it has that private information. And because of what I mentioned earlier, the fact that you don’t want them visiting a page that you don’t want to rank anyway.

D: So what are the advantages of doing it with robots.txt, instead of doing it some other way on the page?

A: I guess you could add code into the head of the page, which is nothing that the user sees. But you should add it to the robots.txt file because it’s one of the first files a search engine looks at on your site. So instead of the search engine having to visit each individual page to learn the rules, imagine it like a game: here is what you can and can’t do. Essentially, that’s what you’re telling the search engine. They can’t look at this particular page or area.

The other thing that you can do with robots.txt, which is quite cool, is use a wildcard (*). That’s basically an asterisk placed before or after part of a URL. For example, if you know there are a lot of URLs they shouldn’t be visiting, you can add a wildcard rule instead of adding every single individual URL.

For example, if you have a login page, and behind the login users can go to My Orders and My Wishlist, you don’t have to add each of those URLs individually. Depending on your URL structure, you can add a rule that looks something like domain.com/login/*. Adding an asterisk lets you cover a whole group of URLs in that area without listing every individual URL. It’s a bit like redirects. But let’s not get into that.
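To make that concrete, here is a minimal robots.txt sketch of the wildcard pattern Alina describes. The paths are hypothetical, so adjust them to your own URL structure; the * wildcard is supported by the major search engines, such as Google and Bing.

```
# Hypothetical example - adjust the paths to your own site
User-agent: *
# One wildcard rule covers /login/my-orders, /login/my-wishlist, and so on
Disallow: /login/*
```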

2. Internal Linking

D: Maybe in another episode. So that was robots.txt, step number one. Step number two is internal linking via navigation.

Internal Links via Navigation

A: So internal linking is so huge in terms of guiding the search engines to a particular area of your site. So when it comes to main header navigation, you can actually tell the search engines what categories are very important to you. If you’re in fashion, then it’s dresses and jeans and things like that. Or if you’re showcasing cars, then it’ll be the particular types of cars that you’re showcasing.

However, there are other internal linking areas as well, because it’s not just about the header navigation; it’s also the links you have between other pages. For example, if you go into a category, you’ve got the links into the subcategories.

Breadcrumbs

And then there are also breadcrumbs, where it’s really important to make sure that each page is associated with its parent or child page. It’s about ensuring there’s a hierarchy of pages, because the more links there are to a particular page, the more likely rankings are to go to those higher pages with more links. That’s been tested and trialed. It can get harder to manage with the lower pages, though. For example, if you’ve got products that sit within a subcategory within a subcategory, you just need to make sure there’s a hierarchical structure that search engine spiders can follow downwards, and that there are actually links pointing to those pages.

Internal linking is also great for associating pages with each other. But in terms of crawl budget, if a page isn’t linked to, the search engine can’t get to it, can’t see the page or the content on it, and therefore it won’t rank.

D: So you mentioned breadcrumbs there as well. That’s obviously jumping ahead to step three, which we’ll talk a little bit more about in a second. Just to focus on the main menu in the header of your page, what is best practice there? Is there a maximum number of links that you would advise incorporating into that standard top navigation? And also, is there anything that search engines are less likely to see? I’m thinking of best practice in terms of links you have to hover over or click on before you find them.

A: Interesting question. Going back to the first one, there is no optimal number of links that you should have in your navigation. The reason is that not every site is the same. You could have a small site or a large site, and even then it’s about trial and error, like A/B testing: making sure that you have the right amount for you and your business.

Obviously, you don’t want hundreds and hundreds of links in the header navigation. You have to be more strategic about the links that you include, making sure the most important pages are linked to in the header navigation. Then those important pages link to their child pages, and so on and so forth.

In terms of your second question, are you saying more around the page that needs more visibility? Or are you saying how to predict somebody clicking on a particular page?

D: I guess I’m thinking about from a search engine perspective, is it possible to crawl and see every single link quite easily within that top section of your website? For instance, if links are directly clickable from the top section of your website, you can understand that yes, it’s definitely possible for search engines to see that and determine that those links are probably the more important links on that page. But if you have to, for instance, hover over a section in your top navigation, do search engines de-prioritize the importance of those links?

A: I see what you mean. So you’ve got a navigation with, not a pop-up per se, but a flyout menu that comes out, which is very common these days, especially with JavaScript. How I try to explain it is: imagine your homepage has 100% link authority. If you have, say, five links in your header navigation, that 100% is split into 20% each, and anything beyond that splits each 20% across however many other links those pages have. Essentially, you are showing the importance of those pages via the header navigation. And when something does fly out, it is included within that 100%. So depending on how it’s coded, yes, even the links that come out within the header navigation itself, which is common practice these days, are split against that 100% authority.
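As a rough illustration of “depending on how it’s coded”, here is a hypothetical header navigation where the flyout links are plain anchor tags in the initial HTML, so a crawler can discover them even though users only see them on hover. The URLs and class names are made up for the example.

```html
<!-- Hypothetical header navigation: every destination is a plain <a href>
     in the HTML, so crawlers can find it even though the submenu is only
     revealed on hover via CSS or JavaScript. -->
<nav>
  <ul class="main-nav">
    <li>
      <a href="/dresses/">Dresses</a>
      <ul class="flyout">
        <li><a href="/dresses/maxi-dresses/">Maxi Dresses</a></li>
        <li><a href="/dresses/midi-dresses/">Midi Dresses</a></li>
      </ul>
    </li>
    <li><a href="/jeans/">Jeans</a></li>
  </ul>
</nav>
```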


3. Breadcrumbs

D: Something else you touched upon was breadcrumbs. That’s step three, breadcrumbs and internal linking. Is it important for an ecommerce site for every product page to actually have that breadcrumb linking structure at the top?

A: Yeah. So once again, I don’t think I’ve ever heard of an A/B test that showed breadcrumbs were not important, because not only are they great for SEO, internal linking, and the crawl budget side of things, but they’re also really important for UX. In terms of user experience, people use breadcrumbs to navigate back to the categories and pages they were on before.

Essentially, when it comes to breadcrumbs and internal linking, I’d say they’re really important, especially from a crawl budget point of view, because they allow more of that link authority to pass through to the correct pages, whether it’s a parent page or not, and they ensure that the hierarchy is still maintained. So yes, it’s very important for crawl budget.

D: Two quick questions in relation to that. Does that mean it’s essential to pick just one core category that’s relevant for each page? And secondly, is there a maximum depth of breadcrumbs that you’d recommend?

A: Yeah, so I think that’s quite a common question: if you are selling a product, whether or not you should dual-locate it. I recommend that you showcase your product in both places but keep the breadcrumb the same. There is always one category that’s going to be the most dominant, and that should always show up in the breadcrumb. For example, if a product is associated with a category like dresses and also with a brand, we’d recommend that the breadcrumb always use the category, to push more of that crawl budget and authority to the dresses page.

Regarding your second question, once again, there’s no ultimate number. It depends on the size of your website. If it’s a very big site, it makes sense to have the category, then the subcategory, then the sub-subcategory, and so on; it means you have a lot of pages to showcase that Google can visit and rank. But if you’re a smaller site, it may make sense to keep it neater and shallower.
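A hypothetical breadcrumb for that dresses example might look like the sketch below: the product also appears on a brand page, but the trail always follows the dominant category path. The URLs and names are illustrative only.

```html
<!-- Hypothetical breadcrumb: the product is also listed under a brand page,
     but the trail always follows the dominant category hierarchy. -->
<nav class="breadcrumb">
  <a href="/">Home</a> &gt;
  <a href="/dresses/">Dresses</a> &gt;
  <a href="/dresses/maxi-dresses/">Maxi Dresses</a> &gt;
  <span>Floral Maxi Dress</span>
</nav>
```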

4. Check for Errors and Redirects

D: And step four is to check errors and redirects.

A: Yeah, so I guess I touched on this a little bit earlier. Redirects and checking errors are really important when it comes to crawl budget. If Google is coming and seeing that most of your pages are 404ing, i.e., they are invalid and there’s no information to see there, it’s worth making sure that permanent redirects, like 301s, are in place, pointing to the pages that are more worthy of that crawl budget. Search engines have a finite amount of resource to spend on your website, so you don’t want them visiting dead pages. You’re shepherding them. I like that word: you’re shepherding the search engine to the correct pages, rather than the dead pages you don’t want them to see.
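For a quick spot check along those lines, a short script like this hypothetical Python sketch can report status codes and redirect targets for a list of URLs. It assumes the third-party requests library is installed, and the URLs are placeholders.

```python
# Hypothetical spot-check of status codes and redirects.
# Assumes the third-party "requests" library is installed; URLs are placeholders.
import requests

urls = [
    "https://www.example.com/old-category/",
    "https://www.example.com/dresses/",
]

for url in urls:
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.status_code == 404:
        print(f"{url} -> 404: consider a 301 redirect to a relevant live page")
    elif resp.status_code in (301, 302, 307, 308):
        print(f"{url} -> {resp.status_code} redirect to {resp.headers.get('Location')}")
    else:
        print(f"{url} -> {resp.status_code}")
```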

5. Log File Data

D: That’s a good word. I might take you up on that and use that in the future as well. And step five is log file data. Why is that so important?

A: I’ve added this one last because I think it’s probably one of the hardest things to get information for. There’s a lot of data there. You need tools to handle it, and you need your web devs to be on board as well, because it can be a lot of information per day, even per hour, for some companies.

To explain, log file data breaks down each individual visit you’ve had from search engines and other sources, showing which pages have been visited. You can then combine that information with your own crawls, your own investigations, and probably Google Search Console data as well, to see which pages are being visited.

And then you can be more strategic. You can decide whether to add any pages to the robots.txt file, or whether to add more internal links to pages that aren’t being visited at all. And if you’re seeing any errors, 500 errors or 404 errors, you can sort those out. 404s are usually a little easier, because you can redirect them. 500s you should probably sort out with your dev team.

Essentially, there’s so much information there. Log file data is a goldmine. So I definitely recommend looking at it on a daily basis.
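As a rough sketch of what that analysis can look like, the hypothetical Python snippet below counts Googlebot requests per URL and per status code from a standard combined-format access log. The file name and regex are assumptions; adjust them to your own server’s log format.

```python
# Rough sketch: count Googlebot hits per URL and per status code from a
# standard "combined" access log. The log path and regex are assumptions.
import re
from collections import Counter

LOG_LINE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3})')

hits_per_path = Counter()
hits_per_status = Counter()

with open("access.log", encoding="utf-8", errors="ignore") as log:
    for line in log:
        if "Googlebot" not in line:  # crude filter; verify via reverse DNS for accuracy
            continue
        match = LOG_LINE.search(line)
        if match:
            hits_per_path[match.group("path")] += 1
            hits_per_status[match.group("status")] += 1

print("Most-crawled URLs:", hits_per_path.most_common(10))
print("Status codes seen:", hits_per_status)
```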

D: And we can record another episode on that in the future, from the sound of it. It sounds like you’ve got more to share there. So those were Alina’s five steps to optimizing your crawl budget:

  1. Robots.txt
  2. Internal linking via navigation
  3. Breadcrumbs/internal linking
  4. Check for errors and redirects
  5. Log file data

The Pareto Pickle

D: Alina, let’s finish off with the Pareto Pickle. Pareto says that you can get 80% of your results from 20% of your efforts. So what’s one SEO activity you would recommend that provides incredible results for moderate levels of effort?

A: We know that SEO touches many areas. And in this case, I hope I’m not being cliché, but I definitely do believe in SEO automation. So automating your work, whether it’s reporting, which is probably one of the easiest things to automate, or getting a tool to gather information for you automatically.

To give you an example, we’ve recently been testing AI writing content for us by feeding in keyword data. No, we’ve not used it live yet, so don’t get too excited. But some of the content that comes out is amazing. And, if I can say this, it’s sometimes better written than what the copywriters produce. It’s really well written and creative, it actually makes sense, and it has the keywords in there. I guess it’s about automating the jobs we can put aside so we can spend our time on something else.

D: Exciting and scary at the same time. I’ve tested Jarvis as a tool for doing that. Is that a tool you’ve used or something else?

A: I’m gonna be completely honest with you. It was a friend of mine who did that, so I don’t know which tools he used. But he said there were about four different ones that he trialed.

D: Okay, we’ll come back and have another conversation about that one as well. I’ve been your host David Bain, you can find Alina Ghost at aghost.co.uk. Alina, thanks so much for being part of the In Search SEO Podcast.

A: Thank you very much.

D: Thanks for listening. Check out all the previous episodes and sign up for a Free Trial of the Rank Ranger platform over at rankranger.com.


by Darrell Mordecai

Darrell creates SEO content for Similarweb, drawing on his deep understanding of SEO and Google patents.

