In Search Podcast: 5 Things You Need to Know About SEO on the Edge
Today we’re going to be having a look at how you can improve the quality of your SEO life by conducting more of your business on the edge, with a jump roper who enjoys scratching his beard thoughtfully while sipping on coffee and whiskey, or maybe ideally, an Irish coffee. He is a trainer, speaker, and solver of search problems. A warm welcome to the In Search SEO podcast, senior SEO consultant Chris Green.
The five tasks are:
- Split testing
- Redirect management
- Bot access logging
- Sitemap building/management
- Injecting content
5 Things You Need to Know About SEO on the Edge
Chris: Thank you for having me, David.
D: You can find Chris over at chris-green.net. So, Chris, you don’t sound it, but are you always on the edge?
C: I don’t know. I think anyone who’s been doing SEO for decades would be a little on edge. The short answer is yes. The long answer is, it depends. Wait, that’s another short answer. I got into SEO around the time Penguin and Panda started kicking off, so yes, I believe I’m sufficiently on the knife edge. I don’t think I’ve ever recovered from that, if I’m being completely honest.
D: I remember you chatting about the edge quite a while ago. You’re certainly one of the prominent thinkers on the topic. So it’s great to have you on to discuss this. And today, you’re gonna be sharing five SEO tasks that are better handled on the edge. Starting off with number one, SEO split testing.
1. SEO Split Testing
C: So testing within SEO has finally picked up a bit more. And there are multiple ways of testing SEO, from the really simple: deploy it, check with website analytics, has it worked or hasn’t it. That’s the easiest way, which in theory all of us in SEO should be doing. But the way it works on the edge is to effectively deploy a change to 50% of the pages within a group and let Google visit both the test pages and the control/unchanged pages. This helps you make changes to page groups on your website without actually changing the code base or adding any additional requirements on the server or the CMS. It’s like adding an extra layer that says, on these pages we’re going to show people a different version, which you can do at various points in the process.
So the edge makes it look as if the change is coming from the server, which is great for indexing because Google picks it up as if it’s just server-rendered code. You can also do this testing in the client, using JavaScript, but that’s fundamentally less reliable; it can work, but it puts a lot more of the work on Google. So the edge makes it quicker, and you can trust the results more. It’s not perfect, but it’s a lot better and a lot more robust.
D: So do most SEOs come up with their own scripts or use simple scripts to run these split tests? Or is there specialist split-testing software that you would recommend using in conjunction with the edge?
C: You can go from the sublime to the ridiculous. If you’re just talking about the edge, I’d say there are probably a handful of established players in the space. Search Pilot, formerly ODN, is literally built on edge infrastructure. They’ve built a meta CMS that lets you control all of that, and then they’ve pulled in all the really smart analytics and analysis methodology on top of it. I can take absolutely no claim to owning or starting this; far from it. They’re some of the biggest pioneers. But what you can do with the edge, on all of the different kinds of edge infrastructure, Akamai, Cloudflare, and Fastly, is write the scripts to do this yourself. And when you’re talking about the edge, what you need to run these tests is the group of pages that are going to be the control, the group that’s going to be the test, and then the script that effectively makes the changes to the test version. The complexity around that depends on how complex the test is. If you’re just rewriting page titles, for example, this becomes quite a straightforward thing to do. I’m not an engineer, I’m an SEO that’s too nosy for my own good sometimes, but these things, especially on Cloudflare, are probably some of the most accessible things to do there. Simon Thompson and I, years ago, back when we were both in an agency, built a tool called Tool Spark that turned out to be more of a beta and a proof of concept. But that was built on top of Cloudflare’s infrastructure, and it let you deploy split tests on the edge essentially for free at that point, though it ended up being more of a sandbox. So you can go right the way from enterprise-level software through to building your own, and there are some more emergent platforms you can run this on. But I think as an SEO, you need to think about what stack you’re building in and who else you need to get on board. If you need to mitigate risk and have publishing rights and change histories, then you go for the enterprise option. If you’ve just got someone who’s bootstrapped but really wants to test, build straight on the edge: find someone who can write code for workers, and you can test stuff.
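As a rough illustration of the kind of worker Chris describes, here is a minimal sketch of an edge split test on Cloudflare Workers (TypeScript). The test-group rule, the 50/50 bucketing, and the example title rewrite are all assumptions for illustration, not Search Pilot’s or Tool Spark’s actual implementation.

```ts
// Sketch of an edge split test on Cloudflare Workers (module syntax).
// Pages in the test group are bucketed deterministically by URL path, so
// Googlebot and users always see the same version of a given page.
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Only pages in the defined test group are touched; everything else passes through.
    const inTestGroup = url.pathname.startsWith("/category/"); // illustrative rule
    const originResponse = await fetch(request);

    if (!inTestGroup || !isVariant(url.pathname)) {
      return originResponse; // control pages are served unchanged
    }

    // Variant pages get the on-page change applied at the edge, e.g. a title
    // rewrite, so the modified HTML is what Google crawls and indexes.
    return new HTMLRewriter()
      .on("title", {
        element(el: Element) {
          el.setInnerContent("New test title | Example Brand"); // hypothetical change
        },
      })
      .transform(originResponse);
  },
};

// Deterministic 50/50 split: hash the path and put half of the pages in the variant.
function isVariant(path: string): boolean {
  let hash = 0;
  for (const char of path) hash = (hash * 31 + char.charCodeAt(0)) >>> 0;
  return hash % 2 === 0;
}
```

The important detail is that the bucketing is deterministic: a given URL always lands in the same group on every visit, which is what makes the before/after comparison between test and control pages meaningful.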
D: I sense that we could talk about split testing on the edge for about three hours. But let’s move on to the second area that you would recommend as being better and more effective on the edge, redirect management.
2. Redirect Management
C: Yeah, managing redirects is usually a pain because, if you’ve got large websites or lots of different infrastructure, knowing where different redirects are controlled and managed, what order they fire in, whether they’re complex, etc., is a nightmare. And virtually every big organization has that problem. One of the big problems you get is that you end up passing people between different servers or different CDN layers in a single redirect chain, which is inefficient. So if you go through the CDN, go to the server, the server then says you need to go somewhere else, and then you go somewhere else, and then you get redirected somewhere else again, it’s really inefficient, kind of costly, and a nightmare to manage.
Now, because of where the CDN or the edge sits, it’s the first thing the user will encounter. If you manage all your redirects there and ensure that you have flattened any chains at that point, which is relatively simple to do, then firstly, you reduce the number of redirects. Secondly, the request doesn’t have to make it to the origin server before it redirects you, so you actually reduce the level of traffic to the origin, and the redirect happens a lot quicker, straight from the edge. And finally, if you’ve got discipline and you’ve implemented it correctly, you have just one place to look for all the redirects, irrespective of all the different platforms. And that simplicity, when you instill discipline in the team, makes it a bit of a no-brainer, to be honest.
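To make the “one place for all redirects” idea concrete, here is a minimal sketch of flattened redirect handling in a Cloudflare Worker (TypeScript). The lookup map and its entries are placeholders; a real deployment would more likely load them from a KV store or a similar edge datastore.

```ts
// Sketch of flattened redirect handling at the edge.
// The map holds final destinations only, so a request never hops through a chain of 301s.
const REDIRECTS: Record<string, string> = {
  "/old-page": "/new-page",
  "/old-category/old-page": "/new-category/new-page", // already flattened to the end URL
};

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const target = REDIRECTS[url.pathname];

    if (target) {
      // The redirect is answered straight from the edge: one hop, no origin traffic.
      return Response.redirect(`${url.origin}${target}`, 301);
    }

    return fetch(request); // everything else continues to the origin as normal
  },
};
```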
D: And number three, bot access logging.
3. Bot Access Logging
C: Bot access logging is an interesting one. If you’ve ever tried to do a log file audit and said, I need my access logs to do the analysis, you go to DevOps or whoever, and they’ll either give you a puzzled look, or they’ll say no, that’s too big, we don’t store it, or we store a day’s worth, or you can have it but please join a long queue. That’s really challenging. What’s more, if you’re running a CDN with caching, your origin server’s access logs may not see all of the bot traffic anyway, so your logs won’t be complete. The CDN, on the other hand, sees all of the traffic, whether it’s cached or not. And if you’re using the edge to store this log data and stream it to a service like Sumo Logic or another kind of storage, you’ve got the opportunity of siphoning all of that data off at the edge rather than trying to find it on your servers. But also, if you write your workers with the right logic at that point, you can set them to only capture the bot traffic that you want, usually Googlebot or other search engine bots, and you can do things like validating IP addresses to make sure it’s not people spoofing, and only collect the access data you need, which greatly reduces the storage space. And some tools out there, like Content King, for example, can interface with some CDNs directly to collect data straight from that level. So assuming you’ve got the right level of access and DevOps have said yes, you can start collecting those logs directly, which means you can do some tech SEO analysis with relatively little heavy lifting.
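As a sketch of what edge-side bot logging can look like, here is a hypothetical Cloudflare Worker (TypeScript) that only ships Googlebot requests to an external collector. The collector URL is a placeholder, and the simple user-agent check stands in for the fuller IP validation Chris mentions.

```ts
// Sketch of edge-side bot logging. Only requests that claim to be Googlebot are
// logged; everything else is served without any logging overhead.
const LOG_ENDPOINT = "https://collector.example.com/ingest"; // hypothetical log sink

export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const userAgent = request.headers.get("user-agent") || "";
    const response = await fetch(request); // serve the page as normal, cached or not

    if (/Googlebot/i.test(userAgent)) {
      // In practice you would also verify the client IP (reverse DNS or Google's
      // published ranges) to filter out spoofed user agents.
      const entry = {
        url: request.url,
        status: response.status,
        userAgent,
        ip: request.headers.get("cf-connecting-ip"),
        timestamp: new Date().toISOString(),
      };

      // waitUntil lets the log line ship after the response has gone to the client,
      // so logging never slows the page down.
      ctx.waitUntil(
        fetch(LOG_ENDPOINT, {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify(entry),
        })
      );
    }

    return response;
  },
};
```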
D: Is there a certain size website in terms of pages where it only becomes worthwhile to look at log files or should every SEO be looking at log files?
C: As a rule of thumb, if your website is under 10,000 pages, I tend not to rely on or go for logs straight away, mainly because gaining access to them is a nightmare. If I can access that data easily and analyze it easily... any of the big SaaS crawlers like Deep Crawl have log file analysis, so if I can get that data and analyze it, then let’s do it. But if I’m under 10,000 pages and getting that data is a pain, then I won’t get too upset. Now, that page count is somewhat arbitrary, but if you’re over a million-plus pages, then log files will have a lot of information and insight that will give you some nice incremental wins. Under that, it’s probably not worth it.

D: And number four of the tasks that are more effective to do on the edge: sitemap building/management.
4. Sitemap Building/Management
C: This is a unique one. I’ve had a few projects recently where sitemap generation needs to pull pages from different services and different systems, it’s out of date, it’s not working, and the engineering to rebuild all of that is incredibly challenging. So what we’ve done is build a service that pulls API data from a SaaS crawler. It pulls in the indexable pages, then builds an XML sitemap on the edge and hosts it at that edge point. We’re effectively using the crawler to crawl the site every day; it builds and regenerates a fresh sitemap every day and publishes it to the edge. Some may say that’s an over-engineered solution that puts an additional requirement on a third party, and I would agree. But in some situations it made so much sense to create a single point of truth, with the sitemaps in one place, without having to request other content APIs and other services where the data often isn’t clean and needs filtering. Writing what are effectively microservices and hosting them on the edge was just far cheaper, far quicker, and more robust. Obviously, the right answer is to build it right the first time, but that simply wasn’t an option.
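A minimal sketch of that kind of service, assuming a Cloudflare Worker (TypeScript) and a crawler API that returns a plain list of indexable URLs. The API endpoint and response shape are assumptions for illustration, not any particular crawler’s actual interface.

```ts
// Sketch of serving a generated XML sitemap straight from the edge.
const CRAWLER_API = "https://api.crawler.example.com/indexable-urls"; // hypothetical

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname !== "/sitemap.xml") {
      return fetch(request); // everything else goes to the origin untouched
    }

    // Pull the list of indexable URLs the crawler found on its last run.
    const apiResponse = await fetch(CRAWLER_API);
    const urls = (await apiResponse.json()) as string[];

    const body =
      `<?xml version="1.0" encoding="UTF-8"?>\n` +
      `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
      urls.map((loc) => `  <url><loc>${loc}</loc></url>`).join("\n") +
      `\n</urlset>`;

    return new Response(body, {
      headers: {
        "content-type": "application/xml",
        // Cache at the edge and refresh roughly once the daily crawl completes.
        "cache-control": "public, max-age=86400",
      },
    });
  },
};
```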
D: Talking about building it right the first time, is there a danger with automating the building of XML sitemaps that they end up including too much rubbish?
C: Yes. Actually, I’ve found that happens everywhere. If you’ve ever worked in a CMS, you may have crawled a sitemap and seen test pages, where someone created some pages, didn’t put them in the site structure, and just left them there. And if the logic that builds the sitemap isn’t checking whether a page is indexable, along with all the other elements and filters, it can still get published anyway. I know if you’re on WordPress, Yoast does a lot of that heavy lifting for you, and I think WordPress core does a lot more than it used to. But a CMS like Drupal doesn’t take care of it in the same way, and very often people will create pages that you don’t want to make it into the sitemap for various other reasons. Again, it’s just making sure you’ve got on top of that and you’re building those filters in, which I think is important whether it’s on the edge or not. Otherwise, you can still be feeding data to Google that you just don’t want it to see. But again, doing it on the edge is a very quick and lightweight solution for that.
D: And number five is injecting content. What kind of content are you talking about there?
5. Injecting Content
C: Anything web- and digital-orientated. This one overlaps a little bit with the split testing, in the sense that you’re using the edge to add more content in, and that content looks as if it’s coming from the server rather than the client. If you’ve ever been involved in a subdomain versus subdirectory argument about blogs, for example, and you can’t pull the blog through the right infrastructure, well, you can use CDNs to effectively stitch content in. You can say that you want to pull the header from this system but pull the blog content from that system. And on the edge, that can be done very quickly and efficiently. A lot of it gets cached and stitched together on the edge, and by the time it’s displayed to the user, you’ve effectively got this hybrid content from two different systems. To be fair, that is something you can do at the origin, with the right inclination and the ability to build it. But doing it on the edge, the different systems you’re pulling from almost don’t matter, as long as you can clearly identify what it is you need to pull in and you can write the code to do it. It’s very performant, it happens very quickly, and it gets you what you need.
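Here is a minimal sketch of that stitching pattern on Cloudflare Workers (TypeScript), using HTMLRewriter to inject a blog fragment from a second system into a placeholder element. The blog origin, the placeholder selector, and the assumption that the second system returns an HTML fragment are all illustrative.

```ts
// Sketch of stitching content from a second system into the page at the edge.
const BLOG_ORIGIN = "https://blog-backend.example.com"; // hypothetical second system

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const pageResponse = await fetch(request); // header, navigation, etc. from the main origin

    if (!url.pathname.startsWith("/blog/")) {
      return pageResponse;
    }

    // Fetch the matching article markup from the other system (assumed to return a fragment).
    const blogResponse = await fetch(`${BLOG_ORIGIN}${url.pathname}`);
    const blogHtml = await blogResponse.text();

    // Inject it into a placeholder element, so the combined page goes out as one
    // response and Google indexes it as ordinary server-rendered HTML.
    return new HTMLRewriter()
      .on("div#blog-content", {
        element(el: Element) {
          el.setInnerContent(blogHtml, { html: true });
        },
      })
      .transform(pageResponse);
  },
};
```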
D: I remember, a long time ago, incorporating content using frames and PHP includes, and both of those are very old-fashioned ways of doing this. Are there any downsides to injecting content from other sources or other web servers? Are there any potential SEO downsides to doing that?
C: The key one is that if these assets are available on other URLs and can be indexed there, then there is an inherent risk. That’s also equally easy to prevent if you’re aware of it when you set things up. In some instances, you could be using data feeds from other services and stitching them together, rather than the old frameset method of having the header on one page, the body on another page, and showing them on the same page, and you can quite easily build in safeguards to stop that from happening. The other key one is that you need to be receiving content from those two sources reliably, and it needs to be cached reliably. A lot of the worry with the edge and the more complicated engineering tasks is what happens if the CDN falls over. What is the fallback? And that can vary in complexity. If you’re a big organization and you want significant uptime, like 99.99%, then you can build in other CDNs to fall back on. But if, for example, you’re relying on your CDN to do the stitching together and there are CDN issues, you may find that some of those pages just don’t work. Then again, if Cloudflare goes down, half the internet goes down. In those instances, the question is: are we serving the appropriate response to Google to get them to come back and check again later, once the disruption is gone?
I think with anything edge-related, that’s where the biggest anxiety comes from: what happens if this third-party service falls over? But that’s the nightmare of any web infrastructure. You can never fully safeguard against it, even if you’ve got the server in your own office and you feel happy about that, which is quite an old-fashioned take anyway. There is no zero-risk method of hosting. You can fail over to others, so you can have a dual-CDN strategy: Akamai on one layer and Fastly on another, and if Akamai fails, it passes to Fastly, or vice versa. That’s incredibly sophisticated, and it’s an edge case of an edge case, but it’s possible to protect against most of this if you know what you’re doing and you spec it right.
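One common way to reduce that anxiety, sketched here as a hypothetical Cloudflare Worker (TypeScript), is to fail open: if the edge logic throws, pass the request through to the origin untouched, and if even that fails, return a temporary error that tells crawlers to come back later. The `applyEdgeLogic` helper is a placeholder for whatever the worker normally does.

```ts
// Sketch of a fail-open pattern for edge logic.
export default {
  async fetch(request: Request): Promise<Response> {
    try {
      return await applyEdgeLogic(request); // stitching, rewriting, etc. (assumed helper)
    } catch {
      try {
        return await fetch(request); // fall back to the untouched origin response
      } catch {
        // Origin unreachable too: tell crawlers to retry later rather than
        // serving broken or half-stitched content.
        return new Response("Service temporarily unavailable", {
          status: 503,
          headers: { "retry-after": "3600" },
        });
      }
    }
  },
};

// Placeholder for the worker's normal behaviour; hypothetical.
async function applyEdgeLogic(request: Request): Promise<Response> {
  return fetch(request);
}
```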
D: I expect a webinar discussion panel on how to actually guarantee 100% uptime would be interesting.
C: It’s possible, more possible than it ever has been, I think. If you combine Cloudflare and Akamai, or Cloudflare and Fastly or Similarweb, you could get pretty close, which would be very interesting.
D: Well, let’s finish off with Pareto Pickle. Pareto says that you can get 80% of your results from 20% of your efforts. What’s one SEO activity that you would recommend that provides incredible results for modest levels of effort?
The Pareto Pickle – Publish Changes
C: This nearly made it onto my edge list, but not quite, and it’s a little bit hacky, so some people will inherently not like it: using the edge just to get something done. We talked about the meta CMS briefly, and it’s something the Search Pilot team and John Avildsen between them helped show the world, but you can use the edge to publish changes that would otherwise be stuck in dev queues. The idea is getting it done, getting it live, proving the concept, while ignoring the tech debt risk and ignoring annoying DevOps for a minute, because they’re both factors. But all of the value in SEO comes from it being live, from that content being actioned, and the edge can shortcut that. It’s not pretty, and it’s not the right way, but pushing some content changes live and circumventing queues gets great results if the alternative is waiting six months and it not happening.
D: I’ve been your host David Bain. Chris, thanks so much for being on the In Search SEO podcast.
C: Thanks for having me, David.
D: And thank you for listening. Check out all the previous episodes and sign up for a free trial of the Rank Ranger platform over at rankranger.com.