Back to Basics: Sitemaps for SEO
Did you know that when you search Google, you’re not searching the ‘web’? Instead, you are searching Google’s index of the web.
Google crawls the web to add the pages it discovers to its index, but here’s the thing: Google’s automated web bots don’t always find every page. This means one of your first jobs as an SEO is to ensure the pages you want to show up in search results are in Google’s index.
One simple way to do this is by adding a sitemap to your site.
Doing this is a practical way to tell Google about all the pages you want indexed. In fact, after links, sitemaps are the second most effective way for search engines to find and index pages.
In this post, we’ll cover what a sitemap is and how you can use it to ensure that your pages are indexed.
What is a sitemap?
A sitemap is a file, most commonly in XML format, that lists the pages, videos, and other files on a website, and the relationships between them. It acts as a roadmap for search engines, helping them crawl and index a website more efficiently. Sitemaps are especially useful for larger websites with frequently updated content.
What are the types of sitemaps?
1. XML sitemaps:
An XML sitemap is a machine-readable file that makes it easy for bots to crawl and index your content. It consists of a list of the URLs on your site that you want indexed and can include additional metadata that informs search engines:
- When URLs were last modified
- How frequently a URL is updated
- How to prioritize crawling
- Which images exist on each URL
- If a URL has been translated to a different language
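To make the idea of machine-readable metadata concrete, here is a rough sketch of how a script might read the URL and last-modified date from an XML sitemap. The sample sitemap and the function name read_sitemap are made up for illustration; only the namespace comes from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/</loc>
  </url>
</urlset>"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples; lastmod is None when absent."""
    # Parse from bytes so the XML declaration's encoding is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in read_sitemap(SAMPLE_SITEMAP):
    print(loc, lastmod)
```

Optional elements such as the last-modified date may be missing from an entry, which is why the sketch returns None rather than failing.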
2. HTML sitemaps:
HTML sitemaps provide users with a hierarchical list of links to all of a site’s pages in one place.
Although HTML sitemaps were once a popular way for users to navigate a site, in 2022, Google’s John Mueller said that HTML sitemaps should never be needed.
Here is the exact quote:
‘HTML sitemaps should never be needed. Sites should always have a clear navigational structure. If you feel the need for an HTML sitemap, spend the time improving your site’s architecture instead.’
According to John, from an SEO standpoint, if you have a large website, your HTML sitemap will not aid indexing. Instead, you should use XML sitemaps and robust internal linking. Furthermore, if users frequently rely on your HTML sitemap for navigation, it suggests the site’s navigation system is inadequate and requires improvement.
But HTML sitemaps are not obsolete. According to the Web Accessibility Initiative, despite not being a requirement, adding one may help your site fulfill its web accessibility guidelines by giving users multiple ways to access your pages.
How to find a sitemap
It’s relatively easy to find a sitemap once you know how. Here are three ways:
1. Manually check XML sitemap locations
The simplest way to find a sitemap is to look for it manually using your humble web browser. Simply add /sitemap.xml to the end of the site’s domain and, if a sitemap exists at that location, it will load in your browser.
For instance, here is the sitemap for jaybutler.com:
If you don’t find the sitemap this way, there are more manual options to try.
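If you are checking many sites, the manual check can be scripted. The sketch below builds a few candidate sitemap URLs for a site and offers a helper that tests whether a URL responds. Only /sitemap.xml comes from this article; the other paths are assumed common locations, and the function names are hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Only /sitemap.xml is mentioned in the article; the other paths are
# assumptions about where sitemaps are commonly placed.
CANDIDATE_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap.xml.gz"]


def candidate_sitemap_urls(site_root):
    """Build absolute candidate sitemap URLs for a site root."""
    return [urljoin(site_root, path) for path in CANDIDATE_PATHS]


def url_exists(url, timeout=10):
    """Return True if a HEAD request to the URL ends in an HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers HTTP errors, network failures, timeouts, and malformed URLs.
        return False


print(candidate_sitemap_urls("https://www.example.com"))
```

Some servers reject HEAD requests, so a failed check here does not prove that a sitemap is missing.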
2. Check the robots.txt file
A robots.txt file provides instructions to web crawlers on which areas to crawl. Since web crawlers visit these files to understand how to crawl the site, adding a link to its sitemap there makes perfect sense.
You can see a robots.txt file by adding /robots.txt to a site’s root directory.
Below, we can see similarweb.com’s impressive-looking (if we do say so ourselves) robots.txt file, including three sitemap links.
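The same lookup can be done in code for any site. Here is a minimal sketch, assuming Python 3.8 or later, that extracts the Sitemap lines from a robots.txt file using the standard library’s robots.txt parser. The robots.txt content is a made-up example, not a real site’s file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /search
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/blog/sitemap.xml
"""


def sitemaps_from_robots(robots_text):
    """Return the sitemap URLs declared in a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


print(sitemaps_from_robots(SAMPLE_ROBOTS_TXT))
```

To check a live site, you would first download its /robots.txt file and pass the text to the function.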
3. Use search operators
Search operators are commands that you can use to refine your search results by providing additional instructions to the search engine.
To find a sitemap, we’ll use both the site: and filetype: operators.
For example, if you type ‘site:nike.com filetype:xml’ into the search bar, Google will return any indexed XML files for that site, including sitemaps.
Another option is just to use the site: operator and add sitemap.xml to the search. For instance, if you search for site:nike.com sitemap.xml, Google returns a list of indexed sitemaps:
Sitemap.xml structure & syntax
XML sitemaps list the URLs for a site, along with additional metadata about each URL. Here is a brief explanation of the most common sitemap elements:
- <?xml version="1.0" encoding="UTF-8"?>: Defines the XML version and character encoding.
- <urlset>: Root element that contains the list of URLs.
- xmlns="http://www.sitemaps.org/schemas/sitemap/0.9": Specifies the namespace for the sitemap schema.
- <url>: Defines a single URL entry.
- <loc>: The URL of the page.
- <lastmod>: The date when the page was last modified, in W3C date and time format (for example, 2024-01-15).
- <changefreq>: How frequently the page is expected to change (e.g., daily, weekly, monthly, yearly). Google says it ignores this value.
- <priority>: The priority of the URL relative to other pages on the site (ranges from 0.0 to 1.0). Google says it ignores this value as well.
- <image:image>: This specifies images associated with the URL for image sitemaps.
- hreflang="x": An attribute of the <xhtml:link> element that specifies the language and regional variations of a page for multilingual sites.
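To show how these elements fit together, here is a small sketch that generates a sitemap from a list of pages. It assumes Python 3.8 or later; the function name build_sitemap and the example.com URLs are hypothetical, and only the loc and lastmod elements are written.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a datetime.date, or None to omit <lastmod>.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if last_modified is not None:
            # date.isoformat() gives YYYY-MM-DD, a valid W3C date.
            ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    # Returns bytes that start with the XML declaration.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


pages = [
    ("https://www.example.com/", date(2024, 1, 15)),
    ("https://www.example.com/about", None),
]
print(build_sitemap(pages).decode("utf-8"))
```

Building the file with an XML library rather than string concatenation means special characters in URLs, such as &, are escaped for you.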
Here is one of nike.com’s sitemaps:
Why are sitemaps important?
Essentially, sitemaps help search engines index your web pages properly. But even if you don’t have a sitemap, as long as other pages link to the pages on your site, Google can usually crawl and index your content.
This means if you have a small site and your pages are properly linked, you don’t really need a sitemap.
So, when do you need a sitemap?
- Your site has thousands of pages: Often, large sites have convoluted layouts, which makes crawling difficult. Also, it’s next to impossible to manage internal links on a gigantic site. A sitemap helps ensure those pages can still be discovered.
- You need to indicate priority pages: In a sitemap, you can assign priority levels to different pages, indicating to search engines which pages are more important. This can help search engines that read the value prioritize the crawling and indexing of your most important pages. Note that Google says it ignores the priority value.
- Your site is complicated: A well-structured sitemap can provide search engines with a better understanding of the structure and organization of your website, which can potentially improve the relevance of your pages in search results.
XML sitemap best practices
There are two ways to generate a sitemap:
- Automatically (a CMS like WordPress or Wix will do this)
- Manually (building your own sitemap)
If you’re building a sitemap yourself, here are some best practices to follow:
1. Don’t exceed sitemap limitations
A single sitemap can contain no more than 50,000 URLs, and the uncompressed file must not exceed 50MB. If your site goes over either limit, split the URLs across multiple sitemap files.
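As a rough illustration of staying within these limits, the sketch below splits a long list of URLs into groups of at most 50,000 and checks a file’s size against 50MB. The function names are hypothetical, and 50MB is taken here as 52,428,800 bytes.

```python
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAP_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file


def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit in one sitemap."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def within_size_limit(xml_bytes, max_bytes=MAX_SITEMAP_BYTES):
    """Return True if the uncompressed sitemap is within the size limit."""
    return len(xml_bytes) <= max_bytes


# Hypothetical site with 120,000 URLs.
urls = [f"https://www.example.com/product/{i}" for i in range(120_000)]
chunks = split_into_sitemaps(urls)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written to its own sitemap file.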
2. Only include pages you want indexed
There are many instances where you will not want certain pages included in Google’s index. For instance:
- Duplicate content pages
- Private or confidential pages
- Internal search results pages
- Thank you or confirmation pages
It’s important that you leave these pages out of your sitemap.
In fact, a good rule of thumb is only to include canonical URLs. This will avoid potential issues with duplicate content and ensure that search engines only crawl and index the correct versions of your pages, which will maintain a clear and organized structure for your website’s content in search results.
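One way to apply this rule when generating a sitemap is to filter your page list before writing it out. This is only a sketch: the Page record and its fields (canonical_url, noindex) are hypothetical stand-ins for whatever your CMS or database actually stores.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical page record; field names are assumptions.
    url: str
    canonical_url: str
    noindex: bool = False


def sitemap_urls(pages):
    """Keep only indexable pages whose URL is its own canonical URL."""
    return [
        page.url
        for page in pages
        if not page.noindex and page.url == page.canonical_url
    ]


pages = [
    Page("https://www.example.com/shoes", "https://www.example.com/shoes"),
    # Duplicate of the page above, reached through a sort parameter.
    Page("https://www.example.com/shoes?sort=price", "https://www.example.com/shoes"),
    # Confirmation page that should stay out of the index.
    Page("https://www.example.com/thank-you", "https://www.example.com/thank-you", noindex=True),
]
print(sitemap_urls(pages))
```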
3. Avoid static sitemaps
A static sitemap does not update itself to reflect changes in your website’s content or structure. Avoid using one: as soon as you change your site, the sitemap becomes outdated and loses much of its value.
There are many tools online that are designed to crawl sites and generate a sitemap automatically. Since these tools can only create static sitemaps, we recommend avoiding them.
4. Use optimal syntax
Your sitemaps should only include absolute URLs, meaning complete web addresses with the protocol (e.g. https://) and domain name. Also make sure to use UTF-8 encoding.
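A quick way to catch relative URLs before they reach your sitemap is to check each one for a protocol and a domain. The helper below is a minimal sketch; the name is_absolute_url is hypothetical.

```python
from urllib.parse import urlsplit


def is_absolute_url(url):
    """Return True if the URL has an http(s) protocol and a domain name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in [
    "https://www.example.com/about",  # absolute
    "/about",                         # relative path
    "www.example.com/about",          # missing protocol
]:
    print(candidate, is_absolute_url(candidate))
```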
5. Avoid nesting sitemap index files
A sitemap index file lets you break up a very large sitemap into multiple smaller sitemap files and then point to each of those individual sitemaps from one place. Google supports this structure. What you should avoid is nesting one sitemap index file inside another, as the sitemap protocol only allows an index file to list sitemaps, not other index files.
6. Compress large sitemaps
If your sitemap files are large, you can compress them with gzip to save bandwidth. Keep in mind that the 50MB limit applies to the uncompressed file.
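Here is a small sketch of gzip compression using Python’s standard library. The sitemap content is a repeated placeholder entry rather than a real, complete sitemap, and the function name is hypothetical.

```python
import gzip


def compress_sitemap(xml_bytes):
    """Return the gzip-compressed form of a sitemap's bytes."""
    return gzip.compress(xml_bytes)


# Placeholder content standing in for a large sitemap.
sitemap_bytes = b"<url><loc>https://www.example.com/page</loc></url>\n" * 1000
compressed = compress_sitemap(sitemap_bytes)

print(len(sitemap_bytes), "bytes uncompressed")
print(len(compressed), "bytes compressed")
```

The compressed output would typically be saved with a .gz extension, for example sitemap.xml.gz.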
7. Submit your sitemap to Google Search Console
With your sitemap in hand, it’s time to submit it to Google. You do that by entering it on the Sitemaps page in Google Search Console.
Here’s how:
- Navigate to the Sitemaps page
- Enter the URL of your sitemap file, for example https://www.example.com/sitemap.xml
- Click the “Submit” button
8. Use the Sitemap report to spot errors
Once you’ve submitted your sitemap, it’s important to verify that there are no errors. On the Sitemaps page, you’ll see a table listing all of your sitemaps. Check the Status column to see if there are any errors. If there are none, you’ll see ‘Success’ in the Status column.
If there are any errors, click on the URL, and you’ll be directed to a page listing your errors, with dropdowns explaining each issue.
9. Add your sitemap link to your robots.txt file
Although you’ve submitted your sitemap using Search Console, we highly recommend that you add it to your robots.txt file. The reason is your robots.txt file is one of the first places search engine crawlers look when accessing a website.
Search engine crawlers use the robots.txt file to understand which parts of your site they are allowed to crawl. By including your sitemap in the robots.txt file, you’re essentially pointing search engine bots directly to the location of your sitemap file, making it easier for search engines to find and crawl your website’s pages efficiently.
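The sitemap is referenced with a line of the form "Sitemap: " followed by the full sitemap URL. As a sketch, the helper below appends that line to existing robots.txt content unless it is already there. The function name and sample content are hypothetical.

```python
def has_sitemap_line(robots_text, sitemap_url):
    """Return True if robots.txt already declares this sitemap URL."""
    for line in robots_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return True
    return False


def add_sitemap_to_robots(robots_text, sitemap_url):
    """Append a Sitemap line to robots.txt content if it is missing."""
    if has_sitemap_line(robots_text, sitemap_url):
        return robots_text
    # Make sure the new line starts on its own line.
    separator = "" if not robots_text or robots_text.endswith("\n") else "\n"
    return f"{robots_text}{separator}Sitemap: {sitemap_url}\n"


# Hypothetical robots.txt content.
robots = "User-agent: *\nDisallow: /search\n"
print(add_sitemap_to_robots(robots, "https://www.example.com/sitemap.xml"))
```

The Sitemap line is independent of the User-agent groups, so it can go anywhere in the file.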
Sitemaps: Your GPS for search engine crawlers
Although not every site needs a sitemap to have its pages indexed, it always pays to make Google’s job a little easier – and there’s certainly no harm in making sure Google is finding your pages, is there? Submitting one is easy to do and could mean the difference between pages ranking in search results and those same pages being forgotten.
In fact, checking for a sitemap should be a part of your site audit process.
FAQs
Is a sitemap important for SEO?
A sitemap is important for SEO as it serves as a roadmap for search engines, allowing them to discover and crawl all the pages on your website efficiently. Sitemaps help improve crawlability, indexing, and the prioritization of important pages. They also enable you to notify search engines about new or updated content.
How do I know if my website has a sitemap?
There are a few ways to know if your site has a sitemap:
- Look in the root directory of your website for a file named “sitemap.xml” or similar
- Search your website source code for references to a sitemap URL
- Use tools like Google Search Console or third-party website crawlers, which can detect and report sitemaps.