Introduction

A website sitemap can refer either to a text file, usually called sitemap.xml, or to an HTML page containing links to various parts of a website in order to facilitate navigation. In this post, we will only discuss sitemap.xml. Sitemaps contain lists of URLs that help web crawlers find the pages and content of a website.

The following is a sitemap.xml example containing one URL:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mysite.com/page.html</loc>
    <lastmod>2014-10-04T13:27:58+03:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>

The <loc> tag is where the URL of a page is set. It is the only mandatory field for each entry. One can set when the page was last modified and how often it is modified with <lastmod> and <changefreq>. Trying to influence search engines by changing these tag values too often does not work: web crawlers will simply ignore the provided information.

The <priority> field can be used to tell web crawlers which pages you think deserve more of their attention than others. It is a value between 0.0 and 1.0, which they may take into account. A page's sitemap <priority> has absolutely no impact on its ranking.
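
As a rough illustration of the fields described above, here is a minimal Python sketch that builds a sitemap with the standard library. The function name build_sitemap and the dictionary-based entry format are assumptions made for this example, not part of the sitemap protocol. Only "loc" is required; the optional tags are written only when a value is supplied.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(entries):
    """Return sitemap XML as bytes for a list of entry dicts.

    Hypothetical helper: each entry must have a "loc" key, while
    "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only mandatory field of an entry.
        ET.SubElement(url, "loc").text = entry["loc"]
        # Optional fields are skipped when no value is provided.
        for tag in ("lastmod", "changefreq", "priority"):
            if entry.get(tag) is not None:
                ET.SubElement(url, tag).text = str(entry[tag])
    return XML_DECLARATION + ET.tostring(urlset, encoding="utf-8")


if __name__ == "__main__":
    sitemap = build_sitemap([
        {
            "loc": "http://mysite.com/page.html",
            "lastmod": "2014-10-04T13:27:58+03:00",
            "changefreq": "daily",
            "priority": 0.7,
        },
        {"loc": "http://mysite.com/"},  # only the mandatory field
    ])
    print(sitemap.decode("utf-8"))
```

Leaving the optional keys out of an entry, as in the second one, produces a <url> element that contains only <loc>.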

Nowadays, it is possible to create sitemap entries for video, images, mobile content, news and other content types.

Learn SEO Best Practices

1. Every website should have at least one sitemap named sitemap.xml located at its root. Even if it is empty at first, it will be useful later. It should also be registered in the robots.txt file (the file name is case sensitive), since not every web crawler looks for it at the root.
2. A sitemap cannot hold more than 50 000 entries or be larger than 50 MB uncompressed (older versions of the protocol set this limit at 10 MB). One can create several sitemaps per site and help web crawlers find them by listing them in the robots.txt file.
3. Large sitemaps should be compressed using gzip and given the .gz file extension (for example: sitemap.xml.gz). In this case, their location and new name must be listed in the robots.txt file, or many web crawlers will miss them.
4. Don't set any value for <lastmod> or <changefreq> if you don't have reliable information to provide. Web crawlers can figure it out by themselves if necessary.
5. Do set a higher <priority> for important pages when a large site is indexed for the first time. Web crawlers are then more likely to spend their crawling budget on them first.
6. Try to be as exhaustive as possible in your sitemaps. A sitemap generator, such as Xml-Sitemap, can help you create them automatically.
7. Do submit your sitemaps to your Google and Bing webmaster accounts. These will provide you with feedback. If you don't have such an account, or if you want to submit them to other search engines, it is possible to do so directly. Once is enough.
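
Points 1 to 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (chunk_urls, write_gzipped, robots_txt_lines) and the file names in the usage part are hypothetical. The sketch splits a URL list so that no sitemap exceeds the entry limit, writes gzip-compressed files, and produces the "Sitemap:" lines used to list sitemaps in robots.txt. It does not check the file size limit.

```python
import gzip

MAX_URLS_PER_SITEMAP = 50000  # entry limit per sitemap (see point 2)


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each with at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def write_gzipped(path, xml_bytes):
    """Write already serialized sitemap XML (bytes) to a .gz file."""
    with gzip.open(path, "wb") as handle:
        handle.write(xml_bytes)


def robots_txt_lines(sitemap_urls):
    """Return one 'Sitemap:' line per sitemap URL, ready for robots.txt."""
    return ["Sitemap: " + url for url in sitemap_urls]


if __name__ == "__main__":
    # Hypothetical site with 120 000 pages: three sitemaps are needed.
    pages = ["http://mysite.com/page%d.html" % n for n in range(120000)]
    locations = []
    for index, chunk in enumerate(chunk_urls(pages), start=1):
        # Plain string formatting keeps the sketch short; real URLs
        # containing &, < or > would need XML escaping.
        body = "".join("<url><loc>%s</loc></url>" % page for page in chunk)
        xml = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
               + body + "</urlset>")
        name = "sitemap%d.xml.gz" % index
        write_gzipped(name, xml.encode("utf-8"))
        locations.append("http://mysite.com/" + name)
    print("\n".join(robots_txt_lines(locations)))
```

The printed lines can be appended to robots.txt so that crawlers find each compressed sitemap under its new name.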

It is not an issue if a sitemap contains blocked URLs, NOINDEX pages or duplicate entries. Blocked URLs will simply not be crawled. An empty sitemap.xml is also fine.