Skip to content

sitemap.xml

The repository sitemap.xml should list all the url:s for the latest version of each dataset in the catalogue:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi="http://www.w3.org/2001/XMLSchema-instance" schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://example.org/catalogue/dataset/ds-1</loc>
  </url>
  <url>
    <loc>https://example.org/catalogue/dataset/ds-2</loc>
  </url>
</urlset>

Example sitemap.xml

More information about sitemap.xml can be found on the official webpage: sitemaps.org

On the landing page for each dataset the resource metadata should be availible as schema.org json-ld.

robots.txt

The repository should implement a sitemap.xml endpoint to provide webcrawlers to find and index the landing page for each dataset. The url to sitemap.xml should be added to robots.txt for example:

# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/robotstxt.html

Sitemap: https://example.org/catalogue/sitemap.xml
Crawl-delay: 10

References