Robots and Sitemaps

In version 1.4.1 of Janeway we are introducing the generation of Robot and Sitemap files.

Sites

This document uses the word “sites” to describe the Press site, Journal site(s) and Repository site(s).

Robots

You can generate a robots.txt file for your Janeway sites by running the following management command:

python3 src/manage.py generate_robots

Running this command will generate robots.txt files. If you are using path mode it will generate a single robots file for your entire site. If you are running in domain mode it will create a top level robots.txt for sites without a domain and also an individual robots file for each site with a domain. These files are stored in src/files/robots/.

Here is an example directory where we are running in domain mode:

  • files/
    • robots/
      • journal_orbit_robots.txt
      • repo_olh_robots.txt
      • robots.txt

The build in robots view will return the correct file automatically. At this point we recommend you leave serving the robots file to Janeway, though you could configure your webserver to serve it for you.

Sitemaps

You can generate sitemap.xml files for your Janeway sites by running the following management command:

python3 src/manage.py generate_sitemaps

Running this command will generate:

  • A top level sitemap linking to:
    • Journal sitemap linking to:
      • Issue level sitemap with links to articles
    • Repository sitemap linking to:
      • Subject level sitemap with links to publications

These files are stored in src/files/sitemaps/ and the directory structure looks as follows:

  • files/
    • sitemaps/
      • sitemap.xml
      • orbit/ - Journal
        • 50_sitemap.xml - Issue
      • olh/ - Repository
        • sitemap.xml - Root sitemap, this repository is in domain mode
        • 1_sitemap.xml - Subject

Janeway has a built in view that can handle the serving of the sitemaps files but you can also configure your webserver to serve these files for you, this can be quite complex when in domain mode and may best be left to Janeway to handle however.

Custom Robots/Sitemaps

If you don’t want Janeway to serve robots or sitemap files you can configure your webserver to handle the URL routes that Janeway uses.

Cron

Generation of sitemap files needs to be regular to ensure they are up to date. Janeway’s install_cron command has been updated to install this command for you if you’re using crontab. However you will need to schedule this manually otherwise, currently we recommend you regenerate files every 30 minutes.