Generate sitemaps and robots.txt

Generate XML sitemaps and a robots.txt file, hooked into Parklife's build process. After all HTML pages are built, the sitemap generator discovers them, assigns priorities, and provides accurate last modification dates using Git timestamps.

Add Parklife after build hook

Parkfile
...

Parklife.application.after_build do |application|
  sitemap = Sitemap.new(
    base_url: application.config.base,
    build_dir: application.config.build_dir,
    generate_robots: true
  )

  # Use `next` rather than `return` so a failed validation only skips this block.
  next Rails.logger.error("Error generating sitemap: #{sitemap.errors.full_messages.join(', ')}") unless sitemap.valid?

  sitemap.generate!
end

...

Create the sitemap models

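A minimal sketch of what such a model could look like, assuming it lives in app/models/sitemap.rb and uses ActiveModel for the validations the Parkfile hook relies on (valid? and errors.full_messages). The attribute names match the hook above; everything else is illustrative.

app/models/sitemap.rb
class Sitemap
  include ActiveModel::Model

  attr_accessor :base_url, :build_dir, :generate_robots

  validates :base_url, :build_dir, presence: true

  # Write the sitemap files (and optionally robots.txt) into build_dir.
  def generate!
    write_sitemap
    write_robots if generate_robots
  end

  private

  def write_sitemap
    # Discover the built HTML pages, build the XML, write sitemap.xml and sitemap.xml.gz.
  end

  def write_robots
    # Write a robots.txt pointing crawlers at both sitemap URLs.
  end
end

The private methods are stubs here; the Implementation Details section below describes what they need to do.
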
Remove default robots.txt

Delete the default robots.txt file from the public/ folder.

[temporary] Install Parklife edge version

At the time of writing this guide, the build hooks used above are not yet included in the latest Parklife release.

Gemfile
...
# TODO: 30jul25 - switch back to gem release once PR #124 is included (re-introduces build callbacks)
#       https://github.com/benpickles/parklife/pull/124
gem "parklife", github: "benpickles/parklife"
...
bundle

Implementation Details

Generated Files

The system generates three files in your build directory:

  sitemap.xml - the uncompressed XML sitemap
  sitemap.xml.gz - a gzip-compressed copy of the same sitemap
  robots.txt - points crawlers at both sitemap URLs

Sitemap Features

Automatic Discovery

The sitemap automatically discovers all HTML files in the build directory, excluding error pages (404.html, 500.html, etc.).

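A minimal sketch of that discovery step, assuming a helper on the Sitemap model (the ERROR_PAGES contents shown are an assumption; the guide's constant may list different pages):

# Hypothetical discovery helper on the Sitemap model.
ERROR_PAGES = %w[404.html 422.html 500.html].freeze

def html_pages
  Dir.glob(File.join(build_dir, "**", "*.html"))
     .reject { |path| ERROR_PAGES.include?(File.basename(path)) }
     .sort
end
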
Priorities

Pages are assigned priorities based on their URL structure; a sketch of one possible scheme follows.

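As an illustration only (the specific values and rules below are assumptions, not the guide's actual mapping), a depth-based scheme could look like this:

# Illustrative priority scheme: shallower URLs get higher priority.
def priority_for(url_path)
  case url_path.count("/")
  when 0, 1 then 1.0 # "/" and top-level pages
  when 2    then 0.8 # e.g. "/blog/my-post"
  else 0.5           # deeper pages
  end
end
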
Last Modification Dates

To provide accurate last modification dates, the system attempts to find the original source file for each HTML page. Source files are discovered using the CONTENT_PATTERNS constant; an illustrative sketch of such patterns follows.

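As a hedged example of what those patterns could look like (the globs and lookup helper below are assumptions, not the guide's actual constant):

# Hypothetical source-file patterns and lookup helper.
CONTENT_PATTERNS = [
  "app/views/pages/%{name}.html.erb",
  "app/content/%{name}.md"
].freeze

def source_file_for(page_name)
  CONTENT_PATTERNS
    .map { |pattern| format(pattern, name: page_name) }
    .find { |candidate| File.exist?(candidate) }
end
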
The timestamp strategy depends on whether a source file is found (a sketch follows the list):

  1. Git history - Last commit date for the source file
  2. File system - File modification time if the file isn’t tracked in Git
  3. Current time - Used when the source file was not found

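A minimal sketch of that fallback chain (the helper name and git invocation are illustrative):

# Illustrative fallback chain for a page's <lastmod> value.
def last_modified_for(source_file)
  return Time.current if source_file.nil?                  # 3. source not found

  git_date = `git log -1 --format=%cI -- #{source_file}`.strip
  return Time.zone.parse(git_date) unless git_date.empty?  # 1. last commit date

  File.mtime(source_file)                                  # 2. untracked file
end
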
Change Frequency

All pages are marked with a “monthly” change frequency by default.

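Putting those pieces together, each discovered page becomes one <url> entry in sitemap.xml; the values below are illustrative:

<url>
  <loc>https://example.com/blog/my-post</loc>
  <lastmod>2025-07-30T12:00:00+00:00</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>
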
Model Architecture

The implementation follows Rails conventions, with a main Sitemap model and supporting classes; a sketch of one possible supporting class follows.

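As a hedged illustration of one possible supporting class (the name and shape are assumptions), a small value object could carry the per-page data the XML builder needs:

class Sitemap
  # Hypothetical value object describing a single sitemap entry.
  Page = Struct.new(:loc, :lastmod, :changefreq, :priority, keyword_init: true)
end

# e.g. Sitemap::Page.new(loc: "https://example.com/about",
#                        lastmod: Time.current, changefreq: "monthly", priority: 0.8)
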
Robots.txt Content

The generated robots.txt file includes:

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml.gz
Sitemap: https://example.com/sitemap.xml

Both compressed and uncompressed sitemap URLs are included for maximum compatibility.

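A minimal sketch of how the model could write that file, assuming a write_robots helper like the one stubbed in the model sketch above; the content mirrors the example:

# Illustrative robots.txt writer; the method name and layout are assumptions.
def write_robots
  File.write(File.join(build_dir, "robots.txt"), <<~ROBOTS)
    User-agent: *
    Allow: /

    Sitemap: #{base_url}/sitemap.xml.gz
    Sitemap: #{base_url}/sitemap.xml
  ROBOTS
end
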
Configuration

The sitemap generator requires:

  base_url - the canonical site URL (application.config.base in the Parkfile hook)
  build_dir - the directory containing the built HTML files (application.config.build_dir)

Passing generate_robots: true also writes a robots.txt file alongside the sitemap.

Error pages defined in the ERROR_PAGES constant are automatically excluded from the sitemap.


Commit: Sitemap and Robots


What next?

Optimize metadata for crawlers