Web Snapshots Overview


The Web Snapshots add-on lets agencies capture website records in their archive each time pages on the site are updated. Using the agency's XML or HTML sitemap, the add-on captures and stores changes, which then become searchable alongside social media records.

Important Note

When setting up Web Snapshots, Social Media Archiving recommends using an XML sitemap that includes up-to-date last-modified dates, so that changes to the website are captured more accurately.


FAQ

What is an XML sitemap?

An XML sitemap is a file that lists the URLs for a site along with additional metadata about each URL. It is an easy way for webmasters to inform search engines about the pages on their sites, and search engines use it to crawl the site more intelligently. Social Media Archiving uses XML sitemaps to detect newly added or removed URLs and relies on the last updated date for meaningful versioning.

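These sitemaps should include the following for each page:

  • URL (<url>): one entry per page, containing the items below

  • Location (<loc>): the absolute, not relative, address of the page. It must begin with the protocol (such as HTTP) and end with a trailing slash if the web server requires one

  • When it was last updated (<lastmod>)

  • How often it usually changes (<changefreq>)

  • How important it is compared to other URLs on the site (<priority>)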
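
To make these fields concrete, here is a minimal Python sketch that reads them from an XML sitemap and compares two versions of the same sitemap to find added, removed, and updated pages. It illustrates the idea only and is not how Social Media Archiving is implemented. The function names, the example.gov URLs, and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return {location: {"lastmod", "changefreq", "priority"}} for each <url>."""
    root = ET.fromstring(xml_text.strip())
    entries = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # skip entries without a location
        entries[loc] = {
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", default=None, namespaces=NS),
            "priority": url.findtext("sm:priority", default=None, namespaces=NS),
        }
    return entries


def diff_sitemaps(old, new):
    """Compare two parsed sitemaps; return (added, removed, changed) URL lists."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A page counts as changed when its last-modified value differs.
    changed = sorted(
        loc for loc in set(old) & set(new)
        if old[loc]["lastmod"] != new[loc]["lastmod"]
    )
    return added, removed, changed


# Hypothetical sample data: two versions of the same sitemap.
OLD_XML = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.gov/</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.gov/about/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.gov/contact/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
</urlset>
"""

NEW_XML = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.gov/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.gov/about/</loc>
    <lastmod>2024-02-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.gov/news/</loc>
    <lastmod>2024-02-01</lastmod>
  </url>
</urlset>
"""

added, removed, changed = diff_sitemaps(parse_sitemap(OLD_XML), parse_sitemap(NEW_XML))
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

In this sample, the news page is reported as added, the contact page as removed, and the about page as changed because its last-modified date moved forward. Without accurate last-modified dates, the changed list would stay empty even when page content had been edited.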

What is an HTML sitemap?

An HTML sitemap is built to help people navigate the site. It is a collection of links, but unlike an XML sitemap it carries no additional information about those links. Search engines do not recognize it as a sitemap in a valid format. Social Media Archiving can detect URLs from an HTML sitemap, but cannot rely on it for versioning information.
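
As a rough illustration of what an HTML sitemap offers, the Python sketch below collects the links from a small HTML page. The class and function names, the sample markup, and the example.gov URLs are hypothetical, and this is not a description of how Social Media Archiving reads HTML sitemaps.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore in-page anchors and non-page links.
        if not href or href.startswith(("#", "mailto:", "javascript:")):
            return
        absolute = urljoin(self.base_url, href)
        if absolute not in self.links:
            self.links.append(absolute)


def extract_links(html_text, base_url):
    """Return the unique absolute URLs linked from html_text, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


# Hypothetical HTML sitemap page.
SAMPLE_HTML = """
<ul>
  <li><a href="/about/">About</a></li>
  <li><a href="services/permits/">Permits</a></li>
  <li><a href="#top">Back to top</a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML, "https://www.example.gov/sitemap.html")
for link in links:
    print(link)
```

The result is only a list of addresses. Nothing in the page says when each one was last updated, which is why an HTML sitemap cannot supply versioning information.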

The sitemap must reside in the same domain as the URLs it lists, in a folder that is a parent of all of those URLs.
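
The location rule can be sketched as a simple check in Python. The function name and the example.gov URLs are hypothetical. Requiring the same protocol as well as the same host is an assumption taken from the sitemaps.org protocol, not something stated in this article.

```python
import posixpath
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url sits under the folder that holds the sitemap."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Assumption: protocol and host must both match.
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # Folder containing the sitemap file, always ending in "/".
    folder = posixpath.dirname(sitemap.path)
    if not folder.endswith("/"):
        folder += "/"
    return (page.path or "/").startswith(folder)


# A sitemap at the site root covers the whole site.
print(in_sitemap_scope("https://www.example.gov/sitemap.xml",
                       "https://www.example.gov/about/"))
# A sitemap inside /parks/ covers only URLs under /parks/.
print(in_sitemap_scope("https://www.example.gov/parks/sitemap.xml",
                       "https://www.example.gov/about/"))
```

The first call returns True and the second returns False, which is why placing the sitemap at the root of the site is the simplest way to cover every page.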

How do I find my sitemap?

  • Many free tools are available to help you identify your agency's sitemap. The XML Sitemap Validator is one suggested tool.

  • If you have a website administrator, they should know where to find the proper sitemap.