This article covers the options for setting up Web Snapshots in your archive.
Checking Your Website Sitemap Compatibility
We use XML sitemaps to capture your website. Please check with your web host or IT team if you cannot find one for your website.
Please review our Web Snapshots Overview article for more information.
You must allowlist the three IP addresses we use (all AWS) for Web Snapshots:
52.23.29.34
54.235.88.205
54.84.64.101
Should your IT team prefer to allowlist a user agent instead, they can create a filter that allows any user agent containing the text "ASWebsnapshotsUserAgent". The rest of our user agent string is dynamic and will change as we upgrade the browser we use to capture websites.
The sitemap and URL list must match your website's domain
If you use Google Analytics, you must filter out our Web Snapshots traffic
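As an illustration of the allowlisting rules above, here is a minimal Python sketch that your IT team could adapt to recognize Web Snapshots traffic, for example when testing a firewall rule or an analytics filter. The IP addresses and the user agent text come from this article; the function name is_snapshot_request and the idea of checking both values in one helper are assumptions made for the example, not part of the product.

```python
import ipaddress

# IP addresses and user agent text quoted from this article.
SNAPSHOT_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in ("52.23.29.34", "54.235.88.205", "54.84.64.101")
)
SNAPSHOT_UA_TOKEN = "ASWebsnapshotsUserAgent"


def is_snapshot_request(remote_ip: str, user_agent: str) -> bool:
    """Hypothetical helper: True if a request looks like Web Snapshots traffic.

    A request matches if it comes from one of the three listed IP addresses
    OR its user agent contains the documented text. The rest of the user
    agent string is dynamic, so only a substring check is used.
    """
    try:
        ip_matches = ipaddress.ip_address(remote_ip) in SNAPSHOT_IPS
    except ValueError:
        # Not a valid IP address string, so it cannot be on the list.
        ip_matches = False
    return ip_matches or SNAPSHOT_UA_TOKEN in (user_agent or "")
```

The same check could be expressed as a firewall rule or an analytics filter; how that is configured depends on the tools your team uses.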
Adding Your Website Sitemap For Capture
Log into your archive
Navigate to the Configure tab:
Click the Web Snapshots tab:
If this is your first time setting up Web Snapshots, you will need to choose an
When the page reloads, click on the Add Sitemap button:
Fill out the Add Sitemap Menu:
Note:
If you do not have a sitemap.xml or sitemap_index.xml defined for your website, you may provide the URL for an HTML page that contains links to all the URLs that you want to snapshot. This will allow Web Snapshots to automatically detect new and changed URLs as you keep your sitemap or the sitemap index up to date.
Sitemap Name: ArchiveSocial recommends the following naming convention: City of {name} Gov. Site - XML. This will allow the agency to easily identify what sites are connected and the type of sitemap being used.
Sitemap Format: Choose the format of the sitemap you are entering (we suggest XML)
Sitemap URL: the full URL for the sitemap you are adding. For example: http://example.org/path/sitemap.xml
Click Save Sitemap:
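Before saving a sitemap, it can help to confirm that every URL it lists matches your website's domain, as required above. The following is a rough Python sketch of such a check, assuming the sitemap follows the standard sitemaps.org XML format. The function name urls_outside_domain is hypothetical, and the exact host comparison (so that www.example.org and example.org count as different) is an assumption; Web Snapshots may apply its own matching rules.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the standard sitemaps.org XML format.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_outside_domain(sitemap_xml: str, site_domain: str) -> list[str]:
    """Hypothetical check: return <loc> URLs whose host is not site_domain.

    Works for both a sitemap (<urlset>) and a sitemap index
    (<sitemapindex>), since both list their URLs in <loc> elements.
    """
    root = ET.fromstring(sitemap_xml)
    expected = site_domain.lower()
    mismatched = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        host = (urlparse(url).hostname or "").lower()
        if host != expected:  # assumption: exact host match
            mismatched.append(url)
    return mismatched


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.org/path/page.html</loc></url>"
    "<url><loc>http://other.example.com/page.html</loc></url>"
    "</urlset>"
)
print(urls_outside_domain(sample, "example.org"))
```

An empty result suggests the sitemap only lists URLs on the expected domain.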
Adding Specific URLs
Navigate to the Configure tab:
Select the Web Snapshots tab:
Click Add Site URL
Enter the full URL address for the page:
Click the Save Site URL button
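If you are preparing a list of pages to add, a quick way to confirm that each entry is a full URL is sketched below in Python. The helper name is_full_url is hypothetical, and treating "full" as "has an http or https scheme and a host name" is an assumption for illustration; the Add Site URL form may validate entries differently.

```python
from urllib.parse import urlparse


def is_full_url(url: str) -> bool:
    """Hypothetical check: True if url has an http(s) scheme and a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For example, http://example.org/path/page.html passes this check, while example.org/path or /path/page.html does not.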
Dynamic Option
Dynamic content is web content that changes based on the behavior, preference, and general interest of a site visitor. This content can be found on websites and in email content and is generated when a user accesses a page. The content is often personalized and what is displayed is based on the data a site has for a user and the time of access. The primary use of this content is to deliver a more positive experience for the end-user.
By default, the Dynamic option for Web Snapshots is turned off. Web Snapshots will detect changes to a page on a site using the XML sitemap and record the change in the archive once a day.
However, if Dynamic is enabled for a sitemap or URL, any widgets on the site or page (such as weather updates, a blog on the website, or a calendar of events) will be captured once daily on every page where they appear. For example, if your website has 100 pages and 12 of these pages have a dynamic widget, Web Snapshots will capture those 12 pages once per day as the widgets update.
Note:
Enabling the Dynamic option could cause you to exceed your monthly record limit.
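To get a rough sense of the impact on your record count, the example of 12 dynamic pages captured once per day can be worked through as in the Python sketch below. It assumes each daily capture of a page counts as one record and uses a 30-day month by default; both are assumptions for illustration, and the function name is hypothetical. Check your plan for how records are actually counted.

```python
def estimate_monthly_dynamic_records(dynamic_pages: int, days_in_month: int = 30) -> int:
    """Hypothetical estimate: one capture per dynamic page per day.

    Assumes every daily capture counts as one record toward the limit.
    """
    return dynamic_pages * days_in_month


# 12 dynamic pages captured once per day over a 30-day month.
print(estimate_monthly_dynamic_records(12))  # 360
```

Under these assumptions, 12 dynamic pages would add about 360 captures in a 30-day month, regardless of how many of the other 88 pages change.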
Turning Off Specific URLs
Navigate to the Configure tab:
Select the Web Snapshots tab:
For the URL you wish to turn off, click the gear icon under the Action section:
Toggle the Archiving switch to OFF:
Click the Save button: