For reverse-proxy implementations, SiteSpect identifies requests from robots and crawlers using both the known list described under Pass-Through and Automatic Robot Detection (ARD), and excludes those requests from all Campaign data. If you are using the SiteSpect Engine API, the approach differs; refer to the documentation for that use case.
ARD is a SiteSpect feature that provides highly accurate detection of robots, crawlers, and other User-Agents that do not explicitly identify themselves as such. These are often called cloaked robots because they masquerade as standard browsers, so SiteSpect's explicit Pass-Through settings do not filter them.
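To illustrate why an explicit list misses cloaked robots, here is a minimal Python sketch of User-Agent substring matching. The token list and the function name `matches_explicit_list` are hypothetical; they are not SiteSpect's actual Pass-Through list or matching logic.

```python
# Hypothetical explicit list: lowercase substrings that self-identifying
# robots typically include in their User-Agent header. Illustrative only;
# this is NOT SiteSpect's actual Pass-Through list.
EXPLICIT_ROBOT_TOKENS = ("googlebot", "bingbot", "crawler", "spider")


def matches_explicit_list(user_agent: str) -> bool:
    """Return True if the User-Agent contains any known robot token."""
    ua = user_agent.lower()
    return any(token in ua for token in EXPLICIT_ROBOT_TOKENS)


# A robot that identifies itself is caught by the explicit list...
declared = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# ...but a cloaked robot sending an ordinary browser User-Agent is not.
cloaked = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

print(matches_explicit_list(declared))  # True
print(matches_explicit_list(cloaked))   # False
```

Because the cloaked request is indistinguishable by header alone, catching it needs a different signal, which is the gap ARD is designed to fill.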
To use the ARD feature, which requires system administrator privileges:
- Select Site, Configuration, Site Settings.
- Select the User Tracking tab and scroll to Automatic Robot Detection settings.
Robot Detection Location
- Prepend to top of page – This is the recommended (and default) setting. It accurately filters robots while still capturing users who enter and exit a site quickly (i.e., a bounce). The test code is inserted just after the opening <head> tag where available, or at the top of the page otherwise.
- Append to bottom of page – This accurately filters robots and also excludes most rapid bounces, where a human user clicks through to a site, views only one page, then quickly presses the Back button to return to the referring page (typically a search engine). Use this setting if you do not want rapid bounces counted in your Campaign data. The test code is inserted just before the closing </body> tag where available, or at the end of the page otherwise.
- Append to absolute bottom of page – Similar to the prior option, but always appends the test code to the very end of the page rather than looking for a closing tag. This option is recommended only for Sites where the prior two options cause problems.
- Off – ARD is disabled, and only robots that are explicitly filtered by Pass-Through settings or HTTP Request Exclusions are caught. When ARD is disabled, the remaining ARD settings on this page are hidden.
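The placement rules above can be sketched as a string transformation on the HTML response. This is a minimal Python illustration, not SiteSpect's implementation: the function `place_test_code` and the location identifiers are hypothetical, and it assumes the "bottom of page" anchor is the closing </body> tag.

```python
import re

# Hypothetical sketch, not SiteSpect's code. Assumes the "bottom of page"
# anchor is the closing </body> tag.

# Opening <head> tag, with or without attributes; \b keeps <header> from matching.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
# Closing </body> tag.
BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def place_test_code(html: str, snippet: str, location: str) -> str:
    """Insert `snippet` into `html` for a Robot Detection Location.

    `location` is one of the hypothetical identifiers
    "prepend", "append", "absolute_bottom", or "off".
    """
    if location == "off":
        return html  # ARD disabled: page left untouched
    if location == "prepend":
        m = HEAD_OPEN.search(html)
        if m:  # just after the opening <head> tag
            return html[:m.end()] + snippet + html[m.end():]
        return snippet + html  # otherwise at the very top of the page
    if location == "append":
        matches = list(BODY_CLOSE.finditer(html))
        if matches:  # just before the last closing </body> tag
            i = matches[-1].start()
            return html[:i] + snippet + html[i:]
        return html + snippet  # otherwise at the very end of the page
    if location == "absolute_bottom":
        return html + snippet  # always the very end; no tag search
    raise ValueError(f"unknown location: {location!r}")
```

The "append" branch picks the last </body> match, so a stray earlier occurrence (for example inside an inline script string) is less likely to be chosen; "absolute_bottom" skips the search entirely, which is why it is the fallback for pages where tag-based placement misbehaves.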
Robot Detection Method
Select the method you want to use to inject SiteSpect's Automatic Robot Detection onto the page:
- new Image() – A non-blocking method that injects the code using a new HTMLImageElement instance. While it does not block page rendering or subsequent HTTP requests, it can delay the window's load event.
- document.write() – We do not recommend this option because it is a blocking injection method and will soon be deprecated.
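The two methods differ in the JavaScript each one runs on the page. The minimal Python sketch below renders that JavaScript for a hypothetical beacon URL; the function `render_injection`, the method names, and the URL are assumptions for illustration, not SiteSpect's actual test code.

```python
import json


def render_injection(beacon_url: str, method: str) -> str:
    """Return the JavaScript a page would run for the chosen method.

    Hypothetical sketch: `beacon_url` is an illustrative URL, not a real
    SiteSpect endpoint, and the snippets are not SiteSpect's test code.
    """
    url = json.dumps(beacon_url)  # quoted JavaScript string literal
    if method == "image":
        # Creating an HTMLImageElement and setting src starts the request
        # without writing anything into the document.
        return f"(new Image()).src = {url};"
    if method == "document_write":
        # document.write() runs synchronously, inserting markup into the
        # parser's input stream at the point where the script executes.
        tag = json.dumps(f'<img src="{beacon_url}" width="1" height="1" alt="">')
        return f"document.write({tag});"
    raise ValueError(f"unknown method: {method!r}")


print(render_injection("/__ard/t.gif", "image"))
```

Routing both URLs through `json.dumps` produces a valid, escaped JavaScript string literal, so quotes in the generated `<img>` markup do not break the surrounding script.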
Disable Robot Detection Header Name and Value
IAB/ABC International Spiders and Bots List
In addition to the Pass-Through list and Automatic Robot Detection (ARD), SiteSpect also supports bot detection through a managed list from the IAB. The level of IAB detection required depends on your SiteSpect configuration, so contact your SiteSpect Consultant or Account Manager, or email email@example.com, to discuss enabling it.