Refreshen

About RefreshenBot

User-agent: RefreshenBot/1.0 (+https://refreshen.io/bot)

What it does

RefreshenBot fetches publicly accessible pages from blogs whose owners have signed them up for monitoring with Refreshen. We look for stale year references, broken external links, and missing internal links.

We do not crawl the web at large. Every fetch is initiated by a customer who has provided a sitemap URL for a domain they own.

What it fetches

/robots.txt: first, before anything else.
The customer-supplied sitemap (e.g. /sitemap.xml).
The HTML of pages matching the path patterns the customer chose to monitor.
Linked external URLs, when checking for broken links: an HTTP HEAD request only (or a GET if the server does not support HEAD). We record only the status code and do not read the response body.
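
The broken-link check in the last item can be sketched roughly as follows. This is a minimal illustration using Python's standard library, not RefreshenBot's actual code. The function name link_status, the injectable opener parameter, and the choice to treat 405 and 501 as "HEAD is not supported" are all assumptions made for the example.

```python
import urllib.error
import urllib.request

# Taken from the User-agent line on this page.
USER_AGENT = "RefreshenBot/1.0 (+https://refreshen.io/bot)"

# Assumption: these status codes are read as "HEAD is not supported".
HEAD_UNSUPPORTED = (405, 501)


def link_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no response arrived.

    Tries HEAD first and falls back to GET only when HEAD is rejected.
    The response body is never read.
    """
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(
            url, method=method, headers={"User-Agent": USER_AGENT}
        )
        try:
            with opener(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as error:
            if method == "HEAD" and error.code in HEAD_UNSUPPORTED:
                continue  # retry once with GET
            return error.code  # other 4xx/5xx: report the status as-is
        except OSError:
            return None  # DNS failure, refused connection, timeout, ...
```

The opener parameter exists only so the function can be exercised without network access. Called as link_status(url), it uses urllib.request.urlopen, which follows redirects and raises HTTPError for 4xx and 5xx responses.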

How often

How often we scan depends on the customer's tier: monthly on Free, approximately once per week on Pro, and daily on Business. Within a scan we're polite: at most one request per second per domain, and we honour any Crawl-delay directive in your robots.txt.
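
One way to picture how the one-request-per-second limit and Crawl-delay interact is the sketch below. It is illustrative only. The helper name request_interval is hypothetical, and taking the larger of the two values is an assumption about how they combine, not a statement of how RefreshenBot is implemented.

```python
import urllib.robotparser

# "At most one request per second per domain."
MIN_INTERVAL_SECONDS = 1.0


def request_interval(robots_txt, user_agent="RefreshenBot"):
    """Return the number of seconds to wait between requests to one domain."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    if delay is None:
        return MIN_INTERVAL_SECONDS
    # Assumption: a Crawl-delay can slow the crawler down but never speed it up.
    return max(MIN_INTERVAL_SECONDS, float(delay))
```

With a robots.txt that sets Crawl-delay: 5 for all user agents, this returns 5.0. With no Crawl-delay at all, it returns the 1.0 second floor.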

How to block us

Add this to your robots.txt:

User-agent: RefreshenBot
Disallow: /

We also honour rules in the wildcard User-agent: * group. Disallowed paths are skipped and surfaced as a warning to the account that requested the scan, so the owner knows we couldn't check that path.
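
If you want to check how a set of rules is likely to be read before publishing it, a rough local test is possible with Python's standard urllib.robotparser. The helper is_allowed and the sample rule sets below are made up for this example, and Python's matching may differ in edge cases from what RefreshenBot does.

```python
import urllib.robotparser


def is_allowed(robots_txt, path, user_agent="RefreshenBot"):
    """Return True if robots_txt lets user_agent fetch path."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Blocks RefreshenBot everywhere, as in the snippet above.
BLOCK_US = """\
User-agent: RefreshenBot
Disallow: /
"""

# Hypothetical wildcard rule that applies to every crawler.
WILDCARD = """\
User-agent: *
Disallow: /drafts/
"""

print(is_allowed(BLOCK_US, "/2021/post.html"))   # False
print(is_allowed(WILDCARD, "/drafts/a.html"))    # False
print(is_allowed(WILDCARD, "/posts/a.html"))     # True
```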

Contact

Have a question or complaint, or want to confirm that a fetch was real? Email tommy@refreshen.io. If you believe an account is using Refreshen to scan a domain they don't own, report it at /report-scanning.