Refreshen
About RefreshenBot
User-agent: RefreshenBot/1.0 (+https://refreshen.io/bot)
What it does
RefreshenBot fetches publicly accessible pages from blogs whose owners have signed up to monitor them with Refreshen. We look for stale year references, broken external links, and missing internal links.
We do not crawl the web at large. Every fetch is initiated by a customer who has provided a sitemap URL for a domain they own.
What it fetches
- /robots.txt, first, before anything else.
- The customer-supplied sitemap (e.g. /sitemap.xml).
- The HTML of pages matching the path patterns the customer chose to monitor.
- Linked external URLs, when checking for broken links. For these we send only an HTTP HEAD (or a GET if HEAD is not supported) and use nothing beyond the status code.
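The HEAD-then-GET behaviour for external links can be sketched roughly as follows. This is an illustration, not RefreshenBot's actual code: the function names, the set of statuses treated as "HEAD not supported" (405 and 501), and the rule that any 4xx/5xx status counts as broken are all assumptions.

```python
import urllib.error
import urllib.request

USER_AGENT = "RefreshenBot/1.0 (+https://refreshen.io/bot)"

# Assumption: these statuses are taken to mean the server does not support HEAD.
HEAD_UNSUPPORTED = {405, 501}


def send_request(method, url, timeout=10):
    """Send one request and return only the HTTP status code."""
    req = urllib.request.Request(
        url, method=method, headers={"User-Agent": USER_AGENT}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status  # the body is never read
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions


def check_link(url, send=send_request):
    """Return the status for url, trying HEAD first and GET only as a fallback."""
    status = send("HEAD", url)
    if status in HEAD_UNSUPPORTED:
        status = send("GET", url)
    return status


def is_broken(status):
    # Assumption: any client or server error marks the link as broken.
    return status >= 400
```

The `send` parameter exists only so the fallback logic can be exercised without touching the network. Connection failures (`urllib.error.URLError`) are not handled in this sketch.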
How often
Scan frequency depends on the customer's plan: monthly on Free, approximately weekly on Pro, and daily on Business. Within a scan we're polite: at most one request per second per domain, and we honour any Crawl-delay directive in your robots.txt.
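As a rough sketch of how the two pacing rules might combine, the snippet below picks the delay between requests to one domain using Python's standard `urllib.robotparser`. Taking the slower of the one-second default and the Crawl-delay value is an assumption about how the rules interact, and `request_delay` is a hypothetical name.

```python
import urllib.robotparser

# From the policy above: at most one request per second per domain.
DEFAULT_DELAY = 1.0


def request_delay(robots_txt, user_agent="RefreshenBot"):
    """Return the number of seconds to wait between requests to one domain."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawl_delay = parser.crawl_delay(user_agent)  # None if no directive applies
    if crawl_delay is None:
        return DEFAULT_DELAY
    # Assumption: never go faster than the default, even if Crawl-delay is lower.
    return max(DEFAULT_DELAY, float(crawl_delay))
```

Note that Python's parser only recognises whole-number Crawl-delay values; a fractional value is ignored and the default applies.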
How to block us
Add this to your robots.txt:
User-agent: RefreshenBot
Disallow: /
We also honour the wildcard User-agent: * rules. Disallowed paths are skipped and surfaced as a warning to the account that requested the scan, so the owner knows we couldn't check that path.
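If you want to confirm that your robots.txt has the intended effect, one option is to test it locally with Python's standard `urllib.robotparser`. This is a sketch: it shows how Python's parser reads the rules, which may differ in edge cases from RefreshenBot's own parser, and `allows_refreshenbot` is a hypothetical helper name.

```python
import urllib.robotparser


def allows_refreshenbot(robots_txt, url):
    """Return True if the given robots.txt text lets RefreshenBot fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("RefreshenBot", url)


# A rule aimed at the bot by name blocks the whole site.
blocked = "User-agent: RefreshenBot\nDisallow: /\n"
print(allows_refreshenbot(blocked, "https://example.com/post"))

# A wildcard rule applies to the bot too, but only for the listed path.
wildcard = "User-agent: *\nDisallow: /private/\n"
print(allows_refreshenbot(wildcard, "https://example.com/private/draft"))
print(allows_refreshenbot(wildcard, "https://example.com/blog/"))
```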
Contact
Questions, complaints, or want to confirm a fetch was real? Email tommy@refreshen.io. If you believe an account is using Refreshen to scan a domain they don't own, report it at /report-scanning.