How to Tune URL Checking for Large Repos

Goal: reduce rate-limiting and access warnings when running check_broken_urls (or check_urls_locale/check_urls_tracking) against a large repository with many links.

Recognise the warnings

Warnings never fail the run (only broken links do), but a large number of them usually means the host is throttling or blocking the checker rather than the links actually being broken:

⚠ 3 links had warnings:
    File 'docs/index.md', line 12
https://example.com/api was skipped due to rate limiting.

    File 'docs/index.md', line 15
https://example.com/a/123 could not be verified (access was forbidden by the server).

See Exit Codes and Issue Severity for what each warning status means, and How URL Checking Works for why they happen.

Slow down requests to a single host

If many links point at the same host, increase the pacing delay between requests to it:

markdown-checker . -f check_broken_urls --per-host-delay=1.0

Reduce concurrency

Fewer concurrent workers means fewer simultaneous requests overall, including across different hosts:

markdown-checker . -f check_broken_urls --max-workers=4

In GitHub Actions, --max-workers defaults to the number of available CPUs rather than 10, which can be more aggressive on larger runners. Pass an explicit value to override it.
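
One way to keep the worker count predictable is to choose it in the script that invokes the checker. The sketch below is an illustration only: pick_workers is a hypothetical helper, the values 4 and 10 are placeholders rather than recommendations, and it relies on the GITHUB_ACTIONS environment variable, which GitHub Actions sets to true on its runners.

```shell
#!/bin/sh
# Hypothetical helper: print an explicit worker count for the current environment.
# The numbers 4 and 10 are illustrative placeholders; tune them for your repo.
pick_workers() {
  if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
    echo 4    # conservative count on CI runners, regardless of CPU count
  else
    echo 10   # same as the non-CI default described above
  fi
}

# Usage (shown as a comment so that sourcing this file has no side effects):
#   markdown-checker . -f check_broken_urls --max-workers="$(pick_workers)"
```

Passing the value explicitly means the run behaves the same on a 2-CPU and a 16-CPU runner.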

Adjust retries and timeouts

# Wait longer per request, retry hard failures more times
markdown-checker . -f check_broken_urls --timeout=30 --retries=5

# Wait longer before retrying a 429 with no Retry-After header
markdown-checker . -f check_broken_urls --fallback-retry-delay=60
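
When choosing a value for --fallback-retry-delay, it can help to check by hand whether a throttling host sends a Retry-After header at all. The following sketch is not part of markdown-checker: retry_after is a hypothetical helper that reads raw response headers on standard input, and the curl command in the comment assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first Retry-After header found
# on stdin, or print nothing if the header is absent.
retry_after() {
  tr -d '\r' |                  # strip carriage returns from HTTP line endings
    grep -i '^retry-after:' |   # header names are case-insensitive
    head -n 1 |
    cut -d ':' -f 2- |          # keep everything after the first colon
    sed 's/^[[:space:]]*//'     # trim leading whitespace
}

# Usage (shown as a comment; the URL is the one from the warning example):
#   curl -sI https://example.com/api | retry_after
```

An empty result is not conclusive, since a single HEAD request may not be throttled the way a full run is. If the host does send a value, the fallback delay matters less for that host.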

Give up on a host entirely

If a domain consistently blocks automated requests (e.g. it always returns 403), stop checking it rather than tuning around it. See How to Skip Domains and URLs.

Verify it worked

Re-run the check and compare the warning count in the “N links had warnings” summary line before and after your change.
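
The comparison can be scripted by pulling the number out of the summary line. This is a minimal sketch, assuming the summary has the form shown in the warning example ("N links had warnings"). warning_count is a hypothetical helper, and capturing both stdout and stderr is a precaution, because this page does not say which stream the summary is written to.

```shell
#!/bin/sh
# Hypothetical helper: read checker output on stdin and print N from the
# "N links had warnings" summary line, or 0 if no such line is present.
warning_count() {
  n=$(grep -aoE '[0-9]+ links? had warnings' | head -n 1 | cut -d ' ' -f 1) || true
  echo "${n:-0}"
}

# Usage (shown as a comment so that sourcing this file has no side effects):
#   markdown-checker . -f check_broken_urls > before.log 2>&1
#   markdown-checker . -f check_broken_urls --per-host-delay=1.0 > after.log 2>&1
#   echo "before: $(warning_count < before.log)  after: $(warning_count < after.log)"
```

Change one setting at a time between runs so that any drop in the count can be attributed to that setting.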