Broken links are not only bad for your users; they can also hurt your site's search engine visibility, as search engines such as Google may rank sites with broken links lower. You should avoid linking to broken content, and avoid having pages on your site that don't work.
The 404 Watch API is one of the most powerful link checkers on the market, with the following advanced features:
- Can optionally respect nofollow attributes on links (fetch_nofollow)
- Can check external links (fetch_external)
- Can optionally discard query parameters from URLs (omit_query_params)
- Can optionally discard hash fragments from URLs (omit_hash_params)
- Can check images, JavaScript, and CSS files for broken links (check_images, check_js, check_css)
- Can whitelist domains, and exclude domains or URLs from checking
- Can call a callback URL when link checking is done; alternatively, you can poll for the results
Whitelisting domains
Your site may be served from multiple domains that you want to treat as internal resources. To do so, add those domains to the whitelisted_domains_list parameter when creating the link checker job.
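As a minimal Python sketch, the snippet below builds the whitelisting part of a POST /job body. The helpers normalize_domain and build_job_payload are hypothetical, not part of the API. normalize_domain assumes the API expects bare host names (the sample request later on passes "apilayer.com"), so it strips any scheme, port, or path you might paste in.

```python
from urllib.parse import urlsplit


def normalize_domain(value: str) -> str:
    """Reduce a URL or host string to a bare, lowercase host name (hypothetical helper)."""
    value = value.strip()
    # urlsplit() only recognizes a host when a netloc is present, so prefix "//"
    # for inputs such as "apilayer.com/path" that have no scheme.
    parsed = urlsplit(value if "//" in value else "//" + value)
    return (parsed.hostname or "").lower()


def build_job_payload(url: str, whitelisted_domains: list[str]) -> dict:
    """Build a POST /job body whose whitelisted domains are treated as internal."""
    # Assumption: the API wants bare host names, as in the sample request.
    hosts = (normalize_domain(d) for d in whitelisted_domains)
    domains = sorted({h for h in hosts if h})  # drop blanks and duplicates
    return {"url": url, "whitelisted_domains_list": domains}


print(build_job_payload("https://p1.rs", ["https://ApiLayer.com/", "assets.apilayer.com"]))
```

Sorting the set keeps the payload deterministic, which makes it easy to compare job configurations.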
Excluding domains and URLs from checking
You may wish to exclude specific domains or URLs from link checking. To do so, add the domain names to the excluded_domains_list parameter, or the individual URLs to the excluded_urls_list parameter, when creating the link checker job.
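The exclusion lists go into the same job body. In this hypothetical Python helper, add_exclusions returns a copy of a payload with excluded_domains_list and excluded_urls_list filled in, dropping blanks and duplicates. How the API matches these entries (for example, whether a domain also covers its subdomains) isn't documented here, so the helper doesn't try to reproduce it.

```python
def add_exclusions(payload: dict, domains=(), urls=()) -> dict:
    """Return a copy of a POST /job body with exclusion lists added (hypothetical helper)."""
    result = dict(payload)  # leave the caller's dict untouched
    # dict.fromkeys() keeps first-seen order while removing duplicates.
    result["excluded_domains_list"] = list(
        dict.fromkeys(d.strip().lower() for d in domains if d.strip())
    )
    result["excluded_urls_list"] = list(
        dict.fromkeys(u.strip() for u in urls if u.strip())
    )
    return result


payload = add_exclusions(
    {"url": "https://p1.rs"},
    domains=["example.com", "Example.com"],
    urls=["https://p1.rs/old-page"],
)
```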
Callback when finished
Link checking is time-consuming, since your site may have hundreds of URLs and assets. For this reason, the API works asynchronously: you create a link checking job using the POST /job endpoint and receive a job ID in response. You can then poll the GET /job/{id} endpoint to follow the progress of a running job, or to get the results of a finished one.
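A minimal polling sketch in Python, using only the standard library. It assumes "finished" is the terminal value of status (the only value shown on this page) and that progress carries checked and discovered counts, as in the sample GET /job/{id} response later in this section. Failure statuses aren't documented here, so the loop simply gives up after max_wait seconds. The fetch and sleep parameters are hypothetical hooks that let the loop run without network access.

```python
import json
import time
import urllib.request

API_BASE = "https://api.apilayer.com/404_watch"


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET /job/{id} and decode the JSON response."""
    req = urllib.request.Request(f"{API_BASE}/job/{job_id}", headers={"apikey": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id, api_key, fetch=fetch_job, interval=10.0, max_wait=1800.0,
                 sleep=time.sleep):
    """Poll a job until its status is "finished", or raise TimeoutError."""
    waited = 0.0
    while True:
        job = fetch(job_id, api_key)
        # Assumption: "finished" is the only terminal status worth waiting for.
        if job.get("status") == "finished":
            return job
        progress = job.get("progress", {})
        print(f"{progress.get('checked', 0)}/{progress.get('discovered', 0)} links checked")
        if waited >= max_wait:
            raise TimeoutError(f"job {job_id} not finished after {max_wait} s")
        sleep(interval)
        waited += interval
```

A fixed interval keeps the sketch simple; for long crawls you might prefer a growing interval to reduce the number of requests.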
Optionally, you can also provide a callback URL (the callback parameter) when creating the link checker job via the POST /job endpoint. In that case you don't need to poll the GET /job/{id} endpoint for results, as the API automatically calls the provided callback URL (via HTTP POST) when the job has finished.
Optionally, you can also provide a callback_secret parameter when creating the link checker job. Its value is sent in the X-Callback-Secret HTTP header of the callback request, so you can check this header to authenticate the call.
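Below is a minimal sketch of a callback receiver built on Python's http.server, which rejects requests whose X-Callback-Secret header doesn't match the secret you sent when creating the job. The names CALLBACK_SECRET, is_authentic, CallbackHandler, and run are hypothetical. The sketch also assumes the callback body is JSON; its exact shape isn't documented here. hmac.compare_digest is used so the comparison takes constant time.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# The callback_secret value sent when creating the job (hypothetical value).
CALLBACK_SECRET = "supersecret_key"


def is_authentic(headers, expected_secret: str) -> bool:
    """Compare the X-Callback-Secret header with the expected secret in constant time."""
    received = headers.get("X-Callback-Secret")
    if received is None:
        return False
    return hmac.compare_digest(received.encode(), expected_secret.encode())


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_authentic(self.headers, CALLBACK_SECRET):
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        # Assumption: the body is JSON; its exact shape is not documented here.
        payload = json.loads(self.rfile.read(length) or b"{}")
        print("callback received:", payload)
        self.send_response(200)
        self.end_headers()


def run(port: int = 8080) -> None:
    """Serve callbacks until interrupted (blocks the current thread)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Call run() on a publicly reachable host and pass its URL as the callback parameter. In production you would normally put this behind HTTPS.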
Sample Request for creating a new link checker job
Below is a sample request to the POST /job endpoint. It sets many of the configuration parameters that control the link checking process.
curl --location --request POST 'https://api.apilayer.com/404_watch/job' \
  --header 'Content-Type: application/json' \
  --header 'apikey: YOUR API KEY' \
  --data-raw '{
    "url": "https://p1.rs",
    "levels": 2,
    "fetch_external": false,
    "check_images": false,
    "check_css": true,
    "callback": "https://mydomain.com/callback",
    "callback_secret": "supersecret_key",
    "check_js": true,
    "whitelisted_domains_list": [
      "apilayer.com"
    ]
  }'
If the request succeeds, you'll get a response like the following:
{
  "id": "c3f7b23e-a239-4af4-b9ec-698a3a6d0a21"
}
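For reference, here is a rough Python equivalent of the curl request above, using only urllib from the standard library. The function names build_create_job_request and create_job are hypothetical. The request is built separately from sending it so you can inspect it first; create_job reads the id field from the response shown above.

```python
import json
import urllib.request

API_BASE = "https://api.apilayer.com/404_watch"


def build_create_job_request(api_key: str, options: dict) -> urllib.request.Request:
    """Prepare the POST /job request from the curl example (not yet sent)."""
    return urllib.request.Request(
        f"{API_BASE}/job",
        data=json.dumps(options).encode("utf-8"),
        headers={"Content-Type": "application/json", "apikey": api_key},
        method="POST",
    )


def create_job(api_key: str, options: dict) -> str:
    """Send the request and return the job id from the JSON response."""
    with urllib.request.urlopen(build_create_job_request(api_key, options), timeout=30) as resp:
        return json.load(resp)["id"]


# Same options as the curl example.
options = {
    "url": "https://p1.rs",
    "levels": 2,
    "fetch_external": False,
    "check_images": False,
    "check_css": True,
    "callback": "https://mydomain.com/callback",
    "callback_secret": "supersecret_key",
    "check_js": True,
    "whitelisted_domains_list": ["apilayer.com"],
}
# job_id = create_job("YOUR API KEY", options)
```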
You can use this id to query the job's progress and results via the GET /job/{id} endpoint:
curl --location --request GET 'https://api.apilayer.com/404_watch/job/c3f7b23e-a239-4af4-b9ec-698a3a6d0a21' \
  --header 'apikey: YOUR API KEY'
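The response contains details about the job's progress and results, including how many links returned each HTTP status code and content type. Note that this sample comes from a job that checked https://apilayer.com, so its url and options differ from the request above: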
{
  "id": "c3f7b23e-a239-4af4-b9ec-698a3a6d0a21",
  "created_at": 1609681539,
  "status": "finished",
  "url": "https://apilayer.com",
  "progress": {
    "discovered": 117,
    "checked": 117,
    "percentage": 100.0
  },
  "status_codes": {
    "503": 4,
    "200": 110
  },
  "content_types": {
    "image/svg+xml": 10,
    "image/png": 25,
    "text/css": 6,
    "text/html": 60,
    "image/jpeg": 5,
    "application/javascript": 8,
    "application/x-javascript": 1
  },
  "options": {
    "callback_secret": null,
    "check_css": true,
    "max_levels": 3,
    "check_js": true,
    "max_links": 1000,
    "excluded_domains_list": [],
    "fetch_nofollow": false,
    "excluded_urls_list": [],
    "fetch_external": true,
    "whitelisted_domains_list": "assets.apilayer.com",
    "omit_query_params": false,
    "callback": null,
    "omit_hash_params": true,
    "check_images": true
  }
}
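The status_codes object is usually the quickest way to tell whether a site has problems. Below is a small hypothetical Python helper that condenses a GET /job/{id} response into a few numbers. Treating every status of 400 or above as an error, and 200 to 399 as reachable, is a choice made for this sketch, not something the API defines.

```python
def summarize_job(job: dict) -> dict:
    """Condense a GET /job/{id} response into a few numbers worth alerting on."""
    # status_codes maps HTTP status codes (as strings) to how many links returned them.
    codes = {int(code): count for code, count in job.get("status_codes", {}).items()}
    return {
        "status": job.get("status"),
        "percentage": job.get("progress", {}).get("percentage", 0.0),
        "reachable": sum(n for code, n in codes.items() if 200 <= code < 400),
        "errors": sum(n for code, n in codes.items() if code >= 400),
        "error_codes": sorted(code for code in codes if code >= 400),
    }
```

For the sample response above, this reports 110 reachable links and 4 errors, all with status 503.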
Date fields such as created_at are Unix timestamps in seconds.
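To display these timestamps, convert them to timezone-aware datetimes. A minimal Python sketch (to_datetime is a hypothetical helper name), assuming the values are seconds since the Unix epoch:

```python
from datetime import datetime, timezone


def to_datetime(timestamp: int) -> datetime:
    """Convert a Unix timestamp in seconds (e.g. created_at) to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


print(to_datetime(1609681539).isoformat())  # 2021-01-03T13:45:39+00:00
```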
Getting the details for each checked link
To get all the links that were discovered and checked, use the GET /job/{id}/links endpoint, as in the following example:
curl --location --request GET 'https://api.apilayer.com/404_watch/job/c3f7b23e-a239-4af4-b9ec-698a3a6d0a21/links' \
  --header 'apikey: YOUR API KEY'
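The links response is paginated: it echoes a query object with limit, offset, page, and total_count. Below is a Python sketch that collects every page. It assumes, without confirmation from this page, that the endpoint accepts limit and offset as query-string parameters; verify that against the API reference before relying on it. fetch_links_page and fetch_all_links are hypothetical names, and fetch_page is a hook so the loop can be tried offline.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apilayer.com/404_watch"


def fetch_links_page(job_id: str, api_key: str, limit: int, offset: int) -> dict:
    """Fetch one page of GET /job/{id}/links."""
    # Assumption: limit/offset are accepted as query-string parameters.
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    req = urllib.request.Request(
        f"{API_BASE}/job/{job_id}/links?{query}", headers={"apikey": api_key}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_all_links(job_id, api_key, fetch_page=fetch_links_page, limit=100):
    """Collect every link of a job by walking pages until total_count is reached."""
    links, offset = [], 0
    while True:
        page = fetch_page(job_id, api_key, limit, offset)
        batch = page.get("links", [])
        links.extend(batch)
        total = page.get("query", {}).get("total_count", len(links))
        # Stop on an empty page too, so a wrong total_count can't loop forever.
        if not batch or len(links) >= total:
            return links
        offset += len(batch)
```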
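The response lists the checked links along with each link's content type, HTTP status code, timeout flag, and fetch time. Results are paginated: the query object reports the limit, offset, page, and total_count. You can filter and use the results however you need.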
{
  "job_id": "d0de484e-c18f-4ee8-b84e-4ba63907e283",
  "status": "finished",
  "created_at": 1609681539,
  "links": [
    {
      "url": "https://apilayer.com",
      "content_type": "text/html",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609681563
    },
    {
      "url": "https://assets.apilayer.com/apis/image_similarity.png",
      "content_type": "image/png",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609681578
    },
    {
      "url": "https://apilayer.com/marketplace/description/textgears-api",
      "content_type": "text/html",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609681592
    },
    {
      "url": "https://apilayer.com/marketplace/category/text-processing-apis",
      "content_type": "text/html",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609746185
    },
    {
      "url": "https://js.hs-scripts.com/7564526.js",
      "content_type": "application/javascript",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609746223
    },
    {
      "url": "https://apilayer.com/marketplace/tag/spelling",
      "content_type": "text/html",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609746233
    },
    {
      "url": "https://apilayer.com/marketplace/tag/text-tools",
      "content_type": "text/html",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609746256
    },
    {
      "url": "https://textgears.com/assets/img/logos/apple/120.png",
      "content_type": "image/png",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609746270
    },
    {
      "url": "https://apilayer.com/assets/css/documentation.css?6",
      "content_type": "text/css",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609746374
    },
    {
      "url": "https://apilayer.com/assets/js/marketplace/marketplace.js?52",
      "content_type": "application/javascript",
      "is_timeout": false,
      "http_status": 200,
      "fetched_at": 1609746577
    }
  ],
  "query": {
    "limit": 10,
    "offset": 0,
    "page": 0,
    "total_count": 117
  }
}
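A common next step is to keep only the problem links. Below is a hypothetical Python filter over the links array. Treating a link as broken when it timed out or returned a status of 400 or above is a choice made for this sketch. It also handles a missing or null http_status defensively, since this page doesn't show what timed-out links report.

```python
def broken_links(links: list[dict]) -> list[dict]:
    """Return links that timed out or answered with an HTTP status of 400 or above."""
    return [
        link
        for link in links
        # Assumption: http_status may be missing or null for timed-out links.
        if link.get("is_timeout") or (link.get("http_status") or 0) >= 400
    ]


def format_report(links: list[dict]) -> str:
    """One line per broken link: the status (or "timeout") followed by the URL."""
    rows = []
    for link in broken_links(links):
        status = "timeout" if link.get("is_timeout") else str(link.get("http_status"))
        rows.append(f"{status}\t{link['url']}")
    return "\n".join(rows)
```

You might print format_report(page["links"]) for each page you fetch, or combine it with a full-page fetch to report on the whole site.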