Broken links are not only bad for your users, they also damage your site's search engine visibility, as Google discourages broken links and downgrades your SEO reputation accordingly. You should avoid linking to broken content and avoid having pages on your own site that do not work.

404 Watch API is one of the most powerful link checkers on the market, with the advanced features listed below.

Can optionally respect nofollow attributes on links
Can check external links
Can optionally discard query parameters
Can optionally discard hash parameters
Can check image, JS and CSS files for broken links
Can whitelist domains and exclude domains from checking
Can trigger a callback URL when the link checking is done, or you may poll the API for the results instead

Whitelisting domains

You may have multiple domains that are served from a single site and need to be treated as internal resources. Add these domains to the whitelisted domains (whitelisted_domains_list) when creating the link checker job and you're done.

Excluding domains and URLs from checking

You may wish to exclude specific domains or URLs from link checking. Add the domain names to the excluded_domains_list variable, or the URLs to the excluded_urls_list variable, when creating the link checker job.
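
As a minimal sketch of how these lists fit into the job payload, the Python snippet below builds the JSON body for POST /job. The field names whitelisted_domains_list, excluded_domains_list and excluded_urls_list are taken from the sample request and response on this page. The helper name build_job_payload and the example.org and /logout values are hypothetical, and the snippet assumes the API treats an omitted list as empty.

```python
import json


def build_job_payload(url, whitelisted=(), excluded_domains=(), excluded_urls=()):
    """Build the JSON body for POST /job (hypothetical helper, not part of the API)."""
    payload = {"url": url}
    # Only include the lists that are populated; this assumes the API
    # treats a missing list the same as an empty one.
    if whitelisted:
        payload["whitelisted_domains_list"] = list(whitelisted)
    if excluded_domains:
        payload["excluded_domains_list"] = list(excluded_domains)
    if excluded_urls:
        payload["excluded_urls_list"] = list(excluded_urls)
    return payload


payload = build_job_payload(
    "https://apilayer.com",
    whitelisted=["assets.apilayer.com"],
    excluded_domains=["example.org"],                # illustrative domain
    excluded_urls=["https://apilayer.com/logout"],   # illustrative URL
)
print(json.dumps(payload, indent=2))
```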

Callback when finished

Link checking is a time-consuming job, as you may have hundreds of URLs (and assets) on your site, so the API takes an asynchronous approach. You create a link checking job using the POST /job endpoint and get an ID in response. You can then poll the GET /job/{id} endpoint to watch the progress of a running job or to fetch the results of a finished one.
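
The polling approach can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not an official client: wait_for_job and its fetch_status argument are hypothetical names, the "finished" status value comes from the sample response on this page, and "running" is only an assumed placeholder for any status that is not finished. The HTTP call is injected so the loop can be tried without network access.

```python
import time


def wait_for_job(fetch_status, job_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the job reports status "finished" or the attempts run out.

    fetch_status is any callable that takes a job id and returns the parsed
    JSON body of GET /job/{id} as a dict (hypothetical; plug in your HTTP client).
    """
    for _ in range(max_attempts):
        job = fetch_status(job_id)
        if job.get("status") == "finished":
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in fetcher that reports "finished" on the third poll.
responses = iter([
    {"status": "running"},  # assumed name for a not-yet-finished status
    {"status": "running"},
    {"status": "finished", "progress": {"percentage": 100.0}},
])
result = wait_for_job(
    lambda _job_id: next(responses),
    "c3f7b23e-a239-4af4-b9ec-698a3a6d0a21",
    interval=0,
    sleep=lambda _seconds: None,  # skip real sleeping in this demo
)
print(result["status"])
```

Capping the number of attempts keeps a stuck job from blocking the caller forever; pick an interval that suits the size of your site.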

Optionally, you may also provide a callback URL when creating the link checker job via the POST /job endpoint. If you do, you don't need to poll the GET /job/{id} endpoint for results, as the API will call the provided callback URL (via HTTP POST) automatically when the process has finished.

Optionally, you may also provide a callback_secret variable when creating the link checker job. Its value is sent with the callback request in the X-Callback-Secret HTTP header. You may check this header for authentication purposes.
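
On the receiving side, the header check might look like the Python sketch below. The function name is_trusted_callback is hypothetical, and the snippet assumes your web framework exposes the request headers as a dict-like mapping of strings. It uses hmac.compare_digest from the standard library for a constant-time comparison.

```python
import hmac


def is_trusted_callback(headers, expected_secret):
    """Return True if the X-Callback-Secret header matches the secret we sent."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    received = normalised.get("x-callback-secret", "")
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(received.encode(), expected_secret.encode())


print(is_trusted_callback({"X-Callback-Secret": "supersecret_key"}, "supersecret_key"))
print(is_trusted_callback({"X-Callback-Secret": "wrong"}, "supersecret_key"))
```

The first call prints True and the second prints False; a callback without the header is rejected as well.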

Sample Request for creating a new link checker job

Below is a sample request to the POST /job endpoint. It contains many of the config parameters for tuning the link checking process.

curl --location --request POST 'https://api.apilayer.com/404_watch/job' \
--header 'Content-Type: application/json' \
--header 'apikey: YOUR API KEY' \
--data-raw '{
  "url": "https://p1.rs",
  "levels": 2,
  "fetch_external": false,
  "check_images": false,
  "check_css": true,
  "callback": "https://mydomain.com/callback",
  "callback_secret": "supersecret_key",
  "check_js": true,
  "whitelisted_domains_list": [
      "apilayer.com"
  ]
}'

The call returns a response like the one below:

{
    "id": "c3f7b23e-a239-4af4-b9ec-698a3a6d0a21"
}
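
For readers who prefer Python over curl, the same request could be prepared along these lines using only the standard library. This is a sketch rather than an official client: build_create_job_request and API_BASE are hypothetical names, while the URL, headers and payload fields mirror the curl sample above. The request is built but not sent, so the snippet runs without a valid key.

```python
import json
import urllib.request

API_BASE = "https://api.apilayer.com/404_watch"


def build_create_job_request(api_key, payload):
    """Prepare (but do not send) the POST /job request."""
    return urllib.request.Request(
        f"{API_BASE}/job",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "apikey": api_key},
        method="POST",
    )


request = build_create_job_request(
    "YOUR API KEY",
    {"url": "https://p1.rs", "levels": 2, "check_css": True, "check_js": True},
)
print(request.get_method(), request.full_url)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     job_id = json.load(response)["id"]
```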

You can use this ID to query the results via the GET /job/{id} endpoint.

curl --location --request GET 'https://api.apilayer.com/404_watch/job/c3f7b23e-a239-4af4-b9ec-698a3a6d0a21' \
--header 'apikey: YOUR API KEY'

The response contains comprehensive details about the ongoing process and the results. See below:

{
    "id": "c3f7b23e-a239-4af4-b9ec-698a3a6d0a21",
    "created_at": 1609681539,
    "status": "finished",
    "url": "https://apilayer.com",
    "progress": {
        "discovered": 117,
        "checked": 117,
        "percentage": 100.0
    },
    "status_codes": {
        "503": 4,
        "200": 110
    },
    "content_types": {
        "image/svg+xml": 10,
        "image/png": 25,
        "text/css": 6,
        "text/html": 60,
        "image/jpeg": 5,
        "application/javascript": 8,
        "application/x-javascript": 1
    },
    "options": {
        "callback_secret": null,
        "check_css": true,
        "max_levels": 3,
        "check_js": true,
        "max_links": 1000,
        "excluded_domains_list": [],
        "fetch_nofollow": false,
        "excluded_urls_list": [],
        "fetch_external": true,
        "whitelisted_domains_list": "assets.apilayer.com",
        "omit_query_params": false,
        "callback": null,
        "omit_hash_params": true,
        "check_images": true
    }
}

Date variables above (created_at) are Unix timestamps, in seconds.
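
As a rough illustration of reading this response, the Python sketch below converts created_at to a UTC date and totals the failing status codes. summarise_job is a hypothetical helper, and treating every code of 400 or above as a failure is an assumption of this example, not a rule stated by the API. The input is trimmed from the sample response above.

```python
from datetime import datetime, timezone


def summarise_job(job):
    """Summarise a GET /job/{id} response (hypothetical helper)."""
    created = datetime.fromtimestamp(job["created_at"], tz=timezone.utc)
    # status_codes maps an HTTP status (as a string key) to a count.
    # Assumption: codes >= 400 are the ones worth reporting.
    failing = {
        code: count
        for code, count in job["status_codes"].items()
        if code.isdigit() and int(code) >= 400
    }
    return {
        "created_at": created.isoformat(),
        "finished": job["status"] == "finished",
        "failing": failing,
        "failing_total": sum(failing.values()),
    }


job = {
    "created_at": 1609681539,
    "status": "finished",
    "status_codes": {"503": 4, "200": 110},
}
print(summarise_job(job))
```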

Getting the details for each link that is checked

If you wish to get all the links that were discovered and checked, use the GET /job/{id}/links endpoint. See the following example.

curl --location --request GET 'https://api.apilayer.com/404_watch/job/c3f7b23e-a239-4af4-b9ec-698a3a6d0a21/links' \
--header 'apikey: YOUR API KEY'

The response contains all the links, along with the content type and HTTP status code of each. You may filter and use it the way you desire.

{
    "job_id": "d0de484e-c18f-4ee8-b84e-4ba63907e283",
    "status": "finished",
    "created_at": 1609681539,
    "links": [
        {
            "url": "https://apilayer.com",
            "content_type": "text/html",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609681563
        },
        {
            "url": "https://assets.apilayer.com/apis/image_similarity.png",
            "content_type": "image/png",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609681578
        },
        {
            "url": "https://apilayer.com/marketplace/description/textgears-api",
            "content_type": "text/html",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609681592
        },
        {
            "url": "https://apilayer.com/marketplace/category/text-processing-apis",
            "content_type": "text/html",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609746185
        },
        {
            "url": "https://js.hs-scripts.com/7564526.js",
            "content_type": "application/javascript",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609746223
        },
        {
            "url": "https://apilayer.com/marketplace/tag/spelling",
            "content_type": "text/html",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609746233
        },
        {
            "url": "https://apilayer.com/marketplace/tag/text-tools",
            "content_type": "text/html",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609746256
        },
        {
            "url": "https://textgears.com/assets/img/logos/apple/120.png",
            "content_type": "image/png",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609746270
        },
        {
            "url": "https://apilayer.com/assets/css/documentation.css?6",
            "content_type": "text/css",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609746374
        },
        {
            "url": "https://apilayer.com/assets/js/marketplace/marketplace.js?52",
            "content_type": "application/javascript",
            "is_timeout": false,
            "http_status": 200,
            "fetched_at": 1609746577
        }
    ],
    "query": {
        "limit": 10,
        "offset": 0,
        "page": 0,
        "total_count": 117
    }
}
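
One possible way to filter such a page is sketched below in Python. broken_links and remaining_offsets are hypothetical helpers. Counting timeouts and HTTP codes of 400 or above as broken is an assumption of this example, and the 404 entry is illustrative rather than taken from a real response. The "query" block suggests the results are paginated; how limit and offset are passed to the endpoint is not described on this page, so the sketch only computes which offsets remain.

```python
def broken_links(page):
    """Return URLs on one /links page that timed out or returned HTTP >= 400."""
    broken = []
    for link in page["links"]:
        status = link.get("http_status")
        if link.get("is_timeout") or (status is not None and status >= 400):
            broken.append(link["url"])
    return broken


def remaining_offsets(query):
    """Offsets of the pages still to fetch, based on the "query" block."""
    start = query["offset"] + query["limit"]
    return list(range(start, query["total_count"], query["limit"]))


page = {
    "links": [
        {"url": "https://apilayer.com", "is_timeout": False, "http_status": 200},
        # Illustrative failing entry, not taken from a real response.
        {"url": "https://apilayer.com/missing", "is_timeout": False, "http_status": 404},
    ],
    "query": {"limit": 10, "offset": 0, "page": 0, "total_count": 117},
}
print(broken_links(page))
print(remaining_offsets(page["query"]))
```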
