Scraper API (Lite Edition)

Scrape websites while bypassing rate limits. Simulate an originating IP from any country. Fast and simple.
Free Plan: $0.00 Monthly
  • 1,500 Requests / Monthly
  • Free for Lifetime
  • No Credit Card Required

Starter Plan (Most Popular): $8.99 Monthly
  • 150,000 Requests / Monthly
  • Standard Support
  • Ability to fetch any website
  • Ability to bypass rate limits
  • Supports both GET and POST requests
  • Ability to choose originating country
  • Ability to set Cookies
  • Supports HTTP Authentication

Pro Plan: $39.99 Monthly
  • 1,500,000 Requests / Monthly
  • Standard Support
  • Ability to fetch any website
  • Ability to bypass rate limits
  • Supports both GET and POST requests
  • Ability to choose originating country
  • Ability to set Cookies
  • Supports HTTP Authentication

Custom Plan: Volume pricing (Contact Us)
  • Any request volume you need
  • Ability to fetch any website
  • Ability to bypass rate limits
  • Supports both GET and POST requests
  • Ability to choose originating country
  • Ability to set Cookies
  • Supports HTTP Authentication

Scraper API helps you scrape any website while bypassing rate limits. It can simulate an originating IP from any country. Fast and simple.

Web scraping involves complex routines like simulating a browser's behavior and dealing with the destination site's rate limits. Our Scraper API simulates many aspects of a desktop browser's behavior by giving you the ability to:

  • Set the referrer and user-agent,
  • Select the originating country from which we fire the HTTP request,
  • Add cookies to the request header,
  • Set any HTTP header (even non-standard ones),
  • Set the HTTP Auth username and password.

APILayer's Scraper API is one of the most versatile and affordable ways to collect any data from the web.

Important: This API does not simulate a browser or render results, so it cannot scrape many client-heavy pages or bypass some bot-prevention tools. Check out our Advanced Scraper API, built specifically for this purpose. This API is intended as a cost-efficient alternative for basic scraping needs.

How to use CSS Selectors?

When a remote web page is fetched, the whole HTML is returned as a string by default. If you want the API to parse the HTML and return only a specific portion of the data, set the selector parameter; the API will parse the HTML and return just the desired info. See the following example:


curl --location \
--request GET 'https://api.apilayer.com/scraper?url=apilayer.com&selector=%23logoAndNav%20a.navbar-brand' \
--header 'apikey: API KEY'

Note that the selector parameter is URL-encoded, since some characters in CSS selectors (such as #) would otherwise be interpreted as part of the URL's own syntax. The result of this query is below. Note that the whole apilayer.com homepage is not returned; instead, only the A tag containing the logo appears in the response, thanks to the #logoAndNav a.navbar-brand CSS selector.


{
    "data-selector": [
        "<a class=\"navbar-brand\" href=\"/index\">\n <img src=\"https://.../assets/logo/logo.png\"/>\n</a>\n"
    ],
    "headers": {
        "Date": "Sun, 06 Sep 2020 09:48:32 GMT",
        "Content-Type": "text/html; charset=utf-8"
    },
    "url": "http://apilayer.com",
    "selector": "#logoAndNav a.navbar-brand"
}
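As noted above, the selector has to be URL-encoded before it goes into the query string. A minimal sketch in Python (standard library only; not an official client) reproduces the encoding used in the curl example:

```python
from urllib.parse import quote

# Percent-encode a CSS selector so characters like '#' and spaces
# do not clash with the URL's own syntax.
selector = "#logoAndNav a.navbar-brand"
encoded = quote(selector)
print(encoded)  # -> %23logoAndNav%20a.navbar-brand
```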

Image files scraping

The Scraper API is capable of fetching image files and returning them to you. Just point the url parameter at an image file and see for yourself. This is one of the most powerful features of this API, and quite rare among our competitors.

It can also download JSON files, TXT files, and other text formats. However, it doesn't support application/octet-stream or any other binary formats, because of security and scalability concerns.

Rotating IP addresses

We use anonymizer proxy servers, as well as our own infrastructure, to change the IP address and the HTTP request header information each time you make a new request. We route your requests through more than 1 million "data center" IP addresses in over 100 countries.

There are many reasons why you need this API for web scraping:

  • It helps you overcome IP fingerprinting and rate-limiting problems
  • It saves your original IP from being banned due to a high volume of requests
  • Setting the originating country lets you see geography-specific content

Setting Custom HTTP Headers

You may wish to set custom HTTP headers on your request, and our Scraper API lets you do so. You can set any header by prefixing its name with X-; the API removes the X- prefix and passes the header to the remote site. For example, to set a custom User-Agent, Referer, and Content-Type, see the following example (if nothing is set, we auto-generate these headers):


curl --location --request GET 'https://api.apilayer.com/scraper?url=apilayer.com' \
--header 'X-Content-Type: application/json' \
--header 'X-User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0' \
--header 'X-Referer: https://www.google.com' \
--header 'apikey: YOUR_APILAYER_API_KEY'

As you can see, the Content-Type, User-Agent, and Referer headers are prefixed with an X- string. You can set any header you wish, such as Cookie, Api Key, Languages, or anything else you need. Take a look at the full list of HTTP headers on the Mozilla site.
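The prefixing convention can be sketched in Python; the helper name below is our own invention, not part of the API:

```python
def to_scraper_headers(headers, apikey):
    """Prefix each header meant for the remote site with 'X-'
    (the Scraper API strips the prefix before forwarding) and
    attach the required apikey header."""
    out = {"X-" + name: value for name, value in headers.items()}
    out["apikey"] = apikey
    return out

wanted = {"User-Agent": "Mozilla/5.0", "Referer": "https://www.google.com"}
print(to_scraper_headers(wanted, "YOUR_APILAYER_API_KEY"))
```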

Can I scrape Google Search Results with it?

This API does not launch a headless browser in memory, so it is not suitable for scraping Google Search results. That is a much more complicated task than simply requesting HTML content and parsing it. Check out this blog post on doing it yourself, or simply check out the Google Search Results API that was built just for this purpose.

The Google Search Results API fetches and parses the results in JSON format, from any country, with no hassle, and it even fetches the ads!

You may also wish to check out our Advanced Scraper API, which can simulate a real browser, render web pages (Angular, React, and Vue are supported), execute any JS code, and return the results.

For all other programming languages see the documentation tab for more information.

Privacy

Except for obvious legal purposes, we do not store any information on our servers. We work purely as a proxy: we never inject any code into the response, nor interfere with your data in any way. We take the scraping request, fetch the data from the remote server, and pass it back to you in JSON format. That's it!

Common areas where web scraping is used:

Every individual or business owner has their own reasons for using the gathered data, but below are the common areas where web scraping plays an important role:

  • Price monitoring and comparison websites
  • E-commerce: competitor monitoring, market analysis
  • Collecting stock market data
  • Real estate listings
  • Machine learning: supplying a wide variety of data to train and test your models
  • Brand protection
  • Market research
  • Lead generation

Is web scraping legal?

Web scraping is in a legal gray area, and much depends on whether you have the right intentions for using the scraped data. Most of the data on websites is open for public consumption and can be copied by other parties unless otherwise stated in the website's copyright and terms-of-use statements. If the website you are trying to scrape uses protection measures (blocking IPs, captchas, and so on) to prevent scraping, you should respect that. Being ethical is important.

What is web scraping?

In the simplest terms, web scraping is the process of extracting information from websites. With the help of a web scraper, you can retrieve high volumes of data from the web and transform it into structured data that can later be accessed for further analysis.

Scraper API (Lite Edition) Reference

This API is organized around REST. Our API has predictable resource-oriented URLs, accepts form-encoded request bodies, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs.

Just Getting Started?

Check out our development quickstart guide.

Authentication

Scraper API (Lite Edition) uses API keys to authenticate requests. You can view and manage your API keys in the Accounts page.

Your API keys carry many privileges, so be sure to keep them secure! Do not share your secret API keys in publicly accessible areas such as GitHub, client-side code, and so forth.

All requests made to the API must include a custom HTTP header named "apikey". The implementation differs with each programming language. Below are some samples.

All API requests must be made over HTTPS. Calls made over plain HTTP will fail. API requests without authentication will also fail.
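For instance, a minimal Python sketch (standard library only, with a placeholder key) that attaches the apikey header over HTTPS:

```python
import urllib.request

# Every call must go over HTTPS and carry the "apikey" header.
url = "https://api.apilayer.com/scraper?url=apilayer.com"
req = urllib.request.Request(url, headers={"apikey": "YOUR_APILAYER_API_KEY"})

# Uncomment to actually perform the call:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```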

Endpoints

Scraper Http Get

Parameters

url (required)

URL to scrape

Location: Query, Data Type: string

auth_password (optional)

Optional: HTTP Realm auth password

Location: Query, Data Type: string

auth_username (optional)

Optional: HTTP Realm auth username

Location: Query, Data Type: string

country (optional)

Optional: 2 character country code. If you wish to scrape from an IP address of a specific country, set it here.

Location: Query, Data Type: string

selector (optional)

Optional: CSS selector (URLEncoded) Ex: a.navbar-brand

Location: Query, Data Type: string

** A word enclosed in curly brackets "{ }" in the code is a parameter; replace it, including the brackets, with your own value when executing.
Returns

Below is a sample response from the endpoint


If you wish to play around interactively with real values and run code, see...
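A minimal sketch of assembling this endpoint's query string in Python (standard library only; parameter names are taken from the list above):

```python
from urllib.parse import urlencode

# Assemble the query string; urlencode percent-escapes the
# selector (and any other value) automatically.
params = {
    "url": "apilayer.com",
    "selector": "#logoAndNav a.navbar-brand",
    "country": "us",
}
endpoint = "https://api.apilayer.com/scraper?" + urlencode(params)
print(endpoint)
```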

Scraper Http Post

Parameters

url (required)

URL to scrape

Location: Query, Data Type: string

auth_password (optional)

Optional: HTTP Realm auth password

Location: Query, Data Type: string

auth_username (optional)

Optional: HTTP Realm auth username

Location: Query, Data Type: string

body (optional)

Optional: HTTP Body as a Dictionary Ex: {"name": "test", "email": "[email protected]"}

Location: Body, Data Type: string

country (optional)

Optional: 2 character country code. If you wish to scrape from an IP address of a specific country, set it here.

Location: Query, Data Type: string

selector (optional)

Optional: CSS selector (URLEncoded) Ex: a.navbar-brand

Location: Query, Data Type: string

** A word enclosed in curly brackets "{ }" in the code is a parameter; replace it, including the brackets, with your own value when executing.
Returns

Below is a sample response from the endpoint


If you wish to play around interactively with real values and run code, see...
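A minimal Python sketch of a POST call with a body dictionary (standard library only; the body values are placeholders of our own):

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder body values for illustration only.
body = {"name": "test", "email": "[email protected]"}

req = urllib.request.Request(
    "https://api.apilayer.com/scraper?" + urlencode({"url": "apilayer.com"}),
    data=json.dumps(body).encode("utf-8"),
    headers={"apikey": "YOUR_APILAYER_API_KEY"},
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```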

Rate Limiting

Each subscription has its own rate limit. When you become a member, you start by choosing a rate limit that suits your usage needs. Don't worry: you can upgrade or downgrade your plan at any time. For this reason, instead of starting with a larger plan that you do not need, we suggest you start with the "free" or "gold plan" options and upgrade after you begin using the API.

When you reach a rate limit (daily or monthly), the service stops responding and returns the HTTP 429 response status code (Too Many Requests) for each request, with the following JSON body:

{
    "message": "You have exceeded your daily\/monthly API rate limit. Please review and upgrade your subscription plan at https:\/\/apilayer.com\/subscriptions to continue."
}

A reminder email will be sent to you when your API usage reaches 80% and again at 90%, so that you can take immediate action, such as upgrading your plan, to prevent the application using the API from being interrupted.

You can also check your rate limit programmatically. Every response from APILayer includes the following four HTTP headers with all the necessary information:

x-ratelimit-limit-month: Request limit per month
x-ratelimit-remaining-month: Request limit remaining this month
x-ratelimit-limit-day: Request limit per day
x-ratelimit-remaining-day: Request limit remaining today

You can contact our support team if you need any assistance handling the returned results using this header information.
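As a sketch, a client can read these headers from each response and decide whether to slow down (the back-off threshold below is our own choice, not an API requirement):

```python
def remaining_quota(headers):
    """Extract the remaining daily and monthly request counts
    from the rate-limit response headers."""
    return {
        "day": int(headers.get("x-ratelimit-remaining-day", 0)),
        "month": int(headers.get("x-ratelimit-remaining-month", 0)),
    }

def should_back_off(headers, threshold=10):
    """True when either remaining quota drops below the threshold."""
    quota = remaining_quota(headers)
    return quota["day"] < threshold or quota["month"] < threshold

sample = {
    "x-ratelimit-limit-day": "5000",
    "x-ratelimit-remaining-day": "4",
    "x-ratelimit-limit-month": "150000",
    "x-ratelimit-remaining-month": "121000",
}
print(should_back_off(sample))  # -> True (only 4 requests left today)
```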

Error Codes

APILayer uses standard HTTP response codes to indicate the success or failure of an API request. In general: codes in the 2xx range indicate success. Codes in the 4xx range indicate a client-side error, meaning the request failed given the information provided (e.g., a missing parameter or unauthorized access). Codes in the 5xx range indicate an error with APILayer's servers (normally this shouldn't happen at all).

If the response code is not 200, the operation failed and you may need to take action accordingly. Check the response (in JSON format) for a field called 'message' that briefly explains the reported error.

Status Code: Explanation
400 - Bad Request: The request was unacceptable, often due to a missing required parameter.
401 - Unauthorized: No valid API key provided.
404 - Not Found: The requested resource doesn't exist.
429 - Too Many Requests: API request limit exceeded. See the Rate Limiting section for more info.
5xx - Server Error: We failed to process your request. (You can contact us anytime.)
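A minimal sketch of turning a non-200 response into a readable error using the 'message' field described above (the helper is our own, not part of any official client):

```python
import json

def describe_error(status, body):
    """Turn a non-200 response into a readable message using the
    'message' field the API includes in its JSON error body."""
    try:
        payload = json.loads(body or "{}")
    except ValueError:
        payload = {}
    return f"HTTP {status}: {payload.get('message', 'unknown error')}"

print(describe_error(429, '{"message": "API rate limit exceeded"}'))
```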

You can always contact support and ask for more assistance. We'll be glad to help you build your product.