Download Scraper API software

Scraper API is designed to perform web scraping tasks. Here are a few things to consider before getting started:

In Scraper API, each request is retried until it completes successfully, for up to 60 seconds. Set your client timeout to 60 seconds so that this retry window can run its full course. If every attempt fails within those 60 seconds, the API returns a 500 error; you can retry the request, and you are not charged for the failed one (only successful requests, meaning 200 and 404 status codes, are charged). Make sure to catch these errors! They occur on roughly 1-2% of requests for difficult-to-scrape websites. With Scraper API you can also scrape images, PDFs, or other files just like any other URL, but remember that there is a 2MB limit per request.
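As a rough illustration of the timeout and retry advice above, here is a minimal Python sketch using only the standard library. It relies on the api_key and url query parameters of the endpoint; the function names, the injectable fetch argument, and the limit of three attempts are assumptions made for this example, not part of Scraper API itself.

```python
import urllib.error
import urllib.parse
import urllib.request

API_ENDPOINT = "http://api.scraperapi.com"
TIMEOUT_SECONDS = 60  # matches the 60-second retry window described above


def fetch_once(api_key, target_url):
    """Send one GET request and return the response body as bytes."""
    query = urllib.parse.urlencode({"api_key": api_key, "url": target_url})
    request_url = f"{API_ENDPOINT}?{query}"
    with urllib.request.urlopen(request_url, timeout=TIMEOUT_SECONDS) as resp:
        return resp.read()


def fetch_with_retry(api_key, target_url, max_attempts=3, fetch=fetch_once):
    """Retry only on HTTP 500; any other HTTP error is re-raised at once.

    max_attempts=3 is an arbitrary choice for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(api_key, target_url)
        except urllib.error.HTTPError as err:
            # Give up on non-500 errors or when the attempts are used up.
            if err.code != 500 or attempt == max_attempts:
                raise
```

A call such as fetch_with_retry("YOUR_API_KEY", "http://example.com") would then make up to three attempts before the 500 error reaches your own error handling.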

If you exceed your plan's concurrent connection limit, the API responds with a 429 status code. Slowing down your request rate resolves this.
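One common way to slow down after a 429 is exponential backoff. The sketch below is only an illustration: the helper names, the one-second base delay, and the 30-second cap are assumptions, and send stands for whatever function you use to issue the request.

```python
import time


def backoff_delays(retries, base=1.0, cap=30.0):
    """Return exponentially growing wait times in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def send_with_backoff(send, retries=4, sleep=time.sleep):
    """Call `send()` until it stops returning 429 or the retries run out.

    `send` is any zero-argument callable returning (status_code, body).
    """
    status, body = send()
    for delay in backoff_delays(retries):
        if status != 429:
            break
        sleep(delay)  # wait longer after each consecutive 429
        status, body = send()
    return status, body
```

Lowering the number of parallel workers has the same effect; backoff is simply the easiest approach to retrofit onto an existing request loop.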

No overage is allowed on the free plan: if you exceed 1000 requests per month, the API returns a 403 error.
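The status codes mentioned so far can be gathered into one small lookup. This is a sketch of how a client might react to each code; the function name and the action labels are invented for the example.

```python
def action_for_status(status_code):
    """Map a Scraper API status code to a suggested client action."""
    if status_code in (200, 404):
        return "done"  # counted as a successful, charged request
    if status_code == 500:
        return "retry"  # all attempts failed within 60 seconds; not charged
    if status_code == 429:
        return "slow_down"  # concurrent connection limit exceeded
    if status_code == 403:
        return "quota_exceeded"  # free-plan monthly request limit reached
    return "unexpected"  # not covered by the description above
```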

Each request returns a string containing the raw HTML of the requested page, along with its headers and cookies.

Scraper API exposes a single API endpoint. Simply send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you want to scrape.
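The request described above can be assembled with Python's standard library. The helper name build_request_url and the placeholder key are assumptions for this sketch; note that the target URL has to be percent-encoded, which urlencode takes care of.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://api.scraperapi.com"


def build_request_url(api_key, target_url):
    """Return the full GET URL for scraping `target_url`."""
    # urlencode percent-encodes the target URL so it survives as one parameter.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


request_url = build_request_url("YOUR_API_KEY", "http://httpbin.org/ip")
```

Passing request_url to urllib.request.urlopen with a 60-second timeout would then send the actual request.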

If you are crawling a page that needs its JavaScript rendered, it can be fetched with a headless browser. This feature is available only on the Business and Enterprise plans. To render JavaScript, simply set render=true and Scraper API will use a headless Google Chrome instance to fetch the page.
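A request with rendering enabled might be built as in this sketch; the function name and the placeholder values are assumptions, while render=true is the flag described above.

```python
from urllib.parse import urlencode


def rendered_request_url(api_key, target_url):
    """Build a request URL that asks for JavaScript rendering."""
    params = {"api_key": api_key, "url": target_url, "render": "true"}
    return "http://api.scraperapi.com?" + urlencode(params)


request_url = rendered_request_url("YOUR_API_KEY", "http://example.com")
```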

If you would like to preserve the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set keep_headers=true. Use this feature to get customized results. Do not use it to avoid blocks; Scraper API handles that internally.
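The sketch below shows one way to attach custom headers to such a request with the standard library. The function name and the example header are assumptions; the request object is only built here, not sent.

```python
import urllib.request
from urllib.parse import urlencode


def request_with_custom_headers(api_key, target_url, headers):
    """Build a request that asks Scraper API to pass `headers` through."""
    params = {"api_key": api_key, "url": target_url, "keep_headers": "true"}
    request_url = "http://api.scraperapi.com?" + urlencode(params)
    return urllib.request.Request(request_url, headers=headers)


req = request_with_custom_headers(
    "YOUR_API_KEY", "http://example.com", {"Accept-Language": "en-US"}
)
```

Sending it with urllib.request.urlopen(req, timeout=60) would forward the Accept-Language header to the target site.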

To reuse the same proxy for multiple requests, simply use the session_number= flag (e.g. session_number=123). The session value can be any integer: sending a new integer creates a new session, and every later request with that session number keeps using the same proxy. Sessions expire 60 seconds after their last use.
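As an illustration, the following sketch builds two requests that share one session and would therefore go out through the same proxy. The helper name, the page URLs, and the session value 123 are placeholders chosen for the example.

```python
from urllib.parse import urlencode


def session_request_url(api_key, target_url, session_number):
    """Build a request URL pinned to the given session number."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "session_number": int(session_number),  # sessions are integers
    }
    return "http://api.scraperapi.com?" + urlencode(params)


pages = ["http://example.com/page/1", "http://example.com/page/2"]
urls = [session_request_url("YOUR_API_KEY", page, 123) for page in pages]
```

Because a session expires 60 seconds after its last use, the requests in a session should be sent reasonably close together.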

To ensure your requests come from the United States, use the country_code= flag (e.g. country_code=us). United States (us) and European Union (eu) geo-targeting is available on the Startup plan and higher. Business plan customers also have access to Canada (ca), United Kingdom (uk), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Japan (jp), China (cn), and Australia (au). Other countries are available to Enterprise customers upon request.
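The plan tiers above can be captured in a small table so that an unsupported code is caught before a request is sent. This is a sketch: the set names, the function name, and the client-side check are assumptions, and the codes are written in lowercase to match the us example.

```python
from urllib.parse import urlencode

# Country codes per plan, as listed above (lowercase form assumed).
STARTUP_COUNTRIES = {"us", "eu"}
BUSINESS_COUNTRIES = STARTUP_COUNTRIES | {
    "ca", "uk", "de", "fr", "es", "br", "mx", "in", "jp", "cn", "au",
}


def geo_request_url(api_key, target_url, country_code, allowed=BUSINESS_COUNTRIES):
    """Build a geo-targeted request URL, rejecting codes outside `allowed`."""
    code = country_code.lower()
    if code not in allowed:
        raise ValueError(f"country code {code!r} is not in the allowed set")
    params = {"api_key": api_key, "url": target_url, "country_code": code}
    return "http://api.scraperapi.com?" + urlencode(params)
```

Passing allowed=STARTUP_COUNTRIES would restrict the check to the two regions available on the Startup plan.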

 
