The Apify SDK is distributed as the Apify NPM package, which provides a wide range of web scraping tools. Its Basic crawler is a simple framework for crawling web pages in parallel, with URLs supplied either from a static list or from a dynamic queue. The Cheerio crawler crawls web pages in parallel using plain HTTP requests and the cheerio HTML parser. It is the most efficient of the crawlers, but it does not work on websites that rely on JavaScript to render their content. The Puppeteer crawler crawls web pages in parallel using headless Chrome controlled by Puppeteer, and the pool of Chrome browsers is automatically scaled up and down. The Puppeteer pool hands out browser tabs for user jobs from an automatically managed pool of Chrome instances, with configurable browser recycling. It can also reuse the disk cache, which speeds up crawling and reduces proxy bandwidth.
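Browser recycling can be illustrated with a minimal in-memory sketch in TypeScript. It launches no real browsers and does not use the Apify SDK or Puppeteer. The class SimulatedBrowserPool, its maxPagesPerBrowser limit and the openPage() method are hypothetical names. Retiring a browser after a fixed number of pages is only one plausible recycling policy, assumed here for illustration.

```typescript
// Hypothetical sketch of a browser pool with page-count-based recycling.
// No real browsers are launched; each "browser" is just an in-memory record.

interface SimulatedBrowser {
  id: number;
  pagesServed: number;
}

interface PageHandle {
  browserId: number;
  pageIndex: number; // 1-based index of this page within its browser
}

class SimulatedBrowserPool {
  private readonly maxPagesPerBrowser: number;
  private nextId = 1;
  private current: SimulatedBrowser | null = null;
  readonly retiredIds: number[] = [];

  constructor(maxPagesPerBrowser: number) {
    if (!Number.isInteger(maxPagesPerBrowser) || maxPagesPerBrowser < 1) {
      throw new Error("maxPagesPerBrowser must be a positive integer");
    }
    this.maxPagesPerBrowser = maxPagesPerBrowser;
  }

  // Returns a page ("tab") from the active browser. A fresh browser is launched
  // when there is none yet or when the active one has reached its page limit.
  openPage(): PageHandle {
    if (this.current === null || this.current.pagesServed >= this.maxPagesPerBrowser) {
      if (this.current !== null) {
        this.retiredIds.push(this.current.id); // recycle the worn-out browser
      }
      this.current = { id: this.nextId++, pagesServed: 0 };
    }
    this.current.pagesServed += 1;
    return { browserId: this.current.id, pageIndex: this.current.pagesServed };
  }
}
```

With a limit of two pages per browser, five consecutive calls to openPage() are served by browsers 1, 1, 2, 2 and 3, and browsers 1 and 2 end up retired. Periodically replacing long-lived browsers in this way is typically done to limit the memory and state that headless browsers accumulate over many pages.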

The Apify SDK request list represents a static list of URLs to crawl. The URLs can be provided in code or read from a text file hosted on the web. The list persists its state, so crawling resumes where it left off if the Node.js process restarts. The request queue is a dynamic queue of URLs to crawl, stored either on the local file system or in the Apify Cloud. It is used for deep crawling of websites, where you start with a few URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders. The dataset provides a store for structured scraped data and lets you export it to formats like JSON, JSONL, CSV, XML, Excel or HTML. The data is stored either on the local file system or in the Apify Cloud. Datasets are useful for storing and sharing large tabular crawling results.
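The two crawling orders differ only in which end of the queue the next URL is taken from. The following TypeScript sketch uses an in-memory queue and a fake link graph instead of real HTTP requests. SimpleRequestQueue, crawl() and the CrawlOrder type are hypothetical and are not part of the Apify SDK API. The queue also skips URLs that were already enqueued, so cycles in the link graph do not cause endless crawling.

```typescript
// Hypothetical sketch: an in-memory request queue consumed either in
// breadth-first (FIFO) or depth-first (LIFO) order, skipping duplicate URLs.

type CrawlOrder = "breadth-first" | "depth-first";

class SimpleRequestQueue {
  private readonly order: CrawlOrder;
  private readonly pending: string[] = [];
  private readonly seen = new Set<string>();

  constructor(order: CrawlOrder) {
    this.order = order;
  }

  // Enqueues a URL unless it has been added before; returns whether it was added.
  add(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push(url);
    return true;
  }

  // FIFO (shift) gives breadth-first order; LIFO (pop) gives depth-first-style order.
  next(): string | undefined {
    return this.order === "breadth-first" ? this.pending.shift() : this.pending.pop();
  }
}

// Crawls a fake link graph (page URL -> URLs it links to) from one start URL
// and returns the order in which pages were visited.
function crawl(links: Map<string, string[]>, start: string, order: CrawlOrder): string[] {
  const queue = new SimpleRequestQueue(order);
  queue.add(start);
  const visited: string[] = [];
  let url: string | undefined;
  while ((url = queue.next()) !== undefined) {
    visited.push(url);
    const children = links.get(url) ?? [];
    // With LIFO, enqueue links in reverse so the first link on a page is explored first.
    const toAdd = order === "depth-first" ? [...children].reverse() : children;
    for (const link of toAdd) queue.add(link);
  }
  return visited;
}
```

On a small site where "/" links to "/a" and "/b", breadth-first order visits both "/a" and "/b" before any of their children, whereas depth-first order finishes the whole "/a" branch before moving on to "/b".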

The Apify SDK key-value store holds arbitrary data records or files, along with their MIME content type. It is ideal for saving screenshots of web pages, PDFs, or the state of crawlers. This data is likewise stored on the local file system or in the Apify Cloud. The autoscaled pool runs asynchronous background tasks and automatically adjusts their concurrency based on free system memory and CPU usage, which is useful for running web scraping tasks at the maximum capacity of the system. Puppeteer utils provide several helper functions for web scraping, for example to inject jQuery into web pages or to hide the browser's origin. Finally, the Apify SDK NPM package provides helper functions for running code on the Apify Cloud, so you can take advantage of its pool of proxies, job scheduler, data storage and other services.
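The feedback loop behind autoscaling can be sketched as a single decision function. This TypeScript example is a simplified assumption, not the autoscaled pool's actual algorithm. nextConcurrency(), the threshold options and the fixed step sizes are hypothetical, and the memory and CPU readings are passed in as plain numbers instead of being measured from the system.

```typescript
// Hypothetical sketch of an autoscaling rule: grow concurrency while the system
// has headroom, shrink it when memory or CPU is overloaded.

interface SystemSnapshot {
  memoryUsedRatio: number; // fraction of available memory in use, 0..1
  cpuUsedRatio: number; // fraction of CPU capacity in use, 0..1
}

interface AutoscaleOptions {
  minConcurrency: number;
  maxConcurrency: number;
  maxMemoryRatio: number; // above this, memory counts as overloaded
  maxCpuRatio: number; // above this, CPU counts as overloaded
  scaleUpStep: number; // tasks added per step when there is headroom
  scaleDownStep: number; // tasks removed per step when overloaded
}

// Returns the concurrency to use for the next interval.
function nextConcurrency(current: number, snapshot: SystemSnapshot, opts: AutoscaleOptions): number {
  const overloaded =
    snapshot.memoryUsedRatio > opts.maxMemoryRatio || snapshot.cpuUsedRatio > opts.maxCpuRatio;
  const proposed = overloaded ? current - opts.scaleDownStep : current + opts.scaleUpStep;
  // Clamp the result to the configured bounds.
  return Math.min(opts.maxConcurrency, Math.max(opts.minConcurrency, proposed));
}
```

A caller would invoke this periodically with fresh measurements, so concurrency ramps up while the system has headroom and backs off once memory or CPU becomes the bottleneck. Making scaleDownStep larger than scaleUpStep is one way to react to overload faster than to spare capacity.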

