The Apify SDK request list represents a static list of URLs to crawl. The URLs can be defined in code or loaded from a text file hosted on the web, and the list's state is persisted, so crawling resumes where it left off when the Node.js process restarts. The Apify SDK request queue represents a queue of URLs to crawl, stored on the local file system or in the Apify Cloud. The queue is useful for deep crawls of websites with many pages linking to one another, because newly discovered URLs can be enqueued while the crawl is running. The data structure supports both breadth-first and depth-first crawling orders. The Apify SDK dataset provides storage for structured scraped data and supports export to formats such as JSON, CSV, XML, Excel and HTML. The extracted data is stored on the local file system or in the Apify Cloud, which makes datasets well suited to storing and sharing bulk tabular crawling results.
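Independent of the SDK itself, the difference between the two crawling orders can be sketched with a plain array acting as the crawl frontier (this is an illustrative stand-in, not the Apify SDK's internal implementation; the `links` map and the `crawlOrder` helper are assumptions for the example):

```javascript
// Sketch: a crawl frontier held in a plain array. Taking URLs from the
// front gives breadth-first order; taking them from the back gives
// depth-first order.
function crawlOrder(startUrl, links, mode) {
    // links: map from a URL to the URLs discovered on that page (assumed input)
    const frontier = [startUrl];
    const visited = new Set();
    const order = [];
    while (frontier.length > 0) {
        // shift() -> FIFO queue (breadth-first); pop() -> LIFO stack (depth-first)
        const url = mode === 'breadth-first' ? frontier.shift() : frontier.pop();
        if (visited.has(url)) continue;
        visited.add(url);
        order.push(url);
        for (const next of links[url] || []) {
            if (!visited.has(next)) frontier.push(next);
        }
    }
    return order;
}

// A tiny site: the home page links to /a and /b, which link one level deeper.
const links = {
    '/': ['/a', '/b'],
    '/a': ['/a1'],
    '/b': ['/b1'],
};

// Breadth-first visits both top-level pages before going a level deeper;
// depth-first follows one branch to the bottom before backtracking.
console.log(crawlOrder('/', links, 'breadth-first'));
console.log(crawlOrder('/', links, 'depth-first'));
```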
The Apify SDK key-value store holds arbitrary data records or files, each annotated with a MIME content type. It is the natural choice for saving web page screenshots, PDFs and the state of crawlers. Like the other storages, this data is kept on the local file system or in the Apify Cloud. The autoscaled pool runs asynchronous background tasks and automatically adjusts their concurrency based on free system memory and CPU usage, which is useful for running scraping tasks at the maximum capacity of the system. The Puppeteer utilities provide helper functions for web scraping, for example to inject jQuery into web pages or to hide the browser's automated origin. Finally, the Apify SDK NPM package provides helper functions for running code on the Apify Cloud, taking advantage of its pool of proxies, job scheduler, data storage and other services.
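The key-value store idea can be sketched with a minimal in-memory stand-in (this is not the Apify SDK's own `Apify.setValue`/`Apify.getValue` API; the `KeyValueStore` class below is a hypothetical illustration of the concept): each record keeps its raw data together with a MIME content type, so screenshots (`image/png`), PDFs (`application/pdf`) and crawler state (`application/json`) can live side by side under different keys.

```javascript
// Illustrative in-memory key-value store (assumption for the example):
// records pair data with a MIME content type, mirroring how screenshots,
// PDFs and crawler state can be kept in the same store.
class KeyValueStore {
    constructor() {
        this.records = new Map();
    }
    setValue(key, data, contentType = 'application/json') {
        this.records.set(key, { data, contentType });
    }
    getValue(key) {
        const record = this.records.get(key);
        return record ? record.data : null;
    }
    getContentType(key) {
        const record = this.records.get(key);
        return record ? record.contentType : null;
    }
}

const store = new KeyValueStore();
// Crawler state as a JSON-like record (the default content type).
store.setValue('STATE', { lastUrl: 'https://example.com/page-7' });
// Binary data, such as a screenshot, tagged with its MIME type.
store.setValue('screenshot', Buffer.from('fake-png-bytes'), 'image/png');
```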