80legs is a data extraction tool aimed at startups and small and medium-sized enterprises (SMEs). It provides end-to-end data extraction solutions. Its key features are image extraction, disparate data collection, email address extraction, phone number extraction, and web data extraction.
80legs can crawl over 2 billion web pages a day using the Plura grid of over 50,000 computers.
The service is used by setting up and executing a job. To start crawling, a job needs a seed list, supplied as a text file of up to 1 GB in size. Other job parameters are as follows:
- Outgoing links: which of the links found on a page to crawl.
- URL depth level: how many links away from a seed URL the crawl may go.
- Crawling type.
- Number of URLs to crawl.
- MIME types: the page types to crawl.
- Analysis options, such as keyword matching, regular expressions, and running custom code.
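The job parameters above can be pictured as a single job description. The following is a minimal sketch in plain Python, not the 80legs API: the class name CrawlJob, the function parse_seed_list, every field name, and the default values are hypothetical and chosen only for illustration. It also assumes that the seed list holds one URL per line and that "1 GB" means 1024^3 bytes, neither of which is stated above.

```python
from dataclasses import dataclass

# Assumption: "1 GB" is read as 1024**3 bytes.
MAX_SEED_LIST_BYTES = 1024 ** 3


def parse_seed_list(text: str) -> list[str]:
    """Parse a seed list, assuming one URL per line (hypothetical format).

    Blank lines are skipped and duplicates are dropped, keeping first-seen order.
    """
    if len(text.encode("utf-8")) > MAX_SEED_LIST_BYTES:
        raise ValueError("seed list exceeds the 1 GB limit")
    seen: dict[str, None] = {}
    for line in text.splitlines():
        url = line.strip()
        if url:
            seen.setdefault(url, None)
    return list(seen)


@dataclass
class CrawlJob:
    """Hypothetical container mirroring the job parameters listed above."""

    seed_urls: list[str]
    outgoing_links: str = "same_domain"    # which outgoing links to follow (illustrative value)
    max_depth: int = 2                     # URL depth level, counted from a seed
    crawl_type: str = "breadth_first"      # crawling type (illustrative value)
    max_urls: int = 1000                   # number of URLs to crawl
    mime_types: tuple[str, ...] = ("text/html",)
    keywords: tuple[str, ...] = ()         # analysis option: keyword matching
    regexes: tuple[str, ...] = ()          # analysis option: regular expressions

    def __post_init__(self) -> None:
        # Reject jobs that could not crawl anything.
        if not self.seed_urls:
            raise ValueError("a job needs at least one seed URL")
        if self.max_depth < 0 or self.max_urls < 1:
            raise ValueError("max_depth must be >= 0 and max_urls >= 1")


seeds = parse_seed_list("http://a.example/\n\nhttp://b.example/\nhttp://a.example/\n")
job = CrawlJob(seed_urls=seeds, keywords=("python",))
```

Grouping the parameters in one validated object keeps a badly specified job (no seeds, negative depth) from ever being submitted.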
When a job runs, the crawler reads and analyzes the content of web pages, starting with the seed pages and following outgoing links according to the outgoing links option. Simple analysis is available by specifying keywords, while more complex analysis is done with a pre-built 80legs application. Analysis applications are written in Java. 80legs plans to open an application store where developers can sell their applications.
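The crawl procedure described above (start from the seeds, follow outgoing links up to a depth limit, analyze each page) can be sketched as a breadth-first traversal. This is an illustration only and does not reflect how 80legs is implemented: the function crawl, its parameters, and the in-memory PAGES dictionary standing in for the web are all hypothetical, and the analysis is limited to keyword counting and one regular expression.

```python
import re
from collections import deque

# Hypothetical stand-in for the web: URL -> (page content, outgoing links).
PAGES = {
    "a": ("Python crawler news", ["b", "c"]),
    "b": ("contact: info@example.com", ["d"]),
    "c": ("python python", []),
    "d": ("deep page", []),
}


def crawl(pages, seeds, max_depth, max_urls, keywords=(), pattern=None):
    """Breadth-first crawl from the seeds, analyzing each visited page.

    Returns a dict mapping each visited URL to its keyword counts and regex matches.
    """
    regex = re.compile(pattern) if pattern else None
    unique_seeds = list(dict.fromkeys(seeds))      # drop duplicate seeds, keep order
    queue = deque((url, 0) for url in unique_seeds)
    seen = set(unique_seeds)
    results = {}
    while queue and len(results) < max_urls:
        url, depth = queue.popleft()
        if url not in pages:                       # unknown URL: nothing to fetch
            continue
        content, links = pages[url]
        lowered = content.lower()
        results[url] = {
            # Case-insensitive keyword matching.
            "keywords": {k: lowered.count(k.lower()) for k in keywords},
            "regex_matches": regex.findall(content) if regex else [],
        }
        if depth < max_depth:                      # respect the URL depth level
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return results


results = crawl(PAGES, ["a"], max_depth=1, max_urls=10,
                keywords=("python",), pattern=r"[\w.]+@[\w.]+")
```

With max_depth=1, pages "b" and "c" are visited because they are one link away from the seed "a", while "d" is skipped because it is two links away.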
A paid 80legs subscription gives access to a Python API for interacting with the crawling engine.