DataStreamer provides APIs that deliver social media, weblog, news, video, and live web content to users running scraping jobs in any language.
It offers full-text search APIs built on Elasticsearch, with advanced search facilities over a quality content index.
Users running DataStreamer for the first time will most likely want to start with the Search API.
The DataStreamer Search API lets you search for arbitrary text strings and build queries with complex Boolean logic, filters, and other advanced features such as aggregations. Results acquired through the Search API are stored as ordinary JSON documents.
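To make the combination of Boolean logic, filters, and aggregations concrete, here is a minimal sketch of a request body in the Elasticsearch query DSL, which the search APIs are described as being based on. The field names (`main`, `lang`, `domain`) and the helper `build_search_body` are hypothetical and chosen only for illustration; the actual DataStreamer schema and endpoint are not specified here.

```python
import json


def build_search_body(must_terms, exclude_terms=(), language=None,
                      agg_field=None, size=10):
    """Build an Elasticsearch-style search body.

    All field names used here ("main", "lang") are assumptions,
    not the documented DataStreamer schema.
    """
    bool_query = {
        # Every term in must_terms has to match (Boolean AND).
        "must": [{"match": {"main": term}} for term in must_terms],
        # Documents matching any excluded term are dropped (Boolean NOT).
        "must_not": [{"match": {"main": term}} for term in exclude_terms],
    }
    if language is not None:
        # Filters restrict results without affecting relevance scoring.
        bool_query["filter"] = [{"term": {"lang": language}}]

    body = {"query": {"bool": bool_query}, "size": size}
    if agg_field is not None:
        # A terms aggregation counts matching documents per field value.
        body["aggs"] = {"by_" + agg_field: {"terms": {"field": agg_field}}}
    return body


body = build_search_body(
    ["electric", "vehicle"],
    exclude_terms=["recall"],
    language="en",
    agg_field="domain",
)
print(json.dumps(body, indent=2))
```

The resulting dictionary serializes to plain JSON, so it could be sent with any HTTP client once the real endpoint and field names are known.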
DataStreamer can index weblogs, mainstream news, and social media, and supports the RSS, Atom, HTML, microformats, and microdata web formats. All Datastreamer APIs use JSON for rapid implementation.
The Streaming API of Datastreamer is a full streaming API that handles 95% of data indexing requirements without coding knowledge. Just start it up and the Streaming API spools JSON files to disk.
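Once JSON files have been spooled to disk, a downstream job typically just walks the spool directory. The sketch below assumes one JSON document per `*.json` file in a flat directory; the directory layout, the file naming, and the helper `iter_spooled_documents` are assumptions for illustration, not the documented output format.

```python
import json
from pathlib import Path


def iter_spooled_documents(spool_dir):
    """Yield (path, document) for each JSON file in spool_dir, in name order.

    Assumes one JSON document per file (a hypothetical layout).
    """
    for path in sorted(Path(spool_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                yield path, json.load(handle)
        except json.JSONDecodeError:
            # The file may still be mid-write or truncated; skip it so a
            # later pass can pick it up once it is complete.
            continue


if __name__ == "__main__":
    # "./spool" is a placeholder path.
    for path, document in iter_spooled_documents("./spool"):
        print(path.name, type(document).__name__)
```

Skipping files that fail to parse is a deliberate choice here, so that a reader running alongside the spooler does not crash on a partially written file.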
The Admin Console of Datastreamer is a comprehensive console for customers that provides full visibility into the crawl.
300M Sources Indexed
Over 300M sources are indexed and available through the Datastreamer API, with broad coverage of social media, weblogs, mainstream news, and more.
The Classifier API of Datastreamer, built on the machine learning platform, allows developers to submit text, URLs, or labels for content.
The DataStreamer Parser API performs ad hoc parsing and metadata management for arbitrary URLs on the web. DataStreamer also augments the metadata with additional data, including gender and sentiment detection.
The DataStreamer Parser API provides granular API access to content. If a URL is not indexed yet, the Datastreamer Parser API still allows users to extract the content using the Datastreamer content schema and machine learning infrastructure.
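The "use the index if possible, otherwise parse on demand" flow described above can be sketched on the client side as follows. Both `lookup_indexed` and `parse_url` are hypothetical callables standing in for whatever Search API and Parser API calls a client would actually make; no real endpoint or signature is implied.

```python
def get_content(url, lookup_indexed, parse_url):
    """Return (document, source) for a URL.

    lookup_indexed(url) should return the indexed document or None;
    parse_url(url) should parse the URL on demand. Both are
    placeholders supplied by the caller.
    """
    document = lookup_indexed(url)
    if document is not None:
        return document, "index"
    # Not indexed yet: fall back to ad hoc parsing.
    return parse_url(url), "parser"


# Stand-in implementations for illustration only.
fake_index = {"https://example.com/a": {"title": "Indexed page"}}


def fake_parse(url):
    return {"title": "Parsed on demand", "url": url}


print(get_content("https://example.com/a", fake_index.get, fake_parse))
print(get_content("https://example.com/b", fake_index.get, fake_parse))
```

Passing the two operations in as callables keeps the sketch independent of any particular HTTP client or authentication scheme.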
When Datastreamer indexes content, it performs the following operations:
URL content fetching, language classification, text sentiment calculation, chrome (sidebar and navigation) removal, gender detection, category classification (tech, politics, science), image analysis, and so on.
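The operations listed above can be pictured as a chain of stages that each enrich one document. The sketch below is a toy illustration only: it covers three of the steps (chrome removal, text extraction, sentiment), uses simple regular expressions and word lists as stand-ins for the real machine learning components, and all names in it are hypothetical.

```python
import re

# Tiny placeholder word lists; a real system would use a trained model.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def remove_chrome(doc):
    """Drop <nav> and <aside> blocks as a stand-in for chrome removal."""
    doc["html"] = re.sub(r"<(nav|aside)\b.*?</\1>", " ", doc["html"],
                         flags=re.S | re.I)
    return doc


def extract_text(doc):
    """Strip the remaining tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", doc["html"])
    doc["text"] = re.sub(r"\s+", " ", text).strip()
    return doc


def score_sentiment(doc):
    """Toy score: count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", doc["text"].lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    doc["sentiment"] = positives - negatives
    return doc


# Stages run in order; further steps would be appended here.
PIPELINE = [remove_chrome, extract_text, score_sentiment]


def enrich(html):
    """Run every stage over a document built from already-fetched HTML."""
    doc = {"html": html}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = enrich(
    "<nav>Home | About</nav>"
    "<p>A great phone with a poor battery and excellent screen.</p>"
)
print(result["text"])
print(result["sentiment"])
```

Keeping each operation as a separate function that takes and returns the document makes it easy to add, remove, or reorder stages such as language or category classification.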
The Streaming API of Datastreamer is designed for bulk access to massive amounts of content.