Use this component when you wish to acquire data from external sources or extract structured data from text. Most tools in this component also include data cleaning features, for example to detect or correct inconsistent data.
FrameIt is a system for creating custom frames for text corpora. It is built on Python 3 and spaCy 2.
Some features include:
– Intent detection for individual sentences using a CNN model
– Entity extraction paired with intents using either CNN or heuristic models
– An SRL system that loads multiple Frames for intent detection simultaneously, making it possible to differentiate between similar domains
– Easy to train and customize using Jupyter notebooks
Usagi is an open-source platform for building data discovery systems. Usagi crawls and extracts metadata about datasets and builds catalogs and indices to make datasets discoverable by search and browsing.
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.
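A minimal sketch of reading and writing with the standard-library csv module (the file name is illustrative):

```python
import csv

# Write rows in the dialect Excel expects (the default dialect).
with open("people.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "age"])
    writer.writerow(["Ada", 36])

# Read the file back; DictReader maps each row to the header fields.
with open("people.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["name"], row["age"])
```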
PDFMiner is a Python package for extracting text and other information from PDF files.
PDFMiner also includes a tool that can convert PDF files to HTML in addition to plain text.
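A short extraction sketch, assuming the actively maintained pdfminer.six fork and a hypothetical file name:

```python
from pdfminer.high_level import extract_text

# Pull the plain text out of a PDF; specific pages can be
# selected with the page_numbers argument if needed.
text = extract_text("report.pdf")
print(text[:500])
```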
Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words and their parts of speech; recognize whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and syntactic dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; extract particular or open-class relations between entity mentions; and identify quotations and who said them.
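CoreNLP is typically run as a local HTTP server. The sketch below assumes a server has already been started on port 9000 (e.g. with java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000) and queries it from Python:

```python
import requests

# Annotators and output format are passed as a JSON "properties" parameter.
props = '{"annotators": "tokenize,ssplit,pos,ner", "outputFormat": "json"}'
resp = requests.post(
    "http://localhost:9000/",
    params={"properties": props},
    data="Stanford University is in California.".encode("utf-8"),
)

# Print each token with its part-of-speech and named-entity tags.
for sentence in resp.json()["sentences"]:
    for token in sentence["tokens"]:
        print(token["word"], token["pos"], token["ner"])
```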
Koko is an information extraction tool (developed in Python 3) that allows users to query a text corpus and extract the entities that are of interest to them.
Google Cloud Natural Language API provides developers with access to Google-powered, machine-learning-based text analysis components such as sentiment analysis, entity recognition, and syntax analysis.
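A minimal sentiment-analysis sketch, assuming the google-cloud-language client library and that application credentials are already configured in the environment:

```python
from google.cloud import language_v1

# Assumes GOOGLE_APPLICATION_CREDENTIALS is set for authentication.
client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="Google Cloud makes text analysis easy.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Score is negative-to-positive polarity; magnitude is overall strength.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score, sentiment.magnitude)
```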
NLTK is an open-source platform for building Python programs to process human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. NLTK also provides wrappers for industrial-strength NLP libraries.
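A small example of tokenization and part-of-speech tagging with NLTK (the model downloads only happen once):

```python
import nltk

# Fetch the tokenizer and tagger models on first use.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("NLTK makes tokenizing and tagging simple.")
print(nltk.pos_tag(tokens))
```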
Requests is an HTTP library for Python that provides the APIs needed to scrape websites. Requests can issue complex requests to fetch a page and its content, such as those requiring additional headers, complex POST data, or authentication credentials.
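A short sketch of such requests; the URLs and credentials here are placeholders:

```python
import requests

# GET with a custom header (some sites require a User-Agent).
page = requests.get(
    "https://example.com/data",
    headers={"User-Agent": "my-crawler/1.0"},
)
print(page.text[:200])

# Authenticated POST with form data.
resp = requests.post(
    "https://example.com/api/search",
    data={"q": "datasets", "page": 1},
    auth=("user", "pass"),
)
print(resp.status_code, resp.headers.get("Content-Type"))
```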
This library brings the Google Maps API Web Services to your Python application.
The Python Client for Google Maps Services is a Python client library for the following Google Maps APIs: Directions API, Distance Matrix API, Elevation API, Geocoding API, Geolocation API, Time Zone API, Roads API, and Places API.
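A minimal sketch using the googlemaps package; the API key is a placeholder:

```python
import googlemaps

# A real key is issued through the Google Cloud console.
gmaps = googlemaps.Client(key="YOUR_API_KEY")

# Geocoding API: address -> coordinates.
geocode = gmaps.geocode("1600 Amphitheatre Parkway, Mountain View, CA")
print(geocode[0]["geometry"]["location"])

# Directions API: a simple transit route between two places.
routes = gmaps.directions("Sydney Town Hall", "Parramatta, NSW", mode="transit")
print(len(routes))
```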