Natural Language Processing with Python
Introduction

Natural language processing, or NLP, is the process of analyzing text and extracting insights from it. It is used everywhere, from search engines such as Google or Bing to voice interfaces such as Siri or Cortana. The pipeline usually involves tokenization, replacing and correcting words, part-of-speech tagging, named-entity recognition, and classification. In this article we'll describe tokenization, using a full example from a Kaggle notebook. The full code can be found in the GitHub repository.

Installation

For the purposes of NLP, we'll be using the NLTK Python library, a leading platform for working with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries. Installing the package is easy using the Python p...
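Tokenization, the step this article focuses on, splits raw text into sentences and then into word and punctuation tokens. The sketch below shows the idea in plain Python, without any third-party dependencies. It is not the Kaggle notebook's code. `split_sentences` and `wordpunct_split` are hypothetical helpers, and the sentence splitter is a naive regex assumption. The word pattern `\w+|[^\w\s]+` is, to our understanding, the same one NLTK's `WordPunctTokenizer` is built on.

```python
import re

# Word pattern: runs of word characters, or runs of characters that are
# neither word characters nor whitespace (i.e. punctuation).
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")

# Naive sentence boundary (an assumption for illustration): whitespace that
# follows '.', '!' or '?'. Real sentence tokenizers such as NLTK's Punkt
# model also handle abbreviations like "e.g." and "Dr.".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Hypothetical helper: split text into sentences at terminal punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def wordpunct_split(text: str) -> list[str]:
    """Hypothetical helper: split text into word and punctuation tokens."""
    return WORD_PUNCT.findall(text)


if __name__ == "__main__":
    text = "NLP is used everywhere. Search engines use it, don't they?"
    for sentence in split_sentences(text):
        print(wordpunct_split(sentence))
```

Note how `don't` becomes three tokens (`don`, `'`, `t`) under this simple pattern. NLTK's `word_tokenize` follows Penn Treebank conventions instead, splitting it into `do` and `n't`. That function also needs a Punkt model downloaded through `nltk.download` before first use.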