Posts

Showing posts from March, 2020

Tweets Analysis with Python and NLP

Introduction

You should already be familiar with the concepts of NLP from our previous post, so today we'll look at a more practical use case: analyzing tweets and classifying them as marketing or non-marketing. We won't go into the details of tweet retrieval; it can be done with various packages, Tweepy being the most popular one.

Baseline

For the purposes of this discussion, we already have two sets of tweets, split into separate files and uploaded to a GitHub folder. First we download the datasets, add a target column set to 1 for marketing tweets, and combine the datasets. Then we check the baseline classification results without any pre-processing, so that later we can tell whether our changes improve the metrics.

We'll use Random Forest for classification, since it doesn't expect linear features, or even features that interact linearly, and it handles high-dimensional spaces and large numbers of training examples very well. Plu