About

Sentiment analysis is a field of natural language processing that uses computational techniques, either machine learning models or rule-based lexicons, to assign sentiment scores to a body of text. The goal is to classify the polarity of a phrase or sentence as negative, neutral, or positive and, in some cases, to determine whether a statement is objective or subjective.

Our dataset comes from Kaggle. According to its description, "it contains 1.6 million tweets extracted using the twitter api". Because this is a very large dataset, we use pandas_profiling for quick exploratory data analysis, and we expect some cells to take a long time to run.

The polarity of each tweet is annotated as 0 = negative or 4 = positive.

To perform the sentiment analysis, we use the vaderSentiment library, which was created specifically for analyzing sentiments expressed in social media. VADER stands for Valence Aware Dictionary and Sentiment Reasoner.

Using the vaderSentiment library, we reclassify each tweet as positive, neutral, or negative. The typical thresholds on VADER's compound score, taken from the vaderSentiment GitHub page, are as follows:

Sentiment	Compound
Positive	>= 0.05
Neutral	> -0.05 and < 0.05
Negative	<= -0.05
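For intuition, here is a minimal sketch of where the compound score comes from and how the thresholds above are applied. It assumes, as in the vaderSentiment source, that VADER sums the rule-adjusted valence scores of the words in a text and squashes that sum into [-1, 1] with x / sqrt(x^2 + alpha), alpha = 15. The word lexicon and the rules themselves (negation, intensifiers, punctuation) are left out, and the helper names `normalize` and `label_compound` are illustrative only.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1] (sketch of VADER's normalization)."""
    norm = score / math.sqrt(score * score + alpha)
    # Clamp to guard against floating-point overshoot
    return max(-1.0, min(1.0, norm))


def label_compound(compound: float) -> str:
    """Map a compound score to a label using the thresholds in the table above."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


print(normalize(1.0))             # 1 / sqrt(16) = 0.25
print(label_compound(0.05))       # positive: the boundary value counts as positive
print(label_compound(-0.0173))    # neutral
```

Note that the boundaries are inclusive: a compound score of exactly 0.05 is positive and exactly -0.05 is negative.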

Install required dependencies

!pip install vaderSentiment

import sys
!{sys.executable} -m pip install -U pandas-profiling[notebook]
!jupyter nbextension enable --py widgetsnbextension

Import required libraries

import pandas as pd 
import numpy as np 
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

Upload dataset and create dataframe

from google.colab import files
files.upload()

Saving training.1600000.processed.noemoticon.csv to training.1600000.processed.noemoticon (1).csv

# The date column is left as text here: parse_dates=True only parses the index, which this file does not have
df = pd.read_csv('training.1600000.processed.noemoticon.csv', sep=',', header=None, encoding='latin-1')
df

df.columns = ['Polarity', 'tweet_id', 'date', 'flag', 'user', 'text']

df.head(5)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1600000 entries, 0 to 1599999
Data columns (total 6 columns):
 #   Column    Non-Null Count    Dtype 
---  ------    --------------    ----- 
 0   Polarity  1600000 non-null  int64 
 1   tweet_id  1600000 non-null  int64 
 2   date      1600000 non-null  object
 3   flag      1600000 non-null  object
 4   user      1600000 non-null  object
 5   text      1600000 non-null  object
dtypes: int64(2), object(4)
memory usage: 73.2+ MB
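The `date` column is still stored as `object` (plain text). One way to parse it explicitly is sketched below, assuming every timestamp follows the pattern shown in the rows of this dataset (e.g. `Mon Apr 06 22:19:45 PDT 2009`) with a literal `PDT` suffix. `errors='coerce'` turns any value that does not match into `NaT` instead of raising.

```python
import pandas as pd

# Two timestamps copied from the first and last rows of the dataset
dates = pd.Series(["Mon Apr 06 22:19:45 PDT 2009", "Tue Jun 16 08:40:50 PDT 2009"])

# Assumption: every value carries the literal "PDT" suffix; non-matching values become NaT
parsed = pd.to_datetime(dates, format="%a %b %d %H:%M:%S PDT %Y", errors="coerce")
print(parsed)
```

On the full dataframe this would be `df['date'] = pd.to_datetime(df['date'], format="%a %b %d %H:%M:%S PDT %Y", errors="coerce")`, followed by a check for `NaT` values in case some rows use a different time zone label.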

Perform Exploratory Data Analysis using pandas_profiling

Because this is a very large dataset (1.6 million rows), we use pandas_profiling to explore the data more quickly.

from pandas_profiling import ProfileReport

# generate report 

profile = ProfileReport(df, title = 'Pandas Profiling Report', explorative=True)

# to view it in Google Colab 

profile.to_notebook_iframe()

df['text'] = df['text'].astype('string')  # convert text from object to string; astype returns a new Series, so assign it back
df['text']

0          @switchfoot http://twitpic.com/2y1zl - Awww, t...
1          is upset that he can't update his Facebook by ...
2          @Kenichan I dived many times for the ball. Man...
3            my whole body feels itchy and like its on fire 
4          @nationwideclass no, it's not behaving at all....
                                 ...                        
1599995    Just woke up. Having no school is the best fee...
1599996    TheWDB.com - Very cool to hear old Walt interv...
1599997    Are you ready for your MoJo Makeover? Ask me f...
1599998    Happy 38th Birthday to my boo of alll time!!! ...
1599999    happy #charitytuesday @theNSPCC @SparksCharity...
Name: text, Length: 1600000, dtype: string

df['flag'].unique()

array(['NO_QUERY'], dtype=object)

df['Polarity'].unique()

array([0, 4])
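Only the labels 0 and 4 appear, so the annotations contain no neutral class. Before comparing them with VADER's three-way labels, it can help to check the class balance and attach readable names. A minimal sketch on a toy series; the `label_names` mapping is our own, and on the real data the same calls apply to `df['Polarity']`.

```python
import pandas as pd

# Toy stand-in for df['Polarity'] (0 = negative, 4 = positive)
polarity = pd.Series([0, 0, 4, 4, 4, 0], name="Polarity")

label_names = {0: "negative", 4: "positive"}  # illustrative mapping for readability
counts = polarity.map(label_names).value_counts()
print(counts)
```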

Apply the VADER sentiment analysis function

sia = SentimentIntensityAnalyzer()

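sia_t = lambda x: sia.polarity_scores(x)  # returns a dict with keys 'neg', 'neu', 'pos', 'compound'

# Note: this creates ONE column, named by the tuple ('pos', 'compound', 'neu', 'neg'),
# whose cells hold the score dictionaries; they are split into four columns below
df['pos', 'compound', 'neu', 'neg'] = df['text'].apply(sia_t)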

print(df.head())

   Polarity  ...                          (pos, compound, neu, neg)
0         0  ...  {'neg': 0.117, 'neu': 0.768, 'pos': 0.114, 'co...
1         0  ...  {'neg': 0.291, 'neu': 0.709, 'pos': 0.0, 'comp...
2         0  ...  {'neg': 0.0, 'neu': 0.842, 'pos': 0.158, 'comp...
3         0  ...  {'neg': 0.321, 'neu': 0.5, 'pos': 0.179, 'comp...
4         0  ...  {'neg': 0.138, 'neu': 0.862, 'pos': 0.0, 'comp...

[5 rows x 7 columns]
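Storing whole dictionaries in one column means they have to be expanded and joined back afterwards. An alternative sketch builds the four score columns directly from the list of dictionaries, keeping the original index so rows stay aligned. To run without VADER, it uses the scores of the first three tweets reported in this notebook instead of calling `sia.polarity_scores`.

```python
import pandas as pd

# Scores of the first three tweets, in the form returned by sia.polarity_scores
scores = pd.Series([
    {"neg": 0.117, "neu": 0.768, "pos": 0.114, "compound": -0.0173},
    {"neg": 0.291, "neu": 0.709, "pos": 0.0, "compound": -0.75},
    {"neg": 0.0, "neu": 0.842, "pos": 0.158, "compound": 0.4939},
])

# One column per dictionary key, aligned on the original index
score_cols = pd.DataFrame(scores.tolist(), index=scores.index)
print(score_cols)
```

On the full data the equivalent would be `pd.DataFrame(df['text'].map(sia.polarity_scores).tolist(), index=df.index)`, which can be joined to `df` without creating and later dropping the intermediate dictionary column.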

df_sia = pd.json_normalize(df[('pos', 'compound', 'neu', 'neg')], max_level=0)

df_sia.head()

df_sia.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1600000 entries, 0 to 1599999
Data columns (total 4 columns):
 #   Column    Non-Null Count    Dtype  
---  ------    --------------    -----  
 0   neg       1600000 non-null  float64
 1   neu       1600000 non-null  float64
 2   pos       1600000 non-null  float64
 3   compound  1600000 non-null  float64
dtypes: float64(4)
memory usage: 48.8 MB

new_df = df.join(df_sia, how='left')

new_df.head()

new_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1600000 entries, 0 to 1599999
Data columns (total 11 columns):
 #   Column                     Non-Null Count    Dtype  
---  ------                     --------------    -----  
 0   Polarity                   1600000 non-null  int64  
 1   tweet_id                   1600000 non-null  int64  
 2   date                       1600000 non-null  object 
 3   flag                       1600000 non-null  object 
 4   user                       1600000 non-null  object 
 5   text                       1600000 non-null  object 
 6   (pos, compound, neu, neg)  1600000 non-null  object 
 7   neg                        1600000 non-null  float64
 8   neu                        1600000 non-null  float64
 9   pos                        1600000 non-null  float64
 10  compound                   1600000 non-null  float64
dtypes: float64(4), int64(2), object(5)
memory usage: 134.3+ MB

# drop the dictionary column; the list makes it explicit that the tuple is a single column name
new_df.drop(columns=[('pos', 'compound', 'neu', 'neg')], inplace=True)

new_df.columns  # check to see dictionary dropped

Index(['Polarity', 'tweet_id', 'date', 'flag', 'user', 'text', 'neg', 'neu',
       'pos', 'compound'],
      dtype='object')

import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline

sns.displot(x=new_df['compound'])
plt.show()

Create new column for Sentiment Type

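# thresholds from the vaderSentiment table above:
# positive: compound >= 0.05, neutral: -0.05 < compound < 0.05, negative: compound <= -0.05
conditions = [
              (new_df['compound'] >= 0.05),
              (new_df['compound'] > -0.05) & (new_df['compound'] < 0.05),
              (new_df['compound'] <= -0.05)
              ]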

## create list of values to assign to the conditions

values = ['positive', 'neutral', 'negative']

## create a new column (the conditions cover every score, so the string default is only a safeguard)

new_df['Sentiment'] = np.select(conditions, values, default='unknown')

new_df.head()

values1 = [4, 2, 0]  # same scale as Polarity (0 = negative, 4 = positive), with 2 for neutral

new_df['Polarity_new'] = np.select(conditions, values1)
new_df.head()
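To check the labelling rule on known values, here is a small sketch that applies the same `np.select` pattern, with the thresholds from the table at the top, to the compound scores of the first five tweets reported in this notebook. `default` is given as a string so it has the same type as the labels.

```python
import numpy as np

# Compound scores of the first five tweets (from the df_sia output in this notebook)
compound = np.array([-0.0173, -0.75, 0.4939, -0.25, -0.4939])

conditions = [
    compound >= 0.05,                        # positive
    (compound > -0.05) & (compound < 0.05),  # neutral
    compound <= -0.05,                       # negative
]
labels = np.select(conditions, ["positive", "neutral", "negative"], default="unknown")
print(labels)
```

np.select returns the choice for the first condition that is true in each position, so the order of `conditions` must match the order of the label list.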

import seaborn as sns
%matplotlib inline

sns.displot(new_df['Polarity_new'])
plt.show()

sns.displot(new_df['Polarity'])
plt.show()
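The two histograms can be compared more directly with a cross-tabulation of the annotated `Polarity` against VADER's `Polarity_new`; the off-diagonal cells count disagreements, such as tweets annotated negative that VADER labels positive. A minimal sketch on a toy frame (on the real data, pass `new_df['Polarity']` and `new_df['Polarity_new']`). The agreement rate ignores tweets VADER labels neutral, because the annotations have no neutral class; that choice is ours, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for new_df: annotated labels (0/4) vs VADER labels (0/2/4)
toy = pd.DataFrame({
    "Polarity":     [0, 0, 0, 4, 4, 4],
    "Polarity_new": [0, 4, 2, 4, 4, 0],
})

table = pd.crosstab(toy["Polarity"], toy["Polarity_new"])
print(table)

# Agreement rate on tweets VADER did not label neutral
decided = toy[toy["Polarity_new"] != 2]
agreement = (decided["Polarity"] == decided["Polarity_new"]).mean()
print(f"agreement on non-neutral tweets: {agreement:.2f}")
```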

# tweets annotated as negative (Polarity 0) that VADER labels positive (Polarity_new 4)
df_neg_pos = new_df[(new_df['Polarity'] == 0) & (new_df['Polarity_new'] == 4)]

print(df_neg_pos['text'].head(20))

2     @Kenichan I dived many times for the ball. Man...
6                                           Need a hug 
7     @LOLTrish hey  long time no see! Yes.. Rains a...
14    @smarrison i would've been the first, but i di...
15    @iamjazzyfizzle I wish I got to watch it with ...
18    @LettyA ahh ive always wanted to see rent  lov...
19    @FakerPattyPattz Oh dear. Were you drinking ou...
21    one of my friend called me, and asked to meet ...
23               this week is not going as i had hoped 
28    ooooh.... LOL  that leslie.... and ok I won't ...
33    @julieebaby awe i love you too!!!! 1 am here  ...
38    @fleurylis I don't either. Its depressing. I d...
41    He's the reason for the teardrops on my guitar...
43    @JonathanRKnight Awww I soo wish I was there t...
44    Falling asleep. Just heard about that Tracy gi...
45    @Viennah Yay! I'm happy for you with your job!...
46    Just checked my user timeline on my blackberry...
47    Oh man...was ironing @jeancjumbe's fave top to...
51    @localtweeps Wow, tons of replies from you, ma...
54                                        I need a hug 
Name: text, dtype: object

Create a Word Cloud from the Positive Tweets

from wordcloud import WordCloud

# join with a space so the last word of one tweet is not fused with the first word of the next
wc = WordCloud(max_words = 2000, background_color='yellow', width = 1600, height = 1600).generate(' '.join(new_df[new_df.Polarity_new == 4].text))

plt.figure(figsize = (16,16), facecolor=None)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Create a Word Cloud from the Neutral Tweets

from wordcloud import WordCloud

# join with a space so words from adjacent tweets are not fused together
wc = WordCloud(max_words = 2000, background_color='white', width = 1600, height = 1600).generate(' '.join(new_df[new_df.Polarity_new == 2].text))

plt.figure(figsize = (16,16), facecolor=None)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Create a Word Cloud from the Negative Tweets

from wordcloud import WordCloud

# join with a space so words from adjacent tweets are not fused together
wc = WordCloud(max_words = 2000, width = 1600, height = 1600).generate(' '.join(new_df[new_df.Polarity_new == 0].text))

plt.figure(figsize = (16,16), facecolor=None)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
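Many tweets contain @mentions and links (e.g. `@switchfoot http://twitpic.com/2y1zl`), which can show up in the word clouds as noise. A minimal sketch of stripping them before joining the tweets, assuming simple regular expressions are good enough (they will not catch every URL form); the helper `clean_tweet` and the patterns are our own. The cleaned text would then be passed to `WordCloud(...).generate(...)` as in the cells above.

```python
import re

MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def clean_tweet(text: str) -> str:
    """Remove URLs and @mentions, then collapse repeated whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


tweets = [
    "@switchfoot http://twitpic.com/2y1zl - Awww",
    "Need a hug ",
]
# Join with a space so the last word of one tweet does not fuse with the next
corpus = " ".join(clean_tweet(t) for t in tweets)
print(corpus)
```

URLs are removed before mentions so that an `@` inside a link cannot leave a partial URL behind.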

References

vaderSentiment Python Library accessed 18-October-2020.
Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
Kaggle Sentiment Analysis Dataset accessed 18-October-2020.
Pandas Profiling accessed 20-October-2020.

Output of `df` right after loading (before the columns were renamed):

	0	1	2	3	4	5
0	0	1467810369	Mon Apr 06 22:19:45 PDT 2009	NO_QUERY	_TheSpecialOne_	@switchfoot http://twitpic.com/2y1zl - Awww, t...
1	0	1467810672	Mon Apr 06 22:19:49 PDT 2009	NO_QUERY	scotthamilton	is upset that he can't update his Facebook by ...
2	0	1467810917	Mon Apr 06 22:19:53 PDT 2009	NO_QUERY	mattycus	@Kenichan I dived many times for the ball. Man...
3	0	1467811184	Mon Apr 06 22:19:57 PDT 2009	NO_QUERY	ElleCTF	my whole body feels itchy and like its on fire
4	0	1467811193	Mon Apr 06 22:19:57 PDT 2009	NO_QUERY	Karoli	@nationwideclass no, it's not behaving at all....
...	...	...	...	...	...	...
1599995	4	2193601966	Tue Jun 16 08:40:49 PDT 2009	NO_QUERY	AmandaMarie1028	Just woke up. Having no school is the best fee...
1599996	4	2193601969	Tue Jun 16 08:40:49 PDT 2009	NO_QUERY	TheWDBoards	TheWDB.com - Very cool to hear old Walt interv...
1599997	4	2193601991	Tue Jun 16 08:40:49 PDT 2009	NO_QUERY	bpbabe	Are you ready for your MoJo Makeover? Ask me f...
1599998	4	2193602064	Tue Jun 16 08:40:49 PDT 2009	NO_QUERY	tinydiamondz	Happy 38th Birthday to my boo of alll time!!! ...
1599999	4	2193602129	Tue Jun 16 08:40:50 PDT 2009	NO_QUERY	RyanTrevMorris	happy #charitytuesday @theNSPCC @SparksCharity...

Output of `df_sia.head()`:

	neg	neu	pos	compound
0	0.117	0.768	0.114	-0.0173
1	0.291	0.709	0.000	-0.7500
2	0.000	0.842	0.158	0.4939
3	0.321	0.500	0.179	-0.2500
4	0.138	0.862	0.000	-0.4939