Russia and Ukraine war on Official Twitter
A short story on how people are reacting to the Russia-Ukraine war, based on Twitter data analyzed with Python
- Introduction
- Results
- Uploading the Data
- Calling the Libraries
- Hashtag analysis of Twitter
- Cleaning the Data
- New columns – Subjectivity and Polarity
- Word Cloud
- Plot by Language
- New Column – Negative, Neutral, Positive
- The Scatter Plot in Python
- The Bar chart in Python
- Printing the most liked tweets
- Printing the most retweeted tweets
Uploading the Data
import pandas as pd

# Load the exported tweets (replace the path with your own CSV file)
df_tweet = pd.read_csv("/content/filename.csv")
df_tweet.head()
df_tweet['tweet'].head()
Using the pandas library, we load the data into Python. Throughout the code, the data is stored in the df_tweet variable.
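Later steps read several specific columns from the CSV (tweet, language, username, name, likes_count, retweets_count), so it can save time to confirm they exist before running the rest of the notebook. Below is a minimal sketch using only the standard library, assuming the file is a comma-separated export with a header row; the helper missing_columns and the sample text are hypothetical, not part of the original code.

```python
import csv
import io

# Columns that later steps of this walkthrough read from df_tweet.
REQUIRED_COLUMNS = {"tweet", "language", "username", "name",
                    "likes_count", "retweets_count"}

def missing_columns(csv_text):
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # the first row holds the column names
    return sorted(REQUIRED_COLUMNS - set(header))

# Hypothetical two-column sample, not the article's dataset.
sample = "tweet,language\nStand with Ukraine,en\n"
print(missing_columns(sample))
# → ['likes_count', 'name', 'retweets_count', 'username']
```

With the data already in pandas, the same check is simply a set difference against set(df_tweet.columns).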
Calling all the Libraries
# re is used for hashtag extraction and text cleaning; pandas for the data
import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.express as px
import nltk
import spacy
from nltk.corpus import stopwords, sentiwordnet as swn
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from textblob import TextBlob
from wordcloud import WordCloud
from IPython.display import clear_output

plt.style.use('fivethirtyeight')
plotly.offline.init_notebook_mode(connected=True)
Hashtag analysis of Twitter
# Function to collect hashtags
def hashtag_extract(text_list):
    hashtags = []
    # Loop over the tweets and collect the hashtags in each one
    for text in text_list:
        ht = re.findall(r"#(\w+)", str(text))
        hashtags.append(ht)
    return hashtags
To analyze the hashtags used in the tweets, we first need to extract them.
ht = re.findall(r"#(\w+)", text)
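This line finds every word that follows a '#'. Because the pattern uses a capture group, re.findall returns only the word, without the '#'. Each tweet's hashtags are appended to the hashtags list, so the function returns one list per tweet.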
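To see what the extraction returns, here is a small sketch run on a few made-up tweets (the sample strings are hypothetical). It reproduces the extraction logic on its own so it can be tried outside the notebook.

```python
import re

def hashtag_extract(text_list):
    """Return one list of hashtags (without the '#') per tweet."""
    hashtags = []
    for text in text_list:
        # The capture group (\w+) makes findall return only the word after '#'.
        hashtags.append(re.findall(r"#(\w+)", text))
    return hashtags

# Hypothetical sample tweets.
tweets = ["I stand with #Ukraine #StopWar", "No hashtags here", "#Peace now"]
print(hashtag_extract(tweets))
# → [['Ukraine', 'StopWar'], [], ['Peace']]
```

Tweets without hashtags yield an empty list. Because \w matches Unicode word characters for str patterns in Python 3, non-Latin hashtags such as #Україна are captured too, which matters for a multilingual dataset like this one.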
Hashtag bar chart
def generate_hashtag_freqdist(hashtags):
    # Count how often each hashtag occurs
    a = nltk.FreqDist(hashtags)
    d = pd.DataFrame({'Hashtag': list(a.keys()),
                      'Count': list(a.values())})
    # Select the 25 most frequent hashtags
    d = d.nlargest(columns="Count", n=25)
    plt.figure(figsize=(16, 7))
    ax = sns.barplot(data=d, x="Hashtag", y="Count")
    plt.xticks(rotation=80)
    ax.set(ylabel='Count')
    plt.show()
This function counts how often each hashtag occurs and plots the 25 most frequent ones as a bar chart.
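The counting step does not depend on NLTK specifically: nltk.FreqDist is a subclass of Python's collections.Counter. A minimal standard-library sketch of "count, then keep the top n" looks like this; the function name top_hashtags and the sample list are hypothetical.

```python
from collections import Counter

def top_hashtags(hashtags, n=25):
    """Return the n most frequent hashtags as (hashtag, count) pairs."""
    # Counter plays the role of nltk.FreqDist; most_common sorts by count.
    return Counter(hashtags).most_common(n)

sample = ["Ukraine", "Russia", "Ukraine", "Peace", "Ukraine", "Russia"]
print(top_hashtags(sample, n=2))
# → [('Ukraine', 3), ('Russia', 2)]
```

Note that counting is case-sensitive, so "Ukraine" and "ukraine" are tallied separately; lowercasing the hashtags before counting is one option if they should be merged.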

Calling the hashtag_extract function
hashtags = hashtag_extract(df_tweet["tweet"])
# Flatten the per-tweet lists into a single list of hashtags
hashtags = sum(hashtags, [])
generate_hashtag_freqdist(hashtags)
We pass the tweets through both functions and get back a seaborn bar chart of the most frequent hashtags.
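The line hashtags = sum(hashtags, []) flattens the per-tweet lists into one list. It works, but sum builds a new list at every step, which gets slow as the number of tweets grows. A sketch of the same flattening with itertools.chain.from_iterable, which visits each inner list once (the sample data is hypothetical):

```python
from itertools import chain

# One list of hashtags per tweet, as returned by hashtag_extract.
per_tweet = [["Ukraine", "StopWar"], [], ["Peace", "Ukraine"]]

# chain.from_iterable walks the inner lists in order without copying them repeatedly.
flat = list(chain.from_iterable(per_tweet))
print(flat)
# → ['Ukraine', 'StopWar', 'Peace', 'Ukraine']
```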
Cleaning the Data in Python
def cleanText(text):
    text = re.sub(r'@[A-Za-z0-9_]+', '', str(text))  # remove @mentions
    text = re.sub(r'#', '', text)                     # remove the '#' symbol, keep the word
    text = re.sub(r'\bRT\s+', '', text)               # remove retweet markers
    text = re.sub(r'https?://\S+', '', text)          # remove links
    return text

df_tweet['tweet'] = df_tweet['tweet'].apply(cleanText)
df_tweet
To do sentiment analysis, we first need to clean the data. Tweets contain mentions, hashtag symbols, retweet markers, and links that carry no sentiment, so we remove them.
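To check what the cleaning does to a single tweet, here is a standalone sketch on a made-up example. It assumes Twitter handles consist of letters, digits, and underscores; uses \S+ so a link is removed up to the next whitespace; and uses a word boundary so "RT" inside words like "SMART" is left alone. The name clean_text and the final .strip() are additions for this illustration.

```python
import re

def clean_text(text):
    """Strip mentions, '#' symbols, retweet markers, and links from a tweet."""
    text = re.sub(r"@[A-Za-z0-9_]+", "", str(text))  # @mentions
    text = re.sub(r"#", "", text)                     # keep the word, drop '#'
    text = re.sub(r"\bRT\s+", "", text)               # retweet marker
    text = re.sub(r"https?://\S+", "", text)          # links up to next space
    return text.strip()                               # trim leftover spaces

sample = "RT @kyiv_news Stand with #Ukraine https://t.co/abc123"
print(clean_text(sample))
# → Stand with Ukraine
```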
Creating 2 new columns – Subjectivity and Polarity
# Subjectivity: 0.0 (objective) to 1.0 (subjective)
def getSubjectivity(text):
    return TextBlob(str(text)).sentiment.subjectivity

# Polarity: -1.0 (negative) to 1.0 (positive)
def getPolarity(text):
    return TextBlob(str(text)).sentiment.polarity

# Create the new columns
df_tweet['Subjectivity'] = df_tweet['tweet'].apply(getSubjectivity)
df_tweet['Polarity'] = df_tweet['tweet'].apply(getPolarity)
What is Subjectivity?
A subjective sentence expresses personal feelings, views, or beliefs; “I like the iPhone.” is an example. Subjective expressions come in many forms, e.g., opinions, allegations, desires, beliefs, suspicions, and speculations. TextBlob scores subjectivity from 0.0 (very objective) to 1.0 (very subjective).
What is Polarity?
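Polarity describes whether the emotion expressed in a sentence is positive or negative. Emotions are closely related to sentiments: the strength of a sentiment or opinion is typically linked to the intensity of emotions such as joy and anger. For example, “I am happy with this car.” has a positive polarity. TextBlob scores polarity from -1.0 (most negative) to 1.0 (most positive).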
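To build intuition for how a lexicon-based polarity score can land in the range -1.0 to 1.0, here is a toy sketch that averages per-word scores. This is not TextBlob's algorithm: the mini-lexicon and its values are invented for illustration, and TextBlob's default analyzer uses a much larger lexicon and also takes modifiers such as "not" and "very" into account.

```python
# Hypothetical mini-lexicon: word -> score in [-1.0, 1.0].
# These values are made up; TextBlob ships its own, much larger lexicon.
LEXICON = {"happy": 0.8, "good": 0.7, "sad": -0.5, "terrible": -1.0}

def toy_polarity(sentence):
    """Average the lexicon scores of the known words; 0.0 if none are known."""
    scores = [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)

print(toy_polarity("I am happy with this car"))  # → 0.8
print(toy_polarity("A terrible and sad day"))    # → -0.75
print(toy_polarity("The meeting is on Monday"))  # → 0.0
```

Because an average of values in [-1.0, 1.0] stays in that range, the score is bounded. Sentences with no known words score exactly 0.0, which TextBlob also does for such text; this likely explains why many short or factual tweets end up labeled Neutral later on.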
Making a word cloud in Python using our data
# Join all tweets into one long string and draw the word cloud
allWords = ' '.join(df_tweet['tweet'])
wordCloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(allWords)
plt.figure(figsize=(10, 7))
plt.imshow(wordCloud, interpolation='bilinear')
plt.axis('off')
plt.show()
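A word cloud sizes each word by how often it appears, after dropping very common words. The sketch below computes such frequencies with the standard library; the tiny stopword set is a stand-in for illustration (the wordcloud package ships its own default set as wordcloud.STOPWORDS), and the function name and sample tweets are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list, not the wordcloud package's default set.
STOPWORDS = {"the", "and", "with", "to", "of", "a", "is"}

def word_frequencies(tweets):
    """Count lowercase words across tweets, skipping stopwords."""
    words = re.findall(r"\w+", " ".join(tweets).lower())
    return Counter(w for w in words if w not in STOPWORDS)

tweets = ["Stand with Ukraine", "Peace and Ukraine", "The war must end"]
freqs = word_frequencies(tweets)
print(freqs.most_common(2))
# → [('ukraine', 2), ('stand', 1)]
```

A frequency dict like this can be passed to WordCloud.generate_from_frequencies when you want full control over tokenization and stopwords instead of relying on WordCloud.generate.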

How many languages do people use on Twitter?
lang_counts = df_tweet['language'].value_counts()
plt.figure(figsize=(15, 15))
ax = sns.countplot(x='language', data=df_tweet, order=lang_counts.index)
# Write each bar's count above it (bars follow the same order as lang_counts)
for p, count in zip(ax.patches, lang_counts):
    ax.text(p.get_x() + p.get_width() / 2., p.get_height() + 0.1,
            count, ha="center", fontsize=8)
plt.show()

New column on whether the tweet was negative, neutral or positive.
# Drop everything from the first https:// link onward
df_tweet['tweet'] = df_tweet['tweet'].apply(lambda x: re.split(r'https://.*', str(x))[0])

# Create a function to label each tweet as negative, neutral, or positive
def getAnalysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'

df_tweet['Analysis'] = df_tweet['Polarity'].apply(getAnalysis)
This code creates a new column named ‘Analysis’ that labels each tweet Negative, Neutral, or Positive according to the sign of its polarity score, as decided by the if-elif-else statement.
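The rule above treats only a score of exactly 0 as Neutral, so a tweet scoring 0.01 counts as Positive. If you want weakly scored tweets to count as Neutral, one option is a configurable neutral band. This is a variant, not the author's method: the function name get_analysis and the neutral_band parameter are hypothetical, and with neutral_band=0.0 it gives the same labels as getAnalysis.

```python
def get_analysis(score, neutral_band=0.0):
    """Label a polarity score; scores within +/-neutral_band count as Neutral."""
    if score < -neutral_band:
        return "Negative"
    if score > neutral_band:
        return "Positive"
    return "Neutral"

print(get_analysis(-0.2))                     # → Negative
print(get_analysis(0.0))                      # → Neutral
print(get_analysis(0.05))                     # → Positive
print(get_analysis(0.05, neutral_band=0.1))   # → Neutral
```

In the notebook it could be applied the same way, e.g. df_tweet['Polarity'].apply(get_analysis, neutral_band=0.1); the band width is a judgment call and changes the percentages reported later.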
Let us make a scatter plot
# Scatter plot of polarity vs. subjectivity, with marginal histograms
sns.jointplot(data=df_tweet, x='Polarity', y='Subjectivity')
plt.suptitle('Sentiment analysis', y=1.02)
plt.show()

Count of Positive, Negative, and Neutral Tweets on Russia vs Ukraine War
# Percentage of positive, negative, and neutral tweets
ptweets = df_tweet[df_tweet['Analysis'] == 'Positive']['tweet']
print('Positive:', round((ptweets.shape[0] / df_tweet.shape[0]) * 100, 1), '%')
ntweets = df_tweet[df_tweet['Analysis'] == 'Negative']['tweet']
print('Negative:', round((ntweets.shape[0] / df_tweet.shape[0]) * 100, 1), '%')
neutweets = df_tweet[df_tweet['Analysis'] == 'Neutral']['tweet']
print('Neutral:', round((neutweets.shape[0] / df_tweet.shape[0]) * 100, 1), '%')

# Bar chart of the counts
print(df_tweet['Analysis'].value_counts())
ax = df_tweet['Analysis'].value_counts().plot(kind='bar')
ax.set_title('Sentiment Analysis')
ax.set_xlabel('Sentiments')
ax.set_ylabel('Counts')
plt.show()
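The three percentage calculations follow one pattern: count each label and divide by the total. A compact standard-library sketch of that pattern (the function name and sample labels are hypothetical):

```python
from collections import Counter

def sentiment_shares(labels):
    """Percentage of each label, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(count / total * 100, 1) for label, count in counts.items()}

labels = ["Positive", "Negative", "Neutral", "Neutral"]
print(sentiment_shares(labels))
# → {'Positive': 25.0, 'Negative': 25.0, 'Neutral': 50.0}
```

In pandas, df_tweet['Analysis'].value_counts(normalize=True) * 100 returns the same shares in a single call.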

Printing the most liked tweets
df_tweet[['username','name','tweet','likes_count', 'language', 'Subjectivity', 'Polarity', 'Analysis' ]].sort_values(by=['likes_count'], ascending= False).head(5)
Printing the most retweeted tweets
df_tweet[['username','name','tweet','retweets_count', 'language', 'Subjectivity', 'Polarity', 'Analysis' ]].sort_values(by=['retweets_count'], ascending= False).head(5)
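Both queries above pick the top five rows by one numeric column. Outside pandas, the same "top n by key" idea can be sketched with heapq.nlargest; the rows below are invented and only mirror a few of the article's column names.

```python
import heapq

# Hypothetical rows mirroring a few of the article's columns.
rows = [
    {"username": "user_a", "tweet": "Stand with Ukraine", "likes_count": 120, "retweets_count": 40},
    {"username": "user_b", "tweet": "Peace now", "likes_count": 15, "retweets_count": 90},
    {"username": "user_c", "tweet": "Stop the war", "likes_count": 300, "retweets_count": 10},
]

def top_by(rows, column, n=5):
    """Return the n rows with the largest value in `column`, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[column])

print([r["username"] for r in top_by(rows, "likes_count", n=2)])
# → ['user_c', 'user_a']
print([r["username"] for r in top_by(rows, "retweets_count", n=1)])
# → ['user_b']
```

In pandas, df_tweet.nlargest(5, 'likes_count') is a shorter way to get the same rows as sorting in descending order and taking head(5).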