Russia and Ukraine war on Official Twitter

A short walkthrough of how people reacted to the Russia-Ukraine war, based on Twitter data analyzed with Python

  • Introduction
  • Results
  • Uploading the Data
  • Calling the Libraries
  • Hashtag analysis of Twitter
  • Cleaning the Data
  • New columns – Subjectivity and Polarity
  • Word cloud
  • Plot by Language
  • New Column – Negative, Neutral, Positive
  • The Scatter Plot in Python
  • The Bar chart in Python
  • Printing most liked tweet
  • Printing most retweeted tweet

Explanation video available on Youtube

Uploading the Data

import pandas as pd

df_tweet = pd.read_csv("/content/filename.csv")


    Using the pandas library, we load the CSV file into Python. Throughout the code, the data is held in the df_tweet variable.

    Calling all the Libraries

    import re
    import pandas as pd
    from textblob import TextBlob
    from wordcloud import WordCloud
    import numpy as np
    import matplotlib.pyplot as plt
    from nltk.stem import WordNetLemmatizer
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    import spacy
    from nltk.corpus import sentiwordnet as swn
    from IPython.display import clear_output
    import plotly.express as px
    import seaborn as sns
    import plotly
    plotly.offline.init_notebook_mode(connected=True)

    Hashtag analysis of Twitter

    # function to collect hashtags
    def hashtag_extract(text_list):
        hashtags = []
        # Loop over the tweets
        for text in text_list:
            # find every word that directly follows a '#'
            ht = re.findall(r"#(\w+)", str(text))
            # keep one list of hashtags per tweet
            hashtags.append(ht)
        return hashtags

    To analyze the hashtags used in the tweets, we first need to extract them from the tweet text.

    ht = re.findall(r"#(\w+)", text) 

    The line above finds every word that directly follows a ‘#’. Finally, we save all of these words in the hashtags list.
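As a quick check of the pattern, the sketch below runs the same regular expression on a single made-up tweet. The sample text is an assumption for illustration and is not taken from the dataset.

```python
import re

# Hypothetical sample tweet, invented for illustration only
sample = "Praying for peace #Ukraine #StandWithUkraine, no more #war!"

# r"#(\w+)" matches a '#' followed by one or more word characters;
# the parentheses capture only the word, so the '#' itself is dropped
tags = re.findall(r"#(\w+)", sample)
print(tags)  # ['Ukraine', 'StandWithUkraine', 'war']
```

Punctuation such as the comma and the exclamation mark is not a word character, so it ends the match and is left out of the hashtag.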

    Hashtag bar chart

    def generate_hashtag_freqdist(hashtags):
        # count how often each hashtag occurs
        a = nltk.FreqDist(hashtags)
        d = pd.DataFrame({'Hashtag': list(a.keys()),
                          'Count': list(a.values())})
        # select the 25 most frequent hashtags
        d = d.nlargest(columns="Count", n=25)
        ax = sns.barplot(data=d, x="Hashtag", y="Count")
        ax.set(ylabel='Count')
        # rotate the labels so long hashtags stay readable
        plt.xticks(rotation=90)
        plt.show()

    This function counts how often each hashtag occurs and draws a bar plot of the most frequent ones.
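The counting step can also be checked on its own, without plotting. This is a minimal sketch on a small hand-made list of hashtags (an assumption for illustration). It uses collections.Counter from the standard library; nltk.FreqDist is a subclass of Counter, so it counts in the same way.

```python
from collections import Counter

# Hypothetical hashtags, invented for illustration only
hashtags = ["Ukraine", "Russia", "Ukraine", "war", "Ukraine", "Russia"]

counts = Counter(hashtags)
# most_common(n) returns the n most frequent items as (item, count) pairs
top_two = counts.most_common(2)
print(top_two)  # [('Ukraine', 3), ('Russia', 2)]
```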

    Figure: hashtags used on Twitter

    Calling the hashtag_extract function

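    hashtags = hashtag_extract(df_tweet["tweet"])
    hashtags = sum(hashtags, [])  # flatten the per-tweet lists into one list
    generate_hashtag_freqdist(hashtags)  # count the hashtags and draw the bar chart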

    We pass the tweets to the two functions and get back a bar chart drawn with the seaborn library.

    Cleaning the Data in Python

    def cleanText(text):
      text = str(text)
      text = re.sub(r'@[A-Za-z0-9]+', '', text)   # remove @mentions
      text = re.sub(r'#', '', text)                # remove the '#' symbol but keep the word
      text = re.sub(r'RT[\s]+', '', text)          # remove the retweet marker
      text = re.sub(r'https?:\/\/\S+', '', text)   # remove links
      return text
    df_tweet['tweet'] = df_tweet['tweet'].apply(cleanText)

    Before running sentiment analysis, we need to clean the data. Tweets contain many symbols and other characters, such as mentions, ‘#’ signs, retweet markers, and links, that carry no sentiment.
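To see what the cleaning step does, here is a small sketch that applies the same four substitutions to one made-up tweet. The function name clean_text and the sample text are assumptions for illustration. The link pattern is written with \S+ (a run of non-whitespace characters) so that it matches the whole link.

```python
import re

def clean_text(text):
    """Strip mentions, '#' signs, retweet markers and links from a tweet."""
    text = str(text)
    text = re.sub(r'@[A-Za-z0-9]+', '', text)  # remove @mentions
    text = re.sub(r'#', '', text)               # drop '#', keep the word
    text = re.sub(r'RT[\s]+', '', text)         # remove the retweet marker
    text = re.sub(r'https?://\S+', '', text)    # remove links
    return text.strip()

# Hypothetical sample tweet, invented for illustration only
sample = "RT @user123 Stop the #war now https://t.co/abc123"
cleaned = clean_text(sample)
print(cleaned)  # Stop the war now
```

The final strip() is an extra step in this sketch that removes the space left behind where the link used to be.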

    Creating 2 new columns – Subjectivity and Polarity

    def getSubjectivity(text):
      return TextBlob(str(text)).sentiment.subjectivity
    def getPolarity(text):
      # str() keeps this safe for non-string values, as in getSubjectivity
      return TextBlob(str(text)).sentiment.polarity
    #new columns
    df_tweet['Subjectivity'] = df_tweet['tweet'].apply(getSubjectivity)
    df_tweet['Polarity'] = df_tweet['tweet'].apply(getPolarity)

    What is Subjectivity?

    A subjective sentence expresses personal feelings, views, or beliefs, for example “I like iPhone.” Subjective expressions come in many forms, e.g., opinions, allegations, desires, beliefs, suspicions, and speculations. TextBlob reports subjectivity as a number between 0 (objective) and 1 (subjective).

    What is Polarity?

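    Polarity describes whether the sentiment expressed in a sentence is positive or negative. Emotions are closely related to sentiments: the strength of a sentiment or opinion is typically linked to the intensity of certain emotions, e.g., joy and anger. For example, “I am happy with this car” expresses a positive sentiment. TextBlob reports polarity as a number between -1 (negative) and 1 (positive).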
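To make the two scores concrete, here is a toy sketch that averages per-word scores from a tiny made-up lexicon. It only illustrates the idea of scoring a sentence from its words. It is not TextBlob's lexicon or its actual algorithm, and every name and number in it is an assumption.

```python
# Tiny hand-made lexicon: word -> (polarity, subjectivity). Values are invented.
LEXICON = {
    "happy": (0.8, 1.0),
    "terrible": (-1.0, 1.0),
    "like": (0.5, 0.6),
}

def toy_sentiment(text):
    """Average the scores of the words found in LEXICON; (0.0, 0.0) if none match."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

print(toy_sentiment("I am happy with this car"))  # (0.8, 1.0)
print(toy_sentiment("The meeting is on Monday"))  # (0.0, 0.0)
```

A sentence with no opinion words gets a polarity of 0 and a subjectivity of 0, which is why purely factual tweets tend to end up in the neutral group.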

    Making a word cloud in python using our data

    # join all tweets into one long string
    allWords = ' '.join(str(twts) for twts in df_tweet['tweet'])
    wordCloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(allWords)
    plt.imshow(wordCloud, interpolation='bilinear')
    plt.axis('off')  # hide the axes around the image
    plt.show()

    How many languages do People use on Twitter?

    # order the bars from the most to the least used language
    lang_counts = df_tweet['language'].value_counts()
    ax = sns.countplot(x='language', data=df_tweet, order=lang_counts.index)
    # write each bar's count just above the bar
    for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2., height + 0.1,
                f"{int(height)}", ha="center", fontsize=8)
    plt.show()

    Figure: number of tweets per language

    New column on whether the tweet was negative, neutral or positive.

    # drop any link left at the end of the tweet
    df_tweet['tweet'] = df_tweet['tweet'].apply(lambda x: re.split(r'https:\/\/.*', str(x))[0])
    # create a function to label a polarity score as negative, neutral, or positive
    def getAnalysis(score):
      if score < 0:
        return 'Negative'
      elif score == 0:
        return 'Neutral'
      else:
        return 'Positive'
    df_tweet['Analysis'] = df_tweet['Polarity'].apply(getAnalysis)

    This code creates a new column named ‘Analysis’, filled with the label Negative, Neutral, or Positive according to the if-else statement.
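The labeling rule can be tried on a few scores without the DataFrame. A minimal sketch, assuming a short list of made-up polarity values; the function name get_analysis and the scores are hypothetical.

```python
from collections import Counter

def get_analysis(score):
    """Map a polarity score to a sentiment label."""
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'

# Hypothetical polarity scores, invented for illustration only
scores = [-0.5, 0.0, 0.25, 0.0, 0.8]
labels = [get_analysis(s) for s in scores]
print(labels)           # ['Negative', 'Neutral', 'Positive', 'Neutral', 'Positive']
print(Counter(labels))  # Counter({'Neutral': 2, 'Positive': 2, 'Negative': 1})
```

Only a score of exactly 0 counts as neutral under this rule, so even a very weak positive or negative score is assigned to one of the other two groups.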

    Let us plot a Scatter Plot

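    # df_norm_col=(df_tweet['Polarity'].mean())/df_tweet['Subjectivity'].std()
    # kind='scatter' draws the points; kind='kde' would draw density contours instead
    sns.jointplot(data=df_tweet, x='Polarity', y='Subjectivity', kind='scatter')
    plt.suptitle('Sentiment analysis')
    plt.show()

    Figure: scatter plot of Polarity against Subjectivity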

    Count of Positive, Negative, and Neutral Tweets on Russia vs Ukraine War

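    # share of positive tweets, in percent
    ptweets = df_tweet[df_tweet['Analysis'] == 'Positive']
    ptweets = ptweets['tweet']
    print(round((ptweets.shape[0] / df_tweet.shape[0]) * 100, 1))
    # share of negative tweets, in percent
    ntweets = df_tweet[df_tweet['Analysis'] == 'Negative']
    ntweets = ntweets['tweet']
    print(round((ntweets.shape[0] / df_tweet.shape[0]) * 100, 1))
    # share of neutral tweets, in percent
    neutweets = df_tweet[df_tweet['Analysis'] == 'Neutral']
    neutweets = neutweets['tweet']
    print(round((neutweets.shape[0] / df_tweet.shape[0]) * 100, 1))
    # the plotting call is not shown in this listing; a bar plot of the
    # label counts is one way to draw the chart (an assumption)
    df_tweet['Analysis'].value_counts().plot(kind='bar')
    plt.title('Sentiment Analysis')
    plt.show()

    Figure: sentiment analysis of the Twitter data using Python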

    Printing most Liked tweet

    df_tweet[['username','name','tweet','likes_count', 'language', 'Subjectivity', 'Polarity', 'Analysis' ]].sort_values(by=['likes_count'], ascending= False).head(5)

    Printing most retweeted tweet

    df_tweet[['username','name','tweet','retweets_count', 'language', 'Subjectivity', 'Polarity', 'Analysis' ]].sort_values(by=['retweets_count'], ascending= False).head(5)
