Idriss Tsafack

WallStreetBets sentiment analysis and GameStop, AMC, Nokia prediction using R programming

Hey there! Hope you are doing great! In this post I will show how to use sentiment analysis to predict GameStop, AMC and Nokia stock prices using R programming. This idea came to me last week as I was reading all the hot news about the short squeeze that was happening on the stock market. I thought, why not run some sentiment analysis on these assets (GameStop (GME), AMC, Nokia (NOK) or Dogecoin (DOGE)) and predict their daily price?


Photo credit: Brian John (YouTube)


This post is the first in a series of at most three posts, because I am thinking about building a bot around this idea, called "wall street bot".


In this first post, I will show precisely how to scrape YouTube comments about GameStop (GME), AMC, Nokia (NOK) or Dogecoin (DOGE) and see what people think about what is happening on the financial market.


The goal of this post is therefore to walk you through the main procedure and the algorithm you can use to extract comments about GameStop, AMC, BlackBerry, Nokia and Dogecoin. In the next post we will go into more depth by selecting more targeted videos on the topic and computing daily sentiment. We will also extract the historical daily stock prices of these companies in order to see how far these YouTube or Twitter comments can influence stock prices.


YouTube is one of the most popular social media platforms, and it hosts some of the biggest influencers among retail stock traders. Some of those influencers can attract up to 5,000 comments on a single video, which is huge. I therefore decided to look at the sentiment of the retail traders watching those videos.


Sentiment analysis has gained a lot of popularity. This comes from the fact that researchers and market participants tend to agree that past prices are not the only information useful for predicting stock prices. Sentiment analysis is also widely used in marketing, political opinion mining, etc.


Useful packages for stock market sentiment analysis


To be more precise, in this post we will show only the code for the web scraping procedure. We start by importing the essential packages for this purpose. The main packages for this web scraping are: "tuber" to access YouTube, "httpuv" for web authentication, and "syuzhet" for tokenizing the comments and transforming them into numbers. I also added some general-purpose packages. All of them are loaded at the start of the script.
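A minimal sketch of the loading step, listing only the three packages named above. The additional general-purpose packages are not specified in the text, so they are left out here rather than guessed.

```r
# Install once if needed: install.packages(c("tuber", "httpuv", "syuzhet"))
library(tuber)    # client for the YouTube Data API
library(httpuv)   # local web server used during the browser-based OAuth flow
library(syuzhet)  # sentiment extraction from text
```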



Google API Key for YouTube authentication


The next step is to get a Google API key for YouTube authentication. This step is crucial, as it lets our R session authenticate against the YouTube API. You can get the API key and API secret by following Google's procedure for creating API credentials. Once you have the API key and the API secret, store them in variables.


Once that is done, we can call the function yt_oauth(API_Key, Api_secret, token = ""), which handles the authentication.
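As a rough sketch of this step: the variable names API_Key and Api_secret follow the text, and the two strings are placeholders to replace with your own credentials. In tuber the corresponding arguments of yt_oauth() are named app_id and app_secret.

```r
library(tuber)

# Placeholder credentials (assumption): replace with the values from your own Google project.
API_Key    <- "YOUR_API_KEY"
Api_secret <- "YOUR_API_SECRET"

# Starts the authentication; with token = "" no saved token file is found,
# so a browser window should open to ask for consent.
yt_oauth(app_id = API_Key, app_secret = Api_secret, token = "")
```

Keeping the credentials out of any script you share (for example by reading them with Sys.getenv()) is usually a safer habit.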

Extracting comments from YouTube videos


The next step is to select four popular videos uploaded on a large stock market channel (more than one million subscribers). The videos are selected because their topic is GameStop, AMC, Nokia, BlackBerry or the short squeeze. I then took the video ID of each of the four videos on YouTube. I will not name the videos, because we are publishing statistics about them, which their author might not accept. To get the comments of a video, we use the function get_all_comments(video_id = "Video Id"). We did this for all four videos.
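The extraction for several videos can be sketched as follows. The helper name fetch_comments and the video IDs are hypothetical placeholders, and the fetch argument is there only so the function can be tried with a stand-in when no API access is available.

```r
# Fetch the comments of several videos, one data frame per video.
# `fetch` defaults to tuber's get_all_comments().
fetch_comments <- function(video_ids, fetch = tuber::get_all_comments) {
  out <- lapply(video_ids, function(id) fetch(video_id = id))
  names(out) <- video_ids
  out
}

# Hypothetical placeholder IDs; the videos used in this post are not named.
video_ids <- c("VIDEO_ID_1", "VIDEO_ID_2", "VIDEO_ID_3", "VIDEO_ID_4")

# Requires a successful yt_oauth() call beforehand:
# comments_list <- fetch_comments(video_ids)
```

Each element of comments_list would then be the data frame of comments for one video.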



Converting comments into sentiment scores


The extraction step generates, for each selected video, a data frame of 15 variables describing its comments. Taking the variable that contains the comment text, we first use the function iconv() to normalize the character encoding, then pass the result to a sentiment scoring function from syuzhet, such as get_nrc_sentiment(), to convert each comment into sentiment scores. Each comment is thus transformed into a list of ten values, one for each sentiment category considered.
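A minimal sketch of the scoring step on two made-up comments. It assumes the scoring function is syuzhet's get_nrc_sentiment(), which returns ten columns (eight emotions plus negative and positive), and that with real data the text would come from a comment column of the tuber data frame such as textOriginal.

```r
library(syuzhet)

# Toy comments standing in for the comment column of a tuber data frame.
comments <- c("I love this stock, to the moon!",
              "This is a terrible scam, I am scared.")

# Normalize the encoding; sub = "" drops characters that cannot be
# represented in ASCII (for example some emoji).
clean <- iconv(comments, from = "UTF-8", to = "ASCII", sub = "")

# One row per comment, one column per sentiment category.
scores <- get_nrc_sentiment(clean)
colnames(scores)
```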



Statistics on the sentiment scores


Once we have calculated the comment scores, we can derive some statistics from them. You can calculate the mean or the sum of each column. This gives us an idea of how people feel about the topic and which sentiments predominate. We can also calculate the share of each sentiment. For this post I have calculated the percentage that each sentiment represents in the overall score.
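The percentage calculation can be sketched with base R. The score table below is made up for illustration; with real data it would be the output of the scoring step.

```r
# Toy score table: one row per comment, one column per sentiment.
scores <- data.frame(
  anger = c(0, 1), anticipation = c(1, 0), disgust = c(0, 1), fear = c(0, 1),
  joy = c(2, 0), sadness = c(0, 1), surprise = c(1, 0), trust = c(1, 0),
  negative = c(0, 2), positive = c(3, 0)
)

totals  <- colSums(scores)             # total score per sentiment
percent <- 100 * totals / sum(totals)  # share of each sentiment in the overall score
round(percent, 1)
```

Using colMeans(scores) instead of colSums(scores) gives the average score per comment; the percentages are the same either way.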



We can then plot these statistics together using the barplot() function. The call par(mfrow = c(nrow, ncol)) splits the plotting window into a grid with nrow rows and ncol columns in which the graphics are displayed. As we have four different YouTube videos, we use a 2-by-2 grid, i.e. par(mfrow = c(2, 2)).
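A small sketch of the plotting step. The percentages are random placeholder numbers, not results from the videos, and the panel titles are hypothetical.

```r
sentiments <- c("anger", "anticipation", "disgust", "fear", "joy",
                "sadness", "surprise", "trust", "negative", "positive")

# Placeholder percentage vectors for four videos (random, for illustration only).
set.seed(1)
percent_list <- lapply(1:4, function(i) {
  x <- runif(length(sentiments))
  setNames(100 * x / sum(x), sentiments)
})
names(percent_list) <- paste("Video", 1:4)

old_par <- par(mfrow = c(2, 2))  # 2-by-2 grid of panels
for (title in names(percent_list)) {
  barplot(percent_list[[title]], main = title,
          ylab = "Share of score (%)", las = 2, cex.names = 0.7)
}
par(old_par)  # restore the previous layout
```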


Plotting the sentiment scores


The resulting figure shows that, overall, retail traders mostly express a positive sentiment about this short squeeze situation on the stock market. Positive sentiment represents more than 20% of the overall scores. Negative sentiment ranks second (between 10% and 15%), followed by trust (roughly 15%). We can also see that people are more confident and joyful about this situation than fearful.


We can also produce more precise statistics about the topic. The main goal of this post was to walk you through the main procedure and the algorithm you can use to extract comments about GameStop, AMC, BlackBerry, Nokia and Dogecoin. In the next post we will go into more depth by selecting more targeted videos on the topic and computing daily sentiment. We will also extract the historical daily stock prices of these companies in order to see how far these YouTube or Twitter comments can influence stock prices.


So that's all for this post. There are many other features we could cover, but I prefer to keep them for another post. I hope you liked it. If you did, please share it with your friends and your machine learning and data science community.

See you in the next post.



Department of Economics,

 University of California Irvine

© 2018 by IDRISS TSAFACK TEUFACK
