Emoji are full of emotion. They can enhance or adjust the tone and mood of a text, and the emotions they express can be more obvious than words alone.

Take the emoji 👿 (angry face with horns), for example. It is obviously drawn as a devil character, so some people may consider it a "bad" emoji that stands for the evil thoughts in someone's head.

But still, this kind of description is very abstract. Therefore, can we visualize the sentiment types and levels expressed by these emojis?

The answer is "Yes", of course! We performed Natural Language Processing (NLP) on public samples of no fewer than 50 million tweets, and we used Text Sentiment Analysis (also called opinion mining) to associate the sentiment of each emoji with a set of values and visualize it. We spent a lot of time, energy and computing power to make our emoji sentiment analysis more scientific and rigorous, so that everyone can feel the charm of emoji and understand them better.

Visual charts for emoji sentiment analysis

Data such as emoji sentiment values were calculated through text sentiment analysis, which draws on statistics and probability theory. After we solved the problems of algorithms and computing power, we encountered a new difficulty: how could we make users understand such specialized data?

Then a thought crossed our minds: Hey! How about we visualize emoji sentiment values with some cool designs?

💡: We converted the calculated and analyzed data (left) into a chart that users can understand more easily (right), which shows the emotional breakdown of an emoji more intuitively.

So let me explain the following chart, which depicts the results of the sentiment tendency analysis of an emoji in actual communication.

💡: Two types of charts for the sentiment analysis of emoji 👿. Here we only analyze the chart above.

The semicircular arc in the picture is divided into three colors, representing the different emotional tendencies of emoji 👿. As we can see, the proportions of the three emotions are about 4:1:5. Green and orange have similar proportions, which means this emoji is closer to neutral (surprise!!).

  • The gray cursor is the Confidence Level, a statistical concept. Its position and width indicate Expected Value ± Confidence.
  • Expected Value: the probability-weighted average of the discrete random variable for the emoji sentiment value, c ∈ {−1, 0, +1}.
  • Confidence: a parameter of the distribution of the emoji sentiment value that gives a reasonable error range as a plus-or-minus interval. The larger the number of corpus samples, the smaller the error.

Simply put, the closer the cursor is to the left, the more negative the emotion of this emoji. Conversely, the closer the cursor is to the right, the more positive the emotion it expresses. And the narrower the cursor, the more accurate the judgment of the emotion.
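To make "Expected Value ± Confidence" concrete, here is a minimal Python sketch. It assumes the three sentiment classes are coded as −1, 0 and +1, and it uses a normal-approximation interval (roughly 95%) as a stand-in for the confidence calculation, which the chart itself does not spell out. The function name and the counts are made up for illustration. The counts follow the 4:1:5 split mentioned above, and assigning the 4 to negative and the 5 to positive is also an assumption.

```python
import math

def expected_value_and_confidence(neg, neu, pos, z=1.96):
    """Return (expected value, half-width of the interval) from class counts.

    Assumptions: classes are coded as -1 (negative), 0 (neutral) and
    +1 (positive); the interval is a normal approximation (z=1.96).
    """
    n = neg + neu + pos
    if n == 0:
        raise ValueError("at least one sample is required")
    p_neg, p_pos = neg / n, pos / n
    # Probability-weighted average of the discrete values -1, 0, +1.
    mean = p_pos - p_neg
    # Variance of the discrete variable: E[c^2] - (E[c])^2.
    variance = (p_neg + p_pos) - mean ** 2
    half_width = z * math.sqrt(variance / n)
    return mean, half_width

# Hypothetical counts in a 4:1:5 ratio (illustrative, not real data).
small = expected_value_and_confidence(400, 100, 500)
large = expected_value_and_confidence(40_000, 10_000, 50_000)
print(small)
print(large)
```

Both calls give the same expected value, but the larger sample produces a narrower interval, which is the "narrower cursor" described above.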

Now, can you understand our chart of emoji sentiment analysis?

What is Sentiment Analysis?

I believe that you have understood the data chart of emoji sentiment analysis, so let's talk about what sentiment analysis is.

Sentiment analysis is also called opinion mining. Technically speaking, it is part of natural language processing (NLP) research. Sentiment analysis methods fall into two types: machine learning methods and dictionary-based methods. With the development of deep learning technology, however, using deep learning for sentiment analysis has become the mainstream.

The sentiment analysis process includes data preprocessing, feature engineering and model training. Generally speaking, the data preprocessing stage mainly splits the text into words and removes stop words and punctuation. However, our sentiment analysis retains punctuation marks and stop words. In the feature engineering stage, we chose the word embedding representation (Word2Vec) proposed by the Google team in 2013, which comes in two variants: the CBOW (continuous bag of words) model and the Skip-gram model. The model structures are as follows:

💡: On the left is the CBOW model; on the right is the Skip-gram model.

CBOW trains word vectors by predicting the target word from its context. As shown in the figure, W(t) is predicted from the four words W(t-2), W(t-1), W(t+1) and W(t+2). Skip-gram does the opposite: it trains word vectors by predicting the surrounding words from the target word. As shown in the figure, W(t-2), W(t-1), W(t+1) and W(t+2) are predicted from W(t).

In the model prediction stage, we divide the data into two parts, a training set and a test set, at a ratio of 5:1, and both sets are shuffled.
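The difference between the two models is easiest to see in the training examples each one is fed. The following is a small sketch in plain Python, not a Word2Vec implementation: it only builds the (input, output) pairs for a window of two words on each side. The function names and the sample sentence are hypothetical.

```python
def cbow_pairs(tokens, window=2):
    """CBOW: the context words are the input, the target word is the output."""
    pairs = []
    for t, target in enumerate(tokens):
        context = tokens[max(0, t - window):t] + tokens[t + 1:t + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: the target word is the input, each context word is an output."""
    pairs = []
    for t, target in enumerate(tokens):
        context = tokens[max(0, t - window):t] + tokens[t + 1:t + 1 + window]
        for word in context:
            pairs.append((target, word))
    return pairs

tokens = ["i", "love", "this", "emoji", "😊"]
print(cbow_pairs(tokens)[2])       # W(t) = "this" predicted from its four neighbours
print(skipgram_pairs(tokens)[5:9]) # the four neighbours predicted from "this"
```

A real Word2Vec model then learns the word vectors by training a shallow network on pairs like these.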
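A shuffled 5:1 split can be sketched in a few lines of Python. The function name, the seed and the toy data are assumptions for illustration only. Here the samples are shuffled once before being cut into the two sets.

```python
import random

def shuffle_and_split(samples, train_parts=5, test_parts=1, seed=0):
    """Shuffle a copy of the samples, then cut it at train_parts:test_parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]

data = list(range(12))  # stand-in for 12 labeled tweets
train, test = shuffle_and_split(data)
print(len(train), len(test))
```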

Application of sentiment analysis in emoji

Sentiment analysis is a comprehensive analysis method that combines deep learning and statistics. We obtained the sentiment values of emoji after complex analysis and calculation over reams of data. The complete emoji sentiment analysis process is as follows:

The process of Emoji Sentiment Analysis

  • Label social networking corpus
  • Data preprocessing
  • Divide the dataset: Training Set (80%), Testing Set (20%)
  • Use LSTM neural network to build a model
  • According to the performance of the model on the test set, adjust the hyperparameters to improve the generalization ability of the model
  • Perform the same data preprocessing on the unlabeled data
  • Use the trained sentiment prediction model to predict the sentiment tendency on unlabeled data
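For the two preprocessing steps in the list above, the important point is that labeled and unlabeled text go through exactly the same function. Below is a minimal sketch of a tokenizer that keeps punctuation, stop words and emoji, as described earlier. The regular expression, the function name, the sample texts and the 0/1 labels are assumptions for illustration, not our production code.

```python
import re

# A token is either a run of word characters or one single non-space symbol.
# Assumption: each emoji is a single code point. Multi-code-point emoji
# (skin tones, ZWJ sequences) would be split and need a dedicated library.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def preprocess(text):
    """Lowercase and tokenize, keeping punctuation, stop words and emoji."""
    return TOKEN_PATTERN.findall(text.lower())

# Hypothetical samples: (text, label) pairs for training, raw text for prediction.
labeled = [("I am not happy!! 👿", 0), ("Best day ever 😊", 1)]
unlabeled = ["Is this good? 🤒"]

train_tokens = [(preprocess(text), label) for text, label in labeled]
predict_tokens = [preprocess(text) for text in unlabeled]
print(train_tokens[0])
print(predict_tokens[0])
```

Keeping "not" and "!!" in the token stream preserves signals that often carry sentiment.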

We perform sentiment analysis on emoji by using deep learning to train an emoji text sentiment classifier. For the output layer of the classifier, we chose the sigmoid activation function, which projects the output into the interval from 0 to 1. The closer the output is to 0, the more negative the text is, and the closer it is to 1, the more positive it is.

The sigmoid function formula is: F(x) = 1 / (1 + e^(-x))
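As a quick illustration, the sigmoid function can be written in Python as follows. This is a standalone sketch rather than the classifier's actual output layer. The two branches are algebraically the same formula and only avoid overflow for very negative inputs.

```python
import math

def sigmoid(x):
    """Map any real number into the interval from 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form for negative x, so exp() never receives a large positive value.
    z = math.exp(x)
    return z / (1.0 + z)

for value in (-4.0, 0.0, 4.0):
    print(value, sigmoid(value))
```

An input of 0 maps to exactly 0.5. Large negative inputs approach 0 and large positive inputs approach 1.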

We use a large sample of 50 million tweets containing emoji as the analysis corpus, and feed this corpus into the trained sentiment classifier for sentiment prediction. Finally, the results predicted by the classifier are divided into three types: negative, neutral and positive. The classification criteria are:

Anyway, it is difficult, but we made it!!
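To show how classifier scores turn into the three-color breakdown in the chart, here is a rough Python sketch. The two cut-off values are passed in as arguments. The numbers used in the example call (0.4 and 0.6) and the scores themselves are placeholders chosen only for illustration, and the function names are hypothetical.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")

def classify(score, lower, upper):
    """Bucket a sigmoid score in [0, 1] using two caller-supplied cut-offs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    if score < lower:
        return "negative"
    if score > upper:
        return "positive"
    return "neutral"

def class_shares(scores, lower, upper):
    """Return the share of each class among the given scores."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(classify(s, lower, upper) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}

# Placeholder scores and cut-offs, for illustration only.
scores = [0.05, 0.10, 0.45, 0.55, 0.90, 0.95, 0.97, 0.20]
shares = class_shares(scores, lower=0.4, upper=0.6)
print(shares)
```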

The usage and prospect of Emoji Sentiment Analysis

Sentiment analysis is widely used in marketing, advertising, psychology, medicine and other fields. We decided to do emoji sentiment analysis to help people understand emoji more deeply in actual social interaction and to eliminate ambiguity and misunderstanding of emoji more effectively.

For example, when you are chatting with your friend (or your crush) online, the other person may sometimes send you emojis that you don't quite understand. Actually, there are many situations like this, such as:

  • What does 🤒 mean if someone replies to my photo?
  • What does it mean when a girl sends 😊 to me?
  • If my crush sends me , is that a good sign?

As we know, you cannot tell the exact meaning of an emoji. However, through our sentiment analysis, you may be able to analyze these emojis psychologically, like "she sent me a 💞, it is a positive sign, maybe I should ask her out" or "why did my boyfriend send me a 😒? It is kind of negative, is he mad at me?", something like this. With our Emoji Sentiment Analysis, I believe you can understand emoji more deeply.

All in all, we have put a lot into this emoji sentiment analysis project. Even though it is difficult, we still want to take it further. We are now analyzing emoji in different languages and the emotional gain effect of emoji on pure text, and several other advanced studies are also being considered. All of this is to explore more uses of emoji, and we hope you find emoji interesting and informative.