Every emoji is special. Analyzed in the right way, an emoji can reveal a lot of information at a glance👀: its meanings, category, related topics, and even some unique usages. After running a large amount of calculation and analysis on every emoji, we built the Emoji Tag Cloud: keywords and phrases related to a specific emoji, presented in a novel visual way.

🔺The Tag Cloud of emoji [unicorn🦄]

How do we get the tags?

Twitter is a globally popular social networking service, and people like to tweet text with emojis to express their emotions or simply as decoration. Most of our tags come from tweets from all over the world. We analyzed tweets from 2018.01 to 2021.11 and extracted tags from 812 million tweets that contain emojis. We then used statistics and algorithms to pick out the text that is highly relevant to a specific emoji and to identify its language. Through this, we can even see how an emoji is used in different countries.

Here we use the English site as an example. Take these 2 emojis on our English pages: 😎 (smiling face with sunglasses) and 🦄 (unicorn). Running tag extraction on them gives words like [cool] or [nicki].

This raises a new problem❓: a lot of text is related to any given emoji, so how do we choose the most suitable tags, and how do we rank them?

This is where further algorithms come in.

The technical explanation of tag extraction

There are many tag extraction techniques with different effects, such as abstract extraction for articles and keyword tagging algorithms for short texts. Our "Emoji Tag Cloud-Twitter Tag Extraction" is an unsupervised short-text label extraction algorithm based on TF-IDF, with the process modified to suit the characteristics of Twitter data.

To make this easier to follow, we summarize the tag extraction procedure in 3 steps.

  • First, working one month at a time, we extract the emojis and clean the text of each tweet: we remove nicknames and topics such as [@xx] and [#xx], and delete tweet URLs. During text cleaning we also filter out stop words in different languages (for example, modal particles like ah and oh) and account for abbreviations, word forms, capitalization and other factors. The result is word frequency data for each emoji.
  • Second, we use the TF-IDF (term frequency-inverse document frequency) text representation algorithm to calculate an initial weight for each label text, based on the results of the previous step.
  • The calculation formula is: TF-IDF = TF * IDF

    TF (Term Frequency) is the number of occurrences of a word for an emoji divided by the total number of words for that emoji. IDF (Inverse Document Frequency) is IDF = log( N / N(w) ), where [N] is the total number of emojis and [N(w)] is the number of emojis whose text contains the word [w].

    🔺When a word appears with both emojiA and emojiB, the word is not representative enough, and its weight should be reduced. From the formula IDF = log( N / N(w) ), it can be seen that IDF decreases as N(w) increases: it is largest, log(N), when the word appears with only one emoji, and falls to 0 when the word appears with every emoji.

    The more often a term appears in an article, the greater its weight. However, the words that appear most often are usually words that express tone or have no actual meaning, such as [aww], [oh] or [RT]. Sorting and filtering by the TF value alone cannot reliably remove such words, so IDF is introduced as a constraint to give a more accurate weight for the label text.

    At the end of step 2, we filter out the entries that appear with more than 15% of all emojis.

  • Third, the first two steps work on monthly data, and the full data set covers four years, so the last step performs another round of consolidated statistics over all the monthly data.
  • We convert the calculated four-year tweet data into the form [(sum(tfidf_m) / M) * log(M)] to calculate the final weight of each tag entry. [sum(tfidf_m)] is the sum of the term's TF-IDF values over all months, and [M] is the number of months in which the term appears.
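
Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual pipeline: the stop-word list, the two-emoji set, the sample tweets and the function names (clean_tweet, word_counts_per_emoji, tf_idf) are made up for the example, and the 15% rule is read as "drop words that appear with more than 15% of all emojis".

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real pipeline would use full per-language lists.
STOP_WORDS = {"ah", "oh", "aww", "rt", "is", "the", "so", "for"}

# Assumed emoji set for the demo. Both are single code points, so a
# character-level check is enough here; emoji sequences would need more care.
EMOJIS = {"😎", "🦄"}


def clean_tweet(text):
    """Step 1: return (emojis found, cleaned words) for one tweet."""
    found = {ch for ch in text if ch in EMOJIS}
    text = re.sub(r"https?://\S+", " ", text)    # delete URLs
    text = re.sub(r"[@#]\w+", " ", text)         # remove [@xx] and [#xx]
    words = re.findall(r"[a-z]+", text.lower())  # lowercase, ASCII letters only
    return found, [w for w in words if w not in STOP_WORDS]


def word_counts_per_emoji(tweets):
    """Step 1 result: word frequency data for each emoji."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        found, words = clean_tweet(tweet)
        for emoji in found:
            counts[emoji].update(words)
    return counts


def tf_idf(counts, max_share=0.15):
    """Step 2: TF-IDF = TF * IDF, with IDF = log(N / N(w))."""
    n_emojis = len(counts)  # N
    # N(w): number of emojis whose word list contains w
    emoji_freq = Counter(w for c in counts.values() for w in c)
    weights = {}
    for emoji, c in counts.items():
        total = sum(c.values())
        weights[emoji] = {
            w: (n / total) * math.log(n_emojis / emoji_freq[w])
            for w, n in c.items()
            if emoji_freq[w] / n_emojis <= max_share  # assumed reading of the 15% filter
        }
    return weights


# Made-up sample tweets standing in for one month of data.
tweets = [
    "RT @fan: nicki is the queen 🦄 #barbz https://t.co/x",
    "unicorn cake for nicki 🦄",
    "so cool 😎",
    "cool cake 😎 oh",
]
# With only two emojis, 15% would remove every word, so the demo uses 50%.
weights = tf_idf(word_counts_per_emoji(tweets), max_share=0.5)
for emoji, tags in weights.items():
    print(emoji, sorted(tags, key=tags.get, reverse=True))
```

In this toy data [cake] appears with both emojis, so it is dropped, while [nicki] comes out as the highest-weighted tag for 🦄.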

This, then, is the approximate calculation method for emoji tag data. After the final data is summarized, we also manually check and filter it by language to get more accurate tag results.
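
The consolidation formula from step 3, [(sum(tfidf_m) / M) * log(M)], can also be sketched in Python. Again this is a hedged illustration: the function name combine_months and the monthly values are invented, and the natural logarithm is assumed because the text does not state the base.

```python
import math


def combine_months(monthly_weights):
    """Combine per-month TF-IDF dicts for one emoji into final tag weights.

    monthly_weights: list of {tag: tfidf} dicts, one per month.
    Final weight = (sum(tfidf_m) / M) * log(M), where M is the number of
    months in which the tag appears.
    """
    totals, months_seen = {}, {}
    for month in monthly_weights:
        for tag, value in month.items():
            totals[tag] = totals.get(tag, 0.0) + value
            months_seen[tag] = months_seen.get(tag, 0) + 1
    return {
        tag: (totals[tag] / months_seen[tag]) * math.log(months_seen[tag])
        for tag in totals
    }


# Made-up monthly TF-IDF values for a single emoji.
monthly = [
    {"nicki": 0.30, "queen": 0.10},
    {"nicki": 0.20},
    {"nicki": 0.40, "plt": 0.50},
]
final = combine_months(monthly)
ranked = sorted(final, key=final.get, reverse=True)
print(ranked[0], final)
```

Taken literally, the formula gives a weight of 0 to any tag seen in only one month, because log(1) = 0 in every base. It therefore favors tags that stay associated with an emoji over many months rather than one-off spikes, as [plt] shows in this toy data.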


In addition, the tags also draw on the CLDR short name and CLDR keywords. These are the most basic tag texts, which means you will always see some of these words in the Emoji Tag Cloud.

🔺 When an emoji is submitted to the Unicode Consortium, its proposal must include a CLDR short name and CLDR keywords, so these words have to be considered when choosing tags. For emoji [unicorn🦄], we put its short name and some of its keywords into its tag cloud.

How do you use our Emoji Tag Cloud?

It has been quite a while since we released the Emoji Tag Cloud. Personally, I believe it is a fun and useful tool for observing a specific emoji; sometimes you can even see which groups or trending topics favor it. Let me show you how to use our Emoji Tag Cloud!

As mentioned above, each tag text has a different weight. You can judge how closely a tag relates to the emoji by the size of its circle (the bigger, the more relevant). You can also hover your mouse over a circle, and a small box showing [number, tag text] will appear. The smaller the number, the more relevant the tag in that circle is to the emoji. You can also click a tag to search for other related emojis!

Again, we use the unicorn as an example. The Tag Cloud of emoji [unicorn🦄] looks like this:

As you can see, the Top 5 tags of 🦄 are [unicorn], [nicki], [unicorns], [plt] and [barbz].

The word [unicorn] is the CLDR short name of this emoji, and the other 4 tags are all extracted from Twitter. [nicki] and [barbz] relate to Nicki Minaj and her fan group, and [plt] refers to [Pretty Little Thing], either the UK-based fashion retailer or just the phrase itself. 🦄 is a popular emoji on social networks and a very representative emoji for Nicki Minaj fans. If you love Nicki, you should definitely use this emoji!


All in all, the Emoji Tag Cloud makes it easy to find content related to a specific emoji. You may even learn more about pop culture and avoid the embarrassment of not knowing the basic and extended meanings of an emoji. Sometimes the use of an emoji becomes a social phenomenon (like 🥺 in Japan), so for some people the Emoji Tag Cloud is also a great tool for learning about internet culture. It all depends on how you use it.

All of this is meant to explore more uses of emoji, and we hope you find emoji interesting and informative. To provide you with accurate emoji-related content, we will keep updating the data. If you have any advice about the Emoji Tag Cloud, please let us know in the comments below👇!