KathNiel, Twitter bots, polls: Quality, not just buzz
Just past midnight on November 25, 2015, a Twitter user named @chiminychurva posted a tweet mentioning Philippine presidential candidate Rodrigo Duterte.
There was nothing really unusual about the tweet (we'll come back later to what it actually said). It was not retweeted, favorited, or even replied to. The poster’s account, with only 329 followers, isn’t especially popular.
But within the next minute-and-a-half after this tweet was posted, 6 more tweets mentioning Duterte were posted. The next minute, 38 more Duterte tweets were posted. The minute after that, 113 more.
By 12:20 am, tweets mentioning Duterte were being posted at a rate of 237 per minute. The surge would continue over the next two hours, peaking at 703 tweets per minute by 1:03 am before eventually coming to an abrupt halt just past 2:00 am.
In the two hours and 15 minutes between 12:00 am and 2:15 am on November 25, 2015, a total of 30,195 tweets mentioning Duterte were posted. This was more than the number of tweets posted about him when the Davao City mayor announced his presidential bid 4 days earlier, on November 21. It was even more than all the tweets that had been posted about any presidential candidate over the previous 29 days.
In the world of celebrity and politics, Twitter buzz is like currency – a way of measuring the popularity of a person, brand, or topic. It’s assumed that the more people are tweeting about you, the more famous and influential you are. To some extent, that’s true. But like all forms of measurement, it’s imperfect.
While more than 30,000 tweets mentioning Duterte were posted on November 25, we were surprised to learn that these were probably not beneficial to his campaign, and that his inclusion in those tweets had more to do with the ABS-CBN love team #KathNiel than with politics.
Looking for bots
Over a month-long period between October 26 and November 25, 2015, our team captured a dataset of 116,071 tweets about the Philippine presidential candidates. We first built this Twitter pipeline to better understand how Filipinos are using Twitter to talk about the elections. Read our earlier blog post about the most-used emojis in tweets about the Philippine presidential candidates.
One of the things we wanted to investigate in this data was how much of the Twitter conversation about the Philippine presidential candidates might be artificially driven by Twitter bots.
Wikipedia defines a Twitterbot as “a program used to produce automated posts on the Twitter microblogging service, or to automatically follow Twitter users.” A quick Google search will lead you to many online vendors that sell followers, favorites, retweets, and mentions for as little as a few dollars.
We first tried using Indiana University’s existing “bot or not” application. This application and API analyze over 1,000 of a Twitter user’s features, like their networks of friends and followers and the content of their tweets, to assign that user a “bot or not” or BON score. The higher the score, the more likely that an account is a bot.
Unfortunately, because it takes about one to two minutes to get the BON score of one Twitter account, it would take us at least 24 straight days to get BON scores for all 34,455 users in our sample. We decided to find other strategies for identifying bots using the data we had already scraped and stored.
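The 24-day figure can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes accounts are scored one at a time with no pauses, and the function name is ours, not part of the “bot or not” API.

```python
# Figures taken from the article: 34,455 users, one to two minutes per lookup.
USERS_IN_SAMPLE = 34_455
MINUTES_PER_DAY = 24 * 60


def days_needed(users: int, minutes_per_user: float) -> float:
    """Days of non-stop, one-at-a-time lookups needed to score every user."""
    return users * minutes_per_user / MINUTES_PER_DAY


best_case = days_needed(USERS_IN_SAMPLE, 1)   # one minute per account
worst_case = days_needed(USERS_IN_SAMPLE, 2)  # two minutes per account
print(f"{best_case:.1f} to {worst_case:.1f} days")  # 23.9 to 47.9 days
```

At one minute per account the job takes just under 24 days, and at two minutes it takes nearly 48, which is why “at least 24 straight days” is the optimistic end of the range.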
So we researched other typical characteristics of Twitterbot accounts. Some say that Twitter users who post an insane number of tweets are likely to be bots. But what number qualifies as “insane?”
On average, users in our sample tweet around 22 times a day. So we figured it would be safe to assume that users who tweet more than a few hundred times a day are probably bots. We found that 3.4% of the users in our sample – around 1,216 – tweet more than 200 times a day. Let’s call them “high-frequency tweeters.” Here’s a quick look at some of the most prolific ones:
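A threshold like this is simple to apply in code. The sketch below is illustrative only: the article does not say how tweets per day was computed, so we assume it is an account's lifetime tweet count divided by its age in days, and the field names and sample accounts are hypothetical.

```python
from datetime import datetime

TWEETS_PER_DAY_THRESHOLD = 200  # cut-off used in the article


def tweets_per_day(total_tweets: int, created_at: datetime,
                   observed_at: datetime) -> float:
    """Average daily tweets over the account's lifetime (assumed definition)."""
    age_days = (observed_at - created_at).total_seconds() / 86_400
    return total_tweets / max(age_days, 1.0)  # treat very new accounts as 1 day old


def high_frequency_tweeters(users, observed_at,
                            threshold=TWEETS_PER_DAY_THRESHOLD):
    """Names of users whose average daily tweet count exceeds the threshold."""
    return [
        user["name"]
        for user in users
        if tweets_per_day(user["total_tweets"], user["created_at"],
                          observed_at) > threshold
    ]


# Hypothetical accounts, for illustration only.
sample_users = [
    {"name": "hypothetical_user_a", "total_tweets": 300_000,
     "created_at": datetime(2015, 1, 1)},
    {"name": "hypothetical_user_b", "total_tweets": 8_000,
     "created_at": datetime(2014, 11, 25)},
]
flagged = high_frequency_tweeters(sample_users, datetime(2015, 11, 25))
print(flagged)
```

Only the first hypothetical account is flagged: it averages roughly 900 tweets a day, while the second averages about 22.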
What could be spammier than accounts that tweet literally thousands of times per day? Some of these accounts’ usernames were also pretty bot-like. For example, take a look at users @lepitennicell10, @lepitennicell4, @lepitennicell2, @lepitennicell69, and @lepitennicell1.
They have almost the same username, differentiated only by a few extra numbers or letters, suggesting that they were created with the help of a computer script designed to quickly generate a lot of unique usernames – useful if you're trying to open a bunch of Twitter accounts at the same time.
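One rough way to surface such username families is to strip the trailing digits from each username and group accounts that share the same stem. This is a minimal sketch rather than the method used in the analysis: it only handles trailing numbers (not extra letters), and the minimum family size of 3 is an arbitrary assumption.

```python
import re
from collections import defaultdict


def username_stem(username: str) -> str:
    """Lower-case the name, drop a leading '@' and any trailing digits."""
    return re.sub(r"\d+$", "", username.lstrip("@").lower())


def find_username_families(usernames, min_size=3):
    """Group usernames by stem and keep stems shared by at least min_size accounts."""
    families = defaultdict(list)
    for name in usernames:
        families[username_stem(name)].append(name)
    return {stem: names for stem, names in families.items()
            if len(names) >= min_size}


# Usernames mentioned in the article.
usernames = [
    "@lepitennicell10", "@lepitennicell4", "@lepitennicell2",
    "@lepitennicell69", "@lepitennicell1", "@chiminychurva",
]
families = find_username_families(usernames)
print(families)
```

The five @lepitennicell accounts collapse into a single family, while @chiminychurva is left out because no other account shares its stem.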
About 48% of tweets about Duterte were posted by high-frequency tweeters, versus only 2% to 6% of the tweets about the other candidates.
It's great publicity for a brand or personality to bag a fleeting spot on Twitter’s list of trending topics. But to make a topic trend, you need a lot of different users tweeting about your topic at the same time. Not enough real people willing to do this? You can also use Twitterbots to make it happen.
To find more bot-created tweets, we looked for groups of 5 or more tweets that were posted at exactly the same time (down to the second) and at least 80% similar in content – let’s call these “mirror tweets.”
We measured the similarity between simultaneously posted tweets using a natural language processing metric called “Levenshtein ratio”, which scores the similarity between two bits of text on a scale of 0 (totally different) to 1 (exactly the same).
We figure that if 2, 5, or even 10 people coincidentally tweet at exactly the same time, their posts would at least be varied. But if 10 different users tweet almost exactly the same message at the exact same time, they’re probably bots.
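The mirror-tweet check can be sketched in a few dozen lines. Several things here are assumptions rather than details from the analysis: the ratio is computed as 1 minus the edit distance divided by the longer text's length (libraries may normalise differently), tweets within the same second are clustered greedily against the first tweet of each cluster, and the sample tweets are made up.

```python
from collections import defaultdict


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Similarity from 0 (totally different) to 1 (identical); assumed normalisation."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def find_mirror_tweets(tweets, min_group=5, min_ratio=0.8):
    """tweets: iterable of (timestamp, user, text). Returns (timestamp, cluster) pairs."""
    by_second = defaultdict(list)
    for timestamp, user, text in tweets:
        by_second[timestamp].append((user, text))

    mirror_groups = []
    for timestamp, posts in by_second.items():
        if len(posts) < min_group:
            continue  # too few simultaneous tweets to form a group
        clusters = []
        for user, text in posts:
            for cluster in clusters:
                # Compare against the first tweet in the cluster.
                if similarity_ratio(cluster[0][1], text) >= min_ratio:
                    cluster.append((user, text))
                    break
            else:
                clusters.append([(user, text)])
        mirror_groups.extend((timestamp, c) for c in clusters if len(c) >= min_group)
    return mirror_groups


# Made-up tweets: five near-identical posts in one second, plus two others.
sample = [("2015-11-25 00:20:01", f"bot{i}", f"same message here {i}")
          for i in range(1, 6)]
sample.append(("2015-11-25 00:20:01", "human1", "something entirely different"))
sample.append(("2015-11-25 00:20:02", "human2", "same message here 9"))
groups = find_mirror_tweets(sample)
print(len(groups), len(groups[0][1]))  # 1 5
```

The unrelated tweet posted in the same second is left out, and the lone tweet posted a second later never qualifies, however similar its text.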
We found that about 26% of the tweets mentioning Duterte were mirror tweets, a far higher proportion than for the other candidates:
Who has the “bottiest” tweets?
So far, we’ve tried to identify bot-posted tweets two ways:
- Were the tweets posted by users who tweet more than 200 times a day?
- Did the tweets belong to clusters of very similar, simultaneously posted tweets?
There were about 12,629 tweets in our sample that fit both these criteria. Let’s call them “botty” tweets. The chart below shows botty tweets as a proportion of the total tweets about each candidate:
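Combining the two criteria amounts to a logical AND on each tweet, followed by a per-candidate share. The sketch below assumes each tweet has already been labelled with the two flags; the field names, candidate labels, and sample rows are hypothetical.

```python
from collections import Counter


def botty_share_by_candidate(tweets):
    """Share of each candidate's tweets that meet both bot criteria."""
    totals, botty = Counter(), Counter()
    for tweet in tweets:
        candidate = tweet["candidate"]
        totals[candidate] += 1
        # A tweet is "botty" only if it satisfies BOTH criteria.
        if tweet["from_high_frequency_user"] and tweet["is_mirror_tweet"]:
            botty[candidate] += 1
    return {candidate: botty[candidate] / totals[candidate]
            for candidate in totals}


# Hypothetical labelled tweets, for illustration only.
labelled = [
    {"candidate": "Candidate A", "from_high_frequency_user": True, "is_mirror_tweet": True},
    {"candidate": "Candidate A", "from_high_frequency_user": True, "is_mirror_tweet": True},
    {"candidate": "Candidate A", "from_high_frequency_user": True, "is_mirror_tweet": False},
    {"candidate": "Candidate A", "from_high_frequency_user": False, "is_mirror_tweet": False},
    {"candidate": "Candidate B", "from_high_frequency_user": False, "is_mirror_tweet": True},
    {"candidate": "Candidate B", "from_high_frequency_user": False, "is_mirror_tweet": False},
]
shares = botty_share_by_candidate(labelled)
print(shares)  # {'Candidate A': 0.5, 'Candidate B': 0.0}
```

Requiring both flags is deliberately strict: a tweet from a prolific account that is not part of a mirror cluster, or a mirror tweet from an ordinary account, does not count.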
While it’s true that during the sample period, there were more botty tweets about Duterte than about other candidates, we also knew that there was a really unusual surge of Duterte-related tweets on November 25. If most of the botty tweets about Duterte were posted during that surge, we can’t generalize that Duterte also has more botty tweets on a regular, consistent basis.
We checked how many botty tweets were posted on each day during our sample period. Sure enough, almost all botty tweets were posted on November 25:
The next obvious question: Why? What happened on that day that could have caused a bunch of bots to tweet about Duterte? And what does this mean?
We wondered if there were any newsworthy events related to Duterte that happened on November 25. But the biggest Duterte-related news during this time had already happened on November 21, four days earlier, when Duterte announced his candidacy.
So to understand this surge of botty tweets, we had to take a closer look at what they were actually saying. Here’s a random sample of some of the tweets posted during that two-hour period:
Seeing a pattern here? Not only were these tweets similar, but most of them were direct reposts of the exact same tweet. The very first instance of this tweet was posted by user @chiminychurva at 12:14 am that day. A few moments later, it was echoed by a few more bot-like accounts similarly named "nightybutera."
Then the floodgates opened. The chart below shows the frequency of tweets about the presidential candidates posted every three minutes that evening. Those containing the exact phrase “Kathniel Mode pala si Duterte” are in blue:
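A tally like this can be produced by rounding each tweet's timestamp down to the start of its three-minute window and counting matches and non-matches separately. This is a minimal sketch with made-up timestamps, not the pipeline behind the chart; the rounding shortcut assumes the bin width divides 60 evenly.

```python
from collections import Counter
from datetime import datetime, timedelta

PHRASE = "Kathniel Mode pala si Duterte"  # exact phrase quoted in the article


def bin_start(timestamp: datetime, minutes: int = 3) -> datetime:
    """Round a timestamp down to the start of its bin (minutes must divide 60)."""
    overshoot = timedelta(minutes=timestamp.minute % minutes,
                          seconds=timestamp.second,
                          microseconds=timestamp.microsecond)
    return timestamp - overshoot


def count_by_bin(tweets, minutes: int = 3):
    """tweets: iterable of (timestamp, text). Returns (with_phrase, without_phrase)."""
    with_phrase, without_phrase = Counter(), Counter()
    for timestamp, text in tweets:
        target = with_phrase if PHRASE in text else without_phrase
        target[bin_start(timestamp, minutes)] += 1
    return with_phrase, without_phrase


# Made-up timestamps and filler text, for illustration only.
sample = [
    (datetime(2015, 11, 25, 0, 14, 5), "Kathniel Mode pala si Duterte"),
    (datetime(2015, 11, 25, 0, 14, 50), "RT Kathniel Mode pala si Duterte"),
    (datetime(2015, 11, 25, 0, 16, 10), "an unrelated tweet about the election"),
]
matching, other = count_by_bin(sample)
print(matching[datetime(2015, 11, 25, 0, 12)])  # 2
print(other[datetime(2015, 11, 25, 0, 15)])     # 1
```

Both counters share the same bin keys, so they can be plotted as two series on one time axis.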
One tweet to rule them all
Not only were these tweets all essentially the same message, but they didn’t even speak positively about Duterte. For the benefit of those who aren’t familiar with Philippine showbiz or politics, let’s break down the tweet’s layers of meaning, one by one:
In a nutshell, the tweet pokes fun at Duterte by comparing him to ABS-CBN’s on-screen love team of Kathryn Bernardo and Daniel Padilla, also known to their fans as “KathNiel.” But what does one have to do with the other?
First, it’s important to remember that Duterte had just announced his presidential bid 4 days before this tweet was posted. Before that, Duterte had repeatedly insisted he had no intention to run for president, despite intense lobbying from his enthusiastic supporters. After a months-long “will he or won’t he?” tease, he finally announced his candidacy on November 21.
The second important piece of the puzzle to understand is KathNiel and the #PushAwards. The KathNiel love team has a rabidly vocal fanbase on Twitter, who often work feverishly to make KathNiel-related hashtags trend.
A few weeks before this tweet was posted, KathNiel fans were campaigning for the couple to win at the #PushAwards, ABS-CBN's first digital media awards, which were conducted via online voting. KathNiel eventually won several categories at the #PushAwards awarding ceremony on November 11.
In summary, the tweet sarcastically compares Duterte's campaign for votes to KathNiel's own campaign to win at the #PushAwards.
We doubt that Duterte’s camp or supporters were behind these bots. After all, this tweet isn’t exactly good PR for his campaign. By using the hashtag #DuArte2016 and the word “trapo”, the tweet criticizes Duterte for being overly dramatic and a scheming, traditional politician.
That’s not exactly a favorable view of the candidate, although some would argue that campaign operators don’t distinguish between bad and good publicity.
That leaves one to wonder, why did this tweet trend at all?
If you look more closely at the bot-like Twitter accounts that contributed to this trend, most of their timelines are not filled with Duterte-related tweets, but with tweets about KathNiel.
In fact, until as late as February, some of these bots were still rebroadcasting #PushAwardsKathNiels tweets. This tells us that the bots were probably created not to promote Duterte, but to promote KathNiel and the #PushAwards, and that the inclusion of Duterte’s name in the tweet could have been a coincidence.
Bots and buzz
While trending topics are sometimes powered by real users, our analysis has revealed a clear case of a Twitter trend fueled almost entirely by a reserve army of bot-like accounts.
Equally interesting is the fact that candidates don’t necessarily benefit from their name being mentioned thousands of times on Twitter, especially when bots dilute the diversity and authenticity of the conversation.
When it comes to evaluating a candidate’s popularity based on their Twitter buzz, perhaps quality should matter more than quantity this election season. – Rappler.com
Thinking Machines Data Science is a global data science consultancy that helps companies with data strategy, data-driven storytelling, and machine learning. See our blog for more data stories at stories.thinkingmachin.es.