Archive for December, 2014

The Mexican GovBots Did NOT Take Down #YaMeCanse, But We Can Keep #YaMeCanse# Trending

Monday, December 8th, 2014

Perhaps it was the excitement of hearing about a new phenomenon in censorship that prompted me to write a little too hastily about how the government of Mexico might have used Twitter bots to spam and trash the #YaMeCanse hashtag out of the trending topics list. As reported by Lo Que Sigue, Sopitas.com, Aristegui Noticias, and myself, #YaMeCanse, the hashtag used as the rallying cry for Mexico’s 43 missing students, was suddenly dropped from the trending topics list by an army of bots, presumably coordinated by the federal government. None of us provided proof that the government was behind this, but a series of videos and screenshots originally provided by Lo Que Sigue led us to believe that a swarm of bots was at least responsible.

It was NOT the bots

December 3 at 10:36 AM was the last time @TrendieMX reported #YaMeCanse to be trending. By 9:36 PM of the next day, #YaMeCanse2 was already trending. Let’s take a look at what the Topsy trends for #YaMeCanse look like for the month of November and the first days of December.

Usage of #YaMeCanse

 

To make sense of what happened, we need to understand what Twitter is doing to calculate a trending topic. We don’t have access to specific information about how the trending algorithm functions, but we do know how trending algorithms work in general, and we have some clues about what Twitter has done in the past to tweak its algorithms. The relevant issue here can be described as “the Justin Bieber problem”. Many of you might remember how some years ago Justin Bieber was constantly trending due to the millions of Beliebers continuously tweeting about him. Twitter wants to tell us what’s trending right now, and not one hour ago or one month ago. As Twitter is quoted saying in this Mashable article:

“The new algorithm identifies topics that are immediately popular, rather than topics that have been popular for a while or on a daily basis, to help people discover the ‘most breaking’ breaking news from across the world. (We had previously built in this ’emergent’ algorithm for all local trends, described below.) We think that trending topics which capture the hottest emerging trends and topics of discussion on Twitter are the most interesting.”

Instead of merely looking at volume of Bieber tweets (of which there are many), Twitter looks at speed and “burstiness” of the tweets. However, there’s more to it. If Twitter only measured “burstiness”, you might see “Good Morning” trending every morning of every day. For this, Twitter establishes a baseline of expected frequencies based on history. Twitter “knows” there is usually a spike of “Good Morning” tweets every morning and corrects for this. As this video on trend detection in twitter social data explains, a ratio is calculated for each term based on the past frequency of the term and the present frequency.
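To make the ratio idea concrete, here is a minimal sketch in Python. It is not Twitter's actual algorithm, which is private; the function name, the smoothing term, and all of the numbers are my own assumptions, chosen only to show how a present-versus-past ratio treats a high-volume habitual phrase differently from a brand-new one.

```python
def trend_score(current_count, past_counts, smoothing=1.0):
    """Ratio of the present frequency of a term to its historical baseline.

    past_counts holds counts for the same kind of time window in the past
    (hypothetical data). The smoothing term is an assumption that avoids
    dividing by zero for terms with no history.
    """
    if past_counts:
        baseline = sum(past_counts) / len(past_counts)
    else:
        baseline = 0.0
    return (current_count + smoothing) / (baseline + smoothing)


# "Good Morning": huge volume, but no different from any other morning.
good_morning = trend_score(50_000, [49_000, 51_000, 50_000])

# A brand-new hashtag: a tenth of the volume, but no history at all.
new_tag = trend_score(5_000, [])

print(good_morning, new_tag)
```

Under these made-up numbers the habitual phrase scores about 1 (nothing unusual), while the new hashtag scores in the thousands even though far fewer people are tweeting it.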

What most likely happened is that after a couple of weeks of trending, the baseline for #YaMeCanse rose from zero (it didn’t exist before 11/7) to the frequency of people tweeting at the end of November. Twitter treated the volume and speed of the hashtag as something it would expect and dropped it off the trending list.
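The baseline shift can be simulated with a small Python sketch. Everything here is assumed for illustration: the 14-day window, the constant 1,000 tweets per day, and the rule that days with no history count as zero. The only point is that a perfectly steady hashtag scores lower and lower as its own history catches up with it.

```python
from collections import deque


def scores_over_time(daily_counts, window=14, smoothing=1.0):
    """Score each day against the average of the previous `window` days.

    Days before the hashtag existed count as zero, so the baseline of a
    new hashtag starts at zero and rises as history accumulates.
    """
    history = deque(maxlen=window)
    scores = []
    for count in daily_counts:
        baseline = sum(history) / window
        scores.append((count + smoothing) / (baseline + smoothing))
        history.append(count)
    return scores


# Hypothetical hashtag with completely steady usage for 20 days.
steady_usage = [1000] * 20
scores = scores_over_time(steady_usage)

for day, score in enumerate(scores):
    print(day, round(score, 2))
```

The score starts very high on the first day and slides down to 1 once the window is full, even though usage never drops.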

Baseline shift on #YaMeCanse

Spam bots should have no impact on the algorithm. The spam team at Twitter identifies the bots and they are not counted towards the algorithm. Additionally, it’s worth mentioning that Twitter has a team of low-paid human workers manually sorting through hashtags to eliminate advertiser spam. Even so, there is no evidence of an increase in bots during the time the hashtag was dropped from the list. The team at Lo Que Sigue provided this video as proof of the presence of bots (not that we need proof of that in general).

Screenshot of Lo Que Sigue video

Why are individual and unconnected tweets labeled as bots? If I tweet and only one person RT’s me, by their standards, I’m a bot. You can run the simulation from the video yourself on flocker.outliers.es. Use #YaMeCanse2 and wait for the same pattern of connected and disconnected tweets to occur. Then zoom in on the disconnected tweets and look up a couple of usernames. You’ll find a lot of those disconnected nodes are real people. You’ll also run into bots, but having no one retweet your tweet does not make you a bot.
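Here is a small Python sketch of what "connected" versus "disconnected" means in a retweet graph. The usernames and the retweet list are invented, and this is only my reading of the visualization, not the code behind flocker.outliers.es.

```python
def split_by_connection(authors, retweets):
    """Split authors into those with at least one retweet link and those without.

    retweets is an iterable of (retweeter, original_author) pairs.
    """
    connected = set()
    for retweeter, original_author in retweets:
        connected.add(retweeter)
        connected.add(original_author)
    authors = set(authors)
    return authors & connected, authors - connected


# Hypothetical accounts that all tweeted the hashtag.
authors = ["ana", "luis", "maria", "bot_123"]
# Only one retweet happened: luis retweeted ana.
retweets = [("luis", "ana")]

connected, disconnected = split_by_connection(authors, retweets)
print(sorted(connected), sorted(disconnected))
```

In this toy data "maria" lands in the disconnected group right next to "bot_123". The graph only tells us that nobody retweeted her, not whether she is a real person.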

This has happened before.

This would not be the first time that people have cried censorship upon the disappearance of a hashtag from the TT list. The Mashable article quoted above was a response to Beliebers accusing Twitter of censorship. Similarly, occupiers accused Twitter of censorship when #OccupyWallStreet was taken off the list. On both occasions Twitter had to step in and say this was just a result of how the algorithm works. In some cases we should be glad the algorithm works like this; otherwise we’d see #JustinBieber constantly trending. But how about when it’s something important like #YaMeCanse?

At this point I should say that if it were possible for the Mexican government to use such a tactic to censor people on social media, they probably would. We’ve already seen how Peña Nieto’s campaign used bots to promote the candidate on Twitter. And earlier this year, an initiative put forth by Peña Nieto on Radio and Telecom caused a lot of controversy when people claimed the law would allow the government to censor online content and to interrupt cell reception during protests. There’s also the case of 1DMX.org which was censored by GoDaddy under pressure from the US Consulate in Mexico.

We’ve Discovered How To Get Around the Algorithm

I believe the immediate response by the Mexican Twitterverse in the creation of #YaMeCanse2 has revealed an exploitable feature in the algorithm. It took less than two days for people to adapt to the new hashtag. Topsy Trends shows that #YaMeCanse2 doesn’t have significantly more traffic than #YaMeCanse had before being taken down, but #YaMeCanse2 was able to trend so quickly because its baseline at the time was zero. This means whenever #YaMeCanse2’s baseline shifts up enough for it to de-trend, we can just start again with #YaMeCanse3. We can keep going with this as long as we keep the speed at which people tweet constant, or until Twitter catches on and modifies the algorithm to account for us just adding a number at the end of the phrase (in which case we can just add the word “tres”). This is also why we keep seeing so many different Bieber hashtags: they’re all different phrases that didn’t exist before.
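The rotation trick can be sketched in Python as well. As before, this is a toy model under my own assumptions (a 14-day baseline window, a trending threshold of 2.0, a constant 1,000 tweets per day), not a description of what Twitter really computes. Each hashtag keeps its own history, so a renamed hashtag starts over with a baseline of zero.

```python
from collections import defaultdict, deque

WINDOW = 14       # days of history in the baseline (assumed)
THRESHOLD = 2.0   # score needed to count as trending (assumed)
SMOOTHING = 1.0   # avoids dividing by zero for a new hashtag


def simulate(days, daily_volume, rotate):
    """Return (days spent trending, last numeric suffix used)."""
    histories = defaultdict(lambda: deque(maxlen=WINDOW))
    suffix = 1
    trending_days = 0
    for _ in range(days):
        tag = "#YaMeCanse" if suffix == 1 else f"#YaMeCanse{suffix}"
        history = histories[tag]
        baseline = sum(history) / WINDOW
        score = (daily_volume + SMOOTHING) / (baseline + SMOOTHING)
        history.append(daily_volume)
        if score >= THRESHOLD:
            trending_days += 1
        elif rotate:
            # The tag fell off the list, so everyone moves to the next number.
            suffix += 1
    return trending_days, suffix


without_rotation = simulate(28, 1000, rotate=False)
with_rotation = simulate(28, 1000, rotate=True)
print(without_rotation, with_rotation)
```

With identical tweet volume in both runs, the single hashtag trends only during its first week, while the rotating one trends on nearly every day of the four weeks.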

The lesson here for the Mexican folk is that if we want to continue to have a #YaMeCanse hashtag trending, we need to coordinate to increment the number at the end of the tag each time it expires. When #YaMeCanse2 falls off the list, we simply switch over to #YaMeCanse3.

Censorship on Twitter Using Bots? How #YaMeCanse Was Knocked Off Twitter Trending Topics

Thursday, December 4th, 2014

UPDATE BELOW, AND ALSO THIS POST WITH MORE INFO.

In late September of this year, 43 students in the Mexican state of Guerrero went missing. In an attempt to prevent students from disrupting a political event for his wife, the mayor of Iguala ordered local police to stop and detain the students. This set in motion a series of events that resulted in several murdered students and 43 missing students. People later learned the missing students were handed over to a local cartel and were subsequently killed and burned until no traces of their bodies were left behind. This announcement was made during a press conference by Mexico’s Attorney General, Jesús Murillo Karam, who at the end of the conference, tired and exasperated, said “Ya me cansé.” I’ve had enough.

#YaMeCanse

Mexicans took to social media and responded with “We’re tired too…” Of the violence. Of the injustice. Of the impunity. Of the corruption. The #YaMeCanse hashtag became the rallying cry for discourse online and protests all over Mexico. The hashtag has been on Twitter trending topics almost since Murillo Karam’s press conference. Yesterday, the hashtag suddenly disappeared from the list even though usage had not waned.

Usage of #YaMeCanse

 

This sudden disappearance of such a popular hashtag raised some eyebrows. Determining trending topics is a little more complicated than simply calculating the number of mentions of a hashtag. Twitter has an algorithm that determines trending topics based on several factors. According to Twitter, one of the rules against usage of trending topics is “Repeatedly Tweeting the same topic/hashtag without adding value to the conversation in an attempt to get the topic trending or trending higher.” It is very likely that the overwhelming spamming of #YaMeCanse caused Twitter’s algorithms to treat the hashtag as spam and remove it from the trending list.

As reported on sopitas.com, an army of bots had been tweeting and retweeting the #YaMeCanse hashtag for several days.

“Who says that online censorship and repression does not exist online? A storm of bots tries to disappear #YaMeCanse”

Spam Tweets

Another analysis by Lo Que Sigue shows the difference between connected and disconnected tweets symbolizing real people versus bots.

#YaMeCanse2

Not to be easily dissuaded, the Mexican twitterverse quickly came up with a simple solution: #YaMeCanse2, which is currently trending. An added cleverness to adding the number ‘2’ is that it forces people to ask “What happened to regular #YaMeCanse? Where’s #YaMeCanse1?” which leads people to find out about the attack. It’s a sort of Barbra Streisand effect where, in an attempt to censor one hashtag, not only do people evade the censorship, but in doing so they call attention to the attempt at censorship.

It’s quite possible that this is not a coordinated attack on the hashtag by some entity. It could be just regular bots hijacking a popular hashtag. And it is very tempting to attribute this “attack” to the government of Mexico. I would not be surprised at all if it was, and I’d be willing to bet that the Mexican government is behind this (it wouldn’t be the first time), but I would like to find definitive proof. The people behind Lo Que Sigue are working to start an Indiegogo campaign to try and find the origin of these bots. Perhaps we don’t have to wait around for this to get funded and we can crowdsource/collaborate to try and see if tracing the origin of the bots is possible. I would welcome any ideas on how to do this.

UPDATE:

So Trending Topics are more complicated than they seem. It’s hard to tell whether bots had any role in dropping the hashtag from the trending list. It seems that Twitter is actually looking for “bursts” of tweets, and how fast these tweets appear ( ∂Tw/∂t?). It is entirely possible that volume of tweets remained stable but the “burstiness” was gone. I don’t know. Twitter’s algorithms are very private. Even if bots played no part in dropping the hashtag, the possibility of that happening might still exist. After all, riding hashtags to promote unrelated content is shunned by Twitter. Whether they can detect that algorithmically, I’m not sure, but I wouldn’t be surprised. If they can detect that, then it’s entirely possible to spam a hashtag using bots. Perhaps the only way to find out is to actually measure the volume and speed of the bots. Doing this, it turns out, is very hard.
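As a rough illustration of the ∂Tw/∂t idea, here is a Python sketch that approximates the rate of change with the difference between consecutive hourly counts. The counts are invented and this is only my guess at what "burstiness" might mean, not Twitter's definition.

```python
def tweet_rate_change(counts):
    """Finite-difference approximation of dTw/dt between consecutive time bins."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Hypothetical hourly counts: a stable hashtag and one that is taking off.
steady = [1000, 1000, 1000, 1000]
burst = [10, 50, 400, 1000]

steady_change = tweet_rate_change(steady)
burst_change = tweet_rate_change(burst)
print(steady_change, burst_change)
```

Both hashtags end the period at 1,000 tweets per hour, but the steady one has zero change throughout, which is the situation where volume stays stable and the "burstiness" is gone.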