The Mexican GovBots Did NOT Take Down #YaMeCanse, But We Can Keep #YaMeCanse# Trending

Monday, December 8th, 2014

Perhaps it was the excitement of hearing about a new phenomenon in censorship that prompted me to write a little too hastily about how the government of Mexico might have used Twitter bots to spam and trash the #YaMeCanse hashtag out of the trending topics list. As reported by Lo Que Sigue, Sopitas.com, Aristegui Noticias, and myself, #YaMeCanse, the hashtag used as the rallying cry for Mexico’s 43 missing students, was suddenly dropped from the trending topic list by an army of bots, presumably coordinated by the federal government. None of us provided proof that the government was behind this, but a series of videos and screenshots originally provided by Lo Que Sigue led us to believe that a swarm of bots was at least responsible.

It was NOT the bots

December 3 at 10:36 AM was the last time @TrendieMX reported #YaMeCanse to be trending. By 9:36 PM of the next day, #YaMeCanse2 was already trending. Let’s take a look at what the Topsy trends for #YaMeCanse look like for the month of November and the first days of December.

Usage of #YaMeCanse

 

To make sense of what happened, we need to understand what Twitter is doing to calculate a trending topic. We don’t have access to specific information about how the trending algorithm functions, but we do know how trending algorithms work in general, and we have some clues about what Twitter has done in the past to tweak its algorithms. The relevant issue here can be described as “the Justin Bieber problem”. Many of you might remember how some years ago Justin Bieber was constantly trending due to the millions of Beliebers continuously tweeting about him. Twitter wants to tell us what’s trending right now, and not one hour ago or one month ago. As Twitter is quoted saying in this Mashable article:

“The new algorithm identifies topics that are immediately popular, rather than topics that have been popular for a while or on a daily basis, to help people discover the ‘most breaking’ breaking news from across the world. (We had previously built in this ‘emergent’ algorithm for all local trends, described below.) We think that trending topics which capture the hottest emerging trends and topics of discussion on Twitter are the most interesting.”

Instead of merely looking at the volume of Bieber tweets (of which there are many), Twitter looks at the speed and “burstiness” of the tweets. However, there’s more to it. If Twitter only measured “burstiness”, you might see “Good Morning” trending every morning of every day. To avoid this, Twitter establishes a baseline of expected frequencies based on history. Twitter “knows” there is usually a spike of “Good Morning” tweets every morning and corrects for this. As this video on trend detection in Twitter social data explains, a ratio is calculated for each term based on the past frequency of the term and the present frequency.
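To make the ratio idea concrete, here is a minimal sketch in Python. It is not Twitter's actual algorithm, which is not public. The function name `trend_score`, the smoothing constant, and every count below are assumptions made up for illustration.

```python
def trend_score(current_count, history_counts, smoothing=1.0):
    """Ratio of the present frequency of a term to its historical baseline.

    The baseline is the plain average of past counts for the same time
    window. `smoothing` is an assumed constant that avoids dividing by
    zero for terms that have never been seen before.
    """
    if history_counts:
        baseline = sum(history_counts) / len(history_counts)
    else:
        baseline = 0.0
    return (current_count + smoothing) / (baseline + smoothing)


# Hypothetical counts: (tweets in the current window, counts in past windows).
terms = {
    "good morning": (10_000, [9_800, 10_100, 9_900]),  # big, but expected
    "#NewHashtag": (2_000, [0, 0, 0]),                 # smaller, but brand new
}

scores = {term: trend_score(now, past) for term, (now, past) in terms.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for term in ranked:
    print(f"{term}: {scores[term]:.2f}")
```

With these made-up numbers, "good morning" has five times the volume of the new hashtag, yet its score stays close to 1 because that volume is exactly what its history predicts. The new hashtag ranks first because its baseline is zero.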

What most likely happened is that after a couple of weeks of trending, the baseline for #YaMeCanse rose from zero (it didn’t exist before 11/7) to the frequency of people tweeting at the end of November. Twitter treated the volume and speed of the hashtag as something it would expect and dropped it off the trending list.
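The baseline shift can be simulated with a small Python sketch. Everything in it is an assumption chosen for illustration: the 14-day window, the threshold of 2.0, the constant 50,000 tweets per day, and the function name `days_trending`. Twitter's real parameters are unknown.

```python
from collections import deque


def days_trending(daily_counts, window=14, threshold=2.0, smoothing=1.0):
    """Return one True/False flag per day: did the tag beat its own baseline?

    The baseline is the average daily count over the previous `window`
    days, with days before the tag existed counted as zero.
    """
    history = deque(maxlen=window)  # only the last `window` days are kept
    flags = []
    for count in daily_counts:
        baseline = sum(history) / window
        score = (count + smoothing) / (baseline + smoothing)
        flags.append(score >= threshold)
        history.append(count)
    return flags


# A hashtag that appears out of nowhere and holds a steady volume for 20 days.
counts = [50_000] * 20
flags = days_trending(counts)

first_day_off_list = flags.index(False)
print(f"Trending for the first {first_day_off_list} days, then dropped.")
```

The volume never falls in this simulation. The hashtag drops off only because its own history catches up with it, which is the behavior described above.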

Baseline shift on #YaMeCanse

Spam bots should have no impact on the algorithm. The spam team at Twitter identifies the bots, and their tweets are not counted by the algorithm. Additionally, it’s worth mentioning that Twitter has a team of low-paid human workers manually sorting through hashtags to eliminate advertiser spam. In any case, there is no evidence of an increase in bots during the time the hashtag was dropped from the list. The team at Lo Que Sigue provided this video as proof of the presence of bots (not that we need proof of that in general).

Screenshot of Lo Que Sigue video

Why are individual and unconnected tweets labeled as bots? If I tweet and only one person RT’s me, by their standards, I’m a bot. You can run the simulation from the video yourself on flocker.outliers.es. Use #YaMeCanse2 and wait for the same pattern of connected and disconnected tweets to occur. Then zoom in on the disconnected tweets and look up a couple of usernames. You’ll find a lot of those disconnected nodes are real people. You’ll also run into bots, but having no one retweet your tweet does not make you a bot.

This has happened before.

This would not be the first time that people have cried censorship upon the disappearance of a hashtag from the TT list. The Mashable article quoted above was a response to Beliebers accusing Twitter of censorship. Similarly, occupiers accused Twitter of censorship when #OccupyWallStreet was taken off the list. On both occasions Twitter had to step in and say this was just a result of how the algorithm works. In some cases we should be glad the algorithm works like this; otherwise we’d see #JustinBieber constantly trending. But how about when it’s something important like #YaMeCanse?

At this point I should say that if it were possible for the Mexican government to use such a tactic to censor people on social media, they probably would. We’ve already seen how Peña Nieto’s campaign used bots to promote the candidate on Twitter. And earlier this year, an initiative put forth by Peña Nieto on Radio and Telecom caused a lot of controversy when people claimed the law would allow the government to censor online content and to interrupt cell reception during protests. There’s also the case of 1DMX.org which was censored by GoDaddy under pressure from the US Consulate in Mexico.

We’ve Discovered How To Get Around the Algorithm

I believe the immediate response by the Mexican Twitterverse in the creation of #YaMeCanse2 has revealed an exploitable feature in the algorithm. It took less than two days for people to adapt to the new hashtag. Topsy Trends shows that #YaMeCanse2 doesn’t have significantly more traffic than #YaMeCanse had before being taken down; #YaMeCanse2 was able to trend so quickly because its baseline at the time was zero. This means whenever #YaMeCanse2’s baseline shifts up enough for it to de-trend, we can just start again with #YaMeCanse3. We can keep going with this as long as we keep the speed at which people tweet constant, or until Twitter catches on and modifies the algorithm to account for us just adding a number at the end of the phrase (in which case we can just add the word “tres”). This is also why we keep seeing so many different Bieber hashtags: they’re all different phrases that didn’t exist before.
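Here is a rough Python sketch of that workaround. It assumes a per-hashtag baseline averaged over 14 days, a threshold of 2.0, a constant 50,000 tweets per day, and that everyone switches tags the day after one de-trends. All of these are illustrative guesses, and the real switch took closer to two days.

```python
from collections import defaultdict, deque

WINDOW = 14       # assumed number of days in the baseline
THRESHOLD = 2.0   # assumed score needed to make the trending list


def simulate(days, volume=50_000):
    """Tweet at a constant daily volume, moving to a new tag after each drop."""
    history = defaultdict(lambda: deque(maxlen=WINDOW))  # one baseline per tag
    suffix = 1
    timeline = []
    for _ in range(days):
        tag = "#YaMeCanse" if suffix == 1 else f"#YaMeCanse{suffix}"
        baseline = sum(history[tag]) / WINDOW
        score = (volume + 1) / (baseline + 1)
        trending = score >= THRESHOLD
        timeline.append((tag, trending))
        history[tag].append(volume)
        if not trending:
            suffix += 1  # the tag fell off the list, so switch tomorrow
    return timeline


timeline = simulate(20)
for day, (tag, trending) in enumerate(timeline, start=1):
    print(day, tag, "trending" if trending else "dropped")
```

Under these assumptions, the same steady volume of tweets stays on the list for 18 of the 20 days, simply because each new tag starts with a baseline of zero.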

The lesson here for the Mexican folk is that if we want to continue to have a #YaMeCanse hashtag trending, we need to coordinate to increment the number at the end of the tag each time it expires. When #YaMeCanse2 falls off the list, we simply switch over to #YaMeCanse3.