Archive for the ‘Uncategorized’ Category

Airports as a Preview of the Future

Thursday, October 6th, 2016

I’ve been toying with the idea that airports offer a view into what the U.S. could look like in the future. There are a lot of trends that have emerged in this country that I believe are in full force in airports. This is a list of what I’ve been thinking.

  • A security apparatus, justified by the constant threat of terrorism.
    • Very tight controls on who comes in, and what you can possess.
    • Clearly biased against minorities
    • Constant presence of military (travel in uniform, priority boarding for those in active duty)
    • Constant reminder of the threats the country faces (CNN on every screen)
  • Social stratifications (passengers, first class, business class, fast food restaurant employees, janitors, security)
    • Passengers are divided by class (boarding order, rewards membership, first class, economy class)
    • Visible class and race division between passengers and airport/shops/restaurant employees
  • Lack of options, favors large corporations (food, stores, wifi)
    • Limited space, high barrier to entry for vendors (I think, would like to look into this more)
    • One or two WiFi providers (who also profit off your data)
    • Ok, this one requires more thought, but I think I’m onto something here
  • Constant surveillance
    • By video with cameras, and your data from WiFi providers
    • ID required at different checkpoints
  • Only airport (state) approved behavior
    • Low tolerance of misbehavior, breaking of rules
    • No protests, no space for dissent, no civil disobedience

Much of this is aided by the use of technology. There is an entire chapter devoted to airports as “coded spaces” in “Code/Space” by Rob Kitchin and Martin Dodge. The chapter is meant to illustrate how software transforms and often defines a space, but it also provides a good list of ways in which technology affords more control over people and objects in an airport. Control of the type which any state would love to have.

Consent Without Consent

Thursday, October 6th, 2016

I recently read an essay by Noam Chomsky titled “Consent Without Consent” where he explains how one of the ways the US has justified its involvement in other countries (to put it mildly) is through the concept that gives the essay its title. The concept of “consent without consent”, as coined by Franklin Henry Giddings, is that “if in later years [the colonized] see and admit that the disputed relation was for the highest of interest, it may be reasonably held that authority has been imposed with the consent of the governed.” The example Giddings uses is the liberation of the Philippines. By liberation, he means “‘slaughtering the natives in an English fashion’ so that ‘misguided creatures’ who resist us will at least ‘respect our arms’ and later come to recognize that we wish them ‘liberty’ and ‘happiness,’ at least those who survive the ‘wholesale killing’ they are forcing us to undertake.”

It’s easy to see how this excuse can be applied to many of the actions taken by those in power. Often the elite think they know what is best for the rest, and they act with the best of intentions, thinking that the initial suffering of the few will eventually be for the good of the many. I don’t know under what conditions a thought like this is justified and when it is not. But it’s definitely worth thinking about.

A (Probably) Incomplete Taxonomy of Peñabots (And Their Friends)

Saturday, November 28th, 2015

In this post I catalog the ways local, state, and federal governments in Mexico have used social media bots to their advantage. This is the other side of the excellent efforts made by projects like Botivist to link activists with the use of bots. However, what may be useful to activists may also be useful to state actors. I think a list of this kind is necessary because it is important to be able to recognize the ways in which such state actors (even well-funded private actors) can subvert speech online. In this list I also mention cases where large groups of people rather than software are used, partly because it is hard to discern between the two methods, and mostly because the outcome is essentially the same. The methods for spreading or silencing information can be achieved through either means (manual or automated). I believe this to be especially true since governments usually have plenty of resources at their disposal. At the end I pose some questions yet to be answered.

  • Fake Support
    1. Spreading campaign messages by tweeting the same message or similar messages in a coordinated effort.
    2. Retweeting supporters or online content (news, posts, photos) that is favorable to the campaign or political agenda.
    3. Coordinating a large number of people with real and fake accounts to tweet the same hashtag to get it trending (used in conjunction with bumping off hashtags).
    4. Padding Twitter follower count.
  • Drowning Out Oppositional Voices
    1. Flooding hashtags with spam so that discourse is impossible. This can occur when many bots tweet a trending hashtag so often that the feed becomes difficult to read because of Twitter’s auto-refresh feature. Filling the hashtag with spam could also hinder discussion, since finding meaningful conversation in a spam-filled feed can be difficult. Another consequence of filling a hashtag with spam could be the removal of the hashtag from the trending topics (TT) list.
    2. Knocking hashtags off the TT list by replacing them with other, favorable hashtags.
    3. Spread misinformation?
    4. Trolling
  • Defamation and Intimidation
    1. Intimidation
    2. Defamation
      • Lydia Cacho Ribeiro, Sept 2013, from QR gov.

Subverting Speech vs. PR Campaign
If a presidential campaign hires a PR company which utilizes a large number of volunteers, paid workers, or software bots to tweet in a coordinated effort to propagate its message, is this free speech? Does this subvert others’ speech? If this army of bots floods a discussion forum, does this act as censorship? What about when an activist organization does the same thing? I think there needs to be a discussion about what’s considered ethical when employing bots.

Detecting Bots
It is difficult to know when we are dealing with software bots, with a PR campaign made up of volunteers, or merely with regular citizens tweeting in support of the government. Several people have tried to analyze Twitter data to “prove” the presence of bots, but I’m still left with a lot of questions. What if there really were a lot of non-political Twitterers who were excited about the long weekend (#EnDiaDePuente)? What if the Twitter algorithm worked against the activist hashtag? I’ve been to tech conferences where the hashtag gets spammed; what if we’re dealing with regular Twitter spam and not state-sponsored spam? Are there any dangers in misclassifying spam and bots? Are there any negative effects in constantly blaming every piece of spam on the Peñabots?

Trending Topic Algorithm
The claims that TT such as #YaMeCanse were bumped by bots are hard to prove (despite the fancy graph videos). Similar claims of censorship occurred during the Occupy Wall Street protests, when many activists claimed Twitter was censoring them by not adding #occupywallstreet to the TT list. In response, several bloggers, journalists, and Twitter itself explained that the TT algorithms look for “trendiness” and “burstiness”, and that while OWS might have had a large volume of tweets, it did not display the spikes in volume the algorithm is constantly looking for. Although I don’t doubt the presence of bots (even State bots), I am not convinced that anything other than the regular algorithmic forces was at play in knocking #YaMeCanse off the TT list. Furthermore, more research and discussion are needed on the importance of trending topics to activist causes. Are they crucial? Are they overvalued?
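To make the volume versus “burstiness” distinction concrete, here is a minimal sketch. It is not Twitter’s actual trending algorithm, which is not public in detail; the function name `burst_score`, the three-bin window, and the tweet counts are all made up for illustration. It scores a hashtag by how far any one time bin rises above that hashtag’s own recent average.

```python
def burst_score(counts, window=3):
    """Highest ratio of a bin's count to the mean of the `window` bins before it.

    `counts` is a list of tweet counts per time bin (for example, per hour).
    This is an illustrative measure only, not Twitter's real algorithm.
    """
    best = 0.0
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if baseline > 0:
            best = max(best, counts[i] / baseline)
    return best


# Hypothetical data: a large but steady hashtag and a small but sudden one.
steady = [900, 950, 1000, 1000, 1050, 1000]
sudden = [10, 12, 8, 10, 200, 150]

print(sum(steady), burst_score(steady))  # far more tweets, score close to 1
print(sum(sudden), burst_score(sudden))  # far fewer tweets, much higher score
```

Under a measure like this, a hashtag with a large, sustained volume can score lower than a much smaller hashtag that spikes suddenly, which is the explanation that was offered for #occupywallstreet.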

Analyzing Twitter Data
Twitter is VERY noisy. Even very specific words that seem to only talk about one specific event like “balacera” (shootout) are hard to analyze. Especially when the word becomes part of quotidian culture. For example, in Mexico people tend to tweet when there is a shootout in their city, usually to warn others to stay out of the area. In trying to analyze ‘balacera’ tweets, I’ve found that people also talk about past shootouts, potential shootouts, and even tell jokes about shootouts. So one needs to be careful when analyzing tweets. Some of the best insights on Twitter use I’ve seen have come from qualitative research methods like interviews.

Bots Are People Too

Friday, November 27th, 2015

I came to love bots because of @TwoHeadlines, @CongressEdits and NYTimes Haiku. Not only did they bring me great entertainment, but as is the case with @CongressEdits, I thought they were providing a great service. However, not all bots are cool. Lurking among us are bots sponsored by governments. Bots whose mission is to give a false impression of support or to drown out opposing views. These types of bots have been used in recent years by the governments of Mexico, Venezuela, Russia and Syria. However, the methods by which these governments use bots are not as clear and transparent as they are with political bots like @CongressEdits. In this post, I’ll use the government of Mexico’s use of political bots as an example of how the traditional definition of political bots might not be enough to cover what’s happening in Mexico.

I’ll start with Wikipedia’s definition of bots, which is the one widely used to describe them. A bot is a “software application that runs automated tasks over the Internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone.” I like this definition because it leads me to ask the question: What if you had access to enough resources and low-skilled labor such that you could perform the simple and repetitive tasks with real people? This is how our story of the Peñabots begins.

Shortly after the start of the 2012 Mexican presidential election, then candidate Enrique Peña Nieto rose to Twitter dominance. It seemed the candidate was enjoying an unusual amount of support from social media users. To some savvy netizens, this seemed suspicious. Sure enough, the Twitterverse eventually discovered that the EPN campaign was making use of large armies of volunteers to show support for the candidate and to drown out any opposition that could appear online. The name Peñabot appeared shortly after this discovery in November 2011. It is unclear whether the word “bot” in “Peñabot” was originally used to fit the traditional definition of “software that performs a repetitive task” or if it was originally used to refer to actual volunteers who, under direction from campaign managers, would mindlessly perform repetitive tasks such as tweeting the same text and retweeting other supporters. In the latter case, the word Peñabot is similar to Limbots and Obamabots and is often used in parallel with Pejezombies, or the followers of competing candidate Lopez Obrador.


Although people seem to be pretty confident that the campaign (and now the actual government) used software bots, what has been confirmed is that EPN did in fact use armies of volunteers to take over the conversation on Twitter. One former EPN social media manager confessed that at one point the campaign was coordinating 20,000 volunteers to show support for the candidate and to target and drown out any opposing hashtag that could surface during the campaign trail.

The ability to coordinate such a large army of volunteers may be effective in bumping hashtags off the trending topic list since Twitter’s algorithms measure “burstiness” of tweets. One example where volunteers overran a hashtag was with #MarchaAntiEPN (March Against EPN) in the state of Tabasco where volunteers bumped the hashtag off the list with #TodoTabascoConEPN (All of Tabasco With EPN). People discovered the presence of Peñabots because the majority of #TodoTabascoConEPN tweets came from a single location in Mexico City (apparently tweets were geotagged?).
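A check like the one people apparently ran on #TodoTabascoConEPN can be sketched in a few lines. This is only an illustration under assumptions: I don’t know how the original analysis was done, the function name `top_location_share` is hypothetical, and the sample data is invented. The idea is simply to ask what share of a hashtag’s geotagged tweets comes from its single most common location.

```python
from collections import Counter


def top_location_share(locations):
    """Return (most common location, its share of the tweets that carry a location).

    `locations` holds one entry per tweet; None means the tweet was not geotagged.
    """
    tagged = [loc for loc in locations if loc is not None]
    if not tagged:
        return None, 0.0
    place, count = Counter(tagged).most_common(1)[0]
    return place, count / len(tagged)


# Invented sample: 15 tweets on a hashtag about Tabasco, 10 of them geotagged.
sample = ["Mexico City"] * 8 + ["Villahermosa"] * 2 + [None] * 5

print(top_location_share(sample))
```

A high share from a place far from where the hashtag claims to originate is suggestive, but not proof: it only covers tweets that carry a location at all, and it cannot tell volunteers from software.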

There is no reason why EPN’s government couldn’t carry out their campaign using actual software bots (RT’ing pro-government tweets, tweeting from a list of pre-written messages, crowding hashtags and making your own hashtag trend). However, since they relied so heavily on “manual” tweeting to achieve something that a bot might be able to do, I wonder if it makes sense to expand the definition of bots to include such large-scale tweeting campaigns? Does it matter to citizens whether a pro-government message comes from a person or a bot if they can’t tell which one produced the message? Does it matter to citizens if they knew it was a bot or a person acting as part of a larger campaign?

Human or Bot?

On Evidence-Based Sentencing and the Variables of Race, Age, and Social Achievement

Saturday, July 25th, 2015

I was reading a paper on evidence-based sentencing called “Risk in Sentencing: Constitutionally-Suspect Variables and Evidence-Based Sentencing”. In it, the authors list fifteen different variables (the list is generated by another study) with statistically significant relationships with recidivism. Here are some (on a 0.30 scale):

  • Criminal companions: z=0.21
  • Antisocial personality: z=0.18
  • Adult criminal history: z=0.17
  • Family rearing practices: z=0.14
  • Social achievement (education, marital status, employment): z=0.13
  • Race: z=0.17
  • Age: z=0.11
  • Gender: z=0.06
  • Socio-economic status of origin: z=0.05

Immediate things that pop out: race, criminal companions (who you hang out with), social achievements (education, marital status, employment), age, gender, and socio-economic status of origin. According to this study, these factors indicate some probability of recidivism. Luckily, several of these variables (such as race and age) are constitutionally barred from being taken into account in sentencing decisions. But the point I want to make is that I don’t think most of these should be a factor in determining a person’s sentence. And I think this study is a great example of why we should be careful when drawing conclusions from analyzing data. I sometimes tell this joke: “100% of divorces are caused by marriage”. It’s silly, but I think it’s relevant here. Yes, divorces begin with marriage, but if you blame divorce on marriage, you’re kind of missing some important underlying cause. Sure, young poor uneducated black people who hang out with other criminals might have an increased chance of recidivism, but is that really the underlying cause? Is it really their fault that they are young, poor, uneducated and black living in a neighborhood where everyone else is young, poor, uneducated, and black?

This is a great example of algorithms just pointing out the obvious and yet missing the larger picture. It’s like Google’s flu detector which actually might only be a winter detector. We need to think about how we construct these algorithms and how we are employing them to make decisions that might affect hundreds of thousands of people. We shouldn’t be asking “how does the race variable relate to recidivism?” There’s nothing “variable” about race. Or age. Or socio-economic status. These are the wrong questions. Instead, why don’t we ask ourselves “What can we do, to improve a person’s life, such that the color of their skin doesn’t correlate with a high recidivism rate?” I think that’s a more worthwhile pursuit.


Placemeter pays YOU for your data…

Wednesday, October 8th, 2014
Note: I have set a new goal to post at least once a week, even if the posts are short.

Turns out you may have some data to offer that is actually more valuable than just your online shopping patterns: the view outside your window. Placemeter is a relatively new startup that pays New Yorkers up to $50 to place their phones against their windows and record movements on the street below. Using nifty computer vision algorithms, Placemeter extracts data from the images recorded by your phone. The short video below gives a sense of what they are trying to track.

The front page immediately addresses the issue of privacy. The company will not use the data to record anything that goes on inside your home, they will not use the data to identify people on the street, and the video they record isn’t stored. They only store raw data extracted from the video.

Their business model is simple: they pay you a little bit per month to record information which they will later sell to third parties. You provide the product they later sell (hey, at least they pay you for it). Since their goal is to sell data to businesses and city governments, they are mostly interested in views of restaurants, shops, or bars. This means lots of people like me can’t participate (I have a very lovely view of a wall). This got me thinking about who else can and can’t participate. If you happen to live in (and have a view of) Times Square, your view could be worth dozens of dollars! What about a view from a quiet Staten Island street? Or from the Bronx? Basically, in order to participate you just have to live in the right place. A place that is probably expensive too.

One redditor applied to sell his/her view and was rejected because the street wasn’t busy enough, but was told that he/she would be considered when the company started “sending out unpaid meters”. I imagine this means the company would mail you a sensor for free and you would record data for them. If this happens I can see them shifting the rhetoric towards “help us analyse and improve your urban environment”, which this article already does.

Since the most valuable views belong to a select group of New Yorkers, much of the company’s most valuable data might instead come from the already freely available video feeds around the city (they should fill out the survey for the OD500).

RT @MartinLutherKingJr I Have a Dream… #CivilRights

Wednesday, August 28th, 2013

On August 28, 1963, Martin Luther King Jr. delivered his famous ‘I Have a Dream’ speech during the March on Washington for Jobs and Freedom. The event, attended by over 250,000 people, turned out to be a defining moment for the Civil Rights Movement. Fifty years later, the country has made great strides towards equality, but a lot remains to be done, not just in terms of equality but also across a wide range of social issues. While some of the issues of today are the same as the issues fifty years ago, new technologies and new ideas have quickly empowered us to react to these struggles in ways whose effects we don’t yet fully understand. The widespread adoption of the internet and social media has by no means replaced traditional marches like Dr. King’s; rather, it has augmented the form in which people participate in social movements. Going forward, it’s important to ask what the effect of these new forms of activism is. Will traditional marches like the one fifty years ago ever be replaced? Will online forms of protest ever match the effects of, for example, the 60’s anti-war movement? What would Dr. King make of the hundreds of thousands of people who tweet for a cause?

Screen Shot 2013-08-28 at 9.16.38 AM


Click on the image to see the Processing sketch. Code included.

*This post was published on this blog in tandem with The GovLab.
**No… the tweets are not live.

Mexican Drug War Data

Sunday, November 4th, 2012

A project to gather data on drug war crimes.

The recent publication of stop-and-frisk data by the NYCLU has stirred a lot of controversy, particularly because the data showed strong evidence of discrimination by the NYPD towards black and Hispanic New Yorkers. The NYCLU was able to conduct this study because of a piece of legislation introduced by the New York City Council requiring the NYPD to provide quarterly reports on stop-and-frisk data. Since then, the NYPD has also kept a computerized database of its stop-and-frisk program. The level of detail and granularity in these reports has made it possible for organizations such as the NYCLU to conduct successful studies on the results of the NYPD’s stop-and-frisk program. As a result, these studies have brought important social issues into the limelight and have demonstrated (as has been demonstrated countless times before) the importance of publicly accessible data.

The stop-and-frisk data’s level of detail is something to be dreamed of in Mexico. Unfortunately, after 6 years of violence, there is still no publicly accessible data set of stop-and-frisk-data granularity available on the violence in Mexico. We think that having a database with detailed reports on each incident would help better understand what has happened and is happening in Mexico. Our project will attempt to create a database of detailed “incidents” that have occurred in Mexico since the start of the drug war. We will attempt to do this in two ways. The populating of the database will first rely on people visiting the site to input past events by skimming through different news sources and providing a detailed account of what happened by using a friendly form on the site. Even if we got sufficient participation to go through all the news sources and capture all reported events, there is an inherent problem in that not all events get reported in the mainstream media. In fact, the media has been doing such a bad job at reporting the violence in Mexico that people have taken matters into their own hands and have taken on the role of citizen reporters to warn other people about shootings in Mexico. Twitter has become one of the greater tools of citizen reporting with people’s prolific use of hashtags. So, second, from the project launch we plan on keeping a live recording of events as they unfold on Twitter, and we will rely on people to confirm events and provide greater detail.

Preliminary Twitter stream analysis indicates that events are more widely reported on Twitter, and much faster, than by any news source in Mexico. Events also seem to be relatively easy to spot. The graph below is a histogram of tweets containing the word “balacera” (shooting) over a period of twelve days. The spikes represent increased activity that could potentially indicate a shooting is taking place. By identifying these events (by measuring ∂Tweet/∂t) we could reach out to people and ask them to help validate and provide more information on the shooting.
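As a rough sketch of what measuring ∂Tweet/∂t could look like in practice, the snippet below takes tweet counts per time bin and flags the bins where the count jumps sharply over the previous bin. The function name `find_spikes`, the `min_jump` threshold, and the hourly counts are all hypothetical; this is not the project’s actual code, and a real threshold would have to be tuned against real “balacera” data.

```python
def find_spikes(counts, min_jump):
    """Return the indices of bins whose count rose by at least `min_jump`
    over the previous bin (a discrete version of dTweet/dt)."""
    spikes = []
    for i in range(1, len(counts)):
        if counts[i] - counts[i - 1] >= min_jump:
            spikes.append(i)
    return spikes


# Made-up hourly counts of tweets containing "balacera".
hourly = [3, 4, 2, 5, 41, 38, 6, 3, 4, 29, 7]

print(find_spikes(hourly, min_jump=20))  # → [4, 9]
```

Each flagged bin would be a candidate event to send out for human confirmation, not a confirmed shooting.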

Gathering information live from tweets is not a new idea, and there is by no means an absence of information pertaining to the violence in Mexico. However, the available data contains only what is reported by the government (I, for one, find it hard to believe that in 48 months, my hometown of Monterrey only had 297 murders and the border town of Nuevo Laredo only had 159) and the smallest level of detail is murders per city per month. One example of a provider of crime data is the Citizen Institute of Studies on Insecurity (ICESI). The site contains several accessible data sets but their data is organized by state per year. Whatever information they used to come up with their results is not accessible to the public, and attempts to contact them about obtaining more information have been stressful and ultimately futile. UPDATE: They seem to not exist anymore.

Several attempts have been made at harnessing Twitter data for live reporting of shootings. Most notable is Retio, a project started over a year ago by a group of engineers in Mérida. While it has mixed success rates in different cities, Retio has been able to harness the power of citizen reporting in major cities like Monterrey, Guadalajara and Mexico City quite effectively. However, Retio relies on users to actively tweet to one of Retio’s many different Twitter accounts (1 account per city). The tweets are then automatically categorized by report type and then retweeted from the respective city account. But the site has several shortcomings. First, while the site relies on people mentioning certain hashtags and accounts, the system still does a bad job eliminating spam because it looks at individual tweets. Second, reports contain very little information; it seems like Retio’s job is to simply map events and retweet the incident. Third, users are given no choice in anonymity (we are still debating the benefits of anonymity). Lastly, and perhaps most importantly, the information gathered by Retio is not publicly available in a machine-readable format. Another big citizen reporting tool available is Centro de Integración Ciudadana (CIC). CIC does not rely on Twitter data, and just like Retio’s, its reports do not contain much information and are not freely available in a machine-readable format. Whatever the shortcomings of these different tools, they do prove that the Mexican citizenry is engaged and willing to participate.

The goal of our project is to create an easily accessible database that will (hopefully) provide better information than what is currently out there. We hope to gain support from citizens by actively reaching out to engaged citizens via Twitter and asking for their support. The idea is still very much in its early development, and it might seem like we’re reaching for “low-hanging fruit” but we’re fairly confident that we can provide a better service than some of the other citizen reporting projects that currently exist.