Mexican Drug War Data
A project to gather data on drug war crimes.
The recent publication of stop-and-frisk data from the NYCLU has stirred a lot of controversy, particularly because the data showed strong evidence of discrimination by the NYPD against black and Hispanic New Yorkers. The NYCLU was able to conduct this study because of a piece of legislation introduced by the New York City Council requiring the NYPD to provide quarterly reports on stop-and-frisk data. Since then, the NYPD has also kept a computerized database of its stop-and-frisk program. The level of detail and granularity in these reports has made it possible for organizations such as the NYCLU to conduct successful studies on the results of the NYPD’s stop-and-frisk program. As a result, these studies have brought important social issues into the limelight and have demonstrated, as has been shown countless times before, the importance of publicly accessible data.
The stop-and-frisk data’s level of detail is something to be dreamed of in Mexico. Unfortunately, after six years of violence, there is still no publicly accessible data set on the violence in Mexico with the granularity of the stop-and-frisk data. We think that a database with detailed reports on each incident would help people better understand what has happened and is happening in Mexico. Our project will attempt to create a database of detailed “incidents” that have occurred in Mexico since the start of the drug war. We will attempt to do this in two ways. First, populating the database will rely on people visiting the site to input past events: they skim through different news sources and provide a detailed account of what happened using a friendly form on the site. Even if we got enough participation to go through all the news sources and capture all reported events, there is an inherent problem in that not all events get reported in the mainstream media. In fact, the media has been doing such a bad job of reporting the violence in Mexico that people have taken matters into their own hands and have taken on the role of citizen reporters to warn others about shootings. Twitter has become one of the major tools of citizen reporting thanks to people’s prolific use of hashtags. Second, from the project launch we plan on keeping a live recording of events as they unfold on Twitter, and we will rely on people to confirm events and provide greater detail.
Preliminary analysis of the Twitter stream indicates that events are reported more widely and much faster on Twitter than by any news source in Mexico. Events also seem to be relatively easy to spot. The graph below is a histogram of tweets containing the word “balacera” (shooting) over a period of twelve days. The spikes represent increased activity that could indicate a shooting is taking place. By identifying these events (by measuring ∂Tweet/∂t, the rate of change in tweet volume over time) we could reach out to people and ask them to help validate and provide more information on the shooting.
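As a rough illustration of what measuring ∂Tweet/∂t could look like in practice, here is a minimal Python sketch. It is not our actual pipeline: the function names (bin_counts, find_spikes), the ten-minute bin width, and the min_rise threshold are all assumptions made for the example. It counts tweets in fixed time bins and flags any bin whose count jumps sharply over the previous one.

```python
from datetime import datetime, timedelta


def bin_counts(timestamps, start, bin_minutes, n_bins):
    """Count tweets per fixed-width time bin, beginning at `start`.

    `bin_minutes` and `n_bins` are illustrative parameters, not tuned values.
    Timestamps falling outside the covered range are ignored.
    """
    width = timedelta(minutes=bin_minutes)
    counts = [0] * n_bins
    for ts in timestamps:
        idx = (ts - start) // width  # integer bin index
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def find_spikes(counts, min_rise):
    """Return indices of bins whose count rose by at least `min_rise`
    compared with the previous bin (a discrete stand-in for dTweet/dt)."""
    return [
        i for i in range(1, len(counts))
        if counts[i] - counts[i - 1] >= min_rise
    ]


# Made-up timestamps of tweets mentioning "balacera" (hypothetical data).
start = datetime(2012, 1, 1, 0, 0)
tweets = [start + timedelta(minutes=m) for m in (1, 12, 13, 14, 15, 16, 25)]

counts = bin_counts(tweets, start, bin_minutes=10, n_bins=3)
spikes = find_spikes(counts, min_rise=3)
```

A fixed threshold like this is the simplest possible choice. A real detector would probably need to account for each city's normal tweet volume and time of day before deciding that a jump is unusual.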
Gathering information live from tweets is not a new idea, and there is by no means an absence of information pertaining to the violence in Mexico. However, the available data contains only what is reported by the government (I, for one, find it hard to believe that in 48 months my hometown of Monterrey had only 297 murders and the border town of Nuevo Laredo only 159), and the finest level of detail is murders per city per month. One example of a provider of crime data is the Citizen Institute of Studies on Insecurity (ICESI). The site contains several accessible data sets, but the data is organized by state per year. Whatever information they used to come up with their results is not accessible to the public, and attempts to contact them about obtaining more information have been stressful and ultimately futile. UPDATE: They no longer seem to exist.
Several attempts have been made at harnessing Twitter data for live reporting of shootings. Most notable is Retio, a project started over a year ago by a group of engineers in Mérida. While its success varies from city to city, Retio has been able to harness the power of citizen reporting quite effectively in major cities like Monterrey, Guadalajara and Mexico City. However, Retio relies on users to actively tweet to one of Retio’s many Twitter accounts (one account per city). The tweets are then automatically categorized by report type and retweeted from the respective city account. The site has several shortcomings. First, while the site relies on people mentioning certain hashtags and accounts, the system still does a bad job of eliminating spam because it looks at individual tweets. Second, reports contain very little information; it seems like Retio’s job is simply to map events and retweet the incident. Third, users are given no choice about anonymity (we are still debating the benefits of anonymity). Lastly, and perhaps most importantly, the information gathered by Retio is not publicly available in a machine-readable format. Another big citizen reporting tool is Centro de Integración Ciudadana (CIC). CIC does not rely on Twitter data, but just like Retio’s, its reports do not contain much information and are not freely available in a machine-readable format. Whatever the shortcomings of these different tools, they do prove that the Mexican citizenry is engaged and willing to participate.
The goal of our project is to create an easily accessible database that will (hopefully) provide better information than what is currently out there. We hope to gain support by actively reaching out to engaged citizens via Twitter and asking for their help. The idea is still very much in early development, and it might seem like we’re reaching for “low-hanging fruit,” but we’re fairly confident that we can provide a better service than some of the other citizen reporting projects that currently exist.