Impure Hip Hop Dissent

September 7th, 2015

I was reading an essay by Tommie Shelby called “Impure Dissent: Hip Hop and the Political Ethics of Marginalized Black Urban Youth”. In it, Shelby argues that political rap, although its lyrics sometimes traffic in misogyny and homophobia, celebrate violence against cops, and valorize gunplay and street crime, still exhibits an important form of political dissent that should not be cast aside because of its “impurity”. He calls this political rap music “impure dissent”, and argues that it has intrinsic value even though it is not meant to elicit social change or alter the status quo.

Among the many interesting ideas mentioned, Shelby says that beyond condemning an injustice, impure hip hop dissent has two further functions: “to publicly pledge loyalty to the oppressed, and to explicitly withhold loyalty from the state. […] This dissent is the expression of solidarity with the oppressed against perceived injustice, not so much because those in power may change course as a result, but because the dissenters want to make clear whose side they are on.”

Another thing that struck me: while the political speech in hip hop may make its way into the public sphere, it does not necessarily invite a rational communicative exchange. How could it? When your song glorifies street crime, you’re not inviting other people into a conversation about its merits. This one-sided dissent may strike others as “morally impure”, since dissenters refuse to listen to, much less reply to, criticism. “These dissenters … may seem to be lacking in [the] appropriate civic spirit of reciprocity.” Shelby offers another possibility: dissenters may believe that critics are arguing in bad faith, and that the critics’ unsympathetic attitude toward their plight is an indication that a meaningful dialog is simply not possible.

This reminded me of some of the insensitive responses to the Ferguson protests, in particular the hashtag #PantsUpDontLoot. Perhaps the tiny faction of individuals who were actually involved in the looting could be described as practicing “impure protest”. They might not have been looking for dialog, especially not with someone who would reduce the entirety of racial injustice in this country to “pull up your pants.” Maybe they were looting because they were just really fucking angry.

Shelby’s essay comes from the book “From Voice To Influence: Understanding Citizenship in a Digital Age”, which I highly recommend. It was a happy coincidence that I read this today since I was planning on going to the movies to watch “Straight Outta Compton.”

Movie watching inspired by this sign I saw while walking around Brooklyn today.


Also: when I was thinking about writing this, I wondered how impure hip hop dissent might compare to narcocorridos. I have heard narcocorridos compared to hip hop several times, since both promote and glorify violence. And much like hip hop artists, many narcocorrido artists come from very humble backgrounds and have faced years of neglect and injustice from the state. However, the more I thought about it, the harder it was to make the case that narcocorridos qualify as impure dissent, mainly because, in my limited knowledge of narcocorrido lyrics, much of that music is not political at all. It mostly just celebrates narcoculture. It’s worth looking into, though. I might also have a very biased opinion, since I come from a place that was very much affected by narco violence.

On Evidence-Based Sentencing and the Variables of Race, Age, and Social Achievement

July 25th, 2015

I was reading a paper on evidence-based sentencing called “Risk in Sentencing: Constitutionally-Suspect Variables and Evidence-Based Sentencing”. In it, the authors list fifteen variables (the list is drawn from another study) with statistically significant relationships to recidivism. Here are some of them (on a 0.30 scale):

  • Criminal companions: z=0.21
  • Antisocial personality: z=0.18
  • Adult criminal history: z=0.17
  • Family rearing practices: z=0.14
  • Social achievement (education, marital status, employment): z=0.13
  • Race: z=0.17
  • Age: z=0.11
  • Gender: z=0.06
  • Socio-economic status of origin: z=0.05

Immediate things that pop out: race, criminal companions (who you hang out with), social achievement (education, marital status, employment), age, gender, and socio-economic status of origin. According to this study, these factors indicate some probability of recidivism. Luckily, several of these variables (such as race and age) are constitutionally barred from being taken into account in sentencing decisions. But the point I want to make is that I don’t think most of these should be a factor in determining a person’s sentence. And I think this study is a great example of why we should be careful when drawing conclusions from analyzing data. I sometimes tell this joke: “100% of divorces are caused by marriage.” It’s silly, but I think it’s relevant here. Yes, divorces begin with marriage, but if you blame divorce on marriage, you’re missing some important underlying cause. Sure, young, poor, uneducated black people who hang out with other criminals might have an increased chance of recidivism, but is that really the underlying cause? Is it really their fault that they are young, poor, uneducated, and black, living in a neighborhood where everyone else is young, poor, uneducated, and black?
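To make the point concrete, here’s a toy simulation (mine, not from the paper) of how a confounder can make an immutable trait look “predictive”. An unmeasured factor like neighborhood disadvantage drives both who carries the group label and who recidivates; the label itself does nothing, yet it correlates with recidivism anyway:

```python
# Toy simulation (illustrative only): a hidden confounder creates a
# correlation between a group label and recidivism even though the
# label has zero causal effect on the outcome.

import random

random.seed(1)

def simulate(n: int = 100_000) -> float:
    """Return the observed recidivism gap between the two groups."""
    outcomes = {0: [], 1: []}
    for _ in range(n):
        disadvantage = random.random()          # hidden confounder
        # Disadvantage raises the chance of carrying the group label
        # (e.g. via segregated, under-resourced neighborhoods)...
        group = 1 if random.random() < 0.2 + 0.6 * disadvantage else 0
        # ...and independently raises the chance of recidivism.
        recidivates = random.random() < 0.1 + 0.5 * disadvantage
        outcomes[group].append(recidivates)
    rate = lambda xs: sum(xs) / len(xs)
    return rate(outcomes[1]) - rate(outcomes[0])

# Prints a sizable gap, even though "group" never enters the outcome:
print(f"recidivism gap attributed to group label: {simulate():.1%}")
```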

The study is a great example of algorithms pointing out the obvious and yet missing the larger picture. It’s like Google’s flu detector, which might actually only be a winter detector. We need to think about how we construct these algorithms and how we employ them to make decisions that might affect hundreds of thousands of people. We shouldn’t be asking “how does the race variable relate to recidivism?” There’s nothing “variable” about race. Or age. Or socio-economic status of origin. These are the wrong questions. Instead, why don’t we ask ourselves: “What can we do to improve a person’s life such that the color of their skin doesn’t correlate with a high recidivism rate?” I think that’s a more worthwhile pursuit.


Book: World Dynamics

July 4th, 2015

I like books. I am not a fast reader. I have not read thousands of books. I don’t even read all the books I own. But I like them nonetheless. I especially like collecting early editions of books that have a special meaning to me, that tried to predict the future, or that were influential in design and technology and in how those two fields would or could transform society.

Following Rune Madsen’s lead, I decided to write every now and then about the books I’ve collected. The first one is “World Dynamics” by Jay W. Forrester:

[Photo of the book]

Forrester was the founder of system dynamics. Unlike the dynamic systems class you might’ve taken in college, where you draw bond graphs and model a box-sliding-down-a-ramp-on-a-spring-on-a-damper-on-a-pulley-on-a-water-pump-on-a-generator-on-a-moving-train using a system of equations, Forrester’s system dynamics was meant to model complex problems like population growth, the use and exhaustion of resources, industrial processes, or the successes and failures of corporations (the original intent). “World Dynamics” is an application of system dynamics to model the world’s population growth and the exhaustion of resources.
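At its core, a system dynamics model is just a set of “stocks” (accumulated quantities) and “flows” (rates that fill or drain them) integrated forward in time. Here’s a deliberately tiny sketch in that spirit; the two stocks and the made-up coefficients are mine, not Forrester’s actual World2 model, which couples five interacting stocks (population, natural resources, capital, agriculture, and pollution):

```python
# A toy stock-and-flow model in the spirit of system dynamics.
# Two stocks: population and a non-renewable resource. The feedback:
# population grows by consuming the resource, and growth collapses
# as the resource is depleted. Coefficients are invented for the demo.

def run(years: int = 200, dt: float = 1.0):
    population = 1.0      # stock, arbitrary units
    resource = 100.0      # stock, arbitrary units
    for year in range(0, years, int(dt)):
        # Flows depend on the current state of the stocks.
        fraction_left = resource / 100.0
        births = 0.08 * population * fraction_left  # growth needs resources
        deaths = 0.03 * population
        consumption = 0.5 * population
        # Integrate the stocks forward one step (Euler integration).
        population += (births - deaths) * dt
        resource = max(resource - consumption * dt, 0.0)
        if year % 40 == 0:
            print(f"year {year:3d}: population={population:7.2f}, "
                  f"resource={resource:7.2f}")

run()
# Population overshoots and then declines once consumption has eaten
# into the resource base -- the qualitative behavior Forrester's far
# richer model exhibited for the world system.
```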

The idea for the study supposedly originated when Forrester met with the founder of the Club of Rome, a think tank that deals with a variety of international issues. Back in the 60s and early 70s, people had begun to think about the environment, overpopulation, and the exhaustion of resources. This book was meant to give a prediction of what the world could look like in the future. Another book that came out of the Club of Rome was “The Limits to Growth” by Donella Meadows (still trying to get my hands on that one).

The model Forrester produced predicted that the limiting factors in bringing the world into equilibrium would not only be population growth and the availability of food, but also pollution, crowding, and the depletion of resources. Industrialization might be a bigger threat than overpopulation because of the limits of the environment.

The goal of producing the model was to search for an equilibrium where we could live sustainably on earth’s renewable resources. In the search for equilibrium, Forrester suggests cutting food production to reduce the population. I’m not sure how exactly he envisioned this playing out, but it seems he was counting on a reduction in birth rates.

[Photo: a diagram of the model from the book. Slightly less complicated than the Afghanistan plan.]

This book is one of a long list of books and research projects that relied heavily on the hope that system dynamics could solve the world’s problems. The idea back then was that if we had all the variables, and we understood how all of them behaved, we might be able to model just about anything, including, literally, the entire world.

And yet, here we are. Still unable to solve the world’s problems, having long abandoned the idea that we can model the world simply by coming up with the right equations.


Verizon’s Morse Code Post, Translated

February 26th, 2015

Today the FCC ruled in favor of Net Neutrality. Opponents such as Verizon thought this was an antiquated decision. So, in what is probably the most childish response by a corporation I’ve seen to date, Verizon published a blog post stating their disappointment at the FCC’s decision. In Morse code. They also provided a link to a PDF. With typewriter-style smudged-ink text. Dated Feb. 26, 1934.

All right Verizon, challenge accepted.

I copy-pasted their blog post and ran it through a short Python script I made.
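A minimal version of such a decoder (a sketch of the idea, not necessarily the exact script) looks like this, assuming, as in Verizon’s post, that letters are separated by spaces and words by slashes:

```python
# Minimal Morse decoder: look up each letter in a table, join letters
# into words, and join words with spaces. '?' marks unknown symbols.

MORSE = {
    '.-': 'a', '-...': 'b', '-.-.': 'c', '-..': 'd', '.': 'e',
    '..-.': 'f', '--.': 'g', '....': 'h', '..': 'i', '.---': 'j',
    '-.-': 'k', '.-..': 'l', '--': 'm', '-.': 'n', '---': 'o',
    '.--.': 'p', '--.-': 'q', '.-.': 'r', '...': 's', '-': 't',
    '..-': 'u', '...-': 'v', '.--': 'w', '-..-': 'x', '-.--': 'y',
    '--..': 'z', '-----': '0', '.----': '1', '..---': '2', '...--': '3',
    '....-': '4', '.....': '5', '-....': '6', '--...': '7', '---..': '8',
    '----.': '9', '.-.-.-': '.', '--..--': ',', '.----.': "'",
}

def decode(morse_text: str) -> str:
    """Decode space-separated Morse, with '/' between words."""
    words = morse_text.strip().split('/')
    return ' '.join(
        ''.join(MORSE.get(letter, '?') for letter in word.split())
        for word in words
    )

print(decode("- --- -.. .- -.-- .----. ..."))  # prints: today's
```

This is what comes out (formatting mine):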

“Today’s decision by the FCC to encumber broadband internet services with badly antiquated regulations is a radical step that presages a time of uncertainty for consumers, innovators and investors. Over the past two decades a bipartisan, light-touch policy approach unleashed unprecedented investment and enabled the broadband internet age consumers now enjoy. The FCC today chose to change the way the commercial internet has operated since its creation. Changing a platform that has been so successful should be done, if at all, only after careful policy analysis, full transparency, and by the legislature, which is constitutionally charged with determining policy. As a result, it is likely that history will judge today’s actions as misguided. The FCC’s move is especially regrettable because it is wholly unnecessary. The FCC had targeted tools available to preserve an open internet, but instead chose to use this order as an excuse to adopt 300-plus pages of broad and open-ended regulatory arcana that will have unintended negative consequences for consumers and various parts of the internet ecosystem for years to come. What has been and will remain constant before, during and after the existence of any regulations is Verizon’s commitment to an open internet that provides consumers with competitive broadband choices and internet access when, where, and how they want.”

“Verizon’s commitment to an open internet that provides consumers with competitive broadband choices.” Really, Verizon? Really?

The Mexican GovBots Did NOT Take Down #YaMeCanse, But We Can Keep #YaMeCanse# Trending

December 8th, 2014

Perhaps it was the excitement of hearing about a new phenomenon in censorship that prompted me to write a little too hastily about how the government of Mexico might have used Twitter bots to spam and trash the #YaMeCanse hashtag out of the trending topics list. As reported by Lo Que Sigue, Aristegui Noticias, and myself, #YaMeCanse, the hashtag used as the rallying cry for Mexico’s 43 missing students, was suddenly dropped from the trending topics list by an army of bots, presumably coordinated by the federal government. None of us provided proof that the government was behind this, but a series of videos and screenshots originally provided by Lo Que Sigue led us to believe that a swarm of bots was at least responsible.

It was NOT the bots

December 3 at 10:36 AM was the last time @TrendieMX reported #YaMeCanse to be trending. By 9:36 PM the next day, #YaMeCanse2 was already trending. Let’s take a look at what the Topsy trends for #YaMeCanse look like for the month of November and the first days of December.

Usage of #YaMeCanse


To make sense of what happened, we need to understand what Twitter is doing to calculate a trending topic. We don’t have access to specific information about how the trending algorithm functions, but we do know how trending algorithms work in general, and we have some clues about what Twitter has done in the past to tweak its algorithms. The relevant issue here can be described as “the Justin Bieber problem”. Many of you might remember how, some years ago, Justin Bieber was constantly trending due to the millions of Beliebers continuously tweeting about him. Twitter wants to tell us what’s trending right now, not one hour ago or one month ago. As Twitter is quoted saying in this Mashable article:

“The new algorithm identifies topics that are immediately popular, rather than topics that have been popular for a while or on a daily basis, to help people discover the ‘most breaking’ breaking news from across the world. (We had previously built in this ’emergent’ algorithm for all local trends, described below.) We think that trending topics which capture the hottest emerging trends and topics of discussion on Twitter are the most interesting.”

Instead of merely looking at the volume of Bieber tweets (of which there are many), Twitter looks at the speed and “burstiness” of the tweets. However, there’s more to it. If Twitter only measured “burstiness”, you might see “Good Morning” trending every morning of every day. For this, Twitter establishes a baseline of expected frequencies based on history. Twitter “knows” there is usually a spike of “Good Morning” tweets every morning and corrects for it. As this video on trend detection in Twitter social data explains, a ratio is calculated for each term based on the term’s past frequency and its present frequency.

What most likely happened is that after a couple of weeks of trending, the baseline for #YaMeCanse rose from zero (it didn’t exist before 11/7) to the frequency at which people were tweeting at the end of November. Twitter began treating the hashtag’s volume and speed as something to be expected, and dropped it off the trending list.
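To see the mechanics, here’s a toy version of this kind of present-vs-expected scoring. It is emphatically not Twitter’s actual algorithm (which is private); the exponential moving average and the numbers are invented for illustration:

```python
# A toy illustration (not Twitter's actual algorithm) of how a trend
# score based on present-vs-expected frequency behaves when a
# hashtag's baseline catches up with its usage.

def trend_score(current_count: float, baseline: float, smoothing: float = 1.0) -> float:
    """Ratio of observed frequency to historically expected frequency."""
    return current_count / (baseline + smoothing)

# Week-by-week tweet volume for a hypothetical hashtag: it explodes,
# then stays high and roughly constant.
weekly_counts = [0, 50_000, 400_000, 420_000, 410_000, 415_000]

baseline = 0.0
for week, count in enumerate(weekly_counts):
    score = trend_score(count, baseline)
    print(f"week {week}: count={count:>7}, baseline={baseline:>9.0f}, score={score:8.1f}")
    # Exponential moving average: the baseline drifts toward recent usage.
    baseline = 0.5 * baseline + 0.5 * count

# The score is huge when the hashtag first spikes (baseline near zero)
# and decays toward ~1 once high volume becomes the expected norm --
# at which point the topic stops "trending" even though usage hasn't
# dropped at all.
```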

Baseline shift on #YaMeCanse

Spam bots should have no impact on the algorithm. Twitter’s spam team identifies the bots, and their tweets are not counted by the algorithm. Additionally, it’s worth mentioning that Twitter has a team of low-paid human workers manually sorting through hashtags to eliminate advertiser spam. Even so, there is no evidence of an increase in bots during the time the hashtag was dropped from the list. The team at Lo Que Sigue provided this video as proof of the presence of bots (not that we need proof of that in general).

Screenshot of Lo Que Sigue video

Why are individual and unconnected tweets labeled as bots? If I tweet and only one person RTs me, by their standards, I’m a bot. You can run the simulation from the video yourself: use #YaMeCanse2 and wait for the same pattern of connected and disconnected tweets to occur. Then zoom in on the disconnected tweets and look up a couple of usernames. You’ll find a lot of those disconnected nodes are real people. You’ll also run into bots, but having no one retweet your tweet does not make you a bot.

This has happened before.

This would not be the first time that people have cried censorship upon the disappearance of a hashtag from the TT list. The Mashable article quoted above was a response to Beliebers accusing Twitter of censorship. Similarly, occupiers accused Twitter of censorship when #OccupyWallStreet was taken off the list. On both occasions Twitter had to step in and say this was just a result of how the algorithm works. In some cases we should be glad the algorithm works like this; otherwise we’d see #JustinBieber constantly trending. But what about when it’s something important, like #YaMeCanse?

At this point I should say that if it were possible for the Mexican government to use such a tactic to censor people on social media, it probably would. We’ve already seen how Peña Nieto’s campaign used bots to promote the candidate on Twitter. And earlier this year, an initiative put forth by Peña Nieto on radio and telecom caused a lot of controversy when people claimed the law would allow the government to censor online content and to interrupt cell reception during protests. There’s also the case of the website censored by GoDaddy under pressure from the US Consulate in Mexico.

We’ve Discovered How To Get Around the Algorithm

I believe the immediate response by the Mexican Twitterverse in creating #YaMeCanse2 has revealed an exploitable feature of the algorithm. It took less than two days for people to adapt to the new hashtag. Topsy Trends shows that #YaMeCanse2 doesn’t have significantly more traffic than #YaMeCanse had before being taken down; the reason #YaMeCanse2 was able to trend so quickly is that its baseline at the time was zero. This means that whenever #YaMeCanse2’s baseline shifts up enough for it to de-trend, we can just start again with #YaMeCanse3. We can keep going with this as long as we keep the speed at which people tweet constant, or as long as Twitter doesn’t catch on and modify the algorithm to account for us just adding a number at the end of the phrase (in which case we can just add the word “tres”). This is also why we keep seeing so many different Bieber hashtags: they’re all different phrases that didn’t exist before.

The lesson here for the Mexican folk is that if we want to continue to have a #YaMeCanse hashtag trending, we need to coordinate to increment the number at the end of the tag each time it expires. When #YaMeCanse2 falls off the list, we simply switch over to #YaMeCanse3.

Censorship on Twitter Using Bots? How #YaMeCanse Was Knocked Off Twitter Trending Topics

December 4th, 2014


In late September of this year, 43 students in the Mexican state of Guerrero went missing. In an attempt to prevent students from disrupting a political event for his wife, the mayor of Iguala ordered local police to stop and detain the students. This set in motion a series of events that left several students dead and 43 missing. People later learned the missing students were handed over to a local cartel and were subsequently killed and burned until no traces of their bodies were left behind. The announcement was made during a press conference by Mexico’s Attorney General, Jesús Murillo Karam, who at the end of the conference, tired and exasperated, said “Ya me cansé.” I’ve had enough.


Mexicans took to social media and responded with “We’re tired too…” Of the violence. Of the injustice. Of the impunity. Of the corruption. The #YaMeCanse hashtag became the rallying cry for discourse online and for protests all over Mexico. The hashtag had been on Twitter’s trending topics list almost continuously since Murillo Karam’s press conference. Yesterday, the hashtag suddenly disappeared from the list even though usage had not waned.

Usage of #YaMeCanse


The sudden disappearance of such a popular hashtag raised some eyebrows. Determining trending topics is a little more complicated than simply counting the mentions of a hashtag. Twitter has an algorithm that determines trending topics based on several factors. According to Twitter, one of the rules against misuse of trending topics is “Repeatedly Tweeting the same topic/hashtag without adding value to the conversation in an attempt to get the topic trending or trending higher.” It is very likely that the overwhelming spamming of #YaMeCanse caused Twitter’s algorithms to treat the hashtag as spam and remove it from the trending list.

As has been reported, an army of bots had been retweeting and tweeting the #YaMeCanse hashtag for several days.

“Who says that online censorship and repression does not exist online? A storm of bots tries to disappear #YaMeCanse”

Spam Tweets

Another analysis by Lo Que Sigue shows the difference between connected and disconnected tweets, representing real people versus bots.


Not to be easily dissuaded, the Mexican Twitterverse quickly came up with a simple solution: #YaMeCanse2, which is currently trending. An added cleverness to appending the number ‘2’ is that it forces people to ask “What happened to regular #YaMeCanse? Where’s #YaMeCanse1?”, which leads people to find out about the attack. It’s a sort of Streisand effect, where in an attempt to censor one hashtag, not only do people evade the censorship, but in doing so they call attention to the attempt at censorship.

It’s quite possible that this is not a coordinated attack on the hashtag by some entity. It could be just regular bots hijacking a popular hashtag. And it is very tempting to attribute this “attack” to the government of Mexico. I would not be surprised at all if it was, and I’d be willing to bet that the Mexican government is behind this (it wouldn’t be the first time), but I would like to find definitive proof. The people behind Lo Que Sigue are working to start an Indiegogo campaign to try and find the origin of these bots. Perhaps we don’t have to wait around for this to get funded, and we can crowdsource/collaborate to see if tracing the origin of the bots is possible. I would welcome any ideas on how to do this.


So trending topics are more complicated than they seem. It’s hard to tell whether bots had any role in dropping the hashtag from the trending list. It seems that Twitter is actually looking for “bursts” of tweets, and at how fast these tweets appear (∂Tw/∂t?). It is entirely possible that the volume of tweets remained stable but the “burstiness” was gone. I don’t know; Twitter’s algorithms are very private. Even if bots played no part in dropping the hashtag, the possibility of that happening might still exist. After all, riding hashtags to promote unrelated content is shunned by Twitter. Whether they can detect that algorithmically, I’m not sure, but I wouldn’t be surprised. If they can, then it’s entirely possible to spam a hashtag using bots. Perhaps the only way to find out is to actually measure the volume and speed of the bot tweets. Doing this, it turns out, is very hard.
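If we did have a sample of the suspected bot tweets, a first stab at measuring their volume and speed might look like the sketch below. This is my rough illustration, assuming tweet timestamps collected from Twitter’s search API; the genuinely hard part is deciding which accounts are bots in the first place:

```python
# A rough sketch of measuring a hashtag's volume and "burstiness"
# from a sample of tweet timestamps. Illustrative only; the real
# difficulty is collecting the data and labeling the bots.

from collections import Counter
from datetime import datetime

def per_minute_volume(timestamps: list[datetime]) -> Counter:
    """Bucket tweets into one-minute bins."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)

def burstiness(timestamps: list[datetime]) -> float:
    """Peak one-minute volume divided by the mean one-minute volume.

    A high ratio means tweets arrive in sharp spikes (a rough proxy
    for the dTw/dt the trending algorithm seems to care about); a
    ratio near 1 means steady, spike-free volume.
    """
    bins = per_minute_volume(timestamps)
    mean = sum(bins.values()) / len(bins)
    return max(bins.values()) / mean
```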


“Social Physics: How Good Ideas Spread–The Lessons From a New Science” – Alex Pentland

November 10th, 2014

I started reading Alex “Sandy” Pentland’s book, Social Physics. Several things interest me about this book. I’m very interested in how society behaves in today’s world, where we are increasingly connected to more people by weak social ties. Also interesting is that advances in data collection and analysis are bound to reach a point where we can continuously monitor and analyze people’s behavior. Who will have this knowledge? How will they use it? What will this world look like? Lastly, I’m interested in how good ideas spread and how that can help us design better organizations and institutions.

Alex Pentland thinks it is possible to create a mathematical explanation of why society behaves the way it does. He calls this discipline social physics.

“Social physics is a quantitative social science that describes reliable, mathematical connections between information and idea flow on the one hand and people’s behavior on the other. Social physics helps us understand how ideas flow from person to person through the mechanism of social learning and how this flow of ideas ends up shaping the norms, productivity, and creative output of our companies, cities, and societies.”

The goal of applying this science to society is to shape outcomes. Pentland believes we can create systems that build a society “better at avoiding market crashes, ethnic and religious violence, political stalemates, widespread corruption, and dangerous concentration of power.”

All of this would sound great if it didn’t also sound kind of scary. There are a lot of concerns about privacy, which Pentland addresses, and which I’m sure he’ll talk more about in the coming chapters. However, even if he is able to get around the privacy issues, the ability to affect how society behaves would give whoever wields it great power. This is perhaps a little paranoid on my part, but I don’t think misusing the ability to “fix” society, as he puts it, is out of the question. Pentland does write about it:

“This vision of a data-driven society implicitly assumes the data will not be abused. The ability to see the details of the market, political revolutions, and to be able to predict and control them is a case of Promethean fire—it could be used for good or for ill.”

My second concern is best summarized by Nicholas Carr in his article “The Limits of Social Engineering”.

“Pentland may be right that our behavior is determined largely by social norms and the influences of our peers, but what he fails to see is that those norms and influences are themselves shaped by history, politics, and economics, not to mention power and prejudice. People don’t have complete freedom in choosing their peer groups. Their choices are constrained by where they live, where they come from, how much money they have, and what they look like. A statistical model of society that ignores issues of class, that takes patterns of influence as givens rather than as historical contingencies, will tend to perpetuate existing social structures and dynamics. It will encourage us to optimize the status quo rather than challenge it.” (h/t to Cathy O’Neil for linking to this piece).

The case studies in the book so far take place in groups where this might not be a huge issue, like eToro, an online trading and investment network. Carr’s (and my) concern may not loom large in these scenarios, especially because Pentland is measuring very specific metrics like return on investment. However, I do believe there is real danger in applying this sort of analysis in places like, say, Ferguson, MO. It will be interesting to read the different case studies and try to identify places where this concern might arise.

It would be very unfair of me to end this without writing about the actual focus of the book (although I’m already a little nauseous from writing this on the train). The book focuses on the two most important concepts of social physics: idea flow within social networks, and social learning, that is, how we take new ideas and turn them into habits, and how learning can be accelerated and shaped by social pressure.

I like to believe that there are better systems of collaboration and cooperation that can make organizations more effective, communities more resilient, and authorities more accountable. Elinor Ostrom developed her work on governing the commons by studying how communities behaved around issues like irrigation and water management. Similarly, I do think Pentland’s insights on idea flow and social learning can help us understand how to design better organizations, communities, and institutions.

The Dangers of Evidence-Based Sentencing

October 27th, 2014
Note: This post was originally published and cross-posted elsewhere.

What is Evidence-based Sentencing?

For several decades, parole and probation departments have been using research-backed assessments to determine the best supervision and treatment strategies for offenders, in an effort to reduce the risk of recidivism. In recent years, state and county justice systems have started to apply these risk and needs assessment tools (RNAs) to other parts of the criminal process.

Of particular concern is the use of automated tools to determine imprisonment terms. This relatively new practice of incorporating RNA information into the sentencing process is known as evidence-based sentencing (EBS).

What the Models Do

The parameters used to determine risk vary by state, and most EBS tools use information that has been central to sentencing schemes for many years, such as an offender’s criminal history. However, an increasing number of states have been using static factors such as gender, age, marital status, education level, employment history, and other demographic information to determine risk and inform sentencing. Especially alarming is the fact that the majority of these risk assessment tools do not take an offender’s particular case into account.

This practice has drawn sharp criticism from Attorney General Eric Holder, who says “using static factors from a criminal’s background could perpetuate racial bias in a system that already delivers 20% longer sentences for young black men than for other offenders.” In its annual letter to the US Sentencing Commission, the Attorney General’s Office states that “utilizing such tools for determining prison sentences to be served will have a disparate and adverse impact on offenders from poor communities already struggling with social ills.” Other concerns cite the probable unconstitutionality of using group-based characteristics in risk assessments.

Where the Models Are Used

It is difficult to precisely quantify how many states and counties currently implement these instruments, although at least 20 states have implemented some form of EBS. Some of the states, or states with counties, that have implemented some sort of EBS (for any type of sentencing: parole, imprisonment, etc.) are: Pennsylvania, Tennessee, Vermont, Kentucky, Virginia, Arizona, Colorado, California, Idaho, Indiana, Missouri, Nebraska, Ohio, Oregon, Texas, and Wisconsin.

The Role of Race, Education, and Friendship

Overwhelmingly, states do not include race in their risk assessments, since there seems to be a general consensus that doing so would be unconstitutional. However, even though these tools do not take race into consideration directly, many of the variables used, such as economic status, education level, and employment, correlate with race. African-Americans and Hispanics are already disproportionately incarcerated, and determining sentences based on these variables might cause further racial disparities.

The very socioeconomic characteristics, such as income and education level, used in risk assessments are characteristics that are already strong predictors of whether someone will go to prison. For example, high school dropouts are 47 times more likely to be incarcerated than people of a similar age who received a four-year college degree. It is reasonable to suspect that courts that include education level as a risk predictor will further exacerbate these disparities.

Some states, such as Texas, take peer relations into account and consider associating with other offenders a “salient problem”. Considering that Texas ranks 4th in the rate of people under some sort of correctional control (parole, probation, etc.), and that the rate is 1 in 11 for black males in the United States, it is likely that this metric would disproportionately affect African-Americans.

Sonja Starr’s paper

Even so, in some cases, socioeconomic and demographic variables receive significant weight. In her forthcoming paper in the Stanford Law Review, Sonja Starr provides a telling example of how these factors are used in presentence reports. From her paper:

For instance, in Missouri, pre-sentence reports include a score for each defendant on a scale from -8 to 7, where “4-7 is rated ‘good,’ 2-3 is ‘above average,’ 0-1 is ‘average’, -1 to -2 is ‘below average,’ and -3 to -8 is ‘poor.’ Unlike most instruments in use, Missouri’s does not include gender. However, an unemployed high school dropout will score three points worse than an employed high school graduate—potentially making the difference between “good” and “average,” or between “average” and “poor.” Likewise, a defendant under age 22 will score three points worse than a defendant over 45. By comparison, having previously served time in prison is worth one point; having four or more prior misdemeanor convictions that resulted in jail time adds one point (three or fewer adds none); having previously had parole or probation revoked is worth one point; and a prison escape is worth one point. Meanwhile, current crime type and severity receive no weight.

Starr argues that such simple point systems may “linearize” a variable’s effect. In the underlying regression models used to calculate risk, a variable’s effect does not always translate linearly into changes in the probability of recidivism, but the point system treats it as if it did.
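A quick way to see the problem (my toy numbers, not Starr’s or Missouri’s): in a logistic model, the same point change produces very different probability changes depending on where a defendant already sits on the risk curve.

```python
# Toy illustration of "linearization": in a logistic risk model,
# identical point changes do NOT mean identical changes in the
# predicted probability of recidivism. All numbers are invented.

import math

def logistic(log_odds: float) -> float:
    """Convert a log-odds risk score into a probability."""
    return 1 / (1 + math.exp(-log_odds))

POINT_WEIGHT = 0.5  # pretend each "point" is worth 0.5 log-odds

for base in (-3.0, 0.0, 3.0):          # low-, mid-, high-risk defendants
    before = logistic(base)
    after = logistic(base + 3 * POINT_WEIGHT)  # scoring three points worse
    print(f"base {base:+.1f}: {before:.1%} -> {after:.1%} "
          f"(change {after - before:+.1%})")

# The same three-point penalty shifts the mid-risk defendant by ~32
# percentage points, the low-risk one by ~14, and the high-risk one
# by under 4 -- a flat point scale hides all of this.
```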

Another criticism Starr makes is that these tools often make predictions about an individual based on group averages. Starr says the instruments can predict with reasonable precision the average recidivism rate of all offenders who share the defendant’s characteristics, but that does not make them necessarily useful for predictions about the individual.

The Future of EBS Tools

The Model Penal Code is currently being revised and is set to include these risk assessment tools in the sentencing process. According to Starr, this is a serious development, both because it reflects increased support for these practices and because of the Model Penal Code’s great influence in guiding other states’ penal codes. Attorney General Eric Holder has already spoken against the practice, but it will be interesting to see whether his successor continues this campaign.

Even if EBS can accurately measure the risk of recidivism (which is uncertain, according to Starr), does that mean a longer prison sentence will result in fewer offenses after the offender is released? EBS does not seek to answer this question. Further, if knowing there is a harsh penalty for a particular crime deters people from committing it, wouldn’t adding more uncertainty to sentencing (EBS tools are not always transparent and are sometimes proprietary) effectively remove this deterrent?

Even though many questions remain unanswered, and while several people have been critical of the practice, there seems to be great support for the use of these instruments. They are especially easy to support when they are overwhelmingly regarded as progressive and scientific, a view Starr refutes. While there is certainly a place for data analytics and actuarial methods in the criminal justice system, it is important that such research be applied with the appropriate caution. Or perhaps not at all. Even if the tools had full statistical support, the risk of further exacerbating an already disparate criminal justice system should be enough to halt this practice.

Both Starr and Holder believe there is a strong case to be made that the risk prediction instruments now in use are unconstitutional. But EBS has strong advocates, so it’s a difficult subject. Ultimately, evidence-based sentencing determines a person’s sentence based not on what that person has done, but on who that person is.

De-anonymizing open data, just because you can… should you?

October 23rd, 2014

If an essential part of the data reveals personally identifiable information (PII), should the data not be released? Should the users of open data be the ones responsible for ensuring proper use of the data?

I mention this question because of an article by an intrepid Gawker reporter who decided he could correlate photos of celebrities in NYC taxis (with visible taxi medallions) with the de-anonymized database of every NYC cab ride in 2013 to determine whether celebrities tipped their cab drivers. Of course, this article is another example of “celebrities doing normal people things like using taxis”, but the underlying question here is: just because you can violate people’s privacy, does that mean you should?

Identifying celebrities and their cab rides was first done by an intern at Neustar, Anthony Tockar. In his post, he recognizes that it is relatively easy to reveal personal information about people. Not only could he match cab rides to a couple of celebrities, but he also showed how easily you can see who frequents Hustler’s. Tockar says:

Now while this information is relatively benign, particularly a year down the line, I have revealed information that was not previously in the public domain.

He uses these examples to introduce a method of privatizing data called “differential privacy.” Differential privacy basically adds noise to the data when you zoom in on it, so you can’t identify specific information about an individual, but you can still get accurate results when you look at the data as a whole. This is best exemplified by the graphic below.

This shows the average speed of cab drivers throughout the day. The top half is the actual average speed of all drivers alongside the average speed of all drivers after the data is run through the differential privacy algorithm. The bottom half shows the same for an individual cab driver. Click on the graphic to go to an interactive tool that lets you play around with the privacy parameter, ε.
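For the curious, here is a minimal sketch of the Laplace mechanism, one standard way of achieving differential privacy for a numeric query. This is my illustration of the general technique, not the actual algorithm behind the tool above:

```python
# A minimal sketch of the Laplace mechanism for a differentially
# private mean. Illustrative only, not Neustar's implementation.

import random

def private_mean(values: list[float], lower: float, upper: float,
                 epsilon: float) -> float:
    """Differentially private mean of bounded values.

    Each value is clamped to [lower, upper], so one person can shift
    the total by at most (upper - lower): the query's sensitivity.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / len(clamped)
    sensitivity = (upper - lower) / len(clamped)
    # Laplace(0, 1) noise: the difference of two Exponential(1) draws.
    noise = random.expovariate(1) - random.expovariate(1)
    # Scale noise by sensitivity/epsilon: smaller epsilon means more
    # privacy and more noise.
    return true_mean + noise * (sensitivity / epsilon)

# e.g. speeds of thousands of drivers: the noisy aggregate stays
# accurate, while any single driver's contribution is drowned out.
speeds = [random.uniform(8, 30) for _ in range(10_000)]
print(private_mean(speeds, lower=0, upper=60, epsilon=0.1))
```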

But we’re still struggling with getting data off PDFs or, worse, out of filing cabinets. It’ll take years before we can build such privacy mechanisms into current open data! What to do in the meantime? It would seem that Gawker stopped reading after “Bradley Cooper left no tip” (actually, we don’t know, since tips are not recorded when paid in cash). Just because someone could look up ten celebrities’ cab rides, does that mean they should have? The reporter even quotes Tockar’s line about “revealing information not previously in the public domain”. The irony seems to have been lost on Gawker. I’m of the opinion that Gawker shouldn’t have published an article about celebrities’ cab rides any more than it should publish their phone numbers if they were available in a phone book. Unless it was trying to make a point about privacy and open data, which would’ve made for a great conversation piece. Except it wasn’t, since it was all about tipping. They even reached out to publicists for comments on the tipping.

Ultimately, who cares about Bradley Cooper taking a taxi? But when you go “hey, let’s see how many celebrities I can ID from this data” and write an article about it without questioning the privacy implications, you’re basically saying “Yes, because you can, it means you should.”

UPDATE: ok, so apparently there is a reason it’s called “Gawker”. See this example where this same author tries to out a Fox News reporter. Today I learned.

Reddit is NOT a failed state…

October 9th, 2014

It has its problems, for sure, but I wouldn’t be so quick to dismiss it as having failed.

I’m referring to an article The Verge posted about a month ago, following the celebrity nude photo leaks. The main argument for FAIL is the fact that instead of chastising the users who helped spread the leaked photos, Reddit protected them under the shield of free speech. I’m not here to argue whether Reddit acted appropriately in protecting those individuals (personally, they could’ve been kicked out, banned, arrested, and I would’ve been content with that). But I do not think this transgression of privacy, abuse of free speech, and overall disgusting behavior by a small group within a larger community a failed state makes.

Is this indicative of pervasive malicious behavior across Reddit? Absolutely. We didn’t need r/TheFappening to figure that out. Just talk to women redditors about their experiences as participants.

But at least we’re talking about these issues. And it’s not so much the fact that we are; it’s the fact that we have the ability to do so. Through its karma system, Reddit has built a system that promotes good behavior and, sometimes, reproves the bad. It’s a primitive system, for sure, especially since it’s not immune to hivemind behavior (for example, apparently the r/nyc hivemind thinks people have ZERO responsibility to give up their seat on the subway for a pregnant woman (maybe they’re right and I’m wrong)). This system, I think, allows the hive to go through iterations of what it believes to be correct. In effect, every now and then it corrects itself. Take the terrible “detective work” conducted during the immediate aftermath of the Boston Marathon bombing. After the hive realized it was wrong (so wrong), whenever there was a post asking for some sort of crowdsourced detective work, it was often met with someone commenting on the terrible results from the last time they tried to play detectives. As a result, Reddit for the most part now knows: we should avoid digital vigilantism.

In the coming years we will increasingly see Nobel Prize winner Elinor Ostrom’s principles on governing the commons applied to digital spaces. Although primitively (and perhaps unintentionally), Reddit has created a space where communities are able to define their own boundaries, (sort of) align “governance” rules with their preferences, (kind of) ensure that those who participate in the community have a say on the rules, and (barely) sanction those who misbehave. It has a long way to go, for sure. What happened with r/TheFappening is a case where a group of very misguided individuals was able to gather in one place and, as a community, behave inappropriately. What Reddit might be lacking there is some greater oversight over communities and their leaders, an oversight that’s not dictatorial, but rather is itself provided by a community (a council of communities?).

Another problem with Reddit (or any digital space, actually) is that whenever someone goes through the trouble of committing a crime, say, stealing nude celebrity photos, the “morality cost” of engaging in the immoral behavior is greatly decreased by the internet’s ability to massively distribute information at very low cost. For the most part, the consequences for engaging in such immoral behavior do not exist, especially when it costs nothing to click on a link. This is maybe one of the internet’s biggest weaknesses: its ability to facilitate engagement in immoral behavior.

We need to design digital spaces that somehow take this into account, spaces where the community can participate more meaningfully and deal with the bad apples more effectively. Is Reddit, and the rest of the internet, full of misguided individuals who do some fucked up shit? Yes, but this doesn’t mean we need to take it to the back of the barn and shoot it. It means we need to think about how we create these digital spaces in the future. Or do away with Reddit if you want, but then let’s take the good lessons and the bad, and make something better.