Archive for October, 2014

The Dangers of Evidence-Based Sentencing

Monday, October 27th, 2014
Note: This post was originally published on mathbabe.org and cross-posted on thegovlab.org.

What is Evidence-based Sentencing?

For several decades, parole and probation departments have used research-backed assessments to determine the best supervision and treatment strategies for offenders, with the aim of reducing the risk of recidivism. In recent years, state and county justice systems have started to apply these risk and needs assessment (RNA) tools to other parts of the criminal process.

Of particular concern is the use of automated tools to determine imprisonment terms. This relatively new practice of incorporating RNA information into the sentencing process is known as evidence-based sentencing (EBS).

What the Models Do

The parameters used to determine risk vary by state, and most EBS tools rely on information that has long been central to sentencing schemes, such as an offender's criminal history. However, an increasing number of states are using static factors such as gender, age, marital status, education level, employment history, and other demographic information to determine risk and inform sentencing. Especially alarming is the fact that most of these risk assessment tools do not take an offender's particular case into account.

This practice has drawn sharp criticism from Attorney General Eric Holder, who says “using static factors from a criminal’s background could perpetuate racial bias in a system that already delivers 20% longer sentences for young black men than for other offenders.” In its annual letter to the US Sentencing Commission, the Attorney General’s office states that “utilizing such tools for determining prison sentences to be served will have a disparate and adverse impact on offenders from poor communities already struggling with social ills.” Other criticisms cite the probable unconstitutionality of using group-based characteristics in risk assessments.

Where the Models Are Used

It is difficult to quantify precisely how many states and counties currently use these instruments, although at least 20 states have implemented some form of EBS. States that have implemented some sort of EBS (for any type of sentencing decision: parole, imprisonment, etc.), either statewide or in individual counties, include Pennsylvania, Tennessee, Vermont, Kentucky, Virginia, Arizona, Colorado, California, Idaho, Indiana, Missouri, Nebraska, Ohio, Oregon, Texas, and Wisconsin.

The Role of Race, Education, and Friendship

Overwhelmingly, states do not include race in their risk assessments, since there seems to be a general consensus that doing so would be unconstitutional. However, even though these tools do not take race into consideration directly, many of the variables they use, such as economic status, education level, and employment, correlate with race. African-Americans and Hispanics are already disproportionately incarcerated, and determining sentences based on these variables might cause further racial disparities.

The very socioeconomic characteristics used in risk assessments, such as income and education level, are already strong predictors of whether someone will go to prison. For example, high school dropouts are 47 times more likely to be incarcerated than people of a similar age who hold a four-year college degree. It is reasonable to suspect that courts that include education level as a risk predictor will further exacerbate these disparities.

Some states, such as Texas, take peer relations into account and consider associating with other offenders a “salient problem.” Considering that Texas ranks fourth in the rate of people under some form of correctional control (parole, probation, etc.) and that the rate is 1 in 11 for black males in the United States, it is likely that this metric would disproportionately affect African-Americans.

Sonja Starr’s Paper

Moreover, in some cases, socioeconomic and demographic variables receive significant weight. In her forthcoming paper in the Stanford Law Review, Sonja Starr provides a telling example of how these factors are used in presentence reports. From her paper:

For instance, in Missouri, pre-sentence reports include a score for each defendant on a scale from -8 to 7, where “4-7 is rated ‘good,’ 2-3 is ‘above average,’ 0-1 is ‘average’, -1 to -2 is ‘below average,’ and -3 to -8 is ‘poor.’ Unlike most instruments in use, Missouri’s does not include gender. However, an unemployed high school dropout will score three points worse than an employed high school graduate—potentially making the difference between “good” and “average,” or between “average” and “poor.” Likewise, a defendant under age 22 will score three points worse than a defendant over 45. By comparison, having previously served time in prison is worth one point; having four or more prior misdemeanor convictions that resulted in jail time adds one point (three or fewer adds none); having previously had parole or probation revoked is worth one point; and a prison escape is worth one point. Meanwhile, current crime type and severity receive no weight.
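To make the arithmetic concrete, here is a toy version of that scoring scheme. Only the relative weights and category cut-offs come from Starr's description; the zero baseline, the field names, and the exact split of the combined employment/education penalty are my own guesses.

```python
# Toy reconstruction of the Missouri point system described above.
# Only the relative weights and category cut-offs come from Starr's
# description; the baseline of zero, the field names, and the 2/1 split
# of the employment/education penalty are hypothetical.

def risk_score(employed, hs_graduate, age, served_prison,
               misdemeanors_with_jail, supervision_revoked, escaped_prison):
    score = 0
    # An unemployed high school dropout scores three points worse than
    # an employed graduate (the exact split between the two is a guess).
    if not employed:
        score -= 2
    if not hs_graduate:
        score -= 1
    # A defendant under 22 scores three points worse than one over 45
    # (how ages in between are treated isn't specified in the excerpt).
    if age < 22:
        score -= 3
    # Criminal-history items are each worth a single point.
    if served_prison:
        score -= 1
    if misdemeanors_with_jail >= 4:   # three or fewer add nothing
        score -= 1
    if supervision_revoked:
        score -= 1
    if escaped_prison:
        score -= 1
    # Conspicuously absent: current crime type and severity get no weight.
    return score

def category(score):
    if score >= 4:  return "good"           # 4 to 7
    if score >= 2:  return "above average"  # 2 to 3
    if score >= 0:  return "average"        # 0 to 1
    if score >= -2: return "below average"  # -1 to -2
    return "poor"                           # -3 to -8

# An unemployed 21-year-old dropout with no criminal history lands at -6,
# "poor"; an employed 46-year-old graduate with the same record is "average".
print(category(risk_score(False, False, 21, False, 0, False, False)))  # poor
print(category(risk_score(True, True, 46, False, 0, False, False)))    # average
```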

Starr argues that such simple point systems “linearize” a variable’s effect. In the underlying regression models used to calculate risk, a variable’s effect on the probability of recidivism is not linear, yet the point system treats it as if it were.
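To see what gets lost, suppose the underlying model is a logistic regression (a common choice for recidivism models, though I'm assuming it here; the weight of 0.5 log-odds per point is made up). The same one-point change in score moves the predicted probability by very different amounts depending on the baseline:

```python
import math

def prob(log_odds):
    """Logistic function: convert log-odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))

beta = 0.5  # hypothetical effect of one risk-score point, in log-odds

# A one-point change always shifts the log-odds by beta, but the change
# in probability depends on where the defendant starts:
for baseline in (-3.0, 0.0, 3.0):
    delta = prob(baseline + beta) - prob(baseline)
    print(f"baseline log-odds {baseline:+.1f}: change in probability = {delta:.3f}")

# Prints roughly 0.029, 0.122, and 0.018 -- far from constant, yet a flat
# point system treats all three cases as the same one-point difference.
```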

Another criticism Starr makes is that these tools base predictions about an individual on group averages. They can estimate with reasonable precision the average recidivism rate of all offenders who share the defendant’s characteristics, she says, but that does not necessarily make them useful for predicting what an individual will do.
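A quick simulation illustrates the gap between the two kinds of prediction (all numbers here are invented): even when the group average is estimated almost exactly, it can be far off for a large share of the individuals inside the group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the true individual recidivism probabilities within one "risk
# group" vary widely around a mean of 0.4 (the distribution is invented):
true_p = rng.beta(2, 3, size=10_000)
outcomes = rng.random(10_000) < true_p   # who actually reoffends

# The group-level estimate is precise...
group_rate = outcomes.mean()
print(group_rate)  # close to 0.40

# ...but assigning that single rate to every member misstates the risk of
# roughly half the individuals by more than 15 percentage points:
print(np.mean(np.abs(true_p - group_rate) > 0.15))
```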

The Future of EBS Tools

The Model Penal Code is currently being revised and is set to incorporate these risk assessment tools into the sentencing process. According to Starr, this is a serious development, both because it reflects growing support for these practices and because of the Model Penal Code’s great influence on state penal codes. Attorney General Eric Holder has already spoken out against the practice, but it remains to be seen whether his successor will continue this campaign.

Even if EBS can accurately measure the risk of recidivism (which, according to Starr, is uncertain), does that mean a longer prison sentence will result in fewer offenses after the offender is released? EBS does not seek to answer this question. Further, if knowing there is a harsh penalty for a particular crime deters people from committing it, wouldn’t adding more uncertainty to sentencing (EBS tools are not always transparent and are sometimes proprietary) effectively remove this deterrent?

Even though many questions remain unanswered and several people have been critical of the practice, there seems to be great support for these instruments. They are especially easy to support when they are overwhelmingly regarded as progressive and scientific, a characterization Starr disputes. While there is certainly a place for data analytics and actuarial methods in the criminal justice system, it is important that such research be applied with the appropriate caution. Or perhaps not at all. Even if the tools had full statistical support, the risk of further exacerbating an already disparate criminal justice system should be enough to halt this practice.

Both Starr and Holder believe there is a strong case to be made that the risk prediction instruments now in use are unconstitutional. But EBS has strong advocates, so it is a difficult subject. Ultimately, evidence-based sentencing determines a person’s sentence based not on what that person has done, but on who that person is.

De-anonymizing open data, just because you can… should you?

Thursday, October 23rd, 2014

If an essential part of a dataset reveals personally identifiable information (PII), should the data not be released? Should the users of open data be the ones responsible for ensuring its proper use?

I mention these questions because of an article by an intrepid Gawker reporter who decided he could correlate photos of celebrities in NYC taxis (with their medallion numbers visible) with the de-anonymized database of every NYC cab ride in 2013 to determine whether celebrities tipped their cab drivers. Of course, the article is just another example of “celebrities doing normal people things, like using taxis,” but the underlying question is: just because you can violate people’s privacy, does that mean you should?
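For context, the “anonymization” of the taxi data was reportedly just an MD5 hash of each medallion number, and because valid medallions follow a few short, known patterns, the whole space of candidates can be hashed exhaustively and every hash reversed. A minimal sketch of the idea (the patterns below are illustrative, not the complete set used in the real attack):

```python
import hashlib
import string
from itertools import product

# NYC medallion numbers follow a few short, known patterns, so the space
# of candidates is small enough to hash exhaustively. These two patterns
# are illustrative only.
def candidate_medallions():
    digits, letters = string.digits, string.ascii_uppercase
    for d1, l, d2, d3 in product(digits, letters, digits, digits):
        yield f"{d1}{l}{d2}{d3}"            # e.g. "9A99"
    for l1, l2, d1, d2, d3 in product(letters, letters, digits, digits, digits):
        yield f"{l1}{l2}{d1}{d2}{d3}"       # e.g. "AA999"

# Precompute a reverse lookup table: MD5 hash -> medallion number.
rainbow = {hashlib.md5(m.encode()).hexdigest(): m
           for m in candidate_medallions()}

def deanonymize(hashed_medallion):
    # Returns the original medallion, or None if our patterns missed it.
    return rainbow.get(hashed_medallion.lower())
```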

Identifying celebrities and their cab rides was first done by Anthony Tockar, an intern at Neustar. In his post, he shows how relatively easy it is to reveal personal information about people. Not only could he match cab rides to a couple of celebrities, but he also showed how easily you can see who frequently visits Hustler’s. Tockar says:

Now while this information is relatively benign, particularly a year down the line, I have revealed information that was not previously in the public domain.

He uses these examples to introduce a method of privatizing data called “differential privacy.” Differential privacy essentially adds noise as you zoom in on the data, so you can’t identify specific information about an individual, but you can still get accurate results when you look at the data as a whole. This is best exemplified by the graphic below.

This shows the average speed of cab drivers throughout the day. The top half shows the actual average speed of all drivers and the same average after the data has been run through the differential privacy algorithm. The bottom half shows the same comparison for an individual cab driver. Click on the graphic to open an interactive tool that lets you play with the privacy parameter, ε.
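For the curious, the standard way to implement this is the Laplace mechanism: add Laplace noise scaled to how much one person's data can move the answer, divided by ε. Here's a minimal sketch (the speed values and parameter choices are mine, not Tockar's):

```python
import numpy as np

def private_average(values, epsilon, max_value):
    """Differentially private mean of values known to lie in [0, max_value].
    One person changing their value moves the mean by at most
    max_value / len(values), so that's the sensitivity the Laplace noise
    is calibrated to. The parameter choices below are placeholders."""
    sensitivity = max_value / len(values)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return np.mean(values) + noise

# Synthetic speeds in mph. Averaged over a whole city of drivers, the
# noise washes out; for a single driver's handful of trips it dominates,
# which is exactly the behavior the graphic shows.
city_speeds = np.random.uniform(5, 35, size=100_000)
one_driver = np.random.uniform(5, 35, size=30)

print(private_average(city_speeds, epsilon=0.5, max_value=50))  # ~ true mean
print(private_average(one_driver, epsilon=0.5, max_value=50))   # heavily distorted
```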

But we’re still struggling with getting data off PDF’s or worse, filing cabinets. It’ll take years before we can create such privacy mechanisms for current open data! What to do in the meantime? It would seem that Gawker stopped reading after “Bradley Cooper left no tip” (actually, we don’t know since tips are not recorded if paid in cash). Just because someone could look up ten celebrities’ cab rides does it mean they should have? The reporter even quotes Tockar’s quote about “revealing information not previously in the public domain”. The irony seems to have been lost on Gawker. I’m of the opinion that Gawker shouldn’t have published an article about celebrities’ cab rides no more than it should publish their phone numbers if they were available inside a phone book. Unless it was trying to make a point about privacy and open data, which would’ve made for a great conversation piece.  Except it wasn’t since it was all about tipping. They even reached out to publicists for comments on the tipping.

Ultimately, who cares about Bradley Cooper taking a taxi? But when you go “hey, let’s see how many celebrities I can ID from this data” and write an article about it without questioning the privacy implications, you’re basically saying, “Yes, because you can, it means you should.”

UPDATE: OK, so apparently there is a reason it’s called “Gawker.” See this example, where the same author tries to out a Fox News reporter. Today I learned.

Reddit is NOT a failed state…

Thursday, October 9th, 2014

It has it’s problems for sure, but I wouldn’t be so quick to dismiss it as having failed.

I’m referring to a The Verge article posted about a month ago following the celebrity nude photo leaks. The main argument for FAIL is the fact that instead of chastising the users who help spread the leaked photos, Reddit protected them under the shield of free speech. I’m not here to argue whether Reddit acted appropriately or not in protecting the individuals (personally, they could’ve been kicked out, banned, arrested, and I would’ve been content with that). But I do not think this transgression in privacy, abuse of free speech, and overall disgusting behavior by a small group of a larger community a failed state makes.

Is this indicative of pervasive malicious behavior across Reddit? Absolutely. We didn’t need r/TheFappening to figure that out. Just talk to women redditors about their experiences as participants.

But at least we’re talking about these issues. It’s not so much the fact that we are; it’s the fact that we have the ability to do so. Through its karma system, Reddit has built a mechanism that promotes good behavior and, sometimes, reproves the bad. It’s a primitive system, for sure, especially since it’s not immune to hivemind behavior (for example, the r/nyc hivemind apparently thinks people have ZERO responsibility to give up their seat on the subway for a pregnant woman (maybe they’re right and I’m wrong)). This system, I think, allows the hive to iterate on what it believes to be correct. In effect, every now and then it corrects itself. Take the terrible “detective work” conducted in the immediate aftermath of the Boston Marathon bombing. After the hive realized it was wrong (so wrong), posts asking for some sort of crowdsourced detective work were often met with a comment recalling the terrible results from the last time Reddit tried to play detective. As a result, Reddit for the most part now knows: we should avoid digital vigilantism.

In the coming years we will increasingly see Nobel Prize winner Elinor Ostrom’s principles for governing the commons applied to digital spaces. Although primitively (and perhaps unintentionally), Reddit has created a space where communities are able to define their own boundaries, (sort of) align “governance” rules with their preferences, (kind of) ensure that those who participate in the community have a say in the rules, and are (barely) able to sanction those who misbehave. It has a long way to go, for sure. What happened with r/TheFappening is a case where a group of very misguided individuals was able to gather in one place and, as a community, behave inappropriately. What Reddit might be lacking in that case is greater oversight of communities and their leaders; not a dictatorial oversight, but one that is also provided by a community (a council of communities?).

Another problem with Reddit (or any digital space, really) is that whenever someone goes through the trouble of committing a crime, say, stealing nude celebrity photos, the “morality cost” of engaging in the immoral behavior is greatly reduced by the internet’s ability to distribute information massively and at very low cost. For the most part, the consequences for engaging in such behavior do not exist, especially when it costs nothing to click on a link. This may be one of the internet’s biggest weaknesses: its ability to facilitate engagement in immoral behavior.

We need to design digital spaces that somehow take this into account: spaces where the community can participate more meaningfully and deal with the bad apples more effectively. Is Reddit, and the rest of the internet, full of misguided individuals who do some fucked up shit? Yes, but this doesn’t mean we need to take it to the back of the barn and shoot it. It means we need to think about how we create these digital spaces in the future. Or do away with Reddit if you want, but then let’s take the good lessons and the bad, and let’s make something better.


Placemeter pays YOU for your data…

Wednesday, October 8th, 2014
Note: I have set a new goal to post at least once a week, even if the posts are short.

Turns out you may have some data to offer that is actually more valuable than just your online shopping patterns: the view outside your window. Placemeter is a relatively new startup that pays New Yorkers up to $50 to place their phones against their windows and record movements on the street below. Using nifty computer vision algorithms, Placemeter extracts data from the images recorded by your phone. The short video below gives a sense of what they are trying to track.

The front page immediately addresses the issue of privacy: the company will not use the data to record anything that goes on inside your home, it will not use the data to identify people on the street, and the video it records isn’t stored. Only the raw data extracted from the video is kept.
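Placemeter hasn’t published its pipeline, but the stated privacy model (keep derived counts, discard the frames) is easy to sketch with off-the-shelf computer vision. Everything below, from the OpenCV background-subtraction approach to the thresholds, is my own toy guess:

```python
import cv2

def count_street_activity(video_path, min_area=500):
    """Count moving objects per frame and keep only the counts.
    Frames are processed in memory and never written anywhere, mirroring
    the stated model; the approach and thresholds are toy choices."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    counts = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)        # foreground pixels = motion
        mask = cv2.medianBlur(mask, 5)        # suppress sensor noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        moving = [c for c in contours if cv2.contourArea(c) > min_area]
        counts.append(len(moving))            # keep the count; drop the frame
    cap.release()
    return counts
```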

Their business model is simple: they pay you a little bit each month to record information that they later sell to third parties. You provide the product they sell (hey, at least they pay you for it). Since their goal is to sell data to businesses and city governments, they are mostly interested in views of restaurants, shops, or bars. This means lots of people like me can’t participate (I have a very lovely view of a wall). It got me thinking about who else can and can’t participate. If you happen to live in (and have a view of) Times Square, your view could be worth dozens of dollars! What about a view from a quiet Staten Island street? Or from the Bronx? Basically, in order to participate you just have to live in the right place, a place that is probably expensive, too.

One redditor applied to sell his/her view and was rejected because the street wasn’t busy enough, but was told he/she would be considered when the company started “sending out unpaid meters.” I imagine this means the company would mail you a sensor for free and you would record data for them. If this happens, I can see them shifting the rhetoric toward “help us analyse and improve your urban environment,” which this article already does.

Seeing as the most valuable views belong to a select group of New Yorkers, much of the company’s best data might come from the already freely available video feeds around the city (they should fill out the survey for the OD500).