De-anonymizing Stop and Frisk Data.

I started with the premise that 87% of Americans are uniquely identifiable from their date of birth, zip code, and gender. The Stop and Frisk (SNF) data gives you date of birth, precinct, gender, race, height, weight, eye color, hair color, and build. The original SNF data set contains 685,724 stops for 2011. However, only about 2/3 of those stops had valid dates of birth. By ‘valid’ I mean an age between 0 and 112 (around 275,000 entries list Dec. 31, 1900 as the date of birth). Since date of birth is crucial to de-anonymization, I excluded the stops without a valid one from the analysis. My numbers will therefore differ from the NYCLU’s report, which did include these entries.
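
As a rough illustration of that validity filter, here is a minimal Python sketch. It is not the code used for this analysis: the field names (`dob`, `stop_date`) are hypothetical, and treating Dec. 31, 1900 as a placeholder value to drop is an assumption on top of the 0 to 112 age rule.

```python
from datetime import date

MAX_AGE = 112                        # upper age bound mentioned in the post
PLACEHOLDER_DOB = date(1900, 12, 31)  # assumption: treated as "no real DOB"


def age_on(stop_day, dob):
    """Whole years between dob and stop_day."""
    years = stop_day.year - dob.year
    # Subtract one year if the birthday has not happened yet that year.
    if (stop_day.month, stop_day.day) < (dob.month, dob.day):
        years -= 1
    return years


def has_valid_dob(stop):
    """True when a DOB is present, is not the placeholder, and implies age 0..MAX_AGE."""
    dob = stop.get("dob")
    if dob is None or dob == PLACEHOLDER_DOB:
        return False
    return 0 <= age_on(stop["stop_date"], dob) <= MAX_AGE


# Toy records, not real data.
sample_stops = [
    {"stop_date": date(2011, 6, 1), "dob": date(1990, 3, 15)},
    {"stop_date": date(2011, 6, 1), "dob": date(1900, 12, 31)},
    {"stop_date": date(2011, 6, 1), "dob": None},
]
valid_stops = [s for s in sample_stops if has_valid_dob(s)]
```

Note that an age bound alone would not remove the Dec. 31, 1900 entries, since that date implies an age of about 110 in 2011, which is why the sketch checks for it explicitly.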

After cleaning the data a bit, I chose to use only date of birth, gender, race, precinct, and height to de-anonymize the data. I left out the other descriptors because the police officer conducting the stop might not always enter the same information for the same person. First, the suspect provided a photo ID, which could give accurate details of their weight, hair color, etc., in only 55% of stops. For the other 45%, the police officer would have had to guess all of the person’s information correctly every time. Second, people’s weight, build, and hair color can change over the year or may not be easily identifiable at night. Lastly, I realize that height also changes, especially in people under 20, but I wanted to play it a little safe. I thought height would be easy for a police officer to guess correctly, so I kept it. Using these descriptors, I found 364,706 unique individuals, 22,649 of whom were stopped more than once. Here are the top 20 people stopped in 2011 and the number of times they were stopped.

[Screenshot: the top 20 most-stopped people in 2011 and their number of stops]

The string of numbers is the person’s “name”. From left to right, the numbers mean precinct, gender, race, DOB, and height. You’ll notice that 18 of the 20 people come from precincts 60, 61, and 101 (Coney Island, Gravesend, and Far Rockaway). I’ll write more on these later, but first some numbers on people stopped more than once.

  • 6.2% of people stopped were stopped more than once (22,476 out of 364,706).
  • 60.7% of people stopped more than once were Black.
  • 29.0% were Hispanic.
  • 7.9% were white.
  • 2.4% were others (Asians, Pacific Islanders, Native Americans, Others)
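
Counts like these come from grouping stops by a composite "name" built from the five descriptors. The Python sketch below shows one way that grouping could be done. It is an illustration, not the code behind this post: the field names, the `|` separator, and the toy records are all made up.

```python
from collections import Counter

# Hypothetical field names, in the order precinct, gender, race, DOB, height.
KEY_FIELDS = ("precinct", "gender", "race", "dob", "height_in")


def person_key(stop):
    """Join the five descriptors into one string that stands in for a person."""
    return "|".join(str(stop[field]) for field in KEY_FIELDS)


def repeat_stop_summary(stops, top_n=20):
    """Count stops per key, then summarize the keys that appear more than once."""
    stops_per_person = Counter(person_key(s) for s in stops)
    race_of = {person_key(s): s["race"] for s in stops}
    repeated = [key for key, n in stops_per_person.items() if n > 1]
    race_counts = Counter(race_of[key] for key in repeated)
    people = len(stops_per_person)
    return {
        "people": people,
        "repeated": len(repeated),
        "repeat_share": len(repeated) / people if people else 0.0,
        "race_share": {r: c / len(repeated) for r, c in race_counts.items()},
        "top": stops_per_person.most_common(top_n),
    }


# Toy records, not real data.
toy_stops = [
    {"precinct": 101, "gender": "M", "race": "B", "dob": "19900315", "height_in": 70},
    {"precinct": 101, "gender": "M", "race": "B", "dob": "19900315", "height_in": 70},
    {"precinct": 60, "gender": "F", "race": "W", "dob": "19850702", "height_in": 64},
    {"precinct": 61, "gender": "M", "race": "Q", "dob": "19921130", "height_in": 68},
]
summary = repeat_stop_summary(toy_stops)
```

Two different people who share all five descriptors collapse into one key, and one person recorded with a different height or precinct splits into two, so the counts are only as good as the descriptors.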

Going back to precincts 60, 61, and 101. After I first noticed the overwhelming presence of these three precincts in the top 20 list, I mapped out all the people who had been stopped more than once and got a map with points pretty much all over New York City. Note that each dot represents a person, not a stop. The position of the person is the average position of all the person’s stops.
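
Placing one dot per person at the average of their stops can be sketched as below. This is a simplified illustration with hypothetical field names (`person`, `x`, `y`), and it assumes the coordinates are planar, so a plain arithmetic mean is reasonable.

```python
from collections import defaultdict


def mean_positions(stops, min_stops=2):
    """Return {person: (mean_x, mean_y)} for people with at least min_stops stops."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x sum, y sum, stop count
    for stop in stops:
        acc = sums[stop["person"]]
        acc[0] += stop["x"]
        acc[1] += stop["y"]
        acc[2] += 1
    return {
        person: (sx / n, sy / n)
        for person, (sx, sy, n) in sums.items()
        if n >= min_stops
    }


# Toy records, not real data.
toy_stops = [
    {"person": "a", "x": 0.0, "y": 0.0},
    {"person": "a", "x": 2.0, "y": 4.0},
    {"person": "b", "x": 5.0, "y": 5.0},
]
positions = mean_positions(toy_stops)
```

With the default `min_stops=2`, only people stopped more than once get a dot; passing `min_stops=6` would keep only those stopped more than 5 times.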

Then I mapped out everyone stopped more than 5 times.
People Stopped More Than 5 Times

Out of the 22,000+ people stopped more than once, there were 340 who were stopped more than 5 times. Here is a table of the top 5 precincts with people stopped more than 5 times.

Area/Neighborhood   Precinct   # People Stopped > 5 Times
Far Rockaway        101        83
Sheepshead Bay      61         58
East New York       75         24
Williamsburg        90         23
Coney Island        60         21
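
A table like this can be tallied from the per-person stop counts. The sketch below is a hedged illustration: the function, its arguments, and the toy data are hypothetical, while the last lines only add up the five counts shown in the table against the 340 people stopped more than 5 times.

```python
from collections import Counter


def precinct_table(stops_per_person, precinct_of, threshold=5, top_n=5):
    """Count people stopped more than `threshold` times, grouped by precinct."""
    frequent = [p for p, n in stops_per_person.items() if n > threshold]
    per_precinct = Counter(precinct_of[p] for p in frequent)
    return per_precinct.most_common(top_n), len(frequent)


# Toy data, not real: four people and their stop counts.
toy_counts = {"a": 7, "b": 6, "c": 9, "d": 2}
toy_precincts = {"a": 101, "b": 101, "c": 61, "d": 75}
rows, total_frequent = precinct_table(toy_counts, toy_precincts)

# Counts taken from the table in this post.
table_counts = {101: 83, 61: 58, 75: 24, 90: 23, 60: 21}
people_over_5 = 340
top5_total = sum(table_counts.values())   # 209
top5_share = top5_total / people_over_5   # about 0.61
```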

These top 5 precincts contain 61% of the people stopped more than 5 times. It would be interesting to find out what is going on there, but there doesn’t seem to be an evident explanation. So far I have not found a common characteristic that these precincts share. Here are some facts about the 340 people that have been stopped more than 5 times:

  • Precinct 101 accounts for 1/4 of the 340 people stopped more than 5 times.
  • One woman was stopped 14 times in Precinct 61 (Sheepshead Bay)
  • 72% of people stopped > 5 times were African American/Black
  • 13% were Hispanic
  • 15% were White
  • The average age is 24.7 years (max 56, min 16)
  • These 340 people account for 2,686 stops.
  • 71% of those stops included a frisk.
  • Fewer than 5% (124) of those stops led to an arrest.
  • Only 7 of those arrests were for criminal possession of a weapon (0.26% of the stops).
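
The rates in this list can be checked with a few lines of arithmetic. The snippet only uses the counts quoted above; the variable names are mine.

```python
# Counts quoted in the list above.
people = 340
stops = 2686
arrests = 124
weapon_arrests = 7

stops_per_person = stops / people                   # average stops per person
arrest_rate = 100 * arrests / stops                 # percent of stops ending in arrest
weapon_rate = 100 * weapon_arrests / stops          # percent of stops with a weapon arrest
weapon_share = 100 * weapon_arrests / arrests       # percent of arrests that were weapon arrests

print(f"{stops_per_person:.1f} stops per person")
print(f"{arrest_rate:.1f}% of stops led to an arrest")
print(f"{weapon_rate:.2f}% of stops led to a weapon arrest")
print(f"{weapon_share:.1f}% of arrests were weapon arrests")
```

The 0.26% figure is measured against all 2,686 stops. Measured against the 124 arrests, the weapon share is higher.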

I don’t expect these numbers to be an accurate representation of all multiple stops in NYC. However, I do think that 1) they reveal a pattern, and 2) these numbers are a best-case scenario, and the real numbers are probably much worse. After all, we know that there is at least one person who was stopped more than 60 times before he turned 18.

If you would like to read about the Top 10 most stopped individuals in New York City, check out the comic book I made for my Data Rep class.


  • http://www.paulmay.org Paul May

    This is a really fascinating project, Luis. Where can I get the data and the toolkit you used to make the maps? Is this stuff in a GitHub repository? Congratulations on the project.