“Social Physics: How Good Ideas Spread–The Lessons From a New Science” – Alex Pentland

November 10th, 2014

I started reading Alex “Sandy” Pentland’s book, Social Physics. Several things interest me about this book. I’m very interested in how society behaves in today’s world where we are increasingly connected to more people by weak social ties. Also interesting is that advances in data collection and analysis are bound to reach a point where we can continuously monitor and analyze people’s behavior. Who will have this knowledge? How will they use it? What will this world look like? Lastly, I’m interested in how good ideas spread and how that can help us design better organizations and institutions.

Alex Pentland thinks it is possible to create a mathematical explanation of why society behaves the way it does. He calls this discipline social physics.

“Social physics is a quantitative social science that describes reliable, mathematical connections between information and idea flow on the one hand and people’s behavior on the other. Social physics helps us understand how ideas flow from person to person through the mechanism of social learning and how this flow of ideas ends up shaping the norms, productivity, and creative output of our companies, cities, and societies.”

The goal of applying this science to society is to shape outcomes. Pentland believes we can create systems that build a society “better at avoiding market crashes, ethnic and religious violence, political stalemates, widespread corruption, and dangerous concentration of power.”

All of this would sound great, if it didn’t sound kind of scary. There are a lot of concerns about privacy, which Pentland addresses, and which I’m sure he’ll talk more about in the coming chapters. However, even if he is able to get around the privacy issues, the ability to affect how society behaves would give great power to whoever holds it. This is perhaps a little paranoid on my part, but I don’t think misusing the ability to “fix” society, as he puts it, is out of the question. Pentland does write about it:

“This vision of a data-driven society implicitly assumes the data will not be abused. The ability to see the details of the market, political revolutions, and to be able to predict and control them is a case of Promethean fire—it could be used for good or for ill.”

My second concern is best summarized by Nicholas Carr in his article “The Limits of Social Engineering”.

“Pentland may be right that our behavior is determined largely by social norms and the influences of our peers, but what he fails to see is that those norms and influences are themselves shaped by history, politics, and economics, not to mention power and prejudice. People don’t have complete freedom in choosing their peer groups. Their choices are constrained by where they live, where they come from, how much money they have, and what they look like. A statistical model of society that ignores issues of class, that takes patterns of influence as givens rather than as historical contingencies, will tend to perpetuate existing social structures and dynamics. It will encourage us to optimize the status quo rather than challenge it.” (h/t to Cathy O’Neil for linking to this piece).

The case studies in the book so far take place in groups where this might not be a huge issue, like eToro, an online trading and investment network. Carr’s (and my) concern may not be a huge issue in these scenarios, especially because Pentland is measuring very specific metrics like return on investment. However, I do believe there is real danger in applying this sort of analysis in places like, say, Ferguson, MO. It will be interesting to read the different case studies and to try and identify places where this concern might arise.

It would be very unfair of me to end this without writing about the actual focus of the book (although I’m already a little nauseous from writing this on the train). The book will focus on the two most important concepts of social physics: idea flow within social networks and social learning, that is, how we take these new ideas and turn them into habit and how learning can be accelerated and shaped by social pressure.

I like to believe that there are better systems of collaboration and cooperation that can make organizations more effective, communities more resilient, and authorities more accountable. Elinor Ostrom developed her work on governing the commons by studying how communities behaved around issues like irrigation and water management. Similarly, I do think Pentland’s insights on idea flow and social learning can help us understand how to design better organizations, communities, and institutions.

The Dangers of Evidence-Based Sentencing

October 27th, 2014
Note: This post was originally published on mathbabe.org and cross-posted on thegovlab.org.

What is Evidence-based Sentencing?

For several decades, parole and probation departments have been using research-backed assessments to determine the best supervision and treatment strategies for offenders to try and reduce the risk of recidivism. In recent years, state and county justice systems have started to apply these risk and needs assessment tools (RNAs) to other parts of the criminal process.

Of particular concern is the use of automated tools to determine imprisonment terms. This relatively new practice of applying RNA information to the sentencing process is known as evidence-based sentencing (EBS).

What the Models Do

The different parameters used to determine risk vary by state, and most EBS tools use information that has been central to sentencing schemes for many years, such as an offender’s criminal history. However, an increasing number of states have been utilizing static factors such as gender, age, marital status, education level, employment history, and other demographic information to determine risk and inform sentencing. Especially alarming is the fact that the majority of these risk assessment tools do not take an offender’s particular case into account.

This practice has drawn sharp criticism from Attorney General Eric Holder who says “using static factors from a criminal’s background could perpetuate racial bias in a system that already delivers 20% longer sentences for young black men than for other offenders.” In the annual letter to the US Sentencing Commission, the Attorney General’s Office states that “utilizing such tools for determining prison sentences to be served will have a disparate and adverse impact on offenders from poor communities already struggling with social ills.” Other concerns cite the probable unconstitutionality of using group-based characteristics in risk assessments.

Where the Models Are Used

It is difficult to precisely quantify how many states and counties currently implement these instruments, although at least 20 states have implemented some form of EBS. Some of the states or states with counties that have implemented some sort of EBS (any type of sentencing: parole, imprisonment, etc) are: Pennsylvania, Tennessee, Vermont, Kentucky, Virginia, Arizona, Colorado, California, Idaho, Indiana, Missouri, Nebraska, Ohio, Oregon, Texas, and Wisconsin.

The Role of Race, Education, and Friendship

Overwhelmingly, states do not include race in the risk assessments since there seems to be a general consensus that doing so would be unconstitutional. However, even though these tools do not take race into consideration directly, many of the variables used, such as economic status, education level, and employment, correlate with race. African-Americans and Hispanics are already disproportionately incarcerated, and determining sentences based on these variables might cause further racial disparities.

The very socioeconomic characteristics used in risk assessments, such as income and education level, are the characteristics that are already strong predictors of whether someone will go to prison. For example, high school dropouts are 47 times more likely to be incarcerated than people in a similar age group who received a four-year college degree. It is reasonable to suspect that courts that include education level as a risk predictor will further exacerbate these disparities.

Some states, such as Texas, take peer relations into account and consider associating with other offenders a “salient problem”. Considering that Texas is in 4th place in the rate of people under some sort of correctional control (parole, probation, etc) and that the rate is 1 in 11 for black males in the United States, it is likely that this metric would disproportionately affect African-Americans.

Sonja Starr’s paper

Even so, in some cases, socioeconomic and demographic variables receive significant weight. In her forthcoming paper in the Stanford Law Review, Sonja Starr provides a telling example of how these factors are used in presentence reports. From her paper:

For instance, in Missouri, pre-sentence reports include a score for each defendant on a scale from -8 to 7, where “4-7 is rated ‘good,’ 2-3 is ‘above average,’ 0-1 is ‘average’, -1 to -2 is ‘below average,’ and -3 to -8 is ‘poor.’ Unlike most instruments in use, Missouri’s does not include gender. However, an unemployed high school dropout will score three points worse than an employed high school graduate—potentially making the difference between “good” and “average,” or between “average” and “poor.” Likewise, a defendant under age 22 will score three points worse than a defendant over 45. By comparison, having previously served time in prison is worth one point; having four or more prior misdemeanor convictions that resulted in jail time adds one point (three or fewer adds none); having previously had parole or probation revoked is worth one point; and a prison escape is worth one point. Meanwhile, current crime type and severity receive no weight.
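To make the quoted scale concrete, here is a minimal sketch of the score bands as Starr describes them. The function name is made up for illustration, and this is not Missouri’s actual instrument; it only encodes the cut-offs quoted above.

```python
def missouri_rating(score: int) -> str:
    """Map a pre-sentence score (-8..7) to the band named in the quoted passage."""
    if not -8 <= score <= 7:
        raise ValueError("score must be between -8 and 7")
    if score >= 4:
        return "good"
    if score >= 2:
        return "above average"
    if score >= 0:
        return "average"
    if score >= -2:
        return "below average"
    return "poor"


# A three-point swing, e.g. employed graduate vs. unemployed dropout.
for start in (4, 0):
    print(start, missouri_rating(start), "->", start - 3, missouri_rating(start - 3))
```

Starting from the lowest “good” score, losing three points lands in “average”; starting from the lowest “average” score, it lands in “poor”, which matches the jumps Starr points out.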

Starr argues that such simple point systems may “linearize” a variable’s effect. In the underlying regression models used to calculate risk, some of the variable’s effects do not translate linearly into changes in probability of recidivism, but they are treated as such by the model.

Another criticism Starr makes is that these tools often make predictions about an individual based on the averages of a group. Starr says these predictions can estimate with reasonable precision the average recidivism rate for all offenders who share the same characteristics as the defendant, but that does not necessarily make them useful for individual predictions.

The Future of EBS Tools

The Model Penal Code is currently in the process of being revised and is set to include these risk assessment tools in the sentencing process. According to Starr, this is a serious development because it reflects the increased support of these practices and because of the Model Penal Code’s great influence in guiding penal codes in other states. Attorney General Eric Holder has already spoken against the practice, but it will be interesting to see whether his successor will continue this campaign.

Even if EBS can accurately measure risk of recidivism (which is uncertain according to Starr), does that mean that a greater prison sentence will result in fewer future offenses after the offender is released? EBS does not seek to answer this question. Further, if knowing there is a harsh penalty for a particular crime is a deterrent to committing said crime, wouldn’t adding more uncertainty to sentencing (EBS tools are not always transparent and sometimes proprietary) effectively remove this deterrent?

Even though many questions remain unanswered and several people have been critical of the practice, it seems like there is great support for the use of these instruments. They are especially easy to support when they are overwhelmingly regarded as progressive and scientific, a characterization Starr disputes. While there is certainly a place for data analytics and actuarial methods in the criminal justice system, it is important that such research be applied with the appropriate caution. Or perhaps not at all. Even if the tools had full statistical support, the risk of further exacerbating an already disparate criminal justice system should be enough to halt this practice.

Both Starr and Holder believe there is a strong case to be made that the risk prediction instruments now in use are unconstitutional. But EBS has strong advocates, so it’s a difficult subject. Ultimately, evidence-based sentencing is used to determine a person’s sentence based not on what the person has done, but on who that person is.

De-anonymizing open data, just because you can… should you?

October 23rd, 2014

If an essential part of the data reveals personally identifiable information (PII), should the data not be released? Should the users of open data be the ones responsible for ensuring proper use of the data?

I mention this question because of an article by an intrepid Gawker reporter who decided he could correlate photos of celebrities in NYC taxis (with visible taxi medallions) and the de-anonymized database on every NYC cab ride in 2013 to determine whether celebrities tipped their cab drivers. Of course, this article is another example of “Celebrities doing normal people things like using taxis”, but the underlying question here is: just because you can violate people’s privacy, does it mean you should?

Identifying celebrities and their cab rides was first done by an intern at Neustar, Anthony Tockar. In his post he recognizes that it is relatively easy to reveal personal information about people. Not only could he match cab rides to a couple of celebrities, but he also showed how you can easily see who frequently visits Hustler’s. Tockar says:

Now while this information is relatively benign, particularly a year down the line, I have revealed information that was not previously in the public domain.

He uses these examples to introduce a method of protecting data called “differential privacy.” Differential privacy basically adds noise to the data when you zoom in on it so you can’t identify specific information about an individual, but you can still get accurate results when you look at the data as a whole. This is best exemplified by the graphic below.

This shows the average speed of cab drivers throughout the day. The top half is the actual average speed of all drivers and the average speed of all drivers after the data is run through the differential privacy algorithm. The bottom half shows the same for an individual cab driver. Click on the graphic to go to an interactive tool that lets you play around with the privacy parameter, ε.
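For anyone who wants to poke at the idea without the interactive tool, here is a minimal sketch of the Laplace mechanism, one standard way to get differential privacy for an average. It is an assumption that this resembles what Tockar’s tool does; the function names, the speed bounds, and the trip data are all made up for illustration, and the sketch treats the number of trips as public.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of values clipped to [lower, upper] (Laplace mechanism)."""
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Changing one record can move the clipped mean by at most this much.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
all_trips = [rng.uniform(5, 25) for _ in range(100_000)]  # hypothetical speeds, mph
one_driver = all_trips[:20]                               # a single driver's trips

for label, trips in (("all drivers", all_trips), ("one driver", one_driver)):
    true_mean = sum(trips) / len(trips)
    noisy = private_mean(trips, 0, 60, epsilon=0.1, rng=rng)
    print(f"{label}: true={true_mean:.2f} noisy={noisy:.2f}")
```

The noise scale shrinks as the number of records grows, so the city-wide average barely moves while the single-driver average can be off by a lot. A smaller ε means more noise and more privacy.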

But we’re still struggling with getting data off PDFs or worse, filing cabinets. It’ll take years before we can create such privacy mechanisms for current open data! What to do in the meantime? It would seem that Gawker stopped reading after “Bradley Cooper left no tip” (actually, we don’t know since tips are not recorded if paid in cash). Just because someone could look up ten celebrities’ cab rides, does it mean they should have? The reporter even quotes Tockar’s line about “revealing information not previously in the public domain”. The irony seems to have been lost on Gawker. I’m of the opinion that Gawker shouldn’t have published an article about celebrities’ cab rides any more than it should publish their phone numbers if they were available inside a phone book. Unless it was trying to make a point about privacy and open data, which would’ve made for a great conversation piece. Except it wasn’t, since it was all about tipping. They even reached out to publicists for comments on the tipping.

Ultimately, who cares about Bradley Cooper taking a taxi. But when you go “hey, let’s see how many celebrities I can ID from this data” and write an article about it without questioning the privacy implications, you’re basically saying “Yes, because you can, it means you should.”

UPDATE: ok, so apparently there is a reason it’s called “Gawker”. See this example where this same author tries to out a Fox News reporter. Today I learned.

Reddit is NOT a failed state….

October 9th, 2014

It has its problems for sure, but I wouldn’t be so quick to dismiss it as having failed.

I’m referring to a The Verge article posted about a month ago following the celebrity nude photo leaks. The main argument for FAIL is the fact that instead of chastising the users who helped spread the leaked photos, Reddit protected them under the shield of free speech. I’m not here to argue whether Reddit acted appropriately or not in protecting the individuals (personally, they could’ve been kicked out, banned, arrested, and I would’ve been content with that). But I do not think this transgression in privacy, abuse of free speech, and overall disgusting behavior by a small group of a larger community a failed state makes.

Is this indicative of pervasive malicious behavior across Reddit? Absolutely. We didn’t need r/TheFappening to figure that out. Just talk to women redditors about their experiences as participants.

But at least we’re talking about these issues. It’s not so much the fact that we are, it’s the fact that we have the ability to do so. Through their karma system, Reddit has built a system that promotes good behavior and–sometimes–reproves the bad. It’s a primitive system for sure, especially since it’s not immune to the hivemind behavior (for example, apparently the r/nyc hivemind thinks people have ZERO responsibility to give up their seat in the subway for a pregnant woman (maybe they’re right and I’m wrong)). This system, I think, allows the hive to go through iterations of what they believe to be correct. In effect, every now and then it corrects itself. Take the terrible “detective work” conducted during the immediate aftermath of the Boston Marathon bombing. After the hive realized it was wrong (so wrong), whenever there was a post asking for some sort of crowdsourced detective work, it was often met with someone who commented on the terrible results that came out of the last time they tried to play detectives. As a result, Reddit for the most part now knows: We should avoid digital vigilantism.

In the coming years we will increasingly see Nobel Prize winner Elinor Ostrom’s principles on governing the commons applied to digital spaces. Although primitively (and perhaps unintentionally) Reddit has created a space where communities are able to define their own boundaries, (sort of) align “governance” rules with their preferences, (kind of) ensure that those who participate in the community can have a say on the rules, and are (barely) able to sanction those who misbehave. It has a long way to go for sure. What happened with r/TheFappening is a case where a group of very misguided individuals were able to gather in one place and as a community behave inappropriately. In that case what Reddit might be lacking is some greater oversight over communities and their leaders. An oversight that’s not dictatorial, but rather an oversight that is also provided by a community (a council of communities?).

Another problem with Reddit (or any digital space, actually) is that whenever someone goes through the trouble of committing a crime–say stealing nude celeb photos–the “morality cost” of engaging in the immoral behavior is significantly decreased by the internet’s ability to massively distribute information at a significantly low cost. For the most part, the consequences for engaging in such immoral behavior do not exist. Especially when it costs nothing to click on a link. This is maybe one of the internet’s biggest weaknesses: its ability to facilitate engagement in immoral behavior.

We need to design digital spaces that somehow take this into account. Spaces where the community can more meaningfully participate and deal with the bad apples more effectively. Is Reddit, and the rest of the internet, full of misguided individuals who do some fucked up shit? Yes, but this doesn’t mean we need to take it to the back of the barn and shoot it. It means we need to think about how we create these digital spaces in the future. Or do away with it if you want, but then let’s take the good lessons and the bad, and let’s make something better.


Placemeter pays YOU for your data…

October 8th, 2014
Note: I have set a new goal to post at least once a week, even if the posts are short.

Turns out you may have some data to offer that is actually more valuable than just your online shopping patterns: the view outside your window. Placemeter is a relatively new startup that pays New Yorkers up to $50 to place their phones against their windows and record movements on the street below. Using nifty computer vision algorithms, Placemeter extracts data from the images recorded by your phone. The short video below gives a sense of what they are trying to track.

The front page immediately addresses the issue of privacy. The company will not use the data to record anything that goes on inside your home, they will not use the data to identify people on the street, and the video they record isn’t stored. They only store raw data extracted from the video.

Their business model is simple: they pay you a little bit per month to record information which they will later sell to third parties. You provide the product they later sell (hey, at least they pay you for it). Since their goal is to sell data to businesses and city governments, they are mostly interested in views of restaurants, shops, or bars. This means lots of people like me can’t participate (I have a very lovely view of a wall). This got me thinking on who else can and can’t participate. If you happen to live in (and have a view of) Times Square, your view could be worth dozens of dollars! What about a view from a quiet Staten Island street? Or from the Bronx? Basically in order to participate you just have to live in the right place. A place that is probably expensive too.

One redditor applied to sell his/her view and was rejected because the street wasn’t busy enough, but was told that he/she would be considered when the company started “sending out unpaid meters”. I imagine this means the company would mail you a sensor for free and you would record data for them. If this happens I can see them shifting the rhetoric towards “help us analyse and improve your urban environment”, which this article already does.

Since the most valuable views belong to a select group of New Yorkers, much of Placemeter’s most valuable data might instead come from the already freely available video feeds around the city (they should fill out the survey for the OD500).

How to Build a Website From Scratch

January 23rd, 2014

When I signed up to build the Open Data 500 website, I wanted to go through the entire process of making a website from scratch. Full stack. Just to sort of see what it was like.
After spending 5 entire 10-hour days trying to troubleshoot a feature on the site, I decided to write a post on the skills needed to build an entire website from scratch.

To build an entire website from scratch you need to know the following:

  • HTML5
  • CSS
  • JavaScript
  • jQuery
  • D3
  • ParsleyJS
  • Modernizr
  • Tornado
  • Python
  • MongoDB
  • Mongoengine
  • CSV
  • JSON
  • geoJSON
  • Regular Expressions
  • Seamless
  • Heroku
  • Command Line
  • Git / Github
  • Google Analytics
  • MailChimp
  • DNS Records (A, CNAME, MX, etc)
  • Oh yeah, go directly to hell, GoDaddy
  • Polar Vortex Survival Skills
  • Basic Pharmacology
  • UX
  • UI
  • FU
  • F712U
  • Scheme (might as well)
  • Creative Commons Licensing
  • PHP (throw in a couple more languages, just in case)
  • Java
  • Ruby
  • C#
  • C♭
  • Perl
  • .NET
  • Obviously not WordPress
  • Ballmer Peaks
  • Double-team keyboarding
  • Windows
  • Mac
  • Linux
  • Atari
  • SNES (you’re welcome)
  • Brainfuck
  • SSL
  • HTTP
  • APIs
  • SOAP
  • LDAP
  • TCP/IP
  • WOFF
  • DOM
  • Cookies
  • XSRF
  • RSS
  • XML

I think that’s about it. If you’re just beginning with web development, good luck. You’re almost there.


(Seriously, though. Keep it up, the road is long and arduous, but it’s totally worth it)



My Social Network

October 13th, 2013

I was playing around with Gephi, and I loaded my Facebook data to visualize my social network (or at least my Facebook social network). This is the result (click for full size).


As you can see the network is pretty modular, which is to be expected since I’ve lived in 6 cities. There are 13 communities:

  1. High School, mostly my graduating class (21.97%) – Green
  2. The rest of Monterrey (17.41%) – Red
  3. ITP (16.4%) – Aqua
  4. Model UN (14.23%) – Light Blue
  5. UT Austin (12.57%) – Fuchsia?
  6. Oklahoma City (5.71%) – Purple
  7. Family, extended family, and family friends (5.13%) – Dark purple
  8. Schlumberger (3.32%) – Lime Green
  9. GovLab (1.3%) – Yellow
  10. NYCDigital (0.79%) – Orange
  11. Students For Sensible Drug Policy (0.72%) – Dark Blue
  12. Las Chilangas de Nueva York (0.22%) – Dark Blue inside ITP blob
  13. The group of Canadians I randomly befriended on a bus one day. (0.22%) – Tiny Light Green Offshoot from large Green blob

I filtered out those nodes with a degree of less than 2 (fewer than 2 mutual friends), but it was interesting to see the lonely nodes on my network. Those are mostly people that I have encountered while traveling alone or have randomly met. The graph contains 1,384 nodes (friends) with 35,226 edges (connections) between them. The longest path (network diameter) between two of my friends (without going through me) is 8. The huge blue dot in the middle is Gaby, and she is connected to 7 of my 13 communities and shares friends. In second place is Chantel who knows everyone in Monterrey.

Making your own graph

If you want to do this for your own Facebook data, go to http://snacourse.com/getnet. Authorize the app. I selected all options in case I want to use that data later. Click on the ‘click here‘ link in Step 2. The app will need to scrape your Facebook and this might take a while if you have a large network.

You’ll also need to download Gephi, an open source visualization software.

Once you’ve downloaded your data and Gephi, open Gephi and File->Open your data file (default settings should be OK). You’ll see a bunch of dots arranged in a square in the middle of the screen.

You’ll need to tell Gephi to reorganize the graph. On the bottom left you can choose a Layout. I chose ForceAtlas 2, checked Dissuade Hubs and Prevent Overlap, and set Gravity to 50.

Click Run. You’ll see the dots start to move around. Depending on the size of your network, it might take a while before you start seeing a discernible pattern. You can click on an individual node to find information about it by selecting the Edit tool in the toolbar (bottom-most tool). The node info will be displayed on the edit tab next to the Partition and Ranking tabs.

If you want to remove the lone nodes and just show your one giant network, on the right of your screen you’ll see a Statistics and Filters tab. Click on Filter -> Topology, and drag “Giant Component” below to where it says ‘Drag filter here‘. Click Filter at the bottom. I also filtered out nodes with a degree of less than 2. Drag ‘Degree Range‘ into your Queries as well. When selected, you’ll see Degree Range Settings at the bottom. Drag the sliders or double-click the numbers to edit them. (Don’t click Filter again; the button works like an On/Off switch, and it was already on from the previous step.)
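If you are curious what those two filters are doing, here is a rough sketch in plain Python. It is not Gephi’s code; the function names and the tiny edge list are hypothetical, and the degree filter here is a single pass over degrees in the full graph, which may differ from how Gephi chains its queries.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) friendship pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def giant_component(adj):
    """Return the set of nodes in the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_min_degree(adj, nodes, min_degree):
    """Keep the nodes whose degree in the full graph is at least min_degree."""
    return {n for n in nodes if len(adj[n]) >= min_degree}


# Hypothetical mini network: a triangle with one hanger-on, plus a lone pair.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("x", "y")]
adj = build_adjacency(edges)
giant = giant_component(adj)
print(sorted(giant))
print(sorted(filter_min_degree(adj, giant, 2)))
```

The lone pair drops out with the giant component filter, and the hanger-on with only one mutual friend drops out with the degree filter.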


Before sizing the nodes by degree (in this case degrees represents mutual friends), let’s calculate the Average Degree. Under Statistics, click on Run next to Average Degree. You’ll get a result for average number of mutual friends across your network and you’ll get a nifty distribution graph. Usually this looks like a power-law distribution.
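As a sanity check on what Gephi reports, the average degree of an undirected graph is just twice the number of edges divided by the number of nodes. A minimal sketch, using the node and edge counts from my graph (the helper names and the small edge list are made up; Gephi’s own number may differ depending on which filters are active):

```python
from collections import Counter


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Each undirected edge adds one to the degree of two nodes."""
    return 2 * num_edges / num_nodes


def degree_histogram(edges):
    """Map degree -> number of nodes with that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


print(round(average_degree(1384, 35226), 1))  # 50.9

# Hypothetical mini network to show the histogram shape.
print(degree_histogram([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

So on average, each of my friends shares roughly 51 mutual friends with me.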

Now, go to the top left and click on the Ranking tab. In the drop down menu, select Degree. You can visualize with color, size, label color, or label size. I chose Size, but feel free to play around. Choose a range that fits best for your network, and hit Apply.

By the way, if your graph isn’t changing much anymore, you can stop the ForceAtlas 2 Layout process. Click on Stop. The dots should stop moving.

Communities / Modularity

To color the different communities, you’ll need to calculate Modularity. It’s under the Statistics tab on the right. Click Run. Press OK for the default settings. Again, you’ll get a nifty distribution chart.
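For a feel of what the modularity number means, here is a small sketch that scores a given split of a graph into communities. It only computes the score for a partition you hand it; it does not search for the best partition the way Gephi’s statistic does. The function name and the toy graph are hypothetical, and it assumes an undirected, unweighted graph.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of each community's nodes
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            internal[ca] += 1
    # Fraction of edges inside each community, minus what chance would give.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single edge (hypothetical toy network).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}

print(round(modularity(edges, two_groups), 3))
print(round(modularity(edges, one_group), 3))
```

Splitting the toy network into its two triangles scores higher than lumping everyone together, which is the sense in which a highly modular network has clear communities.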

Go to the Partition tab on the top left. Under Nodes, click the Refresh button. Select Modularity Class from the drop-down menu. If you don’t like the colors, you can right-click inside that window and select Randomize Colors. Or click on the individual colors and manually select your colors. Once you’re happy with the colors, click Apply.

Awesome! Your own social network graph. Gephi is a lot of fun to play around with, and I encourage you to do so. The Gephi website has a bunch of tutorials you can follow that will teach you some of the awesome things you can do. To save your graph as a PDF, click on Preview on the top-top left. Feel free to play around with the settings; they’re pretty straightforward. When you’re done, just click on Export SVG/PDF/PNG at the bottom left.

I’ll try to make my graph prettier. As soon as I can get Illustrator to open up this tiny file.

What One Database Marketing Company Knows About Me

September 8th, 2013

It’s no surprise that marketing companies gather data about you to sell off to advertisers who then deliver targeted ads via mail, email, or while you surf the internet. Sometimes it’s even creepy how much they know about you. So far, it’s been a bit of a mystery finding out exactly how much of your information these companies have. A few days ago one marketing technology company, Acxiom, launched a new service called AboutTheData.com which allows people to take a peek into how much information the company has gathered on them.  Acxiom is no small marketing company. According to the NYTimes, it has created the world’s largest commercial database on consumers. I decided to give the service a try to see just how much data this company had about me.

Since this is such a large company, and I’m such an active internet user, I expected Acxiom to have gathered a lot of information about me. I was slightly disappointed–or relieved–when I found out that they didn’t have that much information on me at all (honestly, I don’t know how I should feel about this). Before going into the data, here is a little more information about where this data comes from and what we are shown.

According to Acxiom, this data is collected from:

  • Government records, public records and publicly available data – like data from telephone directories, website directories and postings, property and assessor files, and government issued licenses
  • Data from surveys and questionnaires consumers fill out
  • General data from other commercial entities where consumers have received notice of how data about them will be used, and offered a choice about whether or not to allow those uses – like demographic data

The data they show us is their “core data”. This data is used to generate the modeled insights and analytics used for marketing, which they do not show. Acxiom says that we are shown all of their core data. They make no mention of whether there is other data that is neither core data nor modeled insights.

The site allows you to view data from six categories. Below is the information that has been gathered on me. Economic and Shopping data is over the past 24 months.

Characteristic Data: Male, Hispanic, inferred single
Home Data: No data.
Vehicle Data: No data.
Economic Data: Regular credit card holder (as opposed to Premium/Gold), Regular Visa, 2 cash purchases (includes checks), 1 Visa purchase.
Shopping Data: $139 spent on 3 purchases (the ones referred to above?), 2 offline totalling $100, average $50 each (one purchase < $50, the other >$50, so I guess it’s a coincidence they add up to $100), 1 online for $39. My supposed interests include books, magazine, Christmas gift purchase, ethnic products (??), lifestyles, interests, and passions.
Households Interests Data: No data.

It makes sense that there is not a lot of information about my home data or vehicle data, since I currently own neither (although there was no info on my previous vehicle ownership). Perhaps car owners and homeowners would have these sections filled out entirely. The household interests category is meant to include data related to my interests or those of people in my household (examples given on the site include gardening, traveling, and sports). I’m not so surprised this is also empty, but I’m not sure why they guess that my shopping interests include ethnic products and yet are not able to guess that I enjoy traveling. As for Characteristic Data? My Twitter feed should be enough to reveal that I’m a single Hispanic male. Since you have to provide your name, email, address, and last 4 digits of your SSN, it’s pretty safe to assume that they also have this information.

**To skip Luis’ short history of shopping, jump to the next paragraph.
Economic and Shopping Data provide a few more hints as to where the data are coming from. First of all, they only have three purchases. That’s it. Out of the 3,100 card/check purchases I’ve made over the past 24 months, they have 3. I tried looking on Mint for two offline purchases that add up to $100, but this proved to be a very difficult exercise. Even after filtering offline purchases and sorting the data, there were too many possible combinations. For now, those two offline purchases remain a mystery. I was able to find a suspect for the online payment of $39. The most suspicious purchase was a $39 seat upgrade at United Airlines. I can’t be sure this is the one, since along with the $39 upgrade I bought a plane ticket, which does not show up in my AboutTheData. However, my suspicion arises from the fact that Mint had prepared a targeted ad for me by placing a flashy green dollar sign next to the purchase. This also could’ve been a coincidence.
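
The hunt for two offline purchases that total $100 can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the amounts below are made up rather than taken from any real Mint export, amounts are stored in cents to avoid rounding trouble, and the function name pairs_summing_to is hypothetical.

```python
from itertools import combinations


def pairs_summing_to(amounts_cents, target_cents):
    """Return every (smaller, larger) pair of amounts that adds up to target_cents.

    Pairs of two equal amounts are skipped, matching the clue that one
    purchase was under $50 and the other over $50.
    """
    matches = []
    # sorted() guarantees a <= b for every pair that combinations() yields
    for a, b in combinations(sorted(amounts_cents), 2):
        if a < b and a + b == target_cents:
            matches.append((a, b))
    return matches


# Hypothetical offline purchases, in cents (not real data)
offline_purchases = [1250, 3875, 4200, 5800, 6125, 8750, 2499]

candidates = pairs_summing_to(offline_purchases, 10_000)
for low, high in candidates:
    print(f"${low / 100:.2f} + ${high / 100:.2f} = $100.00")
```

Even in this tiny made-up list, three different pairs fit, which is the same problem as above: the total alone is not enough to single out the two purchases.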

Conclusions/Best Guesses
Given the fact that I spend A LOT of time on the internet and the high number of purchases I’ve made over the years (I should cut down on those), I am surprised that Acxiom does not have more data about me. Basically, they know I’m a single Hispanic male, and that’s about it. I can’t possibly imagine what they could gather from the rest of my data that’s worth $$$ to advertisers. Additionally, it seems a lot of their data comes from publicly available government data sets (home and car ownership), and–at least in my case–not a lot of data comes from either my online habits or my shopping habits. I presume most of my important data is owned by Facebook and Google, and I’m pretty confident that they do not sell/share my data with Acxiom.

Last thought: AboutTheData lets you edit your data so that you can receive more accurate targeted advertising. I’m curious to know who uses Acxiom data to target me, so I would’ve loved to enter distinctive preferences that do not apply to me (yet) such as “pregnancy”, “colonoscopies”, “underwater basket weaving”, or “Cook Islands National Women’s Football League” to see where these ads pop up. Unfortunately, AboutTheData only lets you change the above-mentioned interests to ‘true’ or ‘false’. I guess they thought about the trolls.

RT @MartinLutherKingJr I Have a Dream… #CivilRights

August 28th, 2013

On August 28, 1963, Martin Luther King Jr. delivered his famous ‘I Have a Dream’ speech during the March on Washington for Jobs and Freedom. The event, attended by over 250,000 people, turned out to be a defining moment for the Civil Rights Movement. Fifty years later, the country has made great strides towards equality, but a lot remains to be done, not just in terms of equality but also across a wide range of social issues. While some of the issues of today are the same as the issues fifty years ago, new technologies and new ideas have quickly empowered us to react to these struggles in ways whose effects we don’t yet fully understand. The widespread adoption of the internet and social media has by no means replaced traditional marches like Dr. King’s; rather, it has augmented the ways in which people participate in social movements. Going forward, it’s important to ask what the effect of these new forms of activism is. Will traditional marches like the one fifty years ago ever be replaced? Will online forms of protest ever match the effects of, for example, the 60’s anti-war movement? What would Dr. King make of the hundreds of thousands of people who tweet for a cause?

Screen Shot 2013-08-28 at 9.16.38 AM


Click on the image to see the Processing sketch. Code included.

*This post was published on this blog in tandem with The GovLab.
**No… the tweets are not live.

Red Burns

August 24th, 2013

I’m pretty bad at words as it is, and in moments like these, I’m especially bad at words. So I usually don’t say anything, out of fear that whatever I say will sound stupid. So instead of mine, here are hers.

Red would present this on the first day of her Applications to the incoming ITP class. (Transcribed by Chris Selleck, posted to the ITP Alumni list by Michael Colombo)


What I want you to know:
That there is a difference between the mundane and the inspired.
That the biggest danger is not ignorance, but the illusion of knowledge
That any human organization must inevitably juggle internal contradictions – the imperatives of efficiency and the countervailing human trade-offs
That the inherent preferences in organizations are efficiency, clarity, certainty, and perfection.
That human beings are ambiguous, uncertain, and imperfect.
That how you balance and integrate these contradictory characteristics is difficult
That imagination, not calculation, is the “difference” that makes the difference
That there is constant juggling between the inherent contradictions of a management imperative of efficiency and the human reality of ambiguity and uncertainty
That you are a new kind of professional – comfortable with analytical and creative modes of learning
That there is a knowledge shift from static knowledge to a dynamic searching paradigm
That creativity is not the game preserve of artists, but an intrinsic feature of all human activity
That in any creative endeavor you will be discomfited and that is part of learning
That there is a difference between long term success and short term flash
That there is a complex connection between social and technological trends. It is virtually impossible to unravel except by hindsight.
That you ask yourself what you want and then you work backwards.
In order to problem solve and observe, you ought to know how to: analyze, probe, question, hypothesize, synthesize, select, measure, communicate, imagine, initiate, reason, create
That organizations are really systems of cooperative activities and their coordination requires something intangible and personal that is largely a matter of relationships
What I hope for you:
That you combine that edgy mixture of self-confidence and doubt
That you have enough self-confidence to try new things
That you have enough self doubt to question
That you think of technology as a verb- not a noun
It is a subtle but important difference
That you remember the issues are usually not technical
That you create opportunities to improvise.
That you provoke it. That you expect it.
That you make visible what, without you, might never have been seen
That you communicate emotion
That you create images that might take a writer ten pages to write
That you observe, imagine and create
That you look for the question, not the solution
That you are not seduced by speed and power
That you don’t see the world as a market, but rather a place that people live in – you are designing for people – not machines
That you have a stake in magic and mystery and art
That sometimes we fall back on Rousseau and separate mind from body
That you understand the value of pictures, words, and critical thinking
That poetry drives you, not hardware
That you are willing to risk, make mistakes, and learn from failure
That you develop a practice founded in critical reflection
That you build a bridge between theory and practice
That you embrace the unexpected
That you value serendipity
That you reinvent and re-imagine
That you listen. That you ask questions. That you speculate and experiment
That you play. That you are spontaneous. That you collaborate.
That you welcome students from other parts of the world and understand we don’t live in a monolithic world
That each day is magic for you
That you turn your thinking upside down
That you make whole pieces out of disparate parts
That you find what makes the difference
That your curiosity knows no bounds
That you understand what looks easy is hard
That you imagine and re-imagine
That you develop a moral compass
That you welcome loners, cellists, and poets
That you are flexible. That you are open.
That you can laugh at yourself. That you are kind.
That you consider why natural phenomena seduce us
That you engage and have a wonderful time
That this will be 2 years for you to expand- take advantage of it
Apollinaire said: – Come to the edge, – It’s too high, – Come to the edge, – We might fall, – Come to the edge, – And he pushed them and they flew


R.I.P. Red Burns