Why: I wanted to show readers the sheer number of campaign ads barreling through Iowa’s airwaves in the run-up to the 2016 caucuses. When my first shot at a data visualization fizzled, I wondered if there was a more interactive way to put readers in Iowans’ shoes.
What I learned: A lot about game mechanics! Also, animating thousands of HTML blocks down a user’s screen will eventually turn their processor into a white-hot brick — unless you do it efficiently 😊
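The trick that kept processors cool, sketched here in miniature (this is not the original visualization's code, just the general pattern): keep all the motion math in plain data, recycle blocks that leave the screen instead of creating new DOM nodes, and apply positions once per frame via `transform`, which skips browser layout work.

```javascript
// Advance every falling block by its speed; recycle blocks that exit the
// viewport rather than allocating new ones.
function step(blocks, screenHeight) {
  for (const b of blocks) {
    b.y += b.speed;
    if (b.y > screenHeight) b.y = -b.height; // wrap back above the screen
  }
  return blocks;
}

// In the browser, a single requestAnimationFrame loop would then write
// el.style.transform = `translateY(${b.y}px)` for each block, which the
// compositor can handle without reflowing the page.
```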
Why: @realDonaldTrump might be the most powerful Twitter account in the world. Some tweets are written by Trump himself; others are written by staffers. I wanted to know which ones were which.
How: During the campaign, Trump almost exclusively tweeted from an Android phone. His staffers, on the other hand, used iPhones. Using archived Twitter data, I built a machine-learning model that used language patterns to predict whether Trump was the author of a given tweet.
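A toy version of the idea, not the model from the story: score a tweet by which device's training vocabulary its words better match, using a smoothed bag-of-words comparison. The training tweets below are invented placeholders; the real classifier was trained on the archived @realDonaldTrump timeline with richer features.

```python
import math
from collections import Counter


def train(tweets):
    """Word frequencies and total word count for one class (e.g. Android tweets)."""
    counts = Counter(w for t in tweets for w in t.lower().split())
    return counts, sum(counts.values())


def log_likelihood(tweet, counts, total, vocab_size):
    """Add-one-smoothed log probability of the tweet's words under one class."""
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in tweet.lower().split()
    )


def predict(tweet, android, iphone):
    """Label a tweet with whichever class gives it higher likelihood."""
    vocab = set(android[0]) | set(iphone[0])
    a = log_likelihood(tweet, *android, len(vocab))
    i = log_likelihood(tweet, *iphone, len(vocab))
    return "android" if a > i else "iphone"


# Invented training data, purely for illustration:
android = train(["crooked media sad", "very sad loser"])
iphone = train(["join us in ohio", "thank you iowa"])
```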
What I learned: Keeping a model up to date is really hard to do when you don’t have new data. Trump gave up his Android when he became president, so the classifier is based only on his pre-election tweets. It’s increasingly inaccurate.
Where: The Atlantic
Why: After it became clear that compromised internet-connected devices had brought down major websites like Twitter in the Mirai botnet attack of 2016, I wanted to see exactly how quickly an unsecured device could be hacked. What better way than to build an IoT toaster?
How: OK, I didn’t build an actual toaster. Instead, I set up a server using Amazon Web Services that outwardly presented a fake login screen. As hackers attempted to break in, it played along and recorded their keystrokes.
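A stripped-down sketch of the honeypot idea, with invented prompts rather than the actual AWS setup: greet the attacker with a fake login banner, "accept" whatever they type so they keep going, and log every line with a timestamp.

```python
import socketserver
from datetime import datetime, timezone

LOG = []  # the real version wrote attacker input to disk


def record(peer, line):
    """Log one line of attacker input and pick the next fake prompt."""
    LOG.append((datetime.now(timezone.utc).isoformat(), peer, line))
    # Always play along so the attacker keeps typing.
    return b"$ " if line else b"login: "


class Honeypot(socketserver.StreamRequestHandler):
    def handle(self):
        self.wfile.write(b"login: ")
        for raw in self.rfile:
            reply = record(self.client_address[0],
                           raw.strip().decode(errors="replace"))
            self.wfile.write(reply)


# To run: socketserver.TCPServer(("0.0.0.0", 2323), Honeypot).serve_forever()
```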
What I learned: My fake toaster was hacked within an hour, and many times after that. If you connect an unsecured device to the internet and expect the anonymity of your IP address to protect you, you will be comprimised immediately.
Media mentions: All Things Considered interview
Where: The Atlantic
Why: The American health care system is quite expensive compared with those of other industrialized nations. We wanted to see how the outsized spending on the small slice of sickest patients contributed to that total.
How: For this story, I used R to analyze the Medical Expenditure Panel Survey, a massive dataset maintained by the U.S. Department of Health and Human Services. It’s a great source because while it relies on household surveys to gather information on medical expenses, it follows up with medical providers to verify that information.
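The core calculation, sketched on made-up numbers rather than the actual MEPS microdata (the real analysis was done in R and used the survey's sampling weights): rank individual annual expenditures, take the top 5 percent of people, and compute their share of total spending.

```python
def top_share(expenditures, fraction=0.05):
    """Share of total spending accounted for by the top `fraction` of spenders."""
    ranked = sorted(expenditures, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Invented example: 100 people, 5 of whom dominate spending.
costs = [50_000] * 5 + [500] * 95
# top_share(costs) -> 250_000 / 297_500, about 0.84
```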
What I learned: The sickest 5 percent of Americans account for about 50 percent of the nation’s medical costs, a staggering proportion. I also loved working with Polygraph, who contributed coding and design help.
Where: The Atlantic
Why: My office uses Slack to communicate, which has native emoji support. I noticed that while many of my coworkers of color used darker-toned emoji, pretty much all of my white colleagues — including myself — preferred yellow emoji.
How: The meat of this analysis comes from a sample of Twitter’s real-time tweet stream: I filtered U.S. tweets for messages containing emoji and cataloged the skin tones used, then compared the relative incidence of emoji tones to the reported demographics of U.S. Twitter users.
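The cataloging step, sketched on sample strings rather than the full streaming pipeline: Unicode encodes emoji skin tone as one of five Fitzpatrick modifier characters (U+1F3FB through U+1F3FF) that follow the base emoji, so tallying tones amounts to counting those code points.

```python
from collections import Counter

TONES = {
    "\U0001F3FB": "light",
    "\U0001F3FC": "medium-light",
    "\U0001F3FD": "medium",
    "\U0001F3FE": "medium-dark",
    "\U0001F3FF": "dark",
}


def tally_tones(tweets):
    """Count skin-tone modifier characters across a batch of tweet texts."""
    return Counter(TONES[ch] for text in tweets for ch in text if ch in TONES)
```

A base emoji with no modifier (the default yellow) simply contributes nothing to the tally, which is why the yellow default had to be tracked separately in the story's analysis.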
What I learned: As expected, white people don’t use white emoji. The story’s reporting gave me a deeper understanding of why, and why defaulting to the yellow emoji is also fraught.
Why: Trend data is important but sometimes doesn’t make an impression on readers. I wanted to see if having readers draw their best guess at a trend would make them more receptive to a counterintuitive reality.
How: This originally was a one-off interactive for a story on homicide rates. I subsequently built Guess Graph into a reusable web component, available on GitHub.
What I learned: More than 11,000 people interacted with the graphic; I collected the aggregate guesses in a separate post. But I’m still looking for good ways to test whether this form makes people more open-minded to new information.
Why: Washington’s cherry trees are beautiful. But everyone rushes down to the Tidal Basin to see them during peak bloom season. Meanwhile, there are tons of trees planted elsewhere in the city that are just as pretty.
How: DC’s Urban Forestry Department maintains a database of all street trees cared for by the city. I dropped this data into a PostgreSQL/PostGIS database and built a mobile web app that lets you search by location and see details on each tree.
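A minimal sketch of the "trees near me" lookup in plain Python with invented coordinates; in the real app this sort happens inside PostGIS, which can order rows by distance from the user's point directly in SQL.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in meters."""
    r = 6_371_000  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest(trees, lat, lng, n=3):
    """The n trees closest to the user's location."""
    return sorted(trees, key=lambda t: haversine_m(lat, lng, t["lat"], t["lng"]))[:n]
```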
What I learned: Cherry trees are awesome, no matter where they are. Also, browsers have become increasingly strict about allowing access to the user’s location; keeping this app functional has required a bunch of updates.
Where: My desk! But also, this post at The Atlantic.
Why: I love Sweetgreen. Their salads are really good. So is their app. But since I often buy the same thing, why not make something that automatically buys it for me? That way, I can just walk over and pick it up.
How: I reverse-engineered Sweetgreen’s ordering API, wrote a script with my payment information, and configured an Amazon Dash Button to order a Guacamole Greens salad (no onions!).
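A hypothetical sketch of the ordering script. Sweetgreen's API is undocumented, so the endpoint, field names, and token below are all invented stand-ins for whatever the reverse-engineered requests actually looked like.

```python
import json
from urllib import request

API = "https://example.com/api/orders"  # placeholder, not the real endpoint


def build_order(item_id, removals, payment_token):
    """Assemble the JSON body for a repeat order (all field names invented)."""
    return {
        "line_items": [{"product_id": item_id, "removed": removals}],
        "payment": {"token": payment_token},
        "pickup": "asap",
    }


def place_order(order):
    """POST the order; the Dash Button's press event would trigger this."""
    req = request.Request(API, data=json.dumps(order).encode(),
                          headers={"Content-Type": "application/json"})
    return request.urlopen(req)


# order = build_order("guacamole-greens", ["onions"], "tok_123")
```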
What I learned: Salads are good! Also, companies choose the worst times to change their undocumented APIs and ruin your lunch 😡
Where: I shut down Philly Rap Sheet several years ago, but you can see it on The Internet Archive.
Why: As a local reporter at The (Allentown) Morning Call, one of my duties was to drive out to the offices of local judges and page through arrest reports, looking for crimes of interest. I wrote a scraper that pulled these records digitally from the comfort of my desk; I used the same scraper to launch Philly Rap Sheet, which focused on my hometown of Philadelphia.
How: The scraper itself was a terrible PHP script. Come to think of it, so was the website. This was one of my first major server-side programming projects.
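The PHP is long gone, but the scraping idea can be sketched in Python against a made-up docket page (the real court pages had different markup, and a production scraper would want a proper HTML parser rather than a regex):

```python
import re

# Invented sample markup standing in for a district judge's docket listing.
SAMPLE = """
<table>
  <tr><td class="name">DOE, JOHN</td><td class="charge">Theft</td></tr>
  <tr><td class="name">ROE, JANE</td><td class="charge">Burglary</td></tr>
</table>
"""

ROW = re.compile(r'class="name">([^<]+)</td><td class="charge">([^<]+)')


def parse_docket(html):
    """Pull (defendant, charge) pairs out of a docket listing."""
    return ROW.findall(html)
```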
What I learned: A lot about programming. But even more about ethics. Once folks showed up on Philly Rap Sheet, they were there permanently; I didn’t rescrape the data, which would have been infeasible as the database grew larger. In the months and years after I launched, I received many take-down requests from people who had their records expunged, and a few letters from lawyers, too. Eventually, I decided it was unethical to make this data publicly Google-able without updating it, and I shut Philly Rap Sheet down.
Media mentions: Nieman Journalism Lab