Once upon a time, there was some data. And that data cried out to be extracted and analyzed and packaged up like the most exquisite of gifts and then presented gloriously to an eager and excited group of stakeholders. But, alas! Will this data story have a happy ending? Perhaps. Perhaps not! And that’s the subject of this episode. Sort of. Our intrepid hosts ask the question, “How can we communicate more effectively by applying the tricks of the data journalism trade?” To answer that question, Walt Hickey, late of fivethirtyeight.com and now the founder and curator of the daily Numlock Newsletter, joins the gang to chat about how he combined an education in applied mathematics with an interest in news media to become a data journalist. Along the way, the discussion explores how Walt’s insights can be applied to business analytics. And there’s a terrible analogy about meat that gets butchered along the way (thanks, Tim!).
Items Referenced in the Show
- Numlock Newsletter
- Nate Silver
- How To Spot A Front-Runner On The ‘Bachelor’ Or ‘Bachelorette’
- We Give Captain America 4-To-1 Odds Against Winning The Civil War
- The Ultimate Halloween Candy Power Ranking
- Dona Wong: WSJ Guide to Information Graphics
- Business Insider
- ggthemes (R / ggplot2 package that includes a fivethirtyeight.com theme)
- Significant Digits
- Randall Munroe
- What If?: Serious Scientific Answers to Absurd Hypothetical Questions
- Lea Pica
- Switch: How to Change Things When Change Is Hard
- John Jackson Miller
- The Numbers: Where Data and the Movie Business Meet
- Bruce Nash
- Pokémon Go
- Andy Raskin: Steve Jobs’ Secret for Eliciting Questions, Overheard at a San Francisco Cafe
- Augustine Fou: State of Digital Ad Fraud Q2 2018
- (Bonus) 1:1 with Chad Sanderson: The Pitfalls of A/B Testing
- #084: Bayesian Statistics and the Digital Analyst with Dr. Elea Feit
- The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t
- The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy
- Alan Turing
- The Ann Friedman Weekly (Newsletter)
- Anna Maria Barry-Jester
00:04 Announcer: Welcome to the Digital Analytics Power Hour. Tim, Michael, Moe and the occasional guest, discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour, and their website, analyticshour.io. And now, the Digital Analytics Power Hour.
00:27 Michael Helbling: Hi everyone. Welcome to the Digital Analytics Power Hour. This is Episode 92. You know, nothing is worse than misunderstanding or being misunderstood. With the amount of data now being produced within our businesses, the media and the world, we’re inundated with not only data points that are hard to comprehend, but in addition there are those who seek to use data to intentionally mislead. And there’s tons of quotes out there linking freedom with responsibility, and if we are to enjoy these freedoms and our increasing paychecks and these good times, are we not responsible to understand the data we see and use, and shouldn’t we use our data literate powers for the good of all? Tim Wilson, some have called you the quintessential analyst. What say you about these data miscreants out there?
01:19 Tim Wilson: So, definitionally, I think some — equaling one — would raise some questions there.
01:25 MH: So, you’re saying that I’m trying to use data for evil by saying that. Is that what you’re saying?
01:30 TW: I am. You’re a data miscreant.
01:32 MH: Oh, I’m a data… Well, there you go, and now you understand the central tension of our podcast. And Moe Kiss, you fight the good fight every day at The Iconic, are you ready to take on the powers that be?
01:45 Moe Kiss: I certainly am, team.
01:47 MH: Awesome. And always, I am Michael Helbling and just glad to be here. But even with our wildly popular podcast and Tim’s awesome hair, we can only reach so many. We needed someone who could get to the masses, someone whose analysis spans Madden, the Oscars, maybe even Halloween candy, and it just so happens that our guest is that hero. Walt Hickey has spent the last four and a half years as the chief culture writer at FiveThirtyEight. Maybe you’ve heard of it. And just this past May, he has started his daily data newsletter, Numlock, that highlights the context and importance of the numbers you read about in the news. Welcome to the show, Walt.
02:28 Walt Hickey: Thank you so much for having me. This is really exciting.
02:31 MH: Awesome. Well, we made it sound really tense off the top but actually, a lot of your writing is super-interesting because it delves into data topics sometimes that people overlook. So, what got you started on this path and interested in going down these alleyways of data, if you will, and doing analysis in these places?
02:49 TW: And from a degree in Applied Mathematics, you don’t typically think that… I don’t see the… How that leap happened is intriguing.
02:58 WH: Oh well, so a lot of it comes down to I got into this because I think it’s the most fun job in the whole world. When I was in college, I went to William & Mary, and I majored in Applied Math, but on the side, if you only do an Applied Math degree, you drive yourself up a wall. So, as a side project, I worked for the student newspaper. And those were two very separate things for a while until I started following the news and following what folks like Nate Silver were doing and I realized that there is a lot more middle ground there than I originally thought.
03:27 WH: So, I started looking away from going into the business sector and started looking more and more into opportunities in journalism and what really motivated this was, I had a good internship, I worked for OpenSecrets one summer, and then at the student newspaper we got to play around a little bit with stats. And I think that if you wanna get at why I think this field is so interesting, it’s like, if you tell somebody, “I work in data journalism”, they might get bored by it. But if you tell a student, “I will tell you where to park so that you’re least likely to get a parking ticket,” that’s how you can effectively communicate the power of what we do.
04:00 WH: And so, a lot of times, it really comes down to, I think that data journalism has the potential to fundamentally change the power structure in how folks who have little access can start telling important stories, and in order to advance those somewhat lofty goals, I think that the best way to do it is to, we try to reach people where they’re at. And so, if that means writing about The Bachelor and The Bachelorette from an analytics perspective, if that means going deep in on The Avengers, of who would actually win in Civil War, and if that means going through Halloween candy and finding out which one is the best one, then I think that’s a really effective way to compel people to take a look at this and trust it and think about using numbers more and more in how they trust the news coverage that they consume.
04:44 TW: Outside of working on the school paper, did you actually take courses in journalism? ‘Cause I think we talk on the digital analytics side a lot about the need to be effective storytellers and we look to the data journalists as being like, “Look, they’re having to appeal, they’re having to explain this stuff, we have an enormous… Dona Wong, her book, The Wall Street Journal Guide to Information Graphics.” There’s a ton that we feel like, on the business analytics side, we can learn from journalism and where data comes together with it, but did you actually have formal journalism training classes mixed in with the applied math?
05:21 WH: Not so much. William & Mary doesn’t have a J School, but it is a liberal arts college, so as a result, even the math majors have to take English courses, history courses, and so you do get that very well-roundedness, which I do think is, to your point, a class outside of that. But, a lot of this is always learning by doing, and a lot of what you’re seeing in media today is like… Never before has it been that so many people can get access and work for a journalism outlet. So, for instance, the site that I first worked at, Business Insider, was willing to tutor me, teach me in the journalistic, I guess, skills that I was lacking immediately out of college, but I was also bringing to the table something that they didn’t necessarily have. And so I think that a lot of this is learned by doing. I’m self-taught in Python. I took a little bit of computer science in college, but the reality is, is that so much of this is… It’s out there if people wanna grab it. And it’s like, formal journalism training is absolutely important but also on the job is how everybody picks stuff up.
06:22 MK: When you approach a story, what makes you decide on which topic that you’re interested in? Can you talk us through a little bit about how you actually go about tackling it? Do you start out with a question? Do you start out with the data? What’s your methodology?
06:33 WH: Yeah. With culture coverage, the primary challenge that we’ve always had in culture is, if you wanna talk about politics, you know where to find the polls. If you wanna talk about sports analytics, you know where to find the scores. If you wanna talk about the economy, you know that you go to the Fed, they got a whole bunch of data. Then everybody agrees on the terms that you’re talking about. With culture, there’s no single repository, there’s no single source of data. It’s not as easy as, “Oh, I have this big pile of data in my lap. Let’s do something with it,” which is a lot of times what you’ll see in sports analytics and political analytics.
07:07 WH: For us, it always starts off as a question. It’s always like, “Here’s an interesting show. And why is this interesting? Why is this resonant?” “Here’s an interesting career or person. Let’s see what we can find out about what connects their career through the through line.” And essentially, I attack it from a, like, “I have a topic that I’m interested in. I have a theory about it, but I love it when that theory gets blown out of the water by what the data shows.” And so the idea is that, typically what we’ll do is we’ll find a story topic that we wanna cover. We’ll figure out a couple different approaches on how to attack that and get that information. And then eventually, we’ll go through.
07:43 WH: And essentially, all that we do is really make an argument. The idea is that you don’t really prove stuff with stats. You demonstrate and argue in favor of it and say, “The evidence supports, in general.” And you say, “Here’s why I believe this.” But you never prove anything, right? So, a lot of the time, it’s just, “Here’s our argument. Here’s the data. If you wanna take it and then draw your own conclusions from it, and take other permutations of it.” And essentially, that’s kind of how we do it. It’s very problem-solving based, and it’s quite a bit of fun.
08:11 MK: It sounds like you have the dream job.
08:14 MH: So it is interesting, and I’ve actually used FiveThirtyEight just writ large as a last call a year or two ago, just because I think that site does so masterfully manage to hit a… There is a linear flow to every story. There is an effort, clearly, to find the balance between the statistical methods and pieces and the digestibility of the material. And then there’s phenomenal data visualization. I mean there’s stuff, the style guide to R, there’s a R package for matching the FiveThirtyEight style. But that is kind of… Can you talk a little bit about finding that balance? Because I feel like sometimes in the digital analytics world, there’s this tendency to want to, “Let’s show all of our work,” which puts the audience to sleep.
09:07 MH: FiveThirtyEight has historically been able to walk through and be engaging. How many diversions down, “Well, that’s not worth including, that’s not worth including, that’s not the story,” how much of finding the story, outlining what’s the flow gonna be? Which of these are we gonna dive deep and explain what we’re really doing here versus just kind of wave our hands and say, “Look, we’re using a clustering algorithm and it does x.” How does that play out?
09:32 WH: Yeah. And, I think that you get at a really interesting challenge when it comes to data journalism, and one that in my time at FiveThirtyEight, I’ve thought about constantly. It definitely led to how I perceived Significant Digits, which was the column that I ran for them, and it’s actually what motivated me, since I left FiveThirtyEight in late May, to start this new Numlock newsletter. And the idea is that, what is the line for the difference between the interesting fact and the important fact? And I think that so much of how numbers are treated in the news comes down to, it’s kind of a brute force attack. It’s a way of like, “Okay, here’s the statement that I’m going to make. And in order to back up that statement, I’m going to list every available number that I have.” And a lot of times, this is an ineffective way to persuade somebody, to convince somebody, to back up your arguments.
10:21 WH: Most numbers go in one ear and out the other, that’s just the nature of the beast. And so, through Sig Dig and now in Numlock, I’ve focused on three major ways that folks can talk about numbers and highlight interesting and important numbers, while also really grounding it. A lot of that comes down to context. One of my favorite interviews that I got to do at FiveThirtyEight was with this fellow, Randall Munroe. He’s the guy who makes xkcd. He’s the author of the book, What If? And he was telling me, “Whenever I think about including a number in my work, I always ask myself, ‘If I added a zero to the end of that number, would that change my mental conception of it?’” So, if I say, “5,000 tons of concrete,” I don’t know what the hell that looks like, and I don’t know what the difference between 5,000 tons of concrete and 50,000 tons of concrete is.
11:06 WH: So, if you’re able to contextualize that, and if you’re able to say, “Enough concrete to pave over 50 parking lots,” that’s how you’re able to effectively communicate a number by grounding it in reality, by giving the extra context. You can also personalize a number. If you’re trying to just talk about a government policy, a lot of the time saying, “This policy cost $14.2 billion,” that’s an inconceivable amount of money. I am fundamentally incapable of imagining $14.2 billion, and I have all the bona fides of a guy who’s decent at math. The reality is, is that if you wanna make that number meaningful, you have to boil it down to, “What does it mean for me personally? What does this benefit effect for me personally? What does this mean, tax-wise, for me every year?”
11:45 WH: And I can understand $1,000 and $10,000, but whenever you’re talking about affairs of state, really personalize it and get that down. And then, the last part is, and this is the main crux of what I try to do, is the tone. So much of what you were saying is that what so many people do is, once they have numbers, they use every one of them. They talk about them with the seriousness and the monomaniacal solitary mathematical way of talking about it that everybody does because you’ve worked really hard on the numbers and you know that they’re right, and you’re going to explain them perfectly.
12:17 WH: But the reality is, is that you strip out your ability to communicate people-to-people. If you’re not joking and having the normal hallmarks of conversation, the same thing that you would do whenever you’re trying to persuade somebody or have a conversation, then you’re kind of stacking the deck against you. I mean, you’re speaking as if the numbers are changing what you’re trying to say but the numbers are really just adding to what you’re trying to say. And so, so much of what I tried to do at FiveThirtyEight was really hone in on a fun conversational tone. Again, I was looking at how you guys pitch yourself, which is just like the conversation at the bar after the analytics conference, and that is how everybody should talk about analytics. And so, having a very down-to-earth conversational style, I think is something that FiveThirtyEight was very, very good at doing when it came to talking about complicated things. And that’s something that I really wanna continue on as I continue to make Numlock News.
13:10 MK: Do you guys think that that conversational tone is something that can be transferred to the analytics practitioner in business, or do you think there needs to be a different tone?
13:22 MH: I was listening, thinking how… I remember Lea Pica at one point saying, “Put people in your slides.” She goes through a thing where she says, “To show a quote from your site survey that says, ‘The checkout is broken,’ is one thing. To show an angry person’s face with the quote next to it saying, ‘The checkout is broken,’ that’s that much more powerful.” So, Walt, as you were talking, I was thinking through and I’ve actually been just today, I was working on like, “How do you express to an analyst that when you’re presenting the results of an analysis, the starting point…” Well, the starting point really to me is a storyboard, but so often it starts with, “What are the charts I need to show?”, as opposed to that whole leap to the narrative, and if you can figure out some way, ’cause it’s just gonna make it more memorable and more comprehensible. It’s way harder, right?
14:19 TW: And Walt, in the 15 minutes we’ve been talking, your mind goes in places where you can make these kind of asides that make it viable and interesting. I think some analysts do struggle with that, but I think, absolutely, almost nowhere, nobody wants to go into a meeting and go slide, slide, slide. We bludgeoned Corporate America into having that as the expectation, but I feel like I’ve never gone and presented something, making asides that are relevant and tie in, but drive a point home, seems very, very effective, right?
14:55 MH: That was the thing that lit up for me as you were talking, Walt, because I’ve often struggled with this conceptual framework of applying a very specific and unique story to make a broader point that supports your underlying… This is so… All of our politics are done this way now, right? It’s sort of like, “Let me tell you about Joanne. She’s a factory worker in New Jersey.” We’re advocating for a much broader set of things, but if you can believe in Joanne and her story, you’ll buy into my policy idea.
15:29 MH: I’ve always struggled with that as an analyst, because it felt like, was I really being true to the data if I was making something that narratively personal? And I think that was what I took away from that, is I think you can do both. And it was reminding me of the book by Chip and Dan Heath, Switch, where they go through a lot of those kinds of contexts, things of like, “Here’s the picture we made. Here’s how we help people see that contextual narrative develop or understand, so that then change or alignment could happen because of it.”
16:01 WH: Yeah. I think you’ve really hit on something interesting, which is that I think that when I did Significant Digits and now, in Numlock, the idea is there’s a difference between the important number and the interesting number. And a lot of the times, I think it’s actually better to lead with the interesting number, and then follow up with the important number. The idea is that if I have a number that gets you interested in, I don’t know, let’s say Canadian Football. If I were to tell you the score of last year’s Grey Cup… Any Canadian Football fans here? Maybe that wouldn’t super-resonate, but like…
16:36 MH: Now that Manziel is part of the CFL. I mean, come on.
16:38 WH: Right. That’s what I’m saying. If you are able to get an interesting figure…
16:42 MK: You’ve lost me.
16:43 WH: Does it mean that it makes somebody has to have a relationship with this league that they are totally unfamiliar with, or an exceptional touchdown. Again, if you were able to highlight an interesting thing about this, and then you follow up with the top line numbers, a lot of times that’s a much more effective way to bait people along. And I think a lot of… What you guys… This was just what I was thinking when you were talking, is one thing, like I mentioned, I don’t have any formal journalism schooling, but there were some kind of techniques. And one thing that the FiveThirtyEight charts desk has super-nailed down is that they don’t ever have a chart that’s just, “Here’s every point.” Every single chart has to have a point. It has to be making a one-sentence argument, in many ways. That could be, “Look at the line going up.” It could be, “Look at the line going down.” It could be, “Look at this fucked up outlier.” It could be any of it.
17:33 WH: But the thing is, the chart is never just, “Here’s everything.” The chart is always, “Here’s a specific point that I’m trying to make with this visual conveyance.” And that applies to paragraphs, that applies to sentences, that applies to every slide. The idea is that every single one of those has a point. Every sentence has one point and no more than one point, or else it’s doing too much. The same thing with a paragraph, every single graph should have one conceptual point that you’re trying to get across, and stories, to an extent, can have a lot of ideas, but generally you want people to have a takeaway from that. And so, that’s how you can build that out and conceptualize, making sure that you’re not trying to jam through 100 points at once.
18:10 TW: That takes you straight to the analysis, the death by PowerPoint, the, “I’m gonna have one slide that has 10 bullet points, and every one I’ve got on there because I feel like it’s an important point.” Well, then it should be 10 slides, which it may be, or it turns out maybe half of those are maybe not really worth keeping, or they’re repetitive.
18:32 WH: If everything is important, then nothing’s important, that’s the whole point.
18:34 MH: That’s right. It’s amazing how much our instincts are counter to this in a certain sense. Like, people wanna show you everything they’ve learned about the topic, or they want to demonstrate how much hard work and analysis they’ve put in, and they literally snowball people to death and don’t deliver. Kind of like you mentioned before, just bludgeon you with data point after data point, and affect nothing. And I just think that’s really interesting that there’s almost a counterintuitive approach you almost need to take, where you kind of have to go back and be like, “It’s not for me. Who is it for?” And then go from there. Which is interesting.
19:14 MK: So actually, Walt, can you talk a little bit about how do you do that yourself? When you start to put together your narrative, how do you actually go through the process of like, “This is what I need to cut, this is not important.” What’s that look like?
19:28 WH: It sucks. Everybody hates deleting perfectly good words. But no, I mean, that’s what a good editor’s for, right? That’s what the whole editing process… I typically write everything, and then it comes in 800 words too long, and then I start deleting and combining. And everything should be shorter. I see a lot of movies and I think that every movie could probably stand to be five to 10 minutes shorter, and a lot of times it’s just because somebody didn’t wanna kill their favorite scene. And that’s so true in journalism, I think that your average piece on the internet is probably 20% larger than it really needed to be. Good editors really do a great job of getting that kind of conciseness across.
20:06 WH: But I mean, it’s never easy, it’s definitely like you have to make the active choice every single time to like, “We can cut a little bit more.” And it’s always the right call. [chuckle] Having shorter, more concise, more incremental work a lot of the times, rather than big magnum opuses, are way more effective ways of talking about information, generally speaking.
20:28 TW: To me, if you’re able to cut a little bit of the… You cut the fat, that’s easy, then you start cutting a little bit of the meat. From presenting it in a business context, that means that I’ve got some meat in my back pocket, which is a really weird analogy that I’m regretting heading down the path on. [chuckle]
20:46 WH: But if somebody asks in the Q&A session, then you’ve got something to toss ’em, you know?
20:49 TW: Exactly, that’s… Exactly, yeah, I could toss them, I could toss them, you’re gonna have to cut the bone a little bit. But we don’t seem to have a… In journalism you do have an editor, you’ve got the writer and the editor, and the editors, effective editors are really trained to do that, “Hey, I’m gonna tell you your baby’s ugly. This piece can be taken out.” And in analytics, we just kind of expect people… Even we will tell people to practice, or run their stuff by another analyst, it rarely happens…
21:21 MH: It doesn’t happen enough, yeah.
21:21 TW: To say, “Is this coherent? Do I drone on too long? Am I showing too much of how the sausage is made?” We just don’t do that. And partly ’cause I feel like we think that’s too much time, even though that’s so critical, is to communicate it, but then there was a second reason that is now totally escaping me as to why that… Well, we just don’t have that expectation, I guess. We just think that an analyst should be doing it all, doing the analysis, coming up with a story, developing the story, presenting the story, and we don’t have a structural setup of an editor, basically.
21:55 MK: Yeah, but… Yeah, I don’t know. I think if you have a strong team, you do send it to someone to review and check.
22:00 TW: Do you do that regularly? Do you?
22:02 MH: Yeah, Moe does it, Tim.
22:04 TW: Of course.
22:04 MK: Yeah, quite often. I’ll be like, “Here’s an email. Can you… ” Yeah, absolutely. ‘Cause otherwise, I do drone on. Can we just go back a point though? Tim, are you saying that you advocate for, and I don’t wanna use this analogy ’cause it’s gross, for leaving some meat in your back pocket? Or even potentially bone?
22:22 MH: It could be cured meat, Moe. Like a little bit of jerky.
22:25 TW: Yeah, I mean because the fact is [22:28] ____ Well, maybe to over push the analogy a little bit, I may have a wonderful 22 ounces of steak, but that doesn’t mean that the person I’m serving, they need 14 ounces. Is there any harm in me cutting off that 8 ounces and saying, “You know what? Maybe you’ll be hungry after, maybe I’ll deliver it to you next week, the fact is, let me find the best 14 ounces and give you that steak.” We’re gonna destroy this analogy.
22:55 MK: But I think the concern is… And I hope that Walt would kind of agree, no pressure, that the person needs the right amount of information to make a decision, and particularly in a business context, they need the information they need to make a decision. So, yes, I believe in culling to get to a point that what you’re saying is understood, but not to keep extra stuff in your pocket just in case.
23:21 TW: No, no, no.
23:21 MK: Like, if they need that, then it should be part of the narrative.
23:24 TW: But if you have 10 points to support your thing, the tendency is gonna be to show them all 10 points, when really, four is you’ve made the point. The fact is you’ve got another six that also back it up, does it become that much stronger that you’ve got 10 instead of four? You may present those four and they say, “Yeah, but what if we looked at it this other way, would that counter?” And you’re like, “Oh yeah, I looked at that too. That’s what you’ve got. Yeah, I looked at that too, it also supports it, I just didn’t wanna kill you by PowerPoint.”
23:55 MH: Yeah, everything’s stronger in threes, that’s… Three bullet points, and get out, that’s my rule of thumb.
24:03 TW: No bullet points.
24:03 MH: No bullet points.
24:05 MH: Whatever. Alright, I wanna go down a little different path because we’ve been talking about how we as analysts engage with this, but actually for all of us, I think that we have a passion for creating better understanding across people who wouldn’t consider themselves analysts. And so I’d love to delve into that side of it, ’cause I think, A, that’s probably a big part of your new newsletter, Walt, and certainly all of us are dealing with business users and owners who aren’t gonna go in and try to enunciate the data. But how do we create better sort of data literacy, if you will, across the broader population?
24:45 WH: Yeah. I think the way that you do that is that everybody kind of declares it’s cool if you don’t think that you know stuff about math, and you don’t need to pretend, and it’s okay. Because a lot of people have built up complexes about how willing they are to deal with numbers. I mean, you can pull some of the statistics, like from international testing, about how comfortable people are interpreting graphs. And you would see something like 60-70% of Americans are not super comfortable analyzing a chart, and drawing correct conclusions from that. And that’s not their fault, I mean it’s the education system, basically that’s a failure of a lot of different people, and it’s not the user, right? And so the idea is that I think and I believe that if you want to expand data literacy, you need to meet people where they are. And that’s why I pushed for doing stories.
25:38 WH: Like we did the one on The Bachelorette and The Bachelor, we went through all of the seasons of that television show, because it’s interesting for a number of reasons, it’s one of the few shows that year over year gains viewers on television in America, that is such a rare thing, gaining raw viewers. As a result that’s a fundamentally interesting television show, and I think that cracking the bottom of, “Hey, what makes this show tick? What makes this show interesting? What makes this show rise year after year after year?” And we were able to get at that by treating this television reality show with the same dignity and analytic fidelity as we would the NFL, or any of the other big television shows going on. And so I think that when you can use that opportunity to draw in an audience that would normally nowhere near go out of their way to find data journalism, I think that that’s a really effective way of doing so.
26:26 WH: And I think, just as a matter of course, data is failing right now at reaching broader audiences because of the expertise that it seems to require to break in. I’ll give you a good example. When I was in college and I was taking statistics programs, oftentimes they would look for sample data sets to dive in on. And the classic data set that everybody will mess around with is usually a baseball dataset, both because there’s a robustness of data, there’s a legacy in baseball of analyzing statistics, and you can draw pretty clear conclusions from different things that affect other things in baseball, right? I think that sucks, ’cause baseball’s boring, and only appeals to older white guys.
27:07 MH: Easy there.
27:08 WH: And I think that if that’s what you’re going to use, then as a result then you’re going to disincentivize an entire generation of people from joining statistics and getting interested in statistics. And I think that as a result, so much of what can come out of the future of data analytics and all that kind of stuff, comes from interesting data sets about compelling things that reach more than one group of people, that don’t just prioritize the boys club that currently exists. And I think that that’s really what needs to happen next.
27:36 MH: Don’t you think it’s all gonna change after this season when LeBron declares that he’s gonna go play baseball, like Michael Jordan did?
27:42 WH: [27:42] ____ the truth.
27:42 MH: Sorry. Moe, you were about to make a much more not stupid point, I can tell.
27:48 MK: I wanted to talk a little bit about… You cover such a variety of topics. So, you talk about baseball to then The Bachelor, or whatever. With my own work, when I have to go from, say, the analytics data that I’m really familiar with, and then I have to go look at customer service data, which happens a lot in my work, I always have to go to the customer service analyst and be like, “Hey, can you just check I haven’t messed anything up here?” Because you have to put flags on, you have to remember to interpret it this way, there’s all these different caveats. How do you… Like whether it is baseball or The Bachelor, how do you get so familiar with the data that you don’t make this huge blundering mistake and publish it in some article?
28:29 MK: How do you learn enough about the dataset quickly enough? And I mean, is that a risk if you’re… If you’re talking about this example at university where you have these data sets, people getting up to speed quick enough to be able to interpret the data in the right way?
28:47 TW: Or even just, if I can extend it a little bit with Significant Digits and Numlock both, often you’re pulling a number from somebody else’s research, and how do you make sure you’re comfortable that whatever your interpretation, or twist, or context you’re providing is legit? I think that’s kinda two sides of the same question.
29:09 WH: Yeah, so first off, let’s talk about, when doing articles, which is I think, what your question was about to begin with. And the idea is that how do you get that facility? Well, at FiveThirtyEight, it was actually rather unlike many other newsrooms, in the sense that in other newsrooms, you’ll have interns who would collect the data, you’d have somebody who would analyze the data, who would hand that off to a writer, who would actually physically type out a story, and then there’s multiple people involved, and the writer hasn’t seen the data, and the data analyst hasn’t talked to the editor.
29:39 WH: And so at FiveThirtyEight, it’s vertically integrated, that whenever you had a story from me, it was usually me who collected the data. I analyzed the data. We have quantitative editors as well who we can come back and forth with, if we’re talking about methods and finding out what our blind spots are on certain things. And then it’s always a conversation with an editor, so it’s basically, it’s kind of like a chain of custody for evidence. That if you were involved in building the dataset, and analyzing it, and again, you were talking to the right people over the course of it, that’s the kind of secret crux-ness which you get at, which is like, “No, you always wanna talk to sources, you always wanna call somebody with familiarity over it.”
30:18 WH: I’ll give you a great example, is like, I liked writing about comic book numbers all the time, and there’s a really great resource, Comichron, which is operated by John Jackson Miller, who’s a really terrific writer. And essentially, it’s important to call him and be like, “Hey, can I draw this conclusion from this statistic?” Because then it’ll be like, “Oh, wait, but you also need to have the caveat that this doesn’t include digital sales.” And that’s the exact…
30:40 WH: And so, having regular conversations with the people who manage data and operate data, and compile that data, and oftentimes being that person yourself who compiles that data, that’s how you can avoid those blind spots. But those, you can’t always avoid them, those are like the corrections that you’ll see that pop up that are just like, “Oh, well this dataset turns out it didn’t have anything pre 1982.” And so you need to absolutely be open and honest about where you get the data from and make the data available to people so that they can get you on that later. But to go back to the second part of that question…
31:11 TW: And let me ask you, in that case, like if it’s comic book data, when you were writing a story on it, would you actually have that guy as a kind of a… Would you have him check your conclusions? Did he become kind of a pseudo editor? Or you just got to where you made sure you were comfortable?
31:29 WH: No, no. Sources don’t get to check the data. Sources don’t get to check, they don’t get approval on pieces. But it’s basically a conversation though, like I’ll give you another good example. Whenever we use box office numbers, we were pulling from this really wonderful dataset, it’s the data actually underlying the-numbers.com, it’s OpusData. And if we had a question of, “Hey, we wanna find out if we can draw this conclusion from this statistic,” we could email Bruce Nash, the guy who operates it, and ask, “Hey… ” And a lot of times these data curators are very, very keen on talking about the limitations and the abilities and all that kind of stuff with their data.
32:02 WH: So it’s very much just a conversation, they want the story to be just as right as you do, and so a lot of folks though have a fear of picking up that phone, and that’s one of the first things that got beat out of me when I came into journalism, which is there’s no fear of picking up that phone and making the call anymore. A lot of times, it’s just important to get that second opinion, and make sure that you’re doing everything with crossed I’s and dotted T’s. Which is not how this [32:25] ____ what are you gonna do? [chuckle]
32:29 TW: No, no. Well, yeah, but that’s actually that’s a great point. I mean, Moe, you said you would talk to the customer service analyst, but I definitely, I feel like I’ve seen that where, it is, I see a number, it seems obvious what that number means, it’s a whole system I’m not familiar with but I’m sure the number means X or Y, and saying, “I’m just gonna operate with that assumption.” As opposed to saying, “Hey, this is kind of fundamental to what I’m doing, I better pick up the phone or walk down the hall and actually ask the person, ‘Is this lead source thing in Salesforce.com? Is it really… Am I interpreting this correctly?'”
33:04 WH: Yeah. And you have to [33:06] ____ stories for that… I remember we had a really great piece about Pokémon Go when that was first going out, ’cause we had some really great data from, then operated by SurveyMonkey, but they had an app kind of dashboard of downloads and whatnot. And basically we had this fun chart that we were just like, “Oh, no, we need to really redo this chart because it turns out that weekend downloads are handled different than weekday… ” And a lot of times that’s just a… If I wasn’t on the line with SurveyMonkey half the time, then we would have had to issue a retraction, which is unfathomable. So the reality is, is that you have to always be in… Your sources should never be surprised to see the article drop, you always need to call them beforehand, and then subjects and all that kind of stuff, and if you’re doing that right, then you can avoid a lot of those blind spots.
33:45 WH: But to your second question about how do I guarantee the veracity of statistics that go into Numlock News, well the idea is that I have a shortlist of sources who are ride or die, like Bloomberg does fantastic with numbers, unimpeachable, they are one of my favorite sources to go to, they’re one of the news sites out there that has a real top-down mentality of, “Our clients want the numbers, our readers wanna see numbers, they want us to back up our work. We’re not just kind of off the cuff talking about finance, we have the data to back it up.” And so, Bloomberg has a green pass in my book, I believe a number when I see it printed on Bloomberg, just because of their reputation that they’ve built.
34:22 WH: Same thing with Quartz, same thing with the handful of other sources that are… And then another opportunity is you’ll have sources that aren’t so quite sure, maybe it’s a number that’s changing rapidly, maybe it’s a breaking news situation. And then, again, the key is just honesty, it’s be like, “The prosecutor alleges that this is what happened,” then you need to attribute them to who says it, you could say, “The estimate right now for… ” And “At press time,” and you really need to do very proper sourcing of where that figure comes from, so that it’s abundantly clear, “This is a number that you should be a little bit skeptical about.”
34:55 WH: It’s okay to hear numbers that you are skeptical about, because you just need to couch it in skepticism. So a lot of it comes down to how reliable has the source been in the past? How dynamic is the situation in general? And in general, it’s really just a gut check over, “Am I willing to stake a portion of my reputation as to the veracity of this claim?” You know?
35:18 TW: Which it sounds like a little bit, if you’re upfront about saying, “I’m a little skeptical of this number, but it’s the only number we have, maybe more numbers are coming in the future, which we may look back at this and say it was an early indicator of what we’ve now realized [35:31] ____ truth.” Or we may say, “Nope, this was an anomaly, or this was poorly collected or poorly interpreted.” As long as you sort of couch it, it seems like the variety of data… Again, back on the business analytics world, if you see something that’s like crazy surprising, then the first thing to do is to figure out what you screwed up.
35:52 TW: If you still can’t figure that out, then you’re probably still communicating and saying, “Look, this totally surprised me. I did my due diligence to try to validate it. It’s holding up. I still have some skepticism, I’m partly sharing this ’cause I want somebody else to maybe explain to me why this is happening.”
36:11 WH: And if it’s an estimate, say it’s an estimate, if it’s a measurement, say it’s a measurement. If it’s a survey, say it’s a survey, if it’s a full census, say it’s a census. If you are honest about the source of this information, if it’s a prediction, then definitely couch, “This is a prediction, forward-looking results may vary.” And so a lot of times people miss that, and they hear “percent” and they think that all percents are made equal, right?
36:33 MH: Yeah. I think another thing in the business side that sometimes gets lost is there’s not a value placed on context and tribal knowledge of this person who’s curating or understanding this data deeply. And so sometimes you just lose some of that context, and therefore can’t do great analysis sometimes, because well, the guy that used to do that, he left two years ago and nobody’s bothered to even think about this data since. So the sources that you’ve built over the years are people who have been able to help you because they’re passionate about and stay with the data, and understand it deeply. And I think there’s something to be said, if you switch to a business context, of valuing that a little more than probably we do.
37:21 MH: Alright, this conversation is so good, and I like candy, and also ideas of where to film your movies so you can win an Oscar, and all of the different things.
37:31 MH: Part of the reason I’m so happy that you are on the show, Walt, is because you delve into so many different sort of… What might on the surface appear to be topics that don’t have a lot of analysis behind them, or reasons to do analysis. And I think that makes looking at data, thinking about data, more approachable and more acceptable. So, thank you very much, it’s been a pleasure having you on the show. Before we wind up though, one thing we do on the show is go around the horn and do a last call. And so I’d love to hear… Walt, you’re our guest, so we’ll let you go first.
38:04 WH: Oh boy. [chuckle]
38:07 TW: Well, actually, why don’t you tell us exactly where to find Numlock News, and then you can do your last call.
38:10 WH: Yeah, actually…
38:11 TW: If you are not already a subscriber, ’cause it’s been out for a couple of months, you definitely wanna go to walthickey.com/numlock-newsletter, subscribe to Walt’s newsletter.
38:22 WH: It’s even easier. If you go to www.numlock.news, it’ll take you right to that page.
38:26 MH: There you go, that takes you right there, thank you. And, you can also find Walt on Twitter, @walthickey, so you should definitely follow him on Twitter, as well. Alright, last calls. Walt, what do you got?
38:40 WH: So I’ve been, again, starting up this newsletter, it has been quite a bit of fun. I have had the chance to talk to a lot of really interesting people over the course of doing this. And I think that my last call is essentially to sign up for more… A newsletter, just any newsletter. I think that the medium, I’ve been speaking to a lot of people about this, both on the sales side as well as the people who, there are other models for this kind of leap. And the idea is that I think that newsletters are actually a fundamentally interesting medium. I think that one of the primary trends in journalism recently has been publishers getting screwed over by the social media companies that claim to serve them. I think that if your audience can be taken away from you by one person in Google HQ or Facebook HQ, or a medium to twist some knob, then I think that that’s not your audience.
39:30 WH: And so I think that one thing that I’ve been really excited by, has been looking at newsletters as a medium for exchange and conversation, and essentially the decentralized nature of them, and the direct through line between you and the content creator. I think that they’re wonderful. And so I’ve had the chance to subscribe to a few, and I could open up my inbox and rattle off a few of them if you like, but that’ll take me a minute or two. But, I’ve come to really enjoy this as a passive medium, as a way of having direct line of communication, and as probably the best thing about email is not having to send an email. And so, I don’t know, that’s my pitch, just the concept of newsletters. [chuckle]
40:09 TW: How’s our Digital Analytics Power Hour Podcast newsletter work coming along?
40:14 MH: That’s a 2019 initiative, Tim, but thanks for…
40:17 MH: Maybe 2020.
40:22 MH: Moe, what’s your last call?
40:25 MK: I read an article, which was by Andy Raskin, I actually read it on LinkedIn, but I think it was circulating on Twitter too. And it was, Steve Jobs’ Secret for Eliciting Questions, Overheard at a San Francisco Cafe. But it talks about the concept of how good leaders get feedback from their staff. And there was something I just really liked, the whole approach to it about you don’t invite your people leaders, you invite the staff, and you basically… The tactic that apparently Steve was overheard saying, the veracity of that I can’t confirm, but the tactic was that you would go to individual people in the room and say, “Tell me what’s not working here? And then tell me what is working? And tell me what’s not working?” And so on and so forth, until eventually you can very quickly understand the key problems and the key successes of a business, without too much effort. So, yeah, I found that one really interesting.
41:19 MH: Nice. Alright, I’m gonna do mine before you, Tim, ’cause that way you can’t steal it from me.
41:26 TW: I seriously doubt I’m stealing one from you on this one. What do you got?
41:28 MH: I don’t know, what were you gonna do? No, I’m just kidding. Alright.
41:33 MH: Mine is… We’ve actually talked about him on the show before, Dr Augustine Fou, who does a lot of research in the area of ad fraud. And in May he published a state of ad fraud in Q2 of 2018. Really interesting read, definitely worth looking through and seeing how…
41:53 TW: It’s all fixed? It’s all fixed, we’re good to go? It’s nothing but humans clicking around the web now?
41:56 MH: Well, the good news is since [42:00] ____ it’s all gone away, we’re saved. Great job everybody, we can close up and go home.
42:06 MH: No, but I mean that so much money is being lost to ad fraud. And it’s really eye opening, so definitely worth a gander, take a look at and read through. You can try not to despair, but it will be alright.
42:20 TW: He was actually, I think might be the only remote speaker ever to present at Columbus Web Analytics Wednesday, a year or two ago. It is eye-opening. There is just no easy fix.
42:34 MH: Yeah.
42:35 TW: Yeah.
42:35 MH: Alright, Tim.
42:36 TW: So the cardinal sin is recommending a book that one hasn’t read, so this is gonna be more of like a, maybe a bishop sin, bishop comes before. That I’ve only read about a quarter of this book. But when I was supposed to have been online ordering a copy of the…
42:50 MH: You’re predicting that this will be [42:52] ____.
42:53 TW: I’m predicting?
42:54 MH: You’re just predicting…
42:55 TW: I’m thoroughly enjoying the first quarter of it. I’m just often on various other paths, diverting myself and trying to read multiple books simultaneously. So we’ve had, recently we had Chad Sanderson on, this on… Moe did a little mini episode with him. A self-avowed Frequentist. We have had Elea, Dr Elea Feit, who is this, a Bayesian. We have had Matt Gershoff who says, “Yeah, I don’t get wrapped up in that.” I believe actually in Noise and Signal was the first I ever heard Bayesian versus Frequentist. I do not fully understand it, but when I was supposed to be buying another book, I wound up instead getting the Kindle Edition of, The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy.
43:37 TW: It’s been out since 2012, by Sharon Bertsch McGrayne. I still am not super capable of really articulating Bayesian versus Frequentist. I’m slowly getting there. But it’s been a really fun read. It goes into stuff Alan Turing was doing, that through all the pop culture that Alan Turing has been in, there’s stuff in this book that I had no idea. So, it’s a fascinating account. I’m only up to the 1950s, but it goes all the way back to Thomas Bayes and plays it forward and how Bayesian thinking has morphed and been derided over time, and obviously Sharon McGrayne is a big fan of Bayesian thinking, but it’s an interesting read.
44:20 MK: Can I ask a follow-up question? Is it a book more in line with how Walt would’ve written it, or how a really boring historian would have written it?
44:31 TW: Much much more in the Walt. It’s definitely… It’s well written. It is very readable. It is not a dry book.
44:39 WH: Heck yeah.
44:39 TW: At the same time it’s got enough material in it that it’s not a massive page turner. So it’s in the direction of… Let’s say it’s directionally Walt, but it hasn’t achieved full Walt. How’s that?
44:52 MH: I think Walt, you need to write a book now [44:55] ____ proper analysis.
44:58 MK: We’ve set you up well.
45:00 MH: Yeah.
45:01 WH: Yeah, I gotta get on that one.
45:04 MH: Oh nice. Excellent.
45:05 WH: Oh, and the newsletter that I was thinking of that I’ve enjoyed quite a bit recently is the Ann Friedman weekly. That’s a very fun one.
45:10 MH: What’s the theme?
45:11 WH: It’s a, interesting stories that she’s seen from the week. It’s, again it’s very… It’s a quick read. It’s a Friday read. It’s a very good, going into your weekend if you were looking for stuff that you might have missed, or entertaining. I find it very enjoyable. But I am a recent convert to that. I believe Anna Barry-Jester from FiveThirtyEight recommended it to me, and I’ve been a big fan.
45:31 WH: Awesome.
45:32 MH: Well, if you have been listening and you are just brimming over with questions, or you’re like, “Why didn’t you let Moe talk more?” Or anything you wanna give feedback on the show, we’d love to hear from you.
45:46 MH: So, you can reach out to us in a number of ways. You can go through our Facebook page. You can also reach us on our website and also on the Measure Slack. Also, I mentioned before, definitely check out Walt’s numlock.news newsletter, and also on Twitter @walthickey on Twitter. We’d love to hear from you. Remember, if you’re a fan of the show and you like what you’ve heard, we wouldn’t mind it if you jumped over on to iTunes and subscribed and reviewed the show and rated it. For some reason that means something. Actually, I have been dying to ask Walt. Do you know what it means? Why do we ask people to do this?
46:27 MH: ‘Cause you’re familiar.
46:28 WH: Yeah. That gets you on the front page. Again, remember how we were talking about those big tech conglomerates that have the ability to pick winners and losers in the content space? Well, Apple, they elevate people in the iTunes store based on positive reviews, and you have a potential to get featured in your category if you get enough of those.
46:48 MH: Right, and if we had sponsors that would really mean something.
47:00 MH: Alright, so Walt, it has been a pleasure. Thank you so much for being on the show.
47:02 WH: Thank you for having me. This was a blast.
47:05 MH: I know I speak for my two co-hosts, Moe and Tim, saying to all of you out there, keep analyzing.
47:16 VO: Thanks for listening and don’t forget to join the conversation on Facebook, Twitter, or Measure Slack group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour or @analyticshour on Twitter.
47:35 S?: So smart guys want to fit in, so they made up a term called analytics. Analytics don’t work.
47:44 S?: How did you get started with this podcast?
47:46 MH: Well, we argue a lot. We do drink and swear on this show. We love our explicit rating on iTunes. It sets us apart from the other analytics podcast, so please don’t feel any hesitancy at all. And also this would go really smoothly, because I’ve already been drinking for awhile.
48:07 MH: I didn’t want you to get the show started and Walt you’d be like “What do these guys do?”
48:13 TW: Oh, are you guaranteeing that he’s not gonna do that?
48:17 MH: Not for that reason, how about?
48:19 TW: Okay.
48:21 MK: Oh Michael, you can’t say that.
48:23 MH: Why can I not say… I can’t say that?
48:25 MK: No.
48:26 MH: I have to bleep that out?
48:27 TW: Wow. Wow. Really?
48:29 MK: Yes. I’ve never been to the Grand Canyon and now I’m like it’s such a pain in the ass.
48:38 TW: We’re not allowed to talk about it too.
48:42 MK: You never know. What if I wanna change careers again?
48:44 MH: Oh no, you’re in this.
48:46 TW: Well, I’m afraid your participation on this podcast is pretty much gonna torpedo any chances for that.
48:54 MH: Rock, flag, and data journalism.