How accurate is your data? How accurate is any of our data? If our data is more accurate, will we make better decisions? How MUCH better? Why do the show blurbs of late have so many questions? THAT is a question we can ACCURATELY answer: because the shows grapple with challenging questions! On this episode, Snowplow co-founder Yali Sassoon joined us to chat about the nuts and bolts of data accuracy: the inherent messiness of client-side tracking (but, also, the limitations of server-side tracking), strategies for incrementally improving data accuracy (and the costs therein), and the different scenarios where different aspects of data accuracy matter in different ways! Pour yourself a drink (a 2 oz. shot of a fine Scotch will do… which would be 59.1471 ml if you want an accurate and precise metric pour), settle in, and give it a listen!
Links Mentioned on the Episode
- Yali Sassoon
- (Article) How server-side tracking fills holes in your data and improves your analytics
- (Article) The Schema Conspiracy
- Quantum Computing for the Very Curious
- (Video) TED Talk: Sleep is Your Superpower (Matt Walker)
- (Article) I watched over 100 TED Talks, and these insights have stuck with me the most
- (Article) Why You Want a Mechanical Keyboard
- Moe’s take on mechanical keyboards
- Measure Slack
00:04 Announcer: Welcome to the Digital Analytics Power Hour. Tim, Michael, Moe and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website analyticshour.io. And now The Digital Analytics Power Hour.
00:27 Michael Helbling: Hi everyone, welcome to the Digital Analytics Power Hour, this is episode 129. Imagine you’re joining a new company and you’re tasked with taking their digital analytics program to the heights that the chief marketing officer desires. Namely predictive analytics and something, something, AI, ML, what have you. What to do? You immediately dig in and realize that your predecessors have left you with something of a mess when it comes to the current behavioral data collection. It is both inaccurate and broken in many places. And there are obvious and important aspects of functionality that simply have no data collection at all. It is almost as if most of us spend 70% of our jobs just patching up holes in our data, or attempting to produce insights and recommendations while ingeniously navigating the potholes and dead zones that are events, eVars and etcetera. Tim, as the quintessential analyst, I’m sure you have found a total workaround and are not bothered by these issues.
01:31 Tim Wilson: Let’s put those issues aside and let’s talk about the accuracy of your 70% statistic of how many… How we spend our time. I question the accuracy of that.
01:40 MH: It’s been proven by data science.
01:43 TW: Okay, well. [chuckle]
01:43 MH: And data scientists have spent 85% of their time engineering that data to give that prediction. So there you go.
01:51 TW: Okay, then we’re covered.
01:52 MH: Yeah. We’re totally.
01:53 TW: Perfect. I do the same thing with actual behavioral data.
01:55 MH: There you go. Alright. And Moe, does this ring true for you?
02:01 Moe Kiss: Yeah. It sounds like the bane of my existence other than mechanical keyboards which are much worse than data accuracy.
02:10 MH: Yeah. That sounds like an issue that will be a top concern for the podcast going forward. Alright, and I’m Michael Helbling, and finally a show topic I have more than a passing knowledge about. But we wanted a deeper discussion, so we turned to our guest. Yali Sassoon is the co-founder and chief product officer at Snowplow Analytics. Prior to that, he held strategy, marketing and analytics roles at companies like PwC, Kepler LLP and OpenX. Today he is our guest. Welcome to the show, Yali.
02:44 Yali Sassoon: Thank you, Michael.
02:46 MH: Alright, so let’s kick it off. And just to help level-set this conversation, and answer the sort of, “Well, why the heck is this guy on the show?” question, talk to us a little bit about Snowplow Analytics: what it’s for, what it does.
02:58 YS: So Snowplow is a digital data collection platform. Companies run our technology to collect a similar sort of data to what they might collect with Adobe or Google. So web data, mobile data, but also potentially customer data from other channels like support desk or email or online chatbots and those sorts of things. But unlike the Googles and Adobes of the world, we focus exclusively on the data collection piece. We see our job as delivering the data into the places where analysts and data scientists and other people can take it and use it in the different ways that they want. So into the data warehouse, or potentially into a real-time stream or a unified log so you can have applications that are consuming the data. But, just to double down, we’re focused very much on the data collection side of things, and so data accuracy and data completeness are preoccupations of ours. I spend a lot of my time thinking about both those things.
04:00 MH: Nice.
04:01 MK: So you mentioned that data accuracy is a topic that you’re a little bit obsessed with. As am I. Well, I pretty much think that any analyst or person working in analytics spends a lot of time thinking about this topic. What does data accuracy mean to you?
04:18 YS: Oh, that’s a good question.
04:21 MK: I like to start big, you know.
04:21 YS: Look, data accuracy is the extent to which the data reflects the reality that it is supposed to describe. When we’re collecting data, we’re fundamentally building a record of things that have happened. And typically, we wanna analyze that data to figure out why what happened happened, so that we can use that knowledge to change the future. So for me, data accuracy is about the gap between what the data tells us happened and what really happened. And I guess the fundamental challenge with data accuracy is that, in a typical organization, the data is the source of truth. So figuring out the gap between what the data says happened and what really happened is a fundamentally hard problem to solve. It’s like asking how you measure how accurate a ruler is. That’s the fundamental challenge.
05:20 TW: But is there… I feel like there’s a lot of confusion. Accuracy seems like one of those things that’s such a simple idea, and you just defined it as how well the data actually reflects reality. We’ve recently talked a lot about ITP, and we know about ad blockers, so there are a lot of things we can point to where, in absolute terms, the data is not perfectly reflecting reality. But it seems like there’s another layer: how accurate does it need to be? Generally, I think, if you look at the curve of the cost of getting more accurate data, getting from 90% to 95% accurate is a lot cheaper than getting from 95% to 96% accurate. So there’s the source-of-truth question: are we accurately reflecting reality? But then there’s also the question: is the data good enough for us to make the right decision enough of the time, at a low enough cost that it’s worthwhile? And that starts to become a pretty squishy kind of topic, right?
06:00 YS: I think that’s a really nice way to look at it, and you’re absolutely right. There’s a sense in which you wanna talk about accuracy in absolute terms. You wanna be like, “This is reality and this is what the data says, and this is where the data doesn’t say anything, or this is the bit the data misses, and this is the bit where the data says something happened but really something else happened, and there’s that gap.” And for a lot of what we do, we should be really comfortable with that gap. The whole discipline of statistics was born to let scientists make validated statements about big groups of things, about populations, based on measurements of just a sample of those populations. And in digital analytics, there are a whole bunch of analyses where we don’t need anywhere close to 100% accuracy. If you’re interested in overall traffic patterns and the behaviors of big groups of people, you really don’t need a very, very accurate data set at all. Or you can have a very, very accurate data set about a sample of those users and then make pretty reliable inferences across the whole group.
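The sample-to-population idea Yali describes can be sketched in a few lines. This is a minimal illustration, not anything from Snowplow: it assumes a simple random sample of users whose tracking is accurate, and it uses the normal-approximation (Wald) interval for a proportion. The function name `proportion_ci` and the counts are hypothetical.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a population proportion from a random sample.

    Returns (estimate, lower, upper) using the normal-approximation (Wald)
    interval; z = 1.96 gives roughly 95% coverage.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] since a proportion cannot leave that range.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical: 10,000 well-tracked users in the sample, 420 reached checkout.
p, lo, hi = proportion_ci(420, 10_000)
print(f"{p:.3f} (95% CI {lo:.3f} to {hi:.3f})")  # 0.042 (95% CI 0.038 to 0.046)
```

The interval only accounts for sampling error. If the users missing from the sample differ systematically from the rest (for example, ad-block users behave differently), the estimate is biased however narrow the interval looks.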
07:38 YS: I think the challenge is that in the last few years in digital, people are doing more and more with the data. So we’ve moved well beyond a world where digital analysts were just talking about overall traffic patterns and which marketing channel is driving up visits to the website and how is our conversion rate doing, and we’re starting to segment users into smaller and smaller groups, and as we’re doing that, the need for accuracy starts to climb. And in the limit, if you’re doing stuff with personalization and you’re really trying to do things like serve a particular user something relevant based on what you know about them, if half of what you should know about them is missing, you’re not gonna serve that person a relevant recommendation. So, you’re totally right. We’ve got to tie the question of accuracy to what we’re doing with the data, and we’ve got to tie the cost to the benefits of answering those questions accurately. And it might be that we can’t afford the level of accuracy or we can’t justify the return on a really, really high level of accuracy.
08:38 MK: I do want to touch on that use case you just talked about, digging into smaller and smaller segments of our user base, because I think that’s really incredible. But before I do that, I just wanna rewind a little bit. One of the things that anyone client-side really struggles with, and it seems to happen more when you have more than one tool, is that the data stops being the source of truth; it becomes one tool versus the other. What I’m struggling with is: does the business get that it doesn’t have to be 100%, that 90% accurate might be good enough? The most common question I get is someone looking at something, they’ll screenshot some numbers or send me a report and be like, “Can I trust this data?” That is legitimately the most common thing that gets asked.
09:31 MH: You should just have an auto responder that just says, “Absolutely. Go for it.”
09:35 TW: Well, but there’s the history, right? One of the promises of digital that was an over-promise was, “Hey, everybody’s moving to digital. Now we can track everything.” And it was a fundamental misstep 10-15 years ago that we didn’t nip that in the bud and call time out on it. And I think the other thing with “Can I trust this number?” is that when multiple systems exist, a lot of times they’re stuck together. Your digital analytics platform tracks through the form completion, and then you’re into your CRM or into your fulfillment system, so you have this kind of Venn diagram of overlaps where two systems are measuring something pretty similar. And if somebody says, “Well, yeah, but which one’s right?” then you have to have the discussion of saying, “They’re both right. They’re not measuring exactly the same thing.” And boy, people don’t wanna hear that as an answer. They want to hear which one’s right. But that question was probably more directed to Yali and I couldn’t stand myself.
10:47 YS: I agree with everything you said, Tim. I’d add to that that people want to pick the system. They wanna say, “One of them is right and one of them is wrong.” People are really uncomfortable with two systems saying different things and both of them being right in their own way. And related to that, they’re really uncomfortable with the idea of precision: this being right to this extent, or this being right, or right enough, for this set of answers. And there’s something wrong with data culture. I think we in digital analytics did let ourselves get carried away by our own hype 10, 15 years ago, but it’s broader than that; it’s data more generally. Before I was in digital data, I was working in financial data at PwC, and there was very much this idea of data being right and data being wrong. If you presented a slide with some numbers on it, six analysts in the room would immediately check them against other numbers that meant the same thing and scream at you if there were any inconsistencies. So there’s something broader about data culture that is very binary: “I can trust these numbers or I can’t trust these numbers.” And I think that desperately needs to change. I don’t know how we’ll change it.
12:01 TW: I mean, counting the votes for Best Picture at the Oscars, that was a PwC thing, right? Wasn’t it? Counting Academy Award votes? Sorry.
12:11 YS: I wasn’t on the project.
12:14 MK: I tend to find with my stakeholders, my advice is often around this: if you’re looking at something, it’s generally week on week, month on month, year on year. If you’re comparing this month to last month in the same tool, then that is 100% the best way to look at it. If you are trying to, I don’t know, compare a CRM tool to this other tool, and then you’re comparing different time periods, I’m like, “That’s where you’re gonna get into a hot mess.” So I don’t mind which platform you use, just make sure that when you’re comparing stuff, you’re using the same platform. And that seems to be partially helpful.
12:54 TW: But aren’t you hitting on something there? They say, “Is this number… Can I trust this number?” And what you’re saying is, “Well, how are you going to use it?” And I think that’s actually pretty key. It’ll get asked, and they’re not asking out of malice, like they’re trying to hide it from you; they think they’re doing this thing, they’re not sure about it, so they ask you, “Can I trust this number?” And you have to say, “Well, give me a little more context. Broaden out what you’re using it for. If you’re comparing your website traffic to your ad serving platform’s clicks, well, we need to have that discussion. You can trust the number, but this is likely what you’re gonna see, and here’s why, and I’m sorry, you have to have a little more nuanced understanding of the data.” So I think you just hit on something kinda key there.
13:40 MK: Yeah, and I think ultimately, from a stakeholder perspective, everyone is trying to avoid that situation Yali just mentioned: standing up in front of a bunch of people with a number on a slide and someone else saying, “Actually, that’s not accurate. I’ve seen a different number somewhere else.” That’s where that question is coming from. So I really do sympathize with the stakeholder in that circumstance. Someone said this to me the other day, “Is this right?” And I’m like, I don’t think about something as being right or wrong; we just have to accept that it’s the best of what we know, or the best of what we have. I don’t know if there’s a better way to frame that. I mean, we found this the other day. We have an event that’s firing, and it’s probably the most important event to us as a business, and it’s not getting captured a bunch of times for various reasons, which we’re fixing at the moment. And everyone’s like, “So, sorry, for the last five years, all our numbers are wrong?” I’m like, “No, for the last five years, they were as good as they could be. Now we’ve seen that there’s a better way to do it, and we’re gonna implement that.” But I still don’t know how to get everyone comfortable with the fact that data isn’t perfect. It’s never going to be perfect.
15:00 YS: I think part of the solution is at least educating people that data quality is not something that you get for free, and it’s not something that you fix once and then have. It’s an ongoing investment. As an organization, if you are investing in fancy analytics tools, if you’re investing in data warehouses, if you’re investing in real-time anything, you also need to be investing in data quality. You need to be putting things in place to make sure your tracking is set up right, that your event collection is set up right. When we talk about data quality, we like to talk about things like ITP and stuff going on in the browser that put fundamental limitations on accuracy, but actually, most of the biggest issues that I see coming up in companies are related to humans making mistakes, which is what humans do. We’ve gotta move to a world where that is accepted as the norm, and we’re just putting processes and systems in place so that we catch those mistakes quickly and address them as fast as possible, so our data can be as good as possible. And I really like that expression: the numbers are as good as they can be. If we carry on investing, we can either keep them at that level or maybe improve the level, but it can’t be that binary of “the numbers are right” or “the numbers are wrong.”
16:18 MK: I don’t think analysts even realize the power that they have in data culture. I’ve talked to my own team about this, where they’ll be like, “Oh, this platform sucks,” or “This A/B testing tool has a really shit methodology.” And I’m like, “You can’t go around saying that in front of all the stakeholders, because that’s what they’ve got.” That is what they’ve got. So unless you’re prepared to rip it out and give them something better, you need to be the person helping educate them about how to best use it. For the stakeholder, you can’t be like, “What you’ve got sucks, but I’m not gonna give you anything else.” That’s a really shitty situation to be in. But I don’t think analysts realize the weight that what they’re saying carries in the business, where people can very quickly lose all trust in a platform because of a few flippant comments in an open-plan office, you know.
17:15 YS: And once that trust is gone, it’s gone.
17:18 MH: Yeah, it’s not coming back.
17:19 YS: You’re never gonna rebuild that, yeah.
17:21 MH: So, it’s interesting, ’cause it’s sort of a weird dichotomy, right? You have to accept some amount of, I don’t know, entropy or something in your data. At the same time, you have to have an alertness and a set of processes and governance in place to ensure appropriate and correct data collection. And so this concept of an always-on process for data that you mentioned, Yali, I completely agree with. We were very disserved as analysts and as an industry by the concept of an IT project-type implementation, where you go do it and then you put it on the shelf and never come back and look at it again. It’s much different than that; it is an always-on process now. It has to be. I’ve made a great living going back and forth to companies where they do the project, they walk away, and then four years later, they’re like, “Okay, we’ll just do it all again.” And it’s sort of like, “What have you been doing in the meantime?” “Well, mostly just not trusting any of it.” It’s like, “Oh.”
18:27 TW: Cool.
18:28 MH: Yeah, there’s gotta be a better way. I don’t know, that’s not really a question, but I think that point is one that I just wish more people could grab on to. And Moe, your point about analysts needing to take ownership of that and not throw the baby out with the bathwater: a lot of folks don’t realize the damage they do to their own credibility by saying that stuff out loud, and that’s a problem, too, because it’s not good sometimes.
18:49 TW: I feel like there’s a flip side. I think analysts are equally likely to, even subconsciously, not want to acknowledge the limitations of the data they have, for that very reason. They want to trust and believe that the data is fine. ITP is a good example, because on the one hand, you’ve got analysts who don’t really wanna understand what’s going on because it’s messy and complicated and it’s not easy to solve. And they don’t wanna stand up and say, “Oh, by the way, the world is different now than it was a year ago, and to maintain a similar level of accuracy, we’re going to need to invest.” And that’s kind of unpleasant. But at the same time, there is also the tendency to say the sky is falling without even going and looking at the volume of Safari traffic to your site. So I do feel like there is a lack of nuanced thinking. And it seems like even on accuracy, there’s the monitoring and testing side, but there’s also kind of the logging side. I’ve never seen anyone say, “Let’s just write down what we know about our limitations.”
20:07 TW: Every time I go to a site and I’m trying to look at what’s firing for Google Analytics and nothing’s firing, I then realize that I’m running Ghostery and the default setting is to block Google Analytics. And I’m like, “Son of a bitch! Okay.” So let me turn that off and fire it again. We should have a list. We don’t necessarily need to go out and blast it to our stakeholders: “These are all the things that are fucking with our data.” But have we as analysts even sat down and said, for our site, “These are the things, this is what it would take to mitigate them, this is roughly their scale, here’s the stuff we can control and the stuff we can’t control”? Stuff we can control, we could have testing for; stuff we can’t control is ad blockers, potentially.
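The internal list Tim describes could be kept as a tiny structured log. This is a minimal sketch under our own assumptions: the `KnownLimitation` fields mirror the questions Tim lists (what it is, can we control it, rough scale, what it would take to mitigate), while the class name, entries, and percentages are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class KnownLimitation:
    """One entry in an internal (analytics-team-only) log of known data gaps."""
    name: str
    controllable: bool          # can we fix it, or at least test for it, ourselves?
    est_share_affected: float   # rough fraction of traffic affected, 0.0-1.0
    mitigation: str             # what it would take to reduce the impact


def summarize(log: list[KnownLimitation]) -> dict[str, list[str]]:
    """Group entries by controllability, largest estimated impact first."""
    summary: dict[str, list[str]] = {"controllable": [], "uncontrollable": []}
    for item in sorted(log, key=lambda x: x.est_share_affected, reverse=True):
        bucket = "controllable" if item.controllable else "uncontrollable"
        summary[bucket].append(f"{item.name} (~{item.est_share_affected:.0%})")
    return summary


# Placeholder entries; the shares are made up for illustration.
log = [
    KnownLimitation("Ad blockers / privacy extensions", False, 0.08,
                    "server-side tracking for key events"),
    KnownLimitation("Safari ITP cookie limits", False, 0.15,
                    "first-party, server-set cookies"),
    KnownLimitation("Tag missing on new checkout template", True, 0.02,
                    "automated tracking tests on release"),
]
print(summarize(log))
```

Keeping the estimates next to each entry makes it easy to revisit them as real measurements (for example, a server-side versus client-side comparison) replace the guesses.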
20:46 MH: Yeah, and here’s our estimate of the impact of that, right? Or something around that.
20:50 TW: Just internally within the analytics team.
20:52 MH: Yeah.
20:53 YS: I think you can get pretty precise about the impact of some of these things. A really nice approach, and I’m always surprised that more companies don’t adopt it, is just to instrument server-side tracking alongside client-side tracking. Client-side tracking is the Wild West. You’ve got no control over the browser, there could be any number of plugins running, they could be playing all kinds of havoc with the data, and the person in the browser might not even be a person; it might be a bot. So it’s a total free-for-all. But server-side tracking really is accurate. If you’re tracking things like transactions, or anything on the website or the mobile app that hits the server, you can track that with close to 100% accuracy.
21:35 TW: But you struggle to filter the bots. That’s gonna be… But that’ll be an upper bound, right?
21:40 YS: That doesn’t solve the bots thing, that’s true.
21:42 MH: Yeah. Bots create an artifact on servers as well. Yeah.
21:45 YS: But you can then compare the numbers from the two and see, “Oh, we’ve got this percentage of users that we see server-side, and we see nothing from them client-side.” That’s kind of a minimum bound on the number of people on our site who are blocking our tracking. So we’re in a pretty good place, because then we’ve got a sample, to your statistics point earlier, and we know roughly what percentage of our population it represents. It might be a slightly smaller percentage if there are a load of users on the website who are just doing things that wouldn’t hit a server. But we then have a pretty good idea of what our sample size is relative to the population size. And that’s a great starting point for most analyses that we need to perform.
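The comparison Yali describes is essentially a set difference on a shared identifier. A minimal sketch, assuming both streams log the same user ID (establishing that join key is usually the hard part in practice) and that bots have already been filtered from the server-side set, per Tim’s caveat. The function name and IDs are hypothetical.

```python
def blocked_lower_bound(server_ids: set[str], client_ids: set[str]) -> tuple[int, float]:
    """Count users seen server-side but never client-side, and their share.

    Assumes both streams carry the same user identifier (for example, a
    first-party cookie ID that the server also logs) and that bots were
    filtered from server_ids beforehand. Users who block tracking but never
    hit the server are invisible here, so the count is only a minimum.
    """
    if not server_ids:
        return 0, 0.0
    server_only = server_ids - client_ids
    return len(server_only), len(server_only) / len(server_ids)


# Hypothetical IDs: u4 hit the server but sent nothing client-side.
server = {"u1", "u2", "u3", "u4"}
client = {"u1", "u2", "u3", "u5"}  # u5 never hit the server; ignored here
count, share = blocked_lower_bound(server, client)
print(count, share)  # 1 0.25
```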
22:28 MH: Well, and since you brought up the word transactions, I think that’s a great one to work around a little, because it is a very good example of where you need higher or lower levels of context depending on what you’re trying to do. If you’re just trying to see, well, what are my biggest closing marketing channels, or those kinds of things in aggregate, the reporting is just fine. But as you step into the minutiae, or try to work out your promotion strategy for particular segments of customers, you have to go outside of the digital data collection anyway, because you have to understand things about the customer and their activities. I don’t wanna give massive promotions to someone who’s also returning products at a crazy rate. So how do I get that? You’ve gotta go in and integrate those things, and digital analytics tools are not great at doing that in their reporting interface. Which is why what almost every, I’ll say, sophisticated implementation does today is extract that data into some other data store to integrate it with other systems and then resurface it in other things.
23:34 MH: And you actually see Google, Adobe and others all responding to that by building data analysis tools that incorporate other data sets and sources now. It used to be that you had to get your data out. Google BigQuery, or the data feed, or Snowplow is kinda nice because you’re not beholden to any of those platforms; you can start with just the data and then move wherever. That integration necessity only becomes more and more obvious, but you don’t have to run all of your reporting that way. You just have to understand the levels of granularity that you have to get to. And I think people don’t really understand how to think through that model and then make the right choices, because there’s a lot of cost to getting your data clean enough to go to that level of granularity, like, “Well, what’s my customer set that returns products too much?” versus, “How does my overall marketing channel performance look?”
24:33 YS: And I think at that moment when you get your data together, either by getting your digital data out of the analytics platform so you can integrate it with other data sets, or by bringing the other data sets in (I’m much more familiar with the former approach than the latter), that’s where a lot of these consistency issues raise their heads. Because suddenly you’ve got the data sets alongside each other and you’re actually joining them to perform your analysis. That’s also where we have the opportunity to start triangulating and start sizing up these quality issues. If you’ve got your transactions recorded client-side and server-side, and you’ve got some financial system that’s a real source of truth, now you can make that measurement. Hopefully your financial system agrees with the server-side tracking, and if there’s some discrepancy with your client-side tracking, you can measure that discrepancy quite precisely, and you’ve got some notion of accuracy. I’ve even seen examples where companies have Snowplow set up alongside Google or Adobe and they’re reconciling those numbers against each other, which is a very time-intensive and not very pleasant process.
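Here is a minimal sketch of the three-way check Yali describes, assuming each system can export transactions keyed by a shared transaction ID with an amount. The function name, tolerance, and data are hypothetical; the same pattern works cookie-by-cookie if you swap transaction IDs for cookie IDs.

```python
def reconcile(truth: dict[str, float], tracked: dict[str, float],
              tol: float = 0.01) -> dict[str, object]:
    """Compare one tracking source to a system of record, transaction by transaction.

    truth and tracked map transaction ID -> amount. tol is the largest
    amount difference still treated as a match (e.g. rounding).
    """
    missing = sorted(truth.keys() - tracked.keys())      # recorded, never tracked
    unexpected = sorted(tracked.keys() - truth.keys())   # tracked, not in the record
    mismatched = sorted(tid for tid in truth.keys() & tracked.keys()
                        if abs(truth[tid] - tracked[tid]) > tol)
    coverage = 1 - len(missing) / len(truth) if truth else 1.0
    return {"coverage": coverage, "missing": missing,
            "unexpected": unexpected, "mismatched": mismatched}


# Hypothetical data: the finance system is treated as the source of truth.
finance = {"T1": 50.00, "T2": 20.00, "T3": 75.50, "T4": 10.00}
server_side = dict(finance)
client_side = {"T1": 50.00, "T3": 70.50}

for name, source in [("server-side", server_side), ("client-side", client_side)]:
    print(name, reconcile(finance, source))
```

Reporting the missing, unexpected, and mismatched IDs (not just a coverage percentage) is what makes the exercise useful: each list points at a different class of problem to investigate.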
25:39 MH: Yeah.
25:40 YS: But you can literally do it cookie-by-cookie, like what are the cookie IDs that we see and…
25:45 MK: Very time-intensive.
25:47 TW: Well, and I think the one time that I really went deep on that, with a couple of really sharp guys, was over a decade ago. It was server logs versus WebTrends page tag logs, and it was super, super informative. I recommend everybody go through one of those exercises somewhere in their life if you’re an analyst because… Yeah, yeah, just once. And then you’ll say, “I’ll never do that again.” But you start to say, “Okay, this is a big class of differences. Understand it or put it aside.” And then ultimately you wind up with, “We have no idea what’s going on with these, but you know what, we found all these other ones that we now understand. This is complicated and nuanced.”
26:28 TW: But if you go through that sort of exercise, and that data is coming in, as you guys were talking about bringing the data in, it tends to be back at that user transaction level. Does that start to give you a sense of scale for where the misses are? Yali, you were talking earlier about putting aside the reporting and the counting and the aggregates and thinking about the personalization, the data science, the machine learning: for which of these users are we gonna have incomplete or inaccurate data, such that we’re likely to not make the next best offer? It seems like you could use that reconciliation to say, “You know what, for 10% of our users, we can’t get all 15 data fields. We’re only gonna get four, which means we’re higher up in the classification tree. We’re still gonna take our best shot,” because it seems like it’s more likely to be incomplete data than inaccurate data. And you’re like, “What’s the scale of the ones where we’re not able to do our ideal personalization?” So those seem like they could be related.
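Tim’s “for 10% of our users, we can’t get all 15 data fields” is straightforward to measure once the data is joined at the user level. A minimal sketch, assuming one dict per user where `None` or an absent key means “not collected”; the field names, records, and the `completeness` function are hypothetical.

```python
from collections import Counter

# Hypothetical field list; a real model might expect 15 or more.
EXPECTED_FIELDS = ["country", "device", "lifetime_orders", "favorite_category"]


def completeness(profiles: list[dict], fields: list[str]) -> tuple[float, Counter]:
    """Return the share of users with every expected field populated, plus a
    histogram mapping 'number of populated fields' -> 'number of users'."""
    histogram = Counter(
        sum(1 for f in fields if p.get(f) is not None) for p in profiles
    )
    share_complete = histogram[len(fields)] / len(profiles) if profiles else 0.0
    return share_complete, histogram


profiles = [
    {"country": "AU", "device": "ios", "lifetime_orders": 3, "favorite_category": "shoes"},
    {"country": "US", "device": "web", "lifetime_orders": None, "favorite_category": None},
    {"country": "UK", "device": "android"},
    {"country": "US", "device": "web", "lifetime_orders": 7, "favorite_category": "books"},
]
share, hist = completeness(profiles, EXPECTED_FIELDS)
print(share, sorted(hist.items()))  # 0.5 [(2, 2), (4, 2)]
```

The histogram gives the “scale of the ones” Tim asks about: users in the lower buckets are the ones a model would have to handle with a coarser fallback.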
27:33 YS: I think so. I think there are a couple of really interesting points to bring out there. On the personalization point, I think companies that do want to go down the personalization route would be better off spending more time investing in data collection and less time investing in sophisticated algorithms. People love to buy and experiment with expensive algorithms, but you’re likely to have a bigger impact on your recommendation engine by doing boring things: making sure you’re pulling together all your data about this user, running that through the algorithm and tweaking the algorithm, and then making sure the place where you serve the recommendation makes sense from a user experience perspective, and just A/B testing that, rather than spending loads of time faffing about with different algorithms. And I think the same is true on the attribution side. If you’ve got money to invest, you’ll see a bigger return from getting your data collection in order, and putting in place that ongoing investment in data collection, than you would from investing in sexy tech that does sexy stuff with the data that the accuracy of the data doesn’t really support currently.
28:45 MH: Alright, the show is going great so far but we need to talk about one of our sponsors here in the multi-touch moment. Hey Josh, you know how there’s certain business questions that seem to crop up over and over again and on and on it goes? It’s just a well-beaten path, if you know what I’m saying. And that question is, what are the top paths on our website? You know how your colleagues are always asking that?
29:08 Josh: Yeah, I usually just fake an illness when I get that one.
29:12 MH: Yeah. Well, good news. It’s now easier than ever to find the answer to this timeless time suck of a question with the new top paths add-on for GA. It’s gonna parse through your user data and surface the top paths browsing your website. Don’t worry about Markov chains anymore: Mark of Buttzville, New Jersey is our top osteopath on the site this week. Or forget about path tracing: Tracy of Greasy Corner, Arizona was the top naturopath on the site yesterday.
29:44 Josh: Man, forget the CCPA. Stephanie Gray is the top sociopath on our site this week.
29:51 MH: Sociopath, exactly. I mean, it’s just time to just sort of know. If you need to find an allopath or a telepath or a homeopath.
30:03 Josh: Even a pathological liar.
30:05 MH: Exactly, it’s never been easier to befuddle your stakeholders with useless path information. So take the happy path and install our new add-on today.
30:18 MK: Okay, I just wanna pull this back a minute and make it really practical for, say, an analyst sitting in a company doing exactly this right now. So, step one, you have your analytics tool, you start doing comparisons to your backend events or your own data warehouse or your own… Whatever your internal source of truth is so that you can understand the discrepancies. Then you go through one-by-one, you start fixing all of those things. That’s great, you start building confidence in whatever the platform is. The bit that I struggle with is the incomplete data because it’s really easy to compare different numbers and be like, “Okay, why are these things happening?” It’s much more difficult to find out what you don’t have. So, do you wanna talk me a little bit through, just hypothetically, how you might tackle that?
31:11 YS: Oh, that’s a really… That’s a really, really good question. I think the best you can hope for is to identify a set of known unknowns. So, a set of cases where you know you’re missing data, but you’re always going to have to accept that there are other unknowns that you might not know about. So, start with where you’re investing in data quality: you’ve got some number or some report, and you wanna build confidence, you wanna build assurance that the number you’re presenting is right and that you’re not missing anything in it. That number is a metric, it’s computed in some way, and it’s computed on some underlying event data that’s collected in some system or another. And so, building confidence that the final number you’re reporting is accurate and complete, I think, means going down into each step in generating that number and asking how much confidence you have in that part of the process.
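The comparison step Moe describes, checking the analytics tool against an internal source of truth, might look something like the minimal sketch below. It assumes you can export a list of transaction IDs from both the analytics tool and the backend; the function name and output fields are hypothetical. Note that it only surfaces discrepancies between the two sources, which are the "known unknowns" Yali mentions, not data that neither system captured.

```python
from collections import Counter


def reconcile(analytics_ids, backend_ids):
    """Compare transaction IDs seen by the analytics tool with the backend's IDs."""
    counts = Counter(analytics_ids)
    analytics = set(counts)
    backend = set(backend_ids)
    return {
        # Real transactions the analytics tool never recorded
        "missing": sorted(backend - analytics),
        # Recorded by the tool but unknown to the backend (test orders, bots, bad IDs)
        "unexpected": sorted(analytics - backend),
        # Recorded more than once by the tool (e.g. thank-you page reloads)
        "duplicated": sorted(tid for tid, n in counts.items() if n > 1),
        # Share of backend transactions that the tool captured at least once
        "match_rate": len(analytics & backend) / len(backend) if backend else 1.0,
    }
```

For example, `reconcile(["T1", "T2", "T2", "T4"], ["T1", "T2", "T3"])` reports `T3` as missing, `T4` as unexpected, `T2` as duplicated, and a match rate of two thirds. Each bucket then becomes a list of specific cases to investigate one by one.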
33:34 YS: So engineers have this idea that whenever you release a new version of an application, there are key user workflows that you wanna make sure aren’t accidentally broken, because you wanna be constantly improving your application. And so, if you’re an online retailer, there are test suites that will simulate a shopper landing on the website, adding items to a basket and then purchasing. At a modern e-comm company, each time a new version of the website is published, hundreds of those tests will be run on every combination of browser and device to make sure the experience isn’t accidentally broken for any of those users, browsers or devices.
34:16 YS: So I think the analyst taking that approach and saying, “Well, can we write automated tests so that when a simulated user goes through these journeys, we’re also checking that the data collected for that simulated user is accurate and exactly describes the journey they went through?” That’s really powerful, and I think it would catch a huge number of tracking errors. Because most of the data collection issues we see at companies, even before ITP and all that stuff, come from development teams, implementation engineers, or sometimes the analysts in GTM or whatever, who have made some mistake with the tracking setup, or done something unrelated to tracking that inadvertently broke the tracking, and that’s only discovered much later.
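A minimal sketch of the idea Yali describes: run a simulated journey, capture the tracking it emits, and assert that the captured events describe that journey. In a real suite the journey would be driven by a browser-automation tool (Selenium or Playwright, for example) and the events captured from the network or a test collector; here both are stubbed out, and all class, function and event names are hypothetical.

```python
class CapturingTracker:
    """Stand-in for an analytics library: records events instead of sending them."""

    def __init__(self):
        self.events = []

    def track(self, name, **props):
        self.events.append({"event": name, **props})


def checkout_journey(tracker):
    """Hypothetical simulated shopper: land, add to basket, purchase."""
    tracker.track("page_view", page="home")
    tracker.track("add_to_basket", sku="SKU-123", price=19.99)
    tracker.track("purchase", order_id="T100", revenue=19.99)


def check_journey_tracking(events, expected_sequence, required_props):
    """Return a list of problems; an empty list means the data matches the journey."""
    problems = []
    names = [e["event"] for e in events]
    if names != expected_sequence:
        problems.append(f"expected {expected_sequence}, got {names}")
    for e in events:
        for prop in required_props.get(e["event"], []):
            if e.get(prop) is None:
                problems.append(f"{e['event']} is missing {prop}")
    return problems


tracker = CapturingTracker()
checkout_journey(tracker)
problems = check_journey_tracking(
    tracker.events,
    ["page_view", "add_to_basket", "purchase"],
    {"purchase": ["order_id", "revenue"]},
)
```

Here `problems` comes back empty. If a release stopped populating `revenue` on the purchase event, the same check would report `purchase is missing revenue` before the change reached production.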
35:01 TW: But that is the value proposition of an ObservePoint or a DataTrue or a QA2L, I think, right? Like we’re talking…
35:09 YS: I’m not so familiar with those solutions, so I can’t really talk to them. My understanding is they do some sort of tag auditing: they crawl your website and check the tags firing on each page, which is a different thing from running tests where you simulate users doing different things and formally check the data emitted when they do those things. But it might be that they do that too.
35:35 TW: And they do both. But it’s like… To me, if you’re running… Let’s take Adobe Analytics, because that’s where you may have 30 different variables. You can define the common user path, and those tools have the ability to script that path, but even though they can check a ton of different browsers and operating systems, that’s still kind of one simulated user. And a lot of times what you really care the most about is post-authentication, or post-credit card, or post-transaction. So now you’re saying we gotta script something all the way through a transaction, but we don’t wanna inadvertently record a false transaction. And oh, by the way, if we’re changing the user experience, which is another way the tracking gets broken, we have to go update those scripts before we can really run them, or if we don’t, they may break. And I know there are more… I’m blanking on the names of some of those simulation platforms, but I know I’ve talked about them with QA people.
36:38 TW: It seems like that is a really big lift, and it’s still gonna cover a fairly narrow view. With, say, Adobe Analytics, you may say, “Yeah, the tag’s still firing and I’m still capturing the page name and these three variables. But you know what, eVar 22, that’s what got broken, because the logic here, here and here didn’t populate it.” So I’m then kind of stuck either saying, “Let me script exactly one scenario and populate all of those values,” or only checking the key ones. I’ve never personally had that actually be put in at scale in a way that works, even though I’d love to have it.
37:20 MH: I feel like when you start processing this problem at scale, like, those tools and models struggle ’cause there’s so many different ways that somebody could access sort of a web application or a website, and so it’s really hard. Whereas at your company, Tim, you know, you’ve built these validation engines that actually can run in real time on the data collection to validate type, and those kinds of things. And I think, Yali, Snowplow does something similar. So, there’s these ways to actually look at the data as it’s flowing through and saying like, “Was that a number? That wasn’t a number, that was a string. Okay, something’s wrong here.”
38:00 TW: But that’s more checking as the data’s coming in as opposed to actually going out and triggering the data. Are those… Or no.
38:04 MH: Yeah, because… Your website has lots of people testing that data for you in real time, and it’s not like they stop visiting your website. So you can go crawl the site and do those user journeys and the automated testing, and that’s good, but I think there’s another side to it, which is actually watching as the data flows in and saying, “What am I catching?” It’s sort of like those cool machines you see that knock the green tomatoes out of the process so that only the red tomatoes go through. And it’s amazing ’cause there are just these little paddles and you’re like, “How did they tell that was a green tomato?” But anyways, we’ll tweet again if we can.
38:44 TW: Yeah, yeah. And I’m definitely a fan of monitoring the data to see if it stays normal, and I guess… But Yali, where… The tools you’re thinking of, where do those fit? How does that work?
38:55 YS: So, I think they… I think you need to do both. I think automated testing is part of the solution. The idea with that is you’re trying to catch errors before they ever get pushed to production, but it’s only gonna catch a certain percentage of the issues, as you both quite rightly said. You’re never gonna write tests for every single scenario, so there will always be some things that slip in. And then validating the data as it’s processed is a second approach. The more prescriptive you can be about the structure and the shape of the data, the stricter you can be with the validation. And if you’re monitoring it in real time and flagging data quality issues in real time, you can hopefully fix them as they come up or pretty soon after, which is a much better situation than being an analyst going to the business saying, “Hey look, I found this really interesting thing in the numbers. Did you know this metric’s up or this metric’s down?”, only to deep dive on it and have to report back to the business three weeks later that it’s because some new version of the mobile app went out where something was broken, and as that rolled out the metric went up or down, and then as that was fixed the metric returned to where it was before.
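To illustrate what "prescriptive about the structure and shape of the data" can buy you, here is a minimal sketch of per-event validation: each field gets an expected type and an optional range check. Production pipelines typically express this in a formal schema language such as JSON Schema; the hand-rolled dictionary, event name and field names below are hypothetical stand-ins.

```python
NUMBER = (int, float)

# Hypothetical schema: event name -> {field: (allowed types, optional range check)}
SCHEMA = {
    "purchase": {
        "order_id": ((str,), None),
        "revenue": (NUMBER, lambda v: v > 0),  # revenue must be positive
        "quantity": ((int,), lambda v: v >= 1),
    },
}


def validate_event(event, schema=SCHEMA):
    """Return a list of validation errors for one event; an empty list means it passed."""
    spec = schema.get(event.get("event"))
    if spec is None:
        return [f"unknown event type: {event.get('event')!r}"]
    errors = []
    for field, (types, check) in spec.items():
        if field not in event:
            errors.append(f"missing field: {field}")
            continue
        value = event[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, types):
            names = "/".join(t.__name__ for t in types)
            errors.append(f"{field}: expected {names}, got {type(value).__name__}")
        elif check is not None and not check(value):
            errors.append(f"{field}: value {value!r} out of range")
    return errors
```

An event carrying `"revenue": "19.99"` fails with `revenue: expected int/float, got str`, which is exactly the "that wasn’t a number, that was a string" case Michael mentions, while `"revenue": -5` fails the range check. Events that fail could be routed to a separate bad-data stream and alerted on, rather than silently dropped.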
40:09 YS: I think there’s a third… So if we think about that monitoring on the stream, there are two things to think about. One is validating each event, each field, each type. Is the value in a sensible range? Like, is the price a positive number? Is it a negative number? That sort of thing. But then there’s also looking at the overall flow of traffic and just saying like, “Are there any anomalies here?”, looking at the data at a more aggregate level. Because that might suggest that there’s a… Either there’s a data collection issue or that something really, really interesting is happening and we should investigate that straightaway.
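The aggregate check Yali describes, looking at the overall flow of traffic for anomalies, could start as simply as the sketch below: compare each hourly event count against the mean and standard deviation of the preceding window and flag large deviations. The trailing z-score, the 24-hour window and the threshold of 3 are all illustrative assumptions; real traffic has strong time-of-day and day-of-week patterns that a production monitor would need to account for.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=24, threshold=3.0):
    """Return indices whose count deviates more than `threshold` standard
    deviations from the mean of the preceding `window` counts."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is worth a look
            if counts[i] != mu:
                flagged.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly event counts: three stable "days", then a sudden drop
counts = [100, 102, 98, 101, 99, 100, 103, 97] * 3 + [100, 5, 101]
anomalies = flag_anomalies(counts)
```

With these made-up counts only index 25, the hour where volume collapses to 5, is flagged. One design caveat: the anomalous value stays in the trailing window afterwards and inflates the standard deviation, so a real monitor might exclude flagged points from the baseline or use more robust statistics such as the median.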
40:45 MK: So what about definitional differences? So I sometimes think like companies actually make it hard for themselves because for whatever reason they choose to define metrics differently. So whether it’s definition of a session or an active user or whatever the case may be…
41:04 TW: They choose to define it differently from who?
41:06 MK: From whatever the tool is that they’re using.
41:07 TW: Okay.
41:09 MK: So, I don’t know. Like, one example: choosing that a session ends after 24 hours of inactivity instead of 30 minutes to four hours. You, as the analyst, are then always gonna be in this position of things never matching, and you know why. I don’t know, do you just accept that? Do you just keep explaining, “And this is why they don’t match up. And that’s okay as long as you’re doing comparisons within this tool and not other…”
41:38 YS: I think there are two bits to the definition. There’s the definition of how the data is collected. So if we’re gonna track a transaction, at what point do we track it? Is it when the button’s clicked? Is it when the thank-you page is shown? Is it when the transaction is cleared? And then there’s the harder thing, which is, even with potential inconsistency in how we’re collecting this data from different places, how are we then computing metrics like sessions or revenue or unique visitors or whatever that thing is? I think definitions are so, so important. You can’t even begin to talk about accuracy until you’ve finished the discussion about which definition you, as a company, are gonna commit to. And my personal feeling, and this is pretty niche, is that the classics like uniques and sessions are so problematic that companies would be best off jettisoning them altogether, being really clear about what decisions they’re making, and coming up with their own set of well-understood metrics for measuring them.
42:47 YS: So one of the odd things about coming to digital analytics, having worked with data from other disciplines, was that analysts take the metrics the tool presents, whereas in other data disciplines you’re just working in Excel and it’s up to you which formula you stick in. Your metric is whatever Excel function you’ve stuck in that cell. And in the digital analytics community, we’ve totally confused everything by coming up with one set of words that different vendors apply in different ways, and then not putting the time and attention into tracking. So actually, they’re not even computing on the same set of underlying events in different cases. Sorry, that was a total ramble. I’m not sure I answered your question.
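To make the definitional point concrete, the sketch below counts sessions for a single user under different inactivity timeouts, the knob Moe mentions. The timestamps are made up, and the rule used here (a new session starts after a gap longer than the timeout) is a simplifying assumption; some tools layer on extra rules, such as breaking sessions at midnight.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps, timeout):
    """Count sessions for one user: a new session starts whenever the gap
    between consecutive hits is longer than `timeout`."""
    if not timestamps:
        return 0
    ordered = sorted(timestamps)
    sessions = 1
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > timeout:
            sessions += 1
    return sessions


# Hypothetical hits for one user on one day (gaps: 20 min, 55 min, 7 h 45 min)
hits = [
    datetime(2019, 11, 1, 9, 0),
    datetime(2019, 11, 1, 9, 20),
    datetime(2019, 11, 1, 10, 15),
    datetime(2019, 11, 1, 18, 0),
]

by_30_minutes = count_sessions(hits, timedelta(minutes=30))  # 3 sessions
by_4_hours = count_sessions(hits, timedelta(hours=4))        # 2 sessions
by_24_hours = count_sessions(hits, timedelta(hours=24))      # 1 session
```

The same four hits yield three, two or one "sessions" depending purely on the definition, which is why numbers from tools that define a session differently will never line up, even when every hit was collected correctly.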
43:31 MK: No, I think that was a useful ramble. And I suppose it gives me confidence that maybe the best solution, from a business perspective, is defining, like, this is the metric that matters to us and this is how we’re gonna calculate it, and then doing a shit hot job of actually educating the business. That’s something we spend a lot of time talking about at the moment: this is how we define this metric. Like, where I work now, an active user is not someone who had a session. It’s someone who actually opened a design, clicked on something. You have to have actively used the product to be counted. And I personally think that’s really cool, that you’re not just saying, “Oh, you landed on our site and therefore we’re counting you as a user.” But I guess my advice to the business is that we need to keep reminding people of what that definition is, because otherwise it can get really, really murky, where people are counting users and active users and then they’re not talking the same language. New people start, and when you’re growing, they don’t understand what the definition is, so it’s this constant thing you have to hammer home about how you’re calculating it.
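Moe’s distinction between "had a session" and "actively used the product" can be written down as an explicit, shareable rule. In this minimal sketch, the event names that qualify as active use are hypothetical placeholders for whatever a company decides counts; the point is that the definition lives in one visible place rather than in a vendor default.

```python
# Hypothetical event names that count as actively using the product
ACTIVE_EVENTS = {"open_design", "edit_element"}


def count_users(events, active_events=ACTIVE_EVENTS):
    """Return (users with any event, users with at least one qualifying action)."""
    all_users = {e["user_id"] for e in events}
    active_users = {e["user_id"] for e in events if e["event"] in active_events}
    return len(all_users), len(active_users)


events = [
    {"user_id": "u1", "event": "page_view"},
    {"user_id": "u1", "event": "open_design"},
    {"user_id": "u2", "event": "page_view"},  # landed, but never used the product
    {"user_id": "u3", "event": "edit_element"},
]
totals = count_users(events)
```

Here three people showed up but only two count as active users, and anyone asking why the two numbers differ can be pointed at `ACTIVE_EVENTS` rather than at tribal knowledge.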
44:39 MH: And Moe, if you’re doing context-dependent definitions like that, how do you ever compare yourself to your competitors? So, that’s, now… Okay.
44:48 TW: I’ve got… So, one super-relevant thing on exactly this: Simo kinda made hay on this a while ago. Like, four-and-a-half years ago, he wrote “The Schema Conspiracy,” where he goes through and basically rips apart all the reasons… Like, you think a session is a session and a user is a user, and look at all of these assumptions that are made. And then, I think really the answer to all of this is: any time somebody asks you about accuracy, just report everything with four decimal places, say you focus on precision, not on accuracy, and then just kinda walk out of the room really quickly.
45:22 TW: Okay, one of those might have been helpful.
45:23 MH: Speaking of walking out of the room really quickly, we’ve gotta start to wrap up. The good news here is we’ve done it, congratulations, the four of us have solved it. So, anyone listening to the show, you now know what to do with your data. [chuckle] No. There is so much more to this. Everyone’s situation is unique, but hopefully some common elements emerged. Yali, thank you so much for helping guide the discussion. One thing we love to do on the show is go around the horn and do a last call: something we’ve found interesting or fascinating, or something we’re excited about. Yali, you’re our guest. Would you like to share your last call?
46:03 YS: Sure. So, I think this has been out for quite a while, but I’ve only discovered this recently and it kinda blew my mind. There’s a series of posts on quantum.country called Quantum Computing For The Very Curious, and it’s like a complete beginner’s guide to quantum computing. So, there are really lovely sets of posts. It’s a really interesting subject that I didn’t know anything about and it’s a kind of, a very… They make it as easy as they can. It’s easy enough that while I’m reading it I feel like I understand it, and then when I go away, I realize that I didn’t understand anything.
46:41 YS: But actually, they try and solve that problem. So, you don’t just go to the site and read the stuff, you sign up and then they ping you at certain intervals like, two hours later or four hours later, 16 hours later, to just ask you a couple of questions. And the idea is that you answer those emails, you do the questionnaires and they help you, they help your memory process what you read so that a few days later you really can recount it. So it’s… They call that a mnemonic medium. It totally didn’t work on me. I couldn’t tell you how quantum computing worked but I enjoyed trying.
47:14 TW: They ask you a yes-or-no question and the answer is amplitude, or that it’s both simultaneously or something? Is that the…
47:21 YS: Something like that. Quantum.country, you gotta check it out.
47:25 TW: Wow, that looks… It looks awesome.
47:27 MK: That’s very cool.
47:29 MH: Yeah, it’s alright. Alright Moe, what about you?
47:34 MK: I’ve got a twofer ’cause they’re both kind of little, but I only just found out about Makeover Monday. Have you guys heard of it?
47:42 MH: No.
47:43 MK: So, it’s actually… So, it’s makeovermonday.co.uk, and if you’re really into data viz, basically, people go there for fun and make over visualizations. We had someone at Data and Analytics Wednesday present on it the other day, and I thought it would be fun if you’re into data viz. But guess who has appeared again? It is one Matt Walker. So, I read an article yesterday called “I Watched Over 100 TED Talks, And These Insights Have Stuck With Me The Most” by Aria Bendix, and she refers to Matthew Walker, author of “Why We Sleep”, who has now done a TED Talk on the topic. So, if you’re not taking my sage advice of reading “Why We Sleep”, you can just watch a really short TED Talk where he shares a bunch of learnings. And also, her article is really great ’cause it wraps up a whole bunch of really interesting threads from TED.
48:36 MH: Excellent.
48:37 MK: Tim is trying to like, contain himself right now.
48:40 TW: No, no, no, I’m still… I’m still partly… I haven’t finished that book either. It’s still partially read.
48:45 MH: You just keep falling asleep when you try to read it? Is that the…
48:48 TW: Well, I keep trying to finish it and then I’d rather actually go read the book “Finish” and then I can’t finish that and it’s kind of this whole chain of things where just nothing happens.
48:56 MH: Easy now. Alright, Tim. Well, something should happen. What’s your last call?
49:02 TW: So, one of my life goals is to have sufficient intuition around the birthday paradox to actually be able to put somebody to sleep at a cocktail party by explaining it to them, but, as she is prone to do, good old Cassie Kozyrkov has done it again. Early in October, she wrote a post called “Using the birthday paradox to teach probability fundamentals.” It doesn’t just go into the math of the birthday paradox. Kinda like where Yali is with quantum mechanics, that’s where I am with basic probability theory: I understand the birthday paradox in the moment when I’m reading it, and then I can’t articulate it later. But this post, it’s a lengthy post, goes into different approaches to thinking about probability, and it’s always delightful ’cause she’s got her little linguistic tangents that are always amusing. So, I recommend it for anybody who’s exploring probability, and the birthday paradox in particular.
50:03 MH: Very cool. Alright.
50:05 TW: What have you got?
50:06 MH: Well, you know, at the top of the show, Moe mentioned how she is not a fan of mechanical keyboards but my last call is actually related to that in that there is an article I read where it demonstrates why mechanical keyboards are in fact superior to other types of keyboards. And…
50:24 TW: Rather than speaking, could you just like, type out in the chat and we just kinda like listen to the tick tick tick tick as you…
50:30 MH: Absolutely, just listen. But actually, one of the reasons why they’re superior, which goes back to the topic of the show is it makes you a more accurate typer. And so, if we think about all these implementation issues, they’re probably happening ’cause people aren’t using mechanical keyboards enough. So.
50:43 TW: Wow, that’s fantastic.
50:47 MK: But I think we should actually test the accuracy of my typing without a mechanical keyboard versus your typing with and see whether that’s true.
50:57 MH: Well, so first off, Moe, you have to remember, I have people who type things for me most of the time. So it’s really not an accurate test. So anyway…
51:05 TW: I think the design of that experiment has some flaws that would also give inaccurate results. But that was a masterful bringing it all together though. So.
51:12 MH: Hey, listen. And speaking of bringing it all together…
51:18 MH: You’ve probably been listening and you’re like, “Oh my gosh, you’re talking about the set of problems I’m facing right now,” or “These are the challenges I’m running into,” or “You didn’t even talk about this aspect that’s causing pain in my life, with my data being inaccurate or incomplete.” So we’d love to hear from you. Preferably if you liked what we said, not if we missed a bunch of really important topics. But either one will work. And the best way to do that is through the Measure Slack, or you can also reach out to us on Twitter or on our LinkedIn group. And that’s all I have to say about that. Yali, thank you so much for coming on the show. Really appreciate you taking the time.
51:57 YS: Oh, thank you for having me. It’s been really good fun.
52:01 MH: It’s been great. And I know I speak for my two co-hosts, Moe and Tim, when I say regardless of your data accuracy, just keep analyzing.
52:16 Announcer: Thanks for listening and don’t forget to join the conversation on Facebook, Twitter or Measure Slack Group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour or @AnalyticsHour on Twitter.
52:35 Charles Barkley: So smart guys want to fit in so they’ve made up a term called analytics. Analytics don’t work.
52:43 Tom Hammerschmidt: Analytics. Oh my god. What the fuck does that even mean?
52:52 MK: I hate people that have mechanical keyboards. They deserve to die a strong violent death.
52:58 MH: Well, Moe, so, you’re wrong, and I still love you anyway. But you’re absolutely 100% wrong.
53:08 MK: Mechanical keyboards are like literally the biggest disruption to being in the flow. It’s like you’re listening to some music, bopping away and suddenly there’s a…
53:20 MH: And to that I would say that it’s very obvious that you’ve never seriously played a video game in your life.
53:26 MK: Obviously. Except Mortal Kombat, which you had the hand control-y thingies for.
53:34 TW: See, Yali, what happened was that you actually resolved all your mechanical or all your audio difficulties about seven minutes sooner than we normally do with guests. So now it’s just filler of dishing shit until we apparently start recording.
53:50 MH: It’s like you’re a technologist or something, like you really understand computers and systems. You should really think about being a chief product officer of some kinda company.
54:02 MK: You’ve done this before, Michael Helbling.
54:04 MH: Oh, Moe. You’ve seen me do it before. Like, it’s literally the same conversation. It’s like, “Let’s describe this.” I’m like, “No, I’m just gonna ask you to describe it.” In that way, I don’t have to remember that.
54:19 TW: My type-token ratio analysis of presidential candidate tweets actually made it into our weekly blogs thing today. I could totally blog roll my last call. That would put people to sleep.
54:30 MH: Look at you. Maybe I should last call that, Tim, so you don’t appear to be desperate.
54:36 MK: Plus Tim’s desperate for a last call so it works out nicely.
54:39 MH: Exactly. It’s like…
54:41 TW: I don’t know if that’s at all a coherent thread, if anybody was riding along with that particular…
54:51 MK: Struggling.
54:53 TW: Rock, flag and accuracy.