Humans are creatures of habit. And analysts — those of us who haven’t been so drawn into the world of artificial intelligence that we have become cyborgs, at least — are humans. In this episode, the gang explores the good and the bad side of analytical habits: what analyses we gravitate towards, how we go about approaching those analyses, and, to some extent, how those habits are impacted by our organizational environments. With a side dish of, “What is a data scientist, anyway?” (because who can resist a question that is both rhetorical AND controversial?!).
Links Referenced on the Show
- Jim Novo
- ETL (Extract, Transform, and Load)
- Data lake
- Gary Angel
- Two-Tiered Segmentation
- #084: Bayesian Statistics and the Digital Analysts with Dr. Elea Feit
- Adobe Sensei
- CLV (Customer Lifetime Value)
- Using the Prefetch Tag and Prediction to Speed Up Websites (Mark Edmondson innovation)
- Causal inference – Wikipedia entry
- Three Pillars of Data Science – Cameron Davidson-Pilon
- (Podcast) This Week in Machine Learning and AI — Omni-Channel Customer Experiences with Vince Jeffs
00:05 Announcer: Welcome to the Digital Analytics Power Hour. Tim, Michael, Moe, and the occasional guest discussing Digital Analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website, analyticshour.io. And now, the Digital Analytics Power Hour.
00:30 Michael Helbling: Hi, everyone. Welcome to the Digital Analytics Power Hour. This is Episode 93. As you look back on your illustrious analytics career, and you think about the metrics and the trend lines that brought you the most success, was it a big increase in time on site? Or maybe a big drop in bounce rate? Or was it doubling that site conversion rate? Whatever the scenario, we all have our way of handling the analysis we do, and sometimes the common pattern is the one that pays out over and over again, and sometimes it’s not, sometimes it’s a trap, but that’s what we’re gonna talk about. We’ve got practical tips, maybe. So let’s introduce the host of the show, Tim Wilson.
01:18 Tim Wilson: Hold on.
01:18 Moe Kiss: Whoa! The crowd goes wild!
01:22 TW: Sorry, I was checking… I was busy checking the time on site for analyticshour.io, so I was a little delayed.
01:26 MH: Yeah, time on site.
01:26 TW: Our KPI.
01:28 MH: And Moe Kiss.
01:30 MK: Hey, how’s it going?
01:31 MH: And Michael Helbling, that’s me. Alright, yeah, let’s start with what’s your favorite metric and why? No, I’m just kidding.
01:40 TW: If you were a metric, what metric would you be?
01:43 TW: I’d be bounce rate.
01:46 TW: Most of my social interactions.
01:48 MK: Do you have a high propensity to bounce?
01:51 MH: I would be incremental revenue.
02:00 MK: I’d definitely be engagement, some type of engagement metric for sure.
02:03 MH: Oh, right. Okay, so this is the idea, I was reading on the Measure Slack a while ago and I saw a conversation happening around how people were thinking of measuring something. And I think it had something to do with fallout reporting. I don’t really remember the exact thing, but it got me thinking about A, I think over the years, there’s been this establishment of different ways of doing analysis or in a certain sense, different things that we analyze. The website itself is a kind of a system that we analyze, and the campaigns that are feeding the traffic in are part of that system, and it’s all about optimizing and pulling the right levers.
02:45 MH: And then, over here we also have the analysis we do around who our customers are and all of the values that they have and what is their churn, what is their LTV, what is their propensity to buy? All of these kinds of things. And then, even on the product side, if we’re a product company, and I usually equate content to product if you’re not a product company, so you’re looking at the effectiveness of your content or you’re looking at the effectiveness of your product, and there’s a lot of evaluation you’re doing of those. And so I don’t know, it just got me thinking along the lines of like, is that how people do that? Does that ring true for you guys? Or how would you approach the way they categorize, I guess, sort of primary ways of doing analysis?
03:39 TW: So, I think we all do wind up coming in both… Well, one, we’re taking whatever baggage or history we have of what has worked or been successful in the past. I definitely have data that I’m way more comfortable with and therefore, I want to… That’s my hammer. I mean, I would say web analytics data, way more comfortable with than maybe next to that is CRM data, you get into, I don’t know, fulfillment or inventory data. I’m like, “What the hell is this?” So, I think there’s always gonna be a human bias to go to the data that you understand.
04:13 TW: But then I also think as much as we talk about breaking down siloes, as you set that up, I was thinking how often in an organization if you are an analyst on the eCommerce team, because you have an eCommerce organization, then chances are you’re more likely to skew around the things that you can specifically control, the eCommerce experience, and you’re gonna try to analyze the crap out of that data, even if you know or you’ve been told, Jim Novo beat it into you, or it just came up somewhere else that really you should be stepping back and thinking about the whole customer. But a lot of times, organizations don’t… A lot of analysts, I don’t think necessarily feel empowered. If I’m supposed to be working on optimizing the eCommerce experience and I’m like, “Well, hold on, let’s talk about lifetime value of our customers,” which could bridge to that ’cause you could say, “Well, I don’t wanna just generically optimize my checkout process for everyone. I wanna have some sense of the whole customer.” But that’s a really, really tough leap to make, one, because you have to get data from a much deeper area, and two, you may feel like you’re stepping beyond what you’re supposed to be doing. I don’t know if that’s on the same line.
05:35 MK: Doesn’t that just mean that we’re bad at our jobs? If we are kind of…
05:42 TW: Yeah. [chuckle]
05:43 MK: If we’re saying okay, we’re part of… Okay, literally take me as an example, part of the checkout team that looks after the checkout process, and I go through, yes, part of my job is helping improve that checkout, but if I’m not good at taking a step back and saying, “Where does this fit as like a bigger piece of the puzzle?” then I just am not very competent.
06:01 MH: I tend to think of it as having an opportunity to improve in our jobs, Moe.
06:07 MK: Oh, good.
06:08 TW: Well, but I think that’s actually… You’re bringing up that there is the… What has been handed to the analyst is the scope of their role, and in some companies, that’s a pretty tight box. And it’s this is what you need to do. In other companies, it’s like, “Hey, this is some general guidelines. But hey, if you have an idea or you wanna pursue something else, and yeah, that means this other stuff needs to be shifted around in our priorities, go to it.” So I think there can be organizationally, I suspect it skews towards the larger, older, pre-digital companies, we’ve talked about this in the past, are more likely to struggle to say, “Wait a minute, let’s step back and take a strategic, higher level view and figure out where should we be looking and let’s get the right people who have the skills to look into it, regardless of where they sit organizationally.” I think that happens a lot more in the younger and more digital native organizationally companies perhaps.
07:15 MK: But okay, so just pause. So if I’m trying to understand checkout experience so that my team can ultimately improve it, that’s my box or remit, so to speak, however we wanna refer to it. And I go through the checkout process, and I see that there’s different issues and drop-off points, and people struggling with different things. Absolutely one of the things I’m gonna do is go talk to our customer service analyst and also say, “What complaints have we had about the checkout?” Now, that’s completely outside of my remit but are there actually companies where someone would be like, “No, get back in your box, you’re not allowed to look at customer service data. What are you, crazy?” Is that actually a thing?
07:58 TW: Absolutely. I think there are companies that the customer service analyst that you went to talk to would say, “I’ve been doing this for 25 years. I’m not in the business of providing you with data.” I absolutely think that would be the case. Because that’s another good point though. There are plenty of analysts who aren’t necessarily intrinsically motivated, awesome employees who just wanna do the right and best thing. If you’re collaborating with somebody who kind of stumbled into a role of an analyst and maybe aren’t a great fit, then they can be a real barrier to what you’re doing, and if you say, “Hey, let’s collaborate and do this awesome thing.” Yeah, there are definitely companies where people are like, “Wait, are you gonna make me look bad? Wait, is this something I should be doing? Is it, get off my back, get off my lawn, that’s my area?” Absolutely, that exists in organizations.
08:49 MH: Thank you for tying us right back into Episode 1 of the podcast.
08:55 TW: Wow.
08:57 MH: No, that’s good. I also think or ponder, and again, I’ve got more questions than answers for sure, so maybe I’m interviewing you two. I ponder, almost like a generational divide, if you will, like Tim, you and I are old and…
09:17 MK: Older.
09:19 MH: More mature, let’s just say it that way.
09:21 TW: Seasoned.
09:22 MH: And seasoned. Yeah, seasoned. Or grizzled.
09:25 TW: Borderline rancid, turning.
09:30 MH: Yeah. Yeah. We’re ripe. But back in our day, there weren’t really great explanations of how to get at some of this stuff or how to skip over stuff that was dumb, or wrong. In fact, there’s probably examples we could go to from the early days of analytics, where people were literally screaming at us to focus on bounce rate as a primary way of understanding our website’s performance. And I don’t mean to pick on any certain person because they’ve certainly updated their conversation on the topic since. But I think for us analysts back then, we had a lot less to work with, and I think it also… And I want your perspective, Moe, on this as well, because I’ve heard you mention as an analyst coming in, sometimes you feel like you didn’t waste time with maybe the… I don’t know how to describe it, like the old world style of tools, like the legacy tool sets, if you will, like the reporting tools or the descriptive tools. I don’t know, what are you guys’ thoughts?
10:42 MK: Yeah, I hear what you’re saying, but the funny thing is, it’s almost like things are going, I don’t wanna say backwards, because they’re not backwards. But there’s a trend going through the industry right now of, instead of doing really polished data warehouses where you have a really amazing ETL process and your data’s spat out super clean, people are starting to go back to that data lake, let’s just pump in things in very raw, pretty crass formats and let’s let the analysts figure it out at the other end. And there are lots of companies making this move at the moment. And so when you talk about, “Yeah, we used to have log files and we used to do this,” I kinda feel like sometimes we’re going back to that, it’s gonna be less polished. We definitely have more data, and we have more sources of data that we can look to but we’re almost starting to do a full loop back to, here’s something really messy, and you guys need to figure out how to make useful stuff of it rather than spending this huge amount of time cleaning and prepping it.
11:47 TW: Well, I think the big difference is that when it was purely log files, eight times out of 10, it was, “No, we don’t capture that. We don’t have that data.” And then the separate one, to me though, what I harken back to is just the kind of misguided naivete, and I’ll agree that there is still misguided naivete today, but back then it was, “Hey, we wanna just look at the paths through our site.” That was literally the perception from the business was, there were common paths through the site, that even if you just spent two minutes of critical thinking about, so what? If there is, great, it’s there, you’re not gonna change it. If there aren’t, which it always was, it’s a freaking hot mess. What are you gonna do? You actually need to turn around and think about the business side first. We’ve talked and we’re still trying to get somebody on to really talk about data lakes versus data warehouses, I equate the, “Oh, you know what? Just throw everything in a data lake and we’ll figure it out later,” isn’t always the right solution, but it gets treated as a panacea.
12:56 TW: The tag everything and capture everything isn’t a panacea. The deep learning, we’re gonna deep learning the shit out of everything. There is still this hunger from the business that I don’t think has changed which is either the technology or the analysts are gonna magically make simple and black and white something from… We’ve always thought that we’ve had an enormous amount of data because we always have. We’ve always had way more data than we had two years before, and that just continues to grow. And then it’s only the benefit of hindsight that we say, “Oh, things were so nice when we just had a website and nobody had a smartphone.”
13:33 TW: Our data collection was crappier and certainly the processing power was much, much worse, but I don’t think that’s fundamentally changed. That there’s a big class of people we’re serving who really say, “You analysts just dig into the data and give me an answer.” And maybe that’s where it still might go to bias is, I absolutely wanna sit and talk, and think, and EDA it, and hypothesize, and use all sorts of other buzzwords, and not to think that, “Oh, we’re just gonna get to some kind of technical nirvana.”
14:04 MH: Yeah. Here’s a good little question for you, Moe. In your career as an analyst, have you ever had someone ask you to get them a report of the least viewed pages of the website?
14:19 MK: Interesting, actually.
14:20 MH: Versus non-viewed pages.
14:22 MK: I’ve had that happen once.
14:24 MH: Oh, well, never mind then.
14:27 MK: We were considering getting rid of a section of our website so we actually looked at how low the usage was versus other sections of the website, and we did actually find that almost no one went there and used it, so we got rid of it.
14:40 MH: No, but that’s like a real business purpose for asking that question.
14:43 MK: Yeah. Yeah. Yeah. Absolutely.
14:43 MH: As opposed to people who, in addition to seeing the most viewed pages, would also like to see on a regular basis, the least viewed pages.
14:54 MK: On this topic though, when you have a question that someone in the business asks you, do you have this list of primary go-to things that you check first? Is that just how you operate?
15:05 MH: It’s not as scientific as all that. Although I’ve struggled with this because I really think there are standards we could find and set out there. I just don’t think I’m smart enough to do it, is really what it comes down to. So Gary Angel, if you’re listening, please, define some standards.
15:31 TW: I have some go-to. Any time somebody asks me a question, literally as soon as I can, I go through the experience myself, which is often the people asking the question have not. And it’s like, I swear, 10% of the time, I’m like, “The answer is go through your own fucking experience, and you’ll see where the problem is.” Because of that, I also tend to have a go-to of what is my breakdown by device category because I am still convinced that even though we’re on the eighth year of “it’s the year of mobile,” we sit at our desktops and we look at experiences, and we say, “Oh, we built them as responsive, therefore, I’m set.” And yet, the business still… And I mean, unless they’re looking at a mobile app, they still are internally, they’re always visiting when they’re sitting at their desk. I can’t count the number. Literally, it’s happened this week, where they’re like, “Wow! What’s going on here?” I’m like, “Well, 90% of your traffic is on mobile. Let me do a screen capture of what this looks like on my iPhone.” Every question is different, but those big movers of what can be the mobile tends to be one where I feel like… Because that’s not something people know. They know they built it for mobile, but they don’t necessarily know what the traffic mix is. And if it skews really heavily, then that just can open up a whole range of ideas.
17:00 TW: And then the second thing is traffic source because even though the reporting sorta treats them as parallel, we know that every traffic source is so fundamentally different. Before the show, we were talking, you were saying promotions. That’s gonna be a go-to… If something goes crazy, you’re gonna be like, “Okay, what did we do?” What did we do to drive… So, I will say those do tend to operate pretty consistently.
17:26 MH: I guess that’s true. I think one of my first things to do is to seek to eliminate a data quality question. So if somebody is asking…
17:35 TW: Which is another part of going through the site, actually saying, “What are we capturing?” Yeah.
17:41 MH: Are we capturing the data wrong? Is it broken? How are we capturing that particular data point?
17:43 MK: Wait. So wait. Helbs, that would be one of your first steps?
17:47 MH: Yeah, because that’s gonna tell me whether or not we have a data quality problem, and then I can dig into segmentation and looking at different sources and those kinds of things, after I figure out if there’s just a big broken thing on the website.
18:00 TW: But I will say this is that…
18:02 MH: I’m a consultant most of the time.
18:05 TW: Right.
18:06 MH: And so I’m not talking about my website, I’m talking about their website.
18:08 MK: Yes.
18:09 MH: And so that’s way different, right? Because obviously in the environment where like, say when I was working on the client side, I knew our website inside and out, and knew exactly where things were broken and weren’t broken over the years, and so that wouldn’t be my first thing to do. But then, in that scenario I had like you had, I looked at channels. What was the last night’s email? What other campaigns are we running? What products are on sale? And I would just go through my list and I would leverage my secret internal daily dashboard that I built basically to tell me all those things so that I could know at a moment’s notice what was impacting all the key metrics that we had.
18:47 MK: Isn’t it funny though? We really are biased because as you guys are talking about channels, I’ve been sitting on the customer product side for so long, I’m like, that’s one of the last places I would look at, because once they get to my site, meh. Yeah, in some pieces of analysis, how they got there is really important, but ultimately, if for example, I saw conversion rate was performing really badly and I wanted to understand like, do we have a different mix of traffic than we would normally with less propensity to purchase…
19:15 MH: Which just means a huge uptick in display advertising.
19:19 MK: It wouldn’t be my go-to, it would be something… Yeah, it would be one of the things I was checking if something was going wrong probably more, but I guess that is biased.
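[Editor's illustration] The traffic-mix point in this exchange can be made concrete with a little arithmetic. The sketch below is only an illustration under assumed numbers: the channel names, session counts, and conversion rates are all hypothetical and were not discussed on the show. It shows how a blended site conversion rate can fall even though no individual channel converts any worse, simply because low-propensity display traffic makes up a bigger share of sessions.

```python
def blended_rate(sessions, rates):
    """Site-wide conversion rate: total orders divided by total sessions."""
    total_sessions = sum(sessions.values())
    total_orders = sum(sessions[channel] * rates[channel] for channel in sessions)
    return total_orders / total_sessions


# Hypothetical per-channel conversion rates, held constant across both weeks.
rates = {"email": 0.05, "search": 0.03, "display": 0.005}

# Hypothetical session counts: the second week adds a large display push.
usual_week = {"email": 20_000, "search": 60_000, "display": 20_000}
display_heavy_week = {"email": 20_000, "search": 60_000, "display": 120_000}

usual = blended_rate(usual_week, rates)
heavy = blended_rate(display_heavy_week, rates)

print(f"usual week:         {usual:.2%}")
print(f"display-heavy week: {heavy:.2%}")
```

With these made-up inputs the blended rate drops from about 2.9% to about 1.7% while every channel's own rate stays the same, which is why a quick check of the traffic mix can rule a "conversion problem" in or out.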
19:28 TW: But I think even that’s knowing internally how your operations work. And so there’s a little bit is the company. If you have a very fragmented company, where there’s no communication between the email, between display, between paid search, like, yeah, you know what, every time something goes wacky, it’s somebody did something and they didn’t track it well, or they didn’t… Whereas, if you’re like, “Look, I generally know what rhythm we run on and we generally have coordinated campaigns,” I do think there’s a big difference that a lot of times when I’m getting brought in, it’s, “Hey, can you take a look at this? We think X is weird.” And there is kind of a… Maybe a lot of what I’m doing is kind of the, what’s the ramp up to just get the lay of the land for what’s going on here, so that you don’t just assume something is typical or normal? So I think that’s a good point. But that’s one of the advantages, that’s one of the upsides of being in-house and not a client. You’re not repeating that “let me get the lay of the land” process again and again. And therefore, you probably do jump to a deeper, smarter question much more quickly.
20:37 MK: I’m not sure it is. It’s like, why did it go up? Why did it go down? Like, that’s the question which…
20:41 TW: No, no, no, but you’re like what it might have been. You’ll get to a second-order level of actually answering the question. Not necessarily the question but the answer to the question or investigation of the question.
20:54 MH: Right. And that’s where things, like I mentioned Gary Angel earlier, but actually, one of the things that he and his team pioneered in the industry, not that other people didn’t do it, but I think he documented it for everybody, was this concept of two-tiered segmentation and as a practice, tried to leverage multiple layers of segmentation to drive into a better understanding of exactly what the drivers were. And that’s very helpful. But when you start out, you’re just… I remember an experience of looking at the website’s conversion rate in aggregate and just wondering what to do. And it’s like now, that’s funny because well, of course, you start breaking it down and finding out what’s driving it and all those things, but you’ve gotta learn from something. Okay, I wanna turn another gear. We’re gonna create another layer of intensity. No, I don’t know what we’re doing. We’re just sort of going on. Everything we’ve talked about so far has been, well, I would label descriptive, backwards looking analysis. Why did this happen? What’s broken?
22:08 MH: So then, what about as we take the next step and we start to leverage prediction, how do we apply the same level of thinking? Are there standards? Are there common approaches when we predict the future and how we do that? And I know that I’m leaving myself behind because this is probably the work I’ve done the least of in my career so far as an analyst, because I stopped being an analyst and started being a whatever I do now. So, that’s more for you two, but do we see the same things? Is it the same kind of pattern where we’re leveraging what we know to then create that prediction? Or is it a whole new ball game in that space?
22:47 TW: I think there’s still very little prediction… I think, to me, what I mostly see is that when you’re saying prediction, you’re trying to figure out if I do X, what will happen in the future? Web analysts, digital analysts are not necessarily getting asked to forecast traffic unless it’s like, “Hey, the IT team wants to make sure they can handle the holiday load or something.”
23:13 MH: And for a great primer on how to forecast your traffic, please refer to our Bayesian episode with Elea Feit. Thank you. Okay.
23:22 TW: An asymmetric loss for the win.
23:23 MH: That’s right.
23:25 TW: But I think that marches over to the A/B testing and multivariate testing of saying the conceptually simple way to predict the future is to run the future in parallel, a test and control and say, “Therefore I am predicting.” Now you get on a huge slippery slope when you say, “Well, what is this lift? Multiply this out.” That’s super dangerous and plenty of people rage against that. But I don’t feel like this idea of saying, “We want to… Media mix modeling is saying if you make these changes, we will forecast that you will see such and such of a difference in your revenue or your ROI from your marketing spend.” I don’t know, Moe. How often do you run into… As we’ve now established as the more sophisticated analyst on this podcast, how often are you doing things that you would classify as prediction?
24:27 MK: Yeah, we definitely do have to do a bit of it. I don’t know. We’re a little bit odd because we have this concept of the full-stack analyst where we all do a bit of data engineering, we all do a bit of data science, and we all do a bit of data analysis. And it’s something that I wrestle with actually internally quite a bit is because for me, I’m always like, okay, I’m gonna look at all of my past data and use that to help me learn what’s gonna happen next but then, and I can’t even remember where I was. Someone was talking about this exact concept that you can’t assume that what has happened in the past is gonna happen in the future. And so it’s something I’m always wrestling with and trying to learn more about, how do you do better predictions?
25:08 MK: But when it comes to predicting it’s definitely something that I’m learning more about and we’re gonna have to as analysts get better at that. I don’t think we can keep spending our days looking just reporting on what’s happened in the past, we need to have that forward-looking of being like, “Okay, well this promotion performed this way, we have similar circumstances except we have this extra stock or we have this extra whatever, so we can expect that it’s gonna perform that way.” So I do think we need to spend more time in that space, but it’s also tough because the business still expects you to be reporting on the past as well.
25:44 TW: It’s not like… I get a little twitchy when… There was the CEO of a large voice of customer company who just loved every time he presented, he’d talk about the, “We’re not gonna look in the rear view mirror.” I mean, that’s the most hackneyed cliche. Like, “There’s no value in the rear view mirror!” And I’m like, “Bullshit!” Conceptually, looking back. So from a yes, you wanna be looking at your historical data, that’s the data you have available to learn what worked and didn’t work, to then, to the extent that you can infer why that might be, and therefore make recommendations of how this could be better, which they’re still making decisions going forward. So, you’re impacting decisions. That to me, is different from actually saying, “And we managed to… ” And maybe we do need to get to this point of, because we’ve managed to break all this stuff down, although it’s really, really messy with the number of different variables and changing consumer behavior and changing experiences and competitors changing things, to say, “If you on this next campaign,” somehow you need to have a bunch of really messy historical data and say, “Do this X thing that is specific instead of this Y thing, we expect you will get a 3% lift in revenue.” That feels, to me, very kind of dream-state tough. I don’t know.
27:13 MK: I think the thing that’s tough, and I’m channeling Matt Gershoff because he was helping me with something the other night, and he actually assigned me homework. Anyway…
27:23 MK: If you ask him a question you will get homework.
27:25 TW: You’re not having to watch an entire linear algebra course on YouTube, are you?
27:28 MK: No, no, but it’ll probably lead to that.
27:33 TW: It’s coming. [chuckle]
27:35 MK: But one of the things that you need to think about, because this actually happened to us recently where we had a particular set of circumstances that happened during a campaign that led to us underperforming, and we’re pretty confident that that’s what caused it. And so, there’s two camps going on where half of the business is saying, “Okay, well, we need to A/B test this so we can be sure that this is why we didn’t perform as well,” versus the other camp that’s like, “Actually we can be pretty confident; this is not rocket science. Yeah, okay, there’s a small chance that we might be wrong, but is it worth all of the effort of going through this huge A/B test, potentially giving half of our customers a worse experience just so that we can be certain that’s what caused it?”
28:17 MK: It depends on your level of certainty, and in some cases when you have a pretty bad customer experience, yeah, it’s gonna affect your sales, that’s not… Yeah, so I just think you’ve gotta figure it out. I do feel like at the moment there’s this thing where everyone’s like, “We need to A/B test this. We need to A/B test this.” And I’m like, “You need to A/B test some stuff, you don’t need to test everything. You don’t need certainty for everything. Some stuff, it’s okay to know that there’s a correlation and that probably drove that.”
28:46 TW: No, really. What you should say is, “We don’t need to A/B test it, we need to pump this through a deep learning algorithm and let that figure it out.”
28:56 MH: Yeah.
28:56 MK: Yeah, yeah, exactly.
28:57 TW: But that, I think, is something that, as I have grown as an analyst, I think more and more analysts are putting on themselves. If you say, “Great, we’re not gonna A/B test, we’re gonna… That means there’s a little more uncertainty, but we’re okay because we all agree this is a lousy experience,” then we’re gonna fix that issue and on the very next campaign, I’m gonna look at it and then I’m gonna make sure I close the loop with the business and say, “Look, we didn’t test that. We saw this thing that sucked, we all agreed it was probably this, we changed it.” I recently had a case where I’m like, I thought that… I said, “Let’s A/B test, this has gotta be, this is the issue, right?” And everyone’s like, “Oh, yeah, that’s terrible. We must fix that.” And they’re like, “Yeah, why A/B test? We’ll just fix it.” I’m like, “Okay, just fix it.” Well then, they just fixed it, it didn’t solve the problem at all, but that was on me to say, “Oh, you made the change.” They were ready to say, “Checked off, we made the change, we fixed it.” I’m like, “No, the problem’s still there.”
29:53 TW: And that’s another part of the analyst of also kind of being like if you make that decision, “Hey, we all agreed to this, let’s make it. Okay, we’ve shifted to a less perfect, more of a longitudinal experiment as opposed to an A/B one, but let’s figure out exactly what day we changed it, let’s see the next campaign.” I will also say that I have used forecasting and I’ve talked about this a lot, where I’ve used forecasting more for anomaly detection. That basically Adobe Sensei is using forecasting to do anomaly detection, but it’s not actually predicting the future, it’s predicting the past. So that’s still my go-to, totally legitimate use of forecasting, but it has nothing to do with from today forward, it has to do with looking at yesterday or the last week and comparing it to what I should have expected for that. And I did not have any concept of that whatsoever a year ago. Well, maybe a year ago, two years ago. So that’s coming in, but true prediction prediction, the actual use case, I haven’t had the light bulb go on as to where that fits.
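[Editor's illustration] The idea of "predicting the past" for anomaly detection can be sketched in a few lines. This is a deliberately simple stand-in, not how Adobe Sensei or any particular product works: it assumes a same-weekday average as the forecast and a band of two standard deviations, and the session counts, function names, and threshold are all hypothetical. The history produces an expected range for each recent day, and any actual value outside that range gets flagged.

```python
from statistics import mean, stdev


def expected_range(history, weekday, z=2.0):
    """Forecast a value for `weekday` from earlier observations of that weekday.

    `history` is a list of (weekday, value) pairs.
    Returns (low, expected, high), where the band is expected +/- z std devs.
    """
    same_day = [value for day, value in history if day == weekday]
    expected = mean(same_day)
    spread = stdev(same_day)
    return expected - z * spread, expected, expected + z * spread


def flag_anomalies(history, recent, z=2.0):
    """Compare each recent actual to what the history says we should have seen."""
    flagged = []
    for weekday, actual in recent:
        low, expected, high = expected_range(history, weekday, z)
        if actual < low or actual > high:
            flagged.append((weekday, actual, round(expected, 1)))
    return flagged


# Hypothetical daily sessions: four prior weeks of Mondays and Tuesdays.
history = [
    ("Mon", 1000), ("Tue", 1100),
    ("Mon", 1040), ("Tue", 1080),
    ("Mon", 980), ("Tue", 1120),
    ("Mon", 1020), ("Tue", 1090),
]

# The days being checked: already in the past, but held out of the history.
recent = [("Mon", 1010), ("Tue", 700)]

anomalies = flag_anomalies(history, recent)
print(anomalies)
```

Here Monday lands inside its expected band and Tuesday falls well below it, so only Tuesday is flagged. A real implementation would likely account for trend, seasonality, and holidays rather than a plain weekday average.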
31:00 MK: As to why it’s useful or to how it fits?
31:03 TW: As to what the actual use case, business need…
31:10 MK: Okay, I’ve got one right now that we’re working on. And someone told me the other day, it’s gonna be really hard, so I’m a little bit worried. [laughter] Thankfully, I’m not leading this piece of work. We’re looking to do predictive CLV. So when a new user lands on your site, you can use a predictive model to determine what their likely CLV will be to try and understand how much you wanna try and fight to keep this customer, essentially, like how much are you willing to incentivize them to stay with the platform.
31:37 TW: And you… Now, I’m curious, that can’t be when they just arrived and they’re like, “Hey, I’m a new person,” because you don’t have enough data. Is that some of this figuring out what you know, how many…
31:45 MK: Oh, yeah.
31:46 TW: Yeah, okay.
31:46 MK: And like I said, thankfully, I’m not leading this piece of analysis, but I can provide you an update in a few months and see how it goes.
31:56 TW: So that’s fair. Take propensity modeling, or attrition modeling as the inverse of it, of absolutely… And then, this actually gets back to… So I jumped right back to the aggregate world of, “Yeah, prediction. Nobody’s asking me how much traffic’s gonna come to the site next week or how much revenue I’m gonna generate but… ”
32:14 MK: We totally get asked that by the way. We always get asked, “We’ve got a campaign next week, how much traffic can we expect? How many users can we expect? What’s the ABV that we anticipate?” 100%, we get asked that all the time.
32:27 TW: That just seems like you’re a very expensive resource to just head off. Is that for load balancing or load management?
32:35 MK: No, no, no. That’s so that we can try and balance like how much stock goes on sale, whether we need to have a bigger incentive, how many email pushes and sends we do, how heavy we go into marketing.
32:48 TW: So that’s actually…
32:48 MK: There are so many decisions that are based off that.
32:51 TW: Predicting product demand, forecasting, predicting propensity modeling at a user level. Okay, so now I’m gonna say I’ve got… I can probably come up with six totally legitimate and even kind of I’ve seen them. Even Mark Edmondson’s thing from a year or so ago where predicting the next page, so he could fire the pre-fetched tag. That’s prediction, that’s a very, very short-term prediction but has a very applicable use case. So I basically just need to get out of my… [laughter] The questions that I would have been asking five years ago, and I totally see the value in…
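As a rough illustration of the short-term "predict the next page" use case, the sketch below counts page-to-page transitions in past sessions and picks the most frequent next page for the one a visitor is on. This is a simple first-order transition count chosen for illustration, not a description of Mark Edmondson's actual implementation. The function names and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, current_page):
    """Return the most frequently observed next page, or None if unseen."""
    counts = transitions.get(current_page)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical historical sessions (ordered lists of page paths).
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/products/shoes"],
    ["/home", "/blog"],
    ["/products", "/cart", "/checkout"],
]

model = build_transitions(sessions)
print(predict_next(model, "/home"))      # prints /products
print(predict_next(model, "/products"))  # prints /cart
```

The predicted path is the one a page could hand to a prefetch hint, so the browser starts loading it before the click.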
33:26 MH: Aren’t recommendations a kind of prediction as well?
33:30 MK: Yeah.
33:31 MH: Yeah, so.
33:31 TW: Yeah. Yeah, but…
33:33 MH: Come on, I came up with something actually valid. Okay.
33:37 TW: Yeah, I’ve realized that came across as a meh but it wasn’t a meh, it was a yeah, that is a very insightful point.
33:45 MH: Boss, I’ve had it. I’ve had it with you minimizing my contributions [laughter] in this podcast. No. Okay. Oh, I had a question that is kind of off topic, but hopefully will not… We haven’t gone too far past it that I can’t bring it back up. You were talking about how you kind of do sort of multiple roles within your function, and you said data science and data analysis, how do you…
34:15 MK: And data engineering.
34:17 MH: And data engineering. So how do you…
34:19 TW: Some would say that’s a subset of data science.
34:21 MK: Yeah… Well, I still think they’re separate.
34:25 MH: I’m asking Moe for a differentiation in her mind between data science and data analysis. Just for the purpose of how you guys have defined it, honestly, for my own edification and maybe somebody out there wants…
34:37 TW: Potentially for employee feedback. Oh my god, this is not…
34:41 MK: No, to be honest, I actually…
34:42 TW: You have no idea how loaded this question is, Moe.
34:43 MH: No, that is not true.
34:45 MK: I was talking about this last night and it is something that I’m still trying to reconcile myself. I think that there is a perception in the market that happens too often that you become a data analyst and if you’re a little bit smarter, you work a little bit harder, you’ll become a data scientist. And I actually get a little bit frustrated by that view. I think they have different roles. In my organization, those lines are very much more blurred than most organizations. But I would expect that a data scientist is doing a lot more machine learning, predictive work, all that sort of stuff; an analyst would be a lot more around answering business questions, helping communicate that to the business, helping visualize it better. I haven’t figured out the line exactly yet, but I’m still working on a…
35:38 TW: Couldn’t answering a business question be predictive?
35:40 MK: It could be but then I would expect that either our analyst would be delving into that data science space a little bit, or the data science and analyst would be working together to do that.
35:51 TW: Okay. So…
35:52 MK: Yeah, I think it’s also like how do those answers that you’ve gotten fit into your bigger strategy as a company, and I definitely see the analyst facilitating that, like how do you work with your product managers or your marketers and actually see that through to execution?
36:09 TW: It is…
36:09 MH: Yeah, I’m secretly highly opinionated on this topic.
36:14 TW: I think there are a lot of people who are highly opinionated, and some of them have very, very, very different definitions.
36:20 MH: Yeah, I guess what troubles me with some of the definitions I see bandied about is that it’s sort of like they drop this demarcation between descriptive, historical analytics and prescriptive or predictive analytics as if that’s the dividing line. And I just find that so faulty. So if you believe that and you’re listening, you’re wrong. [laughter] Okay, so…
36:46 MK: So what do you think though?
36:48 MH: Can I be really honest with you?
36:49 MK: Yeah.
36:51 MH: I think that a data scientist is someone who has the potential to become, hopefully, a great analyst someday.
36:58 TW: Wow.
37:00 MK: Aww. That’s… Aww.
37:01 TW: So, can I give the definitive definition?
37:03 MH: Yeah.
37:06 MK: Yeah, go.
37:06 MH: Yeah, go ahead, Tim.
37:07 TW: A data scientist is a data analyst who works in the Bay Area. [laughter] I did not come up with that, but I will say that… There are definitely plenty of people, it’s a term that’s getting diluted. We wanted it to be this lofty thing, but it’s sort of this weird sort of intersection of the tools being conflated with the role. So there are people who are really, really capable data engineers with SQL, Python, R, under Hadoop and Spark and all that. And they’re like, “I’m a data scientist.” And some would say, “Well, you’re a data engineer, which is a subset of data science.”
37:48 TW: The description that I saw in a, I don’t know, Lynda.com course maybe that had the Venn diagram that I kinda liked, it said a data scientist is the intersection of deep subject matter expertise, which I think is, Michael, where you were saying, kind of they could become a good data analyst. And I think a lot of people talking about data science are saying we have to have subject matter expertise, we have to think like the business. But the intersection of that with computer science, programming, with statistics, and the funny thing with statistics is that statistics are both descriptive and predictive. And that little intersection, I think is a classical or traditional analyst pursuing all of those. I actually am probably in decent shape on the subject matter expertise, but it’s the computer science, which is the pursuit of R and now Python and SQL. I worked with all three over the weekend, and two of them not well, trying to learn them.
38:48 TW: And then the statistics piece and those are, like statistics and computer science, those are not trivial things to bite off. I sort of like that definition. I get a little uncomfortable when it’s like… When we go so lofty that we say you have to have a PhD, but I don’t know, I don’t have a definition. I think it’s like big data, it’s one of those that so many people have just defined it as it suits them or as it suits somebody else who they know really well who has told them what they should think. But I don’t have any strong opinions.
39:22 MH: Alright, let’s throw another level on top of all this. And actually, I love sort of this little mini discussion on data science ’cause that’s like the little chewy thing in the middle of the candy on this episode. [laughter] But, yeah…
39:40 MK: Unexpected but pleasantly surprising.
39:43 MH: So, what about the other one that I wanna ask you guys about which I’m calling a causal inference. So, it’s not description, it’s not prediction, it is why did buying this product make a customer buy that product or does… I don’t know, I’m trying to think of good scenarios that are examples of causal inference, but I feel like we do it all the time in analytics. And where does that live for you guys? And are there patterns emerged that you’ve seen there?
40:17 TW: So I’ll take a swing at it because to me, when you had brought that up… So two things, when we say…
40:25 MH: Yeah, and by the way, that’s totally gonna be my last call, too.
40:28 TW: Okay. [laughter]
40:29 MH: Completely short call.
40:30 TW: So the whole correlation is not causation. That gets dropped all the time, but that’s what regression is supposed to be; that you do actually have independent and dependent so I think there’s a level of when you move into… When you’re logistic regression, you’re using hard numbers to try to actually have some linkage or causality. So put that into one bucket and then the second, because the article you’d shared that sort of broke down data science into three buckets, but different three buckets from what I just shared, one of them being causal inference. And I thought, well that seems to me like subject matter expertise, the stepping back and saying, “Huh, why might that have… ”
41:09 MH: Oh, so just knowing how it works.
41:11 TW: Well, but coming up with ideas of why it might have worked and then you’re feeding back into okay, now, what’s the best way for me to prove? But I don’t know that… And I went and looked up, read on Wikipedia and tried to really understand what… Because it said oh, causal inference is different from inference, and then I didn’t quite understand what it said. So to me, I sort of read that as like, isn’t that critical thinking?
41:35 MH: No, I don’t think so. It’s the process, ’cause… Taken from Wikipedia really quickly, the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause has changed.
42:03 TW: Exactly why I did not understand it. [laughter]
42:06 MK: Yeah, but that doesn’t mean it caused it or it doesn’t mean you can be confident that it caused it.
42:10 MH: You’re inferring a causation.
42:15 MK: Yeah, which, like Tim said, I do think… That’s kind of like the use case that I was talking about before. It’s like, “Okay, we’re pretty confident that this is why this happened.” It sounds like that’s the same process, which does come down to, yeah, like being a subject matter expert or knowing your business inside and back to front that you can be pretty comfortable with making that link.
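The distinction in the Wikipedia definition quoted above, between observing an association and analyzing the response when the cause is changed, can be sketched with a small simulation. Everything here is invented for illustration: the scenario (a promo email that has no real effect on purchases), the probabilities, and the function name `purchase_rate_gap`. In the observational case, loyal customers are both more likely to get the email and more likely to buy, so the email looks effective. When the email is assigned at random, which changes the cause directly, the gap largely disappears.

```python
import random


def purchase_rate_gap(n, randomize_email, seed=0):
    """Purchase rate of emailed customers minus that of non-emailed customers.

    The email never influences buying in this simulation; only loyalty does.
    """
    rng = random.Random(seed)
    emailed, not_emailed = [], []
    for _ in range(n):
        loyal = rng.random() < 0.5
        if randomize_email:
            # Intervention: a coin flip decides who gets the email.
            got_email = rng.random() < 0.5
        else:
            # Observation: loyal customers are far more likely to be emailed.
            got_email = rng.random() < (0.8 if loyal else 0.2)
        bought = rng.random() < (0.6 if loyal else 0.1)  # ignores got_email
        (emailed if got_email else not_emailed).append(bought)
    return sum(emailed) / len(emailed) - sum(not_emailed) / len(not_emailed)


observed_gap = purchase_rate_gap(20_000, randomize_email=False)
randomized_gap = purchase_rate_gap(20_000, randomize_email=True)

print(f"observed gap:   {observed_gap:.3f}")    # sizeable, driven by loyalty
print(f"randomized gap: {randomized_gap:.3f}")  # close to zero
```

Under these assumed probabilities the observational gap should sit near 0.3 and the randomized gap near zero, which is the "confident that it caused it" question in miniature.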
42:34 MH: Okay. ‘Cause that’s interesting, ’cause in my last call there’s this data scientist guy who thinks it’s an area that we should be studying in data science.
42:44 TW: It may be there’s more rigor that could be had. And that may be the… Huh, what?
42:50 MH: I’m just laughing ’cause I was like, “We’re gonna end this show on a cliffhanger.”
42:57 MK: Yeah. This is a nerdy cliffhanger.
43:00 TW: I think one of the examples I had earlier where I’m like, look, I think from this, on the mobile device, I looked at it and I inferred that the issue is, “Wow, this shitty mobile experience is now dead wrong.” I still would point to it. And I think all of us were like, “Yep, that’s it.” But we sort of fell under the trap of like, “Yeah, we’re being a little over-simplistic.” And if there are rigorous approaches or techniques to say, “Here’s how you jump through that,” I don’t know if Analysis of Competing Hypotheses says think about it first, document them, so you’re probably doing some causal inference, but then you’re taking the next step of saying, “Okay, now how do I validate that what I’ve inferred… How do I solidify my confidence in that?”
43:46 MK: How do I disprove what… That’s actually kind of the opposite. How do I disprove what I’ve inferred?
43:52 MH: But what I think is interesting to me out of that is I think this idea of inference or causal inference is very much grouped together with descriptive analytics or analysis, generally, in people’s minds. I had never really separated the two in my own mind. I don’t know if I just, I don’t read enough, but that’s just fine. Okay, we have not a lot of time and we need to do last calls, because whatever’s going on with last calls is gonna be amazing. So, Moe.
44:26 TW: You wanna start since you have a tie right into it?
44:29 MH: Oh yeah. Let me tie this all up in a bow and you guys can go skip to the answers. No, I’m just kidding. Actually, so there’s a website of which I have a growing fascination and appreciation for called dataorigami.net. And one of their blog posts recently was this blog post written by Cameron Davidson-Pilon. I don’t even know if I’m saying that right, but he wrote this idea of the Three Pillars of Data Science: Description, Prediction and Causal Inference. And so I was really fascinated with that. It dovetailed really nicely with kind of what we were thinking about as a topic so I just shoved it in there. And that’s my last call. Alright.
45:08 MK: Right. Tim, over to you.
45:11 TW: Wow, okay then. So I am actually going to respond to Moe, your last call from Episode 88, I think.
45:21 MK: Oh geez.
45:22 TW: Because you had asked… You said, “We’ve heard of This Week in Machine Learning.” And you’re like trying to find a good episode. And I cannot say that I am a totally, a diligent, super diligent listener to This Week in Machine Learning. I will say that I briefly met Sam Charrington at the Marketing Evolution Experience, went to the session that he recorded, and he is a delightful and very brilliant fellow. So kind of related to this podcast, I did sort of sample a few of them and found one that wasn’t… It was just from last month, late June I think. And I think he went to Pegasystems. You guys know…
46:00 MH: Yeah, rings a bell.
46:03 TW: Pegasystems. So, not… They’re a big data company. They had some conference in Vegas. He went to it, did a series of podcasts, but one of them was with Vince Jeffs who is the Senior Director of Product Strategy for AI and Decisioning at Pegasystems, about Omni-Channel Customer Experiences. And it was just this kind of wonderful mix of sort of naive optimism about the potential future applications of AI mixed with a high degree of pragmatism. Topics like the creepiness factor got brought up. And you know what, the guy who’s big into AI and machine learning was saying, “You know what, voice of the customer, trying to figure out what they’re comfortable with totally makes sense.” So that was… I will post a link to it but it was Omni-Channel Customer Experiences with Vince Jeffs on the This Week in Machine Learning and AI podcast. I think it’s a good listen. He was a very, very pragmatic fellow while also telling a story that was actually kinda cool and exciting.
47:05 MK: Nice.
47:06 MH: Alright Moe, take us home.
47:09 MK: Okay. So, a few weeks ago, I went to an event that completely blew my mind. And I think it’s really important we do a shout out. It’s called She Loves Data and they run free workshops for women who are either coming back into the workforce and have spent some time out of the data and analytics community or women who are looking to transition and change careers into the data and analytics space. And there’s actually also, a third group which they didn’t necessarily target, but it ended up happening, is that a lot of immigrant women who have amazing qualifications, who just have these huge resumes full of experience and are entering the Australian market and kinda struggling to find work as well.
47:53 MK: And so they run these free workshops. It’s sponsored by Yellowfin, Keboola, AWS, Facebook, like the whole kit and caboodle. And I just met some of the most incredible women there. And if anyone is in the Asian area, they run workshops in Sydney, in Melbourne, in Singapore, kind of all over. And I’ve also met a whole group of amazing women that are looking for internships, they’re looking for jobs. So if anyone is in that area and looking to bring someone onto their team, please let me know, and I can intro you to seriously the most amazing group of women. And a huge shout out for the work that they’re doing, running these free workshops. It’s pretty cool.
48:33 MH: Awesome. If you’ve been listening to the show, and you feel like Sean Connery meeting Jack Ryan in The Hunt for Red October, and you think our conclusions are all wrong, we’d love to hear from you. So, get out your phone, or your computer, your experience may vary depending on which device you use, and come visit us on our website or on our Facebook page, or on the Measure Slack. We’d love to hear from you. Well, I know that no matter how you analyze data, descriptive, predictive, causal inference with illusory superiority, or data literacy, or whatever it is, I know that for my two co-hosts, Moe and Tim, I’m confident in letting you know that we want you to keep analyzing.
49:25 Announcer: Thanks for listening. And don’t forget to join the conversation on Facebook, Twitter, or Measure Slack group. We welcome your comments and questions. Visit us on the web, at analyticshour.io, facebook.com/analyticshour, or @AnalyticsHour on Twitter.
49:45 S5: So, smart guys want to fit in, so they made up a term called analytics. Analytics don’t work.
49:53 MH: Okay, you guys ready? I’m ready. Let’s do this.
49:56 MK: I know how we like to ramble.
50:00 MH: Moe, you have an interesting life. Try to stick right on the mic please.
50:08 TW: Oh, so now you wanna share with the audience?
50:10 MK: No, no, not that kind of engagement. [laughter] Engaging personality. [chuckle] Well, when we ramble, it seems to work. Maybe we need to have this looser topic plan going forward and just talk shit.
50:27 TW: I don’t think we should plan for it, I think just when it definitely happens, it will be okay.
50:33 MK: You’re doing okay, kid. Stop stressing. Are you actually gonna vacation or are you gonna do a bunch of stupid work?
50:42 TW: And Michael, not that he’s needed.
50:43 MK: He’s needed.
50:49 MH: Listen, she’s about to come back and vacuum right next to the office, so we’d better hurry.
50:56 TW: Hard stop.
50:58 MK: That was a total fucking failure.
51:03 MH: Rock, flag, and rambling topics.