Are you a data scientist? I mean, are you really a data scientist? What does that even mean…other than a healthy salary increase? On this episode of the show, Ian Thomas, Chief Data Officer for Publicis Spine, sat down with the three <sarcasm>co-citizen-data-scientists</sarcasm> who regularly host the show to delve into the subject!
References Made on the Show
- Demystifying Data Science, Part I: What is Data Science?
- Demystifying Data Science, Part II: Data Science vs Analytics
- Demystifying Data Science, Part III: Data Wrangling
- Matt Gershoff
- (Book) Bad Science: Quacks, Hacks, and Big Pharma Flacks
- Women in Analytics Conference
- Digital Analytics Association
- Cassie Kozyrkov’s (@quaesita) xkcd-style “Choose Your Own Data Science Adventure” diagram
- Black Mirror – Bandersnatch
- (Book) Why We Sleep: Unlocking the Power of Sleep and Dreams
- (Book) Dare to Lead: Brave Work. Tough Conversations. Whole Hearts. by Brené Brown
- (Blog post) Adobe Target Standard v. Premium: Help me choose!
- Kelly Wortham’s Test and Learn Community (TLC)
- Test and Learn Community (TLC) YouTube channel
- (Podcast) WTF with Marc Maron – Episode 1000
00:04 Announcer: Welcome to The Digital Analytics Power Hour. Tim, Michael, Moe and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour, and their website, analyticshour.io. And now, The Digital Analytics Power Hour.
00:27 Michael Helbling: Hi, everyone. Welcome to The Digital Analytics Power Hour. This is episode 112. What’s with all this data science stuff, am I right? Since the beginning of this show, we have persevered in our attempt to never quite define it. And, well, this show won’t be that different, hopefully. No, that’s not true. Hopefully, we wanna spend a bit of time demythologizing some aspects of this thing we loosely call “data science”. Tim, you’re kind of a data scientist now, aren’t you?
01:05 Tim Wilson: I like to think I’m more of like a data alchemist.
01:07 MH: Ooh.
01:09 TW: That’s…
01:10 MH: Data magician.
01:12 TW: I’m turning data into gold.
01:15 MH: I’m a data illusionist.
01:20 MH: Moe, you’re kind of more a data scientist than you probably would take credit for. What do you think?
01:25 Moe Kiss: Depends on the day of the week and my confidence levels.
01:28 MH: Okay, alright. And of course, I’m Michael Helbling and I am not a data scientist. [chuckle] Okay. So we needed a guest, and we needed one who could elevate and maybe validate some of our views. And that brings us to Ian Thomas. He’s the Chief Data Officer at Publicis Spine. Prior to that, he held numerous leadership roles in analytics over at Microsoft. He also serves on the board of the Digital Analytics Association. Welcome to the show, Ian.
02:01 Ian Thomas: Hi, guys. It’s great to be here. Thanks for inviting me on.
02:03 MH: Well, great to have you, and big moment for us, this is our very first Chief Data Officer on the podcast, so expectations are high.
02:17 MH: Sorry. Yeah, let’s get you started that way. So Ian, you wrote a series of blog posts a little while back, I think mid last year, about data science and it’s really what got this conversation going. Let’s start with maybe just a recap of some of that perspective and then go from there.
02:37 IT: Sure. The way I’ve always used my blog over, God, I’ve been writing it for 12 years or something now, on and off, is whenever I found myself thinking, “I don’t really understand this topic,” charging myself with the task of writing a blog post about it has always been a really great way of making me get off my backside and actually research the topic, formulate my thoughts about it, and so on. And so that was really the case with this series about data science. I’ve spent quite a lot of my career, particularly more recently, working with and managing a lot of data scientists. And frankly, the reason I called the post “Demystifying Data Science” was because there was this, and still is, this kind of aura of mystery that surrounds the profession, and particularly as it’s become really popular. Recently, that only got… It’s only got worse, even as the number of resources for learning about data science has got hugely larger. And so I would run into data scientists who, if you ask them what they did, would say, “Well, I could tell you that, but it would fry your brain for me to try to explain it to you. So I’m just not going to.”
03:57 IT: And so I thought, “Well, damn it. I’m gonna figure this out myself. I’ve been working with data analytics for the best part of 20 years. I’ve done all sorts of crazy data loading and cleaning and munging and cleansing and loading into analytical systems, and building reports and dashboards, and deeper-level analytical things. I must be able to figure this out.” And so that’s really what gave me the idea to come up with this series of posts. It was very enlightening to try to unpack some of the major components of the discipline and to think about how they relate to the way people have been doing analytics for a long time; that was really a great outcome of the series. Which, by the way, is still in progress. I still have several more posts that I plan to get out at some point.
04:48 TW: It wasn’t part one of three, it was just part one, part two. It’s kind of like Game of Thrones, except that there’s a known endpoint, but I guess…
04:56 MH: You heard it here first, folks. Soon to be optioned by HBO, Ian Thomas’s blog series about data science.
05:02 IT: On data science.
05:03 MH: Very exciting.
05:05 TW: So it’s interesting how you defined data science. I kinda like to think that if you ask 10 data scientists what data science is, you will get 10 different definitions, but they’ll all be in the form of a Venn diagram. You took a bit of a different angle on it. You sort of defined data science in terms of the process for data science, is that right? Kind of thinking from a CRISP-DM approach?
05:31 IT: Yeah, that’s right. And I lifted that from some Microsoft Education I was doing.
05:37 TW: Do they know? ’Cause it’s on the web. Are you in…
05:41 TW: Have their lawyers come for you yet?
05:42 IT: Well, CRISP-DM has been in the public domain for quite a long time, so the Microsoft Education material just references it, fortunately. Otherwise, I would have to pay my salary back or something. But yeah, it spoke to me because we’re all familiar, I think, from an analytics perspective, with the sort of analytics process or life cycle or whatever you wanna call it: figuring out what it is you want to know, figuring out how you’re gonna get that out of the data, messing around with the data to make it behave well enough for you to learn whether that thing happened or not, or to answer the question that you wanted to know the answer to, and then presenting the results, getting the feedback, and going around that loop. And CRISP-DM is… The DM stands for data mining, which is now, of course, a painfully unfashionable term for what is essentially now data science. It’s really just that process: gathering the data that you need, getting it in place, modifying it and cleaning it up and understanding it well enough to be able to use it to build machine-learning models (essentially just predictive analytics), tuning those models, and assessing the results of those models.
07:02 IT: And then the major adjustment I made to it when I re-purposed it for my post was that the CRISP-DM model kind of has this assumption that the models that get built will always be put into production in some way; they’ll be engineered into a production system. And what I saw a lot at Microsoft was data science done in service of analytical deliverables, where a leader might say, “Okay, could you predict for me sales over the next 12 months in this product?” or, “Could we build an idea of what our customer segments are so that we can plan a marketing campaign?” And so there’s no productionalization of the data science in many of those analytical outputs. So for me, that’s the slight difference from the core CRISP-DM. But I thought it provided a really great starting point and a good framework for thinking about the individual portions of the work.
08:00 MK: In your blog articles, there’s a clear delineation, well, I guess a blurry delineation, between analysts and, say, data scientists. The thing that I still struggle with is that in the industry, and I don’t know what it’s like in the States, but in Australia, there’s still very much a view, or there’s starting to be a real view, that you start as a data analyst and then you get better, and when you get better, you become a data scientist. And it’s a conversation that’s actually pretty tough, because that’s a hard thought to challenge, and there’s a lot of value that analysts bring that is unique and different to what data scientists do, which you touch on. Is that something that you see in the States? That kind of misunderstanding that one’s just more advanced, so you keep trucking along and one day you’ll become a data scientist?
08:53 IT: Oh, yeah. Every analyst wants to be a data scientist when they grow up. It seems to be the assumption.
09:01 TW: Guilty.
09:02 IT: But, yeah…
09:02 TW: Guilty as charged.
09:05 IT: And there’s one very, very powerful reason for that, which is that data scientists, as a job title, are paid 30%-40% more than analysts. And so you’d have to be stupid not to think, “Oh, maybe… How do I get called a data scientist?” And for that reason, for all that I find this domain fascinating and the people who are in it fascinating, I like neither the term “data science” nor the term “data scientist” very much, because they are so broadly over-applied to anybody who is trying to get a pay rise, or to anybody who’s managing people who might be about to quit for another company that’s hiring people indiscriminately and calling them data scientists. We had a lot of this at Microsoft, and what we found was that we had two groups of people who were called data scientists. One was a group of what you would genuinely think of as data scientists: people with post-graduate qualifications in data science or statistics and machine learning, so you might have a master’s or a PhD. And then there was a group of other people who might have different backgrounds, maybe in a technical discipline with an analytical background, and they got re-titled data scientists for slightly more political reasons, should we say, or at least personnel reasons.
10:26 IT: And the former group, the ones who really have PhDs in data science, could barely contain their rage at the [laughter] abuse of this term. And that created an interesting dynamic because, again, I would ask people innocently enough, “What does it take to become a data scientist?” And this group of PhD data scientists would be like, “It takes a three-year PhD, that’s what it takes. Next question.” I’m like, well, there must be something that an analyst can do to move in the direction of data science without the answer being, “Well, just quit your high-paying job and go back to school for two years.” That can’t be the answer. Not to mention the fact that tools are evolving very rapidly to enable this new breed, described by a term that I’m afraid I hate even more than the others, though I haven’t got a better one: the citizen data scientist.
11:18 TW: Oh, man. Wait, so Microsoft, did both classes of data scientists report into a growth hacker?
11:25 IT: Yeah, exactly. We didn’t use the term “citizen data scientist” at Microsoft, thankfully, but… And I didn’t have a better suggestion for the term. But there’s this idea of, as an analyst moves in a more data science-y direction, what do they become? And I think some of the answer to your question lies in thinking about the overlap between the disciplines, but also in the fact that analysts can be strong in areas where data scientists typically are not. There are a bunch of important areas that analysts can really excel in, and those typically turn out to be things like engagement with the client or the stakeholder. Communications, data presentation, and business context can often be areas where analysts are stronger than data scientists.
12:20 TW: So Ian, I kind of credit you. I started down the journey, for a hot minute, thinking I was gonna become a data scientist, and then realized I wasn’t, and had, I think, less clarity in my thinking, so I started saying I was gonna be a data science-y analyst. And you’re the one who, when I was 18 months or two years into that and we were having a discussion, said, “Why don’t you just say you’re an analyst who does some data science?” And my response was, “Well, ’cause clearly I don’t have as good a command of the English language as you do.” [chuckle] But I do think that that pursuit by the analyst… Because I’ve had analysts who said, “Look, I’m doing great. I’m communicating with the business. I understand the business. I understand the data collection. I can go to town in a BI tool or Data Studio. I don’t need to get into that world at all.”
13:10 TW: And where I think that is fundamentally flawed is this: the big epiphany for me, as I’ve been trying to understand that world more, is that there’s the coding part, which is like, “Okay, I can automate and script data way better.” That’s just an efficiency gain. But there’s this other piece that is kind of statistical thinking or analytical thinking, starting to think about the stuff that Matt Gershoff has been preaching to me for years around understanding uncertainty, quantifying uncertainty, thinking through the cost of the data in the modeling, and recognizing that I’m not on some march towards the truth. And I don’t know, I’m still working to frame all of that.
14:00 TW: But I feel that analysts, call them traditional digital analysts, who are super amazing in Google Sheets and Excel and at pulling the data and answering questions and segmenting data and having all those business-oriented discussions, can still fall into this trap of, “When I just get enough data, I’ll give you the answer.” If you look at all of the bastardization and misinterpretation of A/B test results, a lot of that comes from analysts, ’cause they haven’t quite gotten enough of a foundation in statistics to have better intuition about how to interpret confidence and power. And I don’t know if that’s fair; maybe I’m just trying to justify this ponderous trek that I’ve been on. But I feel like there’s a risk that analysts will never be able to effectively engage with data scientists if they haven’t started doing some data science and upping their game a little bit on that front.
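Tim’s point about interpreting confidence and power can be made concrete with a back-of-the-envelope sample-size calculation. This is a minimal Python sketch, not anything the hosts or guest described: it assumes a two-sided, two-proportion test using the standard normal approximation, and the function name `sample_size_per_arm` and the 10% vs. 12% conversion rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate visitors needed in each arm of a two-sided two-proportion test
    (normal approximation, unpooled variance). Illustrative only."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # sets the Type I error rate (alpha)
    z_beta = NormalDist().inv_cdf(power)           # sets the Type II error rate (beta = 1 - power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control                 # the smallest lift we care about detecting
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: 10% baseline conversion, hoping to detect a lift to 12%.
print(sample_size_per_arm(0.10, 0.12))
print(sample_size_per_arm(0.10, 0.12, power=0.90))
```

Asking for more power, or trying to detect a smaller lift, pushes the required traffic up quickly; the figures are only as good as the normal approximation behind them.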
15:07 MH: But Tim, I wanna ask you a question. Do you think that your experience as an analyst first put you in a better position to apply statistics successfully? Because what I’ve observed is that people with statistical knowledge sometimes really fail to grasp what we’re solving for as analysts.
15:29 TW: Maybe. Except I would say that it took me an embarrassingly long time to undo some of the irresponsible ways of thinking about data that I’d been brainwashed into by mainstream analytics platforms, due to… And to me, it comes down to aggregated data versus detailed data. I was so comfortable in the world of dimensions and metrics that moving back into statistics was kind of a nightmare until the light bulb went on. And it’s kind of a simple pivot, but literally no one was making that leap, which to me is easily 50% of the talking at cross purposes when a traditional analyst talks to a…
16:17 MH: Oh, yeah.
16:18 TW: Data scientist or statistician, is literally that one thing. So I struggled a lot, because I wasn’t thinking statistically. But I don’t know, I don’t know. Ian, what was your thought? [chuckle]
16:31 IT: Yeah, I would agree. I think knowing some data science, particularly these two kind of gateway drugs, knowing some data coding and having a good grounding in the concepts of statistics and uncertainty, makes for a better analyst, for sure. And I do think those are skills that many analysts will need to acquire in order to continue to add value. And the piece about uncertainty and experimentation, I think, is very important. One of the jobs I did at Microsoft was running the data platform for the Bing search engine, and one of the things built on top of that data platform was our experimentation platform. The experimentation platform, I think, now runs something like 10,000 to 20,000 experiments a year on Bing. And the thing that was interesting about running that experimentation platform was that I got a ringside seat for the evolution of experimentation-driven thinking within that engineering organization: the idea that you should try things out, test them and learn, and then keep iterating. But that uncertainty was all around you.
17:51 IT: And so you are constantly in this battle between trying to get results while managing the uncertainty and the quality of the results. And interestingly, over time, that thinking started to seep into other parts of the company, firstly into other engineering teams, and so now the development of Windows is much more driven by testing and learning than it ever used to be. And then more recently, into the marketing organization. And so one of the interesting sets of conversations I was having when I was there was with marketers who were very used to, to your point, Tim, this idea of you do a campaign, you run a bunch of numbers on the back of the campaign, you gaze upon the numbers, the numbers either result in a smiley face or a sad face, depending on whether you generated some sales or whatever, and then you go into the next campaign.
18:41 IT: And the idea that, instead, you should treat your marketing activity, and a lot of your customer engagement more broadly, as a continuous opportunity to learn and to evolve what you’re doing was a… It’s quite a revolutionary idea, and it speaks to the next phase of actionability of analytics, because it replaces this very traditional mode of saying, “Let’s look at the numbers and see what they tell us that we should do,” with this mode of, “Let’s decide on the decision we’re trying to make and see if the numbers tell us whether to make that decision, yes or no.” And when you frame it that way, you have to factor in whether you trust the numbers or not, and some of these principles… And essentially, you’re doing a form of predictive analytics, which takes you off into the land of data science.
19:36 TW: Well, and it also opens up the immediate… One of the options is that we need to go gather data, and an A/B test is one way to gather data, but there are other ways to gather data. This is something that both Matt Gershoff and the data scientists who joined SDI at the beginning of the year keep hammering on: clarity of problem formulation. And I guess this goes back to where I feel like… We traditional analysts, when somebody asks for data, say, “What action are you gonna take if I give you that?” To me, that is kind of this simpleton base level that good data scientists… And I think it’s a point well taken that if somebody comes purely from a statistics background or a computer science background, they may really not be able to engage with the business; that’s totally a fair concern. But I think there’s some degree of having structure around the decision you’re making, quantifying or understanding the uncertainty, and thinking through the options.
20:44 TW: Even the idea of having intuition around Type I versus Type II errors, and thinking, “How much more am I willing to invest to incrementally reduce my uncertainty?” is crazy powerful. And there’s been a whole discussion in the optimization industry over the last year or so, where some really sharp people are trying to hammer that home, saying, “We’re just experimenting for the sake of… Because we’ve been conditioned to think, just test everything and pick the winner, or look for confidence, don’t shut your… Don’t peek at the results.” All these arguments and debates, when I think there is this fundamental intuition that is often missing in the digital analytics and optimization realm around problem formulation and real clarity about the decision that’s being made. And I’m clearly still struggling to articulate what I think that gap is.
21:47 MH: Yes. Yes, you are.
21:50 IT: Yeah.
21:50 TW: I am. Yes, I am.
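The “don’t peek at the results” rule Tim mentions is about Type I errors. A small simulation makes the point; this is a minimal Python sketch under stated assumptions, not a method from the show: hypothetical A/A tests with a 10% conversion rate in both arms, a pooled two-proportion z-test, and ten evenly spaced looks. All function names and parameters are illustrative. It compares checking once at the end with stopping at the first look that shows p < 0.05.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both arms: no evidence either way
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_experiments=1000, looks=10, visitors_per_look=100,
                      rate=0.10, alpha=0.05, seed=42):
    """Run A/A tests (no real difference) and return false-positive rates for
    (a) one look at the end and (b) stopping at the first look with p < alpha."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        stopped_early = False
        for _ in range(looks):
            # Add one batch of visitors to each arm; True counts as 1 conversion.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            if not stopped_early and two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
                stopped_early = True  # a "peeker" would declare a winner here
        if stopped_early:
            peeking_hits += 1
        if two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
            fixed_hits += 1
    return fixed_hits / n_experiments, peeking_hits / n_experiments


fixed_rate, peeking_rate = simulate_aa_tests()
print(f"False positives, single look at the end: {fixed_rate:.1%}")
print(f"False positives, stop at first 'significant' peek: {peeking_rate:.1%}")
```

Because neither arm is actually better, every “significant” result here is a false positive. The single final look should land near the nominal 5%, while stopping at the first significant peek typically comes out several times higher; the exact figures depend on the seed and the simulation parameters.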
21:52 MH: But hold on, because in all of that, between the two of you just now, I think I heard something and I just wanna say it back to you and get your thoughts on it, and that is, do you think data science helps propel analysts into a better structure within their organizations in terms of being asked to solve a business problem or a business decision, as opposed to something more like the nature of, “Puke out some data and reports for me so that I can use my crappy intuition to make the choice”?
22:30 MK: Can I just say I found the total opposite?
22:34 MH: Okay.
22:35 MK: I think one of the biggest frustrations I have is that data analysts and data engineers are increasingly being asked to do tasks that are outside of their realm. So data engineers were doing analysis, they were doing piping, getting data ready, cleaning it. Data analysts were starting to delve into the world of data science. And then the data scientists were kind of like, “Well, actually, we’re not gonna do any data engineering or data analysis,” which was exceptionally frustrating, ’cause you’re trying to point out, “Actually, a really good understanding of how our event tracking, for example, works might help you build that model that you’re trying to build.” And they’re like, “But I don’t need to know that ’cause that’s a data analyst’s job.” And then as a data analyst, you’re like, “Oh, but I have to learn your stuff so that I can help with more complex questions.” Do you see what I… I don’t know. I just feel like, yes, it is making better data analysts, but I don’t think that they’re necessarily getting the credit for that that they deserve.
23:36 MH: An analyst is never gonna get the real credit they deserve, let’s just… [chuckle] That’s our task.
23:43 IT: I think data science presents an opportunity and a challenge for analysts. The opportunity is that the thing analysts can bring to the table in this new world is that they understand… They should understand the business context of the work they’re doing. It’s often said that a key data scientist skill is understanding business context. That is true, and I’m not gonna deny that, but in practice, in an organization that may have fewer data scientists than it has analysts, the data scientists may hop from one project to another, and so they are often relying on folks who are a little closer to the business to help frame the problem. But the flip side of that is the challenge it offers to analysts, which is that they must use that business context to drive much higher-quality conversations with their stakeholders about what those stakeholders are trying to achieve.
24:41 IT: So again, at Microsoft, we came up with this term, “outcome-driven marketing,” which was a bit of a… It’s very obvious when you describe it, but the idea was that when somebody’s trying to execute a piece of marketing, they should have an idea of what they are trying to achieve. And all too often… And we were pretty good at doing data-driven marketing at Microsoft, probably one of the best companies in the world at it. But still, there were lots of campaigns where, before the campaign was kicked off, somebody said, “Well, you know what, I’m trying to drive downloads,” and then the campaign would execute and we’d measure a dozen key metrics of the campaign, and downloads wouldn’t move. And so somebody would then go, “Oh, but look, we got quite good click-through, so we feel okay about the campaign.” [chuckle] And that’s just bad science.
25:32 IT: There’s a whole excellent book by a guy in the UK, whose name I can’t remember now, called Bad Science, which is about just this: setting one objective for your experiment and then reporting something completely different when you write up your paper. You have to stick with the outcome that you came in with. And analysts potentially have a really great role to play in driving that kind of rigorous, outcome-driven thinking with their stakeholders, and in using that to inform not just the analysis that’s done, but the modeling and so on. But if they don’t do that, then the analysts who just react to whatever the stakeholder is asking to look at, and just kind of puke out numbers… I think that skill set is gonna become highly, highly commoditized over time, and those analysts will not thrive.
26:26 MH: Yeah, ’cause even small parts of that are automated today. You can get context around numbers through a machine learning algorithm, if you want it.
26:37 IT: Exactly. And a lot of analytical work over the last 10 years, 10-15 years, has been kind of greenfield work. It’s been people setting up analytics functions and building out increasingly automated analytics and reporting systems. If you run your marketing campaign through Salesforce Marketing Cloud or Adobe Marketing Cloud right now, you get a bunch of reporting analytics coming out almost for free. So handing that information over to the client, that’s not a full-time job anymore.
27:10 TW: I do think there are lots of cases where machine learning gets applied to the analytics problem, as it’s sometimes irresponsibly framed, and you wind up getting context, but it doesn’t mean you’re actually doing anything right and well. You can check the box that there’s some algorithm running or there are those tools out there that say, “People want an email with the insight, so plug this into your Google Analytics and it’ll send them a daily email that tells them… Use machine learning to tell them that bounce rate went up and the biggest driver of that was direct traffic.” Which is trying to circumvent that hard work of saying, “What is the business outcome?” or, “What is the business problem we’re trying to address?” So I still feel like there’s a lot of hand-waving around machine learning that says, “Ah, we use machine learning.” It’s like, “Well, you use a statistical operation to do one thing that may or may not be useful.”
28:17 TW: There was a time when Adobe Analytics came out with anomaly detection and there were people coming from an analytics background saying, “This is perfect. We’ve applied machine learning. The analysts can now come and get their list of what are the anomalies, and then they can dig in and figure out what drove those anomalies and we have assisted the analyst.” And my head wanted to… Figuratively exploded when that happened, because it’s actually circumventing what actually… That was kind of irresponsible analysis, but it would actually have the label, the veneer of fancier machine learning put on top of it to say, “This is progress.” And it was not, in my mind. But I guess I don’t have strong opinions about that. Still working on saying it succinctly.
29:04 MH: Well, you just keep working on that, Tim. No, I don’t disagree with you, Tim. And you made a number of points, but overarching, I don’t disagree with you.
29:19 TW: Can I ask about the data science… I feel like the data scientist, and Ian, when you started saying, maybe not the biggest fan of the term, and you do wind up with, now, people are striving for it and the salary differences, which we’ve kinda touched on on past episodes. What about the data scientist being sort of perceived as the magician? I feel like I’ve seen data scientists that it’s thought, “Hey, we have shitty data, but we have a lot of it, and we’re not real clear on what we’re trying to actually do. But, by God, we have a lot of data. Let’s just throw a data scientist at it and then tell them to go, ‘Run me an algorithm and give me the results.'” I feel like data scientists wind up stuck in that boat. If a data scientist tries to say, “Time out, let’s talk about your business,” they can wind up getting pushed back because they’re seen as having kind of magical capabilities, but they still need to have a business foundation that they’re trying to build on. Is that a fair observation?
30:24 IT: Yeah, I think there’s a couple of things in there. I think data scientists, particularly the sort of folks who were earlier into the field, have been exploited, and also have themselves a little bit to blame in some situations for that exploitation. And I’ll explain what I mean by that. They’ve been exploited in the sense that a talented data scientist, this person who can take a whole mess of data and explore it and clean it up and extract features from it and build an effective model that predicts a useful outcome and deliver that in a way that is reasonably comprehensible to a non-data scientist, is a wonderful thing. And as with many emerging fields, the fact that that set of activities in data science has not historically been terribly well understood or clearly defined as separate pieces of work means that it’s very easy for people to just rely on one person to do all of it. You lock a data scientist in a room with several gigabytes of data for two weeks and then only let them out when they have a model.
31:35 IT: The flip side of that is that, and this is not exclusively the case for data scientists, lots of highly skilled technical people will, if you let them, take as few dependencies on other people as possible. And so they will tend to embrace that work style. And much as they might complain about being locked in a room for two weeks, they almost welcome it, or they have historically. And so that creates this myth… It becomes this black box problem, where it’s hard to unpack exactly how the work output was arrived at. One of the most useful things I learned as I dug into this topic, and I cover this in one of my posts, was that data scientists spend probably 80% of their time cleaning the data and preparing it for the building of the model. And as I wrote in that post, that’s not quite as hopeless a number as it sounds, because it’s a bit like if somebody comes to paint your living room and they spend 80% of their time sanding it down and prepping it and filling holes and making sure everything is ready, and then on the very last day they’re there, they spray everything and it’s all beautiful and, hey presto, they’ve painted your living room. It’s kind of a little like that.
32:49 IT: That said, what I have seen many, many times over is data scientists who spend that 80% of their time not knowing, or even worse, not caring, that the guy or girl in the next room just did 50% of that work themselves for a similar project two weeks ago, and they’re not leveraging any of that work. Even to the extent of saying to somebody, “You do know that your colleague just did this?” And they’re like, “Well, dang it, because somebody else did it, I don’t trust it. I have to see it myself. I have to see it myself.” I’ve had data scientists ask me, “I need the raw logs.” Like, “Really, you need the raw logs? You don’t want… ” Because there’s this idea that in the pre-processing of data, some gem of a feature, something that might provide that predictive power, might somehow get lost in the shuffle, and that’s a classic case of the perfect being the enemy of the good. So unpacking it is, I think, for everybody’s benefit, but there’s a bunch of compromise that goes along with that, which I think is important to discuss as well.
33:56 MH: Moe, you were about to say something, I think.
34:00 MK: Yeah, it’s a bit controversial though, but sometimes, I just sit back and I kind of wonder, are analysts that aren’t prepared to call themselves, or analysts who are probably leaning more in the spectrum of doing data science-y tasks, in team lingo, who are not willing to call themselves a data scientist or apply for a data scientist job, are they just chumps? Because [laughter] there are so many people in my industry that are doing that. I know you can get extra money, but the place that it actually probably most concerns me, is that… And I know a couple companies that I’ve spoken to lately that are like, “We’re only hiring data scientists.” And what I said was, I was like, “My concern is that you’re also probably gonna hire less women.”
34:45 MK: And the reason is that, and this is my own experience, I have lots of male peers that are very comfortable calling themselves data scientists without any qualifications. The only woman I know who calls herself a data scientist has a master’s in data science. Every other woman I know won’t even apply for a job if it’s got the title “data scientist”, even if they have enough of the skill set. And I just kinda sit back and I’m like, “Should we all just… ” Stitch big studios now, they call everyone a data scientist, even the people that were, I guess, traditionally data analysts. They’ve called them all data scientists because they felt like there was this delineation between data analysts and data scientists. Is it just at that point now where the term has lost its integrity and everyone should just adopt it so we all get the extra cash?
35:37 TW: Oh, that’s painful.
35:38 MH: Well, there’s two problems with that. One is, I think we haven’t defined the value of the other roles effectively, from an industry perspective, and so people don’t understand the value creation process as it pertains to being an analyst versus a data scientist, and then organizations are reacting to that by just trying to get the highest end help they think they can get, which is obviously a data scientist, and then that is creating this almost inequality that you mentioned, Moe, of… There now will… Do people pretend they are? I don’t know what the gender breakdown of that is, maybe men are more confident in that regard. I never have that confidence, so I’m with the ladies on this one, but I see what you’re saying.
36:29 MH: And it seems like the solution has gotta be along those two lines. First, we have to find a way to define the roles so that the right candidates can apply for them; and the second is, understand and evaluate, as an organization, the different roles that we actually need to make analytics happen correctly, ’cause I think there’s a part to play. And so if I’m a company and I decide I’m gonna rename everybody in the organization a “data scientist,” well, you’re just gonna run into a really weird situation. Probably something along the lines of what Microsoft was running into, Ian, where the people who really are data scientists are now seeing some guy who’s spitting out Adobe Analytics reports or whatever, being called a “data scientist” and being pretty upset about it. I don’t know.
37:15 MK: Yeah, it’s this weird tension. And I feel like those of us that are like, “I’m not gonna call myself a data scientist because I don’t have that skill set,” we are kind of losing out and fighting to be like, “Well, actually, this is the value that an analyst can bring to the table.”
37:31 IT: Yeah. What we’re looking to do at Publicis, actually, ’cause it… A big part of my job here is, we have several thousand data analytics professionals around the group all over the world, in all sorts of different agencies and brands and so on. And one of the things I’m doing is creating a structured framework for thinking about the skill areas, the competency areas that people in that group actually have. And we have… We’re calling them crafts, and we’re thinking about them as being these distinct sets of capabilities. And I’ve had several conversations so far, where people have said, “Are we talking about job titles here?” Then I said… And I’m saying, “No, these are adamantly not job titles. They are competency areas.” And we have to address job titles. I can’t just ignore job titles, but the idea here is that people are not just defined by the job title they have.
38:36 IT: If we can capture people’s crafts in a way that is reliable and believable, then within the organization, at least, we can say to somebody, “I need somebody who has a level three skill set in data science and a level two skill set in data strategy,” for example, being another one of the areas that we’re focusing on. And that somebody has a portfolio across these crafts of capabilities that they can bring to a particular project. Because I think this problem of title inflation, it’s not new in this… Nothing is new under the sun, [chuckle] but it is true that it is a problem here. I think you make a really important point, Moe, about how it impacts diversity from a hiring perspective, and so that’s the best idea we have, is to try and think about people’s skill sets as componentized sections rather than, “What is a data scientist or an analyst?”
39:38 TW: It’s kind of funny is, having just gone to the Women in Analytics Conference a few weeks ago, it’s the Women in Analytics Conference, and many of the sessions are actually data scientists or are tackling things that would fall under the data science umbrella. So I don’t know if that’s a reflection of the phenomenon that you’re pointing out.
40:02 IT: Well, yes. Funny you should say that, Tim, because some of the best data science content I have seen presented in DAA events, the Digital Analytics Association events, have been from women. And I probably would be over-stretching to draw a direct correlation, but that’s women who think of themselves as analysts but are really data scientists coming along and presenting at these events when all their male colleagues are like, “I am a data scientist. I’m gonna go and present to this fancy pants machine learning event instead.”
40:31 TW: “What do you mean VLOOKUP? What’s that?”
40:33 S1: Yeah.
40:36 MH: Alright. Well, this topic is one that could fill many a show, and we’ve done filled up this one. [chuckle] So this has been a great conversation. One of the little things we love to do is go around the horn and do a last call. Something we found recently that we liked, find interesting, and things like that. So Ian, you’re our guest, do you have a last call you wanna share?
41:01 IT: Yes. I just saw this tweet today from Cassie Kozyrkov of Google, where she had created an XKCD style cartoon. So you can go and look at this tweet online. Her Twitter handle is @quaesita, Q-U-A-E-S-I-T-A. And it’s choose your own data science adventure, so it’s… Are you making decisions? Yes or no? How many decisions are you making? Are there any… Is there any uncertainty using lots of data? And it helps. It’s actually just a really nice little way of understanding whether you really want data analytics, whether you just need statistics, or whether you really need a full-blown machine learning. So I thought that was a really nice short-hand way of putting that in a nice accessible way.
41:48 TW: It’s kind of like a Black Mirror: Bandersnatch meets XKCD in a data science context?
41:55 IT: That’s right, yes.
41:55 TW: If you follow the path, you don’t like where it leads, you start over and choose a different adventure.
42:01 IT: Yeah. Exactly.
42:04 MH: Moe, what about you?
42:05 MK: I’ve got two, but one doesn’t really count because it’s been a last call before, but it’s completely changing my life. And if you wanna be a better analyst or you wanna be a better data scientist, whatever you call yourself, Aubrey Blanche, who was on the show a few weeks ago, talked about this book called “Why We Sleep” by Matthew Walker. Hands down, the best way you can make yourself a better analyst/data scientist, or smarter, or remember more, or whatever it is. I swear to God, this book is actually changing my life, it’s in the top three books I’ve ever read and it’s gonna make you question everything ever, including how you study, how you learn new languages. Yeah, so that was kind of my half of one because it’s actually already been done. But guess who’s back?
42:50 MH: Who?
42:51 MK: Nothing? My favourite Frenchie, who’s actually from Texas, Brené Brown. I’ve been reading her latest book “Dare to Lead,” and yeah, it’s just… Yeah, so basically, “Dare to Lead” talks a lot about vulnerability as a leader, and I think what I like about it most is how she talks about just oversharing is not being vulnerable; being vulnerable is actually sharing the right information at the right time in the right way. So Brené Brown’s book “Dare to Lead” is the other one that I’ve got my hot little hands on.
43:24 MH: I’m a big fan of Brené Brown.
43:26 TW: Dare to Lead. Nice. Michael, you’re next.
43:28 MH: Alright, Tim. Okay, fine, I’ll go next. Thank you for asking. [chuckle] I recently got a chance to read, and I try not to do this too much, but we’ve got so much gosh darn talent at Search Discovery, and Kelly Wortham wrote this amazing blog post just comparing some of the feature differences between some of Adobe Target’s products like the standard and the premium version. And I know that’s not gonna hit all of our audience across the board, but all of us are sort of what is really the… And it’s not exhaustive, but it’s so good and so helpful in helping think through the different things that are involved with those two products. I just love when someone brings that kind of clarity to something that just hangs out there a lot, when not a lot of people really understand it. And you know where all of that information and thought came from was the community that Kelly leads alongside others, the Test & Learn Community, or better known as the TLC. And they have an amazing community, Slack group, and they’ve actually do… I wanna say it’s monthly panel discussions. It’s almost like a podcast, really, but they do it live and they do it like a panel discussion around some kind of topic with experts and things like that. I think, Tim, you’ve been an expert on there before.
44:56 TW: Well, I was on the panel. I wouldn’t say I was an expert.
44:58 MH: Just go with me. But the cool thing is that now…
45:01 TW: I was just the data scientist representing on the panel.
45:06 MH: You were the representative data scientist?
45:08 TW: Yeah.
45:09 MH: The citizen data scientist?
45:10 TW: Yeah, I was a citizen data scientist.
45:13 MH: The upright, citizen data scientist. But no, the awesome thing is they’ve now got all of these conversations up on YouTube, it’s a treasure trove of testing and optimization learning. And so it was almost like a master class. Just go through the channel, watch some of the videos. It’s incredible stuff. Alright, that’s enough of a spiel. I just… I could not be silent any longer.
45:42 TW: The thing is, the Test & Learn Community pre-dates… We had Kelly on the podcast before either she or I were even a glimmer in the eye of working at SDI. And she was, at the time, I think it was just a conference call, but she has kind of stayed with it and it has grown and it’s…
46:00 MH: It’s just become an amazing group. And it was amazing before, too. It’s just now they’re available for everybody to check in on and listen to. Okay, Tim.
46:11 TW: So I’ll do a twofer as well. And one of these is, as of eight days ago, as of April 1st… No fooling. The DAA, Digital Analytics Association, has had a Women in Analytics mentor program, that I was part of the pilot and then the first year or so of it. And they have now opened that up to men. That’s got a little bit of a caveat. They want to continue to stick with the DAA standards for diversity and inclusion that are connected to the Women in Analytics initiative. So any pairing of a mentor and a mentee has to include one woman, but historically, it was men could be mentors, they could not be mentees. Well, now, men can be mentees.
46:53 MH: Yes.
46:54 TW: They just have to select a female for a mentor. So that’s…
47:00 MH: I think it’s awesome. A great expansion of the program.
47:03 TW: Yeah. So if you’re a DAA member or interested in being a DAA member, check that out. My second, I recognize, is gonna be very focused and appealing to a very, very narrow niche. But recently, the WTF with Marc Maron podcast had their 1000th episode, so… And this, I may just want my co-hosts to listen to this more than anything else, ’cause for the 1000th episode, they sat down with Maron and Brendan McDonald, who’s his producer, and they had a good long discussion about the history of the show and what goes into producing it and how they think about it. One, I realized what an absolute horrible slacker of a producer I am in that role, but a ton of it resonated with me as kind of what we would like this podcast to be. Never, never, never gonna be on the scale of his show, but if you’re all… So I felt like I was listening to a couple of big brothers who have really hit the big leagues, talking about the world that I operate in. So probably, more of interest to somebody who’s actually a listener of that podcast already, but lots of it really hit home with me.
48:17 MH: So I think maybe because we do a Year in Review every year, maybe we don’t have to do a big, introspective 1000th episode.
48:25 TW: You know what, we’ll talk about that when we get to our 1000th episode.
48:29 MH: Yeah, it’s been really…
48:29 TW: Things [48:30] __ like we decided that we were gonna release twice a week from day one. And for 10 years, we have released twice a week. So just as I am aggressively, we are not gonna miss, with the exact opposite, every other week. So there are little things in that that maybe meant more to me than anyone else. If Moe and Michael listen to it, they’d be like, “Oh, this is what Tim’s problem is.” But, [chuckle] anyway. So…
48:57 MH: Delightful.
48:57 TW: It’s a good lesson.
48:58 MH: A little view into the mind of Tim Wilson.
49:02 MH: Okay, excellent last calls. Even better discussion around data science. I’m sure, as you’re listening, whether you’re an analyst, a data scientist, or somewhere in between, you’ve probably heard something that’s piqued your interest and we would love to hear from you. The best ways to reach out are always through the Measure Slack or on Twitter or through our LinkedIn group, if you wanna reach out to us that way. Ian, can’t thank you enough for taking the time from your busy Chief Data Officer schedule. That’s pretty exciting for us to have a Chief Data Officer on the show, our first ever. So thank you so much for coming.
49:42 TW: You set the bar high for future Chief Data Officers.
49:44 MH: Yes, absolutely. All you Chief Data Officer listeners, please reference this episode as you prepare to be on this show…
49:52 TW: Your pitch to be on the show.
49:53 MH: Yeah.
49:54 IT: Well, you are so welcome. It’s been a real pleasure to be here, and I’m realizing a long-held ambition to be on the Power Hour. [chuckle] So we’re equally excited by our respective opportunities, so thank you very much.
50:09 MH: Wow. You have filled it out nicely, sir. No, but seriously, for my two co-hosts, Moe and Tim, and for myself, I think whether you’re a data scientist or an analyst or anywhere in between, I think you can always still keep analyzing.
50:31 S1: Thanks for listening, and don’t forget to join the conversation on Facebook, Twitter, or Measure Slack group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour, or @AnalyticsHour on Twitter.
50:51 S?: So smart guys want to fit in, so they’ve made up a term called “analytic”. Analytics don’t work.
51:00 S?: Analytics. Oh my god, what the fuck does that even mean?
51:08 MH: Or at least, I don’t know anything until someone starts talking, and then I realize, I knew it.
51:12 TW: I was gonna say, since when has that ever stopped you?
51:14 MH: Yeah, I know. I will be fine.
51:17 MK: Okay, so it’s just a buzz word?
51:18 MH: Yeah, yeah. One-to-one marketing, 360 view, blah, blah, blah.
51:22 IT: You’re welcome, Michael.
51:24 MH: Thank you. That validates so much of what I’ve already decided to say, so that’s really helpful.
51:32 IT: It’s out of the dumpster fire into the dumpster fire right next door. Yeah.
51:41 MK: I didn’t… I intentionally didn’t ask that question, but it just naturally progressed to that. So you can’t blame me for that one.
51:48 MH: You’re an amazing interviewer, Moe.
51:51 MK: Oh, I’ve just finished it and every single person I talked to in the world is hearing about it, and they’re like, “I’m going to kill you, stop.” The reason you’re fat, sleep. The reason you’re dumb, sleep. The reason you can’t remember stuff, sleep. It’s all sleep. The reason you got Alzheimer’s.
52:07 MH: You’ll have to get through Tim and I, first, in terms of idiocy. That’s what it is.
52:13 MK: I was wondering where you were going with that.
52:17 TW: I’ve finally decided that I was gonna drink during a show recording for once, and now I’m just gonna burp through the show recording. So…
52:24 MK: Maybe you shouldn’t drink beer, then.
52:27 TW: Well, I’m taking it easy.
52:31 MH: I think a data scientist could become a great analyst.
52:34 MK: See, that’s where we need to get to.
52:36 TW: If they want that pay cut… [chuckle]
52:41 MH: Yeah, I know.
52:41 TW: [52:41] __ stuff like that.
52:45 IT: Yeah, I wasn’t gonna go there with that comment, but I’m glad you just rolled off the Band-Aid on all the…
52:51 MH: Are you listening? Are you listening? No, just kidding.
52:58 MH: Rap, [53:00] __ chumps who get paid more.