Items Referenced on the Show
- 8-bit Sheep
- Simo’s Blog
- The Notorious B.I.G.
- Simo’s The Notorious B.I.G. site on the Wayback Machine
- Integrated Development Environment (IDE)
- The Measure Slack
- R Weekly Blog
- The Tidyverse
- Stack Overflow
- (Past Episode) #058: Analytics in an Agile Organization with Simo Ahava
- Customer Data Platforms (CDPs)
- Data Management Platforms (DMPs)
- Simon Rumble’s Blog Series on Modeling Your Event Data
- Sydney MeasureCamp
- (Podcast) The Trouble – 79 Cents
- (Podcast) The RFK Tapes
- (Podcast) Crimetown
0:00:04 Announcer: Welcome to the Digital Analytics Power Hour. Tim, Michael, Moe, and the occasional guest discussing Digital Analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website, analyticshour.io. And now, the Digital Analytics Power Hour.
0:00:27 Michael Helbling: Hi, everyone. Welcome to the Digital Analytics Power Hour. This is episode 99. By now, you know this show is about measuring things, specifically digital analytics things and powerfully. Like, for instance, have you noticed that each show over the last 99 has always been exactly one hour? No, go back and listen if you don’t believe me. Okay, wait, don’t do that. But we do talk a lot about the skills required on this show to be a great analyst. Is there a baseline of technical competency you have to achieve to be a good analyst? And what is the extent of that competency? Let’s introduce our hosts.
0:01:14 Moe Kiss: Hi, I’m Moe. I’m still trying to figure out the answer to that question in my daily pursuits of analytics.
0:01:20 Tim Wilson: And I’m Tim Wilson, the Grumpy Cat of analytics.
0:01:24 MH: And I’m Michael Helbling. I’m as technical as I need to be. Occasionally, we have a guest who somewhat personifies the topic and after completing his master’s degree in English Philosophy and general linguistics, Simo Ahava has endeavored to enhance his technical skills. Many of us in the analytics community have benefited from his desire to learn through his website, and today, he is a partner and Senior Data Advocate at 8-bit-sheep, and also one of the most prolific and thoughtful technical leaders of our little industry. His blog is the basis of much study by analysts worldwide, and he is the most prolific contributor to The Measure Slack, so far in 2018, but on this show, once again, he is our guest. Welcome back, Simo.
0:02:16 Simo Ahava: Thank you so very, very much for that introduction. I’m so happy to be back and also have Moe among the hosts this time.
0:02:24 MH: Yeah.
0:02:24 MK: I know, I’m really excited. That was one of my favorite episodes from last year, and I missed it.
0:02:29 TW: And you don’t have to change jobs every time you come back. So if you want to stick with this one for a while…
0:02:34 SA: Give me 40 or 50 episodes, please let me enjoy my current job for a while.
0:02:39 MH: Steep in it for a while. Well Simo, welcome back, and I think this is the crux of the conversation or where we want to start is. This is something that I think a lot of people struggle with because a lot of us come to analytics from so many different directions. We do struggle with where should we focus ourselves from a technical perspective, how much effort should we put into that? And I think that’s sort of like maybe a good starting point. I didn’t know until I started trying to find something curious about your background that your studies in university actually weren’t very technical. And so, actually, that was delightful to me, and I’m actually really curious how your own journey started and you transitioned in some of the things you’ve done over the years.
0:03:28 SA: Well, yes. The English studies had nothing to do with the kind of IT or web development background that I’ve accumulated. That was all started as a hobby. I was a teenager before the Internet thing started to blow up, or during it, before Google, at least, and I was always fascinated by my dad’s old Compaq. I think we might have even had an old Apple, Apple 2 at one point, but I was always fascinated by computers, and I’ve… Like the first thing I’ve been told that was a cause of concern for my parents was when I tried to eat a radio when I was like one year old. Ironically, I’m seeing the same behavior in my son, who’s now 16 months old. He’s like…
0:04:10 TW: Wait, now I’m starting to think you actually were successful and you actually are partly a… You’re a cyborg.
0:04:15 MH: That would explain so much.
0:04:17 TW: We have cracked the code, Investigative Journalism at its best. [chuckle]
0:04:22 SA: It is, but… Yes, so my son is exhibiting the same behavior, like if you offer him a plush toy or a book or something, and then there’s a cable, a random cable, just out there, he will instantly run towards the cable and start to [0:04:33] ____ that so… I’m happy to see that rubbing off on him. But anyway, playing with computers and doing stuff with them has been with me. And I started my programming career with QuickBASIC or QBasic, which was shipped with all the MS-DOS operating systems.
0:04:51 SA: And I was just fascinated by how you can be so creative with so little. You can just write a couple of lines of text in an editor and compile it and suddenly, you’ve created something that’s more than the sum of its parts. It accepts an input and produces an output based on the input. And I think it just always grew with me, and web design and web development just happened, in a way. I was introduced to HTML by a friend of mine, when I was at a sleepover at his place at age 12 or something, and we wrote our first website together. And it happened to be a website for this dead New York rapper Notorious BIG, like a fan site for him. And we did so many cool things, like we recorded a wav file where I said in my normal voice like “Hi, this is Notorious BIG. Welcome to my website.” And then we slowed the wav file down so the pitch became lower, and I thought I sounded like an overweight rapper so, “Welcome to my site,” and then all the content was…
0:06:00 SA: And this is how I got into search marketing because, like I said, I’m 13 years old. All the content was completely plagiarized from other Notorious BIG sites, but I think I did a good job at it, because it shot to like number three in Yahoo search results. [chuckle] It made no sense to me. I started getting like a shitload of… Sorry, a huge amount of listeners in my stat counter, which was the analytics tool for me, [chuckle] and I’m looking at it going, like 100,000, one million, two million, three million. And that was when I really… And this sounds like a perfect segue, but that’s when I really got into analytics, because I wanted to see where are these people actually coming from?
0:07:24 TW: Well, it sounds like you had your passion, but then you thought, “You know what? I better go get a really practical degree, just to have as a fallback because, there’s always going to be room. I can always get a job as an English Philosophy major, but if I’m not able to pursue this kind of computer science or web development, I better go and get something really practical, as my university degree.”
0:07:47 MK: But, do you think that it wasn’t… You didn’t consider it, because it wasn’t as popular when you were deciding on at Uni, ’cause I feel like now, kids get rammed down their throats like, “Do computer science. Everything’s in tech.”
0:08:52 SA: It was just beyond me, and I think I was just too young to firmly dedicate myself to the study that it would have needed. I didn’t see computer science as an option, because I firmly thought that it would be impossible for me to understand it and in order to be good at it. And today, I took computer science as an open degree. I took basic studies later on, and I realized, “Hey, this isn’t that difficult.” In Finland, it’s been organized a bit weirdly. Instead of actually coding anything in a computer, you actually do like, notebook coding. All the exams are, you have to write the code in a notebook. You can’t use like IDEs or syntax checkers, and you actually have to write fluent code on paper, which is not…
0:09:34 SA: It’s a nice exercise, but completely detached from reality. Nobody codes programs like that. You might write an algorithm on paper, but you don’t write the first proof of your program on paper. That’s just not… I think, back then, it was a bit of, maybe imposter syndrome in a way, but also very much understanding my own limitations, and how much I want to dedicate, whereas, languages were much more natural to me. I had a passion for them as well, so it was much more natural to go and study them in the university, and to keep technical stuff as a hobby.
0:10:05 MH: The two really crucial questions before we move on. First, is that Notorious BIG website still up? Can people visit it?
0:10:14 SA: No. [chuckle] But there’s a Wayback Machine link I will share this moment.
0:10:17 MH: Okay, perfect. And second, what current rappers are you interested in, or do you like? Or has your love of rap waned over the years?
0:10:28 SA: Yeah. My love of rap waned about two months after publishing the website. [chuckle] I subsequently forgot the password to… It was a Geocities site. I forgot the password and username, which is unfortunate, because I started getting really interesting emails, because I have an email address there, where people were actually emailing me, thinking that I was the reincarnated Notorious BIG. They’re saying like, “Thank you so much for coming back to Earth in this beautiful Finnish, blonde haired human form. You look so different than you did when you died.” This is really scary stuff. [chuckle] And should I be talking to counselors, or getting protection, or something, but…
0:11:09 MH: I guess, third follow-up, would you consider a career in rap if you… [chuckle] Alright, it’s alright.
0:11:21 TW: Well, this is… There are parts of your story… It’s funny, because I learned my first programming was Basic on an Apple 2C, and I have fond memories of late nights, when I was 13 or 14 years old, with a good friend, either his house or my house, going through and typing in code. And similarly, I wound up not pursuing computer science, but I still… And by the time I got, this was in the day when in high school, you learned Basic, and then if you took the next class, you got Pascal. And that’s where you…
0:12:32 MK: I’m the total opposite. I wrote my first line of code like three or four years ago.
0:12:36 TW: Right, well, so that’s… I don’t know how much that is and Moe, you’re maybe the total perfect case to then ask that. I do believe that there are people who can, that struggle to actually sort of think procedurally, and what are the steps that I need to do? And they will struggle.
0:12:56 MK: Right. I think I… Maybe I’m the weird anomaly or something, because I really like writing code. I quite enjoy it. I find it really therapeutic. But I definitely didn’t start doing it when I was young, and I definitely don’t feel like I have a brain that’s set in that mindset. I think it’s more just, I like seeing that it does something quickly and that you can do it over and over again and… I don’t know, I’m more… Anyway, I feel like I’m digressing.
0:13:26 TW: What would be bad is if we landed on the conclusion that if you hadn’t done something that steered you in some direction in your youth, then it was going to be really hard or impossible to become more technical, in which case, that really narrows the options. I think that’s a great counterpoint. We can talk about what’s… How technical do we need to be? I struggle with what are the characteristics or the attributes for somebody to be as technical as they should be. I guess, today we’re more talking about how technical should they be, so…
0:13:57 MK: I think the thing is, we all enjoy it. Maybe it doesn’t matter whether you started early or late, more it’s about whether you actually are interested in it. And I think… And this is what I wanted to chat about. How technical do you expect your colleagues to be that don’t enjoy it? Is it fair to have that expectation or… I don’t know.
0:14:50 SA: For example, for me, personally, why Computer Science was always so alien, is that I saw the next level of programming as something that really requires so much more dedication and so much more understanding what the underlying algorithm is, understanding how machines work. But today, we have programming languages that are really almost like user interfaces themselves, that you don’t… There’s very little coding going, it’s abstracted on so many different levels already. And when you… Like one of the… I think we’re going to be talking about this today, but for example…
0:15:23 TW: What do you mean, what’s… What do you mean by… Are you about to say, Python’s like that? Or you… [laughter] Or no? What do you mean by that?
0:15:31 SA: You could actually have Python, because Python has modules that you can actually have do all the work for you. Same as R has modules.
0:15:37 MK: But R… Yeah, R has packages that does the…
0:15:42 SA: Yeah, the level of abstraction is so… You’re still writing code, because you’re typing in characters on the keyboard, but it’s not the same thing as programming an entire thing from scratch, which… Programming from scratch is also controversial, because if you program from scratch, it would mean that you’re writing something in Assembler, which is a machine language. You’re actually instructing the bits in the computer to do things. Programming itself is, I think the lines are blurring, what is programming, because we are… It’s just giving a set of instructions to some machine. It doesn’t mean that the set of instructions has to be written in functions and variables and methods, but it can be just clicking through a UI as well, if that thing then turns an input into output. At least that’s my… Where I’m standing with what programming could be. Google Tag Manager, you know…
0:16:31 TW: But even… I mean Google Tag Manager’s a great example of… You can do that poorly and inefficiently. Even the Excel example, there are people who use Excel every day, and I look at it, and they are not wired to… And elegance is maybe not the best word, ’cause it’s not elegance for the sake of elegance. It’s good code, good Excel spreadsheet structures, good GTM configurations. All do have to have a need for not just, “Can I make this do it?” But, “Can I get a reasonable semblance of something that’s not going to bite me in the ass in a month because I wasn’t thinking, I guess, programmatically?” I don’t know.
0:17:15 SA: Yeah.
0:17:15 TW: To me, that’s one of the cases… That’s one of the reasons you, even if you don’t like it or don’t gravitate towards it, you can really back yourself into a corner if every time you have to do it, you’re doing it very reluctantly. You’re probably going to do it poorly, and it is going to affect your ability to be an effective analyst, right?
0:17:35 SA: Yes. I think getting started with programming requires… Or getting started with producing code requires a passion for it. I think that’s almost a given, because anything you want to be good at requires a passion for it. And passion might be the wrong word. In some ways, you need to understand the utility of what you’re doing. When you’re learning to code, I’m sure that we’ll be talking about this as well, but there are ways to learn code, coding in a vacuum, so you’re just taking classes and doing exercises. But until you actually apply them to some real world use-case, they’re going to be isolated. It’s very difficult to find the utility of that.
0:18:10 MK: Wait, what… But do you think it’s actually acceptable? ‘Cause we were actually talking about this today. To only learn something as you need it?
0:18:19 SA: Yes.
0:18:19 MK: So if you were learning a language, you only learn it for this use-case or that use-case, and then you kind of… Is that acceptable?
0:18:53 SA: I have written now almost 15,000 lines of Python, and every single line almost has required me to learn something new about Python so that I can build that feature. I’ve had a feature in mind, like I want this to be a recursive method which checks, does the container have tags that need to be renamed, or something like that. And then, I just start writing the Python code, reading Stack Overflow, reading Python’s own documentation. How do I do this? So learning as I’m doing it. This has been an excellent way for me to learn Python. It’s been also an excellent way to write a really, really inefficient program. So the…
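[Editor's illustration, not from the episode] A minimal sketch of the kind of recursive check Simo describes: walk a container and report which tags would need renaming. The nested dictionary layout (`tags`, `folders`, `name`), the function name, and the `OLD - ` prefix are all hypothetical; this is not the Google Tag Manager API or the GTM Tools codebase.

```python
from typing import Any, Dict, List


def find_tags_to_rename(node: Dict[str, Any], old_prefix: str) -> List[str]:
    """Recursively collect the names of tags that start with old_prefix."""
    # Tags that sit directly in this node (hypothetical "tags" key).
    matches = [
        tag["name"]
        for tag in node.get("tags", [])
        if tag["name"].startswith(old_prefix)
    ]
    # Recurse into nested folders (hypothetical "folders" key).
    for folder in node.get("folders", []):
        matches.extend(find_tags_to_rename(folder, old_prefix))
    return matches


# Made-up container with two levels of nesting.
container = {
    "tags": [{"name": "OLD - Pageview"}, {"name": "GA - Event"}],
    "folders": [
        {
            "tags": [{"name": "OLD - Scroll"}],
            "folders": [{"tags": [{"name": "OLD - Purchase"}], "folders": []}],
        }
    ],
}

print(find_tags_to_rename(container, "OLD - "))
```

The recursion stops on its own when a node has no `folders` entry, so no explicit depth limit is needed for data shaped like this.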
0:19:29 SA: One reason I’ve never ever made anything public about GTM Tools’ codebase is because I’m so terribly ashamed of it. It’s so ridiculously inefficient because there’s no structure to it. There’s just things that I’ve been incrementally adding to it. And this, I think, is where either computer science or having… Being involved in software development is really a big plus, because you learn… In computer science, you spend a great deal of time just learning algorithms. Nothing to do with code, just understanding what is the most efficient way to, for example, reverse all the vowels of any input string? So given an input string, how do I reverse all the vowels only and then produce an output string? And I was in… I actually applied for Google some years ago to a Developer Advocate role, and it involved a lot of whiteboard coding. They gave me an exercise like, “Write an algorithm that does something like this,” and then you had to write it on the whiteboard. And the thing that struck me was that after every pass of the code, they said, “Now, make it more efficient.” And then again.
0:20:32 SA: “Now, make it more efficient”.
0:20:34 MK: Oh, man.
0:20:34 SA: “Now, make it more efficient”. You’re always, with algorithms, you’re always chasing the most efficient way to reformat your code, so that it does as little redundancy as possible, as little recursion as possible, only does the fastest possible thing. And this is an engineer’s approach.
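[Editor's illustration, not from the episode] The vowel-reversal exercise can be sketched in two passes, in the spirit of "now, make it more efficient". The function names are made up here, and treating only `a, e, i, o, u` (either case) as vowels is an assumption. Both versions run in linear time; the second simply avoids building a separate list of vowels by swapping in place from both ends.

```python
VOWELS = frozenset("aeiouAEIOU")


def reverse_vowels_first_pass(text: str) -> str:
    """Collect the vowels, then rebuild the string, taking vowels from the end."""
    vowels = [ch for ch in text if ch in VOWELS]
    # pop() hands the vowels back in reverse order; consonants stay where they are.
    return "".join(vowels.pop() if ch in VOWELS else ch for ch in text)


def reverse_vowels_two_pointer(text: str) -> str:
    """Swap vowels in place, moving one index in from each end."""
    chars = list(text)
    left, right = 0, len(chars) - 1
    while left < right:
        if chars[left] not in VOWELS:
            left += 1          # skip consonant on the left
        elif chars[right] not in VOWELS:
            right -= 1         # skip consonant on the right
        else:
            chars[left], chars[right] = chars[right], chars[left]
            left += 1
            right -= 1
    return "".join(chars)


print(reverse_vowels_first_pass("hello"))       # holle
print(reverse_vowels_two_pointer("analytics"))  # inalytacs
```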
0:20:52 MK: I’ve actually been thinking about exactly this a lot lately. As someone that does want to get more technical expertise or, I guess, deeper technical knowledge or whatever the heck you want to call it, and that whole concept of making your code more efficient. And I was thinking about this the other day, because I was chatting to someone, and they were like, “Oh Moe, there’s a more efficient way to do that. Have you thought about this, this, and this?” And I’m like, “No, I don’t even know what you’re talking about. I’ve never used that.” And so, what is the best way, especially when it comes to code, to actually… How do you make your code more efficient without bringing someone else in? Because you don’t know what’s inefficient, right? Or do you just assume that it is inefficient? Or do you just wait until you find out on the fly? I don’t know.
0:21:33 MH: You make friends with great coders, that’s what I’d do.
0:21:36 TW: I have thoughts on that. [laughter] Can I take a first swing?
0:21:41 MH: Yeah.
0:21:41 TW: One, I think the… ‘Cause Moe, earlier, you said, “Do you just learn as you go?” And we sort of talked about… I think the one caveat there is, if you only learn it as you need it, but you only need small things with big gaps, it’s really hard. I recently dove into doing some stuff with BigQuery, and I bumbled through, and it was painful, but I got a query that worked. And then I haven’t touched BigQuery since, which means I’ve completely regressed, as opposed to writing something that’s got substance, where I’m going to be immersed in it for a while. But I think that on the becoming more efficient, some of that comes up with in the world we’re in now, watching the Measure Slack or the R blog weekly newsletter, or whatever the code and seeing what people are using and talking about. There’s a ton of stuff in SQL that goes beyond the select from and where.
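[Editor's illustration, not from the episode] As one small example of SQL that goes beyond SELECT, FROM and WHERE, the sketch below uses a common table expression with GROUP BY to count pageviews per session. It runs against an in-memory SQLite database from Python's standard library as a stand-in for BigQuery; the `hits` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (session_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [
        ("s1", "/home"), ("s1", "/pricing"), ("s1", "/checkout"),
        ("s2", "/home"),
        ("s3", "/home"), ("s3", "/pricing"),
    ],
)

# The WITH clause names an intermediate result so the outer query can filter on it.
query = """
WITH pages_per_session AS (
    SELECT session_id, COUNT(*) AS pageviews
    FROM hits
    GROUP BY session_id
)
SELECT session_id, pageviews
FROM pages_per_session
WHERE pageviews >= 2
ORDER BY pageviews DESC, session_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('s1', 3), ('s3', 2)]
```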
0:22:36 TW: And if people keep putting out examples that I don’t unders… That I’ve never even seen, but I keep seeing people reference it, to me, that makes me realize, “Wait, there’s a whole other world.” In R, if people… If I was plugging in, and I kept seeing people talk about the tidyverse, and I’m like, “I don’t know what that is.” If I told myself I don’t need to understand what that is, then I’m going to continue to write crappier code. I think there’s a… Yes, having somebody review it. Yes, looking at other people’s code and understanding how they did stuff. But I honestly think, just watching the chatter of the way people are doing stuff in snippet form has actually helped me realize, “Oh, these are clever ways to use this package or something that is more efficient.”
0:23:23 SA: Yeah, that’s a good point. Measure Slack is a great example, because people are so open about what they’re doing and sharing information, so that’s a… But for becoming more efficient, first of all, there has to be a need to be more efficient, because efficiency usually comes at the price of elegance. Code that is readable is not always efficient because compromises have to be made so that it can be read by anybody and understood. Typically, really efficient code is coupled with, really, a lot of comments which are very unhelpful, or documentation which is unhelpful.
0:23:56 SA: But there are a couple of good incentives for writing efficient code. The first one is if you’re contributing to an open source project, then your code will by default be peer reviewed. And typically, it’s a kind of a badge of honor trying to figure out the most efficient way for some code to be merged into a project because you don’t want to be the one whose code becomes a blocker in terms of performance. Another reason to be really efficient is if your code is actually producing overhead that costs money.
0:24:26 SA: If you’re building tools for the AWS pipeline, for example, you want to actually keep your queries really efficient. You want to keep your pipeline really efficient because otherwise, it’s going to cost more and more dollars, and the redundancy becomes a huge cost in the long term, especially with large data sets. And then the third place, especially with code produced in Google Tag Manager, or any front-end, like web browser-based tool is, every single line of code that you write is contributing to the load time of the page, and keeping that in mind is actually a good incentive because we know how much increased load times can hurt e-commerce sales.
0:25:57 TW: But there is… You have to have… There is an awareness, right? First, you have to be… And it may be, Moe, that was the… If somebody said, “Have you thought about doing it this way?” Like that happens two or three times, and you being the unique person that you are, very quickly, would start having intuition of, “I bet this could be done more efficiently.” Right? Because I think that’s… We run into that mentality of, “I did this, and it works,” without the thought of… Or everything you just talked about, Simo, the overhead of a page load, the overhead for queries that are going to cost money. Like there has to be some level, developing some level of intuition of, “Okay, I’ve got this working, but it’s worth me now investing another few hours because every fiber of my being tells me that this is duct tape and baling wire.” And now, so like you have to go seek out how to do it better, whether that’s asking someone, whether it’s looking at Stack Overflow, which I think gets back to that core, which means you have to have some level of technical competence.
0:27:02 TW: And I know we keep, yeah, technical to your point, it’s a pretty broad, broad category. But I remember running into people who, early days that were like, “I don’t need to understand how the Internet works,” or, “I don’t need to… ” I mean, didn’t you… Who was it? Was it Simo that you had commented somewhere that you’ll ask people to explain what happens when you load a web page? Or am I dreaming that up? That’s like a brilliant question. And maybe they can’t answer it, but they’re at least curious. I’ve watched analysts that if they shy away from that, and you can start telling when you’re talking to them, like, “Let’s talk through what’s happening with these query parameters,” or, “Let’s inspect this element and figure out what’s going on here.” And if you just see them imperceptibly back up and start to be like how quickly can he get out of this conversation, to me it’s like, if you’re not intrigued by this, then it has much, much broader implications beyond this conversation we’re having right now to try to figure out how to do this analysis better. Tim’s rant, I guess.
0:28:12 MH: No, I was going to agree with Tim that I think probably one of the first things we teach someone who’s brand new to analytics is to understand how data gets collected, even just getting familiar with your browser, being able to understand how to look at your console, how to understand how the page loads, what a URL is, what a return code is, all of those things matter to how then you could go and move from there to do other things in analytics. A lot of times, we start people off with doing QA. You start learning by going and seeing, “Well, what fired? How did it fire? When did it fire? Is that what we should expect and why?” And that being the starting point of building a familiarity with sort of the underlying structure of the web to then give you a foundation to then take the next step. And it’s like we’ve gone in our conversation already on the show, we’ve already gone deep into Python and all these things, but just right click on a web page and inspect element and then start there. There’s so many people, I think, in analytics who might not have even done that step.
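[Editor's illustration, not from the episode] For readers who want to poke at "what a URL is" and "what a return code is" outside the browser, here is a small sketch using only Python's standard library. The URL, its campaign parameters, and the `describe_url` helper are made up for the example; no network request is made.

```python
from http import HTTPStatus
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the pieces an analyst usually cares about."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "query": parse_qs(parts.query),  # each parameter maps to a list of values
        "fragment": parts.fragment,
    }


example = describe_url(
    "https://www.example.com/products/shoes"
    "?utm_source=newsletter&utm_medium=email#reviews"
)
for key, value in example.items():
    print(f"{key}: {value}")

# The standard reason phrases behind a few common return codes.
for code in (200, 301, 404):
    print(code, HTTPStatus(code).phrase)
```

The same breakdown is visible in the browser's developer tools, which is where Michael suggests starting.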
0:29:26 MK: I’m going to overshare a little bit. But we are on our quest to have everyone in the team be full stack when it comes to data science analytics, etcetera, which is… It’s a really interesting journey. And actually, what the analytics efforts have been talking about a lot lately is like all the people that work in analytics, and I guess, so to speak, are the analysts, we are scrambling because we are trying to learn so much about data science, and we’re trying to do data engineering, and which we’re all happy to learn, and there’s a big learning curve, but one of the things that we’re struggling with is that our data scientists and data engineers don’t know shit about what we do.
0:30:04 MK: Like, the other day, they’re like, “Oh, this tag’s broken.” And I’m like, “Great. Go in and fix it.” [laughter] And they’re like, “I don’t know how to fix a tag. I don’t even know how a tag fires.” Or like, one of the data scientists is like, “What’s an event?” And you’re like, “Oh, dear.” We need this to go back around the other way as well, like… And that’s what we’re facing right now. How do we get our data scientists and data engineers, if that’s the strategy, how do we get them to learn what we know too, because I think it’s just as important the other way around.
0:30:31 SA: Yes, yes, and that’s a great highlight of the different level, different paradigms of technicality as well. And I think it’s actually such a dispersed term by itself technical, that it should be abolished altogether. We’re talking about things that are happening with people who work in digital, and I think that’s the unifying thread, not the fact that we’re dealing with technical machines. I think that’s a given when you work in digital, and I think that’s one of the problems with this entire talk is trying to figure out on what level of discussion we’re trying to move on. Are we trying to solve all the problems, because there’s an entire can of worms that is introduced when we start thinking about organizations.
0:31:15 MH: Right.
0:31:15 SA: And how organizations should develop a terminology around these things, which is something that we touched upon in our last talk on The Digital Analytics Power Hour. [chuckle]
0:32:20 SA: It makes sense on a very personal level. For example, I identify much more as a developer than an analyst, and still I’m introduced as an analyst.
0:32:29 MK: Why do you think that is?
0:32:30 SA: I think it’s because of the company I keep.
0:32:34 SA: No. Yeah. It doesn’t help that I’m on The Digital Analytics Power Hour, it doesn’t really.
0:32:38 MH: But to go back to that Simo, you strike me as someone who is deeply curious about all of these things. You go into a problem set, and you wrestle with it to get and extract the knowledge. To me, that’s the heart of what it means to be an analyst. I think what it is, is sort of, implementation, right? So many different things happen in implementation, and they are technical in nature, where in terms of… Let’s just use the space of Google Analytics and Google Tag Manager, because that’s probably the most widely known and studied analytics tool in the world. That’s probably right. Yeah, I’m sure that’s right. Okay. There’s so much that happens in that space and so many people are working to develop their skills, and those kinds of things, but where does someone stop having… There’s this implementer, and then there’s an analyst. And so what’s the bridge?
0:34:41 MH: I think one of the things we’ve sort of been dancing around in our conversation so far is actually there’s different kinds of technical in analytics. One is the tools and the data collection side of it, then there’s this, I don’t know, data engineering, for lack of a better word, side of it, and then maybe there is the use of technical skills and coding and development to actually do analysis, querying the data, using R and SQL, those kinds of things. And so, in your experience, Simo, where have you seen, where have people… Where do you think analysts should break down across those? And maybe that’s not the right way to ask it, but I just, I sense that’s touching on actually different aspects of being more technically capable, but it actually is they’re much different enterprises, in a certain sense.
0:35:42 SA: Yeah, they are, and a very different skill set is needed. There are ways to do each of them, of course, in a multitude of different ways. I think the current trend seems to be that data engineering is like the hot topic because tools like Google Cloud and AWS are having a really interesting tug of war over the engineer, and who builds the pipelines using their components. I think one of the reasons why it’s been such a fascinating thing to start working on is because they have become so much easier to use and so much cheaper to use. Because it’s in the Cloud, it’s scalable. I don’t know, you guys might have also worked in these industries, but business intelligence 10 years ago was pretty brutal.
0:36:27 SA: Consultants costing like $100,000 to $200,000 for a gig to build a cube and then to build some dashboards for the management. And it would seem like, it’s more than 10 years ago even. It was a very esoteric field to enter. You needed like a certain frame of mind to do it, but you can actually start building an Amazon pipeline fairly easily, just using the wizards of that particular tool, and start, suddenly, you’ll be seeing data pulling in and being collected into warehouses, even with a limited understanding of what’s actually happening under the hood.
0:37:03 MK: But do you think there’s a danger with that? With the building the pipeline and having a very limited understanding of what’s going on under the hood?
0:37:11 SA: Yeah, there’s definitely…
0:37:12 MK: ‘Cause I find it a little bit scary.
0:37:14 SA: Yeah, there’s definitely a danger in it, and you’re right to be scared, because I’m terrified of it daily as well. I think the key is that there’s little barrier to entry, and you can actually start doing it. And you can set… If you’re worried about suddenly generating a $50,000 bill for AWS, you can actually set budgets and limitations of what you’re doing. And the Amazon, for example, AWS has these ready-made templates that you can just implement and then start modifying one by one. I think what I’m driving at, is that when you are working with that entire pipeline from collection to ETL, or Extract, Transform, Load, and all the way to reports… As a data engineer, it would be impossible to work without understanding the data collection and without understanding the output of the pipeline. Whereas, when you’re working with pure data collection, I could theoretically implement things through GTM without giving a single damn how it’s actually going to be analyzed.
0:38:11 SA: I could add the same scroll tracking for every single piece of content, or give every site I work with the same kind of eCommerce funnel, without actually caring about it. And it would still work because the pipeline can do the transformations. They can turn my bad shoddy work into something proper. And the same way I could imagine doing analysis with R, or with SQL, or with Python, or with Data Studio without understanding how the data was collected, because I can see the output of the pipeline. But the pipeline in the middle is so ridiculously critical because that’s what shapes… That’s what kind of normalizes and democratizes the information coming in and turns it into a proper output.
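To make the "pipeline normalizes the information coming in" point concrete, here is a minimal sketch of a transformation step that maps inconsistently named events onto one canonical name. The event names, field names, and the mapping table are all hypothetical illustrations, not taken from any tool mentioned in the episode.

```python
from typing import Dict, List

# Hypothetical mapping from inconsistently named raw events to canonical names.
CANONICAL_EVENT_NAMES: Dict[str, str] = {
    "scroll": "scroll_depth",
    "Scroll Tracking": "scroll_depth",
    "scrollDepth": "scroll_depth",
    "add to cart": "add_to_cart",
    "addToCart": "add_to_cart",
}


def normalize_events(raw_events: List[dict]) -> List[dict]:
    """Return copies of the events with canonical names; unknown names pass through."""
    normalized = []
    for event in raw_events:
        cleaned = dict(event)  # copy, so the raw record is left untouched
        name = event.get("event", "")
        cleaned["event"] = CANONICAL_EVENT_NAMES.get(name, name)
        normalized.append(cleaned)
    return normalized


# Two sites that tagged the same behavior under different names (made-up data).
raw = [
    {"site": "a.example", "event": "Scroll Tracking", "value": 50},
    {"site": "b.example", "event": "scrollDepth", "value": 75},
    {"site": "b.example", "event": "purchase", "value": 1},
]
normalized = normalize_events(raw)
```

The analyst downstream only ever sees `scroll_depth`, which is why sloppy collection can go unnoticed as long as the mapping in the middle is maintained.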
0:40:50 SA: Absolutely, I think you’re right. And I think it’s always dangerous to isolate any part of that process. I think what I was aiming at is, that the pipelines have different dependencies, and I kind of tend to look at… Coming from the front-end side of things and the data collection side of things, I tend to look at the data pipeline being the crucial part of that. But I think that’s just biased, I think that’s biased because I feel like uncomfortable with the data collection part.
0:41:17 MK: In a hypothetical situation then, couldn’t there be a scenario where essentially every analyst or data scientist is doing their own ETLing? And, I see the pros of that, but there are also cons, right? Like, if we’re trying to get everyone to understand the whole process, and everyone’s owning bits of the process or the bits that they care about, couldn’t that just create another type of issue?
0:41:45 SA: Yeah, I think it’s a natural way to create silos in the end of things. You are contributing to these barriers between… And if you’re drawing lines between data collection and the pipeline, and then a barrier between the pipeline and the analysis, like I’ve been doing here when I’m illustrating the three components that we were talking about, like the technical components of analytics, then that is definitely contributing to the problem. But I think the… Well, the fascinating thing is that we have these different components that can be approached technically from so many different angles. And I think that for long term satisfaction, it’s obviously crucial that whoever is doing analysis understands the entire pipeline or whoever is working on the pipeline understands the input and the output. I think that one constant in this thing is that if data collection fails, if something is not collected, then the whole pipeline is screwed up for that. There’s no way to get it back. If the pipeline itself manages to process or ETL something in a bad way, there’s always a backup to fall up for… Hopefully, there’s always a backup that you can fall back on and reprocess the data. And if the analysis fails, you can always redo the analysis.
0:42:56 SA: Data collection is kind of a weird piece in that it’s impossible to pull information from history that wasn’t collected in the first place, unless you have some other way of collecting that data, like offline sales data, for example, you could import that later. But like web browsing data, behavior data on the website is kind of different. So in a way, I think the data collection part is in a… This might be cognitive dissonance speaking, but I think it is in a special place in this pipeline and there is that level of… There is a level of crystal ball gazing required in the data collection phase. You have to kind of tune that collection to the analysis itself.
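The asymmetry Simo describes, where a bad ETL step can be rerun but an uncollected hit cannot be recovered, can be sketched as follows. This is a minimal illustration under stated assumptions: the raw hits, the field names, and both transform functions are hypothetical.

```python
from typing import Callable, List

# Raw hits exactly as collected (hypothetical fields). This list stands in for
# the backup: it is never modified, so any transformation can be rerun on it.
RAW_BACKUP: List[dict] = [
    {"page": "/checkout", "revenue": "19.90"},
    {"page": "/checkout", "revenue": "5.10"},
]


def buggy_transform(hit: dict) -> dict:
    """A flawed ETL step: truncates revenue to whole units."""
    return {"page": hit["page"], "revenue": int(float(hit["revenue"]))}


def fixed_transform(hit: dict) -> dict:
    """The corrected ETL step: keeps the decimal part."""
    return {"page": hit["page"], "revenue": float(hit["revenue"])}


def run_pipeline(raw_hits: List[dict], transform: Callable[[dict], dict]) -> List[dict]:
    """Apply a transformation to copies of the raw hits."""
    return [transform(dict(hit)) for hit in raw_hits]


first_run = run_pipeline(RAW_BACKUP, buggy_transform)   # wrong totals
second_run = run_pipeline(RAW_BACKUP, fixed_transform)  # reprocessed from backup

buggy_total = sum(row["revenue"] for row in first_run)
fixed_total = sum(row["revenue"] for row in second_run)
```

The second run is only possible because the raw hits exist. A hit that was never collected has no entry in `RAW_BACKUP`, and no amount of reprocessing will create one.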
0:43:34 TW: Well, I mean… Of late I’ve felt like there’s been a lot of discussion, as customer data platforms are bubbling up, as DMPs, as people are talking cross-device, as there’s first party and third party. To me, I watch people who clearly don’t have an understanding of cookies and cross-device and the Internet, and they want to chase the latest shiny object and be like, “Oh, a CDP is going to solve our cross-device challenge,” and it’s like, “Well, maybe, maybe not.” You have to sort of think through exactly how is that link being made. So I think that’s actually, the data collection understanding that goes to these messier things we’re trying to chase, and as the latest media platform or as Facebook or Twitter or a DMP is making claims that they’re going to give you this audience targeting, and they’re going to be able to stitch all this stuff together, you have to have that knowledge to sort of call BS.
0:44:47 SA: Yeah. No, that… Yeah. Absolutely. I think that’s a good example in the way that it’s not just about data collection anymore, either. Because yes, you are collecting first party data from the website itself, but it is then being linked to third party data in the DMP or in the pipeline or in Google’s DoubleClick back-end or whatever. So it is blurring the lines. Definitely things are happening with browser cookies and happening in the browser, but the lines are being blurred between deterministic data collection, where we decide what we want to collect, versus what that is actually being tied together with, what kind of keys are being joined behind the scenes that we don’t even know about. And I think…
0:45:25 TW: So it’s… [chuckle] It’s funny, like as you were saying earlier about programming in the abstraction layer, you don’t have to write in Assembly, but there’s some level of un-abstracted that you need to understand, and maybe the same thing goes. What hasn’t changed much about the Internet is the basic HTTP. The protocol doesn’t evolve much. Like those fundamentals and the foundations really don’t change. We can’t all of a sudden start reading other people’s cookies. I mean, there are things that are somewhat immutable and are very, very stable, but they are fundamental and foundational, and there is a need to understand, I mean, the same thing back to query parameters and anchor tags, the structure of the URL. That stuff hasn’t changed, and it… Yeah, you really need to understand that. So that’s another kind of angle of the data collection.
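As a small illustration of the URL structure Tim refers to, the sketch below splits a URL into its parts with Python’s standard `urllib.parse` module. The URL itself is made up, and reading "anchor tags" as the fragment after the `#` is an assumption about what was meant.

```python
from urllib.parse import parse_qs, urlsplit

# A made-up landing page URL with campaign parameters and a fragment.
url = (
    "https://www.example.com/products/shoes"
    "?utm_source=newsletter&utm_medium=email&color=red#reviews"
)

parts = urlsplit(url)

scheme = parts.scheme      # protocol, e.g. "https"
host = parts.netloc        # hostname (plus port, if one is given)
path = parts.path          # the page path
fragment = parts.fragment  # the part after "#"

# parse_qs returns a dict mapping each parameter name to a list of values,
# because the same parameter can legally appear more than once in a URL.
query = parse_qs(parts.query)
utm_source = query["utm_source"][0]
```

One detail that matters for data collection: browsers do not send the fragment to the server in the HTTP request, so anything after the `#` has to be captured client-side if it is needed in analysis.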
0:47:24 MH: I don’t think we need an hour to answer that.
0:47:26 SA: Yeah.
0:47:27 MH: The answer is no, you cannot.
0:47:32 SA: You’re just saying that because you’re in a hurry.
0:47:35 MH: No, I still believe that. I think it’s foundational. Right?
0:47:40 SA: Yeah.
0:47:40 MH: You have to understand how the data was collected so that your analysis can actually measure or think about the right things.
0:47:47 SA: Yeah.
0:47:48 MH: Even something as simple as measuring a conversion point on form submit or form success. Right?
0:47:56 SA: Yeah.
0:47:57 MH: Big difference in what you might get back in the data.
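A minimal sketch of why that difference matters, assuming a hypothetical event log in which `form_submit` fires on every click of the submit button and `form_success` fires only when the form is actually accepted. The event names and sessions are invented for illustration.

```python
from collections import Counter

# Hypothetical event log for three sessions.
events = [
    {"session": "s1", "event": "form_submit"},
    {"session": "s1", "event": "form_success"},
    {"session": "s2", "event": "form_submit"},   # validation failed
    {"session": "s2", "event": "form_submit"},   # user retried
    {"session": "s2", "event": "form_success"},
    {"session": "s3", "event": "form_submit"},   # user gave up after an error
]

counts = Counter(e["event"] for e in events)

# "Conversions" if the tag fires on every submit attempt.
conversions_on_submit = counts["form_submit"]

# "Conversions" if the tag fires only on a confirmed success.
conversions_on_success = counts["form_success"]

# Sessions that really completed the form.
converted_sessions = {e["session"] for e in events if e["event"] == "form_success"}
```

In this made-up log the submit-based count is twice the success-based count, and an analyst who does not know which trigger was used cannot tell which number they are looking at.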
0:49:06 SA: So I think they’re doing a great job in that, but my sincere hope for these platforms is that they don’t hide behind this simplification. They don’t try to obfuscate the complexity too much, because otherwise we’ll be learning the wrong kind of terminology and using it in the wrong context, and we’re creating friction within an organization where we want to… Like the example of the developer who didn’t understand what a tag is and how do we do that. Well, that’s a symptom of this. We’re using tools that introduce words like tags, and we’re probably using them incorrectly if we don’t understand what they mean in terms of the technical context where the developers and the IT are working. So I think the root of this discussion is that there is a baseline terminology somewhere out there. We don’t maybe know what it is, but it has to be established if you want an organization to really function around these things, and the terminology and the jargon has to support that, maybe.
0:50:00 TW: Yep.
0:50:00 SA: You’re nodding. Michael is checking his, but everybody else is…
0:50:03 MH: I was reflecting as you were talking, Simo, about how back in the old days Webtrends would let you insert a tag inside of a meta-tag, so that made it less confusing. No, it made it more confusing for all those people. But anyway, we do need to wrap things up. But before we do that, I want to go around and do a last call. This conversation is one that will never end, and to your point, Simo, there’s so much more to this. Probably we’ve done very little to calm the nerves of anyone nervous about this, but I think there’s still a lot of good information, but maybe in our last calls there’d be more. So, Simo, you’re our guest, do you have a last call you want to share?
0:50:49 SA: Yeah, so touching upon this topic, I’ve been, and when we were talking about the data pipeline and data engineering stuff, which is fascinating to me, and I’m doing my best to learn things currently in that paradigm. I’m having a bit of trouble finding real world use cases for it, but there’s a stellar blog series on snowflakeanalytics.com by Simon Rumble, who’s a Measure Slack regular as well. He’s written a brilliant series of articles called Modeling Your Snowplow Event Data, which is going very deep in the deep end. But I’ve been reading that with gusto, and I’m completely overwhelmed by the things he’s been writing, so again a very good way to introduce… For me personally, to be introduced to a world that I understood very little about currently, so looking forward to getting my feet wet with that as well.
0:51:41 MH: Nice. All right, Moe, what about you?
0:51:43 MK: I really don’t want to follow Simo. [chuckle]
0:51:46 MH: You want someone else to go first?
0:51:49 MK: No, it’s fine.
0:53:01 MK: Well you haven’t heard mine yet.
0:53:04 TW: You just said “See, Moe,” and I was like…
0:53:06 MH: Simo?
0:53:06 TW: Wait, are you referring back to Simo? See, Moe? Simo?
0:53:08 SA: Yeah, but so many pieces are coming together in this episode, it’s amazing. My rapper name was CMO, it’s C-M-O and back in a… I was a Chief Marketing Officer. It’s coincidence? I don’t know.
0:53:24 TW: The notorious CMO.
0:53:25 MH: I love it.
0:53:26 SA: Yeah.
0:53:26 MH: All right, Moe.
0:53:29 MK: Okay, so, I have not read anything lately. I have been in the deep dark depths of… Wait for it… Financial analysis, and I kind of, I’m hating life. So I haven’t actually been reading anything or listening to anything, but what I do want to encourage everyone to do is, go grab a coffee with someone in your finance team and ask what they’re working on and also how they’re doing it, because I am learning an absolute truckload about an area that I don’t typically work in, so I just really want to encourage everyone to do that. Also, I have a reminder that Saturday a week, which is October 20, we have MeasureCamp here in Sydney, Australia. So if you are anywhere in the Asia Pacific region, please come fly into Sydney. We promise it’ll be awesome, and we’ll give you some food and free tickets.
0:54:25 MH: Awesome. Thank you, Moe. Tim, what about you?
0:54:29 TW: So I’m going to do [0:54:30] ____ and they are going to be all podcast-related stuff, but they’ll be quick. So on the depressing, if we weren’t depressed enough about the technical side of things… So there’s a podcast called, “The Trouble” with Shannon Cason, and they recently… Season Two, there was a long hiatus between Season One and Season Two, but Episode Two of Season Two is called, “79 cents”, and it’s a story of a math teaching consultant out in Fresno, California, and how basically she ultimately has shifted legislation in the US because she was getting just screwed on a salary front, and she kind of stumbled across that was happening and decided to fight rather than just kind of cave. So it’s a little depressing, but it’s a pretty interesting story that’s made some waves. I think she’s actually running for Congress now. But the more kinda meta-podcast recommendation, it’s called, “The RFK Tapes”. So for the non-US history people, Robert F. Kennedy was assassinated on June 5, 1968. There are some guys who did a Crimetown podcast which is amazing and entertaining, and while they’re working on their next season of that, one of them went off and did the RFK Tapes where it’s basically tracking down conspiracy theories, ’cause Robert F. Kennedy was assassinated in the Ambassador Hotel in LA, Sirhan Sirhan was the one who killed him.
0:55:57 TW: And it was kind of a cut-and-dried case but now there… It turns out there’s a whole realm of people who are conspiracy theorists. And where it actually… I found it being really interesting as an analyst, is that they run through, however many episodes, 8 or 10 episodes, and they explore the data and the evidence from many different directions, and it ultimately lands with a degree of ambiguity, and everybody kinda comes out of it saying, “I believe this is what happened,” or, “I believe this is what happened.” But if you listen to it as an analyst and think about the data that we deal with and how there’s not a lot of true certainty, it was fascinating. Why is Moe so… Like this.
0:56:37 MH: I think Moe, like me…
0:56:40 MK: This is your version of quick. And meanwhile Helb is pulling his sleeve. [chuckle]
0:56:43 MH: I feel that maybe, Tim, you didn’t fulfill the promise of your last call of it being [0:56:48] ____ but really quick. Yeah.
0:56:54 TW: But really quick? I feel that is a very fair criticism.
0:57:00 MH: Very elegantly done as well, Tim, as always.
0:57:01 TW: But I’m done now.
0:57:03 MH: Yeah, that’s right. Well, listen, as you’ve been listening, you probably have been thinking, “Man, there is a question that I’ve been dying to ask Simo and/or Michael Helbling about technical things,” mostly probably Simo.
0:57:17 MH: And we would love to hear from you, all of us. And as I mentioned up the top of the show, Simo is one of the most prolific members of the Measure Slack community, and it’s a place we’re always hanging out as well, so that is a great place to contact us. You can also reach us on our website and on our Facebook page, and obviously Simo’s active on the social webs as well, I believe on Twitter and such forth and so on and like such as. Because that’s… Or you can go to his Twitter account where if you scroll back for a few mentions, you can see his rejection letter from our 100th episode which…
0:57:55 MH: Is something I’m delighted about. Okay. Anyways, Simo, I can’t…
0:58:01 SA: It was a low point for me.
0:58:03 MH: Oh. Well, this has been a high point for me. So thank you so much once again for coming on the show, it’s such a pleasure to have you back as a guest, and thanks for all that you do in our community to raise the level of technical skills across the world. Really, I… So many people, I think, have mentioned things they’ve learned from your website over the years, and so your contributions are making a huge difference to our industry. And so probably this podcast is right up there too, maybe… No I’m just kidding it’s not. [chuckle]
0:58:41 MH: No, I feel like we’re in the presence of greatness, Simo, it’s really what I’m trying to say. So thanks once again for coming on the show and talking about this topic. And for my two co-hosts, Tim Wilson, and Moe Kiss, all of you technical analysts out there, remember, keep analyzing.
0:59:03 Announcer: Thanks for listening and don’t forget to join the conversation on Facebook, Twitter, or Measure Slack group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour or @analyticshour on Twitter.
0:59:22 Speaker 6: So smart guys want to fit in, so they’ve made up a term called analytics. Analytics don’t work.
0:59:31 MH: Well, Tim leaves very little, in fairness, to me. Tim leaves very little for others to contribute, if you can imagine.
0:59:37 MK: Yeah, Tim.
0:59:38 TW: Really? That…
0:59:39 MH: Yeah. I feel like it will haunt Tim. [chuckle]
0:59:41 TW: I’m trying to figure out… I’m glad I got this on tape. I’m going to have to listen to that with my therapist a few times. I will.
0:59:50 MK: So I was looking through some like Google Forum the other day, and there Simo’s name popped up. And I’m like, “Are you on every forum ever just answering questions to the world?”
1:00:00 TW: We only have the most special of Europeans on, ’cause frankly it’s not convenient for us. Yeah.
1:00:04 MH: Yeah. We do a lot of Australians, a lot of Americans, very few Europeans.
1:00:11 SA: Well, I mean, there’s really, really not that much happening in Europe in terms of data analytics, so it’s fine.
1:00:16 TW: That’s what we hear.
1:00:18 MH: I’m so glad we’re recording for that.
1:00:23 MH: The problem Simo is, you have such a soothing voice. It puts me right to sleep.
1:00:29 MK: So what’s your view then in a hypothetical situation about every analyst essentially or data scientist? Jesus.
1:00:41 MH: No, Tim and I are staying in a wonderful historic location, and I think the Internet is original from the Civil War.
1:00:49 SA: Are you going to drop that in before every time you talk on this show?
1:01:01 MH: You’ve got the right mental attitude for rap. That’s what I’m saying.
1:01:05 TW: That was a diss track.
1:01:07 MH: That’s right.
1:01:08 MK: So this girl on my team comes up to me the other day. She’s like, “Moe, I can’t do it,” I’m like, “What’s the matter?” and she was like pretty distressed and was like, “I can’t listen to the podcast anymore,” and I was like, “What do you mean?” and she was like, “Tim’s so mean to you, he’s always putting you down!” I was like… She’s like, “I just can’t listen to it anymore,” and I was like, “It’s okay, we do kind of disagree intentionally sometimes and also sometimes it’s a good debate, I don’t think he’s mean.” But it was really funny because today you gave me this compliment, and I’m going to have to tell her that sometimes you swing the other way too.
1:01:47 TW: Now, I can just go curl up in a ball and like… Because my… I don’t have enough self-esteem issues as it is…
1:01:54 MH: Don’t worry.
1:01:54 TW: I think, now… Oh my god… Rock flag and write some code.