Do you long for the days when your mother could ask you, “Now, what do you actually do for your job?” and “all” you had to do was explain websites and digital analytics? The “analyst” is now a role that can be defined an infinite number of ways in its breadth and depth. Is the analyst who is starting to do data transformations to create clean views still an analyst? Or is she a data engineer? A data scientist? On this episode, we explore the idea of an “analytics engineer” with Claire Carroll from Fishtown Analytics who, while she did not coin the term, can certainly be credited with its growth as a concept. And there is a brief but intense spat about the role of “analytics translator,” which Claire sat out, but observed with bemusement.
Tools, Concepts, and Articles Mentioned in the Show
- Claire Carroll
- Fishtown Analytics
- DBT (Data Build Tool)
- (Article) The Analytics Engineer by Michael Kaminsky
- Snowflake Cloud Data Platform
- (Conference) Coalesce
- DBT Blog
- (Article) The Startup Founder’s Guide to Analytics by Tristan Handy
- (Article) SQL vs R. Which to use for data analysis? by Jordan Peck
- (Article) When did analytics engineering become a thing? And why? by Claire Carroll
- Lak Lakshmanan
- Measure Slack
0:00:04 Announcer: Welcome to the Digital Analytics Power Hour. Michael, Moe, Tim and the occasional guest discussing analytics issues of the day and periodically using explicit language while doing so. Find them on the web at analyticshour.io and on Twitter @AnalyticsHour and now, the Digital Analytics Power Hour.
0:00:26 Michael Helbling: Hi everyone. Welcome to the Digital Analytics Power Hour. This is episode 151.
0:00:36 MH: It just sits there blinking, sort of like a little line of doubt, daring you to try something and it’s usually green or some other not very word-like color. The command line, the IDE. For many of us analysts, our skills are not exactly highly technical. We look through data, we do analysis, we cobble together a story that we can share that highlights what we think is happening, what might be done about it but let’s face it, it’s 2020 and the world isn’t the same as it used to be. Python and R are basically required at many jobs, SQL is showing up as a requirement on more and more job descriptions so here’s the deal.
0:01:17 MH: You might have a ton of technical skills because you’re an awesome data engineer or you may be more like the data analyst just taking your first faltering steps on the technical side. Well welcome to this episode where we’re gonna talk about the intersection of these two skill sets.
0:01:35 MH: So let’s meet our hosts. Moe Kiss is the marketing analytics lead at Canva. Hey-o.
0:01:40 Moe Kiss: Hi morning, evening. How you doing?
0:01:44 MH: Yeah, I’m doing really good and Tim Wilson, Senior Director of Analytics at Search Discovery.
0:01:50 Tim Wilson: How you going?
0:01:51 MH: I’m going great. I don’t know the right response to that. We’ll get back to that. I am Michael Helbling. I’m the founder of Stacked Analytics. Okay but we needed a guest. Someone who lives this crossover day in and out. Claire Carroll is an analytics engineer and community manager of DBT at Fishtown Analytics. She’s also held data roles at companies like Airtasker, Uber and others and today she is our guest. Welcome to the show Claire.
0:02:21 Claire Carroll: Thank you so much for having me. I can confirm that “How you going? Good.” is the correct way to answer that question [chuckle]
0:02:26 MH: Good okay. So that’s good because we’ve always doubted Moe and so getting some independent verification…
0:02:33 TW: We’re trying to line up as many Aussies as we can just to figure out if Moe starts yanking our chain and try to come up with things, we’ll go for it.
0:02:41 MK: No but that’s a great idea.
0:02:44 MH: Just do a whole Australian episode.
0:02:46 TW: No more have her tell us that… Oh yeah.
0:02:48 MK: Yeah. Misinformation.
0:02:51 CC: We definitely should have organised beforehand.
0:02:53 MH: Okay, yeah. This is exactly what I’ll say. [laughter] Yeah that’ll end well. Okay Claire, welcome. So maybe to kick things off, let’s talk a little bit about what you do day in and out and then I think that’s gonna devolve into a deeper discussion about engineering and analytics and kind of how they’re sort of merging together so much these days.
0:03:17 CC: Okay. My day in and out looks very different each day. I’m the community manager at Fishtown Analytics. We make a software called DBT and I actually live in the States. As we’ve already discussed. I have an Australian accent because I grew up in Sydney but I moved over to the States around two years ago to take on this job. I…
0:03:40 TW: Good timing. Good call. Sorry.
0:03:42 CC: Yeah yeah. [laughter] In hindsight, maybe 2018 wasn’t the best time to do that but you know… With hindsight, you use 20/20.
0:03:52 TW: I’m sorry. [chuckle]
0:03:53 CC: It’s okay. I feel the same way many days. [laughter] So yeah, at my job at Fishtown, I’m the DBT Community Manager, which means a whole lot of things. I spend my days working with clients and with community members on how they can best transform their data so they can use data to make really good decisions about their organisations. I hang out a lot on Slack, helping people that use the software, answering their questions. I write all the technical documentation. I write different blog posts, all these different things and sort of all of that is in support of this bigger mission that our company has, which is to elevate the analytics profession and our opinion is the way that we do that is to get analysts working a whole lot more like software engineers, which is where this term analytics engineer sort of came out of.
0:04:49 MK: And yeah, I have to admit Claire, you were the first person I heard say the term analytics engineer. We were lucky to have Claire at our data analytics meet-up when she was visiting Sydney. So what do you mean when you say the term analytics engineer? See, I’m already getting confused. [laughter]
0:05:08 CC: No, absolutely. Well I have to give full credit to my friend Michael Kaminsky, who originally coined the term but we… So Kaminsky wrote this post and I sort of read it and went “Huh. That actually describes the thing that I’d been doing.” And sort of from there, we saw this title and took it on and have tried to really build community around this title “analytics engineer.” So what does it actually mean? I guess it’s worth going back to my prior role at Airtasker where I did a whole lot more of this kind of work. So I was working as a data analyst, that was my title and I was in charge of the entire data stack.
0:05:45 CC: We were using off the shelf tools to get data into our warehouse. We were using a BI tool to serve it up to our stakeholders but there was a big in-between step which was all this sort of transformation to get the data ready. So data isn’t always clean. In fact, I don’t think it’s ever clean and it’s never really in the format you need it to be in to be able to hand it over to a stakeholder so that they can start using it to do different analyses.
0:06:12 CC: And so I was working a lot in this sort of middle section of the stack. I was using SQL to transform my data, I was putting tests on it, I was documenting it, I was using version control, I was using the command line, I was using a code editor. My computer looked more like a software engineer’s than it did sort of the analyst role that I had started out with, in the way that I set it out. Dark mode everywhere.
0:06:40 CC: And yeah it just… This title of data analyst just stopped sort of making sense for me because it wasn’t really the thing that I was doing. I wasn’t really looking at the data that often and making decisions and figuring out why did cancellation rates increase last week. Instead I was making sure that if someone had that question, we had all the artifacts ready for them to be able to answer it. So I didn’t really know what my title was. I didn’t have this term and then this post got published about maybe six months after I left that role and I went “Ah! That’s the thing that I was doing it was analytics engineering.”
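For readers who want to see what "putting tests on it" can look like in practice, here is a minimal sketch. It is not the tooling Claire used; it simply assumes that a data test is a SQL query that returns the rows violating an expectation, so a test passes when the query returns nothing. The `orders` table, its columns, and all function names are hypothetical.

```python
import sqlite3


def not_null_test(table, column):
    """Build a query returning rows where `column` is missing."""
    return f"select * from {table} where {column} is null"


def unique_test(table, column):
    """Build a query returning values of `column` that appear more than once."""
    return (
        f"select {column}, count(*) as n from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )


def run_tests(conn, tests):
    """Run each test query and report how many failing rows it returned."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in tests.items()}


def demo():
    # Hypothetical table with one duplicated key and one missing customer.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table orders (order_id integer, customer_id integer)")
    conn.executemany(
        "insert into orders values (?, ?)",
        [(1, 10), (2, 11), (2, 12), (3, None)],
    )
    tests = {
        "orders.order_id is unique": unique_test("orders", "order_id"),
        "orders.order_id is not null": not_null_test("orders", "order_id"),
        "orders.customer_id is not null": not_null_test("orders", "customer_id"),
    }
    return run_tests(conn, tests)


if __name__ == "__main__":
    for name, failures in demo().items():
        status = "pass" if failures == 0 else f"FAIL ({failures} rows)"
        print(f"{name}: {status}")
```

Because each test is just a query kept in version control next to the transformation code, a failing count can be checked on every run before stakeholders see the data.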
0:07:16 MK: And so yeah, I have to confess at Canva… This is how Claire and I have also come to know each other. Canva is a very big user of DBT and I would say that at our company most analysts are a hybrid of a data analyst and analytics engineer in that we do all of the T of ELT and I have to confess, I was there pre DBT and it has made life a lot easier in terms of that whole transformation piece but I was just kinda thinking back, the first time I really heard of analysts doing that transformation piece was actually a friend of mine Cohen, was working at Oneflare and it feels like Snowplow data really became that first piece where people had to start massaging the data a lot before it became usable.
0:08:08 MK: Is that something that you’ve seen in the US as well or is that just my weird view of the world that I’ve seen.
0:08:16 CC: No, I think that’s definitely related. I think this role sort of came out of the evolution in the technology stack. Previously, if you wanted to get data into your warehouse, someone was writing the code to get it there. If you’re using say Facebook ads, which a lot of companies use, someone was writing the Python code to hit the API and put it straight into your warehouse and they’d sort of maybe do the transformation along the way.
0:08:41 CC: They’d make sure that it was all clean and flatten it out, whatever needed to happen and they’d also spend time making sure that the warehouse was available and people could actually run queries against it and those two things have just really been automated by technology. So whether it’s using a cloud warehouse like Snowflake, Redshift, BigQuery, whether it’s using an off the shelf extract and load tool, Stitch, Fivetran just those pieces don’t really make sense anymore.
0:09:12 CC: The problem then though, is that if you have data in your warehouse that is exactly the spec of the API, which is what these extract and load tools do, they’re sort of source conformed. It’s all about “Well, how does Facebook name this field?” You end up giving it to end users in a format that they don’t really… Well they can’t really make sense of. Maybe the columns are really inconsistently named, maybe they need to have those two ad sources together and so then all these other problems of having access to raw data starts to crop up. People try to do their own analysis and end up with different numbers.
0:09:46 CC: You have to remember all the time oh, make sure to exclude user ID one, that’s the CEO, they’re not actually a real user. Make sure to filter out these deleted out records. All these sort of problems come up and the way that you can get ahead of those problems is to transform your data before you hand it to your stakeholders. So it is sort of that seeing the rise of Snowplow with your friend at Oneflare, definitely related. Sort of these tools where data was coming in, it just wasn’t quite ready yet. I think that we’re seeing that across the stack.
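As a rough illustration of the transformation step Claire describes, the sketch below takes a source-conformed table and turns it into a clean view: columns are renamed consistently, soft-deleted records are filtered out, and the internal "user ID one" account is excluded once, in one place. It uses Python's built-in `sqlite3` as a stand-in for a cloud warehouse; the `raw_users` table, its column names, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical raw rows, named the way a source system might name things.
RAW_ROWS = [
    # (ID, EMAIL_ADDR, _deleted, created)
    (1, "ceo@example.com", 0, "2020-01-01"),  # internal account, not a real user
    (2, "ana@example.com", 0, "2020-02-10"),
    (3, "bob@example.com", 1, "2020-03-05"),  # soft-deleted record
    (4, "cy@example.com", 0, "2020-03-20"),
]

# One CTE per step, so each piece is easy to read and debug on its own.
CLEAN_USERS_SQL = """
create view users as
with source as (
    select * from raw_users
),
renamed as (
    select
        ID         as user_id,
        EMAIL_ADDR as email,
        created    as created_date,
        _deleted   as is_deleted
    from source
),
filtered as (
    select user_id, email, created_date
    from renamed
    where is_deleted = 0   -- drop soft-deleted records
      and user_id != 1     -- drop the internal account
)
select * from filtered
"""


def build_warehouse():
    """Load the raw table and create the cleaned view on top of it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table raw_users "
        "(ID integer, EMAIL_ADDR text, _deleted integer, created text)"
    )
    conn.executemany("insert into raw_users values (?, ?, ?, ?)", RAW_ROWS)
    conn.execute(CLEAN_USERS_SQL)
    return conn


def clean_user_ids(conn):
    """Return the user IDs an analyst would see when querying the clean view."""
    return [row[0] for row in conn.execute("select user_id from users order by user_id")]


if __name__ == "__main__":
    print(clean_user_ids(build_warehouse()))
```

Anyone querying `users` instead of `raw_users` gets the same filters applied automatically, which is how the "different people end up with different numbers" problem is avoided.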
0:11:18 TW: Is the data lake to a data warehouse world analogous to an analytics engineer saying “I’m getting the data from some intermediate raw form and into a transformed spot.”
0:11:30 CC: Yeah absolutely.
0:11:31 TW: Okay.
0:11:31 CC: I think. I’m definitely in the group of people that thinks data lake and maybe thinks data swamp every so often.
0:11:40 CC: Hopefully no one takes offense at that but yeah it’s those same problems just different terminology around the tools we use.
0:11:48 TW: Well ’cause the business seems to assume that like, oh, once you have the data in the data swamp… I mean I’ve got a client, big pharma client that it’s infuriating because they told us for over the course of a year, they’re like “Oh that data is going into the… ” They would just refer to it as the hub or they’d call it Azure or Azure, I’m still not sure how that’s pronounced and the reality is, is there’s a data lake and kind of a mart world there and it was the lack of precision. Like they would say “Oh that data is going into this.” And it’s like “Yeah but there’s access, documentation, cleaning, getting it to somewhere we can use it.” So it still seems like there’s still some kind of naivete on the business side that oh, it’s just data is in one spot, you move it to another spot and now you can visualize slice and work on it and kind of miss that whole I guess the T, basically to get it into a repeatably usable format, maybe?
0:12:47 CC: Yeah. I agree with that.
0:12:49 MK: Sometimes he rambles.
0:12:54 TW: I’m the most not clear on this role. I guess the role in the space is one that I’m not coming across in my day-to-day life but at the same time, I’ve got R scripts written where I take and pull data out of somewhere and then my script does a bunch of stuff and yeah, I’ve got version control on it. I’m checking it in through Git but that’s servicing just me. This feels like well yeah, how do I get three other analysts to use that same cleansed data? Well, they get the repo and run the same script? That’s not right. So yeah. I’m continuing to ramble. You wanna say “And there he goes again.” Yeah.
0:13:37 MK: So Claire I just wanted to ask, at my previous company, data engineers were responsible for this and I’ve talked previously… This is something I always discuss internally about whether we have the balance right. What are some of the pros of having an analytics engineer I guess look after the T versus a data engineer?
0:14:00 CC: Pros and cons, I’d say to both. Again, part of this is the tooling around enabling this move. So previously, you really did need a data engineer to do these things ’cause you had to use Python to transform data. You maybe had to use Spark to do all these things that are like… I’m not gonna pretend that I know in deep detail but the reality is that the warehouse, if you are using a modern cloud warehouse, you can do all of it in SQL and it’s usually as performant and much, much more accessible. So the pros of having this person in the middle, an analyst who has SQL experience doing this work is that it just makes it so much easier to make changes.
0:14:45 CC: I’ve worked with… So earlier in my role, I was doing consulting, I’m actually not doing that at the moment but I worked with a client where everything was in Python and when anything broke, it was just so hard to figure out what was going on. That observability wasn’t there because it was just like so many layers removed from the final artifact.
0:15:03 CC: Data engineers are also incredibly expensive so putting that work on them, I don’t know, maybe it’s not the best use of resources but I also think as a data engineer, if you’re the person that’s in charge of the transformation, it just becomes a blocker to people who need to get work done. New data comes into the warehouse and if you can’t do anything with it until someone else has worked on it, it can really limit how quickly and how agile… So how quickly you can work on your data and how agile you can be.
0:15:35 TW: Don’t data engineers also… I think of separate from it, was you’re describing an analytics engineer, if I’m actually trying to push a model out, like if the analytics engineer, the analyst, has built something when it comes to a performance and low latency on a model that’s pushed into production, that could be generated by a data scientist or someone, doesn’t that sort of fall squarely in to the data engineer to figure out how to make that happen?
0:16:06 CC: Before we go into that, what do you mean by model?
0:16:10 TW: Well so if there’s machine learning and there’s like we’ve pulled stuff off and now we’ve done some fancy gradient boosting classification, whatever it is and now we want to put that model into production so that we are personalizing a user experience with super low latency. So now we’ve got something that we’ve proven has a degree of predictive capability that’s useful, that we can act on but it’s only gonna be useful if it is running and put out into the world for making decisions.
0:16:45 CC: Yeah. I think that falls into data engineering world or machine learning engineering even is another title I see quite frequently. That’s not sort of where my experience is in terms of building predictive models. So I don’t want to come across as an expert on a thing that I’m not an expert on but I think that is the kind of the area that falls into data engineering. I think the analogous part in analytics is… Let’s say you built this huge transformation in SQL and an analyst had built it. There is a risk that an analyst won’t really do the right performance optimization on that model.
0:17:24 CC: There’s a lot to learn, to learn how to work with your warehouse as opposed to against it, what performant SQL looks like, how to break up your code into separate pieces so that it’s easier to debug and those sorts of things. So that is where things get really blurry if you’ve got analyst data engineer and someone’s gotta write the transformation code, if it’s… Those are your only two choices and it’s an analyst, there is a risk there. I’m not gonna say that every analyst is bad at writing SQL. I think that’s a really bad way to see the world.
0:17:57 CC: Many analysts are fantastic at writing SQL but it’s about understanding how do I work with the data warehouse that I have to make sure that things run quickly. Those principles often, we see more on the data engineering side. So I think that’s one of the trade-offs that comes into play when you’re thinking about who does this work?
0:18:18 MK: Yeah and I think particularly as that’s kind of a teething problem that we’re starting to hit as well, where we built stuff, not a fair while ago but our data set grows so quickly every single day that we’re now having to go back and really optimize things and it really becomes this partnership between those in our team who are really on the analytics engineering side and our data engineers to be like “Cool. Let’s all sit together. We need to figure out how to optimize this because the sheer scale of data is getting so big that if we don’t optimize it, the run times just take… “
0:18:57 MK: You get to the point where your data’s out of date because the run time takes so long and that’s the whole reason you don’t wanna go down this path. So I feel like optimization and efficiency of queries becomes a really big issue. Just out of curiosity, if you’re an analyst or sorry, an analytics engineer, who’s starting to face this, how would you tackle it?
0:19:23 CC: Know some really good data engineers and ask for help, I think. There is… The word you said there was a partnership and I think that’s a thing that’s so valuable is having data engineers. If you’re on a larger team, data engineers who can maybe look at your query and provide feedback on how it could be more performant or start to break it down into smaller steps.
0:19:46 CC: Myself, when I had to learn these things, I was a one-person data team. I did rely a lot on the community that I now manage and was just asking people in the data community like “What is going on? How do I… Why isn’t this query working?” or “Why is it taking so long?” And yeah, learning from those in my network as well. It’s a pretty hard thing to learn from first principles, I think.
0:20:11 TW: This is so funny, my son is taking a performance… It’s a coding performance class right now and he literally dropped another class ’cause he’d rather geek out on this one class that much more and he’s like… It’s kinda funny, last night, he was commenting on… They’re doing a rotating a matrix and I think they’re using Cilk or something and he was like “I got it down from 2100 milliseconds to 39 milliseconds and I just did this thing.” He was trying to explain it to me and I was like “Yeah this is like talking to a data scientist. I wanna follow what you’re saying, kid but I’m becoming a grandparent apparently.”
0:20:53 MK: Oh seriously.
0:20:55 TW: But that idea of performance… Because it does seem like that progression of the analyst to… Initially, I run it and I say “Yeah, it runs for a minute.” That’s enough time to go grab a cup of coffee. Well now my data set keeps growing. Well now it’s five minutes, I still don’t really care but now, it’s starting to kick the fan up on my laptop and oh other people want it and now they wanna do something and now it’s growing over time to where you can… An analyst plots out and says “Well this is gonna be taking an hour and a half.”
0:21:29 TW: It does seem like a different skill set. I worry a little bit about the analyst… I used to worry about this with digital analytics, that it’s alluring to get sucked into the “I’m gonna capture new data.” or “I’m gonna prep the data and I’m gonna assume… ” And this happens with the Adobe analytics world, there are definitely people out there who are like “We’re gonna do all this cool data collection and then we’ll just assume that somebody’s gonna work with it.” and then you realize that your entire analyst team is so busy collecting and organizing the data and just assuming and hoping that now somebody’s gonna get use from it.
0:22:08 TW: I don’t know. The whole analytics engineer role. As you were describing, you were growing into that and you said “I wasn’t really doing the analysis.” But that feels like this kind of… It’s a nice career progression from getting to the data is in a better spot but is there a risk for organizations that “Oh crap! We have all our analysts doing stuff with the data that is getting it cleaner and better and transformed but we’ve now understaffed and we can’t rely on our… We’re not getting it to a point where the business can just get into a BI tool and get actionable insights from it.”
0:22:50 CC: Yeah. I think that’s definitely a risk and I’ve seen some teams go in that direction. Personally, when I’m working with data doing analytics engineering, I’m working with the stakeholder first and understanding “What are they trying to get out of the data, what questions are they asking, what’s the business question, what’s the data question, do those two actually match up, do we need to be asking different data questions?” And working backwards from there to figure out what data do I need and then fill it in with what data do I have.
0:23:20 CC: I have definitely seen teams start to go sometimes in the other direction, where they start out with their raw data and then they think “Oh what is the final, absolutely modeled data mart that I need to build?” And that’s kind of a little scary to me ’cause I think you could end up building the wrong thing or building something that’s not useful or just putting… Investing a lot of time in something that…
0:23:42 TW: That literally used… 10 years ago was the Data Warehouse Institute did a study that said “Why do data warehouses fail?” And it’s because that was their approach. “We’re gonna build everything in this glorious data warehouse and it’s gonna be amazing.” and they just weren’t incremental enough and they weren’t focused on what’s the specific questions.
0:24:01 TW: So it gets back to the same thing that that data analyst was doing, which is don’t just say… If somebody says “Can I get X?” and think “Let me go solution and give X.” it still goes back to that “Hold on, let’s figure out what we’re really trying to do so that we can provide a better solution, think through the whole… All the nuts and bolts.”
0:24:25 MK: I really do like the idea of also Claire, you mentioned, really thinking through “What’s the final output gonna be, what’s the dashboard that we want to be at the end of this?” Our team has gotten into the practice now of wire framing a dashboard before we even start working on a model or a report table because then, the stakeholders are like “This is what the output is gonna look like and how I’m gonna use it.” and you can really marry up those two things.
0:24:51 MK: But I think Tim does raise a really interesting point and this is why I love this topic so much. It’s like, the analytics engineering is a lot of heavy lifting and it is really time-consuming and how… I mentioned earlier, our team’s dual hat so they do the analysis as well, is like “How do you get that balance right of making sure you’re doing that model and report building but also having the time to do the analysis?” and I feel like we’re just starting to hit our stride with that but I think it must be a problem that lots of teams come up against.
0:25:25 CC: I think you get the balance right by getting it wrong and adjusting from there.
0:25:30 CC: I think sort of at one extreme of things of if you are only ever working with a stakeholder and answering that business question, you’d end up building one process to answer that one question. Every time you need to answer a new question, it’s a totally new transformation or analysis that you need to do.
0:25:49 CC: On the other side of things it’s trying to answer everything from the outset and that’s also sort of those risks we were discussing earlier of investing too much in the infrastructure and then not delivering value and so I think it’s a bit of balance. It’s picking a few things to start out with, a few reports that need to be automated and currently someone’s downloading them to Excel and doing a lot of VLOOKUPs and figuring out how can I make that process easier and then building in like these extra… I know long queries. Like doing it sort of one half at a time but keeping an eye on how does the big picture fit together? It’s a really hard skill to master and I think it is just one of those ones that you master over time, balancing those two things.
0:26:36 MH: Yeah, I’m having flashbacks right now to the early days with Web Trends software where you literally had to pre-plan any report because the database was not relational unless you related the dimensions prior to analyzing any data. So it’s sort of that same thing, is sort of like if you don’t get one, then you’re like “Oh, we need this other report.” suddenly you’ve gotta go back and rebuild it and re-analyze all your log files back in those days but in the same way, it’s sort of like “Yeah, we’ve got all this data. Okay how do we want this data to come out?” It’s very hard to predict exactly how you want it to come out ahead of time so it’s a tough challenge.
0:27:19 MH: I kind of think there’s something I’m noticing, as I read through your blog post about the analytics engineer and we’re touching on it a little bit so maybe it’s just a little bit of a different tack but it seems like not just the technical side of it but the analytics engineer also seems to need to have a pretty significant amount of business domain knowledge to be able to help drive the business need for the data into that place where they’re kind of sitting in the middle. Can you talk a little bit about what you think is required there in terms of what an analytics engineer would need to do?
0:27:53 CC: Yeah. That’s a really good question. So how much business knowledge does an analytics engineer need? Is that…
0:28:00 MH: Well, yeah or just talk to like domain expertise is sort of one hand, it seems like and then the technical expertise is maybe the other and kind of bringing those together seems like where the analytics engineer sits so maybe the relative importance, those kinds of things.
0:28:15 TW: Well, it seems like as you were talking about the needing to probe for what the business… Like the whole active users and re-ups, like I’m hearing how does that… You’ve gotta be able to engage with the business and understand the business to actually do the necessary probing I think, right?
0:28:34 CC: Yeah. I definitely agree with that. It’s… Yeah there’s so many different skills to learn in this role. It’s technical skills and business knowledge and then also I think sort of a separate kind of domain knowledge which is things like product analytics and marketing analytics and financial analytics and all those lots of different things. I think as an analytics engineer, I feel pretty confident that I could walk into another organization and be hopefully pretty good at my role. I think you get set up really well with this technical base that then means like learning the business things on top of that… Those things are so much easier to learn as opposed to having to…
0:29:17 TW: Well, actually but as you’re framing it, is it because if you’ve got the… I wanna ask a question about a talent and like what a shitty analytics engineer looks like but if you’ve got the analytics engineer kind of skills and ability then when you’re engaging with the business, say at a new client or new employer, you’re thinking through kind of… Aren’t you approaching it from a data perspective to be asking questions about “I need to ask questions about the business because I’m playing forward in my head two or three or four or ten steps of what that actually means for the data and for analysis.” so…
0:30:01 CC: One thing that does come to mind on this topic is that the best analytics engineers that I know are people that haven’t come from an engineering background. Hopefully none of my friends from that background hear this and are upset by that but honestly, the best analytics engineers are those that were from an analyst background, from an operations background, is another really common one or a finance background and so they actually had that business context of what does this number mean and why is it important? And they ended up falling in those analytics engineer roles ’cause they got curious about data or they got frustrated with the quality of the data that they were working with and just kept asking sort of why on the other end of the stack, more on the technical side of the stack and so I think that those business skills are super valuable and key to being a really good analytics engineer because yeah, if you’re not asking the right questions, you’ll just end up building things that aren’t valuable and that’s a way to be a pretty bad analytics engineer, I guess.
0:31:01 MK: But so where’s the balance? I was basically working where yeah, I was a solo analyst and I was doing a lot of the analytics engineering I guess, prepping what we needed and I remember calling my sister and she’s like “What the hell are you doing? You’re building for two years time. Like this company is probably gonna look so different in two years time. Like what do they need for the next six months? And build that.” How do we get that balance right? Because I feel like that is one of the trickiest parts when you delve into this analytics engineering world, like you can keep optimizing and optimizing and trying to think of every single use case that this is gonna answer or do you really hone it down to like it needs to be used for this one case and that’s what I’m building for.
0:31:50 CC: That’s such a good question. I think that idea of like “What is the timeline you’re building for?” is such a valuable question to ask yourself as an organization. I know that in my last few months at Airtasker, we had Dan Gooding come in as the data lead, absolutely fantastic and we had this big session where we were sort of saying “We’re gonna build the infrastructure that’s gonna last us for two years.” which was like a total shift and a sign of maturity for that organization but I think asking that question of your own organization is really valuable because it helps you understand what is the trade-off that I should be making? Like which way should I be biasing? ‘Cause there is like build the thing that does the thing once or build the version that is innately flexible but then much more complicated to debug if something goes wrong ’cause maybe you’ve got extra code happening there.
0:32:42 CC: So from my perspective, it’s always start off with the one that solves the single use case and then re-factor it so that it can solve maybe a few more use cases. A really concrete example, one that I’ve actually been working on recently, I’m doing some pro bono work at the moment with a non-profit and they have a lot of survey data and this survey data is structured in a very particular way where like boolean responses live in one table, string responses live in another, date responses live in another and they have different surveys.
0:33:17 CC: Like every so often, someone creates a new survey and then the person on the other end of the survey wants the flat, the wide table of survey responses. They’re using it to inform who they’re calling up to tell them to register to vote and so this is an example where they’ve sort of built the first version just to solve the one survey and they got it to work and we re-factored it so it was pretty tidy but it wasn’t doing anything more than it needed to do. It was sort of setting it… Like the way we wrote the code, I’m so passionate about SQL style, like using a lot of CTEs and naming things really well and being intentional about those choices meant that then when they came back around and said “Okay, we have a new survey. Can we make this do two things?” And we went “Okay. We’re gonna do a bit of copy and paste for this one but now when we are at like 10 surveys, then we can start to look at, is it worth making this code really flexible, maybe generating the code, using extra things outside of SQL?”
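A minimal sketch of the "solve the single use case first" version Claire describes: one named CTE per response type, joined into a wide table for a single survey. The table names, column names, question keys and sample rows are all hypothetical, and SQLite stands in for whatever warehouse the non-profit actually uses.

```python
import sqlite3

# In-memory database with three hypothetical response tables, one per data type.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boolean_responses (respondent_id INTEGER, survey_id INTEGER, question TEXT, value INTEGER);
    CREATE TABLE string_responses  (respondent_id INTEGER, survey_id INTEGER, question TEXT, value TEXT);
    CREATE TABLE date_responses    (respondent_id INTEGER, survey_id INTEGER, question TEXT, value TEXT);

    INSERT INTO boolean_responses VALUES (1, 42, 'is_registered', 1), (2, 42, 'is_registered', 0);
    INSERT INTO string_responses  VALUES (1, 42, 'state', 'PA'), (2, 42, 'state', 'OH');
    INSERT INTO date_responses    VALUES (1, 42, 'contacted_on', '2020-09-01');
""")

# Hard-coded for one survey on purpose: each CTE pivots one response type,
# and the final SELECT joins them into one row per respondent.
WIDE_SURVEY_SQL = """
WITH booleans AS (
    SELECT respondent_id,
           MAX(CASE WHEN question = 'is_registered' THEN value END) AS is_registered
    FROM boolean_responses
    WHERE survey_id = 42
    GROUP BY respondent_id
),
strings AS (
    SELECT respondent_id,
           MAX(CASE WHEN question = 'state' THEN value END) AS state
    FROM string_responses
    WHERE survey_id = 42
    GROUP BY respondent_id
),
dates AS (
    SELECT respondent_id,
           MAX(CASE WHEN question = 'contacted_on' THEN value END) AS contacted_on
    FROM date_responses
    WHERE survey_id = 42
    GROUP BY respondent_id
),
respondents AS (
    SELECT respondent_id FROM booleans
    UNION
    SELECT respondent_id FROM strings
    UNION
    SELECT respondent_id FROM dates
)
SELECT r.respondent_id, b.is_registered, s.state, d.contacted_on
FROM respondents AS r
LEFT JOIN booleans AS b ON b.respondent_id = r.respondent_id
LEFT JOIN strings  AS s ON s.respondent_id = r.respondent_id
LEFT JOIN dates    AS d ON d.respondent_id = r.respondent_id
ORDER BY r.respondent_id
"""

wide_rows = conn.execute(WIDE_SURVEY_SQL).fetchall()
for row in wide_rows:
    print(row)
```

Supporting a second survey here would mean copying the query and changing the survey ID and question names, which is the "bit of copy and paste" trade-off described above.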
0:34:20 CC: But we really started off with the simple version. I myself was absolutely guilty of over-refactoring. I don’t know if that’s a term but like getting too clever with my code and being like “Look at all the cool things I can do.” Like I can write a UDF to date trunc to a week but I could just also use…
0:34:41 TW: Pro bono, you can do whatever you want.
0:34:44 MH: That’s right. [laughter]
0:34:45 TW: To stimulate your intellect for providing that.
0:34:49 CC: Yeah. So I think like I personally, maybe a year ago, around that sort of time, I was working too far on the side of like “Let me show you how clever I’m gonna be and how I can automate all of this.” and it ends up with something that’s harder for your end user to understand so it’s sort of that over going too far and then pulling it back and re-correcting it is the process that I…
0:35:14 MK: I’ve got to ask this. So with DBT and I think it’s pronounced Jinja but it’s with a J, which is basically like a templating language that you can throw in your SQL to get it to do a whole bunch of stuff that it wouldn’t normally be able to do. Like I look at our code base now and if you were a straight SQL person, I think it’s a point where you’re doing things that are so fancy, it’s probably not gonna become readable to someone new. What’s the balance there?
0:35:44 CC: Yeah. I think that’s a thing that I’ve seen happen in a lot of organizations. For those who aren’t familiar with Jinja, it’s a templating language. Think like if you were to send a marketing email and if you ever like accidentally received an email that says like “Hi, ‘first name’ in two curly brackets” and you’re like “Uh, something went wrong here.” So it’s that templating language and that templating language, using it with SQL means that you can do things like generate the correct SQL for the survey ID, Survey ID 42 even though that SQL actually needs to look different each time but you can use like templates and things like that and then I think that is an example where like people can go, can get too clever with things and it’s… Yeah, it’s finding the balance, figuring out what’s right for your organization.
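A rough sketch of the "generate the correct SQL for each survey" idea. Jinja is a third-party package, so to stay runnable with only the standard library this uses plain Python string building as a stand-in for a Jinja loop; it is not Jinja syntax and not how DBT itself renders models. The function name, table name and question keys are hypothetical.

```python
def render_pivot_sql(table, survey_id, questions):
    """Return SQL that pivots one response table into one column per question."""
    # The names are pasted into the SQL text, so only accept simple identifiers.
    for name in [table, *questions]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    # One CASE expression per question; this is the part that differs per survey.
    columns = ",\n".join(
        f"    MAX(CASE WHEN question = '{q}' THEN value END) AS {q}"
        for q in questions
    )
    return (
        "SELECT\n"
        "    respondent_id,\n"
        f"{columns}\n"
        f"FROM {table}\n"
        f"WHERE survey_id = {int(survey_id)}\n"
        "GROUP BY respondent_id"
    )


# Same template, different SQL for each survey.
sql_42 = render_pivot_sql("string_responses", 42, ["state", "language"])
sql_43 = render_pivot_sql("string_responses", 43, ["county"])
print(sql_42)
```

The generated query is easy to read, but the generator is one more layer a newcomer has to understand before changing it, which is the readability cost Moe is asking about.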
0:36:32 TW: I would take the people who are too clever, as long as they’re aware that they’re being a little too clever and might have recognized they’ve gone down that and made something that’s not necessarily extensible ’cause the ray of hope I’m getting that we won’t have a bunch of dunderheads calling themselves analytics engineers is that there is… I look at what many analysts that I have come across in my professional life who their spreadsheets are shit and they have their one-time case of I’ve got to produce this report and they put zero investment to you know, you’re gonna have to produce that again a week later and they put nothing into it and so the next week, they spend 80% as long as they spent the first time producing it and they will go on down this…
0:37:22 TW: So it’s like the opposite Moe, from the high growth our data exploding, take the big ass enterprise that is not moving quickly and you take an analyst who’s gonna say “Well, I’m gonna export this from this system and then I’m going to import it into Excel.” and even there, they don’t get it down to “How can I streamline the crap out of this?” So to me, there is kind of a self… There’s got to be a self-selection where somebody who is working in spreadsheets and is saying “Okay, this is getting ridiculous, my formulas are way too long. Oh crap I’m writing VBA and I know that that’s fragile.” and they’re the ones who are gonna be drawn, ’cause it does feel like there has to be some talent, some ability to say… To recognize there’s a continuum between down and dirty fastest to produce and getting too clever but essentially infinitely extensible. I think there are plenty of people out there who are data analysts who don’t even internalize and understand that continuum.
0:38:26 CC: Yeah.
0:38:27 TW: And they are so destructive and they will wind up filling their days doing the same damn thing over and over again and then pissing and moaning about I’m just doing reporting ’cause they have no ability to make anything efficient and rant complete. I have earned us the explicit tag for this episode.
0:38:42 MH: Yeah. Great job.
0:38:46 TW: So I don’t know. The way you are framing it, to me sounds like the… Anybody who’s listening and saying, that makes so much sense. Okay, analytics engineer, possibly a career progression for you. People saying, what the hell are they talking about? I was like, uh! I feel like you… Our audience is all phenomenal. I might have had current and past clients who caused my blood pressure to go up a bit but I love them all.
0:39:16 CC: Yeah. I think… No I think you talked a real good point, like it is the people who are frustrated with having to repeat processes who find themselves drawn to this side of things. Like why do I have to write a query every single time? Why do I have to run this query? Like why can’t it just be a thing that I reuse again and again as opposed to having to re-do it? People who are asking those questions, I think are gonna benefit a lot from analytics engineering. People… I mean I think this is true in life in general, anyone who’s scared of automating, having parts of their role replaced by tooling or anything like that, maybe I shouldn’t put it this way.
0:39:57 CC: Let me rephrase this, I think it’s that curiosity of how can I make this process easier and less manual that drives a lot of people towards analytics engineering. For anyone that does enjoy repetitive tasks, maybe I have fewer words to say.
0:40:18 MH: Have you looked into the world of quality assurance?
0:40:22 CC: I have but we, we automate it, yeah.
0:40:25 MH: No I’m saying for people who enjoy repetitive tasks.
0:40:28 CC: Yeah.
0:40:28 TW: No the QA, you should be automating your QA too, hello Selenium, you know?
0:40:33 MH: Right. But I find that is less compelling. The sort of the… This discussion or intersection. The other thing that’s really sticking out to me and Claire, I’m not sure if you’re familiar with this concept as it’s relatively new in our space is this role of the analytics translator and…
0:40:53 CC: Cool.
0:40:53 TW: Oh my God! You’re gonna…
0:40:54 MH: Tim… It’s funny Tim that you’d have that reaction but then it’s literally like analogous to the analytics engineer in so many different ways.
0:41:02 TW: No, that is a complete… That is a bullshit cop out person.
0:41:06 MK: I totally… Yeah don’t agree.
0:41:10 MH: It’s not the same but it is actually sort of sits in between these two functions.
0:41:17 TW: No, it is a lazy fuck who wants to say I don’t wanna actually have analytics talent but I’m intrigued by this idea of analytic. No.
0:41:27 MH: So when Gary Angel brought that forward to eMetrics two years ago, you had a negative reaction to him saying that.
0:41:35 TW: I did because I…
0:41:36 MH: Uh-huh! Sure you did.
0:41:37 CC: Okay, you gotta catch me up. What’s…
0:41:39 MH: No.
0:41:39 CC: What’s the analytics translator?
0:41:41 MH: Don’t worry about it but it’s interesting because I find that it’s sort of another role that’s being described as sort of this hybrid between these two things and so that’s why it kind of…
0:41:53 TW: No. Go ahead and define it because then we can see, Claire can be like, oh yeah!
0:41:57 MH: No, the analytics translator is basically someone who sits in between the business and the analytics function within a company. That’s it.
0:42:06 MK: So we had this guy who came to me and basically was like “Cool, I wanna work in analytics.” and I was like “Great so you should probably learn SQL or R or Python, all the languages that everyone in our team use.” and he was kind of like “Well I don’t really wanna do that.” And I’m like “Great. So how are you gonna get data?” and he was like “Well, someone else would pull it for me.” which I mean come… Like… What…
0:42:29 TW: Yeah. I had the same example with somebody who said that the same thing.
0:42:33 MH: So Moe you should… Moe hold on, hold on. You need to distinguish between someone who’s just lazy versus someone who actually performs a function.
0:42:42 MK: Oh no, apparently this is called strategic analysis. That’s what… This was the term that he used to describe the role.
0:42:50 MH: That’s fine but you can’t do analysis without understanding of the underlying data or the ability to even go get your own data.
0:42:57 TW: I had the exact same thing with a former co-worker who came to me at a meet-up and said like “Oh I found this… “
0:43:03 MH: Claire, we’ll be back with you in a minute here. [chuckle]
0:43:05 TW: Yeah.
0:43:06 CC: Oh it’s great. I’m enjoying it.
0:43:08 TW: It was the exact thing. That is what a… We need more data literacy amongst our business users because…
0:43:15 MH: Bullshit will be persistent, we’ll always face that. Okay. Let’s move off of this. It’s interesting to me to see these things as patterns that sort of emerge. Okay, there’s one other topic that I kinda wanna delve into which Claire, I think you can do this and Moe and Tim, I’d be interested in your perceptions as well but…
0:43:37 TW: Well, you’re wrong.
0:43:38 MH: Actually, I’m all…
0:43:39 TW: Oh sorry, we’re moving on.
0:43:40 MH: I’m always right Tim and that’s the weird thing about it so you just pick an area.
0:43:48 CC: It must be so wonderful.
0:43:49 MH: Well, it works. It’s been working out pretty well for me so far. No so one thing that I’ve been sort of feeling about analytics and of course, we come from the digital analytics space so it’s sort of like the Google Analytics and Adobe Analytics are our bread and butter a lot of times but our space is getting a lot more complicated so if you look at of where analytics is going so well, let’s call it the modern analytic stack or whatever you wanna call it so that is changing how people are having to interact with this and I think to your point Claire, this is where the analytics engineer basically is coming from, is filling in where this, we’re now modularizing the entire analytics stack and there’s all these new roles that have to be fulfilled.
0:44:35 MH: Does anybody find it stressful like that if this is happening? ’cause I do. Like our challenges are still fundamentally the same but the technical aspects of it have grown in complexity dramatically because some of the things that were out of our hands before, like the processing of data and how that data was gonna turn into reports was sort of managed by a tool that had positives and negatives, a lot of negatives actually but anyways, I’m just curious what people’s response to that is.
0:45:06 CC: I think there is so many different aspects of being a data practitioner now, just someone on a data team, there’s so many different things that you have to have the context of to be a really effective practitioner. Moe, I was speaking with your coworker Vincie about a project she’d been working on to pull mobile screen views into your sessionization model and that conversation was like so many different pieces of the stack. It was like working with your growth marketers, working with data engineers, working with frontend engineers, all these different things. I think it is a really challenging aspect of being someone in data right now. The thing that is good at the moment is that, a lot of the easier tasks are being automated.
0:45:54 CC: So I had never written code, my own code to put data from an API into Snowflake prior to about a month ago when I wrote my first ever Singer tap because Stitch or Fivetran was just always doing it for me and I understood how it worked, I knew the context of it but I’d never done it myself and you can get pretty far with off-the-shelf tooling as long as you understand what that tool does and what it’s best at and how to use it the right way.
0:46:24 CC: So you can end up glueing a lot of pieces together and getting pretty far but the hard part is I think the context of knowing all these different aspects.
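For listeners who have not seen one, a Singer tap is a small program that writes JSON messages to standard output for a separate loader (a "target") to consume. The sketch below builds SCHEMA, RECORD and STATE messages by hand rather than using a Singer helper library; the stream name, fields and the `fetch_users` function are hypothetical stand-ins for a real API, and this is not Claire's tap.

```python
import json
import sys


def fetch_users():
    """Hypothetical stand-in for a real API call; returns hard-coded rows."""
    return [
        {"id": 1, "name": "Ada", "updated_at": "2020-10-01T00:00:00Z"},
        {"id": 2, "name": "Grace", "updated_at": "2020-10-02T00:00:00Z"},
    ]


def build_messages(stream, rows):
    """Build Singer-style messages: one SCHEMA, one RECORD per row, then a STATE."""
    schema = {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "string"},
            "updated_at": {"type": "string", "format": "date-time"},
        },
    }
    messages = [
        {"type": "SCHEMA", "stream": stream, "schema": schema, "key_properties": ["id"]}
    ]
    for row in rows:
        messages.append({"type": "RECORD", "stream": stream, "record": row})
    if rows:
        # Bookmark the latest timestamp so a later run could resume from it.
        # ISO 8601 strings in the same timezone sort correctly as text.
        bookmark = max(row["updated_at"] for row in rows)
        messages.append(
            {"type": "STATE", "value": {"bookmarks": {stream: {"updated_at": bookmark}}}}
        )
    return messages


def main():
    # A tap writes one JSON message per line to stdout.
    for message in build_messages("users", fetch_users()):
        sys.stdout.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    main()
```

The off-the-shelf tools mentioned above handle the parts this sketch leaves out, such as authentication, pagination, retries and loading into the warehouse.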
0:46:35 MK: And I have also been meaning to do a shout-out to Vincie who gave me some really interesting questions to add to the show. She is the DBT guru, one of the gurus in our team so I do wanna do a quick shout-out to her and thank you to her for taking on that screen view project ’cause that would have been the death of me. [chuckle]
0:46:54 TW: Doesn’t it come down to an organization that is willing to be creative and flexible with what are the kind of pieces and parts from a skills and talent that we need to put together and then I think it goes to, don’t hire anybody as a data practitioner who isn’t frustrated that there are not enough hours in the day for them to learn it all. I think that’s what’s toxic is having somebody who wants their role to be static and I feel like we’ve talked about this as like, if you’re in the world of data these days, you better be taking step functions forward and what you understand every year or you are gonna get left behind and so it seems like those two need to come together and yeah, the reorg may need to happen periodically because, you know what?
0:47:47 TW: We need like three analytics engineers. We had zero last year. What’s getting us on a path to do that? Who internally can move into that role? What do we need to hire? So I think it just… There needs to be a recognition that it needs to be dynamic, both the talent and the organization and I feel like I’m seeing a lot of organizations that don’t have one of those other pieces and are struggling.
0:48:09 MH: Well and I think the other thing is, people have got to take a step back on strategy and Claire, I think you made that point, we were talking about how you had a data lead who came in and said “Hey, we’re gonna build the infrastructure that’s gonna last us for the next two years. Let’s take a step back and really think through it.” That’s the other thing is because there’s this constant iteration happening.
0:48:29 MH: This may or may not be a conversation that I just had today with a developer in BigQuery which is “Hey, I think there’s some issues with definitions across this team and defining sources of truth.” And they’re like “It’s so annoying.” I’m like “That’s literally every company for all time.” [chuckle] It’s how it always is and it’s sort of like “Well then how do we get around those things?” And strategy becomes so crucial at those moments. Anyways it’s just…
0:48:57 TW: What? And governance, right? It seems like even in the analytics engineering world, you can’t just be marching off down your own path and not having a mechanism for documenting and getting those definitions figured out and recorded and aligned on, right?
0:49:13 MK: I have to say that’s one of the things that I’ve really loved about our move. I mean to be honest, our team was always really good at it but the thing that I’ve loved the most, well when I review any code, the thing that I’m probably the toughest on is the documentation because people are so in their zone about their area that they know what a particular field means and I’m like “Yeah but you’ve got 25 other analysts now who might need to pick up this table and use it and if you use the word session in the help center, which has a different definition to what session means in marketing, for example or even like a location code or something like that, you need to be really explicit so everyone is speaking the same language and knows the differences for how different terms are used.” I feel like the documentation, that’s the thing that I probably love the most about analysts doing more of this analytics engineering, is they actually do know what that field means probably better than anyone else in the business.
0:50:13 CC: Yeah. I remember at my last role our API engineering lead was like “Yeah, we’ve got a pretty complex API, it has like 150 tables in the backend. People specialize on different parts of the backend, no one really knows what’s in all those tables. Oh except for the data team, they know what’s in every single table.” Which I think is very true and in your history when you’re… When analysts are working with data, they just know so much about what this field means and it’s crazy to me when that information lives in their head as opposed to written down in a YAML file that someone else can access and review and edit and all those things.
0:50:56 CC: It’s just… It’s a really quick; if those things live in your head, it’s a really quick way to make sure that you never progress in your role because every time someone has a question, they’re going to be coming to you to answer it as opposed to checking the documentation.
0:51:10 MK: The YAML files do make it so easy.
0:51:12 TW: It goes back. Where can I write this down so that I can point people to this so that I can move on in my career? It just goes back to the same…
0:51:18 CC: Yeah.
0:51:20 MH: Okay. Well, we have to start to wrap up and this conversation is really good.
0:51:25 CC: There’s so many things we haven’t talked about.
0:51:29 MH: No, I know. I know. It’s always the way it is with the good episodes. All right. Well one thing we do love to do is go around and share a last call. So Claire, you’re our guest, do you have a last call you’d like to share?
0:51:42 CC: Yeah. So my last call is, we have our inaugural user conference coming up. It’s December 7th-11th it’s called Coalesce, which hopefully most of your audience appreciate that name. It’s four days, each day is in a different time zone, very fortunate to have Canva as one of our online hosts, all online. We’re really excited, we’re only a little bit nervous but I’d love to see some faces from this call and the audience there as well.
0:52:12 MH: Outstanding. Yeah. I think it’ll be an excellent event and it sounds like there’s gonna be a lot of great content shared which I’m already on the record as telling people, if you’re not part of the DBT community, Slack community, you should probably get in there. It’s like 7000 people or more now so…
0:52:31 CC: I think we just hit 7000.
0:52:34 MH: Yeah so…
0:52:34 CC: By the time this airs, you may not be as easy to get in by…
0:52:38 MH: It’ll probably be 8000 by the time. After we just talk about it, we’re gonna get a huge bump in… Alright. Moe, what about you? Do you have a last call you’d like to share?
0:52:49 MK: Yeah, I just want to do a shout out actually to Claire and the team at Fishtown Analytics, their blog and the DBT blog, I mean I look at it at least every week. It’s an absolute lifesaver and I actually find if you are starting to explore DBT, it just is so… I sometimes struggle when I go through other people’s blogs and particularly when it gets quite technical, I get completely lost and I need a human to explain it to me and I find that all of the articles that Claire and the team put together are so readable and I actually leave the site being like “Oh I know how to do this thing now.” so it’s been on my last call list forever but it seems like today is really the appropriate day to do a shout out.
0:53:33 CC: Oh thank you so much. I have a very lovely story about that blog if you wanna hear it very quickly.
0:53:38 MH: Sure.
0:53:39 TW: Let’s hear it.
0:53:40 CC: I read an article. You can find this message in DBT Slack, it’s from January 2017. I read an article on the Fishtown blog, I was an analyst in Sydney and I said that exact same thing. I said “Hey, I read this article and it made sense and everything else I had read on data, just had not made sense.” It’s the Startup Founder’s Guide to Analytics written by Tristan Handy, the CEO. Still relevant today and that was basically how I found my way into DBT Slack and DBT community and using the software and then two years later moved over to work for the company so pretty crazy way to be introduced to an organization but if it resonates…
0:54:21 MH: Very cool.
0:54:24 TW: Nice.
0:54:24 MH: Okay. Tim, what about you? What’s your last call?
0:54:28 TW: Well, I’ll do one that’s been on my list forever but I would like to say it’s just so appropriate for this topic. It’s about this role of an analytics translator.
0:54:37 TW: No.
0:54:39 TW: Everything after that was actually true. I totally just whiffed and I had screwed up the URL and so it was broken and I was like “Whenever this comes back up, I wanna last call it.” but it was totally a user error on my end but Jordan Peck, this was several months ago, wrote a post called “SQL versus R: Which to use for Data Analysis.” So it’s not trying to solve an R versus Python or SQL versus R versus Python, it just makes some… He does a very clean job of going through and saying, This is what works here, this is what works better here and his closing note was “Can you use a SQL database to perform analysis or use R to build a reliable data pipeline? Of course you can, just be wary of the pitfalls and the difficulties going into it.” which to me just seemed like this seemed like a good episode to throw that into because it does come down to… And even Claire in your article about analytics engineering, you talk about kind of the different… The whole tech stack, like we are in a point where we’re putting multiple pieces together, you can’t just learn Google Analytics, you can’t just learn Excel and you’ve gotta kinda understand the broader landscape and that even comes to the coding. So various people have written SQL, R, Python stuff but I thought Jordan’s was a nice little write up.
0:55:56 TW: What you got, Michael?
0:55:57 MH: Well, I’m glad you asked, Tim. I’m excited to announce my new YouTube channel, The Analytics Translator: Why it’s better than anything Tim… I’m just kidding. I’m joking. [laughter] Wouldn’t that be hilarious if that was what it was though? Okay. No so actually recently, I found someone new to follow on Twitter who works for Google, his name’s Lak Lakshmanan. I’m probably mispronouncing his name but he is… He works with the data analytics and AI solutions on Google Cloud and he and his team write a lot of the training content and things like that and so I got into his…
0:56:38 MH: Kind of got introduced to him through some things that… An interaction on Twitter and then started diving into some of the Google Cloud training which I think is actually pretty cool and he also has a book which is currently in pre-order but it’s Machine Learning Design Patterns by O’Reilly so it looks like it might actually be a pretty good book to have for those who are gonna take some time to go beyond just analytics engineering and get into machine learning.
0:57:07 MH: All right. Well we know as you’ve been listening, you probably have tons of questions, lots of comments and we’d love to hear from you and the best way to do that is through the Measure Slack or the DBT community Slack, either one of those ’cause you can find definitely me and Claire on the DBT Slack and I’m sure by the time this goes live Moe, you will also be on there.
0:57:32 MH: Yeah. Okay good. Yeah.
0:57:34 MH: Moe actually knows how to use DBT, I don’t, I’m just part of the Slack community.
0:57:39 MK: Not well. Not well to be clear.
0:57:42 MH: I just think it’s hilarious because I believe very much in the idea of it so I’m in the community, I have no idea how to use the product so just every little bit helps. Anyways, we’d love to hear from you, you can also talk to us on Twitter or on our LinkedIn group and we’d love to share your comments and questions. Claire, do you have… Do you interact with people in other places besides the DBT community that you’d wanna shout out, like Twitter or anything like that?
0:58:09 CC: I do have Twitter. It’s ClaireBCarroll and yeah I’m on there pretty frequently, usually making terrible jokes so don’t expect any high quality content there.
0:58:20 MH: I like it. Memes and jokes. Those are the best kind. All right. Well no show would be complete without a shout-out to our awesome producer, Josh Crowhurst, who does such an amazing job helping us deliver the show to you, the listener. So thank you Josh, from all of us.
0:58:39 TW: If this episode has sounded smooth and seamless throughout, just know that Josh did a heck of a job to get you to this point.
0:58:47 MH: There’s this new role emerging of the podcast translator. No, I’m just kidding. Okay so yeah, we do appreciate it. All right so I know that no matter the challenges of the modern analytics stack, no matter whether you started out as a developer or a Philosophy major who got into analytics through marketing, I know I speak for my two co-hosts, Moe and Tim, when I say keep analyzing.
0:59:19 Announcer: Thanks for listening and don’t forget to join the conversation on Twitter or in the Measure Slack. We welcome your comments and questions. Visit us on the web at analyticshour.io or on Twitter @analyticshour.
0:59:33 Charles Barkley: So smart guys want to fit in so they made up a term called analytics. Analytics don’t work.
0:59:40 Thom Hammerschmidt: Analytics. Oh my God! What the fuck does that even mean?
0:59:47 MH: Tim, did you even cover the logistics?
0:59:53 TW: I really feel like that’d be the responsibility of the moderator. If you just wrote the intro before 6 PM.
1:00:00 MH: I wrote part of it before but I didn’t write it on the actual document.
1:00:03 MK: Uh-huh.
1:00:04 TW: No, that is when you say “Oh, I’ve been thinking of it. I’ve been thinking about it for weeks. It’s been up here.”
1:00:10 MK: It’s been percolating.
1:00:10 MH: Rrrrrr… Don’t mess with my process.
1:00:16 MK: Can you say that we have two people with very different work styles here? It frequently blows up in our pretty little faces.
1:00:23 MH: Maybe even three people with different work styles, Moe.
1:00:26 MK: Well I like to think I’m the happy medium.
1:00:29 TW: Where have the most fireworks occurred? Whose fireworks have been…
1:00:32 MH: Okay so let me work quickly is to talk a little bit about some logistics players, since no one else is willing to.
1:00:42 MH: Well just America is…
1:00:45 MK: On the brink of annihilation?
1:00:51 MH: You should redo it. That can’t be the rock by the needle.
1:00:54 TW: That can’t be the rock by the needle?
1:01:00 TW: Uh… Uh…
1:01:01 MK: You’re struggling. I’ve never seen you struggle.
1:01:06 MH: No it can’t be… It’s terrible.
1:01:10 TW: Rock, flag and transformation.
1:01:18 MH: There you go.