#115: Build vs. Buy and the Tradeoffs Therein with June Dershewitz

A simple recipe for a delicious analytics platform: combine 3 cups of data schema with a pinch of JavaScript in a large pot of cloud storage. Bake in the deployment oven for a couple of months, and savory insights will emerge. Right? Why does this recipe have both 5-star and 1-star ratings?! On this episode, long-standing digital analytics maven June Dershewitz, Director of Analytics at Twitch, drops by the podcast’s analytics kitchen to discuss the relative merits of building versus buying an analytics platform. Or, of course, doing something in between!

The episode was originally 3.5 hours long, but we edited out most of Michael’s tangents into gaming geekdom, which brought the run-time down to a more normal length.

Episode Transcript

[music]

00:04 (Announcer): Welcome to The Digital Analytics Power Hour. Tim, Michael, Moe and the occasional guest, discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website, analyticshour.io. And now, The Digital Analytics Power Hour.

[music]

00:27 Michael Helbling: Hi, everyone. Welcome to the Digital Analytics Power Hour. This is episode 115. Analytics ecosystems are no longer the domain of just one technology and frankly, it hasn’t been that way for a very long time. A long time ago, you could just point your server logs at your AWStats and call it a day. Okay, that was a really long, long time ago but anyway, these days the need for quality data collection, data storage and manipulation and finally, data analysis and presentation is serious business.

01:03 MH: There’s many vendors in the space looking to solve all your problems and there’s plenty of smart developers and cloud storage and processing is really cheap so what should a company do? Will an off-the-shelf tool give you enough flexibility? Or will a tool that you build saddle you with insurmountable technical debt? We’re gonna talk about it. Hey Moe, you’ve seen a lot of different options. Do you have a preference?

01:27 Moe Kiss: Yeah, little from column A, little from column B.

[laughter]

01:30 Tim Wilson: Okay. One of those millennials I’ve been hearing about.

01:34 MK: I think I’m actually a Xennial.

01:36 TW: She’s a Xennial. I believe we can just…

01:39 MH: A Xennial. Wow!

01:41 TW: I need to go look that up, that’s how old I am.

01:43 TW: It’s funny, you’re old enough that when we’ve talked about this in the past and we gave her shit about it…

01:47 MH: I probably didn’t know what it was. Yeah.

01:50 TW: It’s like a three-year window.

01:52 MH: Anyways, Tim, if you could write R in Google Analytics would that make you happy enough to not have to ever build your own solution?

02:00 TW: Probably not, ’cause I don’t think I could ever be happy.

02:02 MH: Okay.

[laughter]

02:03 MH: And of course I’m Michael and for better or worse, I think I’ve mostly tended to be around companies that have bought technologies but it doesn’t always mean it’s right. Okay, so we needed a guest and someone who’s walked a mile or three in both sides, both shoes. Someone who’s had to make some tough choices and could shine a light on some of the details we might be missing.

02:23 MH: June Dershewitz is the Director of Analytics at Twitch. She’s also held senior analytics roles at Apollo Education Group and Semphonic. She literally invented Web Analytics Wednesday, so basically Tim Wilson owes all of his fame to her and she has served on the Board of the Digital Analytics Association. I have already gone on record on this podcast as a huge fan of her work and thought leadership. Welcome to the show, June.

02:48 June Dershewitz: It is great to be here, thanks so much.

02:50 MH: Awesome. So June, I know we’re gonna talk about whether to build analytics ecosystems or buy or what kind of mixing could be the best solution and why but actually, I wanna go back and hear about how Web Analytics Wednesday started ’cause that’s huge for the industry.

[laughter]

03:09 JD: Yeah, it isn’t a story I get to tell too often but I’m really excited to get to share it here. It goes all the way back to 2004 or 2005. In that period of time, I was looking around, looking for other people who were doing something that looked like what I was doing, which was web analytics. I felt really alone. Where are these people? And where I found them was this Yahoo Group for Web Analytics. So I went, “Hey…

03:33 TW: A Web Analytics Yahoo Group.

03:34 JD: You know it. Okay, so…

03:35 MH: I remember.

03:36 TW: Oh, I was… Yeah.

03:36 JD: I guess we were all there, right? So I joined this Yahoo group.

03:40 MK: I was definitely not there.

03:42 JD: No, I think it’s before you were born. [chuckle]

03:45 TW: There were no Xennials there.

03:46 JD: No, not at that time, it was a different world. So I joined this group and I was pretty much a lurker. I was listening, I was reading and paying attention to all this stuff that was going on. I was like, “Huh, I think I found these people.” But they were people on the internet. So I got this idea that I wanted to meet these people in person and I was sure that there were some of them in my local area which was San Francisco and it literally took me six months to get up the nerve to post on there to ask “Hey, does anyone who’s in the San Francisco Bay Area wanna get together for drinks next Wednesday and talk about web analytics?”

04:20 JD: And I got this amazing response, not only locally from people who did wanna get together and come to my meet-up but also I got a response from Eric Peterson who said from Portland, Oregon, “June, that is an awesome idea. Let’s call it Web Analytics Wednesday and let’s try to do it in lots of places on Wednesday.”

04:35 JD: And it did, it happened in that first month in multiple locations. I’m not sure exactly where but I believe globally, not just in San Francisco and everyone was getting together with people in their own local area to sit around and drink beer and talk about web analytics. And I was like, “Oh my god, what have I done?” And the very first one in San Francisco was at a place called 21st Amendment. It’s a brew pub, south of market in San Francisco and I still very clearly remember sitting around a table, handing out Hello, My Name Is stickers to all these people who I would get to know over the years. It was a really important time.

05:12 TW: So ’cause I have to throw this in. When I moved to Ohio 11 and a half years ago, it was the Web Analytics Yahoo Group that… I didn’t know anybody here and there were two guys, Dave Culbertson and Scott Zeckerzach, who both, it was clear we were in Ohio so we got together for lunch. In that lunch we said, “Well why don’t we try doing a Web Analytics Wednesday?” ‘Cause it was already established enough at that point that we could start it and they’re both… Scott actually moved away to Boston for five, six, seven years and then he moved back so they’re actually both still very active co-organizers in Columbus. So, thank you, June.

[chuckle]

05:51 TW: I don’t think I’ve ever actually flat out thanked you for that. I know I’ve thanked Eric multiple times for his hand in it.

05:56 JD: Eric had this huge hand in it too because it was not just like saying, “Hey, let’s do it in one place,” but “Let’s make it into a thing that can scale and scale globally and where we have a structure for getting sponsorship.” And it really worked, it worked really well. We don’t do it in San Francisco anymore. We do lots of other things.

06:12 MK: Really?

06:12 JD: Yeah. There are meet-ups every night for data-related things if you wanna go to something, you can find it but I’m glad to see that it has a life that continues on.

06:21 MH: That’s awesome. Okay, well you’re not done explaining things because Tim needs you to explain what Twitch is and probably a lot of our listeners too. ‘Cause apparently, even though I’ve been around him for a long time, my love of eSports and stuff like that has not rubbed off.

06:38 TW: I’ve spent time watching Twitch, but that was only in preparation for this episode.

06:44 MH: Yeah. So maybe if you could just give our listeners just a little brief description of Twitch, what it does and some of that ecosystem and I think that’ll help play into some of the conversation later too.

06:53 JD: Sure. So Twitch is a live streaming video platform and social community for gamers. It’s people watching people play games, really. So you might not think that that’s a thing but it is. It can be totally entertaining to sit and watch your friends play a game on the couch while you’re not playing the game yourself but you’re watching them and you’re having dialogue with them about the game that they’re playing. It’s that but moved to the virtual setting and it started as a little startup in San Francisco called Justin.tv and it was literally this guy named Justin who was walking around San Francisco with a webcam on his head.

07:29 MK: Stop it!

07:31 JD: No, I’m not…

[laughter]

07:32 MH: Yeah, that’s literally, that’s true.

07:33 JD: It is true. I used to see Justin walking around town. I think lots of us did. I used to tune in to his livestream and it was actually not that exciting but many people were watching it because it had never been done before and out of that, they also developed a technology that made it possible for many people to livestream. And so Justin.tv grew and grew and content and community sprouted up around gaming and at a certain point Justin.tv pivoted and became focused solely on gaming and rebranded as Twitch.

08:05 JD: And then nearly five years ago Twitch was acquired by Amazon. So we’re headquartered in San Francisco. We’ve grown and grown over the years. I’ve been at Twitch for three and a half years and over that time our product has grown and changed, our customer base continues to grow and our employee base also continues to grow and it’s been fun to be a part of that journey.

08:25 TW: What’s the nature of the product feature growth? Are there big radical things? And again this, as Michael set up, I’m pretty green in this space but what’s the latest new big addition on Twitch?

08:40 JD: The latest new big addition in terms of…

08:43 TW: Or however you wanna answer that.

08:46 JD: Oh, oh, I have an… Here. Here, I have a way to answer that. So, Twitch is essentially a platform for broadcasters to come to share content that they’re producing with people who are interested in that content and they can develop audiences around that content and those people engage with them in real-time as they’re playing games or as they’re chatting or as they’re weightlifting or whatever they happen to be doing. And one of the interesting things about Twitch is our revenue model. We’re not just solely based on advertising, although advertising is one component of it, we also get revenue from subscriptions and also Cheering, which is the thing called Bits where if you really like what a broadcaster is doing you can throw Bits at them.

09:27 JD: And for some portion of the broadcasters on Twitch, we offer a revenue share to them so people are actually able to make money playing games on Twitch and last year we added nearly 250,000 affiliate partners who were able to monetize content that they produce on Twitch. So it’s basically people playing video games on the internet and making money doing that.

09:49 MH: Wow!

09:49 MK: Color me stunned.

09:50 TW: What about the eSports leagues? Do some of those wind up being on Twitch or do those wind up being on their own platforms?

10:00 JD: There’s definitely eSports content on Twitch. We have different types of content. We have three million people a month who broadcast livestreaming on Twitch and many of those people are very small, maybe don’t even have anyone watching them but they’re playing a game and they’re livestreaming it and at the high end of that, we have things where we have hundreds of thousands of people who are tuning in together to watch streams on Twitch.

10:26 JD: And either that’s one incredibly popular individual broadcaster like Ninja who last year was one of our most popular or it might be an eSports event where a bunch of people are tuning in to watch, basically, professionals compete with one another. Just as you would watch a football game.

10:44 MH: So a couple of weeks ago, Tim, I watched TSM play Team Liquid in the spring finals of the League Championship Series, which I think Twitch streams.

10:54 JD: Yeah.

10:54 MH: And there were probably close to 300,000 people watching at that time.

11:00 TW: I can kind of understand it ’cause I was the guy that when you’d be somewhere where there were arcade machines, I would actually stand and watch somebody play Tempest or Asteroids looking over their shoulder. So I’ll admit that I got some degree of satisfaction out of that but I’m just… Such a fascinating thing.

11:19 MH: I kinda wish we were doing a whole show on it but this is an analytics podcast.

[laughter]

11:23 MH: We gotta go and talk about… So Twitch is pretty unique in a lot of ways in terms of what happens on the website, what’s going on. So I assume there’s probably some pretty unique analytics challenges. So I’d love to dive into, in the time that you’ve been there, you’ve obviously had to make quite a few choices about how to do a lot of things with analytics, so maybe just start with some of that and then let’s get into a discussion about it.

11:48 JD: Sure. Why don’t I give you a very brief high level overview of what our data environment looks like and let’s just kinda put it in context. First of all, I just wanna say that we… Going back to the theme of this podcast today, we don’t use an enterprise tool for analytics. We built our own data pipeline that we use to collect and record information about how our customers are interacting with our product and that was kind of unique to me because I’d come from a world where I was working with those enterprise tools for a while.

12:16 TW: Did it start out… Presumably, was there another platform on the site and then as it scaled it was just like, “Look, we’re gonna break this thing and we need to build one”?

12:27 JD: There was an enterprise tool that existed in the early start-up days that was prior to the time that I joined Twitch and even also prior to when I joined, there was this decision that they’d run into some challenges that that tool couldn’t solve for them. They wanted to interact with granular data in a way that they… They couldn’t do joins basically and they wanted to write SQL against this data and they couldn’t and they were struggling with it and they were like, “Hey, I know. What if we take this event log data and fork it? We’re gonna pass it to this software as a service tool that we’re paying for and we’re also gonna send it to Redshift locally for us.”

13:01 MK: Yeah, nice.

13:02 JD: And that choice was made about five years ago and so then by the time I joined the company three and a half years ago, this platform was already there, it already had 18 months of behavioral data in it that was growing over time and was shifting the balance of the kinds of questions that you could ask and answer using that granular event log data and so people were strongly preferring to use the data that we were collecting in-house in our own platform already at that point.
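As an illustration of the "fork" June describes, here is a minimal Python sketch of sending each event to two destinations. It is not Twitch's pipeline: the sink functions, event fields, and in-memory buffers are all hypothetical stand-ins for a SaaS vendor endpoint and a warehouse loader.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]
Sink = Callable[[Event], None]


def fork_events(events: Iterable[Event], sinks: List[Sink]) -> List[int]:
    """Deliver every event to every sink and return a failure count per sink.

    A failure in one sink does not stop delivery to the others, so the
    vendor feed and the in-house feed stay independent of each other.
    """
    failures = [0] * len(sinks)
    for event in events:
        for i, sink in enumerate(sinks):
            try:
                sink(dict(event))  # pass a copy so sinks cannot mutate shared state
            except Exception:
                failures[i] += 1
    return failures


# Hypothetical sinks: plain lists standing in for a vendor API and a warehouse.
vendor_buffer: List[Event] = []
warehouse_buffer: List[Event] = []

events = [
    {"name": "video_play", "user_id": "u1", "ts": 1},
    {"name": "chat_message", "user_id": "u2", "ts": 2},
]
failures = fork_events(events, [vendor_buffer.append, warehouse_buffer.append])
```

In a real system each sink would batch and retry rather than write one event at a time; the point here is only that the same raw event reaches both places.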

13:29 MK: So, sorry, can I just clarify. When you interviewed for the job at Twitch ’cause I feel like people do this really often when they go to companies where there’s something familiar. They have a tool that’s familiar or I don’t know, a visualization platform or whatever it is. Was that something… Did that scare you? Or when you heard that they built their own platform and that wasn’t something you were familiar with, what was your thought process around that?

13:55 JD: I was excited about that actually. At my previous job we were, like many organizations, a customer of an enterprise analytics vendor and I do this because I enjoy doing it and I was evaluating new technology and I was looking at some things, it was a BI tool in this particular case but it would work really well if you had a data lake and you had all your company’s data in that data lake and you could point this BI tool at it and hook things together really nicely. And I was like, “At my next job, I wanna work in a place that has a data environment that makes that kind of analysis possible.”

14:29 JD: And so when I started interviewing at Twitch and I found out more about how they were capturing and storing their data, I was like, “This is exactly the kind of company that I wanna come work for. I wanna find out what they’re doing and I know it’s not gonna be perfect, they’re gonna have a lot of challenges that I’m gonna find super frustrating,” ’cause I’m like, “Why…” Things like, “Why do you have to redefine sessionization? Why do you have to decide what a visit is?” Or, “Oh my God, you haven’t figured out what bounce rate is yet? Here, let me help you.”

[chuckle]

14:57 JD: And those things that you take for granted if you’re just cracking open an enterprise tool that has all of that stuff pre-calculated and all of that logic already defined. You don’t have that luxury when you have all of the raw data there at your disposal and you have to make it up on your own.
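To make concrete what "deciding what a visit is" involves when you only have raw events, here is a minimal Python sketch. It assumes a 30-minute inactivity timeout (a common convention in web analytics tools, not necessarily Twitch's definition) and treats a bounce as a session containing a single event. The function names and the sample data are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed inactivity cutoff


def sessionize(
    events: Iterable[Tuple[str, int]],
    timeout: int = SESSION_TIMEOUT_SECONDS,
) -> List[Tuple[str, List[int]]]:
    """Group (user_id, unix_ts) events into sessions per user.

    A new session starts whenever the gap since the user's previous
    event is strictly greater than `timeout` seconds.
    """
    by_user: Dict[str, List[int]] = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions: List[Tuple[str, List[int]]] = []
    for user_id, stamps in by_user.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > timeout:
                sessions.append((user_id, current))
                current = [ts]
            else:
                current.append(ts)
        sessions.append((user_id, current))
    return sessions


def bounce_rate(sessions: List[Tuple[str, List[int]]]) -> float:
    """Share of sessions that contain exactly one event."""
    if not sessions:
        return 0.0
    bounces = sum(1 for _, stamps in sessions if len(stamps) == 1)
    return bounces / len(sessions)


events = [("a", 0), ("a", 600), ("a", 2401), ("b", 100)]
sessions = sessionize(events)
rate = bounce_rate(sessions)
```

Every choice here (the timeout, whether the boundary is inclusive, what counts as a bounce) is a decision an enterprise tool makes for you and an in-house platform has to make for itself.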

15:11 MK: So under what circumstances do you think you would go back to a buy model?

15:16 JD: I think that if I worked for a company that had smaller data, we have massive volumes of data here and I think that enterprise vendors tend to struggle with that or charge you tons and tons of money for that. So, if the data that I was working with was smaller and maybe if it was an e-commerce business where the things they were measuring fell into this funnel where there was already a really fully formed framework for just applying our data to something that fell easily into a template that was already there, I would totally go for it.

15:50 JD: But, keep in mind, I’m working with live streaming video data. I had never done that before. I’d worked with video data but it was short form. Did they watch one quarter, half, three quarters? Complete the video? Of a marketing video and this is like nothing I’d ever seen. There’s a heartbeat once a minute that says they’re watching, they’re watching, they’re watching. For every single person who’s watching and we have roughly 15 million people watching every single day. And so they were collecting a lot of information and it’s very rich, that describes people’s interaction with live streaming video. It doesn’t fit naturally into traditional analytics tools, I’d say.
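The once-a-minute heartbeat June mentions can be turned into a watch-time metric by counting distinct minutes per viewer. The Python sketch below is one hedged way to do that; the tuple layout, the channel field, and the de-duplication rule are assumptions for illustration, not a description of Twitch's schema.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

HEARTBEAT_INTERVAL_SECONDS = 60  # one heartbeat per minute, per the description


def minutes_watched(heartbeats: Iterable[Tuple[str, str, int]]) -> Counter:
    """Count minutes watched per (viewer_id, channel).

    Each heartbeat is (viewer_id, channel, unix_ts). Heartbeats are bucketed
    by minute and de-duplicated, so a retried or duplicated heartbeat within
    the same minute is only counted once.
    """
    seen: Set[Tuple[str, str, int]] = set()
    for viewer_id, channel, ts in heartbeats:
        seen.add((viewer_id, channel, ts // HEARTBEAT_INTERVAL_SECONDS))

    totals: Counter = Counter()
    for viewer_id, channel, _minute in seen:
        totals[(viewer_id, channel)] += 1
    return totals


heartbeats = [
    ("v1", "c1", 0),
    ("v1", "c1", 60),
    ("v1", "c1", 61),   # duplicate within the same minute bucket
    ("v2", "c1", 120),
]
totals = minutes_watched(heartbeats)
```

Because every concurrent viewer emits an event every minute, event volume scales with total watch time rather than with page views, which is part of why this data does not map cleanly onto a page-view-oriented tool.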

16:26 TW: How dependent was and it predated you from what… Even what you’ve just said, it sounds like it’s a really solid platform. It sounds like the sort of thing that somebody who actually had a lot of foresight both for scale, for data usage, for… An architect who was what you want an architect to be. That the data collection, the data structure, the data storage was… And I’m sure they’re still continuing to add features and update it but it sounds like you’re not at a point where, “Wow, we made some fundamentally poor decisions.” That feels like a big risk. If you’re building that you need to have the experience and the intuition and the skill to have a foundational framework that actually works.

17:20 TW: Even the third party vendors, every five or six years, they say, “Oh, we’ve had to completely re-gut and re-engineer some aspect of the product.” So do you feel like you got really lucky or just approached it in the right way, that you had an experience with the third party tools that you really knew what you were gonna… They really understood video before they started trying to build a video analytics platform?

17:46 JD: Yeah, I think that there are things that if I look back on the evolution of what’s been built here and what I’ve had a hand in helping to build, that there are things that I would do differently next time based on what I know now and I know, I’ve talked to other people who’ve been in similar data environments where they built it themselves and there was one place, another start up business in San Francisco and it seemed like they really had their act together and I asked them similar questions about, “Wow, how did you get this all right?” And the guy who was the head of data infrastructure there said, “You know what? This is not my first rodeo. I did this again… I did this before once and I made some poor choices and this time I’m getting it right.”

[chuckle]

18:27 JD: It’s true. And I was glad that he admitted that but I think now we have this whole generation of people who are trying to do this thing, who are building rather than buying and it’s their first rodeo for many of them and I think a few years from now, they’re gonna move on, they’re gonna go to the next place, they’re gonna correct some of the mistakes that they’ve made the first time around and they’re gonna get it right and it’s gonna be totally solid.

18:46 TW: That’s interesting, so if you’re gonna build, it seems like you wanna have a… You want somebody who’s built before.

18:50 JD: I would say that would be a good early hire.

18:53 MK: So June, do you think, ’cause I suppose in the startup space people often ask my advice and one of the things that I tend to say is, “Use free as long as you can.” If you can use a free tool, especially when you don’t have a big team yet, do what you can with it then when you reach its limits, then you go on to, “Okay, what’s gonna be our next decision?” Which is probably gonna be more in line with scale. Do you think that’s appropriate? Do you think you just try and start straight away from the ground up? Because all the places where I’ve seen build yourself… Yeah, you’ve really gotta work hard. You’re gonna make mistakes, unless you had that experience.

19:33 JD: I have two ways to answer that. The first way is quickly, if you’re at a small but growing company, it’s guaranteed that if you’re successful as a company, you’re going to outgrow things. You’re like a snake that’s gonna have a skin and they’re gonna shed that skin and they’re gonna grow and they’re gonna show their next skin and I think that’s perfectly fine and you just have to make your choices about the tools and techniques that you’re using being applicable for now and maybe a year or 18 months down the road, they will still be applicable but eventually, if you’re successful, you will outgrow it and you’ll need something else and it’s fine to acknowledge that.

20:10 JD: The other way that I will answer that is that when you talk about free, there are two kinds of free. I think there’s like a, maybe there’s more than that. Tim says more.

20:20 TW: Well, think of TANSTAAFL; there’s nothing… Nothing’s free.

20:22 JD: Nothing’s free. One kind of free is a trial to paid membership right, where you get a little taste of something that eventually you could enter into an enterprise contract to buy and then there’s another kind of free that’s Open Source. And that can be kind of risky, too because you don’t know what the lifetime of that thing is gonna be. But I’ll say there’s some really interesting and promising Open Source technologies in the analytics realm right now that I think definitely should be considered by small companies that want a lightweight way to get started. The difference though is that if you do the trial to paid free, then you don’t necessarily need a deep engineering commitment and if you’re gonna go with Open Source, you need engineers.

21:05 MK: Have you found, it sounds like Twitch is quite evolved culturally that the business understands that. I feel like when you decide to build one of the biggest challenges is the business need to understand that you’re investing, it’s not free because you’re investing your people and their time into building what it is. Have you worked in organizations where you’ve had to have that cultural conversation around this is what people… You’re not gonna see this stuff, you’re just gonna see at the very end and there’s gonna be potentially months of work that people are gonna struggle to understand why.

21:39 JD: Yeah, it definitely plays into real conversations about cost and what cost means and it’s not just the vendor contract that you sign and renew every year. If you’re choosing to prefer to build, then you have to factor in the cost of maybe needing to hire analysts that hit a higher technical bar because the system that you have isn’t usable unless they meet that bar. Or needing to hire more engineers to support your infrastructure because it’s something that you rely on for critical business processes rather than paying a vendor and just establishing and enforcing an SLA.

22:19 JD: It’s a completely different way of managing things but it can certainly go wrong if you choose to go with build and then underinvest. You’re like, “Oh I didn’t know that we had this massive area of risk over here. Okay, let’s figure out how to overcome that.” Whereas, if you were working with a vendor, you would just rely on them to take care of all that for you.

22:43 TW: Maybe. You’re in a unique… You’re somewhat unique in that you probably… You’re easily in the 98th percentile when it comes to the volume of data that’s being generated and that’s almost a reason that if you go with the buy and you look at who they’re, cause I’m dealing with this with the client right now that is a massive client that has bought but it turns out that their needs and there’s an SLA and the sales guy said, “Sure we can do whatever we can scale.” But now all of a sudden you’re at a Twitch level volume of data and oh, guess what? Shit’s starting to fall apart and you have an SLA and you can scream at him but now you have no control. You’re at the whim of other than saying I’m gonna go build or I’m gonna switch to another vendor, the reality is you’re now relying on their engineering department to go solve it for you.

23:35 TW: So I think that probably that’s if you are massively scaling, that to me winds up what is a big risk of an actual risk of buying ’cause you’re actually giving up some control.

23:46 JD: Yeah, you give up a lot of control and it’s another thing that we found really valuable to have within our control, is now related to all the different ways in which we use our data at Twitch. We use it widely throughout the company for reporting analysis, experimentation, data science, data products, the whole gamut of things and we need to be able to support all of those different kinds of users of our data.

24:13 JD: And so if we make it possible to subscribe to streaming data to feed into a customer facing data product that’s based off of customer behavior data, we can do that and we can scale that and it’s completely within our control. Where if we were working with an enterprise vendor for data collection and transformation and storage and all that, it might get incredibly expensive or it might be impossible or we might have 24 hour latency or something crazy that’s not gonna work for us and suit our needs but we can do things that are more customizable because we control all of it.

24:46 MH: Yeah and there’s so much that goes into that. I wanna sort of get you to react to an observation. So, over the years, I’ve observed that a lot of times companies who choose to go build their own, a lot of times they exist in a fast-growth or tech space because they’ve already amassed a lot of technical development or engineering talent because that’s what they’ve used to build their products and those kinds of things and then on the enterprise side where those resources are more constrained and they don’t have as much access, those people there’s a tendency to go buy. Do you feel like that kind of tips the scales for companies a lot?

25:24 JD: I live in this little bubble that is the San Francisco Bay area and that’s what I see happening around me.

[chuckle]

25:32 MH: Data scientists just grow on trees out there. So, I’ve heard.

25:36 JD: Oh I try not to use that phrase.

25:39 MH: Or growth hackers grow on trees.

[laughter]

25:44 MH: You’re in good company with both reactions. Trust us.

[laughter]

25:48 JD: Okay, ’cause I was just about to end this interview right now.

[laughter]

25:55 MH: We definitely need more of [25:55] __ but not today.

26:00 JD: I think that the split that I’ve observed here is that companies that are more established were more established earlier on and tended to adopt enterprise tools because they were the best on the market at the time and if it works for them, there’s a switching cost now of saying, “Oh, well, why should we prefer build now because what we bought is working pretty well for us and it’s not our core competency as a company anyway.” But I also see a lot of newer, younger companies saying oh, I’m gonna evaluate the best that’s available for me right now out there and it might not be the same choices that companies were making 15 years ago or a decade ago when they were evaluating analytics tools and platforms.

26:41 MK: I think it does also depend where analytics sits in the business because I think if it sits with marketing, which is a more traditional org structure, it does… I just find older senior marketing people tend to lean towards let’s buy something, whereas when your analytics department sits with engineering and tech, there seems to be a much bigger, I guess, understanding and yes, let’s invest in this and let’s build. And I think you get more support from engineers to help with that too. So I think there’s also a lot of thought you can have around where your analytics department sits. Tim, you look confused.

27:15 JD: Yeah.

27:16 TW: Well, I guess no, I’m thinking that if you’re sitting with marketing there’s the other reason which, one of the downsides of building your own… What both the two big players, Google and Adobe, have both gone very very heavily into is their integration with their respective marketing cloud. So if I’m sitting in marketing, I’m integrating with the marketing and advertising offerings and components for those platforms and so I could see marketers being drawn to it partly for that saying, “Look, I can now build audiences and push them out and advertise to them,” whereas it seems like if you build, then you’re having to go down a level to hook into a third-party DMP. Again, you have…

27:58 MK: Or a CDP.

27:58 TW: Or a CDP or whatever, you have more control but I don’t know, that seems like another… What do you… I don’t know, do you guys… Do you need to, do you have paid marketing or do you guys are just printing money and growing hand over fist so it’s not needed.

[laughter]

28:13 TW: Is there a piece of that too, that when you build you are… Now that means every system you might wanna hook into, you have to build an integration to it and do you guys have to deal with that?

28:24 JD: I’m gonna go back to what I think is maybe the root of your question which is, does wherever you put the data team within a company, it might be dictated by the needs and values of the org that it lives inside and its origins and you have companies that are heavily marketing-focused, it’s all about acquiring new customers or something and if that’s a big focus and you have a big paid marketing budget, that might be where your data people live because that’s the area of most need.

28:56 JD: At Twitch, we had our roots as being a very product-driven company and so the need to understand customer interaction with our product was the key thing that led us to invest in an analytics platform and then as we’ve grown into a more mature business, we have all the kinds of concerns that you could imagine across marketing analytics and product analytics and content analytics for us, like how well does eSports do quantifiably. But for us it had its roots in product and product analytics. We don’t live there anymore; there may be no one perfect spot for a data org in a company but I think that that determines what you see as your priorities and what you get funded to do and how.

29:38 MH: Yeah. Another thing that I often observe is actually there can be two almost warring factions within a company going in opposite directions. So the marketing department wants to keep using Google Analytics because that’s where all of our paid search information is but then there’s also a BI team or a data science team that’s like, “I need unaggregated data and I’m sick and tired of dealing with… “

30:01 TW: Or product team.

30:02 MH: Yeah, or product team and they are like, “Give me tools that I can build on this,” or even… I remember one of the very first Webtrends engagements I ever worked on, I was explaining about WT dot whatever parameter to some developer and they’re like, “What are you talking about? I could just build this.” And that was… That’s a lot of times a developer mentality that goes all the way back and so I’m just curious, what advice would you give to an organization… Let’s say you’re a leader at a big organization and you have teams who are coming to you and being like, “No, we need to go this way” and then the marketing team is like… Or other teams are like, “No, we wanna stay this way,” how would you give them advice? What questions would you tell them to ask or whatever?

30:44 JD: Well, I would ask how important is organization-wide consistency and if it’s critically important for different business units within your company to be driving off of the same KPIs for engagement data… Not revenue data ’cause we want that to be consistent across the company already. But if you want your engagement data to have consistent definitions in product and in marketing and in sales and everywhere else you use it, then investing in something where you have more control over how you define those metrics, including data collection and transformation and storage and all of that, then I would say you might consider build.

31:20 JD: If you don’t care as much and you say, “Oh, we’re just getting a sense of trends here and as long as it’s directionally similar then it’s fine for marketing to continue to use the enterprise tool that’s easy for them as nontechnical users and the product portion of our organization can build their own thing because they have different use cases they need to satisfy,” then that might be appropriate too but don’t be surprised when there’s divergence.

31:42 MK: Ooh, that’s a fun topic.

[chuckle]

31:44 TW: There’s the assumption that it’s all… That it’s just for free things are gonna align, that the [31:49] __.

31:50 JD: Yeah. They’re not.

31:51 MH: Because bot filters that come standard don’t when you build your own or all the things that you mentioned earlier about sessionization and visitors, all of those things.

32:00 JD: I know.

32:02 TW: That’s actually one of Simo like I have a, now for five years. That’s one of his favorite things is to talk about all of the decisions Google makes for you around a session. He has a whole, pretty sweet, little riff which he admits is kind of over-inflating the size of the issue. But yeah, if you define it yourself, then everybody gets to change their definition.
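The session decisions Tim refers to can be made concrete with a minimal sketch. It assumes a single rule, an inactivity timeout, and uses hypothetical event timestamps; the function name `sessionize` and the sample data are illustrative only, and the 30-minute default mirrors the commonly cited Google Analytics inactivity timeout rather than anything stated on the show.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout_minutes=30):
    """Group one user's event timestamps into sessions.

    A new session starts whenever the gap since the previous event
    exceeds the inactivity timeout (an assumed, tunable rule).
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)  # still inside the current session
        else:
            sessions.append([ts])    # gap too large: open a new session
    return sessions


# Hypothetical events for one user on one morning.
events = [
    datetime(2019, 5, 1, 9, 0),
    datetime(2019, 5, 1, 9, 10),
    datetime(2019, 5, 1, 9, 50),  # 40 minutes after the previous event
    datetime(2019, 5, 1, 10, 5),
]

print(len(sessionize(events, timeout_minutes=30)))  # 2
print(len(sessionize(events, timeout_minutes=60)))  # 1
```

The same four events count as two sessions or one depending on a single parameter, which is the kind of divergence that appears when every team picks its own definition.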

32:24 MH: But I definitely don’t wanna lose that point you made June. That idea of defining or the need for organizational consistency across engagement metrics. That’s amazing, I’ve never heard it sort of put that way but that makes so much sense to me. So that’s a really great insight.

32:42 JD: I would just add that it will depend on your business model too, how much that is important to you. Twitch is ostensibly a media company. We care a great deal about our customers’ engagement with our product and our platform and so we need that consistency across all the places where we’re measuring that engagement so that we can do what’s best for our business and what’s best for our customers. But if I work for a retail business or a traditional retail business or something, looking at digital engagement signals may not be as critical to my business. Maybe I’m looking at consistency in some other capacity of my business.

33:14 MK: I think though… So when I was working at The Iconic, we had two analytics platforms: we used Snowplow, where we had raw event-level data, and Google Analytics, and I don’t recommend having two tools ever. Just because you’d walk into a meeting and marketing would have one number and tech would have one number and then the question was always, “Why don’t these numbers match up?” It was never about, “What should we do with this number? What does this mean for us as a business?” It always, always came down to, “Why don’t they match?” And it just…

33:46 JD: It’s true but you know what, it’s not all roses even with the build path because I’ll say that we struggle with that in my company too and one of the really important initiatives that my team is working on currently is to take our massive volumes of raw event log data and aggregate that in a smart way that makes it clean and consistent and trustworthy and based off of the same logical rules that we have and we want to apply throughout our business. So that’s an aggregate layer that lives a level up from the raw detail; it’s up to us to define that.

34:22 JD: I’m sure Simo is super unhappy if he’s hearing that right now but we have lots of control over that and we say, “Okay everybody, you can go to the raw event log data to ask and answer any question that you want but you don’t have to and if you use the aggregated layer instead, you’re guaranteed to get an answer that is consistent with answers that other people will get when they query that data.”

34:42 MK: ‘Cause that’s definitely something that you need to work with the team on, is making sure that people are querying the data with the same caveats and same rules and the same logic ’cause otherwise… Yeah, you can get two analysts that run a very similar query but come up with completely different results. Because one forgot a flag.
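The "one forgot a flag" problem and the aggregated layer June describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Twitch's actual schema: the table `raw_events`, the view `agg_viewers`, the `is_bot` flag and the sample rows are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (user_id TEXT, event TEXT, is_bot INTEGER);
    INSERT INTO raw_events VALUES
        ('u1',   'video_play', 0),
        ('u2',   'video_play', 0),
        ('u2',   'video_play', 0),
        ('bot7', 'video_play', 1);

    -- The shared rule (exclude bot traffic) is written once, upstream.
    CREATE VIEW agg_viewers AS
        SELECT COUNT(DISTINCT user_id) AS viewers
        FROM raw_events
        WHERE event = 'video_play' AND is_bot = 0;
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Analyst A remembers the bot flag; analyst B forgets it.
analyst_a = scalar(
    "SELECT COUNT(DISTINCT user_id) FROM raw_events "
    "WHERE event = 'video_play' AND is_bot = 0"
)
analyst_b = scalar(
    "SELECT COUNT(DISTINCT user_id) FROM raw_events "
    "WHERE event = 'video_play'"
)
# Anyone who queries the aggregate layer gets the same answer.
shared = scalar("SELECT viewers FROM agg_viewers")

print(analyst_a, analyst_b, shared)  # 2 3 2
```

The raw table stays available for any question, but the rule lives in one upstream definition, so it does not depend on each analyst or each reporting tool re-implementing it.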

34:58 JD: Yeah, that’s called flexibility. Yeah. The ultimate flexibility but also ultimate in chaos and headaches.

35:07 TW: Well, but sessionization is one of those things that seems like presumably it really benefited that you came from a background where working through the whole with Adobe when they had daily, weekly, monthly unique visitors and that sort of thing that, yeah, it does matter how you’re aggregating. So you think about it, you have data collection, you have the transformation and the storage and the schema that’s being stored in. And then you just spoke to, there was kind of an aggregation. Does that mean ultimately there are some third party, are there third-party tools that sit on some of the aggregated data, that are a little more maybe user-friendly to get some high-level reporting or is it literally to get at the data, somebody needs to be writing SQL whether it’s on raw data or aggregated tables. How does that work? What’s the… Do you have to build a full stack or are there pieces that you can plug in that are kind of third party and safe?

36:06 JD: Well, I think the closer you get to the end consumers of your data, the more it might make sense to leverage existing technologies, rather than try to build it from scratch. So for data interfaces for instance, we have currently three that I’m not gonna name by name. But one is really good for writing, running and sharing SQL and if you’re technically proficient, it’s incredibly flexible and easy and lightweight.

36:30 JD: The other is a database technology with a simple data visualization layer on top of it, that takes our big data and allows you to explore it easily and filter it and cut it in pretty simple ways. You can’t do calculated measures or anything but if you just wanna ask some basic questions about user engagement, cut in certain ways, it’s awesome but then you hit a wall.

36:53 JD: And then the third one is a more full-featured enterprise tool for business intelligence that does all the kinds of reporting and analysis stuff that you would expect and one important feature of that is it has a caching layer, so it pulls back data so that you can work with it locally without having to go and query the massive data sets over and over because we found that that caused some real pain and a lot of confusion with our end stakeholders who are like, “I click run and I went away for two hours and then I got an error message, what happened?”

[laughter]

37:24 JD: And they’ll say that that’s the fault of the data interface but it’s absolutely not. It’s a problem with… Maybe it’s a problem with our data model, maybe it’s a problem with our database but the way that it’s expressed as pain by the people who we’re trying to help is like the interface is broken.

37:40 MK: Yeah.

37:40 TW: But it’s nice. They can bash the third-party tool instead of bashing the…

[overlapping conversation]

37:45 MH: Right. Plausible deniability.

37:45 JD: They can hide behind it.

37:47 MH: We told our vendors we wanted that but you know how it is.

37:52 JD: But one thing I’ll add though is that occasionally I’ll look… I’ll just speak on Google Analytics for a minute. I’ll look at a path analysis for a report in Google Analytics and go, “Oh man, this is amazing.” I wouldn’t wanna build this from scratch and it’s just done really well and that’s the kind of thing that you can… There are tools on the market that I’m also not gonna name, that you can take your raw event log data and apply some modeling to it and pass it to them and they’ll give you pathing and cohort and funnel analysis, not unlike what you would get from standard enterprise vendors but based on your raw event log data. And that’s awesome and there could be more of a need for that in the future for people who choose to build but still want the presentation layer to be really full featured.

38:40 TW: But that seems like you have to make that design decision that making sure that you’re saying you can, ’cause those, now you can swap out little bits and pieces and say, “Oh let’s try this, let’s try that.” It’s at the… It’s pretty far down the flow but it still means that you… There’s a commitment to saying we’re going to allow being hooked into by these different things.

38:58 JD: Yeah, yeah and actually that feeds into a design choice that we’ve made with our data architecture to say we have these… The aggregate tables that I was mentioning as something that’s decoupled from our data interface choices and so you can plug into the aggregate data we have from any one of those interfaces and you should get consistent results across all of them. And there are some data interface vendors out there that push all of the… A good portion of the business logic definition into the interface tool itself and I think that is incredibly dangerous because then you’re beholden to that interface as the place where you’re storing all of your important business logic and then you’re prevented from using it and say, “Oh, now I wanna use it in a data product that’s customer-facing.” Well, that business logic lives over there in the reporting tool, you can’t get to it for the data product.

39:51 TW: Those are the developers you wanna hire who did it the first time; it has totally bitten them in the ass, and now you want them to not do that again…

39:56 JD: Yes, exactly.

40:00 TW: Yeah, gotcha.

40:01 JD: Put your business logic upstream.

40:02 MH: That’s one of the things I wanted to come back to, ’cause you mentioned this earlier June, that there is a different technical bar that your analysts need to hit to be competent in your environment. So there’s a lot of people who listen to this show who are growing their careers and they’re looking at job options. So what are the kinds of things that you’re seeing that analysts need to be adding to their skill set to be successful in an environment like yours or other ones that are building versus buying? Cause I think that would be helpful for people.

40:29 JD: Absolutely must know SQL and I don’t mean know SQL in the sense of went to a bootcamp and dabbled in it for a couple of weeks but I think the thing that we really look for is a track record of having used SQL on very large data sets and familiarity with optimizing SQL queries when you’re working with high-volume data. And that’s not something that we… That if people have done that before, it saves us a lot of pain of them coming in and getting confused and saying, “Well, I just did a select star from this massive table with hundreds of billions of rows in it.” Like, “God no!”

[laughter]

41:09 JD: And maybe we’d all have to learn it. Dammit, no!

41:10 MH: I need a bigger computer! [chuckle]

41:12 JD: I remember the first time I did that, not here but… And this is terrible but when I first did that and I brought down our data warehouse at the company I was working for, I was like, “Yes, I’ve done it!” [chuckle] Which is not the right attitude. You learn and then you go, oh, well, and this is how I run an explain and see how terrible my query actually was and not do that again. [chuckle] But yeah, so that’s one thing that just getting your hands on super, super high-volume data and being really comfortable with that is just really invaluable in today’s market.
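Running an explain before running the query itself can be sketched as follows. This is a small illustration only, using SQLite's `EXPLAIN QUERY PLAN` through Python's standard `sqlite3` module; the `events` table, the index name and the helper `plan` are hypothetical, and a real warehouse will have its own EXPLAIN syntax and output format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT)")
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")


def plan(sql):
    """Return the planner's description of each step, without running the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the readable detail


# A bare "select star" has to read the whole table.
full_scan = plan("SELECT * FROM events")
# Filtering on an indexed column lets the planner look rows up instead.
indexed = plan("SELECT * FROM events WHERE user_id = 42")

for step in full_scan + indexed:
    print(step)
```

The exact wording of the plan differs between SQLite versions, but the first query is reported as a scan of `events` and the second as a search using `idx_events_user`, which is the signal to look for before pointing a query at a table with billions of rows.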

41:47 JD: For Twitch, we love people who have worked in business-to-consumer businesses before, we think it’s… Maybe Twitch also as an Amazon company is all about the customer and thinking of the customer first and so we find that people who have worked in business-to-consumer settings often already have that in mind and have good stories about having used data to understand the customer. But it’s different, if you’re trying to do something similar in a B2B company, you probably would prefer people with a B2B background so that’s not across the board.

42:21 JD: I could also say, R and/or Python but it really… Some of that depends on the kind of role and position that you’re coming into for product analysts especially. I believe our product analysts prefer R because it helps them when they’re doing experimentation, which is something a lot… That our product analysts focus on a lot. Python tends to be great for people who are moving data around and transforming it; just being able to work with data quickly is handy in that capacity.

42:52 JD: I would say that data analyst at Twitch is a very, very broad term. It’s all the way from someone who might be a data engineer in another company all the way to a customer insights person. And it can be tough because we can’t give up on the technical skills that they absolutely must have to be able to thrive and so the kinds of people who are most amazing to us are the ones who have that technical knowledge but also have a grasp on the customer insights end and are great at talking about what they’re doing and making sure that they’re working on things that have impact to the business and everything that you want from a data person.

43:33 TW: Quick clarification, on the R and Python, are those people who are heavily using R and Python, they also, definitionally also are using SQL or are very, very good with SQL?

43:43 JD: Yeah, yeah.

43:44 TW: Just wanted to make sure, clarify.

43:45 JD: Everybody uses SQL primarily and then augments with R and Python, although I have encountered people, not at Twitch but elsewhere, who do most of the work that they do in R and only go to SQL when they need to but I think they might be operating in a different environment where… One person I was talking to was working for an agency and was just working on marketing attribution problems and he found that R was a really good tool set for him because the data that he was working with often took a certain shape and form but for us, SQL is the place to start.

44:22 MH: Alright, well, this has been great. We do have to start to wrap up, this has been an awesome topic. One of the things we like to do on the show is go around the horn and talk about something we’ve found recently that we’re interested in and we think would be cool. So June, you’re our guest, you have a last call you wanna share?

44:39 JD: Oh, yeah. The other day I learned that termites eat wood twice as fast when they listen to rock and roll.

44:46 MK: Stop it.

44:47 TW: What?

[laughter]

44:47 JD: No, Google it. [laughter]

44:51 MH: That’s amazing. So the fact that Detroit is crumbling probably has a ton to do with the fact that it’s a rock and roll city.

45:00 JD: Yeah, I think so.

45:00 TW: Wow!

45:00 MH: But the Rock & Roll Hall of Fame is in Cleveland, so…

45:04 TW: Yeah, yeah, but Detroit is close. It’s just… I realize that sort of now.

45:09 MH: Okay. No, just Google it.

[laughter]

45:17 MH: Oh, that’s amazing! There’s… Okay, so Tim, you listen to more podcasts than anyone, how come that has never been covered on one of the podcasts you listen to, that’s fascinating. There’s gotta be a…

45:29 TW: I will ask my daughter and I’m pretty sure she will say she knew it.

45:34 MH: That’s worthy of some kind of one of those investigative Reply All-type podcasts, figure out why, the science behind that. That’s awesome. Okay Moe, what about you, last call?

45:46 MK: Sorry, this actually comes from one of our former guests. Els Aerts put out a tweet about a book that she was reading and I saw it and I was like, okay, I’ve gotta read this book. I read it in 24 hours; it was completely amazing. It’s called Invisible Women: Exposing Data Bias in a World Designed for Men and it’s by Caroline Criado Perez and she basically… I think the reason it’s so fascinating is ’cause it’s so data-heavy and she talks through study after study, stat after stat and so, yeah, if you don’t like reading a lot of data information, probably not the book for you but it was… Yeah, completely blew my mind. Well done, total silence, nothing. Zero.

46:25 MH: Well, you didn’t sell it super hard with all the tons of charts and graphs and it’s really data heavy. I was like, ugh. [chuckle]

46:33 MK: No but interesting, really interesting data problems.

46:38 MH: Alright, well I’ll do my last call cause it’s a non-serious one and that’ll leave you Tim to bring the show back around. So because we’re talking to June and I’m such a huge fan of Twitch and gaming and eSports generally, I thought I’d share a couple of my favorite Twitch streamers. So if you’re of a mind, you can go check them out and see what a weirdo I am. So on Twitch there are some people, there’s one guy I like to watch, his name is SovietWomble and he’s a hilarious, silly guy and he has his friends get online and they do stupid stuff in games, it’s really dumb but it is very hilarious.

47:14 MH: The other favorite I have is a guy by the name of Dyrus who is a former… He’s a retired professional League of Legends player because you retire when you’re like 22. So he’s a former professional, he plays lots of different games but he was probably my favorite League of Legends player of all time so back in his heyday he was just somebody who I thought was awesome.

47:37 MH: And then lastly, a lot of times on weekends there are whole groups of people who do charity streaming. So they’ll do video game events and a lot of times they’re in a public space, so there’s people watching there locally and then online. And they gather donations a lot of times for cancer research or other great causes, so actually watching video games online actually makes a really positive impact on society sometimes and you can also watch some pretty cool… A lot of times they run classic games on those channels so you get to see people playing games that are from a long time ago and they do speed runs, which are really interesting and there’s a lot of real technical stuff that goes on.

48:15 MH: Anyways, as somebody who just appreciates some of those things, those are kinds of things that I like Twitch for. So if you’re interested in the platform or you wanna try to give it a little try, there’s a couple options for you. Okay, Tim.

48:25 JD: Aw, that’s so great. Thank you, Michael, for the shout-out.

48:28 MH: Eh, you know.

48:29 TW: Wow!

48:30 MH: It’s never good to plug your own product so let me do it for you. I don’t get the chance to watch as much as I used to but it is… It’s an alternative to television honestly. It’s a relaxing thing to do.

48:42 JD: Yep.

48:42 MH: Anyway. Okay, Tim, what’s your last call.

48:45 TW: Well, I’m not sure how exactly to set this up given the diversity of the last last calls but I was listening to a podcast, shocker, a few weeks ago, Planet Money episode number 908, called I Am Not a Robot and it basically is the history of CAPTCHA and reCAPTCHA. So the first half, not gonna be news or interesting to pretty much anybody but then it gets into reCAPTCHA version 3, which is the one they’re working on now. So two things that I did not realize, if you’ve been on a site where it just says click the check box of I am not a robot and I was not a deep thinker and thought that seems silly, it’s easy to have a robot check a checkbox but device fingerprinting, they don’t use that word, but looking at the actual other information that goes along with that clicking a checkbox. But what Google is apparently now rolling out and I briefly went to the developer documentation for it, it’s the…

49:43 MK: Of course you did.

[laughter]

49:44 TW: Kind of Sci-Fi site. Well, I have had clients that have had major issues with bots. Bots just really screw up their data so it’s the bane of many sites that have bots that are hitting them that the bot filtering doesn’t take out. So this is actually using machine learning on a site-by-site basis, Google basically trying to sniff what typical human behavior is on the site and therefore if you’re running reCAPTCHA version 3, it sends you back and I’m not sure if it’s a flag or if it’s a probability thing of the likelihood that based on the profile of what this activity is doing, this may or may not be a bot.

50:27 TW: But the idea is that CAPTCHAs and reCAPTCHAs will eventually go away entirely and that machine learning will be applied to… It’ll be much, much harder for bot developers to work around it. ‘Cause we’ve moved past the machine vision able to tell how many of these tiles have a stop sign in them. It’s actually getting to looking at the behavior and filtering it site-by-site. So I found that fascinating, I was walking the dog when listening to it, I came back, immediately Googled and was intrigued.

50:58 MH: A never-ending war of attrition between humans and robots, only escalating through the power of artificial intelligence. Ah, well. Okay, you’ve probably been listening and you’ve probably been thinking to yourself, oh my gosh, I’ve been facing similar challenges in my organization or I’d love to ask a question, or oh, so now I know where to put my amazing SQL skills to use. So we would love to hear from you. The best way to do that is through the Measure Slack or on Twitter or on our LinkedIn group, so feel free to reach out. June, are you on… I think you’re on the Measure Slack, aren’t you?

51:34 JD: Yes, I am.

51:35 MH: See, so you can even reach out to June there as well and you’re also on Twitter at…

51:41 JD: @jdersh. I do… Well…

51:43 MH: It’s not as… I follow you on Twitter, you’re not as active as some but you can still find June on Twitter as well @jdersh so you can reach her there. June, thank you so much for coming on the show. Really loved… Appreciated the insight you brought to this conversation and of course, letting me personally nerd out a little bit about video games, always something I enjoy doing.

52:05 JD: Thank you, Michael, thank you, Tim and thank you Moe for having me as a guest here, it’s been a pleasure.

52:11 MH: Anyways, well, it’s been our pleasure and for everybody listening, I know I can say with a lot of confidence for Moe and Tim, my two co-hosts, whether you know SQL or not and now you know you need to learn, keep analyzing.

[music]

52:30 (Announcer): Thanks for listening, and don’t forget to join the conversation on Facebook, Twitter, or Measure Slack group. We welcome your comments and questions. Visit us on the web at analyticshour.io, Facebook.com/analyticshour or @AnalyticsHour on Twitter.

[music]

52:49 JD: So smart guys want to fit in so they’ve made up a term called analytics. Analytics don’t work.

[music]

52:55 S6: Analytics, oh my God, what the fuck does that even mean?

[music]

53:06 JD: You gotta start buying bits.

53:08 TW: Yeah, do I though? [laughter]

53:11 JD: I don’t know. [laughter]
[music]

53:15 MH: So this idea has already occurred to me Tim, which is we could stream our podcast live on Twitch…

53:20 MK: Oh, Jesus.

53:21 MH: And people could watch it.

53:22 JD: You could, and they could.

53:23 MK: There is no way we’re live streaming shit.

[music]

53:26 MK: Tim, what’s prompted your red wine drinking tonight?

53:31 TW: Well, I have a leftover box from Web Analytics Wednesday.

[music]

53:34 MH: Probably for you Tim, it’s best not to think of it that way. Let’s assume it has no monetary value today.

[music]

53:41 TW: Are there any people writing R on Twitch while they’re letting people watch them code?

53:45 MK: Jesus.

53:46 JD: It has been done.

53:47 MK: Tim would totally watch that.

[music]

53:50 MH: As an aside, we’re not allowed to talk about [53:51] __ so we’ll have to edit that out.

53:56 TW: Why are we making my editing a nightmare.

[laughter]

54:00 MH: Okay, sorry.

[music]

54:00 MH: A man with one watch always knows what time it is, a man with two watches is never quite sure.

[music]

54:09 TW: No, that’s great, I just have to say that I’m reading this book thing and it’s called Why We Sleep: Unlocking the Power of Sleep and Dreams. [laughter]

54:17 MH: Oh. I’ve heard it’s a really good book.

54:19 TW: I am actually reading it now.

54:21 MK: I nearly thought about giving up caffeine when I read that book and then I was like, that’s just crazy and then I started doubting my alcohol consumption so that’s where we went really off the…

[music]

54:30 TW: Rock flag and Helbs is a gaming nerd.
