WHY does Tim simply not give Python its due? Isn’t Python a perfectly acceptable (possibly even better) option when it comes to diving into programming with data? It’s open source, too. Some say it’s easier to learn than R. And, frankly, isn’t a programming language named after a snake just inherently cooler than one named after a letter of the alphabet? The fellas tackled the topic with Ryan Praskievicz from EY on this episode…and possibly wound up tackling it in a way that will leave Python lovers that much more ready to strangle them (as pythons are wont to do).
Links from the Show
Lots and lots of things got referenced! Some of them are:
- R: The R Project for Statistical Computing
- Simo (Ahava)
- Jim Cain
- Ottawa Senators
- Google Analytics Reporting API
- Google API Python Client Libraries
- OAuth 2.0
- Eric Goldsmith (podcast episode)
- Mark Edmondson (R packages)
- RSiteCatalyst (R package)
- Randy Zwitch
- Jason (Thompson)
- The Measure Slack Team
- Pandas (Python)
- Data Frames (R)
- REST API
- Yandex Metrica
- httr package (R)
- Mt. Katahdin
- Shiny (R)
- Github Pages
- Google Analytics R Tutorial (ryanpraski.com)
- Base R vs. the Tidyverse (R)
- Cross Tabulation w/ Chi Square Test for Independence
- dplyr (R)
- David Cancel (founder of drift.com)
- Machine Learning to Improve the Energy Efficiency of Data Centers (Google)
- Google Deep Mind
- Mary Meeker’s 2017 Internet Trends Report
- League of Legends
00:04 Announcer: Welcome to the Digital Analytics Power Hour. Tim, Michael, and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour, and their website, analyticshour.io. And now, the Digital Analytics Power Hour.
00:28 Michael Helbling: Hi everyone! Welcome to the Digital Analytics Power Hour. This is episode 67. Folks, you know we cover the hard-hitting topics on this show. You know we don’t pull punches when it comes to asking you the tough questions. You know we bring you quintessential digital analytics thought leadership and amazing new paradigms. Alright, that’s too much. But you know sometimes, we just wanna gaze up into the sky and figure out what shape each passing cloud is. And what’s a better pie-chart-in-the-sky topic than R versus Python? Insert air horn. Any regular listener knows that my co-host, Tim Wilson… Hey, Tim!
01:20 Tim Wilson: Hey, Michael!
01:21 MH: Is on an “R walkabout” this year. And I, Michael Helbling, have RStudio installed, thanks to Simo. But who could help us contrast that with Python, someone who’s walked a mile in the digital analytics shoes, so to speak? Well, that’s our guest on this show. Ryan Praskievicz is the manager of Web Analytics at Ernst & Young. Prior to that, he was with Semphonic, then joined Ernst & Young through that acquisition. But more importantly, he has spent some quality time with both R and Python, eating lobster rolls on a house boat in Boston with Tim Wilson. Okay, wait. He’s done a ton of cool projects with these tools, and that’s what we’re hoping to talk about on this episode. Welcome to the show, Ryan!
02:08 Ryan Praskievicz: Thank you, Michael and Tim. Such an awesome introduction. Thanks for having me. I’m a big fan of both of you. I was also a big fan of Jim Cain when he was another co-host. I’m excited for the episode.
02:19 TW: Go, Ottawa Senators!
02:20 MH: [02:20] ____ we lost to the Penguins, but anyways, tipping a glass. So Ryan, maybe fill our listeners in sort of what got you an entry into both of these tools and some of the things you’ve done with them. And then, we’ll shift from that into doing some compare-and-contrast between what you’ve seen useful across them.
02:41 RP: Sure, so like Tim has been on a journey, I’ve also been on sort of a journey to be more data science-y. I’d always heard a ton about R and Python. And like a lot of us, I always felt like I was too busy getting my regular job done using the tools of the trade, using Excel, using just the standard reporting interfaces in Adobe and Google, Report Builder, Google Sheets plugins, different tools. And I just never really made the jump. I heard people talking about the courses that they were taking on R, the big Coursera one. I think Tim, you’ve mentioned in past episodes that that’s something that you’ve taken. But for me, personally, I thought it was better to not take a course and learn with a specific task. So the first task that came along where I really was pushed over the edge was about two-and-a-half years ago, where we were working on a project to pull marketing channel data from Google Analytics. And it was about 50 different profiles, now called “views,” that we had to pull the data from. And a custom report in the Google interface would only pull five dimensions across 5,000 rows total. So to pull this data just would have taken forever across all the profiles and for the time period that we had to do it.
03:58 RP: So I decided to explore alternatives. I knew about the Google Analytics Reporting API. And thankfully, I had a co-worker who, in graduate school, had a lot of experience with Python. So together, we explored the options and found Google has a Python client library for their different products. Google Analytics is one of them, AdWords is another, their link shortener, YouTube, tons of different examples. So we saw this. And the first barrier that I saw was getting the authentication set up. And Google’s documentation was super, super confusing. Again, thankfully, I had this co-worker who’d had some experience with OAuth 2.0 and authentication. So he helped me through that process. But yeah, that was really my first foray into Python or R. So that’s how I got there.
04:52 TW: How long did you give yourself… That seems like you… You were a very experienced web analyst at the time. So you knew that while this was a reasonable ask for data, it was gonna be a pain in the ass with the tools that you had available. But at the same time, you had… I think I remember we asked Eric Goldsmith the same question. He was the same way, he said, “I just dove in and decided I was going to do something with it,” which in hindsight, was much better. Taking an R course on Coursera, I did absolutely nothing with it. But because you had that phone-a-friend person with the Python and some OAuth knowledge, was that like you gave yourself a work week to turn it around? Or was it, “Let’s see how far we can get in a day. And can we get close enough that we’re confident that this will work?”. I feel like you were taking a little bit of a leap, where you’re also losing time that if, “Nope, I gotta do this the horrible way.” Do you remember how long it took before you were like, “Oh yeah, this is totally gonna work?”
05:54 RP: It took a week, for sure, maybe even longer. And I think that’s a big part of learning all this stuff, is people probably give up way too soon. And I think they underestimate the time it really takes to dive in and figure it out, so that’s a really good point, yeah.
06:09 TW: But once you’ve gone through it from start to finish, then the ability to go and tweak stuff became like, “Oh, I could do 200 things just with what I’ve already learned?”
06:21 RP: Totally, totally. So I think the benefit here was we had a little bit of time on the project, and we had the time to invest in figuring out how to use Python to pull the data, and then, being able to tweak it. We knew that would be a benefit once we figured it out. But that’s really the invaluable part, is once you do spend the time, and get the setup done, and go through the pain, later, if we had to do things like, “Oh, somebody wanted a different data range,” or, “Somebody wanted to add a few different dimensions or metrics,” doing that in something like Excel, it would take forever, it’d be a huge pain. With Python or R, you just change a few lines of code, rerun the scripts, and you have the new data. So the benefits, long-term, can’t be underestimated, and it’s totally worth it. If anybody’s thinking about getting into it, I highly, highly recommend it.
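Ryan's point about changing a few lines and rerunning can be sketched in Python. This is only an illustration under stated assumptions, not the script he and his co-worker wrote: the view IDs, dates, and the `build_report_requests` helper are hypothetical, and the dictionaries follow the general shape of a Google Analytics Reporting API v4 `reports.batchGet` request body. Nothing is sent over the network.

```python
import json

# Hypothetical inputs: change these few lines and rerun to get a different pull.
VIEW_IDS = ["11111111", "22222222", "33333333"]  # placeholder view (profile) IDs
START_DATE = "2017-01-01"
END_DATE = "2017-03-31"
DIMENSIONS = ["ga:channelGrouping", "ga:date"]
METRICS = ["ga:sessions", "ga:transactions"]


def build_report_requests(view_ids, start_date, end_date, dimensions, metrics):
    """Return one request body per view, shaped like a Reporting API v4 payload."""
    bodies = []
    for view_id in view_ids:
        bodies.append({
            "reportRequests": [{
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "dimensions": [{"name": d} for d in dimensions],
                "metrics": [{"expression": m} for m in metrics],
            }]
        })
    return bodies


requests_to_send = build_report_requests(
    VIEW_IDS, START_DATE, END_DATE, DIMENSIONS, METRICS
)

# Inspect the first body; an authenticated client would submit each one in turn.
print(json.dumps(requests_to_send[0], indent=2))
```

Adding a dimension or widening the date range means editing one of the constants at the top; the loop over views stays the same whether there are three views or fifty.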
07:15 MH: And once you’ve used one of those tools, you were also able to call yourself a data scientist, correct?
07:24 MH: Actually, even if you’ve just thought about using one, I believe that’s…
07:27 TW: Well, you have RStudio installed, Michael so…
07:29 MH: That’s true. So actually, as a data scientist, I’m throwing it down right now.
07:36 TW: I also sort of feel like there’s a little bit of… With R, I wound up tackling some things that I’d wanted to do, but that I didn’t really have a specific application. I just thought, generalizing this, this would be useful to do. And then, it was really the third or fourth thing that I did was actually quite simple, and it was a very similar thing. It was be able to pull data from 38 different views and do different things with it, like how did it… I guess, how did it progress from… ‘Cause now, you’ve used R more. Do you still use Python at all? Or do you feel like you’re primarily R at this point?
08:13 RP: So in preparing for the podcast, I did a little refreshing on Python just to make sure, but I’ve pretty much moved over to exclusively using R in the last few years. And that’s just been because of the amazing packages that are available for the analytics tools. For Google Analytics, the package googleAnalyticsR and the other associated packages from Mark Edmondson, and for Adobe, Randy’s, which is RSiteCatalyst, are just, I would say, far and away, the best packages for the daily tools of the trade. And there’s really nothing I would say comparable in Python.
08:53 MH: Oh, like in terms of accessing tool APIs and the like?
08:56 RP: They’re just not as good, and they don’t have the community of users, which I think has been super beneficial too. It’s been great to go on the Measure Slack and ask people, or go on Twitter and ask people questions, bounce things directly off the package creators. That’s been amazing and it’s helped me learn, I think, way more than I could just Googling stuff.
09:19 MH: In researching a little bit for this show, it seemed like there are more packages that people were talking about for R. But a lot of the people that I read were also saying, basically, you could do more with one package or whatever it’s called for Python. So I was just curious. But for specific things like accessing Adobe Analytics report data or things like that, then it makes sense that having the R package would be something that would be more helpful than pandas or something like that for Python.
09:55 TW: And I’ve slowly started to sort of, I think, understand this, that you have the API. And the API, be it a REST API, which I don’t fully understand what that means, or whatever kind of API, it’s there, and it doesn’t really care what kind of calls are being made to it. It’s then if you’re in Python, or R, or SPSS, or Tableau, or whatever it is, is… Winds up being an intermediate layer that can get very, very detailed, where you’re saying, “I’m basically using this platform’s tools to make direct calls to the API in a very direct native way.” What the packages do is they just move that up just a little bit of a level to say, “We’re gonna put a translation layer so it’s gonna be just more readable. We’ll cut out some of the stuff that you have to do all the time so you don’t have to keep doing it. We’ll just make it where it’s simpler. You’re still very, very close to the API, but instead of hooking into the API… ” And I guess I feel like I went through this when I did connect to Yandex.Metrica with R, all of a sudden, I was using the httr package, which is the, “Hey, if you don’t have a package that’s done some of that work for you, like Randy and Mark have done, you just have to get one more layer of technical.” And to me, that’s a layer of technical that I don’t really feel like is playing to my strengths as an analyst. I’m getting just a little too far down in the weeds to get the code to work as opposed to get the code to do what I want it to do.
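The "translation layer" Tim describes can be sketched in Python using only the standard library. This is a hedged illustration, not any real package's API: the endpoint `https://api.example.com/stat/v1/data`, the parameter names, and both function names are made up for the example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not a real service.
BASE_URL = "https://api.example.com/stat/v1/data"


def raw_request_url(params):
    """Low-level layer: the caller must know every query parameter the API expects."""
    return BASE_URL + "?" + urlencode(params)


def get_sessions_url(site_id, start, end, token):
    """Wrapper layer: readable arguments in, repetitive API details filled in for you."""
    return raw_request_url({
        "id": site_id,
        "metrics": "sessions",
        "date1": start,
        "date2": end,
        "oauth_token": token,
    })


# Without the wrapper: every parameter name is spelled out on each call.
low_level = raw_request_url({
    "id": "12345",
    "metrics": "sessions",
    "date1": "2017-06-01",
    "date2": "2017-06-30",
    "oauth_token": "TOKEN",
})

# With the wrapper: the same request, expressed in analyst-friendly terms.
high_level = get_sessions_url("12345", "2017-06-01", "2017-06-30", "TOKEN")

print(low_level == high_level)
```

Both calls produce the same URL. The wrapper adds nothing the API could not already do; it just hides the parts that never change, which is roughly the job Tim attributes to a tool-specific package.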
11:32 RP: Yeah, I totally agree with that. For me, being able to run a script, to run a program and see the data quickly, it’s amazing when you’re actually able to see it. I just thought it was so cool to do it and to see data come back, being able to request it via Python or R. So the faster that somebody could see that, I think that’s gonna get them more hooked on whatever tool it may be and be willing to spend more time on it. So I’m all for making it easier. And for me, R has just been easier.
12:07 TW: So because you started on Python, and then, shifted to R, the thing that I’ve heard but I haven’t experienced… Well, I experienced that R was very confusing to me. In my background, I had extensive dabbling with VBA, I had written C, go back to Basic. So I’ve been in scripted languages for a while, and R had, and continues to have, things where mentally, I struggle with sort of the approach, even though I realize that for a lot of reasons they have a lot of benefits. Lists is definitely on that list because you try to avoid loops in R. And I kept hearing that Python, the ramp-up on just the syntax and the nature and how to structure a program is a much, much faster progression. Did you feel like when you went from Python… Or maybe back up to what was your… Before Python, what was your code-savviness? And then, was learning the syntax and the mechanics of Python, can you compare that to R? And you can’t have a parallel universe where you learn both of them starting in the same spot. So obviously, you had already learned Python to some extent before you went to R. But is that something that… Does that resonate? Or is that just some myth that I’ve picked up along the way?
14:09 MH: Yeah. Well, and I’ve heard people say that if you come from sort of more of an analyst or statistics background, R will feel better to you. And if you come from a programming or development background, Python might feel more natural. And I wonder if that could color how somebody perceives it?
14:29 TW: Well, definitely. One of the things in R that will make… So me not coming from a statistics background, and Mark Edmondson doesn’t either, and so, I’ve heard him say the same thing. You get a lot of factors and levels, which to a statistician, makes perfect sense. And for a lot of basic tasks that you’re doing in R as a web analyst, you basically are saying… Stuff’s coming in automatically as factors, and you’re saying, “No, no, no, don’t treat those as factors.” And it’s confusing. Your code won’t work because you forgot to tell it to treat the factors as strings. So some of that, and even as I’m learning more and more statistics, I’m saying, “Ah, this stuff, it starts to make sense how this will be really useful that R has it.”
15:14 TW: I’ve also talked to people who said the very first thing they learned was R from a programming perspective, and they pointed out, they said they knew nothing. So because if they learned in the right way, they said it was learning their first thing, and now, Python, when they’ve looked at that, it is very confusing. So it’s like whatever you learned first, it’s like if you learned Adobe Analytics first, it’s gonna take you a while to ramp up on the Google Analytics world, or vice versa. So I could see that this… It’s like on the one hand, you could say the second one’s gonna be more challenging ’cause you’re used to the paradigm of the first one. On the other hand, you could say, “Well, they’re both programming. So once you’ve learned one or the other, all you have to do is sort of figure out the translations.”
15:58 RP: Yeah. I think one big thing for me that made R be a bit easier and make a lot more sense was the concept of the data frame, which I later learned that you could also do in Python too through pandas, which I didn’t know about right away. But the data frame seemed a lot like an Excel sheet. So you have columns and rows, and you have headers for each of your data. And each row is an observation. And it just made a lot more sense to me looking at data in that tabular format.
16:29 TW: That’s a good point.
16:30 RP: Yeah.
16:31 TW: And now, I find myself, when I wanna do stuff, and… ‘Cause data frames are really just a list of a special kind. I’m like, “Oh! Back to lists.” So we’ll leave that as unresolved. It depends on where you’re coming from, as to which one may be easier.
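The idea that a data frame is "a list of a special kind" can be sketched without any library: a set of named, equal-length columns in which each position is one observation. The snippet below is a plain-Python illustration with made-up numbers, not how R or pandas implement data frames internally; the `rows` helper is hypothetical.

```python
# Column-oriented table: a dict of equal-length lists, one list per column.
# All values are made up for illustration.
table = {
    "channel": ["Organic Search", "Email", "Paid Search"],
    "sessions": [1200, 300, 450],
    "orders": [36, 12, 9],
}


def rows(columns):
    """Yield each observation (row) as a dict keyed by column header."""
    headers = list(columns)
    for values in zip(*(columns[h] for h in headers)):
        yield dict(zip(headers, values))


# Add a derived column, much like adding a formula column in a spreadsheet.
table["conversion_rate"] = [
    orders / sessions
    for orders, sessions in zip(table["orders"], table["sessions"])
]

# Read the same data back row by row, the way it looks in a spreadsheet.
for row in rows(table):
    print(row)
```

Stored by column, read by row: that duality is why the structure feels like an Excel sheet to Ryan and like "back to lists" to Tim.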
16:45 MH: Well, yeah, now that we’re all teaming up on R, Team R. So everybody, you heard it here first. Digital Analytics Power Hour, we looked into it exhaustively, R is better than Python.
17:01 MH: Send all your comments to @tgwilson on Twitter.
17:04 TW: Yeah, we want credit. Somebody was like, “Why haven’t you done a thing on Python? That’s all I use.” Actually, I can think of who it is, and I’m not gonna call ’em out ’cause this is not gonna wind up being the show on Python.
17:15 MH: Well, and I think there’s different ways to go about it. The other thing I’ve heard people say, and this is actually because all three of us, we work more on the services side of the industry, is that R is a lot of times better for doing sort of analysis. Or somebody I read, I think it was on Quora, said it was better for “consulting-type tasks,” which I take as sort of breaking down data and finding sort of patterns or meaning or direction out of it; sort of a typical analytical type of thing. And I thought that was interesting, and how that might contrast with Python that you might use to, say, build products or leverage its sort of ability to do any kind of development, both for applications or the web.
18:05 TW: With Python being more… It is more general purpose. Whereas, R, all-purpose…
18:08 MH: Yeah, it’s a more general-purpose programming language. So for me, I was sort of like, “So would it mean I do my analysis in R, but when I turn that into a product to give to other people, I write that in Python?” I’m saying if I knew how to do those things, not me personally.
18:26 TW: Well, my understanding is that… And this is… I’m woefully ill-equipped to talk about it, but that you can actually do some things where you build a model in R. But then, you’re not recoding it to use it in Python. You can… They’re not like you have to do one or the other. I am not equipped to say I’m gonna use them sort of… I’m not gonna call a Python thing from within R. I’m not gonna use Python to call a model in R. But I guess there definitely seem to be the ninjas out there that say, “Oh, I freely float between the two depending on the specific task and the end goal, and everything from what level of production, what performance is needed, who’s gonna maintain it long-term.” Whereas, I’m about the, “Does this work for me to do this and maintain this over the medium-term?”
19:18 MH: So brilliant product idea for somebody out there, build a translation tool so people can write stuff in R and automatically port it over to Python. What could go wrong?
19:33 TW: Oh, way back in the day, I worked at a company that had… They had looked at the graphical user interfaces that were coming up at the time, which there was Mac, and there was Windows, and there was this thing called OS/2. And they were gonna have the first graphical tracking management program and they coded it in OS/2. Took ’em a year or two to do it, it was a big deal in the industry. And then, OS/2 pretty much bombed, and Windows NT came on, and they actually paid a company that had a tool that was gonna port the OS/2 code to NT. Lest anyone thinks that you weren’t totally joking, that would be a horrible, horrible thing to do. But it did bring me back to the mid-90s when I watched a company do something similar.
20:20 MH: Well, ports are really popular, yeah.
20:23 TW: Well, except the idea… I guess… So Ryan, do you have other people using R? You had the Python person. Are you relying on the online community almost extensively now? Or is there R adoption within your… I’m sure there are plenty of people in EY who are using R, but in your domain, in your area, are you a lone wolf?
20:51 RP: No, I very much rely on the online community. Most of the people that I work with currently are not R users. So yeah, it’s nice to have that online community there. There were people, definitely, that I have worked with over the years that have been users of R. And there’s tons of people who were telling me to get onboard years ago, and I just never did. I think, yeah, it’d be awesome if I worked with people who used it internally too. So if anybody from EY or EY-Society is listening to the podcast, who are R users, hit me up, please.
21:26 MH: Yeah. Well, actually, that’s awesome. And that’s what we’ve been noticing over in our shop too, is people are just coming, congregating around it, and learning from each other, and it’s awesome. We actually just established a Kaggle Team, I don’t know if they’re using R for that, but they’re…
21:42 TW: But you guys have still some level of consistent Python. It seems like you’ve got people who are…
21:49 MH: Nope.
21:49 TW: No? Not so much?
21:50 MH: No. Yeah, we do. So the way we break down, we have developers who are building product. So those guys are leveraging whatever tools that we need to use at whatever time. And I think we’ve done some stuff with Python, although probably not as much as… Not too, too much.
22:10 TW: Maybe it’s starting to feel like, “Maybe I don’t have to have on my long-term plan to actually learn Python.” I’m sure it would be good to spend some time…
22:18 MH: No, Tim, you have to learn Python next. How will you be the quintessential analyst if you don’t know R and Python?
22:26 TW: My plan was to retire before I die. I don’t think I’m gonna fully learn R by that point.
22:31 MH: Alright.
22:32 RP: That’s gonna be the last part of the walk, right? The last part of going up Katahdin is gonna be your Python pilgrimage, to finish the data science walk.
22:39 TW: Yeah. Heart attack, heart attack on the side of the hill, he didn’t make it.
22:46 MH: Hey.
22:47 TW: Yeah.
22:47 MH: He died doing the thing he loved most: Trying to become a data scientist.
22:51 TW: So outside of the web analytics things, have you done much with hooking into other… I know I have tried with R and outside of my one Yandex.Metrica, which took me a bit of effort. And that was web analytics so I was very familiar with the data. But I remember trying to think, “Oh, I’m gonna pull some data from ExactTarget.” They have an API key but I immediately got lost in their documentation. Wasn’t sure if it was even doable at all. And I don’t know if you have a sense; is that more the nature of the API and its documentation, and how it’s set up? Or is it possible that using a different platform that I would have been more equipped to hook into some other maybe more niche data source that didn’t have a package developed for it? I don’t know if you have any experience or thoughts on that.
23:45 RP: In that case, I’m not totally sure, but I have hooked into other data sources and loaded other data sources successfully. Obviously, if there’s a package that’s gonna make it easier, and I think there’s probably good packages for dealing with HTTP requests and building the request to the APIs and stuff. So I would love to take a look. I heard your episode talking about Yandex.Metrica and definitely sparked my interest. I saw people discussing it on the Measure Slack too. So if you wanna share that, that would be interesting to check out.
24:20 RP: And actually, that brings up another point that I wanted to bring up, that Tim and I have talked about separately. And he’s always pushing me to Shiny-fy my applications that I’ve built on my site and share ’em. And I think it’s a great idea. And also, I think it’d be awesome if there’d be some sort of repository to share code across the industry. I know, Tim, you’ve been working on something, and you have that course coming up, that’s super-awesome. I know Randy tried with the Analytics Playbook to create a repository to share stuff. And there was some activity a couple of years ago to do that. But the stuff has been so helpful to me, that’s why I share it on my own site. But I think some sort of centralized spot to do it would be pretty awesome. So I’m putting Tim on that one.
25:12 TW: Great!
25:13 TW: Yeah, I think Randy and Jason, they made the best run at it ’cause they actually had the… And probably ’cause they had the technical chops and the vision to stand something up. The problem is, I’m not equipped to… I think Randy, somewhere, he sorta called out. He’s like, “I think we had just a little bit too high of an expectation of people’s ability to actually do… ‘Cause you just make a branch and, ‘Check this out,’ and do this thing.” And I’m like, “Man, I’m barely learning Markdown and I can’t quite figure it out.”
25:44 MH: Branching is hard.
25:46 MH: Pull requests. Well, that is a question is should people just use GitHub for that? Given that that’s an industry standard? Or is it for needing explanation and describing what you’re doing with it?
26:03 TW: Well, there’s even a mechanism for that. And again, Mark had mentioned he’s done it with many of his packages where he stands up GitHub pages or… Dartistics.com is actually all… It’s GitHub pages. I commit and push to GitHub and the website is updated. I think it’s more that having individuals maintain their own repositories is actually great and giving them some guidance on the sort of documentation they should have. The challenge is that GitHub is this entire world, and it’s, “How do you make in an organized way as searchable?” I don’t know that it’s searched. I don’t know how much do you want browsing for people to say… What’s the entry point?
26:50 TW: It’s almost like a website, information architecture challenge of you wanna have a high, high volume of these things, but you wanna have some level of the quality of them. You wanna have some searchability, you wanna have some browsability. You wanna know, “Is this something that’s gonna take me an hour to get up and running within my unique environment? Or is it gonna take me two weeks, but it would’ve taken me a year-and-a-half otherwise?” It just seems like there’s a lot to work out to say, “How do you get this up there?” And then, you’ve also got the… At some point, there will be companies that wig out that there’s IP being given away, although I think those people just don’t share. I think there are plenty out there who are thinking, “No, this is cool. It’s not gonna give my direct competitor any competitive advantage,” so yeah.
27:41 RP: Yeah, it’s tough. I think the DARTISTICS, is that what it… I can’t even say the name anymore.
27:47 TW: Yeah. Yeah.
27:49 RP: I think just having the links there is helpful. I think if you could tell somebody, “Hey, go to this site. There’s a bunch of really good links and a bunch of good examples to get going. And here’s where you should start.” I think that’s super helpful. ‘Cause I remember when I was even getting going, even looking on Google, searching for stuff for Python, and how to do Google Analytics, data pulls with Python, it was impossible. And Google’s documentation, for me, as somebody who wasn’t a super, super programmer, it was really hard for me to follow. And even to do the authentication and set up the project on the developer console, it’s confusing.
28:26 TW: Yeah, which I guess I’m sometimes surprised that there aren’t… Actually, that’s interesting. With the Yandex.Metrica front, their internal people use Python. So they were trying to help me. But they’re like, “We don’t really use R.” I did meet one lady who does use R somewhat, but it sort of feels like that if a company has an API, they are more likely to publish Python-supporting things for it. Adobe did come into it sort of in a weird backwards way where they are now big supporters of RSiteCatalyst, but at the same time, you’re like… Randy said, he was like, “I just wanted to learn the platform, so I wrote a package for something that would be useful.” And now, that’s become this thing that a shit-ton of people rely on. And frankly, Adobe’s API is slow. It’s frustratingly slow to return results. And so, you’re like, “Why wouldn’t Adobe take that on and own it and say, ‘Hey, we’re gonna both make a package, and we’re gonna figure out what the choke points are in our API, and we’re gonna improve that because… ‘” And maybe I can…
29:35 MH: Once again, that’s @tgwilson.
29:41 MH: You are right about that, Tim, that the APIs are fairly slow. And the work we’ve been doing with analytics anywhere, we actually put all the data in a different platform so we can access it faster via voice commands.
29:54 TW: Which you could see, in their case, they motivated the people to use their…
30:01 MH: Yeah.
30:02 TW: And maybe they’re not that incentivized ’cause they’re like, “Oh, if we make it that easy to get the data out, then that takes away this whole tier of our product offering.”
30:10 MH: I don’t perceive that. I think it’s more that it’s just a function of how the platform works today. And I think things like Analysis Workspace and the new Sensei platform, which will be coming in in a few years, are actually things built to address those kinds of things.
30:29 TW: I guess. Although that’s if they’re looking at Workspace and feel like I’m clearly… Yeah, I guess it’s probably an infrastructure issue. I just don’t know the names of the people who are the performance, API… Analysis Workspace is maybe faster than Reports & Analytics, but it is by no means fast compared to me making a query through the Google Analytics API of an equally-sized site. And I just don’t know who’s… I don’t know. I don’t know, so I’m not, by any means, trashing them. It is something you become very cognizant of when you’re saying, “Oh, I wanna run this script,” and “Wow, I’m waiting for 40 minutes for this thing to run.” And that seems terrible. And some of it is, “Well, that’s shitty code you wrote there, Wilson.” But in other cases, I’m like, “No, the API is slow.” But that’s another challenge to actually… If you just take Adobe and Google, they’re working with dimensions and metrics. You can get a generic data set, and I can just change the headings and say, “Oh, this is gonna be an eVar,” or, “I’m gonna have a space between page and name.”
31:37 TW: And the metrics, I can translate them. Structurally, the data coming out is very, very similar. But that’s this other challenge is that you’ve got the getting the data piece, which is tool-specific, and then, you’ve got doing stuff with the data. And I think that’s another wrinkle with trying to build a community of shared things, is I’ve done it now a few times where I built something originally for using Google Analytics data, and then, I’ve ported it over to use Adobe data because I knew I wanted to do that. I just made sure I structured the code to say, “Yeah, I’m ripping out the ‘where’s the data coming from.'” Or even with shiny… I don’t know if I can remember if I got the one working, where you could change the drop-down and say, “Which platform do you wanna pull it from?” But that is asking people to say when you’re developing this, you need to develop, you need to spend an extra 10-15% of an investment to make it reusable. Now, that’s probably gonna benefit you even if you never shared it. And that maybe is just good coding practices to say you’re building it for reuse and extensibility. If you do that really well, then it also really builds for shareability because it’s well-documented, it’s well-written. And you can say, “Yeah, fine. This was written for Adobe data, but if you know how to pull Google data, then this is the section you need to pull out and swap out.”
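Tim's split between the tool-specific "getting the data" piece and the reusable "doing stuff with the data" piece can be sketched as below. Everything here is an assumption made for illustration: the column names, the sample rows, and the `normalize` and `top_pages` helpers are hypothetical, and real Google or Adobe exports will look different.

```python
# Hypothetical column names for each platform's export; real exports will differ.
GOOGLE_COLUMNS = {"ga:pagePath": "page", "ga:pageviews": "pageviews"}
ADOBE_COLUMNS = {"Page Name": "page", "Page Views": "pageviews"}


def normalize(raw_rows, column_map):
    """Tool-specific step: rename platform columns to a shared schema."""
    return [
        {column_map[key]: value for key, value in row.items() if key in column_map}
        for row in raw_rows
    ]


def top_pages(rows, n=2):
    """Tool-agnostic step: works on the shared schema regardless of source."""
    return sorted(rows, key=lambda r: r["pageviews"], reverse=True)[:n]


# Made-up sample data standing in for API responses.
google_raw = [
    {"ga:pagePath": "/home", "ga:pageviews": 500},
    {"ga:pagePath": "/pricing", "ga:pageviews": 120},
    {"ga:pagePath": "/blog", "ga:pageviews": 310},
]
adobe_raw = [
    {"Page Name": "home", "Page Views": 480},
    {"Page Name": "pricing", "Page Views": 150},
]

# The analysis function never changes; only the mapping that feeds it does.
print(top_pages(normalize(google_raw, GOOGLE_COLUMNS)))
print(top_pages(normalize(adobe_raw, ADOBE_COLUMNS)))
```

Someone who only has the other platform's data would replace the column mapping and the data pull, and leave `top_pages` untouched, which is the "section you need to pull out and swap out" that Tim mentions.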
33:01 RP: Yeah, and at least, it’s only two tools that you have to really worry about, Google and Adobe.
33:05 TW: Right. Yeah.
33:07 MH: Cool! So in learning this stuff I’ve gone through… And I’ve done some really intro-level on Lynda.com. I did an intro-level series on R that just shows you some real basics in terms of getting it set up. And so, do I understand you guys correctly that sort of getting from that point, being able to get in there, and do a “Hello world” with it, to actually accessing APIs and getting to this data is a big hurdle, still, for people? Or it’s not well-defined so that people can do it easily? Is that right?
33:45 RP: Yeah, I think it’s still definitely a bit of a hurdle.
33:49 MH: Okay.
33:50 RP: I think, personally, like I said, having a specific task that I was trying to figure out helped me a lot.
33:57 MH: So it seems like maybe a little YouTube video that shows people how to actually just go from, “Here’s R. Here’s installing this package. Here’s authenticating into Adobe and pulling out a set of data.”
34:12 TW: Well, Ryan, your Google Search Console thing, was that R or was that Python?
34:18 RP: That was an R one. And actually, I did do a little video for how to do that in Google. So if you wanna check out my site, ryanpraski.com, I do have a tutorial on how to get started with Google Analytics R, with a video.
34:30 MH: Very nice! See, teed it up, didn’t even know it, awesome! Definitely, check that out.
34:37 TW: I’ve written and revised… To me, I think part of it is that there is the learning the syntax. A very simple example in R is that when you’re setting… If you wanna set a variable called “A” to the number four, it is A, and then, it is “less than, dash, three.” If you have not internalized that, guaranteed, you’re gonna say, “A=3.” And your code’s not gonna work and it’s gonna frustrate you. When do you put two equal signs? That takes some repetition. And then, there’s this next level, there’s this whole… There’s base R, and there’s this thing called the tidyverse where there’s this whole set of packages that really are better and cleaner, but you… And it took me over a year to where I got to where I’m like, “Eric Goldsmith had told me why it was so awesome.” And I didn’t get it. And it’s only been… It was a year later where I’m like, “This is what he was talking about.” And now, this is amazing. I am just piping shit together like crazy and saying, “This is so readable,” and I’m just quickly iterating through this. But some of that just comes with repetition. So that, to me, is one…
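For readers coming from the Python side, here is a rough sketch of the two syntax points raised above: assignment versus comparison, and chaining steps so they read top to bottom like a tidyverse pipe. The `pipe` helper and the session numbers are made up for illustration; the R equivalents appear only in comments.

```python
from functools import reduce

# Assignment vs. comparison.
# In R the conventional forms are `a <- 4` and `a == 4`
# (R also accepts `=` for top-level assignment, but `<-` is the convention).
a = 4             # a single "=" binds the name a to the value 4
is_four = a == 4  # a double "==" compares and yields True or False


def pipe(value, *steps):
    """Feed value through each step in order, similar in spirit to %>% in R."""
    return reduce(lambda acc, step: step(acc), steps, value)


sessions = [120, 95, 0, 143, 0, 88]  # made-up daily session counts

top_three_total = pipe(
    sessions,
    lambda xs: [x for x in xs if x > 0],  # drop the empty days
    sorted,                               # order ascending
    lambda xs: xs[-3:],                   # keep the three largest
    sum,                                  # add them up
)

print(is_four, top_three_total)
```

Each step takes the previous step's result as its input, which is what makes a piped chain quick to read and to iterate on.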
35:47 TW: You may be able to see that you’re doing it but you’re really… It is like learning another language until you’re thinking in that way, and that just takes time and repetition. And then, I think the second part is that the more… The building models, internalizing factors and levels, and thinking through what alpha you wanna use, and interpretation, and what a linear model is, and what time series data is, and time series decomposition. There are all these concepts that have been explored very, very heavily. But if you’re not coming from that world, that, to me, is this other… That’s the big hurdle for me this year is to get to laboriously going through a cross-tabulation with Chi-Square, and understanding what that is, what it means, and what I can do with it. Understanding ANOVA, what it is, where I can use it, when it is one-way, when it is n-way. And so, I think you can get to the…
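To make the cross-tabulation with Chi-Square concrete, here is a small hand-rolled sketch in plain Python. The table is invented for illustration (rows as two device types, columns as converted versus not converted), and `chi_square_statistic` is a hypothetical helper that computes the plain Pearson statistic with no continuity correction. In practice a statistics library would usually do this for you.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a cross-tab of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up cross-tab: rows = device type, columns = (converted, did not convert)
crosstab = [
    [30, 70],
    [50, 50],
]

stat, dof = chi_square_statistic(crosstab)
CRITICAL_VALUE_ALPHA_05_DOF_1 = 3.841  # standard table value for alpha = 0.05, 1 dof

print(round(stat, 3), dof, stat > CRITICAL_VALUE_ALPHA_05_DOF_1)
```

With these invented counts the statistic exceeds the critical value, so at an alpha of 0.05 you would reject the idea that conversion is independent of device type. Picking that alpha and interpreting the result are the parts that need statistical understanding; the arithmetic itself is short.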
36:49 TW: It’s funny that Ryan and I had the same, the first, “Oh, this is fucking awesome,” was I can crank through a bunch of Google Analytics views and pull some set of data. And, “This just saved me hours and hours of tedium,” and you can do that. And if you do that enough, you learn the fluency, and the syntax, and the tool, but then, you still have to keep pushing yourself to say, “But what about all this other… ” It could be text mining. Certainly, it could be regression models. It could be dipping into machine learning type stuff that has the capabilities there, but you still have to figure out what is it you need to learn, and then, how do you learn it. And then, using the code, I think, becomes sorta second nature. The code is the easy part. It’s actually understanding what you’re doing and what input you’re getting and what… How to interpret the outputs.
37:43 MH: Yeah.
37:44 RP: For me, I don’t think it really sunk in until I tried to do something that was a little bit outside of work, and something that was a little more of a personal project, where I wanted to… I was frustrated by the data on my iPhone for my steps. I was never one with one of those fitness bands but I thought it was pretty awesome.
38:03 TW: I’m guessing that’s not like… I’m frustrated that I’m never actually walking that far.
38:08 RP: Yeah, I wasn’t doing enough steps. That’s when I opened up the dashboard and saw that I was under 10,000 a lot of days.
38:15 TW: So you spent the next 10 hours coding and realized that didn’t help at all. Your steps went down.
38:20 MH: How do I use R to make it look like I’m walking more?
38:24 TW: My step count is inversely correlated with my ability to… My time, my ramp-up on R. I’m spending more time reading.
38:32 RP: Totally, but then once I actually started looking at it, my step count is… I bet it’s a lot higher now that I’m actually seeing what’s happening. So yeah, it was super helpful.
38:41 TW: And that was… Were you looking into… So you looked into… I cut you off. I think, were you heading down that you actually looked at your data? You hooked into the API for…
38:51 RP: So they don’t actually have an API, which stinks. Apple is so guarded with all their data, so you actually physically export it from an email. They give you an XML file. I figured out how to translate that into an R data frame, and then, analyze the data from there. And it was pretty interesting because it’s like each step was at a different timeframe. So they could be a couple seconds apart where they would measure the steps, or a couple minutes. And it was pretty wild just to see that data. It probably looked like an old server log or something from Webtrends, I bet. But then, figuring out how to aggregate that, and figuring out how to use dplyr to do that kind of aggregation. That really pushed my learning forward. And then, visualizing that data and that was the most helpful thing, I think, I’ve done, just something that was tangentially-related, but personal to me that I wanted to figure out. And yeah, it was helpful.
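Ryan did this in R with dplyr; the sketch below shows the same shape of work in Python using only the standard library. The sample XML is invented, and its structure (a `Record` element with `type`, `startDate`, and `value` attributes) is an assumption about what the export looks like, so check it against a real file before relying on it. `steps_per_day` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed record type for step counts in the exported file.
STEP_TYPE = "HKQuantityTypeIdentifierStepCount"

# Invented sample standing in for the exported XML file.
SAMPLE_XML = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2017-06-01 08:00:12 -0500" value="23"/>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2017-06-01 08:03:40 -0500" value="110"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" startDate="2017-06-01 08:04:00 -0500" value="72"/>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2017-06-02 12:15:05 -0500" value="64"/>
</HealthData>"""


def steps_per_day(xml_text):
    """Roll irregularly timed step records up to one total per calendar day."""
    root = ET.fromstring(xml_text)
    totals = defaultdict(int)
    for record in root.iter("Record"):
        if record.get("type") != STEP_TYPE:
            continue  # skip other measurement types
        day = record.get("startDate")[:10]  # keep only the YYYY-MM-DD part
        totals[day] += int(record.get("value"))
    return dict(totals)


daily = steps_per_day(SAMPLE_XML)
for day in sorted(daily):
    print(day, daily[day])
```

Once the per-day totals exist, the validation step is to compare them against what the phone's own dashboard shows for the same days.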
39:47 TW: So did you start by… One of your first tasks was to say, “Can you take that raw data and replicate what their crappy dashboard is giving you just to validate that you were looking at it the right way?”
39:56 RP: Exactly, yeah. That was the first step, is making sure that I could, at least, replicate and get the aggregations that they had. And that the step counts, at least, per day were the same as what they were showing in my dashboard, in my phone, so yeah.
40:10 MH: So that was the other thing I read, was that… And I think, Tim, you touched on this just a minute ago, that for the natural language processing and things like that, it seems like Python has got the edge over R. Is that not true anymore?
40:25 RP: Yeah, I think both the tools have progressed so much that I think you just choose one and sorta go for it. I think that’s my conclusion from our discussion.
40:32 RP: That’s what I did.
40:33 MH: Alright.
40:35 TW: Well, it sounds like choose one but I was a little nervous that there is this… And maybe we will hear that we just haven’t found. There is this active group of Python users who are talking to each other. But to echo what you were saying, we’ve got this sort of international group of very helpful, pretty well-connected to each other digitally people using R. Somewhat, I guess, the R and statistics channel on Slack, occasionally, somebody’s like, “Well, I don’t really use R, but here’s the statistics thing.” We don’t have a Python channel, we don’t have R split up from statistics. So it may be that the Measure Slack channel is reinforcing it, but I haven’t heard any evidence that that’s necessarily a bad thing. There aren’t that many people who are doing it right now. So I guess we won’t wind up with a pissing match if there’s not two active and aggressive factions.
41:32 MH: Oh. No, but now, we are teed up for the angry tweets and comments. So I’m super excited to see what happens in the Measure Slack after you watch this episode. Anyways, this is great. And I think there’s some really good, tangible things for people to take away from this and get. I think for me, Ryan, it was, “Get in there. Start defining some projects, and give yourself some goals. And use that to push yourself forward.” So I love that. Well, one thing we do on the show is we go around, we share a last call, something that we’ve learned or solved recently that we thought was pretty neat, worth sharing. So hey, Ryan, why don’t we start with you? Do you have a last call you wanna share?
42:17 RP: Yes, I do. So my last call is drift.com. It’s a chat bot, live chat application. And they have an ambitious goal to replace lead forms. And they were actually founded by David Cancel who is an analytics industry person. He founded Compete and Performable, which was purchased by HubSpot eventually and I guess, rolled in to become their analytics tool. But Drift, it’s a super interesting tool. I have it installed on my site. It’ll pop up. So if you go check my site out, you could see it. And I’ve had quite the response. It’s been interesting to see people ask questions about blog posts that I’ve written right in the chat window. And I think that the… The biggest difference that I’ve seen is they have a really nice app that I could chat with people if I chose to. Obviously, for my blog, I’m not gonna be up at all hours chatting to people. But for your business, it could be really useful. So that’s my last call. And they have a free tier too, so go try it out.
43:19 TW: So that’s what pops up on your site?
43:21 RP: That’s it.
43:21 TW: I don’t think… I didn’t… I was like, “Really?” Like, “Really, Ryan? Are you… You could be out on the slope somewhere, but okay.”
43:29 MH: Now, does Drift, does it integrate with Intercom? No, I’m just kidding. That’s just… Sorry. It probably does.
43:37 RP: Maybe, yeah.
43:38 MH: It probably does.
43:39 RP: I think it’s a competitor, though.
43:41 MH: Oh, or they would be considered a competitor?
43:43 RP: Yeah.
43:44 MH: Okay, well then, never mind. Take that, Intercom! Drift is coming for yah! No, that’s pretty cool. I will definitely check that out.
43:51 RP: Yup.
43:52 MH: What about you, Tim?
43:53 TW: Did you guys see this thing that Google has written up? And I think they did it, it was back last year, but I just found it more recently; that was machine learning to improve the energy efficiency of data centers, where basically, Google has always been environmentally-conscious, so they did a lot of stuff, and reduced the energy footprint by 50%. And then, they’re like, “Yeah, we think we pretty much got it.” But then, they took some of their… It’s a little bit of a long roundabout story. This guy had some pretty good failed attempts that are amusing where he said, “Oh, maybe with machine learning, we’ve got… ” This is a challenging problem. If we had 10 pieces of equipment, and each one of ’em had 10 settings, that would be 10 to the 10th power of possible configurations. So we just can’t test our way out of this. But maybe this is something that machine learning could help with.
44:46 TW: And then, his first few takes at it six months in were just complete bombs. I think one of the times, the model came back and said, “We’ll just turn all the data centers off. That will reduce your footprint the most.” But that converged with Google’s, with DeepMind. And so, now, they’ve gone on, and actually applied machine learning, and gotten a 15% reduction in overall energy overhead. So it was cool. Google comes with machine learning, comes with somebody saying, “If we have enough data and enough levers to pull, here’s a good scenario for it.” And it’s make-the-world-a-better-place sort of thing. So it’s cool. It’s an interesting little read.
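The "can't test our way out of this" arithmetic from the story above is easy to check. This is only a sketch: `configuration_count` is a hypothetical helper, and the one-test-per-second rate is an assumption chosen to show the scale.

```python
from itertools import product


def configuration_count(pieces, settings_per_piece):
    # Each piece is set independently, so the counts multiply.
    return settings_per_piece ** pieces


# Sanity check on a case small enough to enumerate: 2 pieces, 3 settings each.
small_case = list(product(range(3), repeat=2))
assert len(small_case) == configuration_count(2, 3)

total = configuration_count(10, 10)  # 10 pieces of equipment, 10 settings each

# Assumption for scale only: one configuration tested per second, nonstop.
seconds_per_year = 60 * 60 * 24 * 365
years_to_test_everything = total / seconds_per_year

print(total, round(years_to_test_everything))
```

Ten billion configurations at one test per second works out to roughly 317 years, which is why exhaustive testing is off the table and a model that predicts good settings becomes attractive.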
45:32 MH: Nice! And to round it out, I too have an R-related… No, I don’t. I totally don’t.
45:41 MH: Actually, what I’ve been reading lately is I’ve been trying to work my way through the Internet Trends report that Mary Meeker does at Kleiner Perkins. So I know it’s come out a month or two ago, but it is long and worth the study read. And I’ve been reading up a little bit on Bitcoin too. I think I’m too late to Bitcoin.
46:07 TW: Have you got anything if it’s just a go of the hundred and however many slides, anything from Mary Meeker that surprised you, or that…
46:17 MH: Okay. So one thing stood out to me. There was a slide on social media, and I’m doing this off the top of my head. But it was showing this really high engagement on social media on one side, one chart. And then, on the same slide, a different chart showing that businesses didn’t understand the ROI of social media. And I was like, “Maybe there’s a problem and a solution all in one.”
46:38 MH: It’s like, “Oh! Hey! Engagement as a metric, not understanding ROI. Interesting!”
46:46 RP: Nice!
46:47 MH: So anyways, more to learn in that space. But yeah. No, there’s… And I feel like I saw the trend of eSports happening a couple years ago ’cause that’s something I’m really curious about. And it’s starting to become a really big deal. And it got some space in the report as well. So there’s a lot of cool stuff that’s happening in the world. Tim, just for you, eSports is the thing where people watch other people play video games online.
47:18 RP: But in person? In person too?
47:20 MH: Yeah, in person, yeah.
47:20 RP: And in huge stadiums?
47:22 MH: Sold out the Staples Center. They’ve sold out Madison Square Garden for League of Legends. Here in Atlanta, there’s a company that does events pretty regularly. In fact, in July, I think somebody was telling me there’s a Halo tournament that they’re planning to go to. So I was like, “Well, there you go.”
47:42 TW: Things that trigger thoughts in my head that make me realize I sound like the, “Get off my lawn.” But I do have two teenage sons.
47:51 MH: There you go.
47:52 TW: I’ve learned with the “E” and be vicarious.
47:55 MH: Yeah! All this whole conversation about R and Python has been triggering that, “Get off my lawn,” Tim.
48:04 MH: But if you have been listening out there, and you have just had it up to here with Tim’s inability to give Python its proper due, or you’ve got some great ideas the rest of our listeners could gain from, we would love to hear from you. And you can do that on our Facebook page, on Twitter, and on the Measure Slack. And Ryan also, really active Measure Slack user, so ask him some questions. Gain from his experiences using R and Python, and things like that. As a community, if we help each other, I think we will advance the cause of analytics greatly. Ryan, thanks again for coming on the show. It was such a pleasure having you.
48:47 RP: Thanks a lot for having me, Michael. Thank you, Tim.
48:49 MH: Yeah, it’s so great. And for my co-host, Tim Wilson, remember, keep analyzing.
49:01 AF: Thanks for listening. And don’t forget to join the conversation on Facebook, Twitter, or Measure Slack group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour, or at Analytics Hour on Twitter.
49:21 S5: So smart guys want to fit in so they made up a term called “analytics.” Analytics don’t work.
49:30 MH: Yeah. You gotta go through the outtakes, that’s right.
49:33 MH: Which is technically the only part of the show that I regularly listen to.
49:38 TW: But once I got there, the relief, I arrived with an empty bladder and a parched throat.
49:46 TW: There’s very little that I just flat out don’t like. I don’t like something… I don’t like… Come with pumpkin in it. It shouldn’t have coffee-flavored and pumpkin-flavored.
49:54 MH: Oh, but see, I don’t know. Southern Tier makes this one called Pumking. ‘Cause I’m with you, typically, Tim. This one…
50:01 TW: No.
50:02 MH: No, it has this toasted-ness to it, the rum.
50:05 RP: The Southern Tier one is pretty good. I’m with Michael on this one.
50:08 MH: Yeah.
50:09 TW: Fuck you both. You know what? You guys have your own discussion, I’m out.
50:13 MH: Yeah.
50:14 RP: Didn’t you guys drink a bunch in that trip to the conference you went to, the SUPERWEEK?
50:20 TW: Yeah.
50:21 MH: Underneath the SUPERWEEK? No.
50:25 TW: Yes.
50:28 MH: I did not go outside. I just drank with Matt Gershoff.
50:32 MH: Simo, I have him to thank for installing our studio.
50:36 RP: He installed it on your computer?
50:37 MH: No. As a result of that podcast, I installed it.
50:40 RP: Oh, oh, okay.
50:43 MH: No, we have our 404 page, which actually is pretty amazing, but…
50:48 TW: Oh!
50:49 TW: That’s nice!
50:51 TW: He might come across as this nice, pleasant, helpful, brilliant individual, but deep down, I think he’s just an evil, scheming…
50:58 MH: Oh, just cackling with laughter.
51:01 MH: Tim, can we get started? We’ve waited long enough.
51:04 MH: Got a guest on the show.
51:07 TW: Can we talk about the updates to the analyticshour.io site, Michael?
51:10 MH: No, we cannot talk about those right now, Tim.
51:14 TW: I saw that it came out, and literally, the thought that went through my mind is, “Oh, I bet Michael is excited about this.”
51:20 MH: I’m on my third day of going through it. I’m only on slide 188.
51:25 TW: The number of people who see me as Tig Wilson. It’s…
51:28 MH: Tig Wilson? Hey, so like taking the tart out, Tig Wilson?
51:32 TW: Oh, that would be… Yeah, okay.
51:33 MH: Yeah. See? Now, you like it.
51:36 TW: We can cut that out.
51:36 MH: Yeah, we’ll cut that.
51:37 TW: Nah.
51:41 MH: Guys, you both like all that. Get on the same page.
51:46 TW: Rock, flag, and Python. Or not.