#019: R U Curious About R?

In honor of Talk Like a Pirate Day (and by popular demand), we donned our eyepatches, poured ourselves a few tankards of grog, and commandeered the wisdom of Eric Goldsmith from TED (maybe you’ve seen one or two of their videos?) to explore the whats, whys, and hows of R. If we’d recorded this episode with Excel, it would have taken an hour, but, with R, we pulled it off in 42 minutes.

 

Episode Transcript

The following is a straight-up machine translation. It has not been human-reviewed or human-corrected. We apologize on behalf of the machines for any text that winds up being incorrect, nonsensical, or offensive. We have asked the machine to do better, but it simply responds with, “I’m sorry, Dave. I’m afraid I can’t do that.”

[00:00:24] Hello everyone. Welcome to the Digital Analytics Power Hour.

[00:00:28] This is Episode 19. Guess what, everyone, I've got wonderful news. Are you curious about R? We sure are. And you know, this Saturday is Talk Like a Pirate Day, so we figured, hey, let's do a show about R. The views expressed are those of Michael Helbling, not necessarily those of Jim Cain and Tim Wilson. The great disclaimer is that, since Jim Cain and Tim Wilson, who are my co-hosts, and I don't know anything about R, we decided we needed a Sherpa on this journey. So we've invited Eric Goldsmith. Eric, welcome to the show. Thank you. Thank you. Let me tell everybody about you.

[00:01:14] You've got a pretty awesome resume, and he's going to be a Sherpa on a pirate ship. So, a Sherpa... arrr... just go with that.

[00:01:22] Maybe shiver me relational databases. Let's get into this. So, yeah, Eric Goldsmith. He's currently the Head of Analytics at TED, maybe you've seen one of their videos, and prior to that he did some time at AOL and CompuServe. He's a guy for all seasons; he's done a few things in this industry, and he's certainly somebody we look up to when we think about R and all things data science. So once again, welcome, Eric, we're delighted to have you on the show. Thank you. Pleasure to be here. Well, let's jump into it. Let's talk about R and all things related to R. So yeah, R is one of those tools, right? Everybody says it's a cool tool for analytics. From my perspective, in the short little pieces I've tried to use it for, it's just got a ton of power, it's very flexible, and it seems like the ultimate analyst weapon. Should every analyst use it? If you do use it, does that automatically make you a data scientist? Answer some of our questions, Eric. What have you got to say about R?

[00:02:37] Well, the way I like to think about R is that it's a combination of a statistical programming language and a language for manipulating data; it's built to manipulate data. I come at things from a programming background; my formal training, many many years ago, is software engineering, and I've always had that bent toward programming. So it's a natural fit to have a language that allows you to really program, like R. But where I think it really excels is that the data types it uses and the manipulation techniques are all built to make working with data as easy as possible. You can work with data in any language. When I first started programming years ago it was in C, C++, and then Java, and other things you can use today, Python, Perl, are popular for data analysis. These are all fairly general-purpose languages; you can do data manipulation with them, but a lot of times you have to rely on external libraries, or you have to develop tools or methods yourself, to really make it easy. All that is built into R. It was built from the ground up to make data manipulation and extraction as simple as possible. There are so many shortcuts; so many things that would take me 5, 10, 15 steps in a general-purpose language like Python I can do in one line of code in R. By getting rid of those details, I can jump ahead and focus on the higher-level work; I don't have to worry about the underlying details of data manipulation.
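
[Editor's note: a minimal sketch of the kind of one-liner Eric is describing; the `visits` data frame and its columns are invented for illustration.]

```r
# Invented example data: sessions by marketing channel.
visits <- data.frame(
  channel  = c("search", "social", "email", "search", "social"),
  sessions = c(120, 45, 30, 98, 52)
)

# Total sessions per channel, sorted descending: one line of base R.
sort(tapply(visits$sessions, visits$channel, sum), decreasing = TRUE)
```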

[00:04:08] So you've been an R user for about five years, so that dates back to your AOL/CompuServe days. I feel like I've been hearing about R for maybe a couple of years, although it's been around for a long time. It was free to go and take the Coursera course on it, free to download and use, and anything that's free and open source is appealing. For you, was there something you were doing where you were hitting a wall and said, I have to find a better way to do this? Or did it kind of come along, you said, that's interesting, and you poked around and gradually evolved to using it more?

[00:04:41] It was really a combination, but I think what drove it the most for me was the visualization methods available in R. I'm a very visual thinker, and whenever I'm manipulating data and trying to explain data to people, I try to do it visually. For years I had done that kind of work in Excel and was just hitting too many limits; I just couldn't do what I wanted to do. And I'd always heard about the capabilities of R, specifically in that case toward visualization, and that's what led me to spend the time to learn the language. There is a learning curve, I won't lie. It's a different way of thinking about moving data around, and the building blocks, I suppose you'd call them, in R are different, because so much is built into the language. There are larger building blocks, so you can get more done with fewer commands. That results in a bit of a learning curve, in the way you think about data. But for me the visualization piece was hugely important. I would run into some of the limitations, and I'm not badmouthing Excel, it's certainly a tool that I still use every day, but for complex visualizations, lots of dimensions, and any kind of sophisticated manipulation prior to the visualization, it was just too much work in Excel, or couldn't be done in Excel at all.

[00:05:55] And I was able to get past all that with R. Few analysts aren't working in Excel, and if they're working hard in Excel, they're hitting limits. It's interesting you say visualization, and we've actually talked before that we don't have any Tableau experts, I don't think, amongst the four of us tonight, so we can't go down that path either. But it almost seems like there's a parallel there, even though they're two very different types of platforms. It feels like R is probably a steeper learning curve but probably also has more power, and on the pricing front, R being free, you're not shelling out a couple grand for a Tableau license. But it still feels like kind of a false comparison.

[00:06:36] I don't think Tableau and R are the same thing, and I don't think I'd put those two in the same bucket.

[00:06:44] I mean, what you said was, I was hitting visualization limits in Excel and I went to R; and there are certainly people who are saying, I'm hitting visualization limits in Excel and I'm going to Tableau.

[00:06:55] Yeah, I mean, that's certainly true. I think, if I'm following what you're putting down there, Eric, one of the things that is often a challenge with Excel is you run into situations where you're trying to work through a dataset to understand what's meaningful in it, and Excel just won't give you the tools to pry it apart the way you need to, either to look at it the right way in a visual sense or to run the statistical analysis against it. So extending into R just gives you a better toolset for understanding that particular data. Whereas, Tim, I would say Tableau is like, once I know what I want to show, I'll put it in Tableau.

[00:07:39] Maybe that's one to pick back up when he's back on. He actually said that he's always liked Tableau because of the exploration capabilities within it.

[00:07:48] But I do feel like it's exploration within a finite world, because there's so much that you need to understand about the data before you can even put it into Tableau. You have to position it correctly and get it all kind of normalized so that you can actually visualize it in a meaningful way.

[00:08:03] Anybody who goes in and jacks around with R for four hours... look, you put enough interesting data into Tableau and anyone will go in and say, look at the pretty pictures, and jack around for hours and not do anything of value. It seems to me, and again, stop me if I'm wrong, and again, not a bad thing, exploration can be powerful, but R seems to me to be a little bit more surgical in its approach: clearly defined problem, clearly defined datasets, an appropriate model, and appropriate answers. Is that nuts?

[00:08:32] Let me preface this by saying I don't have any firsthand experience with Tableau, but I have heard that a lot of what Michael said is true: you have to have the data essentially pre-arranged, pre-normalized, pre-aggregated into whatever format Tableau is happy with in order to be able to use it to its fullest extent. So where do you do that work? Something that I'm a little leery of with GUI-based exploration tools is that sometimes the details behind the data get hidden. You see an aggregate result and you don't know how many data points are behind it; maybe two, maybe a hundred, so maybe it's significant and maybe not. And you don't have that exposed to you as you're moving through a GUI. That's not to say you can't make those same mistakes with a tool like R, but there's a little more knowledge required to work with the data. So by gaining that knowledge, by developing the expertise to be able to use R, you've had to go through that learning curve, and sometimes get burned, for that information to stick in your head. One of the comments made earlier was that R is really helpful with the data exploration piece. A lot of times the quality of the data that you get, depending on where it comes from, internal sources, scraping it from somewhere else, getting it from some third-party API, is really poor, and there's a lot of cleanup that needs to happen: missing values need to be removed, consistent naming schemes need to be applied to columns, and so forth and so on.

[00:09:55] R makes it really easy to do that kind of thing. You can work with it in an interactive mode to explore and figure out what needs to be done, and as you're doing that, you're building up all the steps that need to be put into a script that can be used for automation later. So you go through that exploration process once, you learn what you need to do, you build a script, and then you can throw data at that script ad infinitum. It's automatic from that point forward.
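
[Editor's note: a hedged sketch of that explore-then-script workflow; the file name and column names are hypothetical.]

```r
# Cleanup steps worked out interactively, then saved as a script
# that can be re-run on every future export of the same data.
raw <- read.csv("sessions_export.csv", stringsAsFactors = FALSE)

# Apply a consistent naming scheme to the columns.
names(raw) <- tolower(gsub("[. ]+", "_", names(raw)))

# Remove rows with missing values in the metric we care about.
clean <- raw[!is.na(raw$sessions), ]

# Coerce the date column from text to a proper Date type.
clean$date <- as.Date(clean$date, format = "%Y-%m-%d")
```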

[00:10:21] I think that's really important, and I was about to kind of go there as well. That seems like something that's really unique to R. And I'm coming at this as the guy who took the Coursera course, so I sort of have what the Johns Hopkins guy taught me in the exercises there, and that was close to a year ago, so I'm kind of working off of what's stuck since then. What you've described seems like one of the things that is super, super unique. Like, when I'm in Excel and I need to clean up the data, I wind up in VBA, in kind of a trial and error of trying to hack out VBA that I can run to do sorting, renaming, replacing missing values,

and so on and so forth. But I'm doing the kind of classic writing code, running it against the dataset; usually I'm doing a save before every time I run it, because I want to be able to reopen the file and try to run it again. Whereas something that seems crazy powerful about R, as you were describing it, is that it's not just that you iterate through it step by step and then go take that knowledge and create a script. You can literally copy and paste the iteration that you did in the console into the script. So it's about as seamless as it can get when it comes to iterating through doing, effectively, ETL-type work that I want to save, so that I can then have my clean-enough dataset to actually visualize. I can bring in a new dataset, or an updated dataset, and run that same script on it, and it's going to perform those same functions. Whereas macros in Excel are a kind of clunky way to come at that.

[00:12:09] Yeah, that's the way I generally work, especially when I'm working with new data or a source that I'm not too familiar with. And I should say that I use a development environment on top of R called RStudio, made by the company also called RStudio. It's also free and also open source, and it provides multiple panes: one window is your script, one window is the interactive environment, another window is the data, where you can examine all the data that you've collected, all the variables, and so forth. When I'm working with a new dataset or accessing a new API, I'll work in the interactive section to figure out how I want to manipulate the data, and when I get it right, I copy it into the script section of the IDE. When I'm done, I have a script that I can use on any future dataset. Being able to script things and program like this has, for me, been very helpful for the reproducibility of the work. At times I'll do some analysis one month and then I won't do anything like it again for six more months. But I've written the script; I've gone through all that process of learning how to clean up the data, documenting it, documenting the output, doing the visualizations. It's all defined in my script. So when I want to create an updated version, or point it at a new data source six months from now, it's simple. In my experience it's very difficult to do that kind of thing effectively in an Excel-type environment.

[00:13:36] Can you comment on documentation? When you say documentation, are you commenting within the script?

[00:13:41] Absolutely. I comment very liberally. One of the big creators of extensions, we don't need to get political here, is a guy named Hadley Wickham, who has created many of the extensions to R; they're called packages. And he has a quote: every project has two collaborators, you and future you. So make sure that you pay attention to documentation when you're writing it the first time, because six months from now you're going to look at it and say, what the hell was I thinking? I'd just like to say I'm real proud to work in an industry where someone can say "exposed" and "GUI" in one sentence and we all just sagely nod and agree.

[00:14:20] Work-related, my day was totally made right there, quietly, about 20 minutes ago.

[00:14:26] But to get actually serious, Eric: in one of our earlier shows we did a session on big data, and we kind of rambled around and all agreed that we don't understand it well enough, and that in a lot of cases it's a great excuse for IBM to sell a million dollars worth of crap. That's the short version. Yeah, I'm not going to disagree. The thing is, we've never had anybody on the show before who approaches, at least to me, in the way that you're doing it, quantitative analysis as a software engineering challenge. The way you're talking about doing a piece of quantitative analysis is the way the software engineers who work for me talk about solving engineering challenges. And you're kind of winning me over right now, I've got to be honest with you.

[00:15:08] It's funny, I get that feedback in the industry. Of the other peers of mine who are data scientists, a lot of them come from different disciplines; a lot of them come from a traditional stats background, and I come from a computer science, computer engineering background. I picked up the stats along the way. And a lot of times they ask me, what's different about your approach, because they're used to their standard statistical approach, where everything's about modeling, and they didn't really have that software background. What's it like to look at this as a software engineer? It works for me. That's the way I think. Programming has been there from the beginning of my career and has always been a touchpoint for me; everything I've done ties back to programming. For me, it's just natural to approach these kinds of problems this way.

[00:15:51] You probably have version control for a lot of it, too? You know what I mean, you should see the desktop here right now. Every script I write is checked into GitHub.

[00:16:00] So we didn't cover in the intro that Eric is based in Columbus, Ohio, my favorite city, so I've gotten to know him over the last six years; he's our friend now.

[00:16:09] Tim: Yeah, well, yeah. So we're friends; he was at a wedding with us this year.

[00:16:18] Here's the guy working on TED video data coming into his web analytics, and he was kind of reminiscing about the days when he had, you know, really big datasets to work with. And even with the large clients that I've got, I'm not fighting the million-row limit in Excel. That's not the sort of data challenge I think many of us run into on that front.

[00:16:38] If you're pulling in data and you're capping out at the Premium limits on an API call, does R handle that well? Are there other aspects that it doesn't handle? Do you need a powerful machine, or...?

[00:16:50] Well, there are two aspects to that. In that particular case, the limit that I was hitting was the million sessions: when you access Google Analytics via the API and you query a date range that will encompass more than a million sessions, it caps it at a million and samples everything beyond that. But it's doing all that server-side. The data that you're getting back on your machine in R is usually much, much smaller than that. I wasn't requesting details on every session; I was looking for some aggregated data that just happened to span a million sessions. One of the limitations of R is that it only works with data that fits in memory. But on today's systems, I think I have 16 gig on my machine, my MacBook Pro, and you can run it on a server, there's a server version of it, and you can go to multiple tens of gigabytes on servers, I'm sure.

[00:17:50] So what are you running? Are you running it on a server, or are you running it on your machine today?

[00:17:55] I'm just running it locally on my personal MacBook, my work MacBook. But at some point in the future, as we build out the data capabilities at TED, we're going to be building a data warehousing system, and it will utilize R server and some other server-based tools to build out the data warehouse. With the MacBook Pro and its 16 gigabytes of RAM, do you find times when you're like, crap, I've got to close Chrome because...? I've never run out of memory. There have been a few times... a lot of times I'm pulling data from multiple databases, multiple MySQL databases, and when I can, I try to do any kind of aggregation that's possible in the SQL database itself. I'll structure the query that R is making so that I do as much of the aggregation in SQL as I can, and then pull the data back into R and manipulate it there. But if I can't, if it's something where I've got to join datasets from two different databases, for example, I've got to have the full dataset from both in R. I've never run out of memory, but I have run into times where a manipulation may take 15, 20, 30 seconds instead of being instantaneous. It's noticeable.
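
[Editor's note: a sketch of pushing aggregation into the database before pulling the much smaller result into R, as Eric describes. The connection details and table and column names are placeholders.]

```r
library(RMySQL)

con <- dbConnect(MySQL(), dbname = "warehouse", host = "localhost",
                 user = "analyst", password = "...")

# Let MySQL do the GROUP BY; R only receives one row per day.
daily <- dbGetQuery(con, "
  SELECT date, COUNT(*) AS views
  FROM   video_views
  GROUP  BY date
")

dbDisconnect(con)
```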

[00:19:17] When it comes to pulling in data from multiple data sources, is that kind of another... hey, I'm regularly having to pull in data from Adobe Analytics or GA, and I've got a common key, and I want to merge that with my SQL or some other database?

[00:19:28] There are packages for pulling in from a SQL Server or MySQL; there are packages for accessing those data sources. I use one package for accessing MySQL databases called RMySQL, clever name, and there's another package for accessing the Google Analytics API called RGA, also a clever name. So within the same script I'll pull the data from GA, pull the data from MySQL, and do whatever manipulation I need to do in R. If I'm going to do some visualizations, I generally use a package called ggplot2, so I'll load that in and then do the visualization. That gives you a flavor.
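
[Editor's note: a hedged sketch of the merge-and-visualize pattern Eric describes. In practice `ga_data` would come back from the RGA package and `db_data` from RMySQL; here both are invented stand-ins sharing a `date` key.]

```r
library(ggplot2)

ga_data <- data.frame(date = as.Date("2015-09-01") + 0:6,
                      sessions = c(900, 950, 870, 1100, 1050, 600, 580))
db_data <- data.frame(date = as.Date("2015-09-01") + 0:6,
                      videos_published = c(2, 1, 3, 2, 0, 1, 2))

# Join the two sources on the common date key.
combined <- merge(ga_data, db_data, by = "date")

# Quick ggplot2 visualization of the combined data.
ggplot(combined, aes(x = date, y = sessions, size = videos_published)) +
  geom_point() +
  labs(title = "Sessions vs. videos published (illustrative data)")
```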

[00:20:08] So you can have one script that actually loads all of those packages?

[00:20:10] Right, at the top of the script you just list all the packages you need in that script, and they get loaded. Just like standard include programming in any other language. Yeah, but not everybody has done standard include programming before.

[00:20:26] Developing a list of things we all need to learn here. And that brings me to kind of another topic for discussion, which is, you know, Eric, as you're growing your team and you're hiring some millennials, young folks who are getting into this space who may not have exposure to R, what kinds of skill sets do you look for? What increases someone's aptitude to be successful using R?

[00:20:56] Well, I think having any kind of a programming background, or at least that interest in programming. Let's take a step back and qualify some of this a little bit. When you're writing a SQL query, that's considered, or known as, declarative programming, where you just declare what you want to happen: I want this data, these fields, I want it sorted this way, I want it joined this way. The details of how that happens are up to SQL, the SQL engine; you don't specify how that works. R-type programming, and most programming languages, are procedural, where you describe things step by step: do this, load this data, manipulate it in this way, now load this data, now join it here, and so forth. That way of thinking, step by step, here's how I want to approach this problem, I want to break it down into these steps, here's where I want to pull data from some other source, here's where I want to combine it, and here's how I'm going to visualize it, lends itself well, I think, to complex data analysis. Some would argue that the declarative approach, where you just describe what you want and let the optimizer deal with it, is simpler, and frees you up for higher-level thinking and so forth. To some extent that's true. But you're also giving something away: you're not able to define how you want this stuff to work.

[00:22:24] You're giving away control to somebody else's optimizer. It may do the right thing, and it may not, and you don't really know; all of that is hidden from you. So in my mind, I want somebody who wants to work procedurally, who wants to get into the details, who wants to describe step by step how things work. That's the level of knowledge I think you need to be really effective as, you know, a real deep analyst.
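
[Editor's note: a small illustration of the declarative/procedural distinction Eric draws, using invented data. The SQL is just a string here; the R version spells out each step.]

```r
# Declarative: describe the result you want and let the engine decide how.
query <- "
  SELECT   channel, SUM(sessions) AS total
  FROM     visits
  GROUP BY channel
  ORDER BY total DESC
"

# Procedural (R): specify the steps yourself.
visits <- data.frame(channel  = c("search", "social", "email", "search"),
                     sessions = c(120, 45, 30, 98))
totals <- aggregate(sessions ~ channel, data = visits, FUN = sum)  # step 1: sum
totals <- totals[order(-totals$sessions), ]                        # step 2: sort
totals                                                             # step 3: inspect
```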

[00:22:50] So I actually had a question, because it's one of the ways that I've viewed tools like R, or, at large, data science in general. It seems to me that the best fit for that level of, like, scientific-level raw analysis of data is right at, or just past, the bleeding edge of what GUI-driven tools can do. So, for example, Google Analytics four years ago was kind of janky; in a lot of cases you had to pull the data out and do complicated things to answer questions. The tool has gotten way better, which, like you said, frees up your time to do more first-class, more complex work. So, and again, we've had a hell of a time trying to wrap our heads around a good definition of data science and big data and what these things mean, is a big part of the value proposition of R, to you, that it allows you to stay just past the cusp of what GUI-driven tooling allows you to do, as that tooling starts to fill in, and lets you push farther forward?

[00:23:53] I think it's a combination. Using GA as an example, any kind of reporting within the GUI for GA is limited to two dimensions, a primary and a secondary. What if you want more than that? You just can't. And many times, for most of what I do, I need more than that. So I have to develop it in R; it's something where I'm accessing the data via an API, manipulating it myself, visualizing it myself, because I have more complex reporting needs than the GUI can satisfy. That's one area where I really lean on R, and it doesn't have to be R, it could be other tools as well; that's just my tool of choice. Another area is combining data from different datasets. I mentioned earlier we use Google Analytics Premium; we also have some internally developed tools that we track different things with, and a lot of times I need to combine the data from both of those. We have multiple internal databases that house the details about our talks, and I need to combine that: I need to take the data from GA that tells me the usage I'm seeing externally and bring in details about the talks, metadata, transcripts, whatever it might be. I can do that very simply in R. And it's not just R, but I find that a domain-specific language, DSL is the term people use, where, like R, it's created for manipulating data, makes it so easy.

[00:25:21] I could do the same things in Python, I could do the same things in any number of languages, but it would take me a lot more work, more effort, because they're general-purpose languages; they're not domain-specific for data manipulation. So, for me, I find that I can get things done a lot faster with a language like R. We haven't really talked too much about the statistical part.

[00:25:42] Mean, median, mode, and that's about it, right? I was going to offer standard deviation as well. Yeah, yeah.

[00:25:52] Yeah, those are all... okay. Yeah, R. There we go.

[00:25:57] So, setting aside all the manipulation that R makes easy, there are all these statistical tools that don't exist in things like Excel. Like, if I want to compare two different medians and I want to know if they're statistically different from each other, those tools are built into R. If I want to do that, is it core, or is that a package? That's all core R. It was developed by statisticians for doing this work.
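
[Editor's note: an example of the kind of built-in test being described; both functions ship with core R (the stats package), and the two samples are invented.]

```r
group_a <- c(5.1, 6.3, 4.8, 7.2, 5.9, 6.1)
group_b <- c(7.4, 8.1, 6.9, 7.8, 8.3, 7.0)

# Wilcoxon rank-sum test: are the two distributions (loosely, their
# medians) statistically different?
wilcox.test(group_a, group_b)

# The t-test for comparing means is equally built in.
t.test(group_a, group_b)
```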

[00:26:24] You're not a statistician by training. How much does the tool cover that? How much do you still need to be knowledgeable about the data? How much does the tool make up for statistical knowledge deficiencies?

[00:26:41] Well, it's a combination. You know, I have a working knowledge of a lot of areas of statistics, but I'm certainly not at a Ph.D. level. But the people who wrote these packages for R are at that level, so you're getting the benefit of all their work and their knowledge. Let me give you an example. One of the things I do is forecasting. People watch the videos, and the rate at which they watch videos has a periodicity to it: different months of the year have different usage levels, different times of the month have different usage levels. There's trending involved; we see upward trends, downward trends, etc. And it's common, whenever you're trying to do forecasting of usage data, to separate out the periodicity, the cycle of the data, and to separate out

[00:27:35] the trending aspect of the data, then do a forecast on what's left, and then add back in the periodicity and the trending. That's called decomposition forecasting, and then recomposition. To do that manually is very laborious. But there are experts in forecasting, and they have written packages that make it simple: it's three or four function calls and you're done. Now, you still have to understand enough about how it all works to make sure that you're sending the right parameters into the functions and so forth. But you're able to stand on the shoulders of giants, so to speak. Somebody else has done all that low-level work, and it's all been validated.

[00:28:13] It all adheres to the standards around forecasting, and you can just leverage that work. And if you Google it, with the community that's out there around the world, you will find somebody who says, this is the package, and this is what to do with it.
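
[Editor's note: a hedged sketch of decomposition forecasting using Rob Hyndman's forecast package, one widely used option; the monthly series is simulated, not TED data.]

```r
library(forecast)

# Fake monthly data with seasonality, trend, and noise.
views <- ts(100 + 10 * sin(2 * pi * (1:48) / 12) + (1:48) + rnorm(48, 0, 5),
            frequency = 12)

# stlf() decomposes the series, forecasts the seasonally adjusted part,
# and adds the seasonal component back in: the decompose, forecast,
# recompose cycle collapses into a single call.
fc <- stlf(views, h = 12)
plot(fc)
```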

[00:28:31] There's almost too much. There's so much out there that you Google anything and you get so many results you don't know how to narrow them down sometimes. But Stack Exchange helps with keeping the quality of the responses high. There's a whole section of Stack Exchange for statisticians called Cross Validated. A lot of times the answers will be in there, and there's so much community support for R and all the R packages that it doesn't take too long to go through that iterative cycle of Google searches; pretty soon you'll zero in on the right packages to use, the most commonly used packages, the most commonly used approaches. It's experimentation to find out what works best for you. That's a good segue into something else that I've really started to use recently with R: a web application environment called Shiny, S-H-I-N-Y, also developed by the RStudio people. It's a web development environment for R applications, and these people have done some incredible work to take all the detail and all the grunt work out of developing applications. And these are very well done applications; they provide all the building blocks to do the work. I remember developing this kind of thing years ago, when you had to do it all by hand. Now there's so much that you can just leverage: you take all the advantages of R, all the data manipulation and statistical tools, and layer this web application framework on top of it.

[00:30:07] And you know in a day you can develop an entire application.

[00:30:11] It's just amazing work. Sounds like we need to get a server. You heard it here first.
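
[Editor's note: the canonical hello-world shape of a Shiny app, to give a sense of how little code a working web application takes; nothing here is specific to TED's work.]

```r
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Number of points:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random sample")  # redraws as the slider moves
  })
}

shinyApp(ui, server)  # launches a local web app
```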

[00:30:17] We'll be playing back Episode 19 ages hence, fortified by a little bourbon, Gilligan's Island style. It would be pretty cool if you could use R inside of it, too. So, one other thing I want to talk about, and I'm asking for a friend: where else would you go, Eric, to find tips and tricks and things like that?

[00:30:41] You mentioned Stack Exchange and Cross Validated, which is a great resource. What other tips and tricks are there? And there's actually getting across the starting line in the first place, right? Yeah, where do I go to...

[00:30:54] Where do you start? Well, I actually did that once: I downloaded RStudio and I'm like, wow, now what do I do? So I went the Coursera route, but I would love to hear it.

[00:31:05] Well, I've been asked this by people who want to learn R, so I've looked into it a little bit, and I am a big fan of the Coursera Data Scientist's Toolbox. It's the Johns Hopkins program, taught by three professors there. They've expanded it a great deal since I last looked at it; I think there's...

[00:31:25] It was one professor when I took it a year ago, but June Dershewitz is taking it right now.

[00:31:29] So there are three professors now, and I believe it's spread out over multiple courses; it's nine courses now, and it used to be three or four, so they've really expanded it. I've heard really good things about it. They cover everything from just learning R to the details of how to do exploratory data analysis, how to do data cleansing, how to make your work reproducible, and then they start getting into some of the more data science things, like doing machine learning, doing regression models, and getting into statistics more. The other one that I've heard good things about is a statistics class; the focus is statistics, but it's taught with R, so it's, here's how you do these statistical things with R. It's also a Coursera course, from Princeton University, taught by Andrew Conway. I've heard good things about both of those. As far as where I get my information, I use Twitter to curate my news. Over the years I've developed a list of people I follow who feed me the information I need to stay current in the industry. So if you follow me on Twitter, @goldsmitheric, you can look at the people I follow, and you can develop a list of people who can help feed you information like they feed me.

[00:32:50] So you're a lurker on Twitter? Pretty much. I'll get out there and say something every once in a while, but not nearly as much as a lot of these folks.

[00:32:58] I would be remiss if I didn't mention that up till now we've pretty much talked about R as a tool to facilitate data exploration, data manipulation, pulling data from multiple sources and combining it, and doing the visualizations. But then there's the whole statistical modeling, forecasting, machine learning, text processing, natural language processing side. There are so many more packages and other things out there that we haven't even touched on. There's just so much available.
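
[Editor's note: a small taste of the text-mining side using the tm package, one common choice; the two documents are invented snippets, not TED transcripts.]

```r
library(tm)

docs   <- c("ideas worth spreading", "spreading great ideas widely")
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Term-document matrix: which terms appear in which documents?
inspect(TermDocumentMatrix(corpus))
```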

[00:33:30] Well, there you have it. Arr! Those pirates really knew what they were doing. So yeah, honestly, I think we could keep going for a while. And it's funny, because I don't know about Tim and Jim, but I'm super excited to dive into this again and take another crack at this whole R thing, which I've attempted a couple of times in my career. To me this is one of the most fun ones we've done in a while, and one of the things I really wanted to do was pick Eric's brain. You know, we've talked before about how we've all ended up here from different backgrounds.

[00:34:04] And you come from a completely different one, as a software developer. I wanted to start kind of hashing out some of our biggest arguments with you, like, the ideal analyst team should have these three people on it, and I'm kind of picturing you saying, no, because you're missing someone who can do this, you know. So it was a really refreshing perspective. Frankly, it got me turned around a little bit in my thoughts about where data science fits into the ongoing delivery of measurement to a business; like, it's not a periodic, as-needed thing. You really had me with the whole software developer's approach. I

really, really liked that, because for some reason it just made a whole bunch of things make sense in my head about where this fits into service delivery inside a business. That was really cool, and actually, I'd love to pick this one up again.

[00:34:54] I'll throw in mine. I do get kind of excited about this; I have sorta dived in, semi-successfully. But this whole discussion has me back thinking. And this is for the analyst, whether it's R or other tools out there. There was a pretty amusing, if brief, exchange between Michael Healy and Tom Naylor on the Measure Slack a few weeks ago, where I think Michael was kind of making a brief case for Python, and Tom's comment was that he was trying to bring a religious argument into the channel. So I don't know that it's necessarily R. But I do think, if you're a whiz at Microsoft Excel and thinking, I'm set for my career, you're probably not, because of a lot of the things we wound up touching on: the statistical modeling, standing on the shoulders of others with true statistical stuff and saying with confidence, I can apply this use case; and Shiny, you know, talking about a web app. I think with Microsoft there's something through SharePoint where, in theory, you can web-enable an Excel spreadsheet, but that just feels like a bit of a joke, whereas R is actually giving you the potential to say, no, I am going to make an interactive thing where I'm going to allow people to refresh and interact with the data. Which, to me, starts to converge a little bit with Tableau and Tableau Server, potentially.

[00:36:32] We didn't touch on some of the text mining stuff; Eric shared several examples with us and didn't even share some of the others, but I know there are text mining and text analysis packages within R, so that's another exciting aspect. There are Excel plugins, and I use an analysis plugin almost daily, so plugins are great, but it just feels like, when you're going into the open source world, there's just this abundance of plugins, and the plugins are iterating. So this has me excited.

[00:37:09] If you don't get exposed enough, it just gets GUI.

[00:37:15] That was going to be my brief wrap-up of my incoherent and rambling takeaways, but I think there was a lot here.

[00:37:23] Well, Eric, I don't know if you have any takeaways, or kind of a single source of information to point people to.

[00:37:31] Well, I would add that, for anybody who's interested in learning R, it is a bit of a steep learning curve, just because of the different way of thinking about and looking at the data. So it helps to have a use case; it helps to have some data that you actually want to work with. Don't just try to learn R to learn the syntax. Really have a problem that you want to solve, and that will help you work through the details and get to the understanding quicker, I think.

[00:38:00] So, given that, would you say a use case that is, I have to deliver something within the next two weeks for work, versus the use case, I'm going to dig into a downloaded dataset and just answer a question I've wanted to answer... how risky is it to say, I'm going to commit myself to delivering something for work on a given timeline? Do you have a take on that, one way or the other?

[00:38:31] I guess it depends on your personality and your risk tolerance. The first time I used R was for a work project that I was committed to deliver on, so I forced myself to get up to speed and learn how to deliver what I needed to deliver. But that's just me.

[00:38:51] Well, certainly this has been a really great show. Eric, thank you so much for enlightening us and totally saving Tim Wilson's terrible show idea; it's really been redeemed completely. So kudos to you. If you have questions or comments, we'd love to hear from you on our Facebook page, or on Twitter, or on the Measure Slack. And if you're not part of the Measure Slack, you can certainly find out how to get on it from our Facebook page, facebook.com/analyticshour.

[00:39:27] Our thanks to everyone. Thank you again, Eric, we loved having you on. And for Jim and Tim, my co-hosts: get out there and get you a new shiny R.

[00:39:38] You'll love it. Thanks for listening, and don't forget to join the conversation. We welcome your comments and questions on Facebook at facebook.com/analyticshour, or on Twitter.

[00:40:03] And you go grab another delicious beer, and then we can get started. OK, I like it. We could talk about the old days for another 20. I like it. So... I'm a specialist in R, and away I go. I don't get it. I'm waiting for it to come out. If you're under 30, you're an asshole, right? No, my joke is a real thing.

[00:40:25] That predated R? Now the truth comes out. Animated bubble charts, 99 times out of 100, are going to be wildly obfuscating and not helpful. Now the truth comes out. Tim Wilson is not the boss of you. All right, I'm just making stuff up now. Oh, it sure is one we can actually see. Now the truth comes out. Tonight's episode has been brought to you by the letter R and the number 22.

[00:40:56] For all the millennials out there, kids... I like it. It's all about you. Yeah, yeah. Now the truth comes out. My 3-year-old fell out of bed, so I was like, Daddy loves you, I'm recording. Now the truth comes out. The insight... skewed insight. Now the truth comes out. That's a bit of a sore spot. Now the truth comes out.

[00:41:29] Jim was trying to save money, but we're out of time, so we'll just never know what Tim had to say. I like it. Facebook... you know, we're slash-something. I mean, what's our Facebook page? Somebody look it up. Hey, there you go. We'll fix that in post.

[00:41:48] You guys sound like a bunch of guys I could just sit around and talk with for days.
