#057: Open Data with Brett Hurt and Jon Loyens

Published: Feb 28, 2017

So, knowledge management and data management walked into a bar and bumped into Github. The result? Open data and, specifically, data.world! Coremetrics…and then Bazaarvoice founder Brett Hurt, along with Homeaway.com and Bazaarvoice veteran Jon Loyens, joined us to talk about what open data is, why it’s gaining traction, and why we all should care. And, if you’ve been pining to have us record an episode that runs for more than an hour, this one is it!

NOTE: Tim mentions Un-Summit in his last call and states that it’s on March 21st, 2017. That is incorrect. It is on March 20th.

People, Places and Things

We’re pretty sure this episode is going to own the record for number of people, articles, videos, companies, and governments mentioned! We think we caught them all below:

@datadotworld
Atomic (Semantic) triples
The Semantic Web
Tim Berners-Lee
Tim Berners-Lee on linked data: here and here
Sun E10K
National Security Agency (NSA)
Palantir
Linked data technology
Modern architecture
Git
Library of Alexandria
Certified B Corporation
Customer Relationship Management (CRM)
Data Dictionary
Master Data Management (MDM)
Zika hackathons
Cloudera
Indeed.com
Netflix Prize
AirBnB’s Open Data
LinkedIn Data Initiatives
How Capital One Differentiates Itself
Resource Description Framework (RDF)
Graph Database
data.gov
Captain Picard
Artificial Intelligence (AI)
Machine Learning
Computers Beats Go Champion
Joe Biden / Cancer Moonshot
Coremetrics LIVEmark
Google Analytics benchmarks
Linux
Apache Server
Java
Uber Movement
Data donation: What’s at stake? – Brett Hurt and Ian Greenleigh
Your company’s data could help end world hunger – Mallory Soldner, UPS, TED Talk
Slack
Tesla’s open data
Apple
Canada’s Open Data Initiative
UK Open Data Initiative
CKAN
Linus Torvalds
Usenet
Telnet
U.S. Census
Ancestry.com
Zillow
Un-Summit (This is on March 20th — NOT March 21st, as Tim erroneously stated)
The End of Analytics – Tom Davenport
Easy there, cowboy
Data for Democracy
B Corp / Public Benefit Corporation FAQ
Warby Parker
Ben & Jerry’s
Patagonia
Wasabi – Intuit’s open source A/B testing platform
Dylan Lewis

Episode Transcript

[music]

0:00:04 Announcer: Welcome to the Digital Analytics Power Hour – Tim, Michael, and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website, analyticshour.io. And now, the Digital Analytics Power Hour.

0:00:28 Michael Helbling: Hi, everyone. Welcome to the Digital Analytics Power Hour. This is episode 57. You know what we just don’t have enough of? Well, if you guessed data, good guess, but not really. But what about open data? Now, I think that would come in pretty handy. Think of all those cool data sets you could use… Are on, Tim.

0:00:53 Tim Wilson: Oh, I have.

0:00:54 MH: See? Think of all the cool correlations you could bring to bear when leveraging, let’s say, three centuries of macroeconomic data from the UK. Open data and open data platforms are what we are talking about on this episode of the Digital Analytics Power Hour. And to help us understand why we need access to data sets – like retired comedy phrases, “Oh no, I didn’t,” – we actually have enlisted a couple of guests, a Power Hour first, Brett Hurt and Jon Loyens. Brett is the CEO and Co-founder of Data.World, but you have almost certainly heard of him in his many other roles. In the world of data, he casts a long shadow as the Founder of both Bazaarvoice, and before that, Coremetrics. Welcome to the show, Brett.

0:01:39 Brett Hurt: Thank you. It’s a pleasure to be here with you guys.

0:01:41 MH: And Jon Loyens is the Chief Product Officer of Data.World, and prior to that, he’s held numerous senior technical roles, both at HomeAway, Bazaarvoice, and Trilogy. Welcome, Jon.

0:01:51 Jon Loyens: Thank you, guys. It’s awesome to be here.

0:01:54 MH: Yeah. And of course, joining me is my other host, Tim Wilson, Senior Partner at Analytics Demystified.

0:02:01 TW: Hey, guys.

0:02:02 MH: And I am Michael Helbling. I lead the analytics practice at Search Discovery. Well, gentlemen, this is awesome. First off, I think Tim and I both hold you up, Brett, as someone we’ve looked up to for many years in our industry. And Jon, you – by proxy, I guess I would have to say – since this is our first meeting. But guys, what was it that got you started down this path with Data.World? Why open data, and why this right now?

0:02:31 JL: Yeah, sure. Brett and I go way back. We met each other at Bazaarvoice, and when Brett started looking at forming a new business, I was lucky enough to get the call. And it’s a real pleasure to be able to work with him again, and work with him day in and day out. And when we started discussing the kind of company that we’d love to be in, we discussed a lot of things about what vertical will we like to be in, what shape of company we wanna be in. And one of the things that we really kept coming back to is this idea of data and its ability to change the world and change how people look at the world around them, and the fact that the world – a lot of current events notwithstanding – is becoming more and more data driven. But we also wanted to start a company that could potentially impact the culture of how people work on something. And for us, we were lucky enough to… I’ve been lucky enough to be friends with another one of our Co-founders – we have four Co-founders – named Bryan Jacob, who’s our CTO, who happens to be a real student of data management and knowledge management.

0:03:43 JL: And we started riffing with him on, how can we make data management more simple, more collaborative? How can we make it easier for more people to participate in data projects? And how can we bring more data together in effective ways? And through Bryan’s passion for a lot of linked data technologies and Semantic Web, we arrived at this idea of, what if you could create a collaborative community where people actually worked on linked data? Could you facilitate finding data faster? Could you facilitate people finding other people to work on data projects with faster? And we thought to ourselves, “Wow, if we could build this thing appropriately, we could really change the culture of how people approach data projects and how people work together on data-oriented projects, both inside companies, and on open data projects, on civic data projects.” When we started thinking about the potential for that, it was really mind-blowing the way that we think that this could change how people approach these things in the way that new discoveries are found and real world-changing views, based on data that hasn’t been correlated, hasn’t been joined, hasn’t been found, hasn’t been made machine-readable. And so, we’re really, really lucky to be on this journey doing that.

0:05:16 TW: So it’s knowledge management meets data? Companies have more data, and companies have been struggling with knowledge management for a long time. The data we have meets knowledge management, plus you have that kinda outward-facing public data sets? Is that an oversimplification?

0:05:32 JL: No, that’s not an oversimplification. In fact, I’ve taken to describing Data.World recently as “a knowledge management platform cloaked in a data science community.” Where we really try to focus on what I call the “first mile of data science.” So, you can really segment data science and most data projects up into a few distinct phases. And the first mile is really about understanding your data sources. Doing exploratory analysis on them, trying to gather up the data that you feel… And you might need to prove a hypothesis. Then you get into the analysis, model building, and then you get into application building. But really, you see all sorts of quotes out there, and it’s pretty easy to Google the stories about how people spend 80% of their time in data projects doing data janitorial work. And that data janitorial work, ultimately, is actually a knowledge management problem. But researchers and analysts – and even data scientists – for them, knowledge management is really a different discipline.

0:06:44 JL: And so a lot of that knowledge management – and a lot of the knowledge about data – ends up being tribal. It’s stuck in communities. It’s like you have a colleague that sends you an Excel spreadsheet, and they’re like, “What does this column mean? You have a column labeled ‘income.’ Well, is that net or gross?” Well, then you fire back an e-mail, and you’re like, “Oh, that’s gross income.” And now that’s like shared tribal knowledge between those two people and the person doing the project, but what happens when that person leaves? That knowledge gets lost. And it becomes very hard for somebody to jump in and pick up that work. And we wanna take all of that tribal knowledge about data and make it easy to put it beside the data. Answer the context questions, “Who’s worked on this data? What has it been used for? What does it mean? How should I use it?”, and get all that context in one very durable location, so that anybody that comes to a data set – whether that’s open or not – can pick up and start working with that much more efficiently and much more quickly.
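To make Jon's "income: net or gross?" example concrete, here is a minimal sketch of keeping that context beside the data instead of in an e-mail thread. It is only an illustration: the record, the dictionary layout, and the `describe` helper are hypothetical and do not show how Data.World stores metadata.

```python
# Hypothetical data: one row of the spreadsheet a colleague sent over.
data = [{"name": "A. Smith", "income": 52000}]

# Hypothetical data dictionary stored next to the data it describes,
# capturing the answer that would otherwise live in someone's inbox.
data_dictionary = {
    "income": {
        "description": "Gross annual income, before tax",
        "unit": "USD",
    },
}


def describe(column):
    """Return the recorded meaning of a column, or flag it as undocumented."""
    entry = data_dictionary.get(column)
    if entry is None:
        return f"{column}: undocumented"
    return f"{column}: {entry['description']} ({entry['unit']})"


for column in data[0]:
    print(describe(column))
```

The point of the sketch is that the "is that net or gross?" answer survives even after the person who knew it has left.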

0:07:46 BH: So one thing to keep in mind about Data.World that’s really, really exciting is that we’re ingesting all of the data that gets uploaded into a large-scale graph database and breaking it up into atomic triples, so that all the data can be related to each other. And this is a real fancy way of talking about the Semantic Web, which Tim Berners-Lee has been advocating for, really, since the invention of the World Wide Web. He’s said in multiple TED talks that, “A web of linked data is gonna be much more powerful than a web of linked documents.” And when Jon and Matt and I started to talk with Bryan about the real power of that, a huge light bulb went off in my head because, as you guys mentioned, I’ve been working around – I don’t really like this term, but – “big data,” for a long time. I don’t like the term, by the way, because what’s big data today is gonna be small data tomorrow, so it’s kind of a meaningless term. But…
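For readers new to the term, an "atomic triple" is a single (subject, predicate, object) statement. The sketch below shows one simple way a table row can be broken into triples. It is an assumption-laden illustration, not Data.World's ingestion pipeline: the rows, the column names, and the `row_to_triples` helper are all hypothetical, and real RDF uses URIs rather than plain strings.

```python
# Hypothetical rows from an uploaded table.
rows = [
    {"id": "household-1", "zip": "78735", "income": 52000},
    {"id": "household-2", "zip": "78704", "income": 61000},
]


def row_to_triples(row, key="id"):
    """Break one table row into (subject, predicate, object) triples.

    The row's key column becomes the subject; every other column
    becomes one triple of the form (subject, column name, cell value).
    """
    subject = row[key]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != key
    ]


triples = [triple for row in rows for triple in row_to_triples(row)]
for triple in triples:
    print(triple)
```

Each cell becomes its own small fact, which is what lets facts from different tables be related to one another later.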

0:08:49 TW: It’d be like modern architecture. It seems a little trite, when that was… [chuckle]

0:08:52 BH: Yeah, in many ways. But Coremetrics was very much a big data company. It was back in the days before all these Cloud providers, and we spent $15 million just standing up our original infrastructure with Sun E10Ks and the whole thing, to be able to host all the data for clients like Walmart, all the clickstream data and transactional data and such. And Bazaarvoice was – and is – a big data company, and has around 700 million consumers now that visit a Bazaarvoice-enabled site, and read customer reviews, and questions and answers, and other forms of user-generated content, and has really built probably the biggest database in the world of customer-generated content around products. And so the light bulb, for me, really went off when I learned the power of this technology. And most people don’t know the power of the Semantic Web. The NSA, Palantir, Google, Facebook know, but really, it’s only the wealthiest few that know. And so, being a very mission-driven entrepreneur, that really appealed to me as we have the ability to democratize the most powerful database technology in history, and democratize it in a way where it seems easy to everybody.

0:10:17 BH: That’s one of the really big things that we’re working on here, where everybody – from a business analyst, to a statistician, to a very deep data scientist – can leverage the power of the Semantic Web, without really understanding all the nuance of it. And in many ways, that’s what GitHub did with Git. I would argue that Git was pretty esoteric and nuanced before GitHub came along, and GitHub moved that bell curve to the right, made every programmer in the world better, because now they have access to the Library of Alexandria code. That doesn’t exist in the data world. And that’s part of the mission. And we set this up as a public benefit corporation, recently became a Certified B Corp. And we’re very serious about advancing that mission, really, turning what Tim Berners-Lee has strongly desired into a real thing. Not just a thing for the elite few, but a thing for everybody.

0:11:16 JL: Yeah, I can’t stress what Brett said enough, that this is a thing for everybody. And that there are so many people in so many disciplines creating so much data right now, and even when they release their data, whether it’s for reproducible research reasons, or through data philanthropy causes, or when cities… And a lot of cities are doing incredible work. And even our government, one of the most bipartisan issues in our government is around open data. But without building the metadata around that data, and making that machine readable in a lot of Tim Berners-Lee’s vision for the Semantic Web, you’re not gonna get the ability to do a lot of the machine learning and make a lot of the advances in AI that people thought you were gonna be able to make.

0:12:07 JL: But the question you’re gonna ask yourself is, “If I am a biology researcher, maybe I’m looking at populations of a particular bacteria across different water sources across the country or whatever. And I’m doing this research, I’m building these surveys, I’m building these models. And I do some research and I publish my data along with it.” Why should they have to learn about the Semantic Web and how to model their data? That’s not in their wheelhouse. They’re not engineers, they’re not data management people. So we wanna create a tool that makes it easy for them to participate in that, and gives them incentive to participate in that, makes it easy for them to use, so that we wanna make it as part of their process that they capture that data. And that additional information and knowledge about the data gets stored alongside the data in a way that won’t go away and won’t get lost in tribal knowledge.

0:13:05 TW: So how much does… I feel like, as an analyst who’s got a kind of a narrower focus around digital and social and mobile, and I know all the flipping caveats of web analytics data from how it’s collected, and where the gaps are, and how it could be implemented, and people screw up implementations, and there was a gap in this date and that date. And then I always assume that everybody else’s data… It’s just I think human nature is to assume that everybody else’s data is way cleaner. Even when I’m going to a client to another department and saying, “Oh. Can we just get this out of your CRM?” Well, then the CRM expert is like, “Well, here’s all the gazillion caveats.” Which gets into the world of data management and data governance.

0:13:48 TW: It sounds like part of something you were saying earlier, Jon, was, “Okay, let’s accept that nobody has a complete and accurate data dictionary that is well-maintained. It doesn’t happen anywhere, ever.” Part of what – it sounds like – you’re saying is, “Well, to properly democratize it, every exchange, there’s somebody’s trying to explore it.” If you’re recording that, you’re getting a more organically-built record of what to do with it. But is there also, though, a risk that… How do you manage around the improper interpretation? I’m a researcher analyzing data, I found another data set. If you strip away the need to really dig into the weeds of how that data was collected and what the ins and outs are of it, is there a risk that now you got people doing analysis who don’t have enough understanding of the data? It sounds like you’re doing stuff to kind of address that.

0:14:46 JL: Well, I think that if you capture the output of kind of every piece of analysis that gets done about the data – and your intuition, Tim, is totally right on – I think a lot of the reason why big enterprise master data management projects fail is they take a very top-down approach to this. They build a model of all of the enterprise’s data. They find where all the bodies are buried. But by the time everything is done – and the world moves so fast now – when an intrepid analyst comes along and tries to throw a new piece of data at it, it fails. But then you have the big ivory tower saying, “Well, no, we can’t accept this as part of our model, because it doesn’t fit in our whole MDM strategy.” And now you lose the agile nature of being able to assimilate new data.

0:15:34 JL: So I think the only way to do it is through a community-oriented bottoms-up approach. All the way, same way that open source changed the way we approach software. And if you think back to the early days of open source software, people were like, “Can we ever trust open source? If we just download a piece of software and we use it, won’t it be insecure? Don’t I need a huge corporation behind it?” And I feel the same way about data. I think being transparent about those interactions, keeping a lot of eyes on things – whether you’re talking about it within your own organization, or external organization – the more transparent you can be about those analyses that are getting done against the data and making sure that the work that’s getting done on the data also is relatable to the data, and you keep that provenance chain. Sure, somebody might do some spurious analysis on it, but you hope that that’s actually easier to find through the crowd rather than, like, “Yeah, sure, somebody can come along and” – and honestly, it is the same as open source – “somebody can misappropriate open source and use it for something that it’s totally ineffective for.” You might pick an open-source database that doesn’t fit your use case and your whole site falls down around your ears.

0:16:47 TW: So there’s value and actually if somebody misuses it, the idea is that if it’s appropriately accessible, that will get caught by the community – which may be only three people who care about the data, maybe 100 – but then it’s kind of on record that that was caught and that’s been documented. And potentially now, some artifacts have been generated to say, “Yeah, yeah, don’t do this. ‘Cause we figured it out”?

0:17:09 JL: Exactly. That’s exactly it. And today, what happens is people do it over and over again.

0:17:14 BH: Yeah. This happened… I would just make the point that this happens time and time again inside companies, too, just even between offices or between departments. This just amazing waste of human labor, just constantly doing the same clean-up work on data, the same preparatory work to get to the good stuff – which we all love, we all love analytics, everybody loves analytics, that’s the real discovery and the mystery – but the janitorial work, who wants to do that over and over again? And the crazy thing about the world right now is you can have people in neighboring states, neighboring cities, working on the same exact humanitarian data set – because they really care about poverty alleviation, for example – and they don’t even know that each other exist. And they’re doing the same clean-up work. We’ve done quite a bit of work on the Zika Hackathons. And you go to these things and everybody’s spun up, and there’s all this good will, and it’s beautiful, in so many ways.

0:18:22 BH: And then the Hackathon starts and someone’s like, “Okay, who’s got the data?” And everybody gets a USB drive, and then the first thing that happens, people say, “Well, what’s in this data?” Nobody knows what the fields mean. And everybody there wants to help solve Zika. And they’re sponsored by these big companies, and Cloudera’s involved, and all these great companies. And it’s very frustrating. And it happens over and over and over again. And it’s just crazy that here we live in this networked world – this world where we all have super computers in our pockets, and we’re all networked to each other – but that hasn’t yet happened with data. Everybody says data’s the new oil, but at best, it’s crude and unrefined, and nobody’s turned it into gasoline. And when they do, they do it in a very proprietary way. It’s like a hedge fund, “We’ll turn it into this amazing asset” – or the NSA will turn it into this amazing asset – “but nope, nobody else gets to benefit, sorry.”

[chuckle]

0:19:29 JL: Well, and you really even have to look at the amount of folks out there that are doing things like re-packaging publicly available census data and selling it.

0:19:40 TW: Well, there are a couple from inside a company, that’s an interesting… Repeating the same issues. I’ve got one client, and it’s driven me nuts for a few years, and it’s become a bit of a sore point in the relationship. Because of that sharing, because the fear that somebody will misuse it, the option they have gone down is, “We’re gonna keep this very, very small group that has access to the data, because other people may misuse it.” Which has just, in my gut, like, that has been wrong, wrong, wrong, wrong. And yet that’s kind of where they insist being. And I think there’s a second part, that sometimes people wanna keep the data captive, because they know the data’s kinda dirty. And they’re afraid that if everybody has access to it, then there’s a fear that, “Now everybody’s gonna know how dirty our data is, and we wanna stay in our own little ivory tower.”

0:20:32 JL: Well, yeah. And it’s amazing how getting things out in the open solves a lot of those problems. This is history repeating itself. It’s the early days of open source all over again, like, “Oh, I don’t wanna release myself right there to the world, people will find the issues.” But security and quality by hiding things is not the way to go. And we wanna make it easier – whether it’s inside an organization, or for the whole world – to make more people into data users. I think some of the most enlightened companies in the world are already doing this. If you go look at wonderful companies like here in Austin, indeed.com, companies like Netflix, LinkedIn, Airbnb, who recently open-sourced their experiment. Basically, they have an open-source project – I actually forget the name of it – where they basically have an experiment with you, where if you’ve done an experiment on a set of data, you can publish that inside Airbnb, so that more people can see the work that’s being done in an effort to create more data users. I think another great example of this is… Again, none of these companies right now internally are Data.World users, although we have people from these companies who are members of our community. But this…

0:21:52 MH: This episode, the gap between when we record and when this releases, you never know. You could have three or four of these guys signed up.

[chuckle]

0:22:00 JL: But I think Capital One does a tremendous job of this as well. While they work in an incredibly regulated environment and they’re dealing with a lot of PII, there’s a lot of enlightened people inside that organization that I know are working at trying to create structures for more people to participate in data, so that people can make better decisions together. Now, for them, it’s really, really tricky, because they do have to worry a lot more about privacy and security, obviously. But they’re really committed to doing this, and I gotta give them a lot of credit for putting the forethought and thought leadership into doing that.

0:22:43 MH: So one thing that’s occurring to me – or I just am curious to kind of explore a little bit is, sort of – so I’ve been fortunate enough to get a chance to kind of look at the site and to dig around with it a little bit. I’m not the most data-savvy analyst out there in terms of my ability to leverage SQL or Python or those kinds of things, so I might not be even the right audience. But what are the risks to adoption? So how is this gonna work, and how do you overcome some of the hurdles of access to what I’d say “the common person”? Which is sort of where this hits massive scale.

0:23:20 JL: Yeah. There’s a fair bit to unpack in that question. Obviously, we want Data.World to be incredibly accessible to a wide variety of people. And one of the keys to Data.World, I feel like, is that we are… There’s a very strong social component to what we’re doing, because putting people together – putting subject matter experts together with engineers, together with analysts – is how you build the knowledge about the data, so you can actually do informed analysis on it. So, getting people of a lot of different personas in there is really, really important. From a corporate standpoint, if you wanna talk about corporate privacy and security, obviously, we’re building those features as quickly as we can to allow this to be used in those sorts of environments. But you also have to kind of ride the trend here, that more and more companies are using more and more Cloud providers.

0:24:21 JL: And there’s a lot of best practices out there on how to make these Cloud environments secure. And we follow all those best practices, ’cause we have just an incredibly experienced team who have worked for big public companies before. And we know the measures we need to put in place to make sure that this is really friendly to those environments as well. But it will start with the individuals who find that our tools and our methods of collaboration in projects is more efficient and easier and more conducive to getting their job done. And those people will come along and start using Data.World, and hopefully, they find it really easy to use, and really easy to approach, and really easy to participate with people of different personas. And then they’ll wanna bring it into their organization. And it’s up to us to be a responsible and secure Cloud provider that follows all those best practices, so that we don’t throw up hurdles for participating in those environments.

0:25:30 TW: I like that. I like the Git versus GitHub analogy, because I have… Having hosed myself on Git a few times, and I still cannot use the command line to save my life. But GitHub, I’ve got public repositories and private repositories, and I’m able to pull and push. That still means I’m at least in the world where I want to use and share code. But that light bulb definitely went on when I think, Brett, you kinda used that analogy, and the idea of having one platform that is both public and private, and being able to control where the access is happening. From a meta-question, say, there are two data sets and someone joins them – which is likely not just a simple common key – does that… And maybe this is Semantic Web, and I don’t know if you guys can do a quick definition of “Semantic Web,” but does that – just the nature of linking, and whatever metadata is required to link – does that spawn a new data set? Or is that kinda part of the model, is that as people figure out how to connect stuff together – even if they were from two different sources – does that sort of fit into the open, I’ll say open data/Data.World world, universe?

0:26:52 JL: Well, the way that we think about that is we try to make our data sets be as agnostic a container as humanly possible. So, you can put documentation in our data sets – tabular data that actually gets interpreted into RDF triples and goes into a graph database – into our data sets, and you get all sorts of amazing previews and exploration capabilities against that. You can put code in our data sets. So, conceptually, our data sets are designed to represent all phases of a data project, and to capture any asset required to work on that data, whether it’s – like I said – code, documentation, the raw data itself. And so, that’s a little bit of a mental leap. And the reason we did that was, frankly, to give us a platform to answer questions like that. So, you could run a query that joins two data sets.

0:27:49 JL: But as a user, what you then can do is take the results of that query and add it back into either of the two data sets. Or you could just save the query itself as an artifact inside the data set. Whatever is most appropriate for your phase of the project, and whatever’s most appropriate for capturing the context of what you just did. But it does put a little bit of the onus on our community to kinda figure those things out. But as time goes by, we’ll build more and more features to make those things more obvious, establish best practices. Obviously, we wanna build more boilerplate, to be like, “Hey, if you’re doing a project that looks like this, this is how you should structure your project, and how you should structure your data set.” But for us, it’s all about creating a container that can contain every bit of context, regardless of the phase of the project that you’re in.
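The workflow Jon describes (run a query that joins two data sets, then save the query or its results back into the data set) can be sketched roughly as follows. This is an illustration under stated assumptions, not Data.World's API: it uses Python's built-in `sqlite3` module as a stand-in query engine, and the table names, the rows, and the `dataset` container are all hypothetical.

```python
import sqlite3

# Stand-in for two uploaded tables (hypothetical names and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE permits (permit_id TEXT, zip TEXT);
CREATE TABLE schools (school_id TEXT, zip TEXT);
INSERT INTO permits VALUES ('permit-17', '78735'), ('permit-18', '78701');
INSERT INTO schools VALUES ('school-3', '78735');
""")

# A query that joins the two tables on their shared zip column.
query = """
SELECT p.permit_id, s.school_id, p.zip
FROM permits AS p
JOIN schools AS s ON p.zip = s.zip
"""
result = conn.execute(query).fetchall()

# Hypothetical "agnostic container": documentation, queries, and derived
# tables all live side by side, so the context of the work is kept.
dataset = {"docs": {}, "queries": {}, "tables": {}}
dataset["queries"]["permits_near_schools.sql"] = query
dataset["tables"]["permits_near_schools"] = result
dataset["docs"]["README"] = "Joined permits to schools on zip code."

print(result)
```

Whether to save the query, the results, or both is the judgment call Jon says is left to the community for now.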

0:28:45 BH: But just to make sure you really understand the power of Semantic Web. So, let’s say that you upload a data set, and that data set has zip code as a field. Let’s say, for argument’s sake, we’re based in Austin. It’s an Austin-centric data set. One of the zip codes in that data set is 78735. You, now, can link that data set to any other open data set on Data.World – or private data set that you’ve been given access to on Data.World, from another company or from within your company, from a business partner – you can link that together as long as there’s a matching zip code field. So as long as someone else has a zip code field of 78735, you can now link those. And the power of the Semantic Web is it gives you an individual, unique resource identifier to do that. You can go into Data.World today and go into the search box and type “column: Zip” and you can see all the data sets on Data.World that currently have “zip” as a column name, meaning zip code.
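Brett's zip code example can be sketched in a few lines. The code below assumes two data sets have already been broken into (subject, predicate, object) triples and links any records that share a zip code value. The data, the predicate name `zip`, and the `link_on` helper are hypothetical; a real Semantic Web system would use unique resource identifiers rather than plain strings.

```python
from collections import defaultdict

# Two hypothetical data sets, already expressed as triples.
permits = [
    ("permit-17", "zip", "78735"),
    ("permit-17", "type", "solar"),
    ("permit-18", "zip", "78701"),
]
schools = [
    ("school-3", "zip", "78735"),
    ("school-3", "enrollment", 640),
]


def link_on(predicate, *datasets):
    """Group subjects from any number of data sets by a shared value.

    Returns a mapping from each value of `predicate` to the set of
    subjects that carry it, keeping only values shared by two or more
    subjects (those are the links).
    """
    index = defaultdict(set)
    for triples in datasets:
        for subject, pred, obj in triples:
            if pred == predicate:
                index[obj].add(subject)
    return {
        value: subjects
        for value, subjects in index.items()
        if len(subjects) > 1
    }


links = link_on("zip", permits, schools)
print(links)
```

Here only zip code 78735 appears in both data sets, so it is the only value that produces a link.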

0:30:00 BH: You can also search for “zip code.” And what’ll happen with Data.World is it’ll get smarter and smarter over time to make that linking easier and easier, both within companies working in private, as well as outside of – they may still be within the company – but outside of the company working out in the open. Now, there’s lots of people that work in private at GitHub. But then, the majority of people at GitHub – in the community – work out in the open. And that’s dramatically advanced the IP development around the world, of companies, of non-profits, foundations, everybody has benefited from the open-source movement. The crazy thing that I just couldn’t even believe when we first started brainstorming this idea as Co-founders is there’s already more than 18 million open data sets in the world. Nobody’s brought those together. Think about the billions and billions of dollars that has been required to fund those open data sets.

0:31:07 BH: One of the highest profile examples in the US is data.gov, which is… I don’t even know how much taxpayer money has funded US Census and Commerce, and so many different initiatives to produce data to share with other agencies and with the public. And data.gov is a reflection of that, you can download lots and lots of data sets, and it’s super, super cool. But you look at the cumulative spend of that around the world. Foundation wants to alleviate poverty, they go out in the field, they collect tons and tons and tons of data. And then the next project comes along, and then someone asks a question about that poverty data set three years ago, and people are like, “Oh, I’m sorry, those people moved on. They’re getting their PhD now. I don’t know what’s in these fields. It wasn’t well-documented. Sorry.” And that’s just a shame. Really, to advance human knowledge and human civilization, you need a large public works project.

0:32:11 BH: We’re not gonna get to that ‘Star Trek’ future, where – if you’re a ‘Next Generation’ fan – where Captain Picard can just call up to the computer and the computer can answer any question, unless you have a large public works project that comes first, to say, “Let’s turn the data itself into something where we understand it. We understand the context of the data, so we understand what the fields mean. We understand how to link them together.” That’s what Data.World is. It’s what comes before AI. It’s what comes before machine learning. It’s the foundation that gets you to that future. Everybody’s chasing analytics and these amazing inventions that… We have industrial-grade analytics that make what we developed in the first version of Coremetrics look like nothing. But the underlying foundation is super shaky, at best. And that’s what we’re gonna solve. That’s what Semantic Web solves. That’s what Tim Berners-Lee was talking about, and I encourage your listeners to go watch his TED Talks on ted.com. Go search for “Tim Berners-Lee Linked Data.” Really listen to what he’s talking about. That’s the future that we’re gonna make a reality.

0:33:27 MH: Right. Because so much of machine learning and simulations, all requires all that data input.

0:33:32 BH: Yeah. It’s mostly a bunch of hocus-pocus today, just to be really brutally honest about it. There’s some really neat things. A computer’s beat one of the leading Go players in the world, there’s some really neat, kinda flashy objects that come out of it that make us go, “Wow.” But to really get to that future where AI can help us – truly help us – you need to first clean up that data. AI has to understand what’s in the data. It’s not gonna magically figure out what’s in the data. If we as human beings can’t even figure it out – because the PhD moved on to a new project – what hope do the machines have to magically figure it out? You’ve gotta document it and everything else. You gotta have hygiene. And by the way, people do that. They do that, as I said earlier, in neighboring cities and states. They’re doing the same… They’re spending the same 80% of time each, they just don’t know each other exists at all.

0:34:35 BH: This is what Vice President Biden was talking about with the Cancer Moonshot project. He lost a son to brain cancer. And what he was talking about is it’s ridiculous that, here we live in this networked age, and people are working in silos when it comes to cancer data. Well, that’s happening in every field. That’s not just cancer data. I would even argue that that’s web analytics data. One of the huge innovations that we made at Coremetrics – one of the most popular features we ever came out with – was the ability to benchmark each other in their specific vertical. So let’s say, you’re an electronics retailer using Coremetrics. You could benchmark against other electronics retailers using something called “Coremetrics LIVEmark.” And the reason that worked is because Coremetrics was a standardized database. It was really well-understood what the data meant inside of it. And it was a total game changer for the industry. It was every bit as good – and in some ways better – than the state of retailing online, which shop.org gets out every year. Because it was standardized, you knew that conversion rate for Best Buy meant the same thing as conversion rate for CompUSA. It was the same exact algorithm to get to that answer. Different data inputs – because they sold different things, and they marketed in different ways – but the same exact formula.
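
[Editor’s sketch: The point about “the same exact formula” with different inputs can be illustrated in a few lines of Python. The transcript does not say how Coremetrics defined conversion rate, so the definition here (orders divided by sessions) is an assumption, and the retailer names and numbers are made up.]

```python
from statistics import median


def conversion_rate(orders, sessions):
    """One shared definition, applied identically to every retailer.

    Assumed definition: orders / sessions. The source does not specify
    the actual formula Coremetrics used.
    """
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return orders / sessions


# Hypothetical (orders, sessions) pairs for an anonymized vertical.
retailers = {
    "retailer_a": (300, 10_000),
    "retailer_b": (450, 18_000),
    "retailer_c": (120, 3_000),
}


def benchmark(retailers):
    """Return each retailer's rate plus the vertical's median rate."""
    rates = {
        name: conversion_rate(orders, sessions)
        for name, (orders, sessions) in retailers.items()
    }
    return rates, median(rates.values())


rates, vertical_median = benchmark(retailers)
for name, rate in rates.items():
    position = "above" if rate > vertical_median else "at or below"
    print(f"{name}: {rate:.2%} ({position} the vertical median)")
```

[Because every retailer’s rate comes from the same function, the comparison against the median is meaningful; if each retailer computed its own variant of “conversion,” the benchmark would not be.]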

0:36:11 TW: So, where does… And I don’t think I knew that… Coremetrics was one of the platforms I worked with the least. ‘Cause Google kind of did the same thing and they turned it off – I think it’s still off – it came back on. And that, coming from a technology perspective, saying, “We can enable this,” and either… And quietly making the, “You have more to gain than to lose by putting yourself into this anonymized pool,” that “you’ll get more benefit from sharing it.” When it comes to the paranoia, from a corporate perspective, I feel like most companies, they… Look, there’s a huge financial institution that has a big place in Columbus and operations in Chicago and New York. And all of their employees, the ones I deal with, are these analytics ones who – and they’re not a client, but I know them locally – they’re not allowed to go speak anywhere about what it is they do because there’s the corporate fear that you’re gonna give away a trade secret, some big thing that we’re differentiating.

0:37:20 TW: Somehow, in the code world, open source, there are things getting built all the time that are pulling open-source platforms in and using them. But a part of me thinks that’s because the developers are quietly saying, “This is really efficient and we’re not giving anything away. And everything’s gonna be okay. And where a rising tide raises all ships.” Data somehow feels like it’s a little bit closer to where the paranoia of, “If we share, if we kinda go the open-source route, we will be helping our competition.” But it sounds like you guys feel like, “We’re gonna go up the same curve that open-source software went, where organizations will start to get comfortable with the benefit wildly outweighs the risk. And by the way, the data just doesn’t work that way. You could publish all your data, most people won’t care about a large portion of it. And some people won’t be able to replicate what you’ve done with it”?

0:38:25 JL: The majority of companies out there today that use open source, they don’t publish open source and that’s okay. But they’re using it, and they’re getting a huge amount of benefit from it. There’s a huge amount of open data out there, and some companies will open up their data, some won’t. But the fact remains that the fact that open source even exists has raised the level of quality and understanding of software development in immeasurable ways. I would posit that the majority of source code out there today remains closed. But the open source that has been released has been so valuable, things like Linux, the Apache Server projects, Java. So many of the backbones of what we build on today have become open. And you actually see that with a lot of data sets that are out there today, like the census data. Tons of businesses get built on the census data, but it takes a special kind of person to figure out how to use that effectively. And even using the census data effectively is like a competitive advantage for a lot of companies. We wanna be able to democratize that from the open data that’s out there. And then the other thing that’s really important about open source is that even inside companies, it’s really changed how people approach software development projects.

0:39:48 JL: And I think working with open data, combining it with private data, there’s a huge amount of leverage there, but also just the ability to change how inside companies people approach data projects in terms of reproducibility and knowledge captured, what’s been done with the data, that’s gonna be huge in and of itself. So, yeah, I think that the way that people work with data will become influenced by kind of the wisdom of the crowds techniques. It may take a different shape, but I do think it will follow the same curve. And again, I’m gonna point to a lot of those enlightened companies out there like your Airbnbs, and Netflix, and LinkedIns, that actively try to make more people inside their organizations understand the data and be data users. They’re not necessarily releasing the data, maybe they will for a contest or because they have a data set, like Uber just did with their traffic data. They have a really good philanthropic reason to release that. They believe that it’ll help advance society, and they’re not necessarily giving away part of their competitive advantage. They have a ton of data inside that they still use, but they’ve also gained a lot by working with a lot of public data. And we wanna make that a lot easier for a lot of people, and also make it easier to apply those same principles inside their organizations.

0:41:15 BH: Yeah, but in short, I would say that it is going to follow that same curve. And the world changes when you have an invention come along that changes the world. GitHub changed the world. There is an absolute explosion in open-source projects since the launch of GitHub. If you look at the data, it’s unbelievable how many more open-source projects there are than there were when GitHub was first launched. Jon was one of the first people – maybe the first person – that brought GitHub to Bazaarvoice. And we were initially very reticent to put anything on it. Put any of our private IP, start using the…

0:41:56 JL: Even in private, not even opening it up.

0:41:58 TW: Right. Just use it in private, right?

[chuckle]

0:42:00 BH: Just use it private. And then we became a big contributor to open source not that long after. So it was amazing how quickly that attitude shifted inside of Bazaarvoice’s walls. It was kind of stunning. And I think the same thing will happen with Data.World. But the really important thing to remember with Data.World is we’re gonna meet people where they’re at. There are people at Data.World today that work only in private. They have only private data sets. They have only private collaboration groups. We have no idea what they’re doing. And that’s fine. Because very purposefully, they’re in an environment where maybe they wanna work very privately on some kind of financial data because they’re trading against it, but they also are very interested in fantasy sports. And so, they start to work with open data and fantasy sports. That’s totally fine with us. That’s gonna be a seed planted, if you will, in their mind, of, “Hey, this is pretty cool. You know, we actually have this data that’s not that proprietary to us, that could be really useful for this humanitarian cause.” I’m sure there was a discussion like that that occurred within Uber when they came out with Uber Movement. They said, “You know, this data we have? Yeah. Some people could look at it as proprietary – but it’s not really that proprietary to us – but boy, it would really help cities with civic planning.” And that’s why they released it.

0:43:32 BH: One of the things that I wrote about – I’d be happy for you guys to link to this in your podcast – is I co-wrote an article on the coming corporate data philanthropy movement, and you can actually see it happening. There was a TED talk – it was actually featured on the ted.com homepage for an entire week – by Mallory Soldner of UPS, where she was giving example after example after example of this, and it’s very serendipitous. I was writing this article with Ian Greenleigh, and then we met Mallory through a mutual friend, and she’s like, “Hey, I’ve got a TED talk [chuckle] going live on ted.com next week. I’d love to include your article in my talk as a resource for people to read,” and “would that be okay?” And she didn’t know it was gonna be featured on the ted.com homepage for an entire week. And that’s a pretty high bar to make it on there.

0:44:32 BH: So, this corporate data philanthropy movement is going to be a big thing, but you know what? There’ll be some people that use Data.World in private for years. And there may be some people that never open up. That’s okay, because at least they’ll get the advantage of Data.World in terms of leveraging the open data to further their own projects, which at least makes their company run more efficiently. And they’ll be a part of a community – they’ll obviously be out there in private – but they’ll be part of a community, which I would guarantee you, over time, influences them, just like anybody gets influenced in a community. No matter what it is, if you look at the advancement of humanity in any cause you can think of – whether it’s the abolishment of all sorts of discrimination, or the advancement towards new social norms – it’s all been because people are in a community, operating together, seeing what each other is doing, that eventually moves the needle. And the needle needs to move in this direction. It’s ridiculous that we live in this networked world and data’s viewed as such a proprietary, closed thing. It’s stopping the advancement of so many different scientific and humanitarian causes. It just makes no sense. But we’ll meet people exactly where they are.

0:45:52 TW: But I like the way you’re describing it. That you used it a little bit, you’re primed, you get to where you’re using it maybe purely as a, “What’s in it for me,” purely in a closed world, but then it becomes just like Google Documents, or Gmail, or Slack, or whatever, and then you’re primed, and then it just kinda naturally… There’s for some decent percentage, all of a sudden, they make that little pivot where they’re contributing, or publishing, or working with, “I buy it. I’m excited.”

0:46:23 BH: Slack is a great example. Slack is a great example, we are totally addicted to Slack at Data.World. We use it all the time, 24/7. And if you had told me 10 years ago… When I first started Coremetrics – I started Coremetrics in 1999 – you know what my number one hurdle was to get over?

0:46:45 TW: No.

0:46:45 BH: There’s always something super…

0:46:45 TW: That was nine years before GitHub was founded, I have now looked up.

[chuckle]

0:46:50 BH: Okay, so 1999, what was the number one fear? Just take a guess. What do you think would be the number one fear people would have with something like Coremetrics?

0:47:00 TW: That it was… Data was in the Cloud?

0:47:00 MH: “They’re gonna take my data. They’re gonna figure out what I’ve done.”

0:47:03 BH: Exactly. You nailed it.

0:47:05 MH: Yeah.

0:47:05 BH: I had to – and I didn’t have that term, “software as a service” or “Cloud” – and so I had to do all this hand waving to people, saying, “Well, you know how mainframes work, and you have these dumb terminals? And we have all this security, and etcetera, etcetera, etcetera,” and then Walmart signed with Coremetrics.

[chuckle]

0:47:26 BH: And not only did Walmart sign, but Walmart did a press release about how they were using us. Well, the world changed, literally, with that moment. I could then go to every single retailer on the planet and say, “It’s good enough for Walmart.” And then, retailer after retailer started to sign with Coremetrics. And then we eventually convinced people to opt in, ’cause they’d opt in to use the benchmarking product. And we wouldn’t expose them – you’re right, that it was anonymized – but it was category-specific, and LIVEmark was a huge success. And IBM bought Coremetrics and uses it today to do their holiday benchmarking report, which comes out every holiday season. So, don’t underestimate how much a very driven team can change the world. Tesla is gonna completely change cars. Apple completely changed phones. We would still probably all be using Blackberrys today if Apple didn’t come out with the iPhone. I don’t know if Google would have led there. I think Google copied Apple, very frankly, with Android. And so the world changes when some people will it to change, and this is something that has to change. It’s so obvious how it would be good for the world.

0:48:46 BH: There are examples out there, like Mallory Soldner’s talk, where you can listen and learn. There’s people pleading, like the founder of the World Wide Web, saying, “Guys, you guys are all smart at TED! Please do linked data. It will change the world! I’m telling you, it’s gonna have a bigger impact than a web of linked documents. A web of linked data is gonna blow all that away!” And yet it doesn’t happen. NSA and Palantir, great. Facebook and Google, great. Everybody needs to have access to that same super powerful database technology, and again, that’s what we’re gonna do. But like you said and like we discussed, we’re gonna meet people where they are, bring them along for that journey, at their own pace. And I think, really change the world in a big way.

0:49:37 MH: So is Data.World now available for anyone to go use? Is it open to the public?

0:49:43 BH: Yes.

0:49:44 MH: Okay.

0:49:45 JL: Yes, it is absolutely open, it doesn’t require an invite. So, we’d love to have anybody working with data, we have some incredible communities that formed within Data.World that we would love to see people come and join up. One of the core things about Data.World is we really wanna put people at the center of this, because people hold the knowledge and information that make data powerful. But part of that is we want people to actually have profiles at Data.World, where as they work on projects, they can use Data.World to find other people that may have similar interests or might be subject matter experts in things that they’re actually interested in researching.

0:50:27 JL: Because we hold the data that they’re working on, and we know what they’re interested in, what their backgrounds are, we can link those people up much more effectively to move their projects forward faster. And we really think that Data.World is gonna become – and your profile on Data.World is gonna become – your calling card for the work that you do in data. So, we’d love to encourage everybody to come and claim their profiles as quickly as they can. Everybody wants to have twitter.com/Jon, I wasn’t fast enough to get there. But yeah, I think if you’re a data professional, we would love to have you in our community, being active, and really discovering all the amazing things that you can contribute to and use your skills to contribute to as well.

0:51:18 MH: Yeah, I know, and I think that’s great. Who else out there is doing this? There are other open data sets. Brett, you mentioned data.gov as sort of the government trying to be more open with their data. What does the landscape look like for you guys, and who else is out there contributing or has similar efforts, what else is going out in the world like this?

0:51:43 JL: Honestly – and being the Co-founder of a startup, it’s very easy to say this – but I really honestly don’t think that anybody has combined data knowledge management in a community together in a way that’s as effective as what we see going on in some of the amazing communities we have in Data.World today. I think a lot of that has to do with the fact that we are putting people in the center of the process. So, there is amazing work out there. Amazing work that cities have put in to opening their data, that governments have put in to opening their data. There are projects like the Harvard Dataverse, which aims to make a lot of their scientific research available to the masses.

0:52:28 JL: There are things like data.gov, which we’ve already mentioned. And different countries doing incredible jobs. My native Canada has an amazing open data initiative, as does the UK. And there’s some great open-source data portal projects out there, too. Things like CKAN that allow people to stand up these data portals. But without the collaborative aspects – without the community – that’s really, I think, what the magic of Data.World is, is that we have such a vibrant community of people doing really meaningful work. And bringing together people in that community, and helping advance the knowledge about the data that they’re working on is really central to what we do. And that’s something that you don’t see a lot of out there in the landscape.

0:53:14 BH: I would compare it to the early days of open source. You had a really passionate, small working group working around Linus Torvalds – who obviously invented Git and Linux – and they were a micro-community, if you will, and using Usenet and email, and all this stuff. We still use email a lot, but we don’t use Usenet that much. But I’m sure they were using Telnet and doing all types of chat, based on Telnet protocols and everything else. And that was a micro-community. And when GitHub came along, it brought all those micro-communities together. The UK has data.gov.uk. It’s hard to call it a “community” because it really doesn’t have social features. But it is a portal into UK open government data. And it’s amazing. It’s a really, really nice site.

0:54:16 JL: Yeah, they do linked data very, very well. They have bought in 100%. The metadata available there is insanely good, and they invest heavily in it.

0:54:28 BH: Right. But yeah, the… I guess a good analog here would be that there are all these separate little islands, and they each independently don’t really know about each other because they haven’t been linked together. There’s no kind of fiber optics, if you will, going from island to island. It’s linked to the data set so that they can do all types of joint analyses and benchmarking against each other, and really build community around it. And that’s something that’s really neat about Data.World. And they don’t look at us as competitive at all. As a matter of fact, they look at us as a partner. We partnered with the City of Austin, and City of San Diego, and data.gov. They’ve shared their best data sets on Data.World. And by “best,” I mean the ones that get the most number of views and downloads.

0:55:25 BH: One of the things that’s really fascinating to me about this space – given that this is an analytics-focused podcast, if you will – is that this space of open data is kind of where analytics was when I first started Coremetrics back in 1999. The main metrics people were using to measure success on their e-commerce site were page views and unique visitors. Believe it or not. Maybe you guys remember that. And Coremetrics came along and showed how to measure conversion. It showed you how to measure SEO – although I don’t know if we called it “SEO” back then, I think we called it “natural search” back then. Showed you how to measure a banner ad from DoubleClick. It showed you how to measure a shopping cart redesign. It showed you all of these things that you never had an insight into before. And it was phenomenal. The epiphany, the light bulbs, it set off.

0:56:30 BH: I remember going to some clients, and they were like, “Brett, you don’t understand how big this has been for us. We used to spend five hours every Monday debating what goes on our homepage because we think like a cataloger. And Coremetrics came along, and we found out only 5% of people ever even clicked through on the image that we spent five hours debating every Monday. We just stopped doing that entirely. ‘Cause it just doesn’t matter. It doesn’t move the needle at all. And we didn’t know. ‘Cause we didn’t have even the basic click-through information. We had no idea whether or not people are clicking through on that image and never buying anything.” And so it was a massive epiphany, and it brought me to this saying, where I started to tell everybody in the industry, “Analytics is liberating. It’ll liberate you from all that wasted time that you’re spending debating things that may not matter at all.” In the open data space, with over 18 million open data sets in the world, you know what the primary metrics used to measure success are? How many page views have you gotten, and how many people downloaded.

[chuckle]

0:57:42 BH: We’re back in 1999. Except this time around, there’s more money spent on producing these open data sets than could be imagined, and the intent and the nobility around it is so awesome. It’s so awesome that if you get a grant from the National Science Foundation, or you get a grant from the Bill and Melinda Gates Foundation, you are required for a period of seven years to share the data that you produced based on that grant. And why is that? They want there to be this multiplier effect, that everybody can take that data and build on it. But if it’s not well-understood how you do that… And frankly it’s not. As we’ve traveled around and talked with various people at universities getting these grants, they’re like, “Well, that means to me that I’ll email them an FTP link.” Or “That means to me that I’ll ask for their address and send them a USB stick.” And so there’s no community built around that.

0:58:45 BH: And if they’d put it on Data.World, guess what? You’d be able to see the queries people are running against that. Everybody would be able to see the queries. You’d be able to document it once and for all, and say, “This is the gold standard of my grant data set that I produced, and everybody can build derivative work.” You’d be able to have discussions around it. You’d be able to see how these different grants link together – these different grant data sets – link together. And forever, that would be documented and understood. And that’s the intent. That’s the multiplier effect. That’s what they’re getting after in the first place, that a lot of our partnerships have come from that place where they say, “I need to understand how people are using my data. I want that knowledge. I wanna understand how people are using my data, because I wanna be able to service them better.”

0:59:38 BH: That was an exact conversation that we had with the CEO and the CMO of the US Census, which is a massive organization. Billions of dollars of budget. And there’s companies that have been built on top of US census data. As Jon mentioned earlier – he didn’t name the companies – but like ancestry.com, zillow.com. There’s many companies that have been built on top of US census data. It’s created a huge amount of commerce and all types of insight. And it’s a highly linkable data set. It’s a gold standard, if you will. But those types of things have happened in silos. Like Ancestry’s benefited – and I would argue that people have benefited from using Ancestry – but imagine if all that was open. Imagine if all of that could link together. And that’s what we’re building.

1:00:32 MH: Yeah, I can begin to imagine that. And that’s what’s been sort of fun, is just exploring through some of the data that’s just starting to show up in Data.World. You can kind of see some really, really cool stuff starting to take off. So as you guys start to see this really ramp up and go to scale, it’s very exciting to kinda see where it will go. While I’m sure that, well, honestly we could probably keep going for another hour, but we do have to wrap up. One of the things we do on this show, we do a last call. We just go around and talk about something we found that’s interesting in the world of analytics or data. So I don’t know who wants to start, but what’s your last call?

[chuckle]

1:01:16 TW: I can start, ’cause I got a quick one. I’m gonna do a twofer. One is that we… UnSummit, our friend and former guest, David McBride, who runs UnSummit right before Adobe Summit every year, that is coming up on March 21st. I will not be attending, but Michael, I believe, you are on tap to do a little presenting?

1:01:36 MH: We will see. As of this moment – this recording – we don’t have it totally ironed out. But I am really excited to be part of UnSummit this year…

1:01:44 TW: Well, you heard it here first, so you can start asking awkward questions while he’s not there. So that’s March 21st in Vegas. [Editor’s note: This is not correct. It is on March 20th.] If you’re heading to Adobe Summit, the day before is always pretty cool. My actual – like official – last call is something that actually got published back in October of last year, and it was Tom Davenport wrote not too long of a piece called “The End of Analytics,” that I did not see until this year, ’cause it got reposted on LinkedIn. But he went to Dreamforce, and I think that the gist of the article was that he was running into people who were saying, “People wanna crawl, walk, and run with analytics, and we’re only crawling right now. But hey, with machine learning and AI, we can just skip the walk, and we can jump straight to just getting glorious insights.” And I was a little nervous that he was gonna say, “Yeah!” But he actually said, “Easy there, cowboy.” It’s a pretty short article, but I was kind of fist-pumping a little bit with what he said. We’ll throw that in our show notes.

1:02:46 BH: I was gonna jump right in if he said something different, if he actually said that would work.

[laughter]

1:02:51 TW: That’s the sense of what happens as you’re reading it, is he’s kind of building it up and it’s not that long of a piece. And I’m like, “Oh, my God, no!” And he was like, “Yeah, uh, no.”

1:03:04 MH: What about you, Brett and Jon, any last calls? Besides this cool, new data platform I’m sure you guys have heard of.

[laughter]

1:03:12 JL: I’ll actually give a plug to one of my favorite communities in Data.World. We were lucky enough to get involved pretty early on at the company here with a wonderful local data scientist named Jonathon Morgan, who does incredible work on a lot of open data projects, in addition to running his company. I don’t know where he finds energy and time for it, but he recently launched a new community called “Data For Democracy.” They’ve been using the Data.World platform a lot as a way of understanding the data that they’re collecting.

1:03:47 JL: But the whole mission of Data for Democracy – and Jonathon announced this just before the holiday break this year, and this community already has well over 300 subject matter experts, analysts, and data scientists working in it – that, in a world of fake news and feelings, they’re trying to ensure that we are bringing data to the table when making arguments in a lot of our civic discussions. And these guys, the momentum is unbelievable in this community. I couldn’t be more excited. I am humbled that they have decided to use Data.World, even in a small way. Just that little bit of use to feel like we can contribute to a movement like that is absolutely incredible. And for any of you – your listeners – that wanna get involved, check out Data for Democracy, just Google for it. It’s pretty easily findable at this point, and they’re doing insane work.

1:04:55 MH: Oh, that’s outstanding. Alright, what about you, Brett?

1:04:58 BH: Yeah, so my last call… First of all, it is really cool to see what’s happening in the Data.World community, I’m like a kid in the candy store when I’m watching it. And I’m watching like, classrooms spin up, and I’m watching different communities – like Data for Democracy – spin up. It’s really working, and that’s a beautiful thing. But really, my last call would be around Certified B Corps or public benefit corporations. I think this is a big trend. I think the world of open data is a really big trend. Obviously, that’s what the company does. I think it’s gonna be really normal – 10 to 15 years from now – that companies get started up this way, where mission is literally at the center, legally, in the company’s actual formation. And that you publicly have a requirement to report how you’re fulfilling a public service, how you’re addressing a public need. And that’s something that makes me really proud. That conversation among us as Co-founders when we’re deciding to become a public benefit corporation, that conversation took all of like five minutes.

1:06:15 BH: And it was just such a cool trace-back to our very early brainstorms – where we’ve all been successful entrepreneurs, we’ve all been in multiple startups – where we said, “Let’s do something that really makes a difference in the world.” And a lot of people think that a “B Corp” means a non-profit or something like that. Warby Parker is a B Corp. Ben and Jerry’s is a B Corp. Patagonia is a B Corp. I think anybody would love to be a [chuckle] shareholder in those companies. You can build an incredible business with mission and soul at its core. And mark my words, this is gonna be a very normal thing, where people are gonna ask you, “Why are you not a public benefit corporation?” Whereas right now, primarily, people ask me why we are. ‘Cause they just don’t understand it. It’s too new for them to have heard of, and there’s very much a follow the herd mentality when it comes to entrepreneurship, or VCs, where you set up as a Delaware C Corp, and that’s just what you do, and everybody’s done that before. And that’s a real miss. ‘Cause public benefit corporation has all the same legal advantages and tax advantages and everything else of a C Corp, but with the added bonus of mission really mattering.

1:07:38 TW: That’s cool. I’ve heard of B Corps, and I knew Patagonia was one. But I hadn’t really stopped and thought about ’em.

1:07:43 MH: Yeah, I know that is… I completely agree, and I agree there’s more and more of them. And honestly, I feel like it’s an appealing course to take for our time. I feel like it’s an idea whose time is…

1:07:56 BH: I got a great FAQ on what they are, if your listeners are interested in. I’ll send you the link.

1:08:02 TW: Sure, we’ll get that on the show notes. Yeah. What’s yours… What’s your last…

1:08:06 MH: In the spirit of open data and open platforms, my last call is actually something that’s also just been opened, and it’s really interesting, ’cause you don’t often see this happen. So, Intuit had built their own A/B testing capability technology, and recently open-sourced that to the rest of the world to give everyone who wanted to use their tool access to deploy it and use it in whatever ways they saw fit. So I thought that was very interesting in sort of a, “Hey, here’s an A/B testing tool. We’ve kinda kicked the tires on it with our organization. And now, if anybody else wants to use it, you go to town.” And I think that’s just a very interesting thing to see happen in the analytics world. We don’t often see this. So I thought that was…

1:08:52 BH: Oh, cool. I didn’t know that.

1:08:54 TW: What’s it called, Michael?

1:08:56 MH: It’s called the “Wasabi.”

1:08:57 TW: That’s ’cause it’s hot! I’ve been waiting for that. Come on!

1:09:03 BH: Come on. That’s really neat.

1:09:06 MH: It is neat, and knowing some of the folks, like Dylan Lewis – who’s been on the show – there’s some pretty smart analytics folks that have been probably instrumental in advising and seeing this tool be built, so, it’s fascinating from that perspective. I think it’s probably a pretty solid tool. Anyway, if you’ve been listening, you have probably noticed that Tim and I didn’t do a nearly adequate job asking all the questions that everyone was thinking of with Brett and Jon. So we would love to hear from you about this show. You can reach us easily on the Measure Slack, on Facebook, on Twitter. And I’m sure Data.World is also active in social media, so I’m sure you can reach out to them that way as well. Jon or Brett, do you guys have a Twitter account that people could easily get to?

1:09:57 JL: Yep, it’s DataDotWorld with “dot” spelled out. So, D-A-T-A-D-O-T-W-O-R-L-D.

1:10:07 TW: Ooh, that puts a hurt on some 140 characters.

1:10:08 MH: Perfect.

1:10:11 TW: It’s almost as bad as “Analytics Demystified.” I’m glad we don’t have a Twitter account.

[laughter]

1:10:15 BH: My Twitter is just DataBrett.

1:10:18 MH: DataBrett? So, that’s perfect and…

1:10:20 TW: With two Ts?

1:10:21 BH: Right. Two Ts.

1:10:23 MH: We’d love to hear from you. This is a very fascinating thing, and I think it is an idea whose time has come. Brett and Jon, thank you so much for being on the podcast. It’s been a pleasure having you. With the work you’re doing, I clearly see what you’re doing here, and it’s very exciting. And we’re excited to see where you guys take it from here, and I’ll be following. And not because I know anything about data, but because I really care about data, and where it’s going. Anyway…

1:10:53 BH: Well, our goal is to make Data.World a platform where even if you just care about data, there’s a use. Maybe one day, you’re coming there and doing nothing else but sharing a cool visualization that someone has done on something that you do care about. And then everybody else in the world will have access to the underlying data to do more with it.

1:11:18 MH: I have already shared a Data.World data set during the show.

1:11:22 BH: Very cool.

1:11:23 JL: Wow, thank you. [chuckle] And thank you for having us, guys. This has been really incredible. Thank you. Yeah.

1:11:29 MH: So that’s great. Well, for my co-host, Tim Wilson, keep analyzing.

[music]

1:11:38 Announcer: Thanks for listening, and don’t forget to join the conversation on Facebook, Twitter, or the Measure Slack group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour, or @analyticshour on Twitter.

[music]
[background conversation]

