Did you hear the one about how the AI eliminated cancer? It just wiped out the human race! As machine learning and artificial intelligence are woven more and more into the fabric of our daily lives, we are increasingly seeing that decisions based purely on code require a lot of care to ensure that the code truly behaves as we would like it to. As one high profile example after another demonstrates, this is a tricky challenge. On this episode, Finn Lattimore from Gradient Institute joined the gang to discuss the different dimensions of the challenge!
- Courts are Using AI to Sentence Criminals. That Must Stop Now
- Matt Gershoff
- YOW! Data Conference
- Type I and Type II Errors
- Confusion Matrix
- The EU releases guidelines to encourage ethical AI development
- (Python) scikit-learn
- ODSC – 25 Excellent Machine Learning Open Data Sets
- Mark Edmondson’s compilation of all of his past presentations
- NIPS 2017 Presentation – Fairness in Machine Learning
- Gradient Institute blog
- As concern over deepfakes shifts to politics, detection software tries to keep up
- Darin Young – Mixt Media
- Josh Crowhurst
00:04 Announcer: Welcome to The Digital Analytics Power Hour. Tim, Michael, Moe and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website analyticshour.io. And now, The Digital Analytics Power Hour.
00:27 Michael Helbling: Hi, everyone, welcome to The Digital Analytics Power Hour. This is Episode 118. Imagine for a moment that you and your sibling are being chased by a robot sent from the future to stop a human uprising against the machines. Finally, by some chain of events, you are both cornered but the robot only has time to shoot one of you. Which one should it shoot? Okay, that’s more likely something out of a movie script, but what an ethical dilemma, eh? Even though that scenario is unlikely, and evil robots from the future will probably never grace this timeline, we do have to grapple with the choices we make as we make computers smarter and smarter. I think as Pedro Domingos said, “People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid, and they’ve already taken over the world.” I don’t even know what that means, but that’s the level of this conversation. We’re gonna be talking about ethics in AI. Hey Tim, are you excited to talk about it?
01:31 Tim Wilson: I am. I’m intrigued to think about how and where this conversation may go.
01:36 MH: Yeah, me too. Hey, Moe. Knowing that we might not actually discuss anything to do with BigQuery on this episode, are you still game?
01:43 Moe Kiss: Oh my God, I’m so pumped. This, it’s like…
01:45 MK: This is my bread and butter. I love this stuff.
01:48 MH: I’m super excited, too. Okay, well, none of us knows anything much at all about AI ethics, so we needed a guest, somebody who can shine a light on what is this murky topic. Finn Lattimore is a data scientist at The Gradient Institute, and she has also held data science roles at Ambiata and other places. She has her PhD in Computer Science from the Australian National University, and today, she’s our guest. Welcome to the show, Finn.
02:15 Finn Lattimore: Hey, everyone. It’s a pleasure to be here.
02:16 MH: All right. Well, the first and most important question is what are we even talking about? AI or ethical AI? I think we need to just do a quick breakdown of those two words, maybe just to set the table for this.
02:32 FL: This turns out to be a super hard question. Even defining what AI is, let alone defining what ethical AI is really challenging. In general, you could say that AI is an algorithm that learns from data, so the behavior of the algorithm adapts depending on the data that you feed into it. This is the broadest overall definition of artificial intelligence. But that in itself can encompass a huge range of things. Anything from the really sophisticated stuff that big companies like Facebook and Google are doing to personalise what you see in your search results, or what you see in your Facebook feed, to a little macro someone has in an Excel spreadsheet, to a sensor in a smart toaster. And when you start to talk about the ethics of these things, it really doesn’t make sense to treat all of those things the same way.
03:24 FL: Perhaps the best way to think of this is, AI is a very general technology the same way that steel is a general technology. You could have a discussion about the ethics of steel, there are certainly bad things you can do with steel. You can make sharp knives, you can make guns, but there are also… You can build bridges with steel. You really need to think about what is new about AI, what are the places in which we should be concerned about AI. Where could it do real harm, where could it do real good, and how can we leverage the huge power that some of the sophisticated algorithms we’re starting to see, at the same time as protecting ourselves from the worst case scenarios.
04:04 TW: Is the challenge that it’s not a James Bond evil doer who’s necessarily developing unethical AI… That’s presumably a concern as well, but it’s the fact that… I think even like your steel analogy, that you can make a knife with the intention of slicing tomatoes very effectively, but that knife can also be used as a murder… Could be sold with the intent of use, but then in actuality could be used for stabbing someone. And is that, from doing a little bit of research on the topic, it seems like that’s the challenge. There’s generally a good intent, but then there’s unintended consequences, bias that’s built that inadvertently creeps in and does things where we say, “Oops, didn’t really see that coming.” Is that kind of the root?
04:54 FL: Yeah, absolutely, I’d say that the biggest cases that we’re seeing where we’ve found real problems with the way that algorithms are behaving, were not intentional. Certainly, you can use algorithms to deliberately manipulate things. There are concerns around things like deepfakes, which allow you to take a video of a person and make it sound like they’re saying something that they never said. There are certainly concerns around false news and malicious use of these types of technologies. But many of the other cases that we’ve seen around algorithmic decision-making around bail decisions or cases of facial recognition systems not working well for all groups, they really are inadvertent. The designers didn’t intend that to occur, but AI is really complicated. And because you have both the algorithm plus the data and you don’t necessarily understand fully how these things are going to interact until you deploy it, it’s very easy for unintended consequences to occur.
05:54 MK: At the moment, are most of these errors getting picked up just by random finding them? Is there actually a structure in place to help companies discover these problems or is it just someone stumbles on it?
06:11 FL: At the moment, it is more someone stumbles on it or some external person will do some analysis or some testing of an AI system and say, “Look, here’s this. Did you realise this was the case?” And then go back to the company and say, “Look, you need to fix this.” Companies are working towards trying to construct processes to catch some of these issues, but it’s very much in its infancy and still an ongoing research area as to how you design AI systems to be more robust and to avoid these kind of problems before you deploy them.
06:41 TW: So you glossed over… You referenced in passing, a bail decision. Is that one you can talk through? I feel like that’s one that crops up reasonably often and is an example of where unintended consequences happen. Is that one…
06:56 FL: There’s the infamous COMPAS case which was an algorithm used to score people for risk of further re-offending. And it wasn’t actually making the decisions itself, but it was being provided to judges as something that they might use in their decision making. And it turned out to be biased against African-Americans in a particular way. Basically, of the group of African-Americans who did not go on to re-offend, they were more likely to be labeled high risk than Caucasians who did not go on to re-offend. And that clearly raises big concerns because we’re talking about really life-changing decisions as to whether or not you get bail or if you’re using this thing for parole, whether or not you get parole.
07:44 MK: And then whether or not you even get bail can also then determine whether or not you re-offend. Yeah, it’s terrifying.
07:53 FL: Oh absolutely. There’s all kinds of potential reinforcing consequences of this kind of error.
08:02 TW: It would actually make it look like it was working even because it was actually introducing a reinforcing feedback loop, potentially.
08:10 FL: This can happen. And this issue has also been raised in predictive policing systems whereby an algorithm essentially identifies hot spots and says, “Look you should target police in this area, because there’s been more crime there.” But when you send more police to a given area, then they’re also more likely to find crimes occurring, minor crimes. And so that will then make it more likely that that same area will be targeted again in the future and so on, whereas another area which might also have had minor crime going on, if those crimes aren’t serious enough that somebody actually calls them in, then they’ll never appear in the data in the first place. And so you certainly can get these self-reinforcing loops.
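The self-reinforcing loop described here can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real policing system: the two areas, the 70/30 patrol split, the 10% baseline reporting rate, and the function name `simulate_hotspot_policing` are all assumptions chosen for illustration. Both areas have the same true crime level; the only difference is one extra historical record in the first area.

```python
def simulate_hotspot_policing(periods=10, true_crimes=100.0,
                              initial_records=(11.0, 10.0)):
    """Two areas with identical true crime; patrols follow recorded crime."""
    records = list(initial_records)
    history = []
    for _ in range(periods):
        # The "hot spot" is whichever area has more recorded crime so far.
        hot = 0 if records[0] >= records[1] else 1
        patrol_share = [0.3, 0.3]
        patrol_share[hot] = 0.7  # assumed: the hot spot gets most patrols
        for area in (0, 1):
            # Assumed: 10% of crimes are reported by the public regardless,
            # and patrols discover a further share proportional to presence.
            detection_rate = 0.10 + 0.5 * patrol_share[area]
            records[area] += true_crimes * detection_rate
        history.append(tuple(records))
    return history


history = simulate_hotspot_policing()
gap = history[-1][0] - history[-1][1]
print(f"Recorded crime after 10 periods: {history[-1]}, gap: {gap:.1f}")
```

Under these assumed numbers the first area is flagged as the hot spot in every period, so its lead in recorded crime grows from 1 to roughly 201 after ten periods, even though the underlying crime is identical in both areas.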
08:51 MH: Yeah, so that example is one that makes sense in that it’s a projection of some inherent bias on the… That probably came from the people who made that algorithm in the first place.
09:04 TW: Well, or it’s there just… It’s reflecting systemic…
09:08 MH: Yeah, or systemic bias that exists of the system, yeah.
09:09 TW: Bias that already existed within that system.
09:13 MH: Matt Gershoff, who is a great friend of the show, and has been on a few times and everything, made a comment one time about how machine learning is basically just a force multiplier. But if you’re doing something stupid, it’ll just make stupid bigger. And so this feels the same to me. If you’re already inherently a little bit biased, it’ll just make that bias more pronounced. Am I getting that right or is there more to it than that? Is it not that straightforward?
09:36 FL: So the COMPAS case actually illustrates a really interesting challenge to do with what can happen if there are biases in society the way it’s set up at the moment or statistical differences between groups. I explained before how it was biased against African-Americans in a particular way. The company that developed this algorithm then defended it by pointing out that a label of high-risk in the system did mean the same for everyone. All those who got classified high risk, the recidivism rate was the same for African-Americans and Caucasians. And it turns out that you can’t have fairness in both of these ways simultaneously, if there’s a difference in the baseline rate of which people offend. And so, if, due to all kinds of historical disadvantage, all kinds of things that happened before somebody reaches the criminal justice system: Poorer education, poorer opportunities earlier on, you have a situation where the re-offense rate is different for a minority group to another group. Then you’re going to have to accept that one of these two ideas of fairness will be violated to some degree.
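The tension between the two fairness ideas can be checked with a few lines of arithmetic. The sketch below uses hypothetical numbers, not COMPAS data. It fixes the share of high-risk labels that turn out correct (`ppv`) to be the same for both groups, and, as an extra simplifying assumption, also fixes the share of re-offenders who get flagged (`tpr`). The function name `false_positive_rate` and the base rates of 0.5 and 0.3 are illustrative choices.

```python
def false_positive_rate(base_rate, ppv, tpr):
    """False positive rate implied by base rate, precision and recall.

    Per person in the group:
      true positives  = tpr * base_rate
      false positives = true positives * (1 - ppv) / ppv
      FPR             = false positives / (1 - base_rate)
    """
    true_pos = tpr * base_rate
    false_pos = true_pos * (1 - ppv) / ppv
    return false_pos / (1 - base_rate)


# Hypothetical values: "high risk" means the same thing in both groups.
ppv, tpr = 0.6, 0.7
fpr_group_a = false_positive_rate(0.5, ppv, tpr)  # assumed base rate 50%
fpr_group_b = false_positive_rate(0.3, ppv, tpr)  # assumed base rate 30%
print(f"FPR group A: {fpr_group_a:.2f}, FPR group B: {fpr_group_b:.2f}")
```

With these assumed inputs, the group with the higher base rate ends up with a false positive rate of about 0.47 against 0.20 for the other group: people who do not go on to re-offend are labeled high risk more often in one group, even though the label is equally reliable in both.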
10:50 MK: So what did they do? How did they… They just defended it and that was it?
10:54 FL: Well so I don’t… I think the COMPAS debate is ongoing. Some places are not using it, other places are using it. But the interesting thing about the conundrum is that it occurs regardless of whether you actually make the decision with an algorithm or with a person. When you look at the outcome statistics, one of these two things will be violated even if you just have a judge with no algorithm making this call. One of these two metrics of fairness will be violated unless you can remove the difference with the people coming into the system. If you wanna fix the problem, you have to fix it in terms of the differences in the groups before they reach the criminal justice system. Once you arrive at looking at adults with… And there’s a difference in the baseline rates between two groups, you can’t be fair anymore because you’re already in an unfair system.
11:46 TW: But that was one of the fundamental whiffs. There are times where it’s like, “Oh, the data is objective.” In God we trust, all others bring data. I think that part of what that has surfaced is this idea that the data is objective, the data is just facts. So therefore, if we feed it into the AI, then that will inherently help the judges because it is an objective measure. I think that’s how a lot of these wind up is that, it’s looking for a shortcut around cultural or societal or historical challenges and issues thinking, “Now that we’re doing this with data, we’re fixing it.” And I think the COMPAS case has raised up that, well, no. That AI is still working with the data that was historically generated. It was working with any biases that the creators might have had. It’s not just, “Oh, now we’re using an algorithm, now we’re objective.” Right?
12:46 FL: That’s absolutely true. People would tend to go one of two ways, when thinking about AI. Either to say, “This is the magical computer system based on objective data, therefore everything that it says must be correct and we should accept it whatever it says.” And the other extreme is to say, “I don’t trust computers. We should never ever have computers making important decisions in people’s lives.” And both of those extremes really don’t help us solve these kinds of problems.
13:14 TW: Well, and the distrusting ones because models are always inherently probabilistic. If you even have a very good model, you can still always go find anecdotes where, “This said low risk of recidivism but this white person re-offended, therefore the model is bad.” I think there’s a… I think I saw somewhere a… I can’t remember what kind of algorithmic literacy or something was… This recognition or understanding that you can’t judge the system by saying, “Let me find anecdotes where the model was wrong.” The model is always gonna be wrong in places. If you have a perfect model, something ain’t right. But it’s easy to say, “Look, the model got this case wrong, this is terrible.” That’s not really the way to evaluate whether there’s a bias or a quandary. That may be a surface, a clue that maybe there is, but then don’t you have to go with a more methodical means of assessing. Are the error rates different amongst groups that we’re concerned about that sort of thing?
14:18 FL: Absolutely. No, in these kinds of complicated decision-making problems, nobody knows the perfect answer for every individual. There is a tendency to compare models with somebody who got it right every single time, but the reality is that judges don’t do that either. Judges release people on parole who then go on to re-offend. There just isn’t enough information available to know for sure whether a particular person will or will not go on to re-offend. Whether or not we truly have free will, we certainly can’t predict human behavior perfectly 10 years in advance.
14:51 TW: Judges pass harsher sentences right before lunch, when they’re hungry, than right after lunch. There was this study that showed that, so there are definitely the human foibles even if they’re trying to do the best job they can.
15:03 MK: Finn, I saw you present at YOW! Data Conference and one of the examples that you talked through that I thought was really interesting was about loans, let’s say, loans from a bank. And in particular, I guess, that there were different perspectives from different stakeholders. Different stakeholders, in that example, I guess, have different outcomes that they’re seeking. Can you talk us through that a little bit?
15:28 FL: Absolutely. This comes back again to saying there are multiple different things that you might care about in terms of a decision-making system. And different people will put different weights on these different factors. In regards to COMPAS, for example, there are two different fairness metrics that we can’t satisfy simultaneously. There are also other things you might… You care about the number of people who go on to re-offend in the community. Somebody out in the community might put a high weight on making sure that people who are going to re-offend are not released. That’s one stakeholder: the people in the community who don’t wanna see further crime in their area.
16:06 FL: Another stakeholder will be, that’s different people in that same community, who want their partners or children to be released if they have not committed a crime. And same for those individuals. Then you have the government who has a desire to ensure that jailing is cost-effective. They don’t want to put people in jail who won’t re-offend, but they don’t want to release people who will. And then you might have a community as a whole, who doesn’t want to be misrepresented, in terms of the proportion of people jailed and further reinforce stigma against that group. There’s a whole lot of different people who will want different things that are often in conflict with one another. And when you’re designing an algorithm, or for that matter, any decision making process, it’s important that you try and consider as many of these things as possible, so that you don’t end up just optimising really hard for one of them and as a consequence, really disregarding everything else.
17:04 MK: So in your work as a data scientist, is that part of the process that you have to go through when you are building a new model for some particular piece of work, really understanding those different perspectives? Is that essential in your process?
17:22 FL: Yeah, absolutely, and this is particularly important when it comes to AI or algorithmic decision-making, and the reason is basically, the way that AI works is it optimises for the thing you ask for. We sometimes describe it as like a genie, you ask the genie for something and it’ll give you exactly what you asked for. But as all of the stories with genies in them go, you need to be really precise about what it is you’re asking for, or what you actually get is not what you genuinely wanted. And AI is the same, you tend to, whenever you’re training a machine learning algorithm, you take some data but you also encode what you want in terms of the loss function you use and the label that you choose to optimise towards. And if you pick one thing, like profitability for a company for example, and optimise really heavily for that, the algorithm will ignore anything that’s not explicitly encoded in it, so it will ignore social norms, and so on. It’ll heavily optimise for profit. And then, if you’re not careful, you’ll find out, that it’s detected that all the people in vulnerable circumstances are less likely to check the interest rate they’re being paid is reasonable on their accounts, and so you drop that interest rate way down. And they never contact the bank. And so you increase your profits. And no human has deliberately decided to do that. The algorithm has just figured that out and done it.
18:48 TW: But do companies in general, and I think I actually read this on a blog at the Gradient Institute’s site, was making the point that you have to, when it comes to who determines what’s ethical, that that’s actually a thorny question, that if you’ve got a publicly traded company, and this is now me projecting on top of what I read, but if it’s, you’re just driven to maximise shareholder value, that’s the simple thing that gets floated out and then you wind up in these thorny areas, because sometimes you have to do things that maybe harm a group or aren’t necessarily best for society. All of a sudden, you wind up in a political or cultural debate and you haven’t even gotten to the AI ’cause you… Part of that post was saying that, “Hey, you actually are gonna have to build these rules, which means you’re gonna have to be very precise about what guardrails you’re building,” which means, to your point of the genie, you have to actually have the discussion. I don’t really know how companies do that now. How do companies determine that they’re just generally behaving ethically? They need to make sure they’re not breaking the law but does that become part of the discussion?
20:12 FL: It certainly does and it’s obviously a very difficult issue. And as we said, we hear many cases of companies not behaving terribly ethically and then there’s further regulation, and further discussions. I guess one thing about using AI is that you lose some of the safeguards you have when you have individual people making decisions. If you’re imagining that you’re the same bank and these decisions about which loans to give out to people are being made by individuals, then even if you’ve optimised or incentivised those individuals, to bring in profit to the bank, they will hopefully have some kind of ethical guidelines in their head so that when somebody rings up, and they’re clearly struggling to understand what you’re telling them, you may decide not to try and sell them on that high interest credit card after all. This is quite imperfect. We know from various banking scandals that if you incentivise people…
21:09 TW: Hello. Wells Fargo.
21:12 FL: If you incentivise people badly, then they will break what would be widely considered strong ethical norms, but AI will be entirely willing to break them because it is not aware of those norms in the first place.
21:25 MK: I guess for me, the thing that I constantly come back to is whose responsibility is this because I feel at the moment that analysts and data scientists are bearing a lot of responsibility for what I think should perhaps be coming from a different place of leadership. For example, the company team, the leadership team or their values or… I feel like it’s on our shoulders at the moment, and I’m not sure it should be, or government or I don’t know, who do you feel is wearing responsibility?
21:58 FL: You’re absolutely right, that individual data scientists on the ground and analysing the data should not be responsible for making these ethical trade-off decisions. What you want to do ideally, is engage as many people as possible in that initial stage of thinking, “Okay, what is it that matters? What are the possible outcomes of this system that we care about? What could be the bad things, how would we write those?” You then design a system, you run it forwards and say, “Look, here is what the outcomes of this system would be under these various different levers that we could trade-off against one another.” You have to then find a way to communicate that to the senior decision-making people within your organization. If you’re a government body, then ultimately that will be the minister, but if you’re a company then this will be the CEO, or people senior reporting to him and say, “Look, we could trade off this against that. How do you think it should go? Have we missed anything? Are there are any other issues?” And then you go back and you iterate the whole process again.
22:58 MH: Yeah, and Article 22 of GDPR actually addresses this. Basically, your data rights are such that you are allowed to say, “I do not wanna be included in any automated decision-making about me from either a business or legal perspective.” There are some of the things that are emerging in regulatory structures today, and so that’s where I think maybe this is one of those ones where everybody’s like, “Well then how do we do neural networks on anything, if we have to also then keep in mind consent and a description of how all those choices were made and those kinds of things?” It’s maybe at least a brake on some of those things for the time being. But they obviously saw that that was a challenge and so they’re trying to address it. Whether we like GDPR or not, is a different issue.
23:51 TW: But Moe, back to your… I wonder if another way to look at it is, is an undue burden being put on the analyst to look out for this? Maybe, maybe not. How many times as analysts do we find ourselves enabling setting KPIs? That no, our business stakeholders should know that, but the analysts can raise, then I think Finn, the way you just outlined it heads in that direction. If the analyst or the data scientist is saying, “Okay. I’m working on this, but hey, I need to run through this checklist. We need to make sure we’ve got the right guardrails in place.” Starts asking the questions and saying, “Look, to do this, I need to figure out what business rules are there, which can then open the conversation to the more business-oriented discussion. What are our constraints, what are our guidelines, what do we need to protect for?” As you were talking it, I was starting to see that parallel that it’s just like, since time immemorial, analysts sometimes wind up driving discussions around the alignment, “What are we actually trying to get this campaign to do and how are we gonna measure it?” This has, I think much, much more serious broad-reaching ramifications, but I don’t know. I could see the people who have to actually account for it could actually be at least sparking the discussion and elevating awareness that it actually does matter or we could wind up on the front page of the New York Times with egg on our face if we didn’t think about it.
25:23 FL: The other thing I’d say in terms of Tim’s earlier comments about will companies actually trade off any profits against these kind of other objectives. When you actually expose these trade-offs in practice, it’s very frequently the case that your profitability region is really quite flat and some of the worst violations of ethical norms will actually have very marginal impact on the profitability. You might be willing to sacrifice 0.1 of a percent of your profit in order to avoid really exploiting an extremely already marginalised community. And these kind of trade offs, usually company executives were pretty happy to make because indeed, you’ve got a tiny amount of profit being traded off against the risk of appearing on the headlines of some major newspaper, even if you are solely concerned about profits for shareholders, that’s a fairly easy sell to make. But if you’re not aware of that trade-off, then you can just inadvertently end up ignoring it.
26:23 MK: I think the thing that I really personally struggle with and yeah, I don’t think I’ve mentioned this before, but when I left government and moved into working for private companies, the thing that I have most struggled with as an analyst, is that in government, I felt that when you were communicating to people, they wanted to know what you were uncertain about, and they were gonna make the decision. Ultimately, that was their responsibility. What I find in the private companies is often, stakeholders want you to tell them what to do. They don’t want you to have any uncertainty, they want a “This is why we should do this.” And I think that, yeah, maybe at the CEO level, they start to think about front-of-the-newspaper type situations, but I think largely, a lot of the problem is that people don’t want uncertainty from our analysts and our data scientists, and that this is a really tough conversation to have.
27:20 FL: That’s totally true. I have given presentations and been asked to “Remove the error bars, please.”
27:28 TW: Uh-oh, yeah. That’s a whole other topic of… Again, I think marketing, yeah, definitely, they get frustrated with the statisticians. “What do you mean? Just tell me the answer, what do you mean?”
27:43 MH: Well, and some of that isn’t always that person’s fault, because we’ve had a line of thinking in our world, that data and all these things will provide answers. And so, we’ve gotta start to shift it back to the actual way this should be working, which is, data is useful to inform and provide direction for what you’re choosing to do, but it’s not necessarily gonna just magically show you the way. And I feel like a lot of business people I’ve observed are sort of just sitting back, waiting for someone from the data department to come tell them what to do next. And that’s really bad, because there’s no magical thing happening with data that suddenly predicts what you should do as a company or all those things. This is the whole… There’s lots of challenges with that. I don’t think… I like your example, Moe, because I think it does highlight two different worlds a little bit, but I don’t think it comes from malice as much as it comes from incompetence.
28:48 FL: We may also be able to make progress by finding better ways of eliciting the weightings or importance that people place on different outcomes from them. The reason that we fundamentally would like to show uncertainty bars, is that they’re relevant when you’re deciding what action you’re gonna take in response to the data analysis, and how you use them depends on how sensitive you are to risk according to certain things. Now, if we could find better ways of getting people who don’t have a really strong understanding of statistics to indicate how concerned they are about different levels of risk, by ranking different outcomes or some other means of getting that information from them, then we can not have to show them uncertainty bars, instead we’ll say, “Well look, based on how you have ranked these various different outcomes in terms of loss, then we could say that this, on the basis of the data we have, is the best prediction.”
29:45 FL: I still think we need to communicate that there is uncertainty with data. I hate saying, “Run this model, it will make $6 million for your company.” When the reality is it will make between -1 and 20 million.
30:00 FL: People really have no idea. It reinforces this idea that data is a magical oracle of knowledge, but it is also unreasonable to expect to just hand over a chart with uncertainty bars and everything, and say, “Right, now you make the decision,” to someone without the background to use that.
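One common way to get a range rather than a single number is a percentile bootstrap. The sketch below is illustrative only: the uplift figures are made up, and `bootstrap_interval` is a hypothetical helper name rather than anything referenced in the conversation. It resamples the observed values with replacement many times and reports the spread of the resampled means.

```python
import random
import statistics


def bootstrap_interval(samples, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lower_index = round((1 - level) / 2 * n_resamples)
    upper_index = round((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical uplift estimates in $ millions from past model runs.
uplift_millions = [-1.0, 2.5, 6.0, 20.0, 4.0, 8.5, -0.5, 7.0]
point_estimate = statistics.fmean(uplift_millions)
low, high = bootstrap_interval(uplift_millions)
print(f"Estimate {point_estimate:.1f}M, 90% interval {low:.1f}M to {high:.1f}M")
```

Reporting the interval alongside the point estimate keeps the uncertainty visible without requiring the audience to read error bars off a chart.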
30:15 TW: Well, but under the context of the ethics particularly, there’s also the Type 1 versus Type 2. If you take our bail example or the bank loan, what’s more costly? A classic confusion matrix, cost-benefit matrix calculation, is it really, really, really, really bad to have a false positive? But a false negative is maybe a little bit of lost value, but no one’s gonna get super crazy, or is it actually flipped? Which I think goes back to… That’s a little bit nuanced. People were expecting… How accurate is the model? It’s like, “Well, I built a model that is 97% accurate, able to predict who will convert on this website with a 3% conversion rate: I’m gonna predict nobody is gonna convert.” The EU’s Ethics Guidelines for Trustworthy AI, is that something that…
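The 97% example and the cost-benefit framing can be made concrete with a small calculation. This is a sketch under assumed numbers: 1,000 hypothetical visitors with a 3% conversion rate, and made-up costs of 1 per false positive and 20 per false negative. The helper name `confusion_counts` is illustrative.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of booleans."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    return tp, fp, fn, tn


# 1,000 hypothetical visitors, 3% of whom convert.
actual = [True] * 30 + [False] * 970
predict_nobody = [False] * 1000  # the "nobody converts" model

tp, fp, fn, tn = confusion_counts(actual, predict_nobody)
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

# Assumed costs: missing a converter hurts far more than a false alarm.
COST_FALSE_POSITIVE = 1.0
COST_FALSE_NEGATIVE = 20.0
total_cost = fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}, cost={total_cost:.0f}")
```

The model scores 97% accuracy while finding none of the converters, and the cost line shows why the weighting of the two error types matters more than the headline accuracy figure.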
31:09 TW: Another thing I stumbled across were these governance principles that said, “An analyst can walk through and ask these 15 or 20 questions, and look at the different dimensions of ethical considerations.” Is that something that is widely recognised and acknowledged? Is that… We’re up to this is… We’ve got the Australians and the Americans, and I’m asking about something European, but is that… That was just something I found, I’m like, “Oh well, that seems actually kind of clever.” It is an attempt to come up with a framework against which any of these projects can be held up. Have we considered all of these things, have we answered these questions? Is that something that’s gotten traction, or is that just lightly introduced and I stumbled across it through Google?
31:55 FL: There’s a whole lot of work recently on constructing frameworks, and the EU one I think is actually one of the best ones around. They’re still quite high-level, in general, I would say. What I haven’t seen yet is something that’s really concrete enough to take somebody who’s working as a data scientist and go, “These are the things I clearly must do.” They give you high-level things that are good to do, and that’s excellent as a guideline. But they don’t yet address the really tricky issues around, how do you trade off different types of fairness metrics, for example? And some of these questions are really hard because they haven’t been legally addressed yet, either. Even when you do bring these considerations up to senior management, it’s not clear how you stay within the law, no matter what you do in certain circumstances. And part of the issue here is that by using algorithms, we’ve actually made transparent a whole bunch of issues that were not previously transparent. Once you start using an algorithm to solve a problem, you make it really concrete and replicatable and auditable, and this is really exposing issues that were always there, but a little more hidden when humans were making those decisions. And the law and regulations are struggling to catch up.
33:10 MK: Just out of curiosity, what’s your perspective on… Let’s say there is, I don’t know, gender bias in a recruitment system or the type of ads that are being shown to someone. How are you in your work… Do you get input from your stakeholder? Is that how you decide how to manage them or do you build something that is unfair or fairer to deal with the unfairness? I’m really struggling today.
33:38 FL: The first step is that when you’re designing that system, in order to detect, for example, that there’s unfairness across different groups, ’cause that means you’ve already considered how your errors are distributed across different groups is something you care about. The first step is to make sure you actually check that at the end of your processing. In many cases, when people deploy algorithms, they’ll just look at the final accuracy metric, for example, on whatever they chose as the target. And so then, you won’t even know if it’s unfair across different groups. The first thing is to surface, “Look, it seems like the way that this algorithm is working, we’re showing ads for high interest credit cards to minorities and we’re showing ads for the better rate cards to other groups.” And then you need to take that back to whoever asked you to build this model and say, “Look, because of the data that we have here, this is an outcome that we’re getting. There are a whole range of things that we could do to mitigate this issue, but it will have these other concerns in terms of trade-offs. What are we gonna do about it?” It is… These are very hard problems. There is no easy way to solve them.
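A minimal sketch of the check described above, breaking a classifier's errors down by group instead of reporting one overall accuracy figure. The function name `error_rates_by_group`, the group labels and the toy data are all hypothetical, chosen only for illustration; the example assumes binary 0/1 labels and uses only the Python standard library.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return per-group false positive and false negative rates.

    Assumes binary labels (1 = positive, 0 = negative). This helper is
    a hypothetical illustration, not part of any library.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1 and pred == 1:
            counts[group]["tp"] += 1
        elif truth == 1 and pred == 0:
            counts[group]["fn"] += 1
        elif truth == 0 and pred == 1:
            counts[group]["fp"] += 1
        else:
            counts[group]["tn"] += 1

    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        report[group] = {
            "n": positives + negatives,
            # A rate is undefined (None) when the group has no such cases.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return report


# Made-up toy data: the model is right 5 times out of 8 overall,
# but every one of its errors falls on group "b".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = error_rates_by_group(y_true, y_pred, groups)
for group in sorted(report):
    print(group, report[group])
```

In this made-up data the single overall accuracy number hides the fact that group "b" carries all of the errors, which is the kind of result that would then be taken back to whoever asked for the model.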
34:44 TW: Well, and by asking the question, I’m now envisioning that it’s… As soon as you ask the question, all of a sudden, they say, “Let’s get our corporate counsel in the room so this is a protected conversation.” Because if you even raise the question that, “Hey, we may have a gender bias issue,” now you’re up to potentially from a… The risk mitigation could be, we don’t wanna talk about that because now somebody’s gonna recall in a deposition that we talked about it and we might not have fully addressed it and now they can point to it and say, “Look, under oath, this person said that they were aware this might be an issue.” I could just… Depending on who the executive is, it could be, “We don’t wanna have this conversation because we can plead ignorance. We’re creating a record, don’t put that in an email, copy our lawyer on it.” Does that discussion happen? I didn’t come across anything about that but that’s where my mind is now going.
35:45 FL: People are concerned about that. There have been various proposals for some regulatory safeguards for companies that are using techniques with the intention of mitigating bias, certainly, understanding that things won’t be perfect. Again, this is legal and regulatory frameworks catching up, so the guidelines are far from clear at this stage. But the reality is starting to become that even if you don’t have that conversation, in a few years from now, a judge would be able to say, “You should have had that conversation because look, these issues about algorithmic bias are all over the headlines. How could you not check?” So yes, having these conversations is difficult. We do need to provide a way for companies to be able to have these conversations without being penalised just for thinking about it and trying to do the right thing. But avoiding thinking about it is not going to be a protection for much longer anyway.
36:40 TW: So the ethical thing to do is to think about ethics?
36:44 FL: [chuckle] That’s right.
36:45 MH: Well, or the bigger problem is companies who it never even enters their mind to think about it, and their algorithms and machine learning are happily going along generating all kinds of ethical concerns and problems. And it just never even… They’re just plugging away.
37:01 FL: Yes.
37:02 TW: Which is the useful… As things blow up for Facebook or Amazon, there’s value in having those cases bubble up. The question is whether average shoe retailer is thinking, “Oh crap, could this be me as well?” Actually I think FinTech might be a little bit ahead, at least is aware that they can’t… There have been enough issues of conscious discrimination historically that I would expect that financial services at least has some mechanisms and some internal responsibilities that there’s the kernel of starting with that. I could see…
37:40 MH: Well, there’s also more regulation in that, right? There are laws about discrimination in lending, for instance, in the US. And so whether or not we’re effective in enforcing those is different than whether or not people are kind of being thoughtful.
37:55 TW: Right.
37:55 MH: So it’s a stack of layered issues, right? Because the ethical part of it is just the starting point of like, “Hey, let’s set up a machine learning algorithm or an AI to do some of the cool stuff that we can do with computers now,” and not even have in mind that this could produce all these unintended consequences or outcomes that we’ve never even considered. To the point at which those become clear, then you need to take those right actions. But at the same time, I think you’re still… This is what’s important for business is, you still have to follow the laws that you are governed by. Even your AI does. That’s where everyone should be, if you’re gonna start doing machine learning stuff, you need to think really hard about what possible trade-offs or implications there could be from a regulatory perspective for you and your company. It gets real murky real fast.
38:53 MH: I was talking to a client the other day, and they were like, “Yeah, we’re passing this data into this system.” And I was like, “Does this data exist over in the EU, too?” And they’re like, “Yeah.” And I was like, “You’ll probably need to get with your privacy people and determine whether that’s a violation of GDPR or not, or how to incorporate that in a consent.” So it’s a whole… We’re in a whole new space, but actually I think the emerging regulatory structure is hopefully helping companies be more thoughtful earlier, which could help forestall the next huge ethical problem that emerges here.
39:31 TW: But they’re actually getting to the… Where you started was what the data scientists writ large get excited about is building a better model, that’s the engineering problem. Have I got this where it’s predictive. And what’s necessary, what’s this discussion is kind of on is, “Oh, yeah, and by the way, you also need to put on the hat of this really murky, messy unpleasant judgment call, squishy, uncomfortable area.” We’re just not gonna be drawn to that. That’s, “Do you wanna go eat the cheesecake or do you wanna have this bowl of celery?” And maybe there are people who really like celery, but…
40:10 FL: I really like celery…
40:11 TW: Good for them.
40:12 MH: Yeah, celery is good for you, and frankly, not all cheesecakes are good for you.
40:17 MH: I’ll take a bad slice of cheesecake over the…
40:20 TW: Over the celery?
40:22 MH: Over the celery any day.
40:22 TW: Maybe if we could take this celery and dip it in the cheesecake…
40:26 MK: Oh God they’ve lost it.
40:26 TW: That would be the most optimised.
40:28 MK: They’ve totally lost it.
40:30 FL: From a practical view, if you’re a data scientist, you care about making your model more accurate. The first small step you can do is to just consider eight different metrics instead of just one. And thinking about the headline-raising ones. You might have your standard accuracy metric that you always had. Then if you’re doing classification, you might look at your loss and you might just make a guess about, perhaps different ways of the thing and put that in and see how different the results come out. And then do your comparison across error rates across a couple of different groups. Even three or four different metrics and then just train all of your models and you’ll look at, you’ll have accuracy curves across the different things, and then you’ll be able to see, “Look, if I optimise really heavily on this one, suddenly it seems like this other one is dropping, maybe that’s something I should now hand off to whoever told me to write this model and say, ‘Look I’ve got my accuracy to 99%, but if you actually cared about finding the handful of positive instances, then it’s not getting any of them. Does that matter to you?’” And this would already improve applied data science a huge amount.
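The "99% accurate but finding none of the positives" situation is easy to reproduce. The sketch below assumes a made-up dataset of 1,000 cases with 10 positives and a hypothetical model that always predicts the negative class; the helpers `accuracy` and `recall` are written out by hand for illustration and are not taken from any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the model found.

    Returns None when there are no positive cases to find.
    """
    positives = sum(1 for t in y_true if t == 1)
    if positives == 0:
        return None
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


# Made-up, heavily imbalanced data: 10 positives in 1,000 cases.
y_true = [1] * 10 + [0] * 990
# Hypothetical model that always predicts "negative".
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2%}, recall = {rec:.2%}")
```

Looking at the two numbers side by side is what surfaces the trade-off: accuracy alone rewards ignoring the rare class, so whether the missed positives matter is a question for whoever commissioned the model.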
41:35 TW: But we’ll go back to frustrating the people that said, “Wait, you started by telling me we had a good model and now you’re wanting to engage me in this deeper… ” I think it does go back to a data literacy or an algorithmic literacy…
41:48 FL: It does.
41:49 TW: That is a really tough shift. But presumably, that’s what organizations like the Gradient Institute, that’s part of… I’m assuming there’s a level of trying to educate and raise awareness. I’m assuming why these things matter.
42:01 FL: Absolutely. One of our core principles that we’re trying to work towards is really trying to educate people around these issues. Both technical people, and people managing and designing AI systems from a high level as well as just the general public in terms of understanding what it is about these systems that are different, and that it’s not that we’re talking about robots coming in to end the world, but these much more nuanced kind of issues.
42:27 MK: So do you think as practitioners ourselves… I’m just trying to think about… I do still feel a little bit the burden of ethics and AI still sits on our shoulders, even after this huge discussion. I’m trying to get to the point of what can we do to better, I guess, help this problem. Is it about talking to your peer that you’re working with? And I guess, asking them the questions about whether they’ve done any of that work in their own process to try and figure out if there are any biases to what they’ve built? What else can we do?
43:05 FL: Certainly, that would be the first… The first step will just be to think about it in your own work, to think about what might I have missed and if I just optimised for accuracy, how could I envision these other things that we might care about, how can I start having these discussions with my manager or the client, what works in terms of communicating those ideas to them. Tools and things are developing to aid these things, but machine learning, in some ways now, is super easy. You can download scikit-learn, you can fit a model, you don’t necessarily have to understand a whole lot about the internals of the way machine learning works to do that. And these wider issues about have you really translated the true problem you’re trying to solve into an algorithm well, is very neglected, I think, in the training that people receive in machine learning. And I guess that’s something we’re hoping to change as well.
44:00 MH: Alright, Finn, this is really a great conversation and it has so many layers, but I really love that we’ve come this far. I know we’re barely scratching the surface, but we do have to start to wrap up. One thing we do love to do on the show is go around the horn and share what we call a last call. And it’s anything we found interesting in the last week or two. Tim, why don’t we start with you, what’s your last call?
44:25 TW: Mine is a simple little one that came out a while back from the ODSC and it’s just a blog post, it’s “25 Excellent Machine Learning Open Datasets”. And what I like about it is it breaks them down into a couple of datasets for, if you wanna play around with natural language processing or if you want a couple of datasets, if you wanna play around with sentiment analysis. If you wanna look at facial recognition or health or media. It’s just a handy little resource that I keep thinking, “One of these days, I’m gonna dive in and play around with some other data.” And I had basically bookmarked it because it seemed like a handy resource.
45:05 MH: Outstanding. Alright, Moe. What about you?
45:11 MK: Well, it’s a bit, I guess, it’s alright. I don’t know. Anyway.
45:18 MH: [chuckle] We’ll be the judge.
45:19 MK: Mark Edmondson tweeted a while ago that he had put together all of his presentations, and he was looking at how the topics had changed, and so then I started going back through all of his old presentations and was like, “Man, I wish I’d looked at these like two years ago, or a year ago.” The stuff he was doing in 2014, is the stuff that I’m just scratching the surface on now. He’s put all of his slides up to all of his talks and yeah, just going through them. I was actually getting a lot of really valuable content. Highly recommend digging up his tweet and going to look at some of his presentations.
45:55 TW: If I could be the man that Mark Edmondson was in 2014.
45:58 MK: Ditto.
45:58 TW: I feel like that would be a big step forward in my career.
46:02 MK: I hear you. All the way.
46:05 MH: Oh boy. Alright, Finn, what about you?
46:10 FL: I’d say if you wanted to get a more in-depth feel for some of these topics, there was a really great tutorial at NIPS 2017 (now known as NeurIPS) by Solon Barocas and Moritz Hardt. That gives a really good overview, a decent technical depth but without taking the equations too ridiculous. And we’ve also got some blog posts of practical examples of this on our website, which is gradientinstitute.org.
46:35 TW: Awesome. Nice. What about you, Michael?
46:39 MH: Well, my last call, I have a bunch of different directions. I recently read an article, which of course, now, I’m having a hard time finding the exact article I read, but it was specifically around this topic of AI, and specifically, deepfakes, and specifically, how they’re getting used more and more maliciously on individuals, especially in porn or with porn. And think about that, somebody taking your likeness and putting it out onto the internet that way is pretty mindblowingly abusive. And so, everyone should delete their Facebook account and get rid of all picture… No, I don’t…
47:15 MH: We have these huge emerging problems ’cause this technology and these things… It’s fascinating to understand what’s possible, but it’s also really scary, some of the things that are already bad, that are happening in the world, which will make the next couple of years kind of interesting. Alright, this episode has felt kind of like sort of a little bit of doom and gloom, but actually, computers are helpful. And you know what, there’s other things that are helpful, too, like you, the listener, and you’ve probably been listening and thinking, “I have some really great thoughts or comments,” and we would love to hear them. Please do reach out to us on the Measure Slack, or on Twitter, or our LinkedIn group. We would love to hear from you.
48:02 MH: One of the things that makes this show great is we actually have a lot of folks who are behind the scenes helping out, specifically, our engineer Darin Young. And I’m excited to announce our new producer, Josh Crowhurst, is also on board helping us out now. Alright, well, Finn, thank you so much for coming on the show, sharing your knowledge and experience with us and our listeners. Really appreciate it.
48:26 FL: Thank you so much for having me.
48:27 MH: Alright, and I’m sure I speak for Tim and Moe that regardless of whether or not there are ethical dilemmas ahead of us, we have to be brave and keep analysing.
48:43 Announcer: Thanks for listening, and don’t forget to join the conversation on Facebook, Twitter, or Measure Slack group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour, or @AnalyticsHour on Twitter.
49:02 Charles Barkley: So smart guys want to fit in, so they’ve made up a term called analytic. Analytics don’t work.
49:11 Tom Hammerschmidt: Analytics. Oh my God. What the fuck does that even mean?