Algorithms can solve problems quickly and are increasingly relied on to address nuanced social issues. So, can decision-making algorithms make better decisions than humans? What are the challenges of bringing a computational perspective to a diverse range of public policy decisions? Watch this Wiener Conference Call with Sharad Goel as he discusses how to make decision-making algorithms fairer and more equitable.

Wiener Conference Calls recognize Malcolm Wiener’s role in proposing and supporting this series as well as the Wiener Center for Social Policy at Harvard Kennedy School.

- Welcome to the Wiener Conference Call series. These one-hour, on-the-record phone calls feature leading experts from Harvard Kennedy School who answer your questions on public policy and current events. Wiener Conference Calls recognize Malcolm Wiener's role in proposing and supporting this series, as well as the Wiener Center for Social Policy at Harvard Kennedy School.

- Good day everyone. I'm Ralph Ranalli of the Communications and Public Affairs office at Harvard Kennedy School. I'm also the host of the HKS PolicyCast podcast, and I'm very pleased to welcome you to what promises to be an interesting and informative Wiener conference call. These calls are kindly sustained by Dr. Malcolm Wiener, who supports the Kennedy School in this and many other ways. Today we're joined by Professor of Public Policy Sharad Goel. In addition to his Kennedy School appointment, Professor Goel is a faculty affiliate of the Computer Science program at Harvard's John A. Paulson School of Engineering and Applied Sciences, and he brings a distinctly computational perspective to the study of public policy. His research has examined a diverse range of contemporary social and political issues, including criminal justice reform, democratic governance, and the equitable design of algorithms. We are so fortunate that he's agreed to share his expertise today with the Kennedy School's alumni and friends. Sharad.

- Thanks for that intro, Ralph. Thanks, everyone, for joining today. Just trying to get set up here. Okay, everyone see this? Great. So I'm going to tell you a little bit about the work that my collaborators and I have been doing over the last five to 10 years in this broad area of designing equitable algorithms, thinking about discrimination, and thinking about ways to create better outcomes. I'm going to start with an example in automated speech recognition. Speech-to-text systems are now pretty widespread. I think this session actually has automated closed captioning happening. Many of us carry speech-to-text devices in our pockets in the form of phones, and we use virtual assistants for dictation, translation, subtitling, and hands-free computing. So this type of technology is all over the place. And more than just being a convenience, it can be life-changing for many folks, especially those with physical impairments. But the worry with this type of technology, as with many of these advances in AI, artificial intelligence, is that it doesn't work equally well for everyone in the population.
So we wanted to investigate this idea, and a couple of years ago we audited five leading automated speech recognition (ASR) providers: Amazon, Apple, Google, IBM, and Microsoft. We did a very simple audit. We started with natural human speech based on interviews, and then we had both professional human transcribers and machines transcribe about 20 hours of audio from black and white speakers, and we looked at the performance. So we had the ground truth, because the humans could go back and listen to the recordings and correct their transcriptions, so we pretty much know exactly what people said. But we also have what the machine thought the people said in these conversations. When we did this, we found a pretty striking result: overall error rates were about twice as large for black speakers compared to white speakers. What I'm showing here is the error rate on the horizontal axis, so lower is better, and here are the five providers: Apple, IBM, Google, Amazon, and Microsoft. For each of these providers, the error rate for white speakers is about half as large as the error rate for black speakers. So even though the overall error rates differ by provider, in almost all these cases the error rates for white speakers are about half those for black speakers. And just for some context, error rates around 20% are pretty good; you can use these devices pretty usefully when error rates are around 20%. Once you're in the 30 to 40% range, they're not super useful. So this is a qualitative difference in the ability of folks to use this type of technology.
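To make the audit concrete, here is a minimal sketch, not the study's actual code, of how one might compute and compare word error rates by speaker group. The `word_error_rate` and `audit` helpers and the example snippets are illustrative assumptions; the real analysis worked with roughly 20 hours of matched human and machine transcripts.

```python
from collections import defaultdict

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit (Levenshtein) distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def audit(snippets):
    """snippets: list of (group, ground_truth, machine_transcript) tuples."""
    edits, words = defaultdict(float), defaultdict(int)
    for group, truth, transcript in snippets:
        n = len(truth.split())
        edits[group] += word_error_rate(truth, transcript) * n  # raw edit count
        words[group] += n
    return {g: edits[g] / words[g] for g in edits}  # per-group word error rate

# Hypothetical usage:
# audit([("white", "my mom is a nurse", "my mom is a nurse"),
#        ("black", "my mom is a nurse", "my mama nurse")])
# -> {'white': 0.0, 'black': 0.6}
```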
To try to understand what was going on, we fed identical audio snippets, coming from both black and white speakers, to our machine transcribers. Why was this possible? Well, a lot of the interview questions were of the form, what does your mom do, what does your dad do, and the answers were of the form, my mom is a blank. And so in these kinds of short snippets we could find identical phrases spoken both by white speakers and black speakers, and we fed those short identical phrases into our automated speech recognizers. What we found is that even on these identical phrases spoken by black and white folks in our dataset, error rates are still about twice as large for black speakers. This suggests to us that the problem really is the acoustic model of these speech recognizers. For a little bit of background, the way these automated speech recognition systems work is that there's an underlying language, or grammar, model, and there's also an acoustic model. And it seems like the issue is not in the grammar model; the issue is really in the acoustic model.
So what did this all point to? What do we think went wrong? In some sense, I think there's an easy answer and an easy fix. The answer, we think, is that there was probably a lack of diversity in the training data: in all the audio snippets used to train these pretty complex systems, there was a dearth of data from black speakers. So in some sense the easy fix is to go out and collect more data. But then there's a bigger structural issue: how did we get to the point where these trillion-dollar companies are releasing products that are used by tens if not hundreds of millions of folks, in high-impact applications, with these issues embedded in them? I think this points to structural problems like a lack of diversity in the teams and institutions that often build these types of tools. So there's a concrete thing we can address, but it also requires these larger structural fixes. That's one motivating example to set the stage for understanding the ways that algorithms can lead to disparities and can go wrong, even when they're designed, arguably, to improve the accessibility of various services.
Now I'm going to switch gears a little bit and talk about some of our work on the criminal justice side. After an individual is arrested in the United States, prosecutors have to decide whether or not to charge that individual with an offense. If they end up charging that individual, then the normal criminal legal proceedings take place: there might be a plea deal, or they might go to court. And if they don't charge that individual, for all intents and purposes it's as if that arrest never even happened; everything goes away. So this is a very high-stakes decision about whether or not to charge somebody with an offense after they've been arrested. In many jurisdictions, these charging decisions are based in large part on police narratives of the incident. An example might be something like this: Rachel Johnson reported that a black male with brown hair wearing a black jacket assaulted her in midtown next to Johnson's home; she reported the incident to Officer Lee. Again, a very stylized example of what this type of police narrative might look like. And here you see there are explicit mentions of race, black male; there are mentions of location, which might reveal something about the demographics of the folks involved; there are also names of individuals, which can tell us about the race of the people involved. You can say, well, all this information gives us perhaps a more complete picture of what's going on, but it can also lead to implicit bias.
There are even explicit biases in some cases. So we devised what we call a race-blind charging algorithm that takes these natural language, free-text narratives and automatically masks race-related proxies. As an example, we might have something like this, where the original narrative is on the left and the redacted narrative is on the right. Something like race here we redact. But also note that, although the word black appears twice, the context matters: this one was black jacket, which we felt was important for understanding the context, so we left black jacket unredacted. The algorithm redacts black when it's used to describe race. Hair color we redact, names we redact. But again, one of the tricky things is that we want to maintain the connection between this Rachel Johnson and this Johnson, so both become victim one. The officer and the other physical characteristics are also redacted. So this is an example of how our algorithm would redact this type of police narrative to scrub it of race-related information that we don't think, in many cases, is useful for making an informed decision, but that can lead to implicit biases. This work inspired a recent law in California, just a couple of months ago, that mandates that all district attorneys in California make these race-blind charging decisions within the next couple of years. That was a great outcome: to see this, in some sense, very simple algorithmic approach being adopted across the state, and hopefully being adopted more widely across the country to create more equitable decisions in this high-stakes arena.
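As a toy illustration of the masking step described above, the sketch below uses simple pattern rules to replace race terms, hair color, and names with consistent placeholders. The production system described in the talk relies on statistical natural language processing rather than hand-written rules; the term lists, the `redact` function, and the example mapping of names to roles are all assumptions for illustration.

```python
import re

RACE_TERMS = r"\b(black|white|hispanic|asian)\b"
HAIR_TERMS = r"\b(brown|blond|blonde|red)\s+hair\b"
CLOTHING = {"jacket", "shirt", "hoodie", "coat", "hat"}  # keep e.g. "black jacket"

def redact(narrative: str, people: dict) -> str:
    """people maps names to role labels, e.g. {"Rachel Johnson": "Victim 1"};
    every mention of the same person gets the same placeholder."""
    text = narrative
    for name, label in people.items():
        surname = name.split()[-1]
        text = re.sub(rf"\b{re.escape(name)}\b", f"({label})", text)
        text = re.sub(rf"\b{re.escape(surname)}\b", f"({label})", text)
    # Hair color is masked outright.
    text = re.sub(HAIR_TERMS, "(hair color) hair", text, flags=re.I)
    # Race terms are masked unless they modify an item of clothing.
    def mask_race(match):
        rest = match.string[match.end():].lstrip()
        next_word = rest.split(" ", 1)[0].lower().strip(".,;") if rest else ""
        return match.group(0) if next_word in CLOTHING else "(race)"
    return re.sub(RACE_TERMS, mask_race, text, flags=re.I)

# Hypothetical usage, echoing the stylized narrative above:
# redact("Rachel Johnson reported that a black male with brown hair wearing a "
#        "black jacket assaulted her in midtown next to Johnson's home; she "
#        "reported the incident to Officer Lee.",
#        {"Rachel Johnson": "Victim 1", "Lee": "Officer 1"})
```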
One thing that I think is useful to understand in talking about race-blind charging is what it can and can't do. At best, race-blind charging can mitigate the effect of race itself on decisions, but it can't correct for potentially unjust charging policies that may be, in some sense, race neutral even if they have disproportionate effects on communities of color. This is a distinction the legal community often frames as disparate treatment, meaning the effect that animus or race itself plays in decisions, versus disparate impact, the extent to which even race-neutral decisions might still lead to disproportionately bad outcomes for racial groups. One way of thinking about this in concrete terms: imagine a jurisdiction with a race-blind policy of charging everybody who comes through the door with an arrest for marijuana possession, so a perfectly, let's say, race-neutral policy. But if, as is the case in many communities, there are disproportionate arrest rates for black folks compared to white folks for marijuana use, then this race-neutral charging policy could still lead to disparate, and potentially unjust, impacts even though it is race neutral. So even though the policy doesn't consider race, the fact that the policy exists could still lead to unjust, disproportionate outcomes across race groups.
So here the punchline is that when we're designing algorithms, just as when we're designing policies, we have to carefully consider the impacts of our design choices, and not simply apply a heuristic that says, well, my algorithm doesn't consider race, and therefore the policy that results from using this algorithm must be just. To dive into this a little more, I'm going to give you another example from some of our work a few years ago. It wasn't algorithmic, it was still empirical, but I think it contrasts this idea of disparate treatment versus disparate impact. In Nashville a few years ago, the per capita police traffic stop rate was about seven times the national average. This is a huge traffic stop rate: about one stop for every two people in the city. And, perhaps unsurprisingly, at the time there were significant disparate impacts in these stops, in that black drivers were about 45% more likely to be stopped per capita than white drivers. When we looked at the data, we found that these stops really broke into two categories: moving violations, think of speeding stops, and non-moving violations, like a broken taillight, license plate or registration violations, tinted windows. What we found is that for non-moving violation stops, the stop rate was about 70% higher for black folks compared to white folks. For moving violations, the disparities, while they existed, were not nearly as high, about 25% higher for black drivers compared to white drivers.
So the question is, are these non-moving violation stops actually creating some kind of good policy outcome, or are these disparities themselves bad while also not creating any public safety benefit? We looked again at the data, and we concluded that these non-moving violation stops in particular didn't lead to short-term or long-term reductions in serious crime. And when we talked to the police department before we did this analysis, we tried to probe why they were conducting these types of non-moving violation stops. The answer was that they were conducting them not because they believed that something like a broken taillight was inherently problematic in and of itself; they were doing these to crack down on more serious forms of crime, like burglary. It was a way of showing police presence, and they were hoping that these stops would reduce serious crime. And so, given that these stops were not in fact achieving their stated goals, we recommended that Nashville dramatically reduce these stops in order to reduce the disparities; our prediction was that this wouldn't increase serious crime. And they ended up doing that. There was about a 60% reduction in traffic stops overall in the following year, and the reduction grew to about 75% over the following two years, and we didn't see any increase in serious crime. So this was a great outcome: even if there was no animus in these stops, they were disproportionately affecting black drivers without achieving the stated goal of reducing serious crime, so they were still creating all these disparate impacts in the black community in Nashville. The fact that this policy existed was itself, in our minds, discriminatory, and the department responded to this critique and ended up curtailing the stops. It was a great outcome to see.
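The Nashville disparities above come down to simple per capita rate comparisons. The sketch below shows the arithmetic; the population and stop counts are made up, chosen only so the resulting ratios land near the roughly 70% and 25% figures cited in the talk.

```python
def stop_rate_ratio(stops_a, pop_a, stops_b, pop_b):
    """Ratio of per capita stop rates between two groups."""
    return (stops_a / pop_a) / (stops_b / pop_b)

# Hypothetical population and stop counts (not Nashville's actual data),
# chosen so the ratios roughly match the figures cited above.
populations = {"black": 180_000, "white": 390_000}
stops = {
    "non-moving violations": {"black": 61_000, "white": 78_000},
    "moving violations":     {"black": 46_000, "white": 80_000},
}

for violation, counts in stops.items():
    ratio = stop_rate_ratio(counts["black"], populations["black"],
                            counts["white"], populations["white"])
    print(f"{violation}: black drivers stopped {ratio:.2f}x as often per capita")
# -> roughly 1.7x for non-moving violations and 1.25x for moving violations
```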
So now I want to wrap up with what I call a consequentialist approach to fairness, both in the algorithmic sense and in the human decision-making context. I want to give an example that still stays in the criminal justice space but is a little different from the one I gave earlier. In many jurisdictions, after you're arrested you're called to attend a mandatory court date. If you miss that court date, then typically a bench warrant is issued, and then if you're picked up for a minor traffic violation or something else, you can be sent to jail. In many cases, people are not actively trying to miss their court date; they're not trying to flee the jurisdiction. They have complicated lives. They might forget about the court date, they might have transportation issues, they might have childcare issues, and so it's hard for them to get to court. So we interviewed a bunch of folks and tried to figure out what some of the big reasons are that people miss court, and we narrowed in on transportation as a serious barrier for many folks. In one case, we heard a story of a public defender's client who lived several miles from the nearest public transportation, so it took several hours of walking to make their court appointment.
Our goal here was to reduce incarceration by helping people make it to their court dates, and we're doing that by giving them door-to-door rideshare service to and from court. Now, I want to say that, in a policy sense, it would be better, I think, simply not to jail people for this kind of minor issue. If somebody misses their doctor's appointment, we don't start revoking their health insurance. We understand there's some efficiency loss to the system when people miss court appointments, but I would hope we'd also understand that this is an understandable thing that we can rectify in ways other than incarcerating folks. Unfortunately, that is not politically feasible in many jurisdictions, so we're narrowing in on this more direct intervention of helping people get to court in the first place by providing them this door-to-door transportation service.
So imagine now that we have a pool of money, let's say a hundred thousand dollars to keep things concrete, to provide rides to people with upcoming court dates. The question we face is: who should we give these rides to? And this isn't just a hypothetical; it's something we're preparing to do in California right now, working with the Santa Clara County Public Defender's Office to give this money out in a way that we think is equitable and will lead to a reduction in incarceration. One natural approach to allocating the rides is to do it in a way that maximizes the number of people who make it to court.
Because if we maximize the number of people who make it to court, that will lead to the maximum reduction in the number of people who are incarcerated for missing court. So it's a very natural thing to do. And how do we do this? Technically it's straightforward; if we get into the details it's harder to instantiate, but at a high level it's straightforward. We estimate, for each person who has an upcoming court date, what the benefit of receiving a ride would be for them. What's the likelihood that they make it to court in the absence of a ride? Are they close to the court? Do they have access to public transportation? From all these things we can estimate how much they would benefit from getting a ride. And then we say, well, providing a ride to this person would cost this much, depending on where they live. So we rank people by the estimated benefit per dollar and go down our list until we run out of money. That's roughly how we think about allocating our limited funds in a way that maximizes the number of people who make it to court.
Now, here's a picture of Suffolk County, of Boston, and I've highlighted one of the main Boston courthouses with the star, along with the demographic makeup of the city. If we follow this very natural strategy of allocating the rides in a way that, given our limited budget, maximizes the number of people who make it to court and so reduces the number of people incarcerated for missing court by the largest amount, what's going to happen? We're going to give rides to folks who live close to the courthouse. Why? Because it's a lot cheaper to give rides to people who live close to the courthouse than to people who live farther away. And looking at the demographics of Suffolk County, the folks who live closest to the courthouse tend to be disproportionately white. You can see there's this big clustering of black and Hispanic folks living farther from the courthouse who are just more expensive to give rides to.
So now we have a trade-off to make. On one hand, we can optimize the number of people who make it to court and hence are not incarcerated. Let's play out this hypothetical and say that if we give out the rides in the way that is optimal from that standpoint, we can get a thousand extra people to court, and so that means a thousand fewer incarcerations. But the problem is that 30% of people from one group get a ride and only 10% of people from the other, and again, in this hypothetical we're giving more rides to folks who are white because they live closer to the courthouse. On the other hand, we can do something like an equal allocation. We say, well, 20% of people from each group gets a ride. But now we have allocated these limited funds in a way that is, quote unquote, sub-optimal: we run out of money faster because we're giving rides to people who live farther from the court, which means fewer rides we can give out in total. In this hypothetical, we get about 800 extra people to court, which means 200 more people are incarcerated compared to the first scenario.
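A minimal sketch of the two allocation rules being contrasted, a benefit-per-dollar greedy rule versus a group-balanced rule, is below. The `Person` fields, the cost and benefit numbers, and both helper functions are hypothetical; in practice the benefits and costs would be estimated from data, and the balanced rule is only one of several ways to equalize allocation across groups.

```python
from dataclasses import dataclass

@dataclass
class Person:
    group: str      # demographic group, used only to track allocation shares
    benefit: float  # estimated increase in probability of appearing in court
    cost: float     # estimated cost of a round-trip ride, in dollars

def allocate_greedy(people, budget):
    """Rank by estimated benefit per dollar and fund rides until money runs out."""
    funded = []
    for p in sorted(people, key=lambda p: p.benefit / p.cost, reverse=True):
        if p.cost <= budget:
            funded.append(p)
            budget -= p.cost
    return funded

def allocate_balanced(people, budget, groups):
    """Cycle through groups, funding each group's best remaining affordable
    candidate in turn, so groups end up with roughly equal numbers of rides."""
    queues = {g: sorted((p for p in people if p.group == g),
                        key=lambda p: p.benefit / p.cost, reverse=True)
              for g in groups}
    funded, progress = [], True
    while progress:
        progress = False
        for g in groups:
            while queues[g] and queues[g][0].cost > budget:
                queues[g].pop(0)  # skip candidates we can no longer afford
            if queues[g]:
                p = queues[g].pop(0)
                funded.append(p)
                budget -= p.cost
                progress = True
    return funded

# Hypothetical usage: group "A" lives near the courthouse (cheap rides),
# group "B" lives farther away (expensive rides).
people = [Person("A", 0.20, 10.0) for _ in range(6)] + \
         [Person("B", 0.30, 40.0) for _ in range(3)]
print(len(allocate_greedy(people, budget=100.0)))                       # 7 rides, mostly group A
print(len(allocate_balanced(people, budget=100.0, groups=["A", "B"])))  # 4 rides, split evenly
```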
So again, our options are: on one hand, we can do the strict optimization, get the most people to court, and reduce incarceration the most, but the downside is that we're disproportionately giving rides to people who live close to the court, who tend to be white. On the flip side, we can do this in a way that's more demographically equitable, maybe giving the same proportion of rides to people in each neighborhood, but the problem there is that we give fewer rides overall, and so more people will end up being jailed for missing court.
Okay, so now I want to end with this. It's a very hard problem to deal with. It's a normative problem; it's not something algorithmic. We can do whatever we want to do, and once we decide what we want to do, we can build an algorithm that will do it. So I want to take the last couple of minutes here for everyone to fill out this survey. It's all anonymous. If we can post this in the chat, that would be great; you can see a link up here as well. Just go in and click a slider, dropping a pin on this picture. Are you more on the optimization side, where we want to get the most people to court and reduce incarceration the most, understanding that that means we're going to have this kind of demographic inequity in who gets the benefit? Or do you want to go more to the equal allocation side, where we give the same proportion of rides to people in each demographic group, but that means fewer rides overall, which means more incarceration? Just take a minute and put a slider on this. Again, it's all anonymous, and I think it'll hopefully help us understand this issue a bit better. Okay, we'll just take 30 more seconds and then we'll see the results. We should have some music playing while waiting for responses to come in.
Okay, let's see where we're at. Okay, great. This is pretty much exactly what we see when we run representative surveys across the country; we see this pretty much all the time. So what is the high-level takeaway? My takeaway is that there's a lot of diversity: we span the spectrum almost entirely, even among a group like our own, which, my guess is, has fairly similar perspectives on creating equity in the world and similar goals, and yet we have this very high variance in how we think we should divide this limited resource. And what's interesting is that when we run these national surveys, we don't see huge differences across political party, across age, across race; we still see this wide variance. And this, I think, gets at the heart of the design of these types of algorithms: in many of these problems there's a technical component, and we can build these algorithms to do anything we want. We can build the algorithm that's fully on the optimization end of the spectrum, or we can build the algorithm so it's more on the equal allocation across groups end of the spectrum. We can do any of these things that we want; the question is what we fundamentally want to do.
So now, just wrapping up, what did we see here? In the first part, we saw that many popular algorithms exhibit troubling disparities; we saw one example of that in speech recognition.
In the second part, we saw that it's useful to distinguish between the disparate treatment and disparate impact views of the world: we can build these blind algorithms, and they can reduce animus and implicit biases, but they can't solve all these problems; there's a disparate impact problem that we still see. And finally, there's this consequentialist approach of thinking about the trade-offs that are fundamentally inherent in policy decisions when we're trying to design for equity: we can build these algorithms in any way that we want, but we have to think about the outcomes that are inherent in our design choices. So I'll end with that, and hopefully we can have a discussion now about some of these topics. Thanks, everybody.

- Thank you very much, Sharad. Okay, now we're going to open the session up for your questions. To ask a question, please use the virtual hand-raising feature of Zoom, and please, in true Kennedy School fashion, keep your question brief and make sure it ends with a question mark. You'll be notified via Zoom's chat feature when it is your turn to speak, so please be sure to unmute yourself when you hear from the staff. Finally, our participants would appreciate it if you could state your Kennedy School affiliation. I'd like to start things off by asking a question for Sharad that was submitted earlier by a member of this call. And that question is: what disciplinary boundaries need to be blurred to allow for diversity in algorithmic ethics?

- Oh, this is a terrific question. Thanks so much for asking it. So it's interesting: I myself am trained as a mathematician and computer scientist. My PhD is in math, but I have gravitated toward all these policy questions over the last 10 to 15 years, and I'm a strong, strong advocate of applying an interdisciplinary approach to all of these issues, which hopefully came out in the presentation. At the very least, this conversation involves legal scholars in addition to economists and computer scientists and statisticians and sociologists, and I think all of these communities should be involved, and what I've seen is that they are actively involved in this area, which is very, very refreshing to see. I don't know many areas that have been so expansive in the set of communities involved in these issues. Part of it, I think, is a realization that this is one of the central problems we're facing right now with the advance of AI, and people across disciplines are becoming quite involved. Now, one thing that I will say is that the first movers on many of these things were computer scientists. When you say the word algorithm, computer scientists jump up and start getting involved, and so, especially five to 10 years ago, this was an area that was initially dominated by computer scientists, and that has shaped the way we think about these problems. In some ways I think that is good, but in many ways I think it's a very narrow lens on the question of equity and algorithm design. So I'm very happy now to see a much larger group of interdisciplinary folks engaging with these issues, and hopefully we'll expand beyond that initial computer-science-centric lens on these problems.

- Okay, we now have a participant ready to ask their question. Please identify yourself, state your affiliation with the Kennedy School and ask your question.

- Is this me? Sounds like it is.

- It is you James.

- Good morning, I'm James Erwin, I'm a mid-career from 2018. Thanks a ton for putting this on and for your thoughts. I do a little work on privacy, and throughout your comments I had a few questions about the tension between working on reducing bias in algorithms and respecting people's privacy and rights. I have a few surmises, all of which could be incorrect, but I'd love your thoughts. So, when we're training ML models, you're usually looking for massive data sets, and if you're making sure that that data set is representative, in theory it should reduce bias, but that does have serious privacy implications, right? If I'm training a model on photos or voice and I learn the race and ethnicity and location of all of those people, which would be necessary in order to reduce that bias, I'm significantly increasing the privacy risk involved in doing that work. And in turn, if you reduce that knowledge about the participants, it makes it harder to reduce bias. So is there a way to reconcile that tension? How do you think about that?

- Yeah, it's a great question. I definitely think there's a tension here. If we think about speech recognition, we could train on lots of people: these devices are listening much of the time, and we could train on the speech they pick up and collect a more diverse sample of speakers that way. But there are huge, huge privacy implications to doing that sort of thing. So now you need to get people to opt in and share this, and maybe provide monetary or other compensation for participating, and that definitely increases some of the barriers to training these more equitable models, but it also, I think, respects people's privacy in a way that is necessary in this domain. So there is a trade-off here, but I do think the privacy considerations are super important, and I'm not sure there's really a magic bullet that will get us around that inherent trade-off. A couple of other, somewhat tangential, thoughts on this: I've heard many companies say things like, we don't collect race on people and so therefore we can't be discriminatory. They're saying, well, we don't collect race on people because we don't want to violate their privacy and we don't want to make discriminatory decisions. I would push back against that: while there is some privacy benefit, there's also a huge auditing limitation at that point. So, in my mind, I would tip the scales toward collecting this basic demographic information in order to audit these services, even recognizing that there is some privacy loss there. I think the trade-offs change depending on what the application is and exactly what information is being collected.

- Thank you, great, Sharad. I think Tyra is our next questioner. Tyra, please identify yourself, state your connection to the Kennedy School, and ask your question.

- Hi, thank you so much. My name is Tyra Walker. I graduated from the MPP program, or, sorry, from the Kennedy School in 2020 and from the law school in 2018; it's all a blur now. So first of all, thank you for this really, really fascinating presentation. This has been awesome and super interesting. I had a follow-up question about race-blind charging and algorithmic masking, and particularly the slide where you were talking about the redactions that are made to make charging a little more race blind. It occurred to me that some things, even when they're redacted, can maybe be used as a proxy for race or identifiable information. So for example, if the narrative says victim one reported that a race-redacted male did something, it occurred to me that some people would probably only mention the race if it were, say, a black or Hispanic male. So I'm just curious how that factored into the decisions of how to redact or what to redact. And maybe it's the kind of question where it's, don't let the perfect be the enemy of better than what we have, but I'm just curious how that factored into the decision of how to redact information.

- Yeah, that's a terrific question, and the example that you gave is exactly one that we considered as well: there was a worry that someone might only mention race in the case of a racial minority, and the fact that we redacted it gives some signal to the prosecutor about the underlying race of the individuals involved. The way that we tested for this and guarded against it is we did two different tests. First, we showed the redacted narratives to people and asked them to try to guess the race of the people involved. Second, we trained a machine learning model to also guess the race of the people involved given the redacted narrative. So this was an adversarial approach, where we used one machine learning model to redact and then we used another machine learning model to try to break the redaction. What we found is that humans basically couldn't do this; humans couldn't figure out what the race of the people involved was. The algorithm could do a little bit better, but basically it could do only about as well as it could if you gave it just the alleged charge, the offense the officer listed to describe the incident, say a drug-related arrest or burglary or battery or something like that. There are disparities in who is arrested for what, and that was the extent to which the algorithm could infer the race of the people involved from the redacted narrative. So the punchline is: if we say that we are going to reveal the incident itself, which we believe we have to in this context, then we're at more or less the theoretical limit of how well we can redact; beyond that, there doesn't seem to be a lot of information seeping through.
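A rough sketch of this kind of adversarial leakage check appears below, assuming one has redacted narratives, the listed offenses, and separately recorded race labels. It uses off-the-shelf text classification (TF-IDF plus logistic regression) as the "attacker"; the study's own models and data are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def leakage_check(redacted_narratives, offense_descriptions, race_labels, folds=5):
    """Cross-validated accuracy of guessing race from (a) the redacted text and
    (b) the listed offense alone. If (a) is no better than (b), the redaction
    isn't leaking much beyond the incident type itself."""
    attacker = lambda: make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    acc_text = cross_val_score(attacker(), redacted_narratives, race_labels, cv=folds).mean()
    acc_offense = cross_val_score(attacker(), offense_descriptions, race_labels, cv=folds).mean()
    return acc_text, acc_offense

# Hypothetical usage, with lists of strings and labels collected elsewhere:
# acc_text, acc_offense = leakage_check(redacted_narratives, offense_descriptions, race_labels)
# print(f"from redacted text: {acc_text:.2f}, from offense alone: {acc_offense:.2f}")
```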

- Thank you so much. It's really, really fascinating.

- So we don't have a hand raised right at the moment, so I think we'll go to another pre-submitted question, which is: complex models are used to plan the electric system, including locating power plants and transmission lines. The focus is usually on the optimal mix of resource investments over time scales that range from minutes to decades. This optimization is already daunting, and some regulators are putting more focus on equitable outcomes in terms of siting, jobs, and economic benefits. Is there a prospect that algorithms can handle such extreme temporal and social trade-offs?

- Yeah, it's a great question, and it connects well with the last example we talked about: how do we allocate this limited rideshare benefit to help people get to court? The thing that the algorithm is very good at is, once you say how you want to balance these types of trade-offs, which is a very, very hard question, then we can design the algorithm in a way that will optimize for that particular type of allocation. Now the problem, and I think this is really the heart of the question, is: can an algorithm figure out what the optimal trade-off is? I don't think so; I don't think there's a clear answer to this. Whenever we're trying to make decisions that have equilibrium effects that might stretch out over decades, I think it's just too hard. There's too much uncertainty in the impact of our actions today. We have intuition, but rigorously saying this is what's going to happen if we were to roll this thing out, I think that's very, very hard. And so this is where I would say the algorithm's role is limited to saying: this is the mix that we want right now, and given that constraint, let's figure out the optimal allocation of this resource. But I don't think it's easy for an algorithm to say what the mix itself should be.

- Great. So I had a question, and it's a more general one, but I wanted your thoughts on it. We've seen a lot of stories about algorithms cast as the bad guys, but you wrote a piece in the New York Times where you said that even imperfect algorithms could improve the criminal justice system, and I was interested in that notion that something that's imperfect can still be an improvement. Can you expand on that a little bit and give us your general sense of: are things improving with the design of algorithms? Are algorithm designers taking this criticism to heart? Has there been movement, and where are we in terms of creating better algorithms than the ones we've had so far?

- Yeah, so I think it's a great question, and I would expand it even further and say the question isn't just whether our algorithms are getting better, but where they stand relative to human biases in human decision making, in a world absent algorithms. And I think it's hard to know, because these are complicated examples and we don't know exactly what the world would look like without these types of algorithms at play. But I would say the evidence is pretty clear for predictions: if we're trying to predict what is likely to happen if we were to make this decision, algorithms are clearly better at that than unaided humans. This isn't perhaps super surprising, in the sense that if I said, predict the weather a week from now, an algorithm is going to outperform you. It's just really hard for a human, even with all our experience, to say what the temperature is going to be a week from now; that is something algorithms are really, really good at. The same thing holds if we think of this example of allocating rides. If I were to ask a human, who is going to benefit the most from this ride, what is the likelihood that somebody is going to miss court in the absence of getting a ride, we have some insight into this as humans, but an algorithm is really good at making that type of prediction. So in these types of domains, where we have a limited resource and we want to allocate it in a way that maximizes some benefit, I think algorithms have the upper hand. Now, in domains where it's not really about the predictions, where the hard question is what the mix of the allocation should be, algorithms don't have a lot to offer, and if that's fundamentally the problem, we have to fall back on human judgment. So in domains where prediction is the bottleneck, I'd say algorithms are showing clear benefit, but in domains where that's not really the bottleneck, I'd say there's not a lot of benefit.

- Great. Well, I think we're probably at a good ending point. So thank you to those who listened in to this call, and of course a very special thank you to our guest, Sharad Goel. And if you like these kinds of in-depth discussions, I'd also invite you to check out the HKS PolicyCast podcast. We look forward to having you back on the line with us again next semester for the next Wiener Conference Call. Have a great rest of your day.