SDS 597: A.I. Policy at OpenAI

Podcast Guest: Miles Brundage

August 2, 2022

This week, we’re diving into A.I. policy with Dr. Miles Brundage, Head of Policy Research at OpenAI. The world-renowned researcher and practitioner joins the podcast to discuss AI model production, policy, safety, and alignment. Plus, hear him talk about GPT-3, DALL-E, Codex, and CLIP.

About Miles Brundage
Miles Brundage is the Head of Policy Research at OpenAI. He has previously worked at Oxford’s Future of Humanity Institute and ARPA-E at the Department of Energy. Brundage earned his PhD in Human and Social Dimensions of Science and Technology from Arizona State University. In addition to his role at OpenAI, Brundage is currently an Affiliate of the Centre for the Governance of AI and a member of the AI Task Force at the Center for a New American Security.
Overview
Calling in from Berkeley, California, Miles started the episode by shedding light on his Head of Policy Research role at the world-renowned A.I. research lab, OpenAI. “My team tries to understand the social impacts” of A.I. models. This includes examining how models can be used in harmful ways, both unintentionally and intentionally, and learning how to mitigate such behavior.
As someone integral to the rollout of OpenAI’s groundbreaking, front-page-news models such as GPT-3, DALL-E, and Codex, Miles then spoke about the specific considerations his team weighed when rolling them out, sharing OpenAI’s three-part framework that the team depended on regularly:
  • Prohibit misuse by publishing usage guidelines and terms of use and building systems and infrastructure to enforce usage guidelines.
  • Mitigate unintentional harm by acting proactively and documenting weaknesses and vulnerabilities.
  • Collaborate with stakeholders, build teams with diverse backgrounds, and publicly disclose lessons learned. 
OpenAI’s complete list of best practices for deploying language models can be found on their website. These apply to any organization deploying large language models and have been compiled to address the “global challenges presented by A.I. progress.”
Miles was gracious enough to explore the deployment of each model in detail with us. In describing the process, we learned of the many steps taken to ensure the safety of each one. With the DALL-E model, for example, we discovered that the training dataset was thoughtfully pruned to exclude potentially offensive imagery.
Now, it wouldn’t be a complete episode on A.I. safety and policy if we didn’t address the existential risks associated with A.I. “There’s no shortage of [them],” Miles admits. Still, to better grasp the complexities involved, he explains the overlapping fields that address this growing problem: A.I. safety, A.I. policy, and A.I. alignment.
Tune in to hear Miles dive deeper into A.I.-related topics, including how A.I. will likely augment more professions than it displaces.
In this episode you will learn:    
  • Miles’ role as Head of Policy Research at OpenAI [4:35]
  • OpenAI’s DALL-E model [7:20]
  • OpenAI’s natural language model, GPT-3 [30:43]
  • OpenAI’s automated software-writing model Codex [36:57]
  • OpenAI’s CLIP model [44:01]
  • What sets AI policy, AI safety, and AI alignment apart from each other [1:07:03]
  • How A.I. will likely augment more professions than it displaces [1:12:06]
 

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 597 with Dr. Miles Brundage, head of policy research at OpenAI. 
Jon Krohn: 00:00:10
Welcome to the SuperDataScience podcast, the most listened to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple. 
Jon Krohn: 00:00:42
Welcome back to the SuperDataScience podcast. We’ve got an extremely rich episode for you today with Miles Brundage, a world-leading researcher on and practitioner of artificial intelligence policy. Miles is head of policy research at OpenAI, one of the world’s top AI research organizations. He’s been integral to the rollout of OpenAI’s groundbreaking front page news models such as GPT-3, DALL·E and Codex. Previously, he worked as an AI policy research fellow at the University of Oxford’s Future of Humanity Institute. He holds a PhD in the human and social dimensions of science and technology from Arizona State University. 
Jon Krohn: 00:01:17
Today’s episode should be deeply interesting to technical experts and non-technical folks alike. In this episode, Miles details considerations you should take into account when rolling out any AI model into production, specific considerations OpenAI themselves considered when they were rolling out their revolutionary GPT-3 natural language model, their mind-blowing DALL·E artistic creativity models, their software-writing Codex model, and their image classification model CLIP which bewilderingly predicts classes of images that it was not explicitly trained to be able to identify. In addition, he fills us in on differences between the related fields of AI policy, AI safety, and AI alignment. And he provides us with his thoughts on the risks of AI displacing versus augmenting humans in the coming decades. All right, are you ready for this fascinating episode? Let’s go. 
Jon Krohn: 00:02:15
Miles, welcome to the SuperDataScience podcast. I’m so excited to have you here. Where are you calling in from? 
Miles Brundage: 00:02:21
I am in Berkeley, California. 
Jon Krohn: 00:02:23
Nice. I have actually never been to Berkeley, California, but my understanding is that it’s a beautiful place pretty much year round, right? 
Miles Brundage: 00:02:29
Yeah, it’s really nice. And yeah, a lot of folks who work in SF live in the East Bay, which includes Berkeley. And yeah, it’s kind of like a more spread-out version of Downtown SF, but you can get in and out pretty easily. 
Jon Krohn: 00:02:43
Nice. Well, I look forward to checking it out someday. So we had not met kind of formally before. We were introduced explicitly for the purposes of having an amazing SuperDataScience interview. So I kind of know you indirectly through Melanie Subbiah, who is on episode number 559. So she used to work at OpenAI. She was one of the first authors on the GPT-3 paper. We had an amazing episode with her. If listeners want to hear a ton about GPT-3, an algorithm that we will be talking about a fair bit in this episode, that episode was number 559. And so through Mel, I got talking to OpenAI and seeing how they could help me with some things like getting DALL·E 2 images into a TEDx Talk that I was giving recently. And then I was also interested to see if there were some amazing speakers from OpenAI that could enlighten the SuperDataScience audience. And Miles, you’re my first victim. 
Miles Brundage: 00:03:49
Hope to not disappoint. 
Jon Krohn: 00:03:51
I am confident that you won’t. So OpenAI is renowned for being one of the top handful of AI research centers on the planet. So there are really big names that have been involved with it like Ilya Sutskever, Peter Thiel, Ian Goodfellow. And it was founded by really well-known names like Elon Musk and Sam Altman, the former president of Y Combinator. There are lots of front-page news innovations that come out of OpenAI like Gym for reinforcement learning research, the DALL·E models that we’ll be talking about in this episode for generating art based on natural language inputs, and again the GPT-3 models. So it must be amazing to be working at OpenAI. You’re the head of policy research there. What is it like working there and what does it mean to be the head of policy research? 
Miles Brundage: 00:04:43
Yeah. So first, just to say a little bit more about OpenAI as a company, one of the things that we do and how we try to realize our mission of making sure that AGI benefits all of humanity is we incrementally deploy a series of increasingly powerful technologies. So we started with GPT-3 and more recently we’ve deployed Codex models, embeddings models for both text and code. And that has sort of allowed us to see how people are using these technologies and use the API as kind of a mechanism for governing how these technologies are used and prohibiting certain harmful use cases. And over time, we get more comfortable that we’re able to manage these risks, and we’re currently going through that cycle with DALL·E 2. So we’re kind of gradually opening up to a larger and larger number of users as we get comfortable that we’re able to do that in a safe fashion. And that kind of helps us think through what are the social impacts of these technologies and what are they actually being used for. 
Miles Brundage: 00:05:42
And so that kind of leads me to policy research. So policy research is a team within OpenAI. You can kind of think of it as an internal think tank of some sort. Basically, my team tries to understand those social impacts that I mentioned. So we use a wide variety of methods including red teaming and working directly with researchers outside the company and internally at the company on the product side and on the more basic research side to understand how could these technologies be used in harmful ways both unintentionally and intentionally, so both things like bias as well as things like disinformation, and how do we mitigate those through better training data, through fine tuning them with human feedback, through things like content filters and other sort of product policies that prevent them from being used in context for which they’re not well suited. So we kind of try and build this evidence-base that then informs our product policy efforts as well as our efforts to talk to governments and other stakeholders. 
Jon Krohn: 00:06:44
Nice. And with the massively impactful models that OpenAI builds and the front page news that they all seem to generate when they come out, it seems like a really important role to be making sure that these technologies are, as much as you can control for, not misused. So let’s talk about those on a model by model basis. 
Miles Brundage: 00:07:09
Sure. 
Jon Krohn: 00:07:10
You mentioned that you’re currently going through this process with DALL·E 2, so maybe that is a model that we can start with. So I did do a Five-Minute Friday episode on DALL·E 2. It’s episode number 570 if listeners want to hear kind of five minutes or more on what the DALL·E 2 algorithm is. I could introduce it, but I think, Miles, you could probably do a better job than me introducing what it is. 
Miles Brundage: 00:07:36
Sure. So DALL·E 2 is a system for generating images. It’s probably best known for taking in text and spitting out an image, so you say a man walking a dog in the park on a rainy day or something like that and it spits out an image of that, but it can also be used in other ways like creating a variation on an image. So you give it an image like the one I just described and then it’ll produce 10 different versions that have the kind of same vibe, same energy as the original one but with some variations. You can also use it to modify an image. So for example, you kind of put the brush over the person’s head and then you say, “Oh, well, it’s like Pikachu. It’s not a person or something.” And then it’ll modify the image accordingly as you give it a new description. 
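To make the three modes Miles describes concrete, here is a minimal sketch using the image endpoints of the openai Python client as they were later exposed publicly; treat the exact parameter names and values as illustrative rather than authoritative, and check current documentation before relying on them.

```python
# Sketch of the three DALL-E-style image operations described above, using the
# openai Python client's image endpoints (v0.x-era interface). Parameter names
# are illustrative; consult current API docs before use.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# 1. Text-to-image: a prompt in, candidate images out.
generation = openai.Image.create(
    prompt="a man walking a dog in the park on a rainy day",
    n=4,
    size="1024x1024",
)

# 2. Variations: new images with the same "vibe" as a source image.
variations = openai.Image.create_variation(
    image=open("park_walk.png", "rb"),
    n=4,
)

# 3. Edits (inpainting): mask a region and describe what should replace it.
edit = openai.Image.create_edit(
    image=open("park_walk.png", "rb"),
    mask=open("person_head_mask.png", "rb"),  # transparent where the edit goes
    prompt="the same scene, but with Pikachu in place of the person",
)

print(generation["data"][0]["url"])
```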
Miles Brundage: 00:08:27
So it’s this very flexible and general purpose technology for generating images that raises a lot of interesting implications for creativity, for economic impacts, and potentially some risks as well. So my team has been very involved in trying to understand those issues. We worked with a lot of external red teamers and researchers to understand what are the biases that might reflect the distribution of the training data, like a bias towards generating images of men, for example, when you don’t specify the gender. 
Jon Krohn: 00:09:01
Right. 
Miles Brundage: 00:09:01
Those kinds of issues are things that we can then take back and tweak the model, tweak the training data, tweak the way that it’s kind of being served and improve. Similarly, things like filters on kind of being able to type in Donald Trump or Barack Obama or something like that, those kinds of potentially adversarial prompts that might be intended to create political misinformation or something like that, those are the kinds of policy issues that we have to think through and have that be grounded in an understanding of what the technology’s actually capable of. 
Jon Krohn: 00:09:35
I think I read a while ago that one of the main ways that OpenAI has tried to prevent misuse of the DALL·E 2 algorithm is by preventing real people from being generated, right? 
Miles Brundage: 00:09:50
Exactly. Yeah. So there are kind of two levels to that. One is sort of having the system not kind of know about specific people’s faces with enough resolution to generate them. So it just isn’t good at generating sort of specific people’s faces. That’s one level of intervention. And then there’s also, do you allow people to upload photos and then kind of modify those images? 
Jon Krohn: 00:10:17
Ah, okay. 
Miles Brundage: 00:10:19
Our policies have kind of evolved over time as we’ve better understood what the risks are. At first, we allowed people to kind of upload a picture of themselves and then kind of modify it, but eventually we kind of concluded that that’s actually one of the more risky parts of the kind of surface because what if someone kind of uploads a picture of someone other than themselves, and then- 
Jon Krohn: 00:10:42
Of course. 
Miles Brundage: 00:10:42
… portrays them in a bad situation? 
Jon Krohn: 00:10:44
Right. 
Miles Brundage: 00:10:45
So currently, we’re allowing people to generate realistic images of imaginary people, but not either modify or generate images of real people. We think that that kind of maps on well to what the technology is actually useful for, which is kind of creating new images, trying out new concepts, as opposed to kind of modifying images of people which is probably better suited for something like Photoshop anyway. 
Jon Krohn: 00:11:13
Right. There’s also restrictions on things like… I guess, there were particular classes of images that OpenAI might have tried to restrict from being in the training data. So things like violent situations or pornographic situations. 
Miles Brundage: 00:11:29
Yeah. I think a lot of making a platform like DALL·E 2 safe and scalable to a large number of users is best done early on. So if you kind of bolt on safety at the end, it’s going to be pretty difficult. But if from the early stages it just doesn’t know about sexual imagery, it just can’t generate an image of a nude person, then you don’t have to be as paranoid about, okay, what prompts is someone going to put into it? What kind of adversarial… Because it just isn’t in the model’s kind of ontology. It just doesn’t know about those things. And we found that filtering for both sexually explicit and violent images is part of making our job easier down the road. 
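The “filter it out before training” point can be sketched in a few lines; the classifier below is a hypothetical stand-in (OpenAI has not published its exact filtering pipeline), but the pattern of scoring each candidate and dropping anything flagged before pre-training is the general idea.

```python
# Hypothetical sketch of pre-training data filtering: drop images a safety
# classifier flags as sexual or violent BEFORE the model ever sees them,
# rather than bolting a filter onto the deployed system afterwards.
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Candidate:
    image_path: str
    caption: str


def unsafe_score(candidate: Candidate) -> float:
    """Toy stand-in for a trained safety classifier (hypothetical): flags
    captions containing blocked terms. A real pipeline would score the image
    pixels as well, not just the caption text."""
    blocked = {"nude", "gore", "graphic violence"}
    caption = candidate.caption.lower()
    return 1.0 if any(term in caption for term in blocked) else 0.0


def filter_training_data(
    candidates: Iterable[Candidate], threshold: float = 0.5
) -> Iterator[Candidate]:
    """Yield only the candidates the safety classifier scores below threshold."""
    for candidate in candidates:
        if unsafe_score(candidate) < threshold:
            yield candidate


if __name__ == "__main__":
    data = [
        Candidate("img_001.png", "a man walking a dog in the park"),
        Candidate("img_002.png", "graphic violence in a war scene"),
    ]
    kept = list(filter_training_data(data))
    print(f"kept {len(kept)} of {len(data)} candidates")
```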
Jon Krohn: 00:12:14
Yeah. Those sound like really smart restrictions. 
Jon Krohn: 00:12:20
This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience. It’s the namesake of this very podcast. In our platform you’ll discover all of our 50 plus courses which together provide over 300 hours of content with new courses being added on average once per month. All of that and more you get as part of your membership at SuperDataScience. So don’t hold off. Sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level. 
Jon Krohn: 00:13:00
So now this is maybe getting a little bit philosophical because this is going beyond OpenAI. Okay. So OpenAI can be really thoughtful about the way that they release models. They have a policy team that you’re a part of. And they restrict violent or pornographic images from being in the training data set. But doesn’t it seem like it’s kind of inevitable that at some point some other organization, whether it’s a nefarious organization or not, could train a similar kind of model using any kind of images in the training data set? 
Miles Brundage: 00:13:40
Yeah. Indeed there are people trying to do just that and to kind of have a no-rules version of DALL·E 2. I mean, two comments. One is just, one way that you might think about this issue is that there are kind of three axes. There’s the capabilities of a model, there’s who has access to it, and then there’s the extent to which there are mitigations against misuse. Right now for DALL·E 2, we’re kind of rapidly growing access to it and it has very strong capabilities, but there are also pretty strict restrictions in place. There are also various other models that have basically no restrictions, because for example, the weights are just published and anyone can use them for arbitrary purposes. There might be some licenses and rules, but in practice there’s no way to prevent people from doing arbitrary things with it. 
Miles Brundage: 00:14:31
But on the other hand, the capabilities aren’t as strong as something like DALL·E. So very high on access, low on mitigations, but lower on capabilities than DALL·E 2. Eventually, there will be things that are sort of high on access and capabilities and have limited mitigations. We need to prepare for that as a society. Similarly, the same is true of language models. So there’s kind of a compute moat and a kind of engineering skill moat that a lot of leading labs like OpenAI and DeepMind and Google and others kind of had for a while that created this kind of lead where there was only a small number of places where you could access large language models. And over time the sort of open source versions are getting more and more capable and the restrictions are kind of breaking down. 
Miles Brundage: 00:15:23
So I think all of that is to say that you need to think ahead, and this is part of why we need sort of public policy and why OpenAI itself also has a public policy team to talk to governments about where this is going and why we need to think ahead about these things. But I think we shouldn’t overstate the ease of creating these models. I think it is only a matter of time, but sometimes it can be longer than it seems like. 
Jon Krohn: 00:15:50
Right. 
Miles Brundage: 00:15:50
Because even if you’re not kind of super concerned about sexual imagery for example as a source of risk, there’s still a lot of work just involved in creating high quality data and getting the engineering of the training runs right. So sometimes it takes a bit longer than you might think. But yeah, I think big picture, we need to prepare for a world of increasingly capable, increasingly available, and often guardrail-free AI systems, which is scary. But I think hopefully by kind of taking advantage of the lead that OpenAI and other organizations have, we can often show that there are actually benefits to having these guardrails in place and that sometimes it’s actually a win-win scenario to kind of build models that are safe. 
Miles Brundage: 00:16:41
So an example of this might be on the language model side, we have shown that teaching models to follow human instructions and kind of be helpful to the user is both a win from a safety perspective because it doesn’t just like go off the rails and spew garbage or make stuff up or whatever. It’s actually trying to solve a problem that you posed in natural language. That’s also beneficial from a commercial perspective, because it’s just easier to use if you have something that’s following your instructions as opposed to having to come up with kind of a complex way of prompting it and giving it kind of roundabout instructions. So all that is to say that I think hopefully the kind of commercial and kind of research utility of these kind of safety mitigations will cause them to be widely adopted, but we shouldn’t necessarily count on that. And that’s why we need governments to be thinking about these things. 
Jon Krohn: 00:17:33
Yeah. I think it’s extremely fortunate that organizations like OpenAI that are on the vanguard of these enormous language models, models with such enormously wide capabilities that even the people who design them don’t know their total breadth, are also taking a lead in policy. I think to your point, because you guys are being so thoughtful about it, it maybe gives you a couple of years’ head start before a much smaller, nefarious organization is able to generate imagery with the same quality as DALL·E 2. But hopefully by the time that that comes around, the policy work that you and your team are doing as well as the public policy team can inform government sufficiently and other organizations sufficiently that we’re at least a little bit better prepared. 
Miles Brundage: 00:18:33
Yeah, for sure. I would just add that while I’m proud of my colleagues and all the hard work we’re doing, I would also note that OpenAI is not alone. This is kind of an industry-wide issue that a lot of labs that want to do the right thing are grappling with. And just recently OpenAI, Cohere, and AI21 Labs put out a statement suggesting a set of best practices for language model deployment. That was also signed on to by a few other organizations like Microsoft and Google Cloud and others. So I think the kind of goal of those kinds of efforts is to share best practices with one another and with the wider public so that new entrants into the marketplace aren’t reinventing the wheel and that there’s some kind of gold standard that the public and customers and so forth can kind of judge people against. And eventually that hopefully will inform government steps, which I think are coming but sometimes take a while. 
Jon Krohn: 00:19:35
Yeah. Miles, I’m trying to give you tons and tons of credit here, but then you’re modestly diffusing it around to lots of other people and organizations. No, that’s really great. And actually, that’s something that we can talk about. We were going to talk about it later in the episode, but now that you brought it up, we can talk about this blog post. So it’s called Best Practices for Deploying Large Language Models. Would you like to dig into the details of some of these best practices for us? 
Miles Brundage: 00:20:00
Yeah. First, just some context. So this kind of emerged from some discussions among OpenAI and other labs like Cohere and AI21 Labs. There were some discussions in the fall of 2021 around a workshop on language model disinformation. So folks from each of these labs were kind of presenting and talking about what are the kind of misuses we’re seeing in the wild and how do we prevent these issues. We also noticed that there was kind of this informal conversation going on among the different rules and policies of the different organizations. So we have our own use case policies and AI21 Labs has their terms of use and Cohere has their safety documentation. These all kind of have informed each other and we’ll kind of like read each other’s documents and be like, “Oh, okay. Well, do we consider that? How similar is their framing of hate speech to our framing of hate speech?” There’s this kind of informal dialogue going on and we kind of saw that there was something that could be crystallized.
Miles Brundage: 00:21:05
What we eventually landed on is this kind of three-part statement. So we enumerated kind of high level principles as well as specific practices in three different areas. The first is preventing intentional misuse. So there are some folks that just want to generate spam and political content and other sorts of things that might violate our terms of use. 
Miles Brundage: 00:21:27
So there are specific details there, where you can check out the actual statement itself. But big picture, I’d say the point of that section was to convey that there should be teeth. It shouldn’t just be, “Well, we don’t like disinformation.” There’s no point in having policies if you’re not going to enforce them. So you might have things like rate limits to prevent people generating text at a much faster rate than a human could actually process. Or you might want to have an approval process for new applications. Like in order to get your quota bumped up from a million tokens a month to 10 million or something like that, you might need to say, “Okay, this is what I’m planning to use it for. And these are the safety mitigations I have in place.” So things like that I think can help kind of flesh out this idea of preventing misuse. 
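As a concrete illustration of policies with teeth, here is a minimal token-bucket rate limiter of the kind a provider might put in front of a generation endpoint; the quota numbers and structure are made up for illustration and are not OpenAI’s actual enforcement code.

```python
# Illustrative sketch (not OpenAI's actual infrastructure): a token-bucket
# rate limiter that caps how much text a single API key can generate per
# minute, so abuse cannot outrun human review.
import time


class TokenBucket:
    def __init__(self, tokens_per_minute: int):
        self.capacity = tokens_per_minute
        self.available = float(tokens_per_minute)
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed_minutes = (now - self.last_refill) / 60.0
        self.available = min(self.capacity, self.available + elapsed_minutes * self.capacity)
        self.last_refill = now

    def allow(self, requested_tokens: int) -> bool:
        """Return True and deduct the tokens if the request fits the quota."""
        self._refill()
        if requested_tokens <= self.available:
            self.available -= requested_tokens
            return True
        return False


# Arbitrary illustrative quota; a quota increase would go through an
# application/approval process rather than being automatic.
bucket = TokenBucket(tokens_per_minute=10_000)
print(bucket.allow(500))     # True: a small request goes through
print(bucket.allow(50_000))  # False: a burst far beyond the quota is refused
```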
Miles Brundage: 00:22:19
The second cluster of topics was around unintended harms. So things like biased applications of models where it might kind of make decisions in a discriminatory fashion, or it might be overly relied upon where people don’t realize that the language model might make things up sometimes or sound more confident than it actually should be in some of the outputs. So kind of clear documentation and guardrails to prevent these kind of unintended harms is another part of the equation. 
Miles Brundage: 00:22:51
And then finally working with the wider… The third bucket is working with the wider set of stakeholders in this kind of AI supply chain. So there’s kind of people who are affected by the technologies who need to have their voices heard. There’s having a diverse set of experts on your team who can kind of point out bugs in your systems in ways in which it might disproportionately harm certain communities. And then there’s thinking about the labelers and having sort of ethical standards for producing the kind of labels that you might use for fine tuning a model for example, or for cleaning data and all of those sorts of things. So being respectful not just of the kind of people who you’re directly working with, but this kind of wider ecosystem and sharing notes and best practices. So the third bullet is kind of meta, but engaging in this larger conversation with respect is the third thing that we talked about. 
Jon Krohn: 00:23:49
That reminds me of something that some organizations are promoting. I can’t remember now off the top of my head if OpenAI is one of those, but it’s the idea of having cards that represents- 
Miles Brundage: 00:24:00
Mm-hmm. 
Jon Krohn: 00:24:00
Yeah. Is that [inaudible 00:24:02]? 
Miles Brundage: 00:24:01
The model card. Yeah, I’m not sure it’s like a kind of formal… It’s more of like a thing that has gotten traction. I’m not sure how formalized it is as like a group of people, but it started at Google a couple years ago. So Meg Mitchell et al. had a paper on, I think it was model cards for model reporting or something like that was the title. 
Jon Krohn: 00:24:22
Right. 
Miles Brundage: 00:24:23
They kind of said, “Okay, this is a rough kind of rubric for documenting the limitations and biases and kind of properties of your models.” Since then, that’s kind of evolved and speciated in various ways. Sometimes people will have one certain kind of model card on GitHub and another kind in documentation. Sometimes it won’t be called a model card, but it’ll be functionally the same idea of clearly documenting the limitations of your system. 
Jon Krohn: 00:24:51
Right. 
Miles Brundage: 00:24:51
More recently with DALL·E 2, we use the term system card to refer to the kind of documentation that we put out for the system as a whole, not just the model itself, because the system includes prompt filters and image filters and rate limits and so forth. In that case, we were kind of borrowing a term from, I believe, Facebook’s system card concept, if I’m recalling correctly. So yeah, it’s kind of an evolving kind of meme, but I think the gist of it is that you need to equip people with the information that they need in order to make responsible decisions about when is this model actually appropriate in a certain context. 
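The gist of a model or system card is a structured, human-readable summary of what a system is and is not for. Here is a loose sketch of the kinds of fields involved, adapted from the Model Cards for Model Reporting framing and the DALL·E 2 discussion above; the schema is illustrative, not OpenAI’s or Google’s official template.

```python
# Loose, illustrative sketch of the fields a model/system card documents.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SystemCard:
    name: str
    intended_uses: List[str]
    out_of_scope_uses: List[str]   # uses the provider explicitly disallows
    known_limitations: List[str]   # biases, failure modes, training-data gaps
    mitigations: List[str] = field(default_factory=list)  # filters, rate limits, policies


card = SystemCard(
    name="Text-to-image generation system (illustrative)",
    intended_uses=["creative exploration", "concept art from text prompts"],
    out_of_scope_uses=[
        "generating or editing images of real people",
        "political misinformation",
    ],
    known_limitations=[
        "defaults to generating images of men when gender is unspecified",
        "reflects Western-centric concepts in its training data",
    ],
    mitigations=[
        "training data filtered for sexually explicit and violent imagery",
        "prompt and upload filters",
        "rate limits",
    ],
)
print(f"{card.name}: {len(card.mitigations)} documented mitigations")
```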
Jon Krohn: 00:25:31
It’s very kind of loosely like the nutrition facts on a box of cereal it looks like. 
Miles Brundage: 00:25:35
Exactly. Yeah, yeah, yeah. Some people made that analogy very explicit and like… Yeah. Yeah. Exactly.
Jon Krohn: 00:25:42
So I guess the kinds of things that a system card for DALL·E 2 would cover would be you won’t be able to generate the image of a real person. That would be something that would be explicit on there. And yeah, I don’t know. Maybe if there are more concrete examples that you can provide? 
Miles Brundage: 00:26:06
Yeah. So some of the system card is explaining what we have done. Like, what are the mitigations that are actually put in place? Some of which I think is valuable for the user to know, but also another audience we had in mind is people who are building similar systems who might be thinking about how do we do this in a responsible way, so it kind of provides a cookbook of ways to approach this. We also talked about the process that we went through for the analysis. So we tried to be transparent about both the strengths as well as the limitations of the red teaming process that we went through. And we provide a kind of current snapshot as of a few months ago of these are the biases we’re seeing, these are the ways in which it could be misused and kind of efforts to guard against those, but try to be humble about the fact that this is a general purpose technology in a lot of respects and we’re still continuing to learn. 
Miles Brundage: 00:26:57
And that’s why, as I mentioned earlier, we’ve kind of gradually updated, in some cases, our policies on things like how do we treat human faces and so forth because before you actually see how people are using the technology, it’s hard to make some of those judgements. 
Jon Krohn: 00:27:12
Yeah. I, just yesterday at the time of filming, came across this thing from Dallery Gallery, the DALL·E 2 prompt book. 
Miles Brundage: 00:27:22
Yeah. 
Jon Krohn: 00:27:24
I think it’s being updated regularly. But at the time of filming, it’s this 82-page guide to prompt engineering for DALL·E 2. I will be sure to include a link to that in the show notes because I think it’s the most comprehensive prompt engineering guide that I have ever seen for any model. The idea being that, unlike a system where the designers know all of the constraints and all of the things that their system can do, with a large language model based system like DALL·E 2, the designers can’t know what the full range of possible outputs could be. And so there’s this art to engineering prompts effectively. And so this 82-page guide, it’s really dense. Some individual pages have a dozen kinds of prompt category suggestions. And yeah, I can’t now remember exactly why I started talking about it. 
Miles Brundage: 00:28:29
Well, it’s related to the point of this being an evolving landscape. 
Jon Krohn: 00:28:33
Right. Right. 
Miles Brundage: 00:28:34
It’s hard to say in advance what are all the capabilities, what are all the beneficial and harmful uses. I think we’ve gone through the same evolution with language models, where at the beginning we were like, “Okay, you can maybe put a few things in a row, like question, answer, question, answer, and it’ll kind of follow that pattern.” Eventually, we fine-tuned models to explicitly follow instructions. So you don’t need to do the question, answer, question, answer thing. You just ask the question and it answers it. Or say, “Answer the following question” and you just kind of give it explicit instructions. 
Miles Brundage: 00:29:09
With DALL·E, I think we’re still exploring that surface. Even internally at OpenAI, it took us a while to kind of figure out how to think about the capabilities. And now we’re learning a lot from the wider public. So I think that’s part of why taking an iterative approach to deployment instead of just going from zero to a million users overnight, but rather kind of gradually broadening out as you learn more and kind of being able to throttle a certain usage or update the prompt filters and so forth as you learn more is very important.
Jon Krohn: 00:29:44
Yeah. A kind of cool example illustrates how these prompts can end up being engineered in a way that perhaps isn’t obvious, but as soon as you figure out the trick, it’s something that you might want to repeat a lot. So if you put 4K in your query, in your input, then you’re more likely to get a high-resolution-looking image back, perhaps even more so than if you write high resolution. So there are these kinds of tricks where you figure out, “Oh, they’re somewhat intuitive once you’re told about them.” And so it’s cool that people are compiling these into these kinds of books. 
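The 4K trick generalizes: prompt guides like the one Jon mentions are largely catalogs of style and quality modifiers appended to a base description. A tiny illustration, with the specific modifiers chosen only as examples of the kind of phrases such guides collect:

```python
# Tiny illustration of modifier-style prompt engineering: the same base
# description combined with quality/style suffixes of the kind the DALL-E 2
# Prompt Book catalogs. The specific modifiers here are just examples.
base = "a lighthouse on a cliff at sunset"
modifiers = ["4K", "high resolution", "35mm photograph", "digital art", "watercolor"]

prompts = [f"{base}, {modifier}" for modifier in modifiers]
for prompt in prompts:
    print(prompt)
```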
Jon Krohn: 00:30:26
All right. So we’ve now talked about DALL·E for a while. I promised the listener many moons ago, it feels like now, that we would be going through a bunch of different models. But I think that a lot of what we’ve been talking about with respect to DALL·E 2 is the kind of stuff that applies to these other models anyway. Certainly, the GPT models. So you were involved not only in GPT-3, but also its predecessor, GPT-2. Is that right? 
Miles Brundage: 00:31:00
Mm-hmm. Yep, that’s right. 
Jon Krohn: 00:31:03
So, as I mentioned at the onset of the episode, if listeners want a really detailed two-hour conversation about GPT-3, check out episode number 559 with Melanie Subbiah. But Miles, I don’t know if you want to give us just a couple-minute intro to this model that probably most listeners have heard of, but… 
Miles Brundage: 00:31:22
Yeah, sure. And yeah, just briefly, I was involved in GPT-2 and GPT-3. There were somewhat different kinds of challenges in the case of GPT-2. It was about how do we responsibly publish this. We didn’t yet have an API or any kind of product. So it was kind of like, how do we do research and publication thereof in a responsible fashion? For GPT-3, we eventually deployed it via an API. That kind of in some sense made it easier to kind of limit potential harms. You can kind of build monitoring infrastructure and only allow certain use cases as opposed to just releasing the model. But it also changes the nature of the problem because people are starting to build whole applications on top of it and you have to start thinking in very granular detail about, “What’s our chat bot policy? When is a medical use high risk?” et cetera. 
Miles Brundage: 00:32:17
We had to work through a lot of issues there. But yeah, so big picture, GPT-2 and GPT-3 are pretty similar to one another. They’re both transformer-based language models. They both take in kind of text and output text. Or you can just kind of unconditionally sample and get kind of text that the model thinks is similar to what it was trained on, which is a variety of texts from various sources. We found with GPT-2 and then to a much greater extent with GPT-3, that as a result of learning from a very wide variety of texts on the internet, these models become capable of being prompted to perform a wide range of tasks. 
Miles Brundage: 00:33:03
So the original GPT-2 paper was called something like Language Models are Unsupervised Multitask Learners. That was meant to convey the idea that on the internet there’s chat, there’s question answering, there’s kind of news, there’s poetry, there’s all this kind of stuff. And as a result of just soaking in a bunch of language, you learn to perform a bunch of different tasks. 
Miles Brundage: 00:33:25
And then the GPT-3 paper was called Language Models are Few-Shot Learners. And the idea there was that in addition to just soaking up a bunch of knowledge and then being able to kind of spit out texts that performs a bunch of tasks, you can also learn new tasks by being given a few examples in the context window. So basically example one, example two, and example three, and it will be able to do a better job after 10 examples than after two examples. So it’s kind of learning online in some sense. And so that was a really exciting finding. And as a result of both of those findings, there’s been a huge uptick in language model research at various organizations, as well as deployment. We’ve subsequently gone even beyond GPT-3 with things like Codex. 
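Few-shot prompting, as Miles describes it, is literally just packing labeled examples into the context window ahead of the new input. A minimal sketch against the GPT-3-era Completions endpoint; the model name and parameters are illustrative, so check current API documentation before use.

```python
# Minimal sketch of few-shot ("in-context") learning with a GPT-3-style
# Completions endpoint: the task is defined by examples in the prompt,
# not by any gradient updates. Model name and parameters are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The food was delicious and the staff were lovely.
Sentiment: Positive

Review: We waited an hour and the soup was cold.
Sentiment: Negative

Review: Honestly one of the best evenings out I've had all year.
Sentiment:"""

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=few_shot_prompt,
    max_tokens=1,
    temperature=0.0,  # near-deterministic: we just want the label
)
print(response["choices"][0]["text"].strip())  # expected: "Positive"
```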
Jon Krohn: 00:34:12
Yeah. It has brought about a new dawn for sure. It has been the talk of the town and one of the models that probably comes up the most on this program, certainly making a big impact. And so I guess, similar to the conversation that we had around DALL·E 2 and how you’ve been careful about the rollout of DALL·E 2, careful about what training data went into DALL·E 2, I guess the same kinds of things have been happening with GPT-3. Yeah? 
Miles Brundage: 00:34:42
Yeah. So basically, as with each generation, we kind of get better and better at understanding what is high quality data, what are the risks of including certain kinds. So in kind of the most recent versions of our models, we’ve been more aggressive in doing things like filtering out potentially hateful speech or otherwise harmful kinds of content. I think, again, as I mentioned earlier with DALL·E and GPT-3, it’s kind of hard to anticipate all of these things in advance. Take, for example: right now there’s starting to be a burgeoning literature on sort of making language models truthful and honest. What are the sort of properties that underpin that and how does it relate to the training data and then how you fine tune it and how you prompt it? 
Miles Brundage: 00:35:36
I think we’ll continue to kind of get better and better both on the upstream side of the pre-training data and then the downstream side of fine tuning and actually getting the models to put their best foot forward. I mean, a kind of funny example of this is a recent paper from Google, I believe it was Google but it might have been elsewhere, where they put into the prompt for the language model something like, “Let’s take this step by step. Or let’s think through this step by step carefully.” Basically just telling the language model to kind of think carefully, kind of made it try harder in some sense. So I think we still don’t really know exactly what’s going on under the hood there. I think in some sense the kind of capabilities and applications that we’re already seeing are a lower bound on what’s possible even with the existing models. 
Jon Krohn: 00:36:33
Right. [inaudible 00:36:33]. 
Miles Brundage: 00:36:33
Just by getting them to perform better, let alone improve models. 
Jon Krohn: 00:36:40
Yeah. Do better. 
Miles Brundage: 00:36:41
Exactly. 
Jon Krohn: 00:36:42
Stop being so lazy. 
Miles Brundage: 00:36:44
Yep. 
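The “step by step” trick Miles mentioned a moment ago is equally literal: append a nudge to reason before answering and compare the outputs. A brief sketch, again with an illustrative model name and parameters:

```python
# Sketch of the "let's think step by step" prompting trick: the same question
# asked with and without a reasoning nudge appended. Model name is illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

question = (
    "A juggler has 16 balls. Half of the balls are golf balls, and half of "
    "the golf balls are blue. How many blue golf balls are there?"
)

for suffix in ("", " Let's think step by step."):
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=f"Q: {question}\nA:{suffix}",
        max_tokens=150,
        temperature=0.0,
    )
    print(response["choices"][0]["text"].strip())
    print("---")
```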
Jon Krohn: 00:36:46
That’s really cool. So you mentioned a couple minutes ago about Codex. So you mentioned that word. And so let’s jump to that model. Codex is an extension of GPT-3 in a way, but it takes advantage of a large training data set of programming language data. So GPT-3 also has a small amount, I think, of programming language training data, right?
Miles Brundage: 00:37:13
That’s right. 
Jon Krohn: 00:37:13
In addition to being mostly natural language. But Codex is primarily programming language data. 
Miles Brundage: 00:37:21
Yeah. So essentially what happened is, with GPT-3 there were kind of signs of life, so to speak, that there might be something interesting going on the code front. And that in particular, there might be something interesting about this kind of comment and then code structure where if you give it enough examples of the comment and the documentation, and then the code followed by it, it can kind of learn to turn one into the other. And in fact, going the opposite direction of kind of explaining code and documenting code is something that people have more recently been turning Codex towards. But the original idea was just how do we get it to solve programming problems given natural language input. So we built a very big data set and we did a lot of very high quality engineering to kind of squeeze the most out of GPT-3 models by fine tuning them on this code. 
Miles Brundage: 00:38:16
And then we also explored various ways of sampling these models. What if you run them 10 times, 100 times on a given prompt? How does that kind of increase the likelihood of one of the answers being correct? And then what if you kind of generate a bunch and then ask the model which one do you think is likely to be correct? So those kinds of explorations of the surface of these models have led to very impressive outcomes. And eventually we made Codex models available via the API. Some of the listeners might be familiar with GitHub Copilot, which is one of the most well known, probably the most well known product that’s built on top of this family of models. 
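The sample-many-and-pick idea is straightforward to sketch: draw several completions at non-zero temperature, then keep the first candidate that passes whatever check you have, here a small unit test. The model name and evaluation step are illustrative of the Codex-era API, not a canonical recipe.

```python
# Sketch of best-of-n sampling for code generation: ask a Codex-style model for
# several candidate completions and keep the first one that passes a test.
# Model name, stop sequences, and the test itself are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = '''def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards, ignoring case."""
'''

response = openai.Completion.create(
    model="code-davinci-002",
    prompt=prompt,
    n=10,             # sample 10 candidate completions
    temperature=0.8,  # diversity matters when picking the best of n
    max_tokens=100,
    stop=["\ndef ", "\nclass "],
)


def passes_tests(candidate_body: str) -> bool:
    """Run the candidate against a tiny test suite in an isolated namespace."""
    namespace: dict = {}
    try:
        # CAUTION: exec of model output is fine for a local experiment,
        # but never do this with untrusted code in production.
        exec(prompt + candidate_body, namespace)
        fn = namespace["is_palindrome"]
        return fn("Level") is True and fn("OpenAI") is False
    except Exception:
        return False


best = next((c["text"] for c in response["choices"] if passes_tests(c["text"])), None)
print(best or "No candidate passed the tests.")
```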
Jon Krohn: 00:39:00
Yep. In episode 584, I go through some of my favorite applications that have been built on top of the Codex API. I think it’s super cool. I was blown away by some of the capabilities that this API has. I don’t know why I find it even more staggering in a way than DALL·E 2 or GPT-3. I think it’s because to me there’s so much about programming that is so hard for me. And so when I think to myself about, “Oh, making a drawing. Okay, that’s kind of hard” or “Coming up with a natural language response, that’s kind of hard.” 
Jon Krohn: 00:39:45
But being able to regurgitate the code to create a video game on the fly, it’s something that just seems really challenging to me. And there are really cool… I provide links in that episode, in episode 584, to specific demos, video demos of Codex being used to do things like create this video game where there’s like a spaceship trying to avoid getting hit by an asteroid I think if I remember correctly. All aspects of this interactive game that people can actually play are created by natural language prompts. It’s pretty mind blowing.
Jon Krohn: 00:40:26
So that’s a really cool thing about Codex. But from your perspective, Miles, in terms of policy, are there as many scary things about an algorithm that’s generating code as one that’s generating images or text? It just seems to me off the top of my head, and maybe there’s all kinds of scary things that I don’t think about that you think about, but with DALL·E 2, okay, it’s obvious to me right off the bat that this could be misused in lots of ways if you’re not careful about restricting what kinds of training data goes into it or what kinds of prompts are allowed. And same thing with GPT-3. Obviously there are kinds of hateful things that you could have GPT-3 outputting if people like you aren’t thoughtful about what the constraints are on model inputs and outputs. But with Codex, is there as much to be worried about? Am I just naive? 
Miles Brundage: 00:41:23
No, I think you’re… I mean, natural language in some sense is more general than code. It’s easier to think of ways that things can go wrong with natural language. I think there are a few things that you can think of for code, like generating malware and generating kind of malicious botnet code and stuff like that. 
Jon Krohn: 00:41:49
Right. 
Miles Brundage: 00:41:49
We did investigate that. We published a little bit about it in the Codex paper, but generally it didn’t seem to be like a huge deal at that level of capability, but it’s something we need to keep an eye on as models get more powerful. I think the bigger concern for me at least when it comes to code generation is not malicious use, but kind of reckless or naive use where people might rely on it too much and end up generating buggy or insecure code, which is also something we talked about in the Codex paper and something that folks working on GitHub Copilot are very attuned to and thinking carefully about. 
Miles Brundage: 00:42:34
It’s just kind of a whole terrain of kind of risks that we need to be very thoughtful about because it’s just not something we’ve been used to in the past. So there’s kind of always been an assumption that, “Oh, okay, if there’s this code sitting in front of me, it was written by a human, they probably had common sense and understood the natural language prompt correctly.” But some of those assumptions might not be correct. I mean, in the case of an AI, it might just have fundamentally misunderstood the natural language prompt in some way, or it might not be trying hard enough. Maybe it’s trying to simulate buggy code because it’s seen a lot of buggy code and something in your prompt made it think that you wanted buggy code. So I think there are definitely issues there. And people need to be vigilant about how they’re using these technologies. This is actually something that I’m currently starting to spin up a project on, across not just code, but more generally what is the best way to ensure that there’s appropriate human oversight of language models. 
Jon Krohn: 00:43:46
Codex, generate this next chunk of code for me without thinking about it step by step. 
Miles Brundage: 00:43:50
Exactly. 
Jon Krohn: 00:43:51
Rush to conclusions. 
Miles Brundage: 00:43:53
Yep. 
Jon Krohn: 00:43:54
Cool. All right. So that covers the DALL·E models. That covers the GPT models and Codex. The last model class that you and I discussed prior to recording, to getting this filming session actually going, is a model class that I’m least familiar with. And so I’d love to hear more about this. It’s a model class called CLIP. So Contrastive Language–Image Pre-training. And so in some way this seems to be somewhat like the DALL·E models in that it links images and language, but obviously it’s quite a bit different from the DALL·E models, or we wouldn’t be talking about it as a separate model class. 
Miles Brundage: 00:44:39
Yeah. So CLIP is a very interesting model. It’s been used in a lot of different ways. So first, just on the various ways in which it’s been used. There are actually some ways in which CLIP, even though it’s trained to recognize images rather than generate them, has sometimes been paired with systems that generate images, and then they kind of get in this kind of two-way interaction where the result is better images. So for example, you can use CLIP to “steer” a GAN or steer a diffusion model by kind of saying, “Okay, well, that kind of looks like Spider-Man according to my image recognition sense of Spider-Man.” And they kind of go back and forth and then output an image. 
Miles Brundage: 00:45:20
So there was a very interesting kind of surge of creative uses of CLIP and other models over the past year or so once we started publishing them. And then there are various kind of open source, kind of artistic and creative efforts there. But in terms of what the original kind of purpose of the model was, it was less about generation and more about recognition. So essentially, the way that the model works is, it’s contrastively trained, for those who have that context. But essentially what it does is it kind of compares the embedding of the text. So you give it some text like Spider-Man or a cat. It compares that to the kind of image that it’s kind of looking at and tries to figure out how close they are to each other and distinguish between different text prompts. 
Miles Brundage: 00:46:19
Basically, that allows you to create new classifiers on the fly. So instead of having a model that is trying to distinguish hot dog from not hot dog, or cat from dog, where you had to get this big data set of cats and a big data set of dogs, you can kind of just create one on the fly, like, “Okay, I want an indoor versus outdoor detector.” Since it has some basic understanding of the English language and it has seen a lot of indoor and outdoor kind of images, it will sometimes, zero-shot, do a pretty good job at these kinds of image recognition tasks. And that is just a very interesting kind of application. It can be used for things like captioning. It can be used for kind of providing an initialization for an even stronger model. You kind of use CLIP as a foundation and then fine tune it to be even stronger at like a two-way or N-way classification problem. 
Miles Brundage: 00:47:16
So basically we found that this particular way of training it led to a very strong foundation. And so it has sometimes been referred to, by people who use the term foundation model, as kind of like a foundation model for image recognition in the same way that GPT-3 might be for language generation. 
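Because CLIP was also released as an open-source research model, the zero-shot recipe Miles describes can be shown against the actual library: encode an image and a handful of free-form text labels, and the softmax over their similarities is the on-the-fly classifier. The labels below are the indoor/outdoor example from the conversation.

```python
# Zero-shot classification with the open-source CLIP release
# (pip install git+https://github.com/openai/CLIP.git plus torch and Pillow):
# no indoor/outdoor training set is needed; the "classes" are just text
# prompts made up on the fly.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
labels = ["a photo of an indoor scene", "a photo of an outdoor scene"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # Similarity logits between the image embedding and each text embedding,
    # softmaxed into a probability over the made-up classes.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2f}")
```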
Jon Krohn: 00:47:35
Yeah. I totally get it now. It’s a super cool concept. The idea being, to summarize back what you just said, that it allows us to have an image classifying model even without ever having had labels necessarily for the classes that we’re asking it to classify. 
Miles Brundage: 00:47:55
Exactly. Yeah. You can just make up entirely new labels, like cat… You can just make up a label that has never actually been created before, like “Cats standing on desk.” And it kind of knows about cats and it kind of knows about desks. And then you can kind of get it to distinguish between that versus like a printer. I’m just looking around here and I’m like, “A printer on a cabinet.” 
Jon Krohn: 00:48:21
Okay. So then I guess, off the top of my head, the kinds of policy or ethical issues that we might be concerned about with any kind of classification model, when I think about image classification models having issues, a word that comes to mind for me is gorillas. 
Miles Brundage: 00:48:42
Uh-huh. Yeah. 
Jon Krohn: 00:48:46
I mean, I guess I should give more context for listeners that don’t know what I’m talking about. But it was a Google image classification model that was, in a broad range of circumstances, misclassifying darker-skinned humans as gorillas. Obviously, that is awful. The solution that Google came up with to this problem was to not let the algorithm output gorillas. So that was no longer a class that it was allowed to predict, which is a pretty clunky post hoc solution. So I guess that’s one of the kinds of things that we’d want to avoid. 
Miles Brundage: 00:49:30
Yeah. So I think there are other kinds of things you also might want to avoid. In addition to kind of offensive classifications, there’s disparate performance across different demographic groups, like performing better on men than women, et cetera. 
Jon Krohn: 00:49:45
Of course. 
Miles Brundage: 00:49:47
Also just like having kind of a set of knowledge that is very informed by the way that it was trained, which often has a bias towards Western kind of concepts. So it might be better at recognizing like a Western style wedding than an Indian style wedding or something like that. 
Jon Krohn: 00:50:04
Right. 
Miles Brundage: 00:50:04
So there are various kind of biases in all AI systems to varying degrees. But in image recognition systems, those are some of the clusters. Yeah, so my colleague Sandhini Agarwal wrote a paper called Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications a couple months back or maybe like a year or so back. That kind of gave an initial assessment of some of the biases of CLIP which are very real and would give us pause if we were going to sort of deploy it for arbitrary uses or something like that. If we were ever to kind of do that, we would obviously have to think through what is it well suited for and what is it not well suited for. But so far, we’ve kind of released it as a research release. I think it sparked a lot of valuable conversation both about the biases in image recognition systems as well as kind of new ways of building contrastive models that can unlock various use cases. 
Jon Krohn: 00:51:06
Really cool. So as a kind of general question now that we’ve talked about these four model classes, the DALL·E class, Codex, GPT, and CLIP, do you ever get to a point where you feel comfortable, totally comfortable with these models being in the wild? When is enough enough? You said this phrase a few sentences ago that tipped me off to this idea where you say all AI/ML systems have bias and they’re always going to… To some extent, it would be impossible to wipe out all bias without having any signal as well. And so, yeah, are there kinds of internal frameworks or standards? I don’t know. Like statistical significance? Yeah. How do you internally decide that like, “We’ve done enough here to prevent the most pernicious biases. And from this point, the best approach is to have a system card or a warning as to the capabilities of the model”? 
Miles Brundage: 00:52:27
Yeah. So it’s a great question. I don’t have a great answer, but I can give you some threads. I mean, first, the right level of evidence for supporting limited use is much less of a burden than for letting something be used for arbitrary applications. 
Jon Krohn: 00:52:48
Right. 
Miles Brundage: 00:52:48
So that’s part of what’s helpful about an API-based mode of deployment, is you can kind of gradually scale up users and decline certain use cases that we don’t feel like we’re ready for. So it’s not like an all or nothing kind of decision. Another thing to consider is what are the alternatives? What is currently being used in the wild and what are kind of the… For example, what are the image recognition models flying around on the internet right now in the published literature? Is this kind of going to be differentially biased, or is it actually an improvement over what would be happening in our absence? 
Jon Krohn: 00:53:24
Ah, yeah. 
Miles Brundage: 00:53:24
And similarly, comparing to the human baseline, I think, is always something that one should think about because that baseline is not perfect. So as I mentioned with Codex and Copilot earlier, we should not be asking how do we get a perfect kind of bug-free system, but how do we get something that improves the performance and the quality of code over what would’ve otherwise happened. And I think that sometimes is a lower bar. Obviously, we want to get as close to perfect and fully reliable as possible. But if we waited until we got to that level before allowing anyone in the world to use these kind of systems, I think we’d miss out on a lot of value. Imperfect technologies have a lot of uses and we use them every day. The important thing is that people understand them and that you have appropriate guardrails to prevent those limitations from escalating into a real harm. 
Jon Krohn: 00:54:26
Awesome. Well, that was a super helpful answer to a very complicated question for me, which I couldn’t possibly have a… It’s not like a math question where you could just have the right answer. But the answer that you provided was awesome and provides a lot of insight into how these large impactful models can be rolled out in a more thoughtful, considerate way while mitigating biases as much as possible. 
Jon Krohn: 00:54:59
So earlier in this podcast, we talked about a blog post that you were a part of, The Best Practices for Deploying Language Models blog post. And given the question that I just asked, is there anything else from this other blog post that you wrote, Lessons Learned on Language Model Safety and Misuse, that you think might be helpful here? Or have we kind of covered the main points already? 
Miles Brundage: 00:55:24
Yeah, I mean, I would just give one example of the benefits of learning from experience and not just going from zero to 60 instantly when it comes to deployment. We’ve learned a lot about ways in which models are actually misused in the wild, which sometimes doesn’t look like what we originally imagined. So for example, in the GPT-3 paper, we talked about disinformation as a risk. And that continues to be something that we worry about and we have various kind of mitigations in place to prevent people from just kind of generating a million kind of misleading tweets to support a certain candidate or to pretend to be human or whatever and kind of influence the political conversation in a harmful way.
Miles Brundage: 00:56:18
But on the other hand, that was kind of what we originally anchored on. We’re continuing to push forward research on that and we have a report coming out on that soon. But we also have seen all sorts of other kinds of things that people do with language models, some of which violate our kind of ethical commitments to ensure that this technology is beneficial for humanity. So just as an example, like role playing very racist fantasies with a language model is not something that was in our mental model of where things were going. And things that maybe we could have thought of but didn’t necessarily expect the extent of, like medical spam. So kind of spam that is promoting dubious medical products is something that there’s a lot of on the internet. So you can kind of in retrospect, say, “Okay, yeah, obviously people are going to use it for it.” But at the time there was also a lot of disinformation on the internet. So we were focused on one and we ended up seeing that, plus other things. 
Miles Brundage: 00:57:13
So I think the upshot there is that you shouldn’t necessarily think, “Okay, we’ve done this kind of a priori analysis and reasoning. Therefore we know how to mitigate all these risks.” You also have to actually see what happens and update content filters, use case policies, et cetera, as you learn more. 
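As a rough illustration of what “updating content filters as you learn more” might look like in code, here is a minimal, hypothetical sketch. The category names and keyword lists are invented for the example; a production system would rely on trained classifiers, human review, and use-case policies rather than keyword matching.

```python
# Minimal, hypothetical sketch of a content filter whose categories are updated
# as new kinds of misuse are observed after deployment. The keyword lists and
# category names are illustrative only.

from typing import Dict, List

# Categories anticipated before launch.
filter_rules: Dict[str, List[str]] = {
    "disinformation": ["fake news", "rigged election"],
}


def flag_categories(text: str) -> List[str]:
    """Return the filter categories a piece of generated text matches."""
    lowered = text.lower()
    return [
        category
        for category, keywords in filter_rules.items()
        if any(keyword in lowered for keyword in keywords)
    ]


# After observing real-world misuse (e.g., medical spam), add a new category
# without retraining or redeploying the model itself.
filter_rules["medical_spam"] = ["miracle cure", "doctors hate this"]

if __name__ == "__main__":
    print(flag_categories("Try this miracle cure today!"))  # ['medical_spam']
```

The key property is that the rules live outside the model, so a newly observed misuse pattern, like the medical spam Miles mentions, can be added after deployment as part of the feedback loop he describes.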
Jon Krohn: 00:57:35
That makes a lot of sense. Yeah. I’m sure that there will be more lessons learned in the future in ways that can’t be anticipated today, but I think you guys are doing a great job. OpenAI and the policy research team that you lead in particular has been doing a stellar job of being really thoughtful about getting models out in a way that you’re mitigating the risks. So the world thanks you, Miles. 
Miles Brundage: 00:58:03
Well, thank you. 
Jon Krohn: 00:58:05
All right. So having now gone through the specific models that you’ve been involved with at OpenAI, I’d like to dig a little bit more into what brought you into this role and the kinds of work that you’ve done in the past. So prior to working at OpenAI, you were an AI policy research fellow at the Future of Humanity Institute at the University of Oxford. And so at OpenAI, you’ve continued with research that dissects AI/ML systems, understanding their capabilities, their limitations, and their impacts, and making policy recommendations. Is that work a kind of direct… Does it follow on directly from the work that you were doing at the Oxford Future of Humanity Institute? Is there a lot of relationship there between the two roles? 
Miles Brundage: 00:59:01
Yeah, I mean, it’s always easier to explain career trajectories in retrospect than it is to plan them in advance. So it’s not that I necessarily planned this, but in retrospect you could say that I was studying some issues very relevant to what I’m doing right now, in the abstract, at Oxford. 
Jon Krohn: 00:59:21
Right. 
Miles Brundage: 00:59:21
So for example, I was involved in this report on the malicious use of AI and sort of forecasting, preventing and mitigating issues related to AI misuse. It talked about, “Well, maybe we need to think about taking special precautions for publishing certain types of models. Maybe there could be some disinformation risks, et cetera.” And that provided a kind of abstract foundation for the more applied work that I ended up doing at OpenAI, on GPT-2, GPT-3, et cetera, but obviously much more concrete and grounded. And I think I continue to follow very closely and engage very closely with academic research, because often that will provide a foundation and kind of early exploration of issues that often become concrete faster than one might anticipate, given how quickly AI is developing. 
Jon Krohn: 01:00:16
Right. And so in that paper, in the Malicious Use of Artificial Intelligence paper, which we will be sure to include in the show notes, in that paper you paint a pretty bleak landscape of threats. And it’s been four years since that paper came out. So have we done a pretty good job of mitigating some of those threats? Are there new threats that have emerged in the meantime? 
Miles Brundage: 01:00:45
I mean, obviously hindsight is 20/20, et cetera; we weren’t trying to be perfectly predictive. It was more trying to do illustrative scenarios. That being said, I think it holds up reasonably well in some areas. So for example, we specifically talked about spoofing of voices, which has subsequently been used by real criminals in the context of defrauding people, simulating their voices in order to get into bank accounts or impersonating someone in order to conduct a wire transfer. So I think in some cases, the specific uses that we talked about were prescient. In other cases, we just didn’t anticipate how quickly certain things would develop. 
Miles Brundage: 01:01:33
So I think language generation has just gone way faster than we were anticipating at the time. This was a bit before… I guess the paper came out right around the time of GPT-1. And so things have evolved and escalated very quickly since then. And going back, I probably would’ve emphasized that a lot more heavily. But yeah, I mean, I think we tried to paint a realistic picture of where things might go if you extrapolated a little bit. I think we said five to 10 years in advance. So in another one to six years we’ll have a better sense of how things are playing out. Oh, some of the other things we talked about were weaponized drones, and there have been a few assassination attempts using drones. 
Miles Brundage: 01:02:23
I mean, some of which we… I, at least, haven’t been following enough to say I fully understand what exactly is going on with drones in Ukraine and where AI is involved and where it isn’t. There was that thing with the Iranian nuclear scientist a while ago where supposedly AI was used to stabilize a sniper rifle that was operated a thousand miles away, in this very complex plot to carry out an assassination. I definitely did not anticipate that kind of thing, like AI doing the stabilization and face recognition for a sniper rifle. But yeah, I think it’s a serious issue and we tried to put it on more people’s radars than it was at the time. We got a lot of flak for it at the time, so I’m proud that we raised the alarm to an extent. 
Jon Krohn: 01:03:19
Nice. Yeah. I think that that kind of technology is being used to a great extent in the war in Ukraine. Even things like the Javelin missiles being able to detect where on a tank to attack, where the downward plunge of the missile will most likely be able to get into the tank and have maximal impact, I think that is driven by machine vision. It’s an AI system. And that’s also why I couldn’t figure out… I was seeing news stories at the beginning of the war of, “Oh, Sweden or some country donated 50 Javelin missiles.” And I was like, “That doesn’t sound like much.” But then it’s like, “Oh, those are $80,000 each.” So it really adds up. And it’s because of all the compute that’s baked into them, and the R&D. 
Jon Krohn: 01:04:10
All right. So to give people maybe a sense of how they could be getting into a role like yours… The question immediately after the one that I’m going to ask now will get more into how important a research area this is. But for people that are interested in getting into AI policy research, you have a PhD in human and social dimensions of science and technology. You studied at Arizona State University. That is not a degree name that I’ve come across before. What’s involved with doing a PhD in human and social dimensions of science and technology? 
Miles Brundage: 01:04:52
Yeah. So you might think of it as an interdisciplinary social science degree. So I drew somewhat on political science, somewhat on sociology, and somewhat on areas that bridge social science and the humanities, like science and technology studies, which is a whole field and somewhat similar to my degree. Yeah. I mean, basically it involved learning a lot about the history and social institutions underlying science. So both policy as well as the practice of science and technology: how do scientists think about what is true and what’s valuable to do, how do values influence the research process in terms of what is considered valuable to fund and to conduct, how do things get published, and what are the standards around integrity and plagiarism and so forth. 
Miles Brundage: 01:05:49
So it’s about understanding science as a social institution and engineering as a social institution. And in particular, I drew on an excellent committee of people with backgrounds in political science, cognitive science, and a few other areas to understand what was going on in AI policy at the time, which has subsequently evolved. But that was basically what I did in my dissertation: providing a snapshot and a kind of framework for thinking about AI policy circa 2018. 
Jon Krohn: 01:06:24
Nice. And then on to my question about the importance of this area. We’ve had guests in the past year, and some of the most popular episodes of the SuperDataScience podcast have been with folks like Ben Todd, whom I understand you know. 
Miles Brundage: 01:06:40
Right. 
Jon Krohn: 01:06:41
He was in episode number 497. And Jeremie Harris in episode number 565. They both talked at length about the critical importance of AI safety research, potentially for the very existence of humankind generations from now. So I was wondering if you could speak to that, to give us your own opinion on the existential risks of AI and how the kind of work that you are doing, policy research, could help to prevent the apocalypse. And then it would also be great if you could distinguish for us the nuances between AI safety, AI alignment, AI policy, and potentially other fields that I’m not aware of. 
Miles Brundage: 01:07:32
Yeah. So a few thoughts. First, one way of thinking about AI is that there’s kind of something for everyone when you think about the risks and impacts and the reasons for working on it. If you’re concerned about social justice, you should be worried about AI. If you’re concerned about warfare, you should be worried about AI. If you’re concerned about economics, you should worry about AI. So I think there’s no shortage of reasons to work on it. On existential risk in particular, I think there are various plausible ways that highly capable systems, if they weren’t aligned with our values, could cause harm. I think similarly, they can be deliberately misused by actors that don’t necessarily share our values, either non-state actors or authoritarian governments seeking to entrench their power. 
Miles Brundage: 01:08:23
So I think we should think very seriously about where it’s going and have guardrails in place both in the near term and in the long term. Regarding AI policy, ethics, et cetera, I think there are several different somewhat ill-defined and overlapping areas of work. So AI safety is a term that is sometimes used, which in turn could be broken down into things like alignment, robustness, ML security, et cetera. There’s a bunch of clusters of things that could plausibly be considered AI safety, including making sure that systems do what we ask them to, are honest, aren’t trying to deceive people, and so forth. There’s also AI policy and AI ethics. You might think of AI ethics as working out the right things to do with this technology, particularly where it might involve trade-offs between different values. So how do we weigh things like privacy and utility and other factors when they might come into tension? And how do we think through the rights and responsibilities of people affected by AI or building AI? 
Miles Brundage: 01:09:41
AI policy is sort of translating that normative understanding of what’s right and what’s wrong into actual practices. So you might think of AI policy as translating evidence and a set of normative assumptions, generated by ethical reasoning and an understanding of the capabilities of models, into actual practices: for example, product policies and public policies, and shaping decision making in an authoritative way to actually constrain some of the risks that are identified via AI safety and ethics research. 
Miles Brundage: 01:10:19
But that’s a somewhat arbitrary way of slicing and dicing it. I mean, there are all sorts of other terms I could have mentioned like AI governance and strategy and so forth. 
Jon Krohn: 01:10:29
Right. 
Miles Brundage: 01:10:30
Different people frame these in different ways. But again, I think that the big ticket takeaway is that there’s something for everyone and lots of opportunities to make sure that AI helps rather than hurts people. 
Jon Krohn: 01:10:45
Nice. And yeah, again, according to Ben Todd and Jeremie Harris, Ben Todd in particular has done a lot of thoughtful research and thinking about how someone can make the biggest impact in their lifetime. AI safety research or AI alignment research, or AI policy or AI governance, or whatever you want to call it, this kind of area is one of the biggest ways to make an impact, from his perspective. So think about that, listener, as you think about how you might want to make an impact in your career. 
Jon Krohn: 01:11:20
All right. So starting to wind down the episode here. We’ve got a question from the audience. So I polled the audience ahead of filming to see if there were any questions for you. We got one from Amit Pande, who is the CMO of a company called Aviso AI based in San Francisco. Amit is curious about your views, Miles, on how creatives like writers and designers might view AI: as augmentative to their creative work, as a career threat, or just as a cool toy. So I guess his question is kind of, do you think creative workers should be worried about their future career prospects? Or do you think that AI is going to actually augment their careers or maybe create more new opportunities? 
Miles Brundage: 01:12:22
Yeah. So a couple of ways of looking at that. So first, there’s been a lot of investigation by economists and historians and others of the impacts of technologies like AI, including AI itself. This is a big subject of study for a lot of researchers right now. Generally, the evidence seems to suggest that AI is not likely to cause large-scale displacement of workers or substitution for whole jobs, at least in the immediate term. At the current level of capabilities, it’s mostly augmenting skills. And that’s consistent with how technologies have played out historically, including earlier forms of mechanization and automation in the 19th and 20th centuries. 
Miles Brundage: 01:13:12
However, I think that doesn’t mean that there won’t be disruption and changes and turbulence. It could mean that some occupations increase, decrease, or are transformed. And even if a whole job can’t be automated, there could be large tasks within those jobs that lead to a reshuffling of what is entailed within someone’s portfolio at work. 
Miles Brundage: 01:13:37
And so I think an example of this might be GitHub Copilot, where we’re… And GitHub actually just put out a blog post today, which is not the day that this will be aired, but on July 14th, on the impact of Copilot on productivity. So I think typically what we see is AI augmenting rather than substituting for people. We shouldn’t be overconfident in this. I think AI progress has gone faster than a lot of people expected. So I think we should be prepared for the possibility that maybe things will be different from the past. But so far, it seems to be mostly augmenting. 
Miles Brundage: 01:14:19
And in particular you mentioned creative professionals. So OpenAI has actually worked with thousands of artists now who have been interacting with DALL·E 2 and finding all sorts of ways of combining the best of both worlds of human and machine intelligence to create new kinds of products. So I think there are going to be very interesting creative applications. And mostly the upshot for creative professionals is that you probably want to be among the early adopters and take advantage of these things, because first, it’s just a lot of fun. And secondly, it might be useful for your job. I can imagine scenarios where people who take advantage of these tools end up having a leg up on people who don’t. But I don’t think it’s necessarily going to take the human role out of creativity, which is not what these are intended for and not the level of capability of current AI. 
Jon Krohn: 01:15:22
Yeah. I agree with you on everything that you said, Miles. Although you make the point that there could be more displacement in the future, mostly we’re seeing augmentation, and it seems to me from everything that I’ve read that that is what’s going to continue in the near term. And so I think from a public policy perspective, we need to be pushing governments to focus on funding for retraining programs. I think careers are changing faster than ever before. And there’s more opportunity, but people need to be adapting. Just like, as you say, creatives could be getting at the front of this wave of AI augmentation with their work, and that could potentially allow them to have a much faster output and really interesting, creative ways of doing things that are novel and surprising to viewers, allowing people that are at the front of this wave to be generating more valuable art even if it isn’t greater in quantity. 
Jon Krohn: 01:16:29
And going back to that DALL·E 2 prompt engineering book that I was talking about earlier in this episode, there are some really cool examples in there of ways that I had not thought about DALL·E 2 being used, where for example you have DALL·E 2 generate lots of images that can be stitched together to create these huge artistic works or that blend together classical works with modern works. And so some really cool stuff there. 
Miles Brundage: 01:16:58
I would just add one thing, which is that while I think we have some evidence from the past about how things tend to play out and there’s a lot of anecdotal evidence to suggest that so far AI is mostly substituting rather than… Sorry. It is mostly augmenting rather than substituting for human labor, there’s a lot we don’t know about exactly how it’s playing out and how different professions and different demographics might be more or less affected. 
Jon Krohn: 01:17:26
For sure. 
Miles Brundage: 01:17:26
So there’s definitely a need for more research. And if people are interested in partnering, either as a company to be studied or as an economist or data scientist to help research this, google “OpenAI economic impacts research.” We put out a call for people to partner with on understanding these issues better. 
Jon Krohn: 01:17:46
Wow, cool. That sounds like something great to be involved with. Speaking of ways that you can get involved with OpenAI, OpenAI is hiring. At the time of recording, there are dozens of openings that I could see on the OpenAI hiring page, which I will be sure to include in the show notes. This includes software engineering roles, machine learning engineering roles, AI research roles, and AI product roles. I didn’t immediately see policy-related ones, though I potentially could’ve missed them. 
Miles Brundage: 01:18:18
Yeah. So we might be opening something up between now and when this podcast airs. So yeah, I would encourage people to just check it out for themselves and see if there’s anything of interest. If not, then I would still just… If you can imagine a role that isn’t listed, then I would still just submit an application and just add a note of like, “This is what I’m interested in.”
Jon Krohn: 01:18:40
Super cool. Yeah. See, it truly would be one of the coolest organizations in the world to work for, so definitely check that out, listener. All right, Miles, you have been super generous with your time today. I’ve just got two quick questions for you. Do you have a book recommendation for us? 
Miles Brundage: 01:19:01
Yeah. Some people don’t like this and I’ve recommended it to some people who didn’t really get it, but I personally think it’s worth trying and persevering with. It’s a book series called Terra Ignota. The first book in the series is called Too Like the Lightning by Ada Palmer. What’s cool about it is that it’s a sci-fi series written by a professor of history at the University of Chicago. She envisioned it as hard social science fiction, which is to say that it reflects an expert level of understanding of how history and culture and institutions interact over time and how individuals can and can’t influence the course of world events. I think it’s very interesting. There’s some weird stuff, because it’s really written as a sort of future history, as if someone in this time period in the future were looking back at recent events. There are a lot of weird uses of terminology and gender and jargon and so forth. So it’s definitely not an easy read, but I personally have found it to be very fascinating. 
Jon Krohn: 01:20:09
Super cool. That’s an amazing recommendation and I guess one that shouldn’t surprise me from somebody that does work like yours. That is a cool recommendation. All right. And then final question, Miles. You were a font of an enormous amount of knowledge in this episode. If folks are interested in learning more about AI policy in the future, how should they follow you to get that? 
Miles Brundage: 01:20:36
Yeah, so I have a Twitter account. I think it’s miles_brundage. It shouldn’t be too hard to find if you google my name. I also have a website which lists some of my recent papers. It’s not always up to date so Google Scholar is also a good resource if you want to see my publications. 
Jon Krohn: 01:20:54
Nice. We will be sure to include links to all of those in the show notes. Thank you so much, Miles. It’s been amazing having you on the program. I have learned a ton. No doubt our listeners have as well. We can’t wait to see what you folks at OpenAI come up with safely next. 
Miles Brundage: 01:21:13
Thanks so much for having me. 
Jon Krohn: 01:21:19
Holy smokes. What a brilliant researcher Miles is. I was impressed by his ability to call out the specific names of papers and techniques across such a broad range of the AI ecosystem. In today’s episode, Miles filled us in on general AI model release best practices, such as staggered rollout, warnings, and informational cards. We talked about how natural language models like GPT-3 can be trained and prompted to be more likely to output the truth, how the training dataset for the DALL·E models was thoughtfully pruned to exclude potentially offensive imagery, the reckless creation of buggy code as one of the risks associated with providing the world with an automated software-writing model like Codex, and how, through contrastive training, OpenAI’s CLIP model can guess what’s depicted in an image without ever being explicitly trained to do so. And he talked about how, in the decades to come, on average across industries, AI will likely augment more professions than it displaces.
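For listeners who want to try the zero-shot behavior of CLIP mentioned above, here is a minimal sketch using the publicly released openai/clip-vit-base-patch32 checkpoint via the Hugging Face transformers library; the choice of library, the example image URL, and the candidate labels are assumptions made for illustration, not anything prescribed in the episode.

```python
# Minimal sketch of CLIP-style zero-shot classification using the publicly
# released checkpoint via Hugging Face transformers. Labels and image URL are
# illustrative only.

from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any image URL works; this one is a commonly used example photo of two cats.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

# Contrastive training lets CLIP score image-text similarity directly, so a
# softmax over those similarities acts as a zero-shot classifier.
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```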
Jon Krohn: 01:22:15
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Miles’ social media profiles as well as my own social media profiles at www.superdatascience.com/597. That’s www.superdatascience.com/597. 
Jon Krohn: 01:22:33
If you’d like to ask questions of future guests of the show, like an audience member did of Miles in today’s episode, then consider following me on LinkedIn or Twitter, as that’s where I post who upcoming guests are and ask you to provide your inquiries. 
Jon Krohn: 01:22:46
Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks of course to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng, and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another outstanding episode for us today. Keep on rocking it out there folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 