SDS 709: Big A.I. R&D Risks Reap Big Societal Rewards, with Meta’s Dr. Laurens van der Maaten

Podcast Guest: Laurens van der Maaten

August 29, 2023

Get ready for an extraordinary episode of the Super Data Science Podcast! In this captivating installment, we’re joined by Dr. Laurens van der Maaten, a Senior Research Director at Meta, who takes us on a journey through the fascinating world of AI. From pioneering dimensionality reduction techniques to unlocking the potential of privacy-preserving ML and tackling monumental challenges like climate change, he shares expertise and insights that will leave both seasoned data science practitioners and curious minds inspired.

Thanks to our Sponsors:
Interested in sponsoring a Super Data Science Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
About Laurens van der Maaten
Laurens van der Maaten is a Senior Research Director at Meta AI (FAIR team). He supports a team of world-leading researchers, engineers, and designers who are developing the AI technologies of the future. Together with Geoffrey Hinton, he invented the t-SNE algorithm for dimensionality reduction, which has since become a widely used tool for data visualization. Laurens was also the lead developer of the CrypTen framework for privacy-preserving machine learning. His work received Best Paper Awards at the CVPR 2017 and UAI 2021 conferences.
Overview
To kick things off, Laurens takes us down memory lane to one of his earliest endeavors with the team – a colossal project involving large-scale learning of image recognition models from web data. Armed with a staggering number of weakly-labeled images, he rewrote the rulebook for machine vision systems. He offers a glimpse into the infrastructure that powered this project (back when TensorFlow and PyTorch did not exist) and that resulted in significant advancements in image recognition accuracy.
Next, Laurens unveils his involvement in de novo protein design, another challenging project aimed at creating proteins that don’t exist in nature. His team’s approach employed language modeling on extensive protein datasets, with potential applications spanning from drug discovery to designing enzymes for specific purposes.
Laurens then invites listeners to explore his CrypTen framework, an innovative concept resembling PyTorch in functionality but designed to ensure secure computations within the realm of machine learning. He also sheds light on the role of AI in climate change mitigation and the simulation of wearable materials for augmented-reality applications. By applying AI to such pressing global challenges and merging it with reality augmentation, Laurens emphasizes the transformative capabilities of AI.
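To give a flavor of what “secure computations” means here: CrypTen builds on secure multi-party computation, one building block of which is additive secret sharing. The sketch below is a minimal pure-Python illustration of that idea, not CrypTen’s actual implementation (which operates on encrypted PyTorch tensors); the modulus and party count are arbitrary choices.

```python
import random

P = 2**61 - 1  # large prime modulus (an illustrative choice, not CrypTen's)

def share(value, n_parties=3):
    """Split an integer into additive shares that sum to value mod P.
    Any subset of fewer than n_parties shares reveals nothing about it."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod P."""
    return sum(shares) % P

# Each party holds one share of each secret value.
a_shares = share(42)
b_shares = share(100)

# Parties add their shares locally to obtain shares of a + b,
# without anyone ever seeing 42 or 100 in the clear.
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 142
```

Multiplication and comparisons require extra protocol machinery; hiding that behind a familiar tensor API is exactly what frameworks like CrypTen are for.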
Transitioning to the technical aspects, Laurens takes a closer look at the t-SNE dimensionality reduction technique. This technique reduces the dimensionality of high-dimensional vectors, which can be used for visualization of natural-language token similarity and is also widely used with biological data.
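For readers who want a concrete handle on the technique: the first step of t-SNE converts pairwise distances in the high-dimensional space into conditional probabilities that describe each point’s neighborhood. A minimal sketch of that step, assuming a single fixed bandwidth sigma (the real algorithm tunes sigma per point to match a target perplexity):

```python
import math

def pairwise_affinities(points, sigma=1.0):
    """Convert squared Euclidean distances into the conditional
    probabilities p(j|i) that t-SNE uses to describe neighborhoods.
    A fixed sigma is a simplification for illustration; t-SNE itself
    searches for a per-point sigma matching a target perplexity."""
    n = len(points)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sq_dists = [sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                    for j in range(n)]
        # Gaussian kernel over distances; a point is never its own neighbor.
        weights = [math.exp(-d / (2 * sigma ** 2)) if j != i else 0.0
                   for j, d in enumerate(sq_dists)]
        total = sum(weights)
        for j in range(n):
            P[i][j] = weights[j] / total
    return P

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
P = pairwise_affinities(points)
# Nearby points receive most of each other's probability mass:
print(P[0][1] > P[0][2])  # True
```

The low-dimensional embedding is then optimized so that an analogous (heavy-tailed) distribution over the 2D points matches these probabilities.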
Concluding the episode, Laurens shares his forward-looking insights on AI’s trajectory in shaping the future and shares his career advice to those looking to make a similar impact in the world. Whether you’re well-versed in AI or just embarking on your learning journey, this episode offers a window into the potential that AI holds. Tune in to gain insights and expand your understanding of AI’s evolving capabilities.  
In this episode you will learn:
  • Large-scale learning of image recognition models on web data [05:05]
  • Evolutionary Scale Modeling protein models [16:45]
  • Fighting climate change by building an A.I. model [29:49]
  • The CrypTen privacy-preserving ML framework [38:36]
  • Concerns about adversarial examples [53:25]
  • Laurens’ t-SNE algorithm [58:56]
  • How to make a big impact [1:07:25] 
Items mentioned in this podcast:
Follow Laurens:

Follow Jon:

Episode Transcript: 

Podcast Transcript

Jon: 00:00:00

This is episode number 709 with Dr. Laurens van der Maaten, Senior Research Director at Meta. Today’s episode is brought to you by AWS Cloud Computing Services, by Modelbit for deploying models in seconds, and by Grafbase, the unified data layer.
00:00:20
Welcome to the Super Data Science podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple. 
00:00:51
Welcome back to the Super Data Science podcast. Today we’re joined by the visionary, world-class, repeatedly game-changing AI researcher Dr. Laurens van der Maaten. Think that sounds bombastic? Well, just you wait. Laurens is a Senior Research Director at Meta overseeing swaths of their high-risk, high-reward AI projects with applications as diverse as augmented reality, biological protein synthesis, and tackling climate change. He developed the “CrypTen” privacy-preserving ML framework. He pioneered “web-scale” weakly supervised training of image recognition models. Along with Geoff Hinton, he created the t-SNE dimensionality reduction technique. Their paper on this alone has been cited over 36,000 times. Laurens’ works have been cited nearly 100,000 times in aggregate. He holds a Ph.D. in machine learning from Tilburg University in the Netherlands. Today’s episode will probably appeal primarily to hands-on data science practitioners, but there is tons of content in this episode for anyone who’d like to appreciate the state of the art in AI across a broad range of socially impactful, super cool applications.
00:01:56
In this episode, Laurens details how he pioneered learning across billions of weakly labeled images to create a state-of-the-art machine vision model. He fills us in on how AI can be applied to the synthesis of new biological proteins with implications for both medicine and agriculture. He provides specific ways AI is being used to tackle climate change, as well as to simulate wearable materials for enhancing augmented reality interactivity. He introduces a library just like PyTorch, but where all of the computations are encrypted. He talks about the wide range of applications of his ubiquitous dimensionality reduction approach, and he fills us in on his vision for the impact of AI on society in the coming decades. All right, you ready for this exceptional episode? Let’s go. 
00:02:45
All right. I’m here with Laurens van der Maaten. Thank you for coming in person to record this Super Data Science episode. It’s a joy to have you on.
 Laurens: 00:02:52
Thanks so much for having me, Jon. 
Jon: 00:02:53
So, we know each other through Alex Miller directly, who I guess reports into you at Meta AI.
Laurens: 00:03:01
He used to. Yeah. 
Jon: 00:03:02
He used to, yeah. So, he was in episode number 663, which was focused on CICERO, this amazing algorithm. I think it was the biggest innovation in AI in 2022, which makes it one of the biggest AI innovations ever, up to this point. And so we’ll talk about that a bit more in the episode. We also had Noam Brown, who was in episode number 569. And that episode kind of laid the groundwork for the kind of stuff that we talked about related to CICERO. So, these kinds of projects all relate to machines being able to do incredible things, not only natural language capabilities like we’re now seeing with generative AI models, but being able to negotiate and being able to anticipate what humans would want to do in a certain situation, even if the machine just has limited information. 
00:03:57
So, Noam primarily talked about poker, and then with Alex, we primarily talked about this board game Diplomacy. And yeah, just really, really, really fascinating, groundbreaking AI. And I don’t know if you have anything particular you want to add on CICERO.
Laurens: 00:04:15
Yeah, I mean, the exciting thing about CICERO, right, is that I think it’s pretty much the first time that we were able to combine, you know, really advanced reasoning and planning, the kind of stuff that people have been doing in game AI for a long time, with natural language communication, right? And so it’s really that combination that I think makes CICERO a really unique achievement. 
Jon: 00:04:40
Yep. And it worked really well. It was a top competitor, at the 90th percentile. 
Laurens: 00:04:48
Yeah, yeah, yeah. Like, the CICERO robot is basically competitive with the top players in the world at the game of Diplomacy.
Jon: 00:04:57
Yeah. And so listeners, you can refer back to episode 663 if you want lots on that project. So, Laurens, you are a Senior Research Director at Meta AI. What are some of the most impactful projects that you’ve worked on in your time at Meta AI? 
Laurens: 00:05:14
I’ve been at Meta AI for eight and a half years, so I’ve worked on many things. So, I don’t know what the most impactful one is. Maybe one of the projects I’m most proud of is some of the work that we did on large-scale learning of image recognition models on web data. The story behind this effort goes back to when I joined Meta back in 2015. And at that time, the common approach to building image recognition systems was: you would collect a set of images and you would manually annotate what is in those images. Like, here’s a coffee cup, here’s a cat, here’s a dog, et cetera. 
00:06:01
And then you would train a model to predict those classes, right. And this was the common approach in the community at the time. And to me that always seemed a little bit silly, or at least, I was putting on my computer vision hat and looking at the internet as like, well, you have all these images and then there’s all this metadata around the images, right? And that metadata tells you something about, you know, what is in that image, right? And you can get that from, you know, text around it or hashtags or whatever information is attached to those images. And so the question that I set out to answer is basically: what can we learn from those images and that metadata? 
00:06:46
And so it was immediately clear to me that in order for that to be successful, we would have to do it at scale, right? The typical metadata that you have is not going to say exactly what is in the image, because it’s just people talking about images or things like that. But it does give you information. Right. And so what I worked on for a couple of years is de-risking various parts of this project. So, like figuring out what type of metadata to use. And what we realized is that hashtags are the most informative type of information, because people often assign them to images in order to increase the searchability of those images, right? So, by definition they capture something about the contents of the images, and they’re available in abundance, right?
00:07:40
But then there are lots of other things that are different compared to your standard image recognition training pipeline. So, for example, hashtags tend to be distributed according to a Zipfian law, right? So, like, some hashtags occur really, really often, like #throwbackthursday, you know. If every other image has hashtag #tbt, it’s probably not very informative, right? But then you have a really long tail of hashtags that don’t find frequent use, but they’re often really valuable, right? And so in your training, you need to correct for that. And then we also needed to build out the infrastructure needed to train at scale, which today is quite commonplace, but in 2015 this didn’t exist yet. There was no TensorFlow, there was no PyTorch. We were working in Torch, but it was in Lua, and, you know, on GPUs that are probably slower than your phone right now. Right. 
00:08:43
And so, like, that took time as well, to figure out how we could scale this up in a way that’s effective. And around 2017, 2018, all of this came together in an effort where we trained these models on about 3.5 billion web images with associated hashtags, and used this data to pre-train the models on hundreds of GPUs, already at the time. And then we would fine-tune these models on image recognition tasks that we cared about. So, an example would be the ImageNet academic benchmark. And what we showed is that this approach led to huge improvements in recognition accuracy, right? 
00:09:34
And that was kind of the first time that we were able to convincingly show that, and it’s an approach that over time has become much more popular. Right. And if you look at all the recent excitement about large language models and so on, it’s basically that same approach, except, you know, applied to language, at much larger scale, with more modern tools and so on. But this was really, I think, a project that I was really proud of, because at the time when we started it, there was a lot of skepticism, like, well, why would you do this? You know, will this work, and so on, right? And so we took the risk, we took the bet, and the bet paid off, right? And that’s kind of, like, you know, the dream as a researcher. 
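The Zipfian correction described above can be sketched as a per-image resampling weight derived from hashtag frequencies. This is a hypothetical scheme (square-root inverse-frequency weighting, with made-up hashtag data), not necessarily what the actual project used:

```python
from collections import Counter

def resampling_weights(hashtag_lists, power=0.5):
    """Give each image a sampling weight that down-weights images whose
    hashtags are extremely frequent (e.g. #tbt) and up-weights images
    carrying rare, more informative hashtags. The square-root
    (power=0.5) inverse-frequency correction is one common choice."""
    freq = Counter(tag for tags in hashtag_lists for tag in tags)
    total = sum(freq.values())
    weights = []
    for tags in hashtag_lists:
        # Weight each image by the rarest hashtag attached to it.
        rarest = min(freq[t] / total for t in tags)
        weights.append((1.0 / rarest) ** power)
    return weights

# Hypothetical data: three #tbt images and one rare-hashtag image.
images = [["tbt"], ["tbt"], ["tbt"], ["redpanda"]]
w = resampling_weights(images)
print(w[3] > w[0])  # the rare-hashtag image gets a larger weight: True
```

During pre-training, images would then be drawn with probability proportional to these weights, so the long tail of informative hashtags is not drowned out by #throwbackthursday.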
Jon: 00:10:21
For sure. And at that time, it would’ve been a hugely expensive project. Convincing people to give you hundreds of GPUs at that time would not have been commonplace. 
Laurens: 00:10:27
Yeah, I mean that as well, right? Like, you need to convince people that this is worth investing in, including the compute investments necessary. But I think often it’s even harder to convince people to work on it, right? Because, you know, people can be reluctant to take risks, right? Like, they want to make sure that the work that they do pans out. And in research you don’t always have that guarantee, right? And in order to deliver the biggest research results, you sometimes have to take a leap of faith. 
Jon: 00:11:07
Nice. Yeah. Let’s get back to that kind of risk-reward trade-off in data science R&D in a second. But just to recap this project, and why I think it’s really fascinating, to make sure I understand it. So, at the time you had the Facebook platform, and I guess it would’ve been Instagram to some extent in 2015 as well. So, in the platform you have huge amounts of images. They’re not labeled in the sense of having had a person come around and definitively say, this is a cat, this is a dog, this is a car. But you could use hashtags, and you investigated different kinds of metadata. But it turned out that the hashtags were a pretty good indicator in cases where it wasn’t a super common one like #TBT; if it says like #beach, there’s probably a pretty good chance it’s a beach scene. And so you used 3.5 billion images and these associated weakly supervised labels to have a rough idea and to be able to train, probably at that time, a convolutional neural network. 
Laurens: 00:12:17
That’s right. 
Jon: 00:12:17
And then once you have your model weights kind of generally primed to this huge amount of possible variability from these 3.5 billion images, you could then subsequently fine-tune on something like the ImageNet dataset, where you do have all these labels, millions of labels, and you get better results than if you just trained on ImageNet alone. 
Laurens: 00:12:41
Exactly. Yeah. Exactly. And what’s particularly cool about it is that in this way of training, because you’ve seen so many images, the visual variety of stuff that you’ve seen is just much bigger than you can ever get from a manually annotated dataset, right? So, like, we are here in downtown Manhattan, and I’m pretty sure every corner of this neighborhood is present in your training dataset, right? And so your model has captured some knowledge, some understanding about that. And so you can use that in the tasks that you finally care about, which is kind of the same thing that is happening now in language models, right? Like, because these models have read the entire internet, they’ve somehow incorporated all that knowledge, and in the fine-tuning, the only thing that you’re doing is surfacing the relevant knowledge, the stuff that you care about. 
Jon: 00:14:18
Are you stuck between optimizing latency and lowering your inference costs as you build your generative AI applications? Find out why more ML developers are moving toward AWS Trainium and Inferentia to build and serve their Large Language Models. You can save up to 50% on training costs with AWS Trainium chips and up to 40% on inference costs with AWS Inferentia chips. Trainium and Inferentia will help you achieve higher performance, lower costs, and be more sustainable. Check out the links in the show notes to learn more. All right, now back to our show.
00:14:21
Yeah, exactly like you’re saying, these huge large language models today relate to this perfectly, because it’s the same kind of idea of training, often completely unsupervised in the natural language case, on like all of the internet. And so that gives some sense, in the same way you’re describing with the image dataset, the 3.5 billion images, every street corner around us in downtown Manhattan is captured. In the same kind of way, the large language models today are capturing every use of language, every kind of context, every kind of meaning of a word. But then we can fine-tune it subsequently as well. Llama 2, which recently came out from Meta AI’s … is it Meta AI? Is it ok to say [crosstalk 00:00:00] 
Laurens: 00:15:13
It’s Meta AI, yeah, yeah. 
Jon: 00:15:14
Yeah, yeah, yeah. And actually, I think we’ve just lined up, as of today, the time of recording, one of the key authors from the Llama 2 paper to come and speak on the show. So, yeah, that’ll be a good one. But we won’t get dragged into that too much today. 
Laurens: 00:15:32
It’s a great project. 
Jon: 00:15:34
Yeah. It’s a really cool project. But quickly, yeah, it has the same kind of idea of taking a huge training corpus. And I think one of the great things that Meta’s done with the original Llama as well as Llama 2 is the cleaning of the data, which is so important to have a great model downstream. And so yeah, with this unsupervised training on a huge corpus with Llama 2, and then, unlike the original Llama, fine-tuning it on these chat use cases with millions, or over a million, the technical paper said, human annotations of conversation. 
00:16:14
And so, yeah, it’s a similar kind of idea. Yeah, I like that analogy that you introduced, from this large-scale image recognition work from years ago now, that same kind of idea popping up today in LLMs. 
Laurens: 00:16:24
Yeah. And of course, I mean, I didn’t foresee that, like, you know, similar approaches would take off in language modeling, right? Like, I don’t think anyone saw that coming at the time. But it’s really interesting that, if you squint at these projects, they’re kind of the same, the same high-level approach and the same high-level motivation.
Jon: 00:16:44
Very cool. Another really exciting project that you’ve been involved with at Meta AI is Evolutionary Scale Modeling, ESM protein models. And maybe something to acknowledge right away: over the course of this episode, we’re gonna cover so many different topic areas. So, is there some way that you can kind of orient us to the body of research that you’ve contributed to? Is there a common thread? I mean, is it just machine learning and you’re just like, what is the coolest thing I can do with machine learning right now, and you end up having all these different kinds of disparate projects? 
Laurens: 00:17:24
Yeah, I mean, I do always like projects where you try to build something and where you try to bring a step change in the capabilities that we have, right? And so, you know, in terms of risk, like, I always want to be on the high-risk part of the spectrum. Like, I’d rather fail miserably than not try at all. Which is a choice, right? Like, other people are gonna be somewhere else on the spectrum. And so I’d say that is the main thing. Like, you know, let’s really build things. I think that is what motivates me. And, you know, thinking about, at a high level, what is wrong with, or what is missing, in current approaches that we could change. And so the ESM project, I think, you know, is a really cool example. You should get someone from the team on your show at some point. Someone like Alex Rives would be a really, really exciting person, I think, to have on the podcast.
00:18:31
The basic idea of this project is trying to do de novo protein design. So, trying to design proteins that are unlike anything that exists in nature. And the way that the team is going about that project is by taking a somewhat similar approach, by doing essentially language modeling on really large datasets of proteins. So, proteins are really just sequences of amino acids. There are about 20 amino acids. And so it’s really a sequence of tokens, and each token can have 20 values. And it’s gotten really easy to get those sequences, right? Like, sequencing proteins has become really cheap. And so there are publicly available databases with hundreds of millions of proteins that were sequenced. And so what you can do is you can train language models on that, and those language models develop a representation of proteins that captures something about the structure of those proteins. And you can use that representation to condition folding algorithms on. And that’s exactly what the team did.
00:19:53
So, what is folding about? Folding is basically, you know, like, you have this sequence of amino acids, but really what you care about is: what is the shape of this protein? The protein starts folding in all sorts of funky ways, and the way the 3D structure of the protein is lined up determines what function it has. So, like, what does it actually do? And so what you want to predict is: what is the 3D structure of this protein, given this amino acid sequence? Because the 3D structure is really expensive to get; it’s only the sequence that you can get cheaply. And so what the team did is they conditioned folding algorithms on these language modeling representations. And what they were able to do is deliver folding prediction results of the same quality as AlphaFold 2, but without homologs. 
00:20:52
So, AlphaFold 2 requires you to specify, well, I want to fold this protein, and here are some other proteins that are kind of similar to the one I want to fold. Those are called homologs. And the folding algorithm that the ESM team developed doesn’t require these homologs. And that is important if you want to do de novo design, right? Like, if you want to design completely new things, then there is nothing, right? There are no comparable proteins that you could use as homologs to condition your folding on. And so it’s really a stepping stone towards doing that type of protein design. 
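The “sequence of tokens, each with 20 values” framing maps directly onto ordinary language-model tokenization. A minimal sketch, assuming a bare 20-symbol vocabulary; real protein models such as ESM also add special tokens for masking and padding, omitted here:

```python
# The 20 standard amino acids, each written as its one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(sequence):
    """Map a protein sequence to integer token ids, the way a language
    model over proteins treats each residue as a token drawn from a
    20-symbol vocabulary."""
    return [TOKEN_IDS[aa] for aa in sequence]

# A short illustrative fragment (M = methionine, A = alanine, ...):
print(tokenize("MALW"))  # [10, 0, 9, 18]
```

From here, masked-language-model training proceeds exactly as in NLP: hide some residue tokens and train the model to predict them from the surrounding sequence.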
Jon: 00:21:38
That’s really fascinating. And it is something that we’ve talked about on the episode a few times this year. So, we had an episode number 643 at the beginning of the year. We had Professor Charlotte Deane, who’s an Oxford University professor. Do you know her? You kind of-
Laurens: 00:21:51
I don’t. 
Jon: 00:21:52
Yeah, you kind of, you nodded so much that I was like, oh, it’s gonna turn out that he knows her too. But her research is focused on this protein folding kind of problem as well. And at the time of recording, she’d been having dinner with Demis Hassabis the night before. And so she was also talking about how, while AlphaFold 2 has the biggest name recognition in this space in terms of protein folding, it is specialized, as you say, to this specific problem, I guess when you have homologs. I wasn’t aware of that detail, but, you know, for dealing with this particular folding challenge.
00:22:27
Whereas Charlotte Deane was describing that for some kinds of proteins, there are complexities that that kind of AlphaFold 2 modeling doesn’t work for. So, in her case, I think it was hemoglobin binding, or maybe it was binding structures in general. And there’s just so much variability in how proteins need to be configured to be able to bind to all different kinds of objects in the body, antigens in the body, I guess. And because, as you’re saying, creating these datasets to actually know what the 3D structure is is very expensive, there are still a lot of unsolved problems there. And so it’s interesting that you at Meta AI with this ESM protein project have been able to do these predictions without the homologs and be able to predict protein structures that we’ve never seen in nature before. When you create these de novo proteins, is there some kind of purpose deliberately in mind?
Laurens: 00:23:30
No, nothing specific. Right now, the use cases of this could be very diverse, from drug discovery to design of enzymes that compost plastics faster. So, it can be all over the place. Mostly, I guess, in medical applications, right? Like, that’s where you’d have the most potential use cases of this work. But we’re just using it as a test case to, you know, build up our experience in things like language modeling and so on, right. And to have different use cases for the same technology, so that the underlying technology that we’re developing becomes more robust. 
Jon: 00:24:25
Yeah. Very cool. The medical application thing, that’s actually something we talked about very recently in episode 707 with Joey Gonzalez, a Berkeley professor. And he was talking about how he’s really excited about the potential for AI in general to be doing exactly this kind of work you were talking about. You know, his vision for how AI could transform the world in the coming decades, a question that I’ll have for you later on in the episode. And yeah, that was one of the things that he’s most excited about. And then beyond just medical, in episode 705 we had folks from Syngenta on, and they were talking about how they use this kind of, they call it generative chemistry, where in their case it’s not necessarily just proteins, but it could be any kind of agricultural compound. And so they’re trying to figure out ways of making crops more fertile, you know, maybe taking some abundant material and being able to convert it into something that could be useful as a fertilizer or a pesticide. Yeah. So, cool applications in agriculture as well. 
Laurens: 00:25:37
Yeah, I think in general, in the chemistry space, there are a lot of exciting opportunities for AI to have impact. And it’s an area that isn’t as heavily studied yet as things like computer vision or natural language understanding. And so that, I think, creates a lot of opportunities. And it’s often what I’m looking for: where’s the white space in the current research environment? Because that’s often where the biggest opportunities are.
Jon: 00:26:08
It’s so cool that Meta gives you the wide berth to explore those kinds of things where, you know, I’m not aware of a de novo protein simulation company being incubated by Meta, you know, but you get to do this kind of research, you get to publish on it, and there could be these really great downstream social impact applications. But then I’m sure, and this might not be something that you can really talk about on air, but I’m sure that doing these kinds of projects, there’s internal things that you discover, like tackling maybe this kind of scale of problem where you’re using amino acids as the tokens, but maybe there’s things that translate. There’s packages that get built that can be used internally for just natural language modeling. And so, I mean, I’m completely making up that example, but yeah, there ends up being-
Laurens: 00:27:01
There are often, like, surprising use cases of things, right? Like, since you bring up chemistry, one of the projects I had the privilege of supporting is the Open Catalyst project, which was started by Larry Zitnick. And the premise of this project is that one of the problems in, you know, the clean energy revolution is the storage of that energy, right? Like, you have solar and wind and so on, but where do you store all that energy until you actually need it? And one of the solutions there is hydrogen storage, and one of the limiting factors there is the efficiency of these hydrogen reactions, where you, like, you know, convert electricity into hydrogen and vice versa.
00:27:52
And so he started the project to basically do AI-guided design of those catalysts, right? So, instead of designing proteins, we’re designing, you know, small molecules that catalyze these reactions. And that work, you know, turned out to be super useful for all sorts of other problems in the material design space, right? At Meta we build, you know, headsets. We’ve talked a lot about the metaverse and so on. And so, if you develop those kinds of wearables, there are a lot of material design problems that you have. And what we found is that, you know, there are those kinds of surprising use cases that can pop up, right? Like, that you don’t realize exist on the day you start the work, right? 
00:28:49
And so in research, you sometimes just have to take a bet: “Hey, this seems like a useful direction to go into.” And you have to, you know, have conviction that over time this will become useful for the company as well. And it happens more often than not, I would say. 
Jon: 00:29:06
Very cool. 
00:29:44
Deploying machine learning models into production doesn’t need to require hours of engineering effort or complex home-grown solutions. In fact, data scientists may now not need engineering help at all! With Modelbit, you deploy ML models into production with one line of code. Simply call modelbit.deploy() in your notebook and Modelbit will deploy your model, with all its dependencies, to production in as little as 10 seconds. Models can then be called as a REST endpoint in your product, or from your warehouse as a SQL function. Very cool. Try it for free today at modelbit.com, that’s M-O-D-E-L-B-I-T.com
00:29:47
Yeah. And I don’t know if you want to give more into these kinds of research efforts at the intersection of AI and climate technology or AI and augmented reality, maybe the AI and climate technology first. 
Laurens: 00:29:58
Yeah. So, I already mentioned this Open Catalyst project a little bit. I’m mentioning that project because it’s been quite visible externally. So, we’ve released a lot of datasets that are used by the academic community, not just in AI, but also in computational chemistry. And those have been really helpful there in terms of understanding how different catalysts behave. So, the underlying idea is that if you have a catalyst, what you can do is you can basically simulate the chemical reaction using a method called DFT. And this works great, it’s just very computationally intensive, so it can take, you know, many days to simulate one reaction. And now if you have to choose from millions or even billions of possible candidate catalysts, right, there’s just no way that you can run all those simulations in a reasonable time.
00:31:04
And so the idea is like, can we train machine learning models to predict the outcomes of those simulations in a much more effective way? And the way we go about this is we run those simulations on a bunch of examples. So now we have examples, right, of like, okay, this is a catalyst, and if you use it in this reaction, this is the outcome of that reaction. And those training examples we can use to train machine learning models, right? And in practice in this project, we use graph neural networks. And so now we can train up these models, and if we have a new catalyst, we can make a prediction really quickly without running the expensive chemical simulation.
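The simulate-then-learn workflow described here can be sketched with a toy surrogate model. Everything in this snippet, the simulator function, the scalar "descriptor", and the nearest-neighbour predictor, is an invented stand-in; the actual Open Catalyst work trains graph neural networks on DFT outputs over atomic structures. The sketch only illustrates the three steps: simulate a few candidates, fit a cheap model, then score new candidates without simulating.

```python
import math

# Toy stand-in for an expensive DFT run: maps a catalyst "descriptor" to a
# simulated outcome. (Real DFT takes days per reaction; this is a formula.)
def expensive_simulation(x):
    return math.sin(3 * x) + 0.5 * x

# Step 1: run the expensive simulation on a small batch of candidates to
# build a training set of (descriptor, outcome) pairs.
train_x = [i / 20 for i in range(21)]
train_y = [expensive_simulation(x) for x in train_x]

# Step 2: a deliberately simple surrogate -- return the outcome of the
# nearest simulated example instead of running a new simulation.
def surrogate(x):
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]

# Step 3: score a brand-new candidate instantly.
print(surrogate(0.52))
```

The design choice the transcript highlights is exactly this substitution: the surrogate's answer is approximate, but it costs microseconds instead of days, so millions of candidates can be screened and only the best few sent to the real simulator.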
Jon: 00:31:55
Yeah. Very cool. So, it’s DFT, is that-
Laurens: 00:31:59
DFT, yes. 
Jon: 00:32:00
Yeah, yeah, yeah. Is the expensive simulation, and then, yeah, this makes perfect sense. So, you develop some kind of, it sounded like a graph neural network, to estimate the mapping from input to output much more compute-efficiently. Yeah, it makes so much sense, a great application of that kind of technology. And then, yeah, augmented reality technology, that is something that we associate with Meta. That’s, you know, something we closely associate. So, yeah. So, how does AI relate to AR?
Laurens: 00:32:34
So, I think it can be in different ways. So, one I think is sort of like material design, right? 
Jon: 00:32:42
Right, right. The wearables, right? 
Laurens: 00:32:43
Similar problems that we’re trying to apply similar methods to. But then there are also things related to sort of contextual AI, right? So, like, how can these wearables understand what it is you’re trying to do and help you perform those tasks? And this is something that Michael Abrash at Meta has been talking about a lot, in sort of rethinking basically how human-computer interaction works in the metaverse, right, like in the world of augmented reality. It seems pretty clear that AI is going to play a big role in that. And so we’re trying to pioneer that field as well. Yeah. 
Jon: 00:33:36
Nice. Very cool. Nice. So, I am sure it’s clear to all of our listeners that you’re working on so many different fascinating research areas. In order to be working on all these different projects, you’ve got teams at multiple US locations and European locations. How do you manage all of this? Like, how do you strike the balance that you need to strike in order to be getting deep in the weeds on some things and helping out, or, you know, just dealing with the regular administration of management? You know, like what’s it like being such a senior person overseeing so many different projects with so many different people in so many different locations?
Laurens: 00:34:15
Right, right. I mean, it’s about finding a balance, right? So, there are projects that, you know, I cannot get very deeply involved in. So, for example, the CICERO project, I didn’t have very deep technical involvement in the project, but what I’m trying to do is create the environment in which the teams working on these projects can succeed, right? And that entails, you know, making sure that these teams have the right resources to be successful, but also encouraging the teams to take bigger risks. That, I think, is a big part of my role in supporting those teams.
00:35:02
The geographic distribution? I think being in New York is a huge advantage, because what I’ve found is that since Covid, you know, remote work and so on has become a lot easier, right? And so, like, having distributed teams in a way isn’t that hard. The only thing that’s difficult and that you cannot really fix is time zone differences, right? And so, like, working between Europe and the West Coast of the US is just really hard because of the eight- to nine-hour time difference. But being in New York is kind of an advantage there. Because in the morning you can work with folks in Europe, and then in the afternoon or evening you can work with folks on the West Coast, so that really helps. 
Jon: 00:35:49
Yeah. Yeah. Yeah. I couldn’t agree more. I experienced the same thing working with, yeah, with teams in Europe as well as in Africa. A lot of our team, half of our team actually, at my company Nebula is in Sub-Saharan Africa.
Laurens: 00:36:04
Awesome. 
Jon: 00:36:04
And so same time zone as Europe, and it’s really nice being able to work with them. But then we also have a lot of clients, prospective clients, investors that are on the West Coast. And so, yeah, I think there’s a lot of great reasons to be in New York and time zone is another one.
Laurens: 00:36:17
Gotcha. I actually go to Africa every year to teach in a program called the African Masters in Machine Intelligence, which is a program that was started five or six years ago by Moustapha Cissé, who used to be a researcher at Meta. And it’s a program to bring AI education to Africa. It ran in Rwanda for the first couple of years, and now it’s running in Senegal. And I go there every year to teach sort of an introduction to computer vision. So, really excited to hear, you know, about the employment opportunities for those students that your company’s creating.
Jon: 00:37:03
Yeah, it’s very cool. It was something that, you know, for us as well, prior to the pandemic, we were a hundred percent in person in New York, and with having to go to this distributed model, you know, there’s things that I don’t love about it. Like I don’t love not being around a whiteboard with my team and seeing them every day. And also just those social elements of kind of knowing what’s going on in the lives of everyone on my team and being able to celebrate those with them and help them through things that are tougher in their personal lives as well. You know, people don’t want to, or at least I think data scientists and engineers, you know, with all the time that we tend to be spending in Zoom meetings already, you rarely spend extra time just getting caught up on people’s personal lives.
00:37:48
But one of the really great things has been being able to leverage talent all over the US and the world. And we’ve found amazing engineering talent in Sub-Saharan Africa. So, yeah. So, if other folks are out there having a tough time finding the right engineers, that’s another place to look. All right, Laurens, another topic area that you’ve covered at Meta, which I guess kind of sits across all of these different topic areas, because privacy is something that concerns every possible area. And I imagine, particularly with Meta dealing with so much personal data, privacy’s got to be top of mind. So, you’ve been involved, you’re part of the team that developed CrypTen, C-R-Y-P-T-E-N, the privacy-preserving machine learning framework. So, what’s this project all about?
Laurens: 00:38:47
So, CrypTen, you can think about it as basically PyTorch. So, like, as a developer, what you would write is essentially PyTorch code. And the goal that we were trying to get to, and got really, really close to, is that you can take your PyTorch code and you can replace import torch by import crypten as torch and your code just runs, except the computations that you’re performing are performed on encrypted data. So, like your whole neural network training is happening on data that you don’t really understand, and the weights you don’t really understand either. And what’s implemented under the hood is a set of techniques called secure multi-party computation. And the basic idea is that it enables two different parties, for example, you and me, to together perform computations on data that we don’t understand. 
00:39:46
And sort of the key principles of those approaches are pretty easy to understand. So, let’s say you have some data, let’s say the number 12, and you want to do computations on the data with me, but you want to keep your data private, so you don’t want to tell me about the 12 that you have.
Jon: 00:40:07
I certainly wouldn’t. 
Laurens: 00:40:08
No, no, don’t want to know your 12, right? So, what you do is you draw a random number, let’s say 7, and you give the seven to me. So, now I have a random number, I don’t understand anything. And what you do is you compute 12 minus 7. So, now you have 5. So, now together we have a representation of the 12, but I don’t know that the underlying number is 12. And actually you don’t know it either, right? Like if you had sort of thrown away the 12. So, why is this an interesting representation? Well, it’s an interesting representation because you can still do computations on it. So, for example, if we wanted to multiply your 12 by 2, I can multiply my 7 by 2, you multiply your 5 by 2, and together we have a representation of 24, right? “Secret-shared” is sort of the term that’s used to describe this way of sharing data.
00:41:10
If we had two of these numbers, so let’s say you had another number, say 6, and I got 3 and you got 3, and we want to add this 6 and the 12. Well, I can add my 7 and the 3, and you can add your 5 and the three. So, now I have 10, you have 8, and together we have 18, right? So, we have the sum of 12 and 6, right? So, we can perform addition as well. And we can even multiply these kinds of secret-shared numbers. The protocol to do that is a little bit more involved, so I’m not going to explain it in full detail, but you can do that as well. Right? So, now what do we have? We have a representation of numbers where, you know, both of us don’t understand the underlying data, but we can do addition and we can do multiplication. And really that’s all you need, right?
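The share-splitting arithmetic in this walkthrough can be sketched in a few lines of plain Python. This is a toy: two parties, additive shares modulo a fixed range (the transcript's 12 = 7 + 5 example, generalized so shares stay uniformly random), and only multiplication by a public constant. CrypTen's real protocols, including private-by-private multiplication, are considerably more involved.

```python
import random

MOD = 2**16  # work modulo a fixed range so each share is uniformly random

def share(secret):
    """Split a secret into two additive shares: a + b == secret (mod MOD).
    Either share on its own is just a random number and reveals nothing."""
    a = random.randrange(MOD)
    b = (secret - a) % MOD
    return a, b

def reveal(a, b):
    """Combine the two shares to recover the underlying value."""
    return (a + b) % MOD

# Your 12 and your 6, each split between the two parties.
a1, b1 = share(12)
a2, b2 = share(6)

# Addition: each party adds the shares it holds, locally, no communication.
print(reveal(a1 + a2, b1 + b2))   # 18

# Multiplication by a public constant: each party doubles its own share.
print(reveal(2 * a1, 2 * b1))     # 24
```

Nothing either party holds on its own ever equals 12 or 6; only the combination of both shares does, which is the property the cryptographic security proofs formalize.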
Jon: 00:42:00
Yeah, yeah for neural networks, for sure. 
Laurens: 00:42:03
Well, I mean, you need other functions, right? Because you need to be able to compute values and you need to be able to do normalization and so on. But essentially you can build all of those up from addition and multiplication, sort of in the same way that, you know, CPUs do that, right? Using all sorts of approximations, or by doing the multiplication and addition at the binary level, which enables you to implement things like, you know, logical operators, which you need in order to do comparisons and so on. And so what CrypTen is implementing is basically all of that. It implements these super primitive operations, and then on top of that, it enables you to evaluate all the functions that you need in order to do deep learning. And then on top of that, it builds autograd, in the same way that you expect from PyTorch. And then on top of that, it implements a full neural network library that uses the same API as PyTorch.
00:43:08
And what’s cool is that at this lowest level of implementation, it is actually calling into PyTorch again, which means that we can do these computations on GPUs as well, for acceleration. And so really what this whole framework is enabling us to do is deep learning on that 12 of yours and a whole bunch of other numbers that you don’t want to share with me, but together, we can still get the value out of that data, in particular out of the union of the data that we have. 
Jon: 00:43:44
Very cool. So, I guess it would allow different counterparties to even collaborate together. So, you could have two corporate entities that, maybe two pharmaceutical firms that want to be able to pool together research that they’ve conducted separately, but they want to keep the research proprietary to the individual organizations. They could pool their data together and train a neural network model, do inference with that neural network model, even on GPUs without ever having to share the data with each other. 
Laurens: 00:44:16
That’s right. That’s right. And all of this is cryptographically secure. So, there are very strong sort of theorems to prove that this is actually secure and that I will never be able to learn about your 12. Yeah. 
Jon: 00:44:35
Yeah. Very nice. And is there any impact on like computational efficiency?
Laurens: 00:44:39
Yeah, so I mean, this is definitely a lot slower, right? Like that is the sort of big price that you pay. And so we are able to do sort of modern neural networks, but we wouldn’t be able to train a model like Llama 2 in this way. Typically, the computational overhead is in the order of like a 100x, right? 
Jon: 00:45:02
Oh, wow. Yeah. Wow. 
Laurens: 00:45:02
So, it’s quite substantial, despite all the optimizations that we did. And that is mostly because the protocols that we’re using, in particular the private multiplication protocol, which we similarly use in things like convolution or in attention mechanisms and so on, those are quite computationally intensive. And more importantly, they require us to communicate. Like we are going to communicate intermediate results, and that communication takes time. 
Jon: 00:45:32
Yeah. Yeah. Yeah. It makes perfect sense. It would’ve been wild, I guess now in retrospect, having asked my question, if you’d said, yeah, it’s just as fast. 
Laurens: 00:45:40
Yeah, that would be amazing. I mean, then everybody would be using it for every possible use case, right? Yeah. So, that is, I think, not quite feasible, but it is unlocking use cases like the one you just described, where the alternative is to not do the work at all. Right? And so that is really the baseline you’re comparing with. Like, you’re not getting the value at all, or you’re getting it and, you know, you pay a computational cost for it. 
Jon: 00:46:06
Very cool. And CrypTen is open-source. So, our listeners right now can pip install CrypTen and use it just like the PyTorch interface that they’re used to for building, training, and deploying deep learning models.
Laurens: 00:46:17
Yeah, for sure. 
Jon: 00:46:18
Super easy. That’s amazing. Thank you. Great resource there and lots of applications, no doubt as well. 
00:47:07
This episode is brought to you by Grafbase. Grafbase is the easiest way to unify, extend and cache all your data sources via a single GraphQL API deployed to the edge closest to your web and mobile users. Grafbase also makes it effortless to turn OpenAPI or MongoDB sources into GraphQL APIs. Not only that but the Grafbase command-line interface lets you build locally, and when deployed, each Git branch automatically creates a preview deployment API for easy testing and collaboration. That sure sounds great to me. Check Grafbase out yourself by signing up for a free account at grafbase.com, that’s g-r-a-f-b-a-s-e dot com 
00:47:11
With a bit of a technical question here: our researcher dug up that something called Fisher Information Loss is a better alternative to Differential Privacy. So, thanks to Serg for digging that one up. Do you want to dig into what that means? 
Laurens: 00:47:28
Sure. Yeah. I don’t know about better, I mean, it’s definitely different. You know, I don’t want to upset all the Differential Privacy fans in your audience. So, what this is about: what we just talked about is cryptographic security or cryptographic privacy, right? So, here we do computations together and we don’t change the computations or the data or the results in any way, but all the computations are encrypted, right? And so even the end result of the computations that we did on your 12 is encrypted, right? And so at some point we need to open up that result, right? Like, we will need to decrypt something in order to be able to see what the result is, right?
00:48:14
And at that point there is information leaking, right? Like information about sort of the training data that went in and so on. And so that’s where you want to use statistical privacy, so things like Differential Privacy, to measure and control the amount of information leakage that happens there. And the sort of standard way of doing that is through Differential Privacy, which is basically an approach that asks: if I have a data point, can I answer the question, was this data point used in the computation that I just performed? Which is called membership inference. So, like, was this data point part of the data that was used to, you know, train our deep network, or whatever it is we were computing together.
00:49:13
And that is an interesting definition, but it’s also a little bit weird, because it sort of assumes that you already have the data point that we cared about keeping private, right? But if we cared about keeping it private, like, why did you already have it? Right? And so there’s a question of, like, is that the right formulation of statistical privacy? And so what we did in Fisher Information Loss is we propose an alternative definition of statistical privacy. That says more like, “Hey, if I give you, for example, the trained model weights, can you infer, like, can you back out any of the training examples?” So, like, can you reconstruct those directly from the model weights, or from the parameters or the outputs of the computation that you’ve just performed?
00:50:03
But we’re not assuming that you have that data point already. And so this is a different type of attack on the privacy mechanism, which is called data reconstruction. And Fisher Information Loss provides a strict guarantee on your inability to do that type of data reconstruction, in the same way that Differential Privacy provides a strict guarantee on your inability to do membership inference. And so these are two different definitions of statistical privacy. And people should use the one that’s most suitable for the privacy use case that they care about.
Jon: 00:50:47
Right? Right, right. Okay. Very cool. And yeah, nice for you to be able to get deep in the weeds there for a bit on these differences in statistical privacy options. So, it sounds like, with Fisher Information Loss, we might want to use that in a scenario where we’re less worried about someone confirming some specific data point from the training dataset, and it’s more about using the model weights to reconstruct.
Laurens: 00:51:23
It’s essentially about protecting against a different concern. And the reason that is important is because in statistical privacy there’s always a tradeoff between privacy, sort of how much privacy you are able to guarantee, and what is the utility, so how much of a price do you pay in terms of the correctness [inaudible 00:51:47] of the app, right? Because in practice, how you guarantee the privacy is you add random noise in a sort of very specific way. And so, like, you could do this, for example, when you make a prediction, right? So, let’s say you have an image recognition model, you make a prediction on, you know, sort of like, is this a dog or a cat? And now you can ask the question, how much can I learn about the training data from this prediction, right? And I can reduce the amount that you can learn by adding noise to my prediction, right?
00:52:19
But now my prediction gets worse, obviously, right? So, I lose some utility, right? And so there’s a fundamental trade-off there, right? And that’s why it matters what your definition of privacy is. Because what Fisher Information Loss will be able to give you is a better trade-off of privacy versus utility, if you feel that the Fisher Information Loss definition of privacy, so this data reconstruction definition, is sufficient for you, right? Like it’s sufficient to feel good about the results that you’re sharing with the world. 
Jon: 00:53:00
Gotcha, gotcha. Yeah. So, it’s about these trade-offs, and how much noise you add, and, yeah, the impact that it has on your ability to accurately reconstruct results. In addition to privacy being a really important issue across machine learning, and I guess for privacy, it’s not just machine learning, but computing in general, another big machine learning issue that people confront a lot is this concern about adversarial examples. But I understand that you don’t think it’s nearly as big of an issue as some other related issues. 
Laurens: 00:53:44
I think adversarial examples are real, right? Like they exist. I worked on them myself, but then also got a little bit disillusioned with working on them. Like, there’s a lot of work out there, but to me, it started feeling a little bit artificial, and sort of the main- 
Jon: 00:54:04
Maybe we should also quickly, in case there are listeners that don’t understand what it is, we should quickly define it. 
Laurens: 00:54:10
Sure. Yes. So, an adversarial example is basically this idea of, like, let’s say you have an image recognition network and you give it a picture of a dog and it predicts dog. And what people realized is that if you perturb the pixels in the image by an imperceptible amount, by really small amounts, but you do it in a clever way, you can change the prediction of the network into kind of whatever you want. And so, like, you still see a picture of a dog, and you have this network that you think is working really well, but suddenly it predicts cat or guitar or, you know, whatever, right? And so adversarially, you can change the inputs to change the output of the network. And there’s a lot of work that people have been doing in designing these kinds of attacks, as well as trying to design defenses against these attacks. And it seems like the attackers are kind of winning in this sort of back and forth. 
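The "perturb in a clever way" idea can be shown on the simplest possible classifier. This toy attacks a hand-made linear model (the weights, "pixels", and labels are all invented), nudging each input against the sign of its weight, the same signed-perturbation idea as the fast gradient sign method used against deep networks. In a real image with millions of pixels, each nudge can be far smaller and still flip the label, because the tiny pushes all add up in the same direction.

```python
# Toy linear "classifier": score > 0 means "dog", score < 0 means "cat".
w = [0.4, -0.3, 0.2, -0.1]   # classifier weights (made-up)
x = [0.5, 0.4, 0.6, 0.7]     # the original four-"pixel" image (made-up)

score = sum(wi * xi for wi, xi in zip(w, x))
print(score > 0)             # True -> classified "dog"

# Nudge every pixel by eps against the sign of its weight: each step is
# small, but they all push the score in the same direction.
eps = 0.2
x_adv = [xi - eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

adv_score = sum(wi * xi for wi, xi in zip(w, x_adv))
print(adv_score > 0)         # False -> now classified "cat"
```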
00:55:13
And I worked on this myself as well, right? So, like, you know, I got excited about this as well, because it seems like it tells us something about, you know, sort of the limitations of some of the models that we’re developing. But I kind of stopped working on it when I realized that, you know, there are much bigger, more natural limitations of our current systems. In particular, in 2019, we did a study where we looked at, “Hey, if we take an image recognition model and we use it in different parts of the world, does it work just as well?” And you may wonder, well, why does it matter where I use my image recognition network? And the reason is that things look different in different parts of the world, right? So, like, if you’ve ever been to an Indian wedding, it’ll look very different from a Western wedding. It’ll be very colorful, and there won’t be a white dress, right? 
00:56:08
And so we did a study: if we try to recognize these kinds of things in different countries across the world, do we get the same recognition accuracy? Right. And what we found is that these models typically work better in Western countries, so in the US and Europe, than they do in Africa or Southeast Asia or Latin America. Right. And so what that suggests is that these models haven’t captured sort of the full, you know, natural variation in the world. Right. And so I didn’t have to be adversarial to break the systems that we had at the time. I just had to fly to another country. And for me, that was a little bit like, okay, you know, in the grand scheme of things, this just seems like a more important problem to work on, and it motivated a lot of work inside Meta to address these issues in our image recognition and other systems.
Jon: 00:57:14
Yeah. So, these adversarial examples, where some potentially bad actor is deliberately trying to confuse your machine learning system, that’s probably, in the real world, something that we confront far less often than a machine learning algorithm just being irrelevant, or being completely off base.
Laurens: 00:57:34
Exactly. 
Jon: 00:57:35
Yeah. Yeah, yeah, yeah. So, yeah. So, that’s a big point to make. So, yeah. Are there any other big kinds of issues, privacy, robustness, other big ethical issues, that you’re tackling or that you think we need to tackle as a machine learning community? 
Laurens: 00:57:52
I think these are big ones, yeah. Like, I’m sure there are many, many issues that we need to tackle. And I think that’s why at Meta AI, we’ve always taken an open approach to research, right? Like, that is the reason why we’re open-sourcing models like Llama 2, so that the community out there can test those models, find the problems in them, and work with us together to solve some of these problems.
Jon: 00:58:26
Nice. Great explanation, great summary on that topic area. Moving on to what you were doing before Meta, you had a storied career even before then, so eight and a half years at Meta. But you were an academic researcher at Delft University of Technology, UC San Diego, Tilburg University, and last but not least, the University of Toronto, my hometown, where you worked with the legendary professor Geoff Hinton. And while you were at Toronto, you developed the revolutionary, truly, dimensionality reduction algorithm t-SNE, t-distributed Stochastic Neighbor Embedding. And so this is an algorithm that I use all the time, my data science team uses all the time, that I teach about all the time.
00:59:15
Because it’s absolutely fabulous when we’re working with a lot of kinds of machine learning models. But for me in particular, it’s often natural language data. We were talking earlier, when Llama 2 is trained, we have this pre-training step where we take basically all the language on the internet, clean it up, and then convert all of these tokens, typically we’re working with these kinds of sub-word tokens, into a high-dimensional space. And exactly how many dimensions are in that space is a hyperparameter of the training that you’re doing. But you might have a hundred or a thousand of these dimensions. And so every word, every sub-word in this case, every token gets placed in a location in this space. And as soon as we’re going over three dimensions, it’s impossible for our mind to visualize what the relationships are like between these sub-words.
01:00:25
And it can be interesting, it can be very interesting indeed, and kind of give us a sense of how well our model is trained, for example, if we could see what words are nearby a given sub-word, because it’s a property of these vector spaces that sub-words that are more similar in meaning are closer together. And even, I imagine, if we have the same kind of vector space in, you know, an image recognition algorithm, then you can kind of traverse the space, and locations in the space nearby each other are probably going to have a similar-looking kind of person or a similar-looking kind of dog. And you can move along a particular axis in this many-dimensional space. So, let’s say we have a thousand dimensions, we could move along a particular axis in that thousand-dimensional space.
01:01:17
And it might happen to correspond to, I don’t know, we’re creating these spaces using resumes and job descriptions. So, for example, one of these dimensions could be an axis of whether somebody’s hands-on or not, managerial or not. It could be whether, you know, they’re more a backend developer versus a frontend developer. In the visual world, it could be something like age. So, you could have a similar-looking kind of face, but as you move along a particular axis in the space, the face appears to age. So, these dimensions can have real meaning, but there are far too many of them. If we have a thousand dimensions, we can’t possibly wrap our heads around, you know, visually what those look like.
01:02:11
And so t-SNE, your algorithm, allows us to compress that high-dimensional space down to a much lower-dimensional space. So, in my case, that’s typically two or at most three dimensions, so that we can visually explore and get a sense of how, in my case, often the language model has converted sub-word tokens into its representation of meaning. So, thank you so much for creating this. And really, the thing that makes it so powerful is that despite taking it from, say, a thousand dimensions down to two, or a thousand dimensions down to three, it preserves as much as it can the distances between these sub-word tokens, which is a mind-bending thing to me. I can’t really, I mean, I suppose there’s no way to visualize how that works, but- 
Laurens: 01:03:06
Yeah, and it doesn’t preserve everything, right? Like it’s sort of fundamentally impossible, right, if your points really fill up a thousand-dimensional space, to preserve all the structure that is in that high-dimensional data. But it does a really good job at capturing the key parts of it. And much better than prior techniques, like things like Principal Component Analysis, that people used to use for this purpose. 
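What "capturing the key parts" means can be made concrete with a tiny hand-rolled version of t-SNE's objective. This sketch (three invented points, unit bandwidths, no perplexity tuning or optimization loop) just scores two candidate low-dimensional layouts by the KL divergence between neighbour-probability distributions, which is the quantity t-SNE actually minimizes; in practice you would call a library implementation rather than write this.

```python
import math

def neighbour_probs(points, kernel):
    """Turn pairwise squared distances into a normalized similarity distribution."""
    n = len(points)
    sims = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                sims[(i, j)] = kernel(d2)
    total = sum(sims.values())
    return {pair: s / total for pair, s in sims.items()}

gaussian = lambda d2: math.exp(-d2)       # kernel used in the original space
student_t = lambda d2: 1.0 / (1.0 + d2)   # heavy-tailed kernel used in the map

# Three high-dimensional points: two close together, one far away.
high_d = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
P = neighbour_probs(high_d, gaussian)

def kl_divergence(P, Q):
    """Mismatch between the original and embedded neighbour distributions."""
    return sum(p * math.log(p / Q[pair]) for pair, p in P.items())

# A 1-D layout that keeps the close pair close scores far better than one
# that splits it up -- t-SNE's optimizer searches for low-scoring layouts.
good = [(0.0,), (0.1,), (10.0,)]
bad  = [(0.0,), (10.0,), (0.1,)]
print(kl_divergence(P, neighbour_probs(good, student_t)) <
      kl_divergence(P, neighbour_probs(bad, student_t)))   # True
```

The heavy-tailed Student-t kernel in the low-dimensional map is the "t" in t-SNE: it lets far-apart points sit even further apart in the map without paying much penalty, which is part of why the crowding you'd get from a plain Gaussian embedding is avoided.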
Jon: 01:03:39
Yeah. And I’ll remember to include in the show notes, I’ve got a Jupyter Notebook that people can use to implement t-SNE, so you can get a really simple sense. Although, I mean, it’s not very hard, it’s typically a one-liner, you know, there are lots of implementations of it, and then you just specify how many dimensions you want the space to be compressed down to. But I do have a Jupyter Notebook that I will put in the show notes for folks. And just to give a sense of the impact of this, how many times, Laurens, do you think that your t-SNE paper has been cited? 
Laurens: 01:04:19
I don’t know. I mean, tens of thousands of times. 
Jon: 01:04:21
Yeah. Over 35,000 times. That’s wild. 
Laurens: 01:04:25
And what’s interesting is that most of those papers, I don’t understand. Because, you know, sort of the problem that you described, right, you run into it as a data scientist, but you run into it in any other field of research as well, right? Like, very often you find yourself with very high-dimensional data, right? It could be gene expression data, if you work in the biological domain. That’s actually, I think, probably one of the biggest drivers of these citations: the combination of, you know, sequencing of that type of data becoming really cheap. And then what you have is these like 20,000-dimensional vectors, right? Where each number that you have corresponds to a gene, right, and sort of how expressed that gene was in the sample that you were measuring.
01:05:23
And you kind of want to know, what’s going on in this data, right? That’s the question that you want to answer. And then something like t-SNE is kind of a go-to tool. It’s like, okay, well, let’s make one of these visualizations and at least we’ll get a first sense. And often it’ll give you interesting information right off the bat. So, I think often what happens in those kinds of experiments is that you’ll find that you recorded data on two days and you’ll see two clusters, and one is day one, and the other one is day two. And it points at something in your experimental setup being, you know, different between the two days, right?
01:06:03
And so then it tells you, oh, okay, the first thing that I’ll need to do is figure out how to correct for this, right? Is there some kind of normalization I need to do in order to change this? Or do we need to redo the experiment, right? Worst-case scenario. So, I think that’s where a lot of those use cases are. And so with most of those papers, I can look at the titles and, basically, I understand half the words.
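A toy illustration of the kind of normalization Laurens describes, using NumPy. The data, the offset, and the per-batch mean-centering are all purely illustrative, a minimal sketch rather than a real batch-correction pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated measurements from two days; day two has a systematic
# offset, the kind of batch effect a t-SNE plot would reveal as
# two separate clusters.
day1 = rng.normal(loc=0.0, size=(50, 5))
day2 = rng.normal(loc=3.0, size=(50, 5))

# One simple correction: center each batch on its own mean, so the
# day-to-day shift disappears before any downstream analysis.
day1_corrected = day1 - day1.mean(axis=0)
day2_corrected = day2 - day2.mean(axis=0)

# After centering, the two batches are no longer separated by the offset.
gap_before = abs(day1.mean() - day2.mean())
gap_after = abs(day1_corrected.mean() - day2_corrected.mean())
print(gap_before > 1.0, gap_after < 1e-9)
```

Real gene expression pipelines use more careful batch-correction methods, but the principle, identify and remove a systematic between-run difference before analysis, is the same.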
Jon: 01:06:31
Yeah. Yeah. I mean, research is like that in general. There are so many different things that people get deep in the weeds on. But yeah, this exploratory data analysis tool, I pretty much only think about it for natural language data. But as you’re pointing out, tons of different areas, yeah, allowing you to explore your data, explore these vector representations, whatever the field is. Very cool. So, thank you. And yeah, in general, you’ve had a huge number of citations. There’s this metric called the H-index, and you have an H-index of 53, which is a very high number. You, before we started recording, said it’s just because you’ve been publishing for so long, it’s like a- 
Laurens: 01:07:13
It just means I’m old. That’s what it means. 
Jon: 01:07:15
Well, it would correlate with age, yes. But other factors as well. You know, I think 53 is huge. And for our listeners who aren’t aware of this metric, it means that Laurens has at least 53 papers that have each been cited 53 times or more. It’s tricky to have a high number because it’s hard to have a large number of papers all with a large impact. So, yeah. Amazing. And so, yeah, I don’t know, do you have advice for people who would also like to make a big impact? You know, they’re getting involved in data science, they might like to be an AI researcher, they’d like to make a big real-world impact with AI in their careers? What kinds of things should they be doing to try to emulate the kind of success you’ve had? 
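Jon’s definition of the H-index can be made concrete with a small function. This is just a sketch for illustration; the function name and example citation counts are made up:

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations."""
    # Sort citation counts from highest to lowest, then walk down:
    # the i-th paper (1-indexed) keeps h growing while it still
    # has at least i citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# A researcher with papers cited 10, 8, 5, 4, and 3 times:
# 4 papers have at least 4 citations, but not 5 papers with at least 5.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

By this definition, an H-index of 53 requires 53 separate papers with 53+ citations each, which is why it is hard to inflate with one or two hit papers.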
Laurens: 01:08:11
Yeah, I get that question a lot, and honestly, I find it really difficult to answer. The first thing I tell people is, you know, whoever you ask this question to, don’t trust the answer. And the reason I’m saying that is because the time when I got into AI research was so different from today that whatever recipe I used may not be the right recipe today, right? So, just for context, the first time I went to the NeurIPS Conference, which is one of the main scientific AI conferences, it was in Vancouver, and there were maybe 700 or 800 people at that conference. At the end of the conference, you would have talked to everyone who was there, right? Whereas right now, if you go to NeurIPS, it’ll be in a massive conference center, it’ll be 10,000, 15,000 people. And so the environment has completely changed.
01:09:10
But I do think there are some things that carry over, always, right? Which is, in your career you always have to take some risks at certain times. You cannot play it safe and also deliver big impact in the process. And so, one question that I like to ask people about the projects that they’re doing is: what if you succeed, right? The thing that you’re working on right now, explain to me, what is the best-case scenario? Everything pans out, everything works great, what impact does it have? And if you don’t have a good answer to that, then you’re not working on the right projects, because then, you know, you’re upper-bounding the impact that you can have, right?
01:10:08
And so I think for all the projects that we talked about in the episode so far, the answer to the question, what if you succeed, is: it changes everything, right? It changes how we do drug design, for proteins. Or it changes our ability to build AI systems that do reasoning and planning and conversing in natural language, for Diplomacy. Or it changes the way in which we can make deep learning private, for a project like CrypTen. And so you always need to have an answer to that question: what if I succeed? 
Jon: 01:10:51
Very cool. I love that way of framing the problems that you tackle, as well as, I imagine, what you tell the people you manage when they ask for career advice. If you were to try to extrapolate forward, and obviously this is a very tricky thing to do with how quickly everything’s moving in the space, what would you like to see, what’s the kind of upper bound on what we can achieve with AI in our lifetimes, over the coming decades? Do you ever, you know, think about how the future might unfold and what the benefits could be to humans? 
Laurens: 01:11:31
I don’t think about it much, to be honest. And the reason is that it’s just really hard to answer those questions, right? At times it can feel like you’re in a time of exponential progress, which is a little bit how AI feels right now. It definitely feels exponential, but you always might be on a sigmoid, right?
01:11:56
Like, the first half of a sigmoid looks like an exponential, right? And so you never quite know when you’ll hit a fundamental boundary, a fundamental limit, in terms of your ability to make progress. So, I find it really hard to predict, but there are definitely things that I’m really excited about. In particular, in the large language modeling space, one of the big things that stands out for me is that I think it will change the way in which we build AI systems, or even systems more generally. 
01:12:37
So, one of my favorite papers maybe this year was a paper out of the Allen Institute called VisProg, where basically what they do is they take a library of different models, different components, for visual recognition in this case, and they basically use a language model to figure out, based on a natural language description of the task that you want to solve, how to piece together the right components, right? And so it’s kind of like you’re programming these systems in natural language from these really high-capability components. And to me that seems like a really promising paradigm in terms of how you’d really want to build systems, right? 
01:13:29
If your systems are going to be more complex, then you cannot always be like, okay, we need to train this whole thing end to end. You need to think in a more compositional and modular way, in the same way that we’ve always done in software engineering, and those kinds of ideas need to come into the development of AI systems as well. And I think large language models give us a path to doing that in a way that’s really intuitive. 
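The compositional idea Laurens describes can be sketched as a toy in plain Python. To be clear, this is not VisProg’s actual API; every name below is illustrative, and in a real system a language model would emit the program from the natural language task description:

```python
# A small registry of "components", standing in for pretrained models
# (detector, classifier, captioner, ...) in a real system.
components = {
    "strip": str.strip,
    "lowercase": str.lower,
    "exclaim": lambda s: s + "!",
}

def run_program(program, value):
    """Execute a 'program': an ordered list of component names.

    In a VisProg-style system, a language model would produce this
    list from a natural language description of the task; here we
    hand-write it to show the compositional execution step.
    """
    for step in program:
        value = components[step](value)
    return value

# Pretend an LLM translated "clean up this text and make it excited"
# into the following plan:
plan = ["strip", "lowercase", "exclaim"]
print(run_program(plan, "  Hello World  "))  # hello world!
```

The point of the sketch is the design choice: capable components stay modular and reusable, and the natural-language-driven planner only decides how to wire them together, rather than everything being trained end to end.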
Jon: 01:13:57
Yeah. Very cool. VisProg, was that- 
Laurens: 01:13:59
VisProg, Visual Programming.
Jon: 01:14:01
VisProg, yeah, yeah. I was like, that sounds a lot like Facebook. And I was thinking maybe like an interface between programming languages, VisProg. Very cool. We’ll be sure to include a link in the show notes to that. And, I mean, we’re kind of tying back to Llama 2 again now, but that ability for a big open-source LLM like Llama 2 to provide a natural language interface between lots of different components lends itself nicely to this. Something we were talking about with Professor Joey Gonzalez from Berkeley in episode 707 was his Gorilla project, which took the original Llama, and I suspect they’re now working on the same kind of thing with Llama 2, and they allow it to call APIs. So, like the ChatGPT plugins kind of analog in open-source. 
01:14:56
And this kind of thing is super, super powerful, because all of a sudden, any kind of programming interface, whether it’s a model on the other end or some other kind of service, like being able to do math in Mathematica, or being able to book a trip in Kayak or whatever, gets this natural language interface. You know, there certainly are things that happen in our space that seem exponential and we end up discovering that we’re on a sigmoid curve and it flattens out. But I think there are still a lot of legs in these kinds of natural language applications. There are tens of thousands of startups creating them, where people can now intuitively use the kind of language that they grew up with, and call on machines to do work for them seamlessly. 
Laurens: 01:15:52
Yeah. It’s a really exciting time to be in AI. 
Jon: 01:15:54
Yeah, for sure. And then I guess, yeah, even Llama 2 is probably also a really great resource for asking for AI career advice as well. Going back to that-
Laurens: 01:16:03
I guess so. I haven’t tried it. [crosstalk 01:16:05] I should. I imagine it could do a great job. 
Jon: 01:16:06
Curious what it tells you. Yeah. You could have a lot of back and forth with it on that. All right, this has been an amazing episode, Laurens. Thank you so much for coming and filming it with me. Before I let my guests go, Laurens, I ask them for a book recommendation. Do you have anything for us?
Laurens: 01:16:23
A book recommendation? A book that had an impact on me, and that it surprises even me that I’m recommending, is a book called Reframing Organizations by Bolman and Deal. It’s a book about how organizations work. There are many of those types of management books and, even though I’m in a manager role, I’ve always stayed away from them, because they’re often of the type, oh, you do these four things and then your organization will run amazingly, right? And in practice, I never quite found that to be the case. There’s a lot more to it. And this was the first book that I found to be actually useful in terms of looking at an organization in different ways, being able to reframe problems in different ways, and being more effective in solving those problems.
01:17:28
So, if there are any people out there who are managing teams, or who have aspirations to manage teams, this is the one book that I would recommend. And then, you know, don’t read any of the others. 
Jon: 01:17:41
Nice. Very cool. That’s a great recommendation probably even for me. And very last thing, Laurens, what’s the best way to follow you? Do you use social media a lot?
Laurens: 01:17:51
Not a whole lot to be honest, which is a little bit surprising, but I have been on Threads. You know, that’s the new cool thing. So, follow me on Threads. My handle is @lvdmaaten. I think that’s where I’ll be most active. 
Jon: 01:18:07
Nice. That’s great. You know, we ask that at the end of every show, and I think you’re the first guest that has given that answer, but it-
Laurens: 01:18:14
Sign of the times.
Jon: 01:18:15
Yeah, but I mean, it took over as the fastest-growing consumer application by far. We were blown away by how quickly ChatGPT spread around, and Threads blew it out of the water. So, very cool innovation there. Laurens, thank you so much for coming on. I have no doubt that the audience enjoyed this a lot. I certainly learned a lot from our conversation. Thank you so much. 
Laurens: 01:18:35
Thanks so much for having me.
Jon: 01:18:41
Yes, yes, yes, another fascinating conversation for me. I hope you enjoyed it too. In today’s episode, Laurens filled us in on Evolutionary Scale Modeling, which uses transformer-based language models to design new biological protein structures. He talked about fighting climate change by building an AI model that can efficiently simulate catalyst reactions to identify compounds that could reduce pollutants in the air. He talked about the CrypTen privacy-preserving machine learning framework, which has PyTorch syntax, but all of the computations are seamlessly encrypted under the hood. He filled us in on web-scale weakly-supervised image recognition, using billions of images to create what was at the time a state-of-the-art machine vision system. He talked about the simulation of materials for new augmented reality wearables. And he talked about his t-SNE approach, how it compresses high-dimensional vectors down, which then allows visualization of, for example, natural language token similarity, though it’s also widely used with biological data.
01:19:42
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Laurens’ social media profiles, as well as my own, at SuperDataScience.com/709. Thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you. And thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another exceptional episode for us today. You can support this show by checking out our sponsors’ links, by sharing, by reviewing, by subscribing, but most of all, just by continuing to tune in.
01:20:19
All right. I’m so grateful to have you listening and I hope I can continue to make episodes you love for years and years to come. Until next time, my friend, keep on rocking out there and I’m looking forward to enjoying another round of the Super Data Science podcast with you very soon. 