Jon Krohn: 00:00:00
This is episode number 469 with Professor Konrad Körding, of the University of Pennsylvania.
Jon Krohn: 00:00:12
Welcome to the SuperDataScience podcast. My name is Jon Krohn, chief data scientist and bestselling author on deep learning. Each week we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex, simple.
Jon Krohn: 00:00:38
Welcome back to the SuperDataScience podcast. I’m your host, Jon Krohn, and I am honored and delighted that Konrad Körding took time out of his week to join me on this episode. Konrad is a full professor at the prestigious University of Pennsylvania, where his research lab bridges the fields of biological neuroscience and artificial neural networks like deep learning networks. Yes, he carries out extremely interesting research that strikes at the heart of what intelligence is, and how it can be replicated in machines. We talk about this a fair bit in the episode, largely in straightforward terms with lots of vivid analogies.
Jon Krohn: 00:01:21
Another core topic in the episode is Neuromatch, a massive, innovative deep learning education program that Konrad leads that matches students with similar interests, languages, and time zones into 10 person study pods. This matching approach is wildly successful, with 86% of students completing the program compared to a 10% industry average. We dig into the Neuromatch curriculum, which allows us to both introduce and to discuss the state of the art in deep learning across all of the core deep learning approaches. We cover the mathematical and programming foundations you should ideally possess to make the most of studying deep learning, convolutional networks for machine vision, recurrent networks for processing natural language, generative adversarial networks for artistic creativity, and deep reinforcement learning for complex sequential decision making like playing games and driving vehicles autonomously.
Jon Krohn: 00:02:14
Finally, Konrad shares his rich thoughts on what stands in the way of deep learning enabling machines to learn as well as humans learn, and he fills us in on how these limitations may be overcome in the future. Despite Professor Körding’s incredible depth of experience, we largely keep the content at a high level, making this episode perfect for technical and nontechnical folks alike who’d like to understand the leading edge of artificial intelligence today, as well as predictions for how AI may or may not transform the world in the coming decades.
Jon Krohn: 00:02:55
Konrad, welcome to the SuperDataScience podcast. It’s an absolute honor to have you here. Where are you calling from today?
Konrad Körding: 00:03:02
So I’m at home in Philadelphia. It’s a wonderful day with sunshine on the outside, and I haven’t left my home in a long time.
Jon Krohn: 00:03:12
Ah, well, I think we’re nearing the beginning of the end of the pandemic. So we’re recording at the end of April, and something that’s been super exciting for me to watch over the last couple of weeks in Manhattan, where we have region-specific COVID stats, is that three weeks ago we were at a 3% positive test rate, and now we’re at a 1% positive test rate with the level of vaccination that’s happened here. So you might get to go outside someday soon.
Konrad Körding: 00:03:39
Well, I’m kind of jealous, because our numbers, I believe [inaudible 00:03:43] right at this time.
Jon Krohn: 00:03:44
Oh no.
Konrad Körding: 00:03:45
So hopefully soon.
Jon Krohn: 00:03:48
All right, I hope so as well. Looking forward to a return to normality. So you have an incredible academic background. I’d love for you to tell us in your own words a little bit about what your specializations are. So I know that you have a big specialization in the relationship between biological neural networks and artificial neural networks, like deep learning neural networks, you want to tell us a bit about that?
Konrad Körding: 00:04:19
Yeah, I think ultimately I’m really interested in intelligence. And to really understand intelligence, I believe we need to look at biology, at people, and hence a lot of my research was about how do people solve problems? And I think we also need to look towards neuroscience, where we’ve been looking, “Well, what’s happening in the brain while we perceive things, while we move?” And then lastly I think we really need to start building artificial intelligence and trying to solve the same kinds of problems that biology’s good at solving.
Konrad Körding: 00:04:51
So in that sense, I’m interested in all these different aspects of that, and that’s why if you actually look at my CV, I look like I’m very badly specialized. I have written papers in lots of different areas, like how do we get electrodes into brains, how can we treat various diseases, and how does the brain work, but they all just aim at more or less doing the same thing, which is just understand intelligence, broadly construed.
Jon Krohn: 00:05:18
All right, so many exciting things to talk about related to your academic research. I can’t wait to dig into it with you. However, first, I would like to talk about the Neuromatch Academy, which is a program that you conceived of that is related to the intersection of your biological and your artificial neural network worlds that you straddle. So this program launched last year for the first time. It’s an intensive three week summer school on neuroscience, and then separately three weeks on deep learning. I assume people can sign up for one or the other or both, depending on their interests, and you’re running it again for a second time this summer. So tell us a bit about the program.
Konrad Körding: 00:06:01
That’s right. So everything started last year when the pandemic hit. Before last year, I was organizing a summer school in neuroscience. How does that work? You bring in 20 or 40 students from around the world, you bring in 10 lecturers. The lecturers all come, they fly in, they talk for a day, and then they fly back. The students learn a lot in that period of time. And then of course when the pandemic hit, it was over. And so then we decided, well, what if we could bring this online? In fact, lots of other people had summer schools as well, there are about 10 summer schools in neuroscience, so we all joined forces. So we decided, as a big team, we ended up being about 300 people …
Jon Krohn: 00:06:45
Wow.
Konrad Körding: 00:06:46
To try and make a three-week summer school that was the best we could possibly build as this joint team. Now, how do you do that? Normally, a professor brings some tutorial, and the tutorial is not always so great. If we’re in a big team, we can really try and optimize all aspects of that, which is what we did. And we tried to optimize the format. So instead of the usual, where the professor gives lectures and then there’s homework, we went to this format where you just, as a professor, describe for five minutes what a concept is, and the students immediately do it. And we have this format where everything happens in Google Colab. So you see the video explaining things in Colab, and then right below it you do the exercises. And then comes the next concept. So you learn by doing, and we found that that’s incredibly effective in computational neuroscience.
Jon Krohn: 00:07:42
There’s also a tutorial component, like a small class size component, so that’s actually where the match in Neuromatch comes from, right?
Konrad Körding: 00:07:51
That’s right. We believe in the power of groups. So we bring people in small groups, 10 people at a time, with a TA that we pay, and we make it so that these groups are good fits to one another. We run algorithms that match people, bring together groups that have similar interests. What does that mean? We bring together people that speak the same language, and there’s a lot of people who prefer that there’s someone who they can speak with in another language.
Jon Krohn: 00:08:18
Oh, interesting.
Konrad Körding: 00:08:19
So for example, we had German, Italian, Mandarin. We had 13 different languages. We also bring people together that are interested in similar things. We bring together people that are at similar stages of their career. And with that, we can have these groups of learners that really help one another.
Jon Krohn: 00:08:40
So what kinds of questions do people answer, do people answer open-ended questions, or do you have specific multiple choice questions that people answer about themselves? Do you have an algorithm that matches people, or does it happen all by hand?
Konrad Körding: 00:08:53
Well, if you have thousands of people, matching people by hand is maybe not such a great idea. I mean, full disclosure, we had the idea, and we decided that we didn’t have the manpower to match people like that. So yes, we run algorithms for that. In fact, the whole Neuromatch movement came from, before the pandemic hit, we were bringing people together in these mind-matching sessions, where we introduced pairs of people that we felt had scientific common interests, and they had a great time talking with one another. And before the pandemic hit, we already wanted to bring that online. So once the pandemic hit, we were basically ready to go, and that’s how the movement got off the ground so fast.
Jon Krohn: 00:09:34
So when that was running in person, matching people together like that, even just the one on one matching, so that was, I assume, mostly academics. Was that happening physically at the University of Pennsylvania in Philadelphia?
Konrad Körding: 00:09:47
No, it was happening at conferences.
Jon Krohn: 00:09:48
Ah.
Konrad Körding: 00:09:49
So this is absolutely crazy. So you fill a room with like 200 people, they all have little printed cards where it says whom they should meet at which point of time. Someone’s in front of everyone and says, “It’s time for session number three,” and then everyone has which table and which person at which table. Now, the main complaint that we get is like, “I had great discussions with the people, but it was hard to understand anything.” Because you have a room of like 200 people where every pair of people talks with one another. So in that sense, online is much better, because we can bring these groups together and also pairs together.
Jon Krohn: 00:10:28
I think it’s a good idea. In fact, the DataScienceGO virtual conference that SuperDataScience organizes, it has a similar kind of thing, where you get paired online with somebody, it’s kind of like a speed dating experience in that case, that you can facilitate online where … I haven’t actually gone through it myself, so hopefully I’m getting this exactly correct, but it’s something like you spend five minutes with somebody, although I think that it’s random. So maybe there’s a real opportunity here to be using some information to be, having it be non-random. And-
Konrad Körding: 00:11:03
Yeah, our algorithms are all on GitHub, so you’re very welcome to-
Jon Krohn: 00:11:04
Oh, no kidding.
Konrad Körding: 00:11:04
… try how well it works.
Jon Krohn: 00:11:08
Oh wow.
Konrad Körding: 00:11:09
And what we do, it’s actually quite simple. Everyone describes their interests as a little hundred-word abstract, it could be an essay or something. And then we just do simple topic modeling and match people that are similar in this topic space. And then there’s a linear programming component to make the groups. You need to make sure that every person sees the same number of people, there are a couple of constraints like this.
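For listeners who want to experiment with the idea, here is a minimal sketch of an interest-matching pipeline, assuming short written abstracts as input. It is not the actual Neuromatch code (which, as Konrad mentioned above, is on GitHub); the greedy grouping at the end stands in for the linear-programming step he describes, and the example abstracts and names are made up for illustration.

```python
# Minimal sketch of interest-based matching: topic-model short abstracts, then
# group people who are close in topic space. A stand-in for, not a copy of,
# the Neuromatch algorithms; greedy grouping replaces their linear program.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "I study recurrent networks and motor control in the brain",
    "Interested in deep reinforcement learning for robotics",
    "Computer vision, convolutional networks, medical imaging",
    "Causal inference and quasi-experimental methods in neuroscience",
    # ... one short abstract per participant
]

# 1) Simple topic modeling: TF-IDF followed by non-negative matrix factorization.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
topics = NMF(n_components=2, random_state=0).fit_transform(tfidf)

# 2) Pairwise similarity in topic space.
sim = cosine_similarity(topics)
np.fill_diagonal(sim, -np.inf)  # never match a person with themselves

# 3) Greedy grouping: seed a pod, then add the most similar remaining people.
def make_pods(sim, pod_size=2):
    unassigned = set(range(sim.shape[0]))
    pods = []
    while unassigned:
        pod = [unassigned.pop()]
        while len(pod) < pod_size and unassigned:
            # pick the candidate with the highest average similarity to the pod so far
            best = max(unassigned, key=lambda j: sim[pod, j].mean())
            unassigned.remove(best)
            pod.append(best)
        pods.append(pod)
    return pods

print(make_pods(sim, pod_size=2))
```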
Jon Krohn: 00:11:36
Very cool. So tell us a bit about the curriculum for the deep learning school. We’ll talk a little bit about the neuroscience school as well, but I think for our audience, the deep learning school is probably most interesting. I think it’s three weeks, right? And so how much time do people spend every day, and how does the day break up? I guess you spend part of your day watching those five-minute lectures, and then doing the hands-on code, and then, yeah, so then how often do you [inaudible 00:12:06] talking with your teammates?
Konrad Körding: 00:12:08
Yeah, that’s right. Let’s go through how the day works. So in the morning, everyone meets in this group of 10 people, and they jointly do the tutorials. Which is really great, because in many components of the tutorial, the students can share a screen. Basically, they’ll be in a Zoom breakout room, and the TA can then go from person to person, look over their shoulder, over their metaphorical shoulder, and help them, and they can then ask, “Look, I’m having this error,” or, “Why does my code not do what I think it should be doing?”
Konrad Körding: 00:12:41
So we find that it’s incredibly helpful to have this TA that goes to them. And then of course we also have these discussion sections in that group of 10 people, where it’s like, why do you think FaceNet is a problem from a racism perspective? So there are all those components where people need to talk with one another. And this happens in hour-long blocks. You always do 45 minutes, which is three rounds of a five-minute lecture plus 10 minutes of doing it, and then you have a 15-minute break. So that’s what happens in the morning. Then you have a break, and then in the afternoon you do projects, where you really get together with people with similar interests and start building deep learning code that you actually use for a project.
Jon Krohn: 00:13:27
So with thousands of students, does morning mean morning Eastern Time in the US, or do you have start times all over the globe?
Konrad Körding: 00:13:36
Well, we are running on 11 time zones.
Jon Krohn: 00:13:39
Wow.
Konrad Körding: 00:13:39
We are running all over the world. Our map of who’s taking our course is online, but yes, we have a large group of people from China participating, a large group of people from India participating, we have the Japanese participating.
Jon Krohn: 00:13:53
Wow.
Konrad Körding: 00:13:53
So we are running on 11 time zones, and that means that most of our material is basically localized only within that group of 10 people that are together. And then of course we need to have Q&A sessions and mentorship events, and for those we make sure that we stagger them through the different time zones so that everyone has access. So we are using three time zones for that, and then of course it means that the local group will at times have to move a little bit to make space for the Q&A session on their respective time zone.
Jon Krohn: 00:14:31
That is so cool.
Konrad Körding: 00:14:32
Yeah, you can’t do global unless you’re really on all time zones. This [inaudible 00:14:37].
Jon Krohn: 00:14:36
Yeah. I’ve done a lot of conferences online since the pandemic hit, and I have not participated in one that has varying start times like this. That is a really cool approach, I like that. So what kind of person is ideally suited to taking this? What kind of mathematical or programming background do you already need to have to make use of the deep learning summer school at Neuromatch?
Konrad Körding: 00:14:59
This is a fantastic question. So for the course to be useful, you need to be able to code in Python, because everything is about building neural networks in PyTorch. So if you don’t have Python, this is not a good way of learning Python. So you need Python, and then you need the basics of what would be part of a data science course. Like you need to understand matrices, because tensors don’t make much sense until you understand matrices. You need to understand basic calculus, like calculate derivatives, the chain rule. You should know some basics about data science, like the need for regularization, L1, L2 regularization, the kinds of things that any data science course will teach you.
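As a small illustration of the kind of prerequisite Konrad lists, here is a hypothetical PyTorch snippet showing L2 regularization via weight decay and an explicit L1 penalty added to the loss. The model and numbers are placeholders, not course material.

```python
# Minimal sketch: L2 regularization via the optimizer's weight_decay,
# L1 regularization added to the loss by hand. All sizes are arbitrary.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # placeholder model
x, y = torch.randn(32, 10), torch.randn(32, 1)

# L2 regularization is commonly applied through weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

l1_lambda = 1e-4
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    # Explicit L1 penalty: sum of absolute parameter values.
    loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
    loss.backward()
    optimizer.step()
```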
Konrad Körding: 00:15:47
But how do we define ourselves? We want to be like the gentle course, the course that makes it easy for you to get the skills that you actually need for your career. So therefore, we are trying to do all the things that make it easy to stick with the program. Now, you’re in a group of 10 people, which means that if you fall out, there are nine other people who will have gotten to know you during the first day, so they might send you an email like, “Hey, Jon, we are missing you, why are you not here?” And the effect of that is that we have very, very little dropout. And the other thing is, we have this big group working on the materials, so the materials are just awesome. They’re just at the right level, and therefore, what we want to be is this inclusive, positive, friendly space in which you can learn the skills that you really want to have.
Jon Krohn: 00:16:36
You actually, on a previous call that you and I had, you mentioned to me the dropout rates that are normal in these kinds of massive courses. Do you remember that stat?
Konrad Körding: 00:16:45
Yeah, if I remember it right, typical completion rates for MOOCs are between five and 15%. And in our case, I believe it’s 86% of people-
Jon Krohn: 00:16:55
Wow.
Konrad Körding: 00:16:56
… really make it all the way through the course. And that isn’t because we make it easy, it’s because we support people at learning the things that they really want to learn.
Jon Krohn: 00:17:08
Brilliant.
Jon Krohn: 00:17:12
This episode is brought to you by SuperDataScience. Yes, our online membership platform for transitioning into data science, and the namesake of the podcast itself. In the SuperDataScience platform, we recently launched our new 99 day Data Scientist Study Plan, a cheat sheet with week by week instructions to get you started as a data scientist in as few as 15 weeks. Each week, you complete tasks in four categories. The first is SuperDataScience courses to become familiar with the technical foundations of data science. The second is hands-on projects to fill up your portfolio and showcase your knowledge in your job applications. The third is a career toolkit with actions to help you stand out in your job hunting, and the fourth is additional curated resources, such as articles, books, and podcasts, to expand your learning and stay up to date.
Jon Krohn: 00:18:04
To devise this curriculum, we sat down with some of the best data scientists, as well as many of our most successful students, and came up with the ideal 99 day data scientist study plan to teach you everything you need to succeed, so you can skip the planning and simply focus on learning. We believe that the program can be completed in 99 days, and we challenge you to do it. Are you ready? Go to www.superdatascience.com/challenge, download the 99 day study plan, and use it with your SuperDataScience subscription to get started as a data scientist in under 100 days. And now, let’s get back to this amazing episode.
Jon Krohn: 00:18:43
So in order to take this, you would really need to set aside the time, you need to set aside three continuous weeks to study it full-time, right?
Konrad Körding: 00:18:52
That’s right. Now, it runs from August 2nd to August 20th. Every morning you learn, every afternoon you work on your projects. There’s not going to be all that much extra time. Sure, if you have a family, you will be able to look after your family at the same time. But it’s not something where you could realistically do a full eight hour job and on top of it do eight hours of [inaudible 00:19:18]. It’s time that you need to put aside for this learning task.
Jon Krohn: 00:19:22
Yeah, that makes a lot of sense, especially with the interactivity. I suppose that among the small percentage of people who don’t make it through on time, some of them could be going through the materials asynchronously, which of course is something anyone could do; you could go back and review things.
Konrad Körding: 00:19:40
Yeah. All our materials are open source. All the videos are there, and in the Neuroscience course, we had a lot of people, they call it slow [inaudible 00:19:47]. They basically get together with a posse of friends and just say, “We do one day of the summer school every week.” And then they meet once a week, maybe on a Saturday, and then they go through the materials in detail, and that helps them a great deal. I would like to briefly, if that’s fine, Jon, talk about the curriculum, because I think that’s also very important.
Jon Krohn: 00:20:07
Absolutely. It was on my list to ask you, yep.
Konrad Körding: 00:20:10
So the idea is that we want to give people the full skillset of deep learning, not at the level where you could mostly be a researcher in that area, but at the level where you can use the things as a data scientist. So it’s this project, it’s this set of lectures that really build on one another. We will first teach PyTorch, then we will teach about optimization and linear systems, then we will teach about nonlinear systems and multilayer perceptrons, and then we will teach about optimization and regularization. So that way, the first week kind of gives you the basics, the stuff that goes into every neural network you’ll ever build. And then we have a second week that covers the tricks that we use to make things work. Like we’ll talk about convnets, we’ll talk about recurrent neural networks. We will talk about these more advanced concepts. And then in the third week, we do-
Jon Krohn: 00:21:04
We can explain those a little bit more to the audience in case they don’t know them as well. So convolutional neural networks, which you could learn about in that second week, are particularly widely used and renowned for their use in machine vision, so they’re great at recognizing spatial patterns in multiple dimensions. And despite their genesis in machine vision, they’ve actually become hugely widely used in natural language processing as well.
Konrad Körding: 00:21:30
Yeah. And it’s the idea that if you want the local meaning of a picture, let’s say I see in the background of your video, I see a guitar. Now, the guitar would be the same guitar if I moved it up or down, or left or right, or made it a little smaller, a little bigger. Convnets in a way are tricks that allow us to use fewer parameters to solve those kinds of tasks, and that need for fewer parameters is always there. It makes the training of these systems much, much more efficient, and that is why they’re so useful for data science applications.
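To make the parameter-sharing point concrete, here is a minimal, hypothetical PyTorch comparison; the layer sizes are arbitrary, and the point is simply that a convolutional layer reuses the same small filters across the whole image, while a dense layer learns a separate weight for every pixel.

```python
# Minimal sketch: a tiny convnet for 28x28 grayscale images (sizes are arbitrary).
import torch
import torch.nn as nn

conv_net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 16 filters, each 3x3: only 160 parameters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                 # classifier head
)

# A fully connected first layer of similar width needs far more parameters,
# because nothing is shared across spatial positions.
dense_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 16 * 14 * 14),
    nn.Linear(16 * 14 * 14, 10),
)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(conv_net), n_params(dense_net))   # the convnet is dramatically smaller
```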
Jon Krohn: 00:22:06
Nice. And then the recurrent neural networks that you mentioned, those are traditionally associated with natural language processing, so they are adept at identifying patterns in a sequence. And so this could be a sequence of sounds, like the sound of my voice, or it could be a sequence of characters, a sequence of words. You could even use them, potentially, for financial time series, or other kinds of quantities that vary over time. And on this I agree with you 100%. I have seen Konrad’s curriculum, and the way they go about teaching it is very similar to the way that I teach my own deep learning programs and built my Deep Learning Illustrated book, so I think it makes absolutely the most sense. I do kind of the same thing, where the first third is focused on these foundational subjects, how you train a neural network, how the neurons make calculations.
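A minimal, hypothetical sketch of the recurrent idea described here, reusing the same weights at every step of a sequence; the sizes and random data are placeholders.

```python
# Minimal sketch: an RNN reads a sequence one step at a time, reusing the same weights.
import torch
import torch.nn as nn

seq_len, batch, n_features, hidden = 20, 8, 32, 64
x = torch.randn(seq_len, batch, n_features)   # e.g. 20 time steps of some signal

rnn = nn.GRU(input_size=n_features, hidden_size=hidden)
readout = nn.Linear(hidden, 1)                # e.g. predict the next value in the series

outputs, last_hidden = rnn(x)                 # outputs: one hidden state per time step
prediction = readout(last_hidden[-1])         # use the final state to make a prediction
print(prediction.shape)                       # torch.Size([8, 1])
```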
Jon Krohn: 00:23:00
Then the second part is building on those foundations, exactly like you are. And by coincidence, I suppose, I teach convolutional neural networks first, and machine vision, and then second is recurrent neural networks. What then happens in week three?
Konrad Körding: 00:23:17
Yeah, and then in week three we talk about more advanced concepts that will include generative adversarial networks. I’m not sure if you’ve seen it, GANs, they’re called GANs, they are super awesome. They can produce fake images of faces, for example, and we’ve probably all seen that. Like isthisimagereal.com, or something like that, where it draws images that you or me have real trouble distinguishing from reality. And so it makes these images, and we can do all kinds of graphics things with it. It’s a new set of techniques that are now starting to be really used in industry. Like think Photoshop.
Jon Krohn: 00:23:59
If listeners aren’t familiar with generative adversarial networks already, GANs, which, by the way, is literally the next chapter that I do after recurrent neural networks, it’s staggering how this is … It’s convergent evolution, somehow. And so you can go to-
Konrad Körding: 00:24:16
Let me say something about convergent evolution. In reality, in the field, when it comes to teaching deep learning, we’re all learning from one another. Here is Jon’s book, for people who see the-
Jon Krohn: 00:24:26
No way.
Konrad Körding: 00:24:27
… video stream. So we are, of course-
Jon Krohn: 00:24:30
Wow.
Konrad Körding: 00:24:30
… looking at all the other books, and we also called a lot of the other people teaching to figure out what’s the best way of teaching things.
Jon Krohn: 00:24:39
Incredible. That is a real honor to see that book there, Konrad. But so for listeners who haven’t seen a generative adversarial network in action, a website that you can go to is whichfaceisreal.com, which Konrad alluded to, and I just looked up the URL here in real time to make sure that that was the right URL. It’s a fun site: it puts two faces side by side and you have to guess which one is real, and yeah, generative adversarial networks, GANs, are getting extremely good at creating fake people. So, interesting things.
Jon Krohn: 00:25:15
As you were saying, we kind of think about them in the context of deepfakes and the kinds of issues that come about related to these, but there’s a huge opportunity in allowing this to augment human capabilities. In fashion design, apartment design, anything that happens in a visual space, all of a sudden you could be using these GANs to create beautiful, photorealistic simulations of what you’re trying to create.
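For readers who want to see the adversarial idea in code, here is a heavily simplified, hypothetical training loop on toy two-dimensional data rather than images; real image GANs use convolutional generators and discriminators plus many stabilization tricks not shown here.

```python
# Minimal sketch of the adversarial setup: a generator tries to fool a discriminator,
# and the discriminator tries to tell real samples from generated ones. Toy 2-D data only.
import torch
import torch.nn as nn

noise_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real_data = torch.randn(64, data_dim) + 3.0   # stand-in for "real" samples

for step in range(1000):
    # Discriminator step: push real samples toward label 1, fakes toward label 0.
    fake = G(torch.randn(64, noise_dim)).detach()
    d_loss = bce(D(real_data), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes.
    fake = G(torch.randn(64, noise_dim))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(fake.mean(dim=0))  # should drift toward the "real" mean of roughly (3, 3)
```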
Konrad Körding: 00:25:43
There’s another thing that’s made by GANs, as far as I know. There is now an app on my phone where I can record a video of myself, just a short video snippet, and it can then make me sing to one of the popular songs, [crosstalk 00:26:01] pretty good at producing those videos.
Jon Krohn: 00:26:03
Nice. Do you upload those to TikTok? Do you have your own TikTok?
Konrad Körding: 00:26:07
I don’t have my own TikTok account, but I should be doing it. In particular because some people at Neuromatch are now really interested in bringing what we are doing to TikTok, and I’m so curious to see how that will work.
Jon Krohn: 00:26:20
Yeah, this is something we were talking about a little bit earlier. So something that regular listeners would know about is that I primarily use LinkedIn, a lot of our guests primarily use LinkedIn, and that’s where we recommend connecting. But Konrad, with his academic background, one of the things that Konrad and I talked about first when we met was how he doesn’t really do anything on LinkedIn, that academics don’t really … Meanwhile on Twitter he has over 20,000 followers, and when I make tweets, it’s just like crickets. So it’s funny how we end up with these different worlds. And now, the Neuromatch Academy is making big strides in Instagram and TikTok, is that right?
Konrad Körding: 00:27:02
That’s the plan, at least, yes. I mean, for the academic community, Twitter fulfills a very important role, which is that we switched from papers to preprints. We are no longer waiting two years until someone publishes our papers. As soon as we write them, as soon as we submit them, we also put them on arXiv. So Twitter kind of helps me, at least, find the papers of other scientists, and that’s why I think there are so many scientists on Twitter.
Jon Krohn: 00:27:33
That’s cool. It’s amazing how things change, I mean, it’s been 10 years since I was a serious academic, and I wasn’t personally using Twitter as a means of getting papers, but I don’t think it made as much sense, because we weren’t doing that real-time publishing back then in the same way.
Konrad Körding: 00:27:48
Yeah, I think a lot has changed. But like, hey, let’s briefly go back to the syllabus.
Jon Krohn: 00:27:52
Totally, I was going to bring us right back. So third week, we’ve got generative adversarial networks, what else? I bet you have deep reinforcement learning.
Konrad Körding: 00:27:59
You bet. And not only do we have deep reinforcement learning, we will basically be doing deep reinforcement learning in the context of games. And you may have heard of AlphaZero, a set of algorithms that basically beat humans at a whole host of computer games, and non-computer games, like Go. It’s impressive to see. Go seemed impossible five years ago, and now beating a computer is impossible for humans. And that transition was brought about in large part by DeepMind, and Tim Lillicrap, who was part of those teams, will be teaching that day.
Jon Krohn: 00:28:37
Wow.
Konrad Körding: 00:28:38
That’ll be a wonderful day.
Jon Krohn: 00:28:39
That’s cool. Yeah, you really do have a lot of the big rock stars in the research field, who are on the cutting edge of devising these algorithms, speaking as the lecturers on your curriculum. It is incredible. So you were talking about the board game Go there. Go is, I think, the most popular board game in the world, but in the west, a lot of people don’t know it. It’s a two-player game, kind of like chess, where one player has white stones and the other has black stones. You place them on a grid, and your objective is to encircle your opponent’s stones; when you encircle them you capture them and you get that territory on the board. And the computational complexity of this, I’m probably going to get this stat wrong, but it’s something like, the number of possible board configurations is more than there are atoms in the universe.
Konrad Körding: 00:29:37
That’s right, yeah.
Jon Krohn: 00:29:38
Which is astronomically more complex than a game like chess, for which we had expert AI systems that could beat the world’s best chess players like Garry Kasparov in the ’90s. And what made Go such a hard task was not only the computational complexity, but also this sense that the best Go players in the world had an intuition for how to be incredible, and that a machine couldn’t figure this intuition out. And so, I’m almost at my final point here, which is that if you don’t know a lot about Go or deep reinforcement learning, a really fun movie that you can watch that has a 100% rating on Rotten Tomatoes is the AlphaGo movie, which is available free on YouTube, it’s also on Netflix, and it’s about the story of machines becoming better than people at this complex and supposedly intuitive game.
Konrad Körding: 00:30:41
And I think it’s a nice example of why deep learning is so successful. If we look at how AlphaZero works, or how AlphaGo works, it combines two things. It combines a position evaluation: basically you look at the board and you’re like, “That’s great, I want to play this,” versus, “Ah, this is a lost game.” You have this position evaluation, which humans might call intuition, and then you can combine that with the piece that we understand very well, which is the planning into the future, like as a function of how I put my stones, what’s going to happen?
Konrad Körding: 00:31:20
And that allowed us, for these games, to combine our understanding of the game, which is in the rules, which we can write down as text or computer programs, together with the intuition that is implicitly there in the neural network, which basically says, “This is how positions look when they’re good, and this is how they look when you’re really hopeless.”
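A toy sketch of the combination Konrad describes: a learned position evaluation plugged into explicit lookahead. This is not AlphaGo or AlphaZero (those use deep value and policy networks with Monte Carlo tree search); the tiny take-the-last-stone game and the placeholder evaluate function below are hypothetical stand-ins.

```python
# Toy sketch: combine a position evaluation ("intuition") with explicit lookahead
# ("planning"). A stand-in for the value-network-plus-search idea, not AlphaGo itself.

class TakeStones:
    """Players alternately take 1 or 2 stones; whoever takes the last stone wins."""
    def legal_moves(self, stones):
        return [m for m in (1, 2) if m <= stones]
    def play(self, stones, move):
        return stones - move
    def is_over(self, stones):
        return stones == 0

def evaluate(stones):
    # Placeholder "intuition": a real system would call a trained network here.
    # In a finished game, the player to move has just lost (opponent took the last stone).
    return -1.0 if stones == 0 else 0.0

def search(game, state, depth):
    """Depth-limited negamax that falls back on the value function at the leaves."""
    if depth == 0 or game.is_over(state):
        return evaluate(state), None
    best_value, best_move = float("-inf"), None
    for move in game.legal_moves(state):
        value, _ = search(game, game.play(state, move), depth - 1)
        value = -value  # child positions are scored from the opponent's point of view
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move

print(search(TakeStones(), 7, depth=6))  # with 7 stones, the player to move can force a win
```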
Jon Krohn: 00:31:44
It’s a beautiful thing, the things that are happening in the space are wild. It’s AlphaZero that you’re talking about. So AlphaGo was able to beat the world’s best Go players at this one board game, Go, and then AlphaZero, which you’ve mentioned a couple times, it’s able to beat … It’s a single algorithm that is even better than AlphaGo at Go, but also is better than any existing algorithm, and I think maybe even better than people at chess and a game called shogi, which is like a Japanese chess.
Konrad Körding: 00:32:18
Yeah, the game of chess is no longer hard from a computer science perspective. It’s fun to hear that, because I remember when I started in the field, chess seemed so-
Jon Krohn: 00:32:32
Yeah, I just said it in such a silly way. Of course, algorithms have, since the ’90s, been able to beat people at chess, and if an algorithm can beat any computer algorithm, it can obviously also beat all people, and so that was a silly question for me to ask anyway.
Konrad Körding: 00:32:46
Yeah, people are just not so good at playing games. Now let me, in that context, plug the last day from the last week, which is we’ll then talk about the things where deep learning isn’t good yet. The things that we need to think about in the future. So computer games, games like Go or chess. In a way, they’re easy, because we know the entire world. Everything that matters is right there on the board. But in practice, in the world in which we live that’s around us, in which we humans live, most things are not known. Like I can’t look into your head when talking with you. There’s only a very small part of this complex world that we can actually see.
Konrad Körding: 00:33:34
And yet we somehow need to causally understand the world. Like, how do aspects of the world influence other aspects of the world? And this problem of causality is something that so far deep learning has been very bad at, so we will basically spend a day talking about all the approaches that people have to deal with these problems, and the problems aren’t just causality. There’s also continual learning, for example. Neural networks are really good, in the way we usually use them, at solving a task. But you’re not solving one task, you solve a different task tomorrow, and the whole world will look different. It’s spring now, and the visual world looks very different in winter. If you train your neural networks on summer data only, you will be really bad at winter. And humans have preciously little problem with that. So how we can learn in a lifelong way is really interesting from a computational perspective, and we round out the course with that, because we want our students to understand what deep learning can do for them, but we also want them to understand what deep learning cannot do for them, and that last day has an important role.
Jon Krohn: 00:34:47
That is also the final chapter of my book, Deep Learning Illustrated. Yeah, it’s uncanny. Except that I didn’t have that continual learning piece in there, I hadn’t thought about that at that time. But things like causality are definitely where there are opportunities for big strides still to be made. So here’s an interesting, I mean, this is a bit of an open-ended question. We’ve been talking about AlphaGo, we’ve been talking about AlphaZero. Part of what makes those deep reinforcement learning algorithms so interesting is that they’re more and more general, and that they require less and less programming, and less and less training data. So comparing AlphaGo to AlphaZero: AlphaGo was amazing at Go, while AlphaZero is good at many board games, or many games with complete information.
Jon Krohn: 00:35:33
In the evolution of deep reinforcement learning algorithms, including AlphaGo’s precursors, we went from requiring lots of training data to requiring no training data at all, and simply having the algorithm play against itself. And so this led to, and this is a bit of a tangent, but it led to really interesting things happening, because it meant that the algorithm didn’t care what the standard curriculum was for learning how to play Go: it could learn advanced moves, or what we humans consider to be advanced moves, early on, and then learn some basic things later.
Konrad Körding: 00:36:07
I also heard that it’s getting so good that now Go players will try and learn from how the algorithms play, because the algorithms don’t have these human preconceptions that we are-
Jon Krohn: 00:36:17
Exactly.
Konrad Körding: 00:36:18
… always bringing into it.
Jon Krohn: 00:36:19
There was a quote from some champion Go player who said, “It’s like watching aliens play the game,” it’s like this alien intelligence amongst us. So that’s a bit of a tangent, but the point that I was trying to get to, and that I talked about in episodes 438 and 440, which aired earlier this year, in January 2021, is artificial general intelligence, and I talked about MuZero, which is another deep reinforcement learning algorithm that followed from AlphaZero. And so artificial general intelligence, let me take one second to explain what that is. It’s this idea that you could have an algorithm that could learn as diverse a number of tasks as a person could. And so this ties into your point about continual learning and how learning is different. So is this an area that you work on in your research in particular? I guess if you’re interested in intelligence in general, this must be a big part of what you think about and write about.
Konrad Körding: 00:37:36
It’s a lot of what I’m thinking about at the moment. I don’t think we have great solutions yet. But I first want to point out how interlinked those things are. Why do we want to understand causality? And a lot of people say, “Well, causality’s kind of hard to define. What do we even mean with that?” But in reality, we walk through this world, most things in the world that we could possibly do are really bad ideas. I can push the wall here at a random place, nothing will happen. But if I use the light switch, something will change. So of all the things that we do, there’s a small number of channels with which we really influence the world, the future.
Konrad Körding: 00:38:19
At some level, if you give me only one task, I don’t need causality, I just need to figure out what my right move is. But if you tell me, “Well, quite possibly it will be dark later on and you might want to figure out how to make the house bright,” then I will have curiosity, and I will be interested in how causality works. So at some level these concepts like curiosity and causality derive from the fact that we are doing multi-task, continual learning, that we are always preparing ourselves for what we will have to do in the future instead of just being good at one task. That’s why even MuZero is easy relative to success in the real world, and animals can produce much better behavior in the real world than any algorithm that we have.
Jon Krohn: 00:39:12
Right.
Konrad Körding: 00:39:13
Why? Because they’ve been pre-programmed to deal with these continual, lifelong, multi-task learning situations that we’re in. And all animals that I’ve ever observed have this deep desire to understand the world around them, and it’s hard to give that to artificial systems.
Jon Krohn: 00:39:32
Yeah. So do you think that these kinds of approaches, deep learning, deep reinforcement learning, so we have these issues around causality, continual learning. Do you think that in time these existing approaches, like deep learning and deep reinforcement learning, that we can adapt them so that they are better-suited to those problems, or do you think that we might need completely different approaches?
Konrad Körding: 00:39:59
I think we can. I think that deep learning will always be part of how we’ll solve problems. And in fact, a lot of people, including my friend Josh Vogelstein, say that we should never have called it deep learning. Because what is it? When we talk about deep learning, what is it really? It just means that we build a system with lots of parameters, we figure out, for each parameter, would it be better if we made it bigger, or would it be better if we made it smaller, and then we do that on all of them at the same time. With some slightly clever approaches, but that arguably is just what we mean by learning. It’s a system that makes itself better, and it’s a system that is not [inaudible 00:40:39].
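What Konrad describes, nudging every parameter in whichever direction reduces the error, all at once, is gradient descent. Here is a minimal, hypothetical version of that loop on a toy linear model, written out by hand rather than with an optimizer class.

```python
# Minimal sketch of "make every parameter a bit bigger or smaller, all at once":
# gradient descent on a tiny linear model, written out by hand.
import torch

x = torch.randn(100, 3)
true_w = torch.tensor([1.0, -2.0, 0.5])
y = x @ true_w + 0.1 * torch.randn(100)

w = torch.zeros(3, requires_grad=True)        # the parameters we will adjust
lr = 0.1

for step in range(200):
    loss = ((x @ w - y) ** 2).mean()          # how wrong are we right now?
    loss.backward()                           # for each parameter: bigger or smaller?
    with torch.no_grad():
        w -= lr * w.grad                      # nudge all parameters at the same time
        w.grad.zero_()

print(w)  # should end up close to true_w
```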
Konrad Körding: 00:40:39
So in that sense, I think it has to be part of it, but I think we need to go beyond it, and I think as humans we go beyond it. Not like, in our head, if I ask you, “Hey Jon, want to come visit me in Philadelphia?” You will run this program in your head where you’re like, “Yeah, then I go to Penn Station and I get on the train and I need to first buy the ticket,” and you can do this reasoning there. And when it comes to these kinds of high level reasoning approaches, it’s not enough to just build a neural network. It’s clear that we somehow need to deal with things that are a little like symbols, and I think this is a lot where the future of that field, as we head toward general intelligence, has to be.
Jon Krohn: 00:41:22
Right, so yeah. You kind of defined deep learning as gradient descent, in a way, where we’re adjusting parameters, and … So I might define deep learning as having these specific neural network algorithms and a few layers of them, but I guess even in deep reinforcement learning, we don’t have multiple layers. Would you not define it that kind of way?
Konrad Körding: 00:41:45
Yeah, so in a way, if I’m a little cynical, the term deep learning comes from artificial neural networks, as they were called before, being ridiculously uncool. So I started entering the field and going to the NeurIPS conference in like the late ’90s. Back then, the number one predictor of your paper being rejected was it using the words neural network.
Jon Krohn: 00:42:14
Even at NeurIPS.
Konrad Körding: 00:42:15
Even at NeurIPS. At that point in time, it should have been called KIPS, for kernel information processing systems, because everything was about support vector machines.
Jon Krohn: 00:42:24
Right.
Konrad Körding: 00:42:26
So basically, it was then clear from the work of Yann LeCun that there was a lot of mileage to be had. For a long while, he had these digit readers that worked quite well using artificial neural networks. But then AlexNet came, which massively beat … So AlexNet was this network, the first one that people called deep learning, that basically just said, “Okay, let’s use a really big network, let’s use graphics cards, let’s use a whole bunch of other tricks to make things better, and then let’s beat the guys at object recognition,” and beat the guys at object recognition they did.
Konrad Körding: 00:43:06
So they beat the benchmarks by a massive amount, and they immediately replaced the previous machine learning approaches for object recognition. And so at that point in time, they needed a new moniker, because saying artificial neural networks would have made it sound bad. And some of the inventions helped them go deeper. Like AlexNet was deeper than previous networks; graphics cards made that possible, large data sets made it possible, and then a whole bunch of tricks. They used this thing which is called ReLU, which is almost linear: it sits at zero below some value and is linear on the right-hand side. They used better ways of initializing the network, they used a cool way of using the hardware of the graphics card for the whole thing. So there were lots of innovations in that space that allowed the networks to be bigger, but it’s still an artificial neural network, now it’s like twice as deep, and by now it’s 100 times as deep. But in a way, conceptually, I think it’s the same intellectual line.
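The ReLU he describes, zero below a threshold and linear above it, can be written in one line; a small sketch showing it next to PyTorch's built-in version.

```python
# ReLU: zero on the left of zero, the identity (linear) on the right.
import torch

def relu(x):
    return torch.clamp(x, min=0.0)            # max(0, x), element-wise

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))                                # tensor([0.0, 0.0, 0.0, 0.5, 2.0])
print(torch.nn.functional.relu(x))            # PyTorch's built-in gives the same result
```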
Jon Krohn: 00:44:11
Nice, all right. So bringing it back a little bit, we were talking about whether deep learning approaches like this could be a part of having increasingly general algorithms, maybe an artificial general intelligence algorithm that has all of the learning capabilities of a person. But you mentioned there that it would need to have a better representation of symbols, and so is that something outside of backpropagation, like some factual knowledge of relationships that it gets from Wikipedia, or something?
Konrad Körding: 00:44:53
Yeah, I don’t think it can easily be done directly in gradient descent. We can easily do things that we’ve never done before. I can ask you, “Imagine a monkey jumps into Konrad’s video frame right now and starts sitting on his shoulder.” You can do that, no problem. We can’t, at the moment, explain those things meaningfully to algorithms. This thing, like add an object to a scene, is remarkably hard. And similarly, causal things. Like if I have a phone and I drop it, what’s going to happen? You can very meaningfully say something about that. It’s very hard to teach that to neural networks. And why? Because at some level, a thing like a phone, it’s a symbol. You can say, “Sure, I can embed it in some vector space,” but it still exists, and in the scene where I am, there exist so many objects. It’s not just one big vector, and our representations of reality kind of support these modifications, like add object, remove object, what would happen if?
Konrad Körding: 00:46:06
So there are a lot of things where it feels like the way we talk about things is actually more fundamental than how a deep learning network would work with them. And if we look at deep learning networks, semi-symbolic things are usually built into them. For example, AlphaZero, or AlphaGo. There is a very symbolic thing, which is how would the board look if I now play A7, and then B16, and then C3. If I play in that direction, that’s kind of built in. But that is very symbolic, if you think about it. That’s very much not deep learning. And in the same way, if you look at more modern systems, you might have a recurrent system that does something and then does something else. It feels like in that sense we are building symbols into the architecture with which we interact with the neural network. So I think the future will see a lot of these systems that have not just deep learning, but that also have a structure that matches the kinds of problems that you want to solve in the world, in a symbolic way.
Jon Krohn: 00:47:17
Very cool. That is such an interesting topic to hear about, lots of food for thought for me and I imagine for lots of our listeners as well. Konrad, how did you end up in this position where you are today, thinking about these problems? So how did you become expert in both biological intelligence, biological neural networks, as well as artificial neural networks? What was your journey to becoming a full professor at the University of Pennsylvania, a distinguished university, studying this incredibly interesting thing? How did this happen?
Konrad Körding: 00:47:48
Yeah, I think I just like to think about really exciting topics, and I never felt comfortable doing the same thing for a long period of time. I feel like I will learn much more by exposing myself to many different disciplines. So originally I studied physics, but within about two years I decided that I found biology very interesting, and started taking courses. And when the physicists in Heidelberg basically told me that that was a no-go for a physicist, I defected to Zurich. And the guys in Zurich were like, “Sure, we consider brains to be a perfectly legitimate object of study for physics.” And when I went to London, I learned a lot more about statistics. But already, during my time in Zurich, I was building these little neural networks. I didn’t think that it would eventually be a field, but I was trying to build neural networks to describe parts of the brain.
Konrad Körding: 00:48:50
And then I did some time in cognitive science, some time in statistics, and more broadly, I’m just interested in sitting down with exciting people and thinking about the big problems in life, and the big problems for society. And in that sense, my portfolio has been this wildly overlapping set of projects, including molecular biology. We were working on this idea that we could use DNA as a ticker tape, like a cassette recording. What’s DNA? It’s like a long tape, so let’s build a recorder that writes what a neuron does over time onto that tape. And so we’ve been working on that, and we’ve been working on a lot of different areas.
Konrad Körding: 00:49:37
And UPenn has this awesome professor position, which is called Penn Integrates Knowledge, whose official job description is to bring the disciplines, the different departments and schools at UPenn, together, and to have projects that span those disciplines. And so for me, this is just the most wonderful fit for what I’ve always been interested in.
Jon Krohn: 00:50:01
It’s PIK, Penn Integrates Knowledge?
Konrad Körding: 00:50:04
That’s right, yeah.
Jon Krohn: 00:50:05
Cool. Wow, yeah, that does sound like a really good fit for you, that sounds perfect.
Konrad Körding: 00:50:12
But in general, I do believe that for everyone’s careers, it’s useful to expose yourself to different ways of thinking. So right now, the Neuromatch Academy has me thinking a lot about organizational structure, which I would never have done before. I believe that for a lot of neuroscientists and cognitive scientists and materials scientists, and so on and so forth, getting exposure to deep learning is really important. Similarly, I believe that for a lot of people working as data scientists in companies at the moment, getting this set of deep learning ways of thinking about data science under their belt is going to be really useful for what they’ll be doing in the future.
Jon Krohn: 00:50:52
Yeah. And so I guess that’s a way, something like the Neuromatch program that you created. Is it fair to say that you created it, that you devised this? Or I guess it’s something that came about with a number of-
Konrad Körding: 00:51:06
It’s a big group. With all the Neuromatch things, I was one of the co-founders of it, but it’s a big group, and it’s too big a problem for any one person to pull off. Our CEO is Megan Peters, who’s doing a wonderful job of basically herding a lot of cats.
Jon Krohn: 00:51:26
Nice. And so where I was going with that is that it sounds like this curriculum that you’ve been involved in developing, it sounds like it’s the perfect way for people to be able to apply these kinds of deep learning things in industry, if that is something that they’re interested in. You’re probably deliberately tailoring it so that your own excitement, the excitement shared by people who are developing this curriculum comes through in that whole program, and it allows people to go from being data scientists without deep learning in their portfolio to being able to apply that in whatever industry they’re in.
Konrad Körding: 00:52:06
Yeah, that’s why we got together that amazing set of people. Like of course someone like me could teach a deep learning course, in fact, at UPenn, together with Lyle Ungar, I did teach the deep learning course. But we can make it much more meaningful by getting the world’s experts together on that. But, if we just get the world’s experts on it, it would be un-understandable. So we surround these experts with teams that give them really good feedback on what they can do, what people can realistically learn, and what they can’t. What the things are that are really useful for people’s careers, and which things they can learn in a more specialized course.
Jon Krohn: 00:52:45
Nice. So other than deep learning, which it sounds like is clear from conversation with you that you think that it will continue to be a hugely valuable skill for data scientists to have for years to come, what other kinds of skills should our listeners be developing to prepare themselves for the future?
Konrad Körding: 00:53:05
So I think, and this might just be my personal opinion, I think people need to think about causality. Why? What we really do as data scientists in the world is help make things better. We help companies make more money, we help companies be more efficient, we help people in science make better sense of their data, and so forth. What everyone really wants to do is make things better. What does it mean to make things better? It means that if we do A, things come out better than if we do B. This is fundamentally a causal question. And of course we can A/B test, and everyone should know how to A/B test. We can randomize things: half of the people see one web page, half of the people see the other web page, and we see which one has the better click-through rate or something.
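As a minimal, hypothetical illustration of the A/B test Konrad describes, here is a two-proportion z-test computed by hand on made-up click-through numbers.

```python
# Minimal sketch of an A/B test on click-through rates (all numbers made up):
# a two-proportion z-test, computed from scratch.
from math import sqrt, erfc

clicks_a, visitors_a = 230, 5000   # hypothetical page A
clicks_b, visitors_b = 198, 5000   # hypothetical page B

p_a, p_b = clicks_a / visitors_a, clicks_b / visitors_b
p_pool = (clicks_a + clicks_b) / (visitors_a + visitors_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_a - p_b) / se
p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value under the normal approximation

print(f"CTR A = {p_a:.3f}, CTR B = {p_b:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```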
Konrad Körding: 00:53:56
But in many cases, we want to figure out how to make things better without being able to basically randomize everything. There’s a whole set of approaches for that; quasi-experimental approaches are one kind that my lab is very interested in these days. But importantly, everything that we tell someone, “You should do X, you shouldn’t do Y,” is ultimately a causal statement. It says, “It will be better if you do this than if you do that.” Machine learning doesn’t answer those questions. It can be part of that, but we need to think in a causal way. We need to ask, “Okay, what would be better?” Instead of what is correlated with it. Let me give you an example of how things can go badly wrong. Let’s say we’re in medicine. We take people that take some vitamin. We discover that the people who take that vitamin live longer than the people that don’t take it. Now, should we recommend to everyone to take that vitamin?
Jon Krohn: 00:54:56
Right. Yeah, we don’t have enough information, I don’t know.
Konrad Körding: 00:55:04
That’s right. What could be going on? In that case, we actually know that people of high socioeconomic status, who have a lot of money, have time to dedicate to that, and have money to buy the vitamin, are much more likely to take that vitamin. Now, also, in American society, people that are rich tend to live longer than people that are not, which is a big problem. And now imagine that in that situation, in reality, the vitamin does absolutely nothing for your health. You can throw it out, no problem at all. But if we just did simple-minded machine learning here, we’d find that there’s a strong, significant correlation of taking vitamins with longevity, and we would start recommending that to everyone. That would be wrong. And this kind of logic happens in every company on a daily basis. We want to be very careful about confounders, and that’s something that, in a way, lives outside of traditional machine learning. I want people to think about causality.
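A tiny, made-up simulation of exactly this trap: income drives both vitamin-taking and lifespan, the vitamin itself does nothing, and yet the naive comparison looks convincing.

```python
# Made-up simulation of the confounded vitamin example: income raises both the
# chance of taking the vitamin and lifespan; the vitamin has zero true effect.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

income = rng.normal(size=n)                                    # standardized socioeconomic status
takes_vitamin = rng.random(n) < 1 / (1 + np.exp(-2 * income))  # richer people take it more often
lifespan = 75 + 5 * income + rng.normal(scale=5, size=n)       # richer people live longer; no vitamin term!

naive_diff = lifespan[takes_vitamin].mean() - lifespan[~takes_vitamin].mean()
print(f"Naive difference: vitamin takers live {naive_diff:.1f} years longer")

# Conditioning on the confounder (here, crudely, comparing within a narrow income band)
# makes the spurious effect shrink toward zero.
band = np.abs(income) < 0.1
adj_diff = lifespan[band & takes_vitamin].mean() - lifespan[band & ~takes_vitamin].mean()
print(f"Within a narrow income band: {adj_diff:.1f} years")
```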
Jon Krohn: 00:56:03
Nice. That is a really thoughtful point to make. Nobody has answered that question on this podcast with that answer since I’ve been host, and it’s a good one. It’s so important, and kind of obvious. So as we’ve mentioned many times, you sit in this very interesting position straddling biological neural networks, artificial neural networks, and what the meaning of intelligence is. So a question that I haven’t asked in a while on this show is what you think is going to happen in our lifetime. Let me frame that a little bit more. You already talked about AlexNet, which in 2012 succeeded thanks to, as you alluded to, ever-cheaper data storage, much bigger datasets, much cheaper compute, and the idea of doing things on graphics cards instead of just regular CPUs, and that has continued. So 2012 is now nine years ago, and in that time we have much, much cheaper data storage, much, much bigger datasets. Compute, same idea: we have these huge parallel GPU systems and lots of open source software libraries like TensorFlow and PyTorch that take advantage of this highly parallel distributed GPU processing.
Jon Krohn: 00:57:23
So these trends will continue to happen for presumably decades, and we’re also going to have ever more abundant sensors, 5G rollout, faster internet connections between everyone, sharing arXiv papers and conference proceedings and doing things like Neuromatch and learning from each other, so we have more and more innovation and sharing. So technology, in this space, is advancing at a faster and faster pace every year. So what do you think is going to happen in our lifetimes? How interesting are things going to get?
Konrad Körding: 00:57:59
Yeah, this is just a wonderful question. So I think you can tell two very, very different stories. In one story, you’re right. We keep pushing on deep learning or variants thereof, we kind of figure out these algorithms that ultimately can reason as well as humans, and we basically live in a hybrid world where half of the intelligent beings are robots, half are humans, and we pray that they don’t overtake [crosstalk 00:58:32] and rule everything.
Jon Krohn: 00:58:34
Right.
Konrad Körding: 00:58:36
But you can make the exact opposite point, and I want to credit my friend Ben Recht for having opened my eyes to this, which is that every roughly 30 years or so, people discover regression. They discover all the wonderful things they can do with linear regression, and then at the end someone points out that the real problems in life, like causality and actual real thinking, cannot be solved by it. Now you can view modern deep learning as just one more of those examples, no? You can say, maybe in the ’60s we did regression in linear systems, and there was this huge excitement where people were like, “Hey, now you have a machine that can detect things. It will soon think like humans and everything’s going to be very different very soon.” And then Marvin Minsky pointed out that there are very, very simple logical problems that those machines can’t solve, like the XOR problem. For listeners who don’t know it, it’s basically a system that has two binary inputs, each of which can be 0 or 1.
Konrad Körding: 00:59:47
XOR means it should spit out one if it gets a one and a zero as input, in either order, and otherwise it should spit out zero. And it’s provably not solvable by a linear model. And now you can say, 20, 30 years later, people are back, neural networks are back, and they do backpropagation. And everyone gets super excited about all the problems that they will solve, and pretty soon people point out that a lot of the interesting problems cannot be solved by [crosstalk 01:00:17]. Now you can say we are just in this [inaudible 01:00:19]. And a lot of things might feel like they’re very cheat-y, let’s say GPT-3. GPT-3, for people who haven’t thought about it, is this wonderful natural language processing model, and you will hear a lot about it if you go to Neuromatch. You can prompt it with some text and it generates text, and it feels almost like English. I mean, it definitely feels like English language.
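[Editor’s note: as a concrete aside on the XOR point above, here is a minimal sketch, assuming scikit-learn rather than anything discussed on the show, of why a purely linear model cannot separate XOR while a single small hidden layer can.]

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

# The four XOR cases: the output is 1 only when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# A single linear unit cannot separate these points: no straight line puts
# (0,1) and (1,0) on one side and (0,0) and (1,1) on the other.
linear = Perceptron(max_iter=1000).fit(X, y)
print("Linear model accuracy:", linear.score(X, y))   # at best 0.75

# One small hidden layer with a nonlinearity makes XOR learnable.
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=1000, random_state=0).fit(X, y)
print("One-hidden-layer accuracy:", mlp.score(X, y))  # usually 1.0
```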
Konrad Körding: 01:00:45
Now, when you first see GPT-3’s output, you’re super impressed by it. But GPT-3 has basically been trained on just about every text that has ever been posted on the internet. So it has this massive training dataset. Now, it just turns out that under most circumstances, people say the same sentences over and over. A human can meaningfully reason with you based on a very small dataset, while GPT-3 consumes all the data in the world and still makes the most elementary logical mistakes. For example, if you give it the prompt, “Tell me the story of a couple of unicorns that live on a mountain and speak English,” the first sentence produced by GPT-3 is, “In this space, there’s the unicorns. They have two horns each.” And you’re like, “Hold on, what, what, what?” We’re talking about unicorns, and you tell me they have two horns each.
Konrad Körding: 01:01:46
And it turns out that, like, one horn, two horns, it’s all correlated in the big data [crosstalk 01:01:52]. But you see what the problem is there. There’s no real understanding. And so in that sense you can say, sure, we will use deep learning to solve all kinds of business-relevant problems and academic problems, but it will always just be a part of it, and the really interesting things, like figuring out how the world really works, in a way happen outside of that space. Now, where on that continuum we are, I don’t know. There’s going to be a little bit of both: we really do move things forward, we might get self-driving cars or something like that, but there will also be certain problems that deep learning is not really very good at solving.
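[Editor’s note: for listeners who want to poke at this style of model themselves, here is a sketch of the prompting workflow Konrad describes. It uses GPT-2 through the Hugging Face transformers pipeline as a small, freely downloadable stand-in, since GPT-3 itself sits behind OpenAI’s paid API; the prompt text is just an illustration.]

```python
# pip install transformers torch
from transformers import pipeline

# GPT-2 is a much smaller cousin of GPT-3, but the workflow is the same:
# feed in a prompt, get back a statistically plausible continuation.
generator = pipeline("text-generation", model="gpt2")

prompt = ("In a remote valley, scientists discovered a herd of unicorns "
          "that spoke perfect English.")
outputs = generator(prompt, max_length=80, num_return_sequences=1, do_sample=True)

print(outputs[0]["generated_text"])
# The continuation sounds fluent, but nothing guarantees it is logically
# consistent: exactly the kind of "two-horned unicorn" slip described above.
```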
Jon Krohn: 01:02:31
Beautifully said. That was, yeah, beautifully said. I definitely fall into the delusion a lot that this time is different. You’re right, though: whether or not we have the regression happening over many neural network layers, it’s still a regression model. But what about quantum computing? Isn’t quantum computing … Or DNA computing, a lot of DNA computing, and everything will be different, and it’ll be causal. Because it’s DNA, or quantum, that’s what we need. It needs to be DNA quantum computing, and then we’ll figure out the causality. It’ll be a cinch.
Konrad Körding: 01:03:14
In fact, let me give you another great example of this kind of misinterpretation. So neural networks usually have this property that you can do adversarial attacks on them. What does that mean? I can show you a picture of a banana, put a sticker on top of it, and your neural network will say, “This is a toaster.” It’s just that the sticker I put on basically contains features that very much look like a toaster. In reality, a banana with that property never happens, but if you saw that thing, you’d say, “That’s a banana with a weird sticker on it.” But a neural network will say, “That’s a toaster.”
Konrad Körding: 01:03:51
Now, this actually works quite well in the real world. If I take an object and attach a sticker to it with the name of something else written on it in English text, it’s often wrongly recognized as the thing I wrote on the sticker. So what that means is, if I take a car and put a sticker on it that says “empty trash bag,” a lot of algorithms will think it’s an empty trash bag. And I can’t get this idea out of my head: once self-driving Teslas are all over the place, to walk through the city with a friend and basically stick a little “This is an empty trash bag” note on all the parked, really expensive cars, and then observe how a bunch of Teslas destroy the entire city.
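[Editor’s note: for the curious, here is a rough sketch of the simplest digital version of such an attack, the fast gradient sign method, in PyTorch against a pretrained ImageNet classifier. The random input tensor and the epsilon value are placeholders; this illustrates the gradient-based idea, not the physical sticker attack Konrad describes.]

```python
import torch
import torch.nn.functional as F
from torchvision import models

# A pretrained ImageNet classifier (newer torchvision versions prefer the
# weights= argument over pretrained=True).
model = models.resnet18(pretrained=True).eval()

# Placeholder input: in practice this would be a normalized 224x224 photo,
# for example of a banana.
image = torch.rand(1, 3, 224, 224, requires_grad=True)

# Untargeted fast gradient sign method: take whatever class the model
# currently predicts and nudge every pixel a tiny step in the direction
# that increases the loss for that class.
logits = model(image)
current_class = logits.argmax(dim=1)
loss = F.cross_entropy(logits, current_class)
loss.backward()

epsilon = 0.03  # a perturbation small enough that a person barely notices it
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("Prediction before attack:", current_class.item())
print("Prediction after attack: ", model(adversarial).argmax(dim=1).item())
# The predicted class frequently flips even though the two images look
# essentially identical to a human.
```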
Jon Krohn: 01:04:39
Right.
Konrad Körding: 01:04:41
Why? Because [crosstalk 01:04:46]-
Jon Krohn: 01:04:46
Did you make that one up on the spot, or have you come up with that one before? That was really good.
Konrad Körding: 01:04:48
I thought about it a little bit.
Jon Krohn: 01:04:51
That’s great, I love that.
Konrad Körding: 01:04:52
But what it means is that there are these adversarial attacks on neural networks, attacks that humans wouldn’t fall for, and that is truly problematic. And you can’t guarantee that a neural network will not encounter an adversarial environment.
Jon Krohn: 01:05:13
Cool. Well, I’m sure we could talk about this for days and days, so maybe we’ll have to have you on another episode soon to get even deeper into some of these topics. But we should start winding down the episode, which means it’s time for that staple question of every SuperDataScience episode. Konrad, do you have any book recommendations for us? Maybe about the things we’ve been talking about today, or just whatever is interesting you; I’m sure whatever you’ve been reading about will be great.
Konrad Körding: 01:05:40
Yeah, so I can highly recommend to anyone listening to this to read a book on causality. Because as you heard, I think that understanding the world causally is very important. The most accessible introduction to that probably at the moment is The Book of Why, by Judea Pearl. The Book of Why is a beautiful, readable exposition of the need to think about the world around us in causal terms.
Jon Krohn: 01:06:08
Nice. I almost shouted out Judea Pearl, rudely interrupting you. But when I did that, I was thinking it was going to be his Causality book. And of course, The Book of Why is an even better recommendation, because I think it’s way more accessible. I wouldn’t go around recommending Causality by Judea Pearl to my friends.
Konrad Körding: 01:06:31
Yeah, Causality by Judea Pearl is a wonderful, deep, thoughtful book for specialists who want to specialize in causality. The tools that Judea Pearl lays out allow you to prove exactly under which circumstances you can or cannot know about causality, just a wonderful set of ideas, but The Book of Why makes it come alive for people who haven’t spent a long time in the causality space.
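[Editor’s note: for a taste of the machinery The Book of Why popularizes, this is Pearl’s backdoor adjustment formula. In the vitamin example from earlier, X would be taking the vitamin, Y longevity, and Z socioeconomic status; provided Z blocks all backdoor paths from X to Y, the interventional quantity on the left can be computed from purely observational data on the right.]

$$
P(Y = y \mid \mathrm{do}(X = x)) \;=\; \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
$$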
Jon Krohn: 01:07:00
Nice. That is a great recommendation, and then I have a recommendation, which is that if you’re looking to get into deep learning and you’re able to take three weeks off, take this Neuromatch curriculum that Konrad is helping set up. I am so jealous of anybody who gets to learn deep learning that way. I wish I could go back in time and have that be how I did it. It sounds like an absolutely perfect curriculum. And-
Konrad Körding: 01:07:22
I wish I could go back in time for that.
Jon Krohn: 01:07:24
Kids these days are so lucky. I’ll also note that one of the prerequisites you mentioned was things like linear algebra: being able to do simple matrix operations and being able to calculate derivatives. So if that’s what’s holding you back from signing up for Neuromatch between now and August, I’ve created a machine learning foundations curriculum that teaches exactly those topics, linear algebra and calculus, in the context of machine learning, and using Python libraries like NumPy, TensorFlow, and PyTorch. That would probably be something you could do in the interim so you’re all set to take the Neuromatch program by August, which I think is very exciting. How should our listeners contact you? My guess is your number one pick would be Twitter, and then number two is TikTok.
Konrad Körding: 01:08:21
No, I think number one is definitely Twitter. But you can reach me on LinkedIn. There are far fewer people competing for [inaudible 01:08:34] being seen by me on LinkedIn.
Jon Krohn: 01:08:35
Yeah, after talking with me a couple of months ago about LinkedIn versus Twitter, Konrad is now giving LinkedIn a shot and he’s making posts there. He has only a tenth as many followers on LinkedIn as on Twitter, so if you want to get his attention, you can go bother him on LinkedIn: like and love his posts and he’ll really appreciate it, because right now it feels like there are too many tumbleweeds over there.
Konrad Körding: 01:09:00
Yeah, LinkedIn, if you come from the outside, is just a bizarre environment.
Jon Krohn: 01:09:08
Yeah. I won’t disagree with you on that. All right, so thank you so much Konrad for being on the show. I learned so much, I had so much fun, and I hope to have you on again sometime soon.
Konrad Körding: 01:09:17
Thanks for having me.
Jon Krohn: 01:09:23
What a terrifically interesting episode. Thanks to Professor Körding, we were able to dig into the innovative Neuromatch curriculum that enables small groups of students with similar interests to learn deep learning together. Convolutional neural networks, recurrent neural networks, GANs, deep reinforcement learning, we covered all of the major approaches in deep learning, including the revolutionary AlphaGo and AlphaZero models. We also covered where deep learning doesn’t work today for enabling human-level intelligence in machines, such as issues with causal inference and continual learning. The understanding of symbols was a potential solution that Konrad brought up to overcome these issues. Wonderful to be able to learn from a deep AI expert like Konrad.
Jon Krohn: 01:10:09
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Konrad’s Twitter and LinkedIn profile, as well as my own LinkedIn and Twitter details, at www.superdatascience.com/469. That’s www.superdatascience.com/469. I’m always happy to meet listeners, so please do connect and feel free to tag me in posts with your thoughts on the episode. Your feedback is invaluable for figuring out what topics we should cover on the show next.
Jon Krohn: 01:10:41
Since this podcast is free, if you’d like a hugely helpful way to show your support for our work, then I’d be very grateful indeed if you made your way to the Data Community Content Creator Awards nomination form; the link is in the show notes. Obviously we’d hope you could nominate the SuperDataScience podcast for category seven, the podcast or talk show category. I’d also love my name, Jon Krohn, nominated for category eight, the textbook category, for my book Deep Learning Illustrated. And finally, I’d also love my name, again, Jon Krohn, nominated for category two, the machine learning and AI YouTube category, for my YouTube channel, which contains tons of free videos on deep learning, linear algebra applications, and machine learning libraries. The Data Community Content Creator Awards themselves are coming up on June 22nd, and I hope to see you there.
Jon Krohn: 01:11:29
All right, thanks to Ivana, Jaime, Mario, and JP on the SuperDataScience team for managing and producing another incredible episode today. Keep on rocking it out there, folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon.