SDS 575: Optimizing Computer Hardware with Deep Learning

Podcast Guest: Magnus Ekman

May 17, 2022

Director of Architecture at NVIDIA, Dr. Magnus Ekman, sits down with Jon Krohn to discuss the specific role that machine learning plays in optimizing computer hardware. The pair also review his exceptional book ‘Learning Deep Learning.’

About Magnus Ekman
Magnus Ekman has a Ph.D. in Computer Engineering, is a Director of Architecture at NVIDIA Corporation, and is the inventor of multiple patents. He has previously worked with processor design and R&D at Sun Microsystems and Samsung Research America. In his current role at NVIDIA, he leads an engineering team working on CPU performance and power aspects of chips for artificial intelligence in data centers and autonomous vehicles. He is the author of the 2021 book “Learning Deep Learning – Theory and Practice of Neural Networks, Computer Vision, Natural Language Processing, and Transformers Using TensorFlow.”
Overview
Magnus and Jon kick off the episode by discussing his role with NVIDIA. He first joined the company in 2009, before it had become involved with deep learning. That all changed in 2012, when the creators of the AlexNet architecture programmed NVIDIA GPUs to perform image processing and model training efficiently. Fast forward to today, and you’ll notice that AI systems rely heavily on NVIDIA GPUs.
As Architecture Director at NVIDIA, Magnus spends most of his time working on CPU architecture. While most SDS podcast guests are involved on the software side of the industry, Magnus’s expertise in hardware architecture allows him to draw unique connections between hardware and ML that may not always be obvious.
Magnus is also the author of the book ‘Learning Deep Learning,’ which he was inspired to write after having difficulty grasping the field through Ian Goodfellow’s famous book ‘Deep Learning.’ According to Magnus, the 750-page book fills a gap that he noticed when he began learning DL. And it is primarily aimed at people who want to learn DL from scratch.
Despite being accessible to beginners, Magnus spends a large part of the book helping readers understand ML algorithms in depth. In speaking with Jon, however, he notes that this “ground-up approach” isn’t required for everyone. Understanding algorithms under the hood is key to developing your own, but as more and more DL practitioners are trained, it wouldn’t be realistic to expect everyone to specialize in what happens behind the scenes of a model.
In the book, Magnus also covers evolutionary algorithms, which are inspired by how biological systems mutate and evolve in the real world. Though less well known than other search approaches like random search, Magnus explains how evolutionary algorithms could potentially be used to optimize neural network architectures.
Magnus and Jon also reflected on ethical AI, the ongoing importance of CNNs and RNNs, and how people can more realistically “transition” into machine learning. Tune in to hear more.

In this episode you will learn: 

  • What hardware architects do [10:15]
  • How ML can optimize hardware speed [13:19]
  • Magnus’s Deep Learning Book [21:14]
  • Is understanding how ML models work important? [36:16]
  • Algorithms inspired by biological evolution [41:25]
  • How artificial general intelligence won’t be obtained by increasing model parameters alone [51:24]
  • Why there will always be a place for CNNs and RNNs [54:51]
  • How people can “transition” realistically into ML [1:09:15]
 

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 575 with Dr. Magnus Ekman, Director of Architecture at NVIDIA. 
Jon Krohn: 00:00:11
Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in Data Science. I’m your host, Jon Krohn. Thanks for joining me today. And now, let’s make the complex simple. 
Jon Krohn: 00:00:33
Welcome back to the SuperDataScience podcast. You’re in for a treat with the deeply intelligent Dr. Magnus Ekman on the show today. Magnus is a Director of Architecture at NVIDIA, one of the world’s largest microchip designers and the most important microchip designer for AI. Thanks to exponential growth in recent years, fueled in no small part by the explosive demand for training AI models, NVIDIA today employs over 22,000 people, and they enjoyed $10 billion of net income in their most recent financial year; that’s double their income of the previous year. In addition to Magnus’s nearly 12 years of experience at NVIDIA, he’s also worked at Samsung and Sun Microsystems. Alongside all of that rich experience, Magnus is also the author of the epic, recently released 700-page book, Learning Deep Learning. The book blends theory, math and code to introduce deep learning and a broad range of deep learning applications across machine vision and natural language processing. He holds a PhD in Computer Engineering from the Chalmers University of Technology in Sweden and a Master’s in Economics from Göteborg University, also in Sweden. 
Jon Krohn: 00:01:47
Today’s episode has technical elements here and there, but should largely be interesting to anyone who’s interested in hearing the latest trends in AI, particularly deep learning software and hardware. In the episode, Magnus details what hardware architects do, how machine learning, including deep learning, can be used to optimize the design of computer hardware, the pedagogical approach of his exceptional deep learning book, which machine learning users need to understand how machine learning models work and which users don’t. He talks about algorithms inspired by biological evolution. He talks about how artificial general intelligence won’t be obtained by increasing model parameters alone, and he talks about whether transformer models will entirely displace other deep learning architectures, such as CNNs and RNNs. All right. Are you ready for this information-rich episode? Let’s go. 
Jon Krohn: 00:02:41
Magnus, welcome to the SuperDataScience podcast. I’m so excited to have you here. Where in the world are you calling in from? 
Magnus Ekman: 00:02:48
Hi, Jon. I’m in the San Francisco Bay area. And thanks for having me here on the show. 
Jon Krohn: 00:02:53
Nice. Yeah, my pleasure. The Bay area is a popular choice for our guests to be from. 
Magnus Ekman: 00:03:00
I bet, yeah. 
Jon Krohn: 00:03:02
Yeah. The NVIDIA headquarters, I guess, is nearby. 
Magnus Ekman: 00:03:07
Yeah. It’s in Santa Clara, so it’s in the heart of Silicon Valley, yeah.
Jon Krohn: 00:03:12
Nice. So, we know each other two ways. We were introduced by Deborah Williams, who is an acquisitions editor at the world’s biggest educational publisher, Pearson. And so, we both published books for them on their Addison-Wesley imprint. And I’d wanted to have you on the show because there’s a lot of commonality between your book and my book, but we also tackle different things. And so, I loved the way that you approached your book. I wanted to meet you and Deborah kindly made that introduction. But we’re also connected via Anima Anandkumar, who was in episode number 473, so she’s at NVIDIA. And we had an amazing episode with her last year and definitely recommend, if you’re a listener that likes deep technical deep dives, then that episode was an amazing one. And yeah, and so, she actually wrote the foreword for your book that Deborah then published, right? 
Magnus Ekman: 00:04:14
That’s right. Yeah. I asked Anima if she could do that and she graciously accepted. So, that’s really nice of her.
 
Jon Krohn: 00:04:24
I know. She seems like a very nice person though, I suspect. It’s one of those things where she’s such an in-demand person. 
Magnus Ekman: 00:04:31
Yeah, she’s busy, yep. 
Jon Krohn: 00:04:32
But she’s also very nice. And so, she’s probably like any of the great guests we’ve had on the show where their life is this constant nightmare because everyone’s asking them for things and they always want to say yes. And so, well, we’re very grateful to Anima, both for being on the show and for doing a foreword for your book. 
Magnus Ekman: 00:04:54
Yep. 
Jon Krohn: 00:04:55
Nice. So, at NVIDIA alongside Anima, you are a Director of Architecture, so that sounds like an interesting role. You’ve been doing it for a while. You’ve been there since 2014, so nearly eight years. And you had a previous stint there for four years as well. So, yeah, tell us about what it’s like, how things have changed at NVIDIA over all those years. 
Magnus Ekman: 00:05:20
Sure. So, before I begin here, I should just state that these are my own opinions and do not necessarily reflect any official views of NVIDIA. 
Jon Krohn: 00:05:27
Nice. 
Magnus Ekman: 00:05:28
Yeah, things have changed since I joined. Yeah, I joined in 2009 the first time, and at that point it was still just a graphics company. Well, not quite just a graphics company. CUDA was out there and there was a push into both the HPC market and the mobile market, but deep learning wasn’t really around at that point, yeah. 
Jon Krohn: 00:05:56
Right, yeah, it was 2012 that the AlexNet architecture, the machine vision architecture out of the University of Toronto, brought deep learning forward as this incredibly powerful technique. One of the big innovations in that was programming NVIDIA GPUs. So, using the CUDA programming language that NVIDIA GPUs take advantage of. I don’t know what verb to use there. You could probably do a much better job than me, but one of the problems that people training deep learning algorithms have is that, especially for training your model from scratch, you typically need a lot of training data. And so, that AlexNet architecture benefited from this very, very large data set called the ImageNet training data set. But then you run into this problem of, “How are we going to train the model in any reasonable time span with all of these images and with our relatively large architecture, with all of these model parameters?” And so, Alex Krizhevsky and Ilya Sutskever and Geoff Hinton, for that 2012 AlexNet architecture, had the idea to program NVIDIA GPUs in CUDA to be able to do this image processing efficiently, this model training efficiently. And so, although in 2009, when you first joined NVIDIA, the company wasn’t, I guess, involved in deep learning, from 2012 it’s been a really important part of what NVIDIA can do as a company, can provide to the world. 
Magnus Ekman: 00:07:32
Yeah, it has. But even in 2012, I think there was a very small number of people in NVIDIA realizing this or even knowing that the deep learning was coming, right? 
Jon Krohn: 00:07:42
Sure. 
Magnus Ekman: 00:07:42
I mean, overall, the AlexNet Paper- 
Jon Krohn: 00:07:45
All of us. 
Magnus Ekman: 00:07:45
Yeah, the AlexNet paper kicked it off, or well, depending on how you put it. There were some publications before that using GPUs as well. But as you said, I left NVIDIA and came back in 2014, and when I came back in 2014, I still wasn’t aware that this was becoming a big deal. I had started looking a little bit at deep learning, just reading about it because I was interested in it, but I had not internalized how big a part of NVIDIA it would become, and I don’t think it was at that point. It was still a pretty small part of it. I think it was in 2015, 2016 that we really started seeing things happening, and we also started selling systems that were dedicated for this, with the DGX machines that are basically built to do deep learning. And then it has just exploded ever since, I guess. 
Jon Krohn: 00:08:48
Well, yeah. So, prior to that, as you said, primarily these GPUs were being used for graphics. So, for rendering video game graphics or for editing graphics, any time that you needed to be able to have a lot of graphics processing happening. And I’m sure I’ve said this on the show before, but in order to render graphics efficiently, you need lots of parallel computing of relatively simple linear algebra operations. And as it happens, that’s exactly what you also need to train a deep learning algorithm. So, pretty cool that, yeah, you never know where things are going to end up that, yeah, for years of NVIDIA history, everybody probably in leadership was thinking about, yeah, graphics or maybe some other applications like mobile or something. But then out of nowhere, it turns out that the AI systems of the future are going to depend enormously on GPUs and by an enormous margin, it’s NVIDIA GPUs that are doing that relative to any other manufacturer. So, super cool. That and- 
Magnus Ekman: 00:09:59
I should make it clear though that I’m actually not working on GPUs. I’m not on the GPU side of NVIDIA. 
Jon Krohn: 00:10:04
Yes, yes. 
Magnus Ekman: 00:10:06
But maybe we’re getting to that, yeah? 
Jon Krohn: 00:10:08
Well, yeah and so, that’s, yeah. But I think just in terms of context of, yeah, NVIDIA as a company. But yeah, so while NVIDIA is renowned for its GPUs, you are an architect working specifically on CPU architecture. And so, maybe you can tell us a bit about what an Architecture Director does as well as specifically, a CPU architect. 
Magnus Ekman: 00:10:34
Sure. So, yeah, the architecture we are referring to here is the computer architecture, the internal architecture of the processors that we’re building. So, in general, a computer architect would be working on either CPUs, GPUs, signal processors, network processors, any processor. And how do we build these machines to provide really high performance, or low power, or high performance given a particular power envelope, for example. So, it’s making the trade-offs between how much processing power, the number of execution units, how much cache memory, how many levels of cache, those questions. So, that’s where you’re making the overall architecture of this processor that you’re then later going to implement in silicon. 
Jon Krohn: 00:11:28
Nice. So, most people that we have on the show, their primary role is in software. And even if we have someone on the show who has the title architect, they’re usually a software architect, so they are concerned with how to architect how the database will pull information and how information will flow within the software system. But you are a hardware architect. 
Magnus Ekman: 00:11:52
Correct. Yeah. 
Jon Krohn: 00:11:54
It is like physically how the information is going to flow and how much power it’s going to take as it moves through a CPU. That’s cool. 
Magnus Ekman: 00:12:03
In the end, it’s going to be physical, but if you think about it, all of the design work leading up to making it is really software. Today, you’re specifying the hardware in software. We are writing simulators, actually, in high-level languages like C and C++ to model how this processor is going to behave when we build it. And we try to make the right trade-offs and then in the end, we ship it off to the fab to actually implement it. So, there’s a lot of overlap between software and hardware. 
Jon Krohn: 00:12:35
Yeah. So, you’re not just sitting around soldering all day. 
Magnus Ekman: 00:12:39
I don’t. I used to do that as a hobby when I grew up, but I never do that anymore, actually. 
Jon Krohn: 00:12:46
That’s really cool. Now that you say it, it seems really obvious and I’ll probably remember it forever. But I had never made that connection that even with today’s computer systems, with how complex and how small everything is, you can’t really be physically creating a hardware system. You need to design that with simulations and software in the first place, anyway. 
Magnus Ekman: 00:13:10
Right. Yeah. 
Jon Krohn: 00:13:10
Really cool. So then, what’s the relationship? We’re going to talk about your deep learning book shortly. We’re going to talk about it for a while. What’s the relationship between your work as a hardware architect and machine learning, deep learning, AI, what’s the connection there? 
Magnus Ekman: 00:13:27
I think there are a lot of connections. They might not be obvious. And I think we’re going to see much more of this becoming more obvious in the future. If I’m looking at what’s being published today in academia, in the computer architecture community and see how much or little they make use of data science, machine learning and things like that, I think it’s going to happen more. So, I mean, we are seeing some people trying to put these things together, but it’s hard, right? 
Magnus Ekman: 00:14:08
If you’re specializing in computer architecture, then you don’t really have time to also learn machine learning and deep learning and the cutting-edge parts there. So, from that perspective, I think there’s much opportunity there to apply machine learning, deep learning, data science to the process of building processors. And this is just in general, without talking about what’s happening inside of NVIDIA. But if you’re looking at, just in general, what do I do today that relates to machine learning and computer architecture, or what could I do, first of all, I mean, what we are trying to do at NVIDIA is to build architectures that are really good at running deep learning algorithms, or not only deep learning, but machine learning in general. 
Magnus Ekman: 00:15:07
And so, there is a clear connection point. Just trying to see how can we build machines that do really well for these workloads. That’s the bread and butter. But if I’m looking at the work I’m doing and as I said, we’re doing a lot of simulations and we’re producing a lot of data and then you need to try to make sense out of that data. And I mean that’s data science, right? And applying machine learning algorithms to make sense of this data, I think is a clear thing where we can do much better. 
Jon Krohn: 00:15:45
This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience, it’s the namesake of this very podcast. In the platform, you’ll discover all of our 50+ courses, which together provide over 300 hours of content, with new courses being added on average once per month. All of that and more, you get as part of your membership at SuperDataScience, so don’t hold off, sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level. 
Jon Krohn: 00:16:23
So, there’s an opportunity. So, you’re mentioning earlier how in this hardware design role, you’re trying to have efficiencies or speed. And so, you could be using machine learning algorithms to be trying to figure out what optimizes those outcomes? 
Magnus Ekman: 00:16:39
Yeah. 
Jon Krohn: 00:16:39
You have all this data coming in and you say like, “What features of this architected system lead to certain efficiencies or certain speeds?” 
Magnus Ekman: 00:16:51
Mm-hmm. I’d also say another interesting thing is, there was actually a paper published already in the early 2000s about how to extract interesting parts of benchmarks. So, the problem here is you want to run a benchmark, something like Geekbench, SPEC CPU. Something that’s a suite of applications, to see if this processor is doing well or not. But if you want to run that in simulation, you can’t really run the entire benchmark because it just takes too long. So then, you want to try to extract interesting pieces of this application. And there’s actually a machine learning technique that was used for that, and it is used widely in the industry. And I think many people who use that don’t even realize that it’s machine learning, because they wouldn’t even know what the field of machine learning was in 2001 or 2002. I mean, some people did, but people in the computer architecture community certainly didn’t. Now, it’s not a very advanced technique, but it’s just interesting to see that, yeah, some people are using machine learning without knowing it. 
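
For readers who want a flavor of that idea, here is a minimal, illustrative sketch, not the specific published method: cluster fixed-length intervals of a long benchmark by their execution profiles and simulate only one representative interval per cluster, weighted by how much of the run that cluster covers. The feature choice, sizes, and random data below are assumptions purely for illustration.

    # Illustrative sketch: pick representative slices of a long benchmark
    # by clustering per-interval execution profiles (e.g. basic-block counts).
    import numpy as np
    from sklearn.cluster import KMeans

    def pick_representative_intervals(interval_profiles, n_clusters=5, seed=0):
        # Normalize each interval's profile so clustering reflects behavior, not length.
        profiles = interval_profiles / interval_profiles.sum(axis=1, keepdims=True)
        km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(profiles)
        representatives, weights = [], []
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            # Simulate only the interval closest to the cluster centroid...
            dists = np.linalg.norm(profiles[members] - km.cluster_centers_[c], axis=1)
            representatives.append(int(members[np.argmin(dists)]))
            # ...and weight its result by the fraction of the run the cluster covers.
            weights.append(len(members) / len(profiles))
        return representatives, weights

    # Hypothetical usage: 1,000 intervals described by 64 behavior features each.
    rng = np.random.default_rng(0)
    profiles = rng.random((1000, 64))
    reps, weights = pick_representative_intervals(profiles)
    print(reps, [round(w, 3) for w in weights])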
Jon Krohn: 00:18:00
Really cool. 
Magnus Ekman: 00:18:01
Then there are also other things. So, there have been publications about how you can use machine learning and implement that in your microarchitecture itself. So, something that is very important for a CPU is something called branch prediction. So, you have this stream of instructions and you’re going to hit different branches in the code. And in order to do this really fast in a CPU, you need to predict the branch and speculatively go in the direction that you think it’s going to go before you actually know which direction it’s going. And so, that’s a classic problem in CPU architecture. And again, in the early 2000s or something, somebody published a paper using a perceptron-based branch predictor. Again, this is very simple if you look at it from a machine learning perspective, but it’s interesting that they’re borrowing some ideas there from machine learning to try to do better branch prediction. 
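
As a rough, hedged sketch of what a perceptron-based branch predictor does, the toy Python model below predicts a branch from a dot product of learned weights with recent branch outcomes. Real hardware designs use tables of small saturating integer weights indexed by the branch address; the structure, history length, and threshold here are simplified assumptions.

    # Toy sketch of a perceptron branch predictor (simplified for illustration).
    class PerceptronPredictor:
        def __init__(self, history_len=8, threshold=16):
            self.history = [1] * history_len          # recent outcomes: +1 taken, -1 not taken
            self.weights = [0] * (history_len + 1)    # weights[0] is the bias term
            self.threshold = threshold                # training threshold

        def _output(self):
            # Dot product of weights with (1, history...) gives the prediction score.
            return self.weights[0] + sum(w * h for w, h in zip(self.weights[1:], self.history))

        def predict(self):
            return self._output() >= 0                # True = predict taken

        def update(self, taken):
            y, t = self._output(), 1 if taken else -1
            # Train only on a misprediction or while the score is not yet confident.
            if (y >= 0) != taken or abs(y) <= self.threshold:
                self.weights[0] += t
                for i, h in enumerate(self.history):
                    self.weights[i + 1] += t * h
            self.history = self.history[1:] + [t]     # shift in the newest outcome

    # Hypothetical usage: an alternating taken/not-taken branch is quickly learned.
    p = PerceptronPredictor()
    hits = 0
    for outcome in [True, False] * 20:
        hits += p.predict() == outcome
        p.update(outcome)
    print(f"{hits}/40 predictions correct on the alternating pattern")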
Jon Krohn: 00:19:12
Nice. Yeah. You mentioned there a perceptron and we’ll get into that shortly. But it’s the oldest or one of the oldest ways of designing a neural network that later became a deep learning network. But anyway, we’ll talk about that later when we start talking about your book. You were going to say something else. Maybe one more example or something. 
Magnus Ekman: 00:19:33
Yeah, I have one more example and that’s more philosophical, I think. So, I was mentioning here before that we try to evaluate these processors by running standard benchmarks, like Geekbench, SPEC CPU, which is just an application where you want to see, does it run well or not? So, let’s say now that we are building this CPU to do better on these benchmarks, and then we are trying to tweak different things to really do well on this benchmark. Well, there’s a very clear risk of overfitting there. Just like when we train these deep learning models, they may do really well on training data, but do they then really do well on the test data? So, the question is, if we build this CPU to try to do really well on Geekbench and SPEC CPU, does it then do well for the real-world applications out there? And there, I think the community could benefit from applying or taking techniques from machine learning, where you have a training data set and you have a test data set, so you can see that you’re not overfitting to the applications that you’re trying to do. So again, these are just general ways that I think there are connection points that may not be obvious between, or to everybody. 
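
To make that train/test analogy concrete, here is a loose, hypothetical sketch: tune a design parameter against one set of workloads, then score the chosen design only on workloads that were held out of the tuning loop. The simulator stand-in, workload features, and numbers are made up purely for illustration.

    # Loose illustration of the train/test idea applied to hardware tuning.
    import random

    def simulated_speedup(design, workload, rng):
        # Stand-in for a real performance simulator; returns a fake speedup number.
        return 1.0 + 0.05 * design["cache_mb"] * workload["memory_bound"] + 0.01 * rng.random()

    rng = random.Random(0)
    tuning_set = [{"memory_bound": rng.random()} for _ in range(8)]    # workloads used for tuning
    holdout_set = [{"memory_bound": rng.random()} for _ in range(4)]   # workloads never used for tuning

    candidates = [{"cache_mb": c} for c in (1, 2, 4, 8)]
    # Pick the design that looks best on the tuning workloads...
    best = max(candidates, key=lambda d: sum(simulated_speedup(d, w, rng) for w in tuning_set))
    # ...then report its score on held-out workloads to check for overfitting to the suite.
    holdout_score = sum(simulated_speedup(best, w, rng) for w in holdout_set) / len(holdout_set)
    print(best, round(holdout_score, 3))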
Jon Krohn: 00:20:49
Nice. And no doubt more and more of these kinds of opportunities will emerge over the rest of your career, as the systems for designing these hardware systems become increasingly complex and generate more and more data. There will be more and more opportunities to be applying machine learning analyses to the data and to optimize. Nice. So, your book, Learning Deep Learning, was published recently. It was published last year in August, August of 2021. And as I mentioned earlier on the program, published by Addison-Wesley. It is a beautiful full-color book, and it starts off with an introduction to the math of the fundamental units of deep learning systems, artificial neural networks or these little neural units, these simple algorithms that can be grouped together to form a deep learning system. And so, your book introduces the math of these, starting with the perceptron that we just mentioned. 
Jon Krohn: 00:21:51
So, the opening chapter talks about perceptrons and then you build on that to cover tricks for training models. Because as you put lots of these little simple neural unit algorithms together, they can start to behave erratically. They can be difficult to control. And so, you talk about tricks for keeping them under control and having them learn effectively and reliably. And then you get into specific kinds of applications. So, convolutional neural networks for machine vision. You talk a lot about natural language processing. So, RNN, Recurrent Neural Network architectures, transformer architectures, which are relatively new and have proved to be very powerful in a lot of different applications. You talk about natural language preprocessing. You talk about text generation and you even talk about time series analysis, which has parallels to natural language processing. Because when we’re processing natural language, whether it’s written language or spoken language, it’s one dimensional, it’s over time. 
Jon Krohn: 00:23:06
And so, these same kinds of model architectures that can be useful for processing natural language can also be useful for processing other kinds of time series like stock prices or sales predictions, that kind of thing. So, this brilliant book, Magnus, why did you write it? I have this inkling and so you can tell me if I’m wrong, but I suspected maybe you saw the enormous potential in deep learning and maybe like me, you knew that if you wrote a book about it, you’d understand it really, really well. You’d have to and then you’d be able to notice more and more opportunities to apply it in your job, so that’s why I did it. And maybe that’s related to why you wrote it, too. 
Magnus Ekman: 00:23:51
Yeah. I mean, I think there are many reasons to write a book and maybe I’ll start a little bit with how I was learning deep learning. I mean, that’s really where it started. So, as I mentioned before, I was working at NVIDIA. I saw how this field was just about to explode, or had already started to explode, and I wanted to learn more about it. So, and this is a few years before I started writing this book, I was trying to figure out how do I best learn these things? I picked up Goodfellow’s book. I think that came out in, was it 2015, something like that. And I think I had the perfect background skills for learning this. I already knew programming. I knew linear algebra, multivariate calculus. I have a background in statistics. I knew about PCA, clustering, numerical optimization. I studied economics, which is often about optimizing a utility function. You have individuals or companies trying to optimize their utility function. And that’s very similar to how you’re trying to optimize a loss function when it comes to deep learning. 
Magnus Ekman: 00:25:13
I had even implemented my own back propagation implementation from scratch back in the ’90s when I took a class in artificial neural networks back in school. So, I feel like I had the perfect background and still, it was super hard to read Goodfellow’s book. It was a struggle. Nothing wrong with that book. It’s awesome. It has all the information, but it just struck me, “Well, how can anybody who doesn’t happen to have all of these background skills pick up this book and walk away from it knowing this topic?” And I think the truth is most people can’t. So, one piece of inspiration was I basically felt I can probably make this easier for people to read if they don’t have this background. One motivation was just to make this more accessible to other people. The other thing- 
Jon Krohn: 00:26:19
One- 
Magnus Ekman: 00:26:19
Yeah, go ahead. 
Jon Krohn: 00:26:20
One quick thing there is that, so a lot of the content that I’ve been publishing, in fact, all of the content that I’ve been creating since writing my book, Deep Learning Illustrated, was inspired by a similar thing to your Goodfellow experience. So, I also was having a really hard time working through the Goodfellow book, but for me, because unlike you, I don’t have as strong of a linear algebra or multivariable calculus background or numerical optimization. I was stuck in the very beginning of the book. And so, I started learning a lot and publishing a lot of content on linear algebra, on calculus, on probability theory, on computer science. And so, I’ve now released over the last couple of years, all of that content. If people have a subscription to the O’Reilly Learning Platform, there’s 20 hours of content that I created on those foundational subjects… 
Magnus Ekman: 00:27:24
Oh, nice. 
Jon Krohn: 00:27:24
… specifically for tackling the Goodfellow book. And then, I’m now writing the book version. The first book will just cover the linear algebra and the calculus, so it will be called Mathematical Foundations of Machine Learning. And then also all of that linear algebra and calculus content is already available for free on YouTube if you don’t have an O’Reilly subscription. And I’m gradually recording more and more of it and releasing new videos every week to get through. Everything that’s on O’Reilly will eventually be free on YouTube, too. So, I can empathize with what you’re describing except I got stuck even earlier than you. And so, yeah, interesting to hear that and yeah, thanks for that bit of context. And so, you said that that was one of the reasons why- 
Magnus Ekman: 00:28:10
Yeah, that was one. So, I should mention, there were three other books that really inspired me. Not necessarily that I had read them all before I started writing this book. But if you’re looking at what I’m describing in my book, you can see that there are probably three other books that I have gotten a lot of inspiration from. So, one was Chollet’s book, Deep Learning with Python, which takes a different approach where you get into the programming of things, getting to build things, but I feel like it didn’t really provide all the background of how things work. And that really depends on who you are. There isn’t one perfect book there, so you should have multiple books because people want to learn different things and people have different backgrounds. But I felt that I really benefited from learning things ground up and learning the theory behind it, as well as how you put it together. Then I was looking at Michael Nielsen’s online book, and I really loved how he started things. And you could see there how I build up my network, starting with a perceptron and then building an application in Python without a deep learning framework. I mean, that’s very much inspired by how he did things. I think he did a really great thing there. 
Jon Krohn: 00:29:29
Nice. 
Magnus Ekman: 00:29:30
And then I also read the deep learning, let’s see what it’s called, Deep Learning from Basics to Practice or something like that, by Glassner. So, that one was a self-published book at that time, and I think he has since come out with a reworked version of it, which is now Deep Learning: A Visual Approach. That one is super intuitive. He describes everything without diving into detailed math, but still in a very detailed way. I felt that’s great if you want to shy away from math. But to me, I feel like having some of the actual math background doesn’t hurt. And especially for people who already know some linear algebra and calculus, it’s good to learn those things as well. So, those are the books that inspired me. I think, as you said, another reason for writing a book could be that you know that when you actually do that, when you go through that process, you really solidify your own knowledge. 
Magnus Ekman: 00:30:47
You don’t really know anything deeply until you can teach it to somebody. And as you go through and try to explain things, you realize, “Oh, I don’t actually understand this fully. I have to go back. I have to read more publications on this to try to really get a crisp explanation of things.” So, learning more was definitely one part of it. And then, I think, to, I don’t know, get some proof that I actually know this area. I had done all of these studies on my own time and how do I now prove to the world that I know this topic, so I don’t have to feel like an imposter in this community anymore. 
Jon Krohn: 00:31:31
Right. And yeah, you succeeded triumphantly in doing that. So, it is an epic book, 750 pages. It is monstrous… 
Magnus Ekman: 00:31:44
That was actually- 
Jon Krohn: 00:31:45
… in a good way. 
Magnus Ekman: 00:31:47
No, it was a very frustrating experience. This started out as I was going to make a brief introduction to the topic to help people get started. I was planning to do about 10 chapters and between 100 and 150 pages. 
Jon Krohn: 00:32:04
Wow. 
Magnus Ekman: 00:32:04
It ended up being 18 chapters, plus a number of appendices, and as you say, 700-something pages. And I felt like this is not what it was supposed to be, and with all that detail, it took forever. 
Jon Krohn: 00:32:18
Yeah. I imagine that did take a while, but I can see how you have the inspirations from those four different books. So, the Goodfellow book that we have already been talking about, it doesn’t contain any code. So, it is a university textbook with math and explanations and lots of citations, but absolutely no code. Whereas your book, it has math, kind of that textbook feel, but it also does have hands-on examples. And then Chollet, the second one you mentioned, has a lot of programming, but no math. And so again, yours hits in between. It’s like the Goldilocks sweet spot where it has the key math as well as key hands-on examples. And then, I love that Nielsen was also how I got started with deep learning. So, super cool that that was also your first book and so, there are elements. In my book, for example, I use all the same notation as Nielsen because there are various ways that you could choose to notate neural networks, so I use the same as his. I love that you had people understand how to program neural networks themselves without the convenience of a high-level library. 
Jon Krohn: 00:33:34
That is something that’s, yeah, very cool, inspired by Nielsen and certainly something that I wasn’t willing to tackle when I was writing my own. It’s awesome that you did that and it does make it easy to then build up to the high-level abstracted libraries later on, once you’re comfortable with understanding how things are going on underneath the hood. So, very cool. I’d love to hear the inspiration. And yeah, it makes so much sense to me that your book was needed and it fills in a gap that none of the individual preceding books did. So, then with that in mind, was there a particular audience that you were targeting with the book? 
Magnus Ekman: 00:34:18
Yeah, I was targeting myself a few years before. I mean, I really wrote this book as the book I would have wanted to read. So, I’d say engineers and engineering students, who know some programming and know some math, that was really what I was thinking of. I didn’t want to be like, “I’m going to teach this is how you start programming or this is the basic math.” Although, I tried to have a little bit of a recap of these are the things you need to know before you get into this chapter. But otherwise I’m trying to start from scratch. I’m not assuming that people have machine learning skills or even know what machine learning is or have run into deep learning before. I also decided to stay away from statistics as much as possible, because my experience is that a lot of computer engineers don’t necessarily know a lot of statistics. They have taken one class or so, and then haven’t really spent much time on that. And I didn’t want to have too many barriers to entry here, so to speak, but I felt, okay, they should know some programming and some math, but otherwise it starts from scratch. That was my goal. And I don’t know if that worked out or not, but we’ll see. 
Jon Krohn: 00:35:46
Oh, I think it has so far. And so, this idea that I’ve already talked about of going from working and understanding how an individual perceptron, an individual neural unit works, coding that up yourself and then gradually developing it into a modern neural network. And then only later, getting into the high level abstraction libraries like Keras and PyTorch. How important do you think it is for machine learning practitioners to understand what’s going on under the hood? So, in your book, it’s a prominent part. It’s the most prominent part of the beginning of the book, so I guess you think it’s important, so maybe there’s a quick answer to that. But then assuming it is important, why is it important? 
Magnus Ekman: 00:36:36
Well, I’m not sure if it’s important to everybody. I think overall, as the need for machine learning practitioners is increasing and we are getting better and better tools, there’s going to be large groups of people, perhaps, who are using tools for specific things and don’t necessarily need all the basics or all the details. But again, I wrote this one for what I wanted to see. I like to understand things from scratch and understand what’s going on under the hood. And certainly if you want to get into the more advanced parts of this, as well as if you want to then develop new ways of using new algorithms for machine learning and stuff, then I think you definitely need to know it. But taking this ground-up approach, it’s not well suited to everybody. So, again, it’s really about how you like to learn. Some people would want to start with the deep learning frameworks, get something going, and then after that, go back and see, “Okay, how does this actually work from scratch?” Well, I felt if I do it that way, I just feel lost because I don’t understand things from the ground up. 
Jon Krohn: 00:38:03
Yeah, unsatisfying. 
Magnus Ekman: 00:38:06
I think there’s just room for multiple approaches, so this is one approach. 
Jon Krohn: 00:38:12
Yeah. And that is, so the direction that I went with, the pedagogical approach in my book is that later thing where I was like, “Look at these really cool things that you can do.” All right and now, you’ve seen that, let’s try to understand what’s going on here a little bit more. And so, some people like to do that, too. So yeah, different kinds of approaches for different people. I love the way that you summarized it here, that understanding machine learning algorithms under the hood is key to developing your own algorithms. But I think something that you were about to start saying and didn’t quite complete, because we started shifting to different topics a little bit, but I think what you were going to say is that as we have more and more machine learning practitioners and applications are more and more common, it won’t necessarily need to be the case that everyone using machine learning needs to understand what’s going on under the hood. 
Magnus Ekman: 00:39:11
Right. I definitely think so. I mean, I think we need to get to a point where we can enable more people to use machine learning without knowing all the details, because it’s just not realistic that everybody will learn things from scratch. And that’s going back to what I said before, so if you have a person doing a PhD in Computer Architecture, you can’t really expect that they should at the same time become an expert on machine learning to then be able to apply the two together. It would be very nice if they could just know a little bit about machine learning and apply it to computer architecture, rather than having to be specialized in two things to make use of these new techniques. 
Magnus Ekman: 00:39:54
Going back, I’m shifting topic back to what we were talking about here, to your book here as well, I find it interesting, and I think this has to do with who you are targeting or who you are as an author. I really liked Nielsen’s example because he starts using a perceptron to show that it can implement the AND and OR gates, which is Boolean logic. And coming from a computer and electrical engineering background, that was just, “Oh, this is obvious to me.” While in your book, you have a background in neuroscience, I think, and you start with the biological vision system. And so, that’s how you think about how deep learning fits into the world. While I feel like, “Okay, you can use this for Boolean logic.” And that again, I think, just shows that there isn’t one size fits all. This topic of machine learning and deep learning touches so many other surrounding areas where you already have this other experience and knowledge. So, depending on what background knowledge you have, I think, you need a different book, yeah. 
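
For anyone who hasn’t seen the example Magnus refers to, here is a minimal sketch of a perceptron acting as Boolean AND and OR gates, with weights picked by hand rather than learned; the exact weights and threshold are just one workable choice.

    # Minimal sketch: a perceptron with hand-picked weights acting as AND and OR gates.
    def perceptron(x1, x2, w1, w2, bias):
        # Output 1 if the weighted sum clears the threshold at zero, else 0.
        return 1 if w1 * x1 + w2 * x2 + bias >= 0 else 0

    def AND(x1, x2):
        return perceptron(x1, x2, w1=1.0, w2=1.0, bias=-1.5)  # fires only when both inputs are 1

    def OR(x1, x2):
        return perceptron(x1, x2, w1=1.0, w2=1.0, bias=-0.5)  # fires when at least one input is 1

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))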
Jon Krohn: 00:41:13
Yeah. We always gravitate towards what we already know. Everything feels easier. Cool. Very nice of you to make that comparison. So, speaking of biology influencing the way that machine learning algorithms work. So, in the way that, yeah, in my book, wherever I could, I would cite the biological, the often neuroscientific, the brain science inspiration behind why we set up a neural network a certain way or why deep learning model works a certain way. Something that you cover in your book, that’s also inspired by biology is evolutionary algorithms. So, in your neural architecture search chapter, you talk about these evolutionary algorithms. And so, they are inspired by the way that biological systems mutate and evolve in the real world. And it’s a lesser known optimization option in computing relative to other search approaches like grid or random searches. So, can you explain to us what evolutionary computation is? And then as a follow-up, given the ever increasing complexity of models and data size, do you think algorithms inspired by evolution will become more common? 
Magnus Ekman: 00:42:39
Sure. Maybe I can take a little detour here first… 
Jon Krohn: 00:42:43
Please. 
Magnus Ekman: 00:42:43
… if you don’t mind, yeah? So, evolutionary computation is a field with algorithms that are inspired by evolution. And it’s something I studied when I did my Master’s. So, I mentioned before I had implemented an artificial neural network in the ’90s. I was also looking at genetic programming and evolutionary algorithms, which are a part of the artificial intelligence field as well, I’d say. And I was considering at that time, maybe I should go into this field for real. But I also had, as I said, this background in more electrical engineering. I used to sit and solder and build my own stuff. And I felt that, “Well, this AI thing seems really cool and everything, but I want to build chips.” That’s what I had started studying for. I want to build these chips and I want to do real processors, so that’s what I really wanted to do. So, I decided on doing the computer architecture direction instead, and then went into chip design. But then I had this awesome stroke of good luck, I guess. Coming back to NVIDIA and working on processor design and then artificial intelligence is happening right there. And I can now get a second chance to work on both of these things. 
Jon Krohn: 00:44:19
Cool. 
Magnus Ekman: 00:44:20
So, that’s just the background there. So, I had some knowledge in evolutionary computation and I wanted to write a little bit about neural architecture search. So, what you refer to there is in, I think, the second-to-last chapter, where I’m just trying to give a little bit of an idea of other things that I haven’t described in detail in the book, the emerging parts. And neural architecture search is about having algorithms that automatically design your neural networks instead of you building them yourself. So, then I wanted to give some examples of how that can be done, and I agree, using evolutionary computation for that might not be the most straightforward way, but it’s a search algorithm. This is a search problem. And I felt that would be an exciting way of giving the reader a flavor of different ways of doing things. And also, it seemed so cool. You have a network that is inspired by biological neurons and now, we’re going to use an algorithm that is inspired by biological evolution to come up with how to connect these neurons. It makes it sound more like science fiction. 
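
To give a flavor of what such a search can look like, here is a hedged toy sketch of evolutionary neural architecture search. The architecture encoding, the mutation scheme, and especially the fitness function are illustrative assumptions; in a real search, fitness would come from training and validating each candidate network.

    # Toy sketch of evolutionary architecture search: evolve a list of layer widths.
    import random

    rng = random.Random(42)

    def random_architecture():
        # An architecture here is just a list of hidden-layer widths.
        return [rng.choice([16, 32, 64, 128]) for _ in range(rng.randint(1, 4))]

    def fitness(arch):
        # Stand-in for validation accuracy: reward depth a little, penalize parameter count.
        params = sum(a * b for a, b in zip([64] + arch, arch + [10]))  # 64 inputs, 10 outputs
        return len(arch) - params / 20000

    def mutate(arch):
        child = arch.copy()
        op = rng.choice(["widen", "narrow", "add", "remove"])
        i = rng.randrange(len(child))
        if op == "widen":
            child[i] = min(child[i] * 2, 256)
        elif op == "narrow":
            child[i] = max(child[i] // 2, 8)
        elif op == "add":
            child.insert(i, rng.choice([16, 32, 64]))
        elif op == "remove" and len(child) > 1:
            child.pop(i)
        return child

    population = [random_architecture() for _ in range(10)]
    for generation in range(20):
        # Keep the fittest half, refill the population with mutated copies (selection + mutation).
        population.sort(key=fitness, reverse=True)
        survivors = population[:5]
        population = survivors + [mutate(rng.choice(survivors)) for _ in range(5)]

    print("best architecture found:", max(population, key=fitness))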
Magnus Ekman: 00:45:45
That, yeah, I know it’s a little bit controversial. You mentioned that you wanted to draw a lot of parallels between biological systems and these artificial networks. And I know that a lot of researchers feel that we shouldn’t take that too far because in reality, these networks aren’t as advanced as biological systems. So, I don’t know. Did you have any thoughts on that? 
Jon Krohn: 00:46:11
Yeah. Great question. 
Magnus Ekman: 00:46:12
You know this better than me, right? 
Jon Krohn: 00:46:13
So, it’s pretty infrequent that I get asked questions on the show, but I love it. I’m glad that you did. So, I’m trying to be as careful as I can to use the word inspiration wherever I can. With modern neural networks, the individual neural units are inspired by biological neurons, and some systems, particularly convolutional neural networks, are inspired by the way that our visual system works. But it is a very loose inspiration. And biological neurons and biological neural networks are so vastly more complex. There is a lot more subtlety in what all of the units do and there’s a lot more subtlety in the way that they interact with each other. So, biological neurons, for example, aren’t as simple as just being connected to each other. There are these support cells in our nervous system that surround and assist the brain cells that also play a role in what our brain is able to do, in the way that it’s able to learn. And I am not aware of any artificial systems that have even begun to try to capture that. 
Jon Krohn: 00:47:47
So, I’m sure there are orders of magnitude, probably, of complexity in biological neural networks that we haven’t begun to tap into despite these inspirations on the broad strokes of how the biological systems work. And so, that brings me to an interesting point, which is that some people, and we had an episode not too long ago. So, in Episode No. 565 with Jeremie Harris, Jeremie and I get into a little bit of a debate about when we might have artificial general intelligence. And Jeremie made the point that our large natural language models are growing exponentially larger every couple of years, so our largest natural language processing models, like GPT-3, which was a famous example a couple of years ago, these models get 10 times bigger every couple of years. And so, by that logic, Jeremie was saying, “Well, so then in 10 years or so, we will have these large neural networks that have more parameters than we have neurons in our brain.” And yeah, so I think that this comparison of how many parameters you have in an artificial neural network and the number of neurons you have in a biological brain, I think that there’s a false equivalency there. 
Jon Krohn: 00:49:23
I think that just because we manage to get the numbers to match up or we exceed the number, we can say, “Oh, this artificial neural network has 10 times the number of parameters as there are neurons in a biological brain.” That is, yeah, it’s not the same thing at all, because every one of the biological brain cells has many connections. And there are these direct electrical connections that we’re making via the action potentials that are explicit. But then as I was alluding to earlier, there are lots of other ways that this system is interacting that make it more complex. And on top of that, our brain has lots of different structural parts that have different characteristics. So, parts of your cerebral cortex are somewhat replaceable or consistent across the cortex, so that’s the outermost part of your brain. And so, there’s a lot of plasticity there where if one part gets damaged, another part can step in. But there are lots of other parts of your brain that are not the same. You can’t have brain damage in those other parts of your brain and have any random part of your neural structure step in and help. 
Jon Krohn: 00:50:48
There are lots of different substructures in your brain that specialize in different kinds of tasks. And so, it’s not just a matter of having enough parameters, having enough neurons represented in the system or having enough connections between neurons represented in the system. It’s also about a much more complicated interaction between all of the information flows. And maybe as a computer architect, that’s probably the way that you think about the brain a bit, Magnus, is that, just as in a computer system, we have a CPU for some kinds of tasks, we have a GPU for other kinds of tasks, the human brain has dozens, maybe hundreds of different kinds of areas that specialize in different kinds of processing and they’re not fungible between each other. It’s not just like having more brain cells is going to fix the problem. It’s that the specific way that that substructure is configured matters to allow our brain to have the complex cognitive capacities that it has. 
Magnus Ekman: 00:51:52
Yeah, no, I think that’s right. And so, if you look at computer chips, we tend to talk about or at least the press tend to talk about the number of transistors and how that is increasing every year. But the truth is that it’s not just about how the number of transistors on a chip increases. It’s also about how we connect them together and the architecture and the innovation there. And I think it’s the same thing with these neural networks is that we have seen that. The different network architectures are good at different things. And if we had just continued doing fully connected networks, we wouldn’t be able to do what we can do today. We needed this innovation in the types of architectures like convolutional networks, transformer architecture, and so on. And I’m sure there’s plenty of things to be figured out there as well to be able to use the neurons efficiently and to use them for different tasks. 
Jon Krohn: 00:52:55
Great. Well, so you and I are on the same page. 
Magnus Ekman: 00:52:59
Of course, we are. 
Jon Krohn: 00:53:01
I don’t think we’re going to have AGI anytime very soon. Maybe in our lifetimes, maybe, but the number of innovations… 
Magnus Ekman: 00:53:09
I know. 
Jon Krohn: 00:53:09
… that need to happen between now and then are there’s many, many innovations and [inaudible 00:53:15]- 
Magnus Ekman: 00:53:14
Well, I think I’d say that actually, in the section in the book we talked about, where we apply evolutionary computation to figure out the network structure, if you read this from a science fiction perspective, it does sound like science fiction that we now have this little life form that is evolving in some sense. But in reality, it’s a 300-line Python program. It’s not intelligence, but it can do some pretty cool stuff. So, that’s where we are today and let’s see where we go in the future, I guess. 
Jon Krohn: 00:53:51
Nice. So, speaking of different architectures and the evolution of these architectures over time, a few years ago, when I was writing my book, which is only a few years old now, everyone was doing natural language processing with recurrent neural networks. So, you had these simple vanilla RNNs or what we thought were really fancy RNNs, long short-term memory units, LSTMs. And so, in my book, we use RNNs, LSTMs and convolutional neural networks to analyze natural language data, process natural language data. And subsequently, these transformer architectures have evolved through discourse between humans, not through some evolutionary algorithm. But I love that your book has a focus on these transformer architectures. That will be valuable for a lot of our listeners. Do you think that there are still reasons why we might want to use RNNs instead of transformers for NLP? 
Magnus Ekman: 00:54:58
That’s a very good question. I mean, I spend a fair amount of time on both RNNs and LSTM, and I have an appendix on the GRU, the gated recurrent unit. So, definitely, I felt that I really wanted to have it there as a part of building up towards the transformer. And I struggled with coming up with how I could just describe the transformer without describing how we got there, with the encoder-decoder networks and the attention and these things. And there are a lot of buzzwords here now, but the transformer is pretty complex. So, from an understanding perspective, that’s how it made sense to me to describe it, which meant that I definitely needed to go through the details of RNNs and LSTM. 
Magnus Ekman: 00:55:55
And I would imagine that they will continue to have a place, that we will continue using them. I mean, people are using them. Transformers have been shown to be good at other things than NLP as well. Computer vision now, vision transformers. It does not mean that we don’t need convolutional neural networks either. I mean, some people would say that we really just need the transformer and then let’s build on that. And maybe that’s where we’re going, or maybe a few years from now, we’ll have some other architecture that we have arrived at from having the transformer as a stepping stone. I really don’t know. 
Magnus Ekman: 00:56:40
But I think that these are building blocks, the basic building blocks, and you still need to know them. You need to know a convolution. You probably need to know how a recurrent network works in order to then come up with even greater architectures. So, maybe they won’t be used just as is, because, just like, who uses just a single perceptron today? Not that many people. But they, and mechanisms from them, will be used as building blocks in other architectures, I think. 
Jon Krohn: 00:57:18
That makes a lot of sense. And something that is probably obvious to you, but we might as well state is that one of the advantages afforded by CNNs or RNNs relative to transformers is that for the most part, speaking in broad strokes, they are a lot less computationally complex. So, they require less compute power, compute time, compute money to train and implement. And for a lot of real world applications, we don’t need the most accurate or the most nuanced results for a lot of machine learning applications. We’d actually prefer something to be fast and cheap and maybe a little bit less accurate. 
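
As a rough, hedged illustration of that cost difference, the sketch below compares the standard per-layer asymptotic operation counts: roughly n·d² for a recurrent layer versus n²·d for transformer self-attention over a sequence of length n with hidden size d. The sequence lengths and hidden size are made-up numbers, and real costs depend heavily on the implementation.

    # Back-of-the-envelope comparison (illustrative only) of per-layer operation counts.
    def rnn_ops(n, d):
        return n * d * d          # each step multiplies a d-vector by a d x d matrix

    def attention_ops(n, d):
        return n * n * d          # every token attends to every other token

    for n in (128, 1024, 8192):   # hypothetical sequence lengths
        d = 512                   # hypothetical hidden size
        print(f"n={n:5d}  RNN~{rnn_ops(n, d):.2e} ops  attention~{attention_ops(n, d):.2e} ops")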
Magnus Ekman: 00:58:01
And you can understand it more easily, too. You don’t necessarily need to use deep learning either. There are traditional machine learning techniques and you should always consider them as well, because there’s also the part of explaining how the system works. And so, yeah, I think you should learn a lot of different techniques and then use the right tool for the right task. 
Jon Krohn: 00:58:25
Yeah. And so, on that note, this note of explainability that ties into ideas of ethical AI, and you have a section on ethical AI in your book. So, why is that important? Why not just tell us how to do everything? Why is it important to talk about the ethics as well? 
Magnus Ekman: 00:58:47
Yeah, so the way I put it in the book, even in the introduction, I have a question about, “Is deep learning dangerous?” And you talked about artificial general intelligence, and often those are the scenarios that have gotten a lot of attention: what’s going to happen in the future when the machines take over. And that’s certainly something to worry about if and when that happens. But I think we already have a problem today with how these algorithms can cause harm in society. And that’s really what ethical AI is about. To ensure that as we develop and build these systems, we do it in a responsible way that doesn’t cause harm. And I think, at least for myself, that was a blind spot for a long time. You pick up an introductory book on the topic and they don’t really talk about these things. And I realized that, “Well, that’s not good, so I should at least try to open people’s eyes a little bit, so that they know that this is important.”
 
Magnus Ekman: 00:59:59
And there’s been a lot of examples where the systems aren’t… well, the people who designed the systems might not have thought through these things or it might be that they are used for things that they shouldn’t be used for. So, the prime example, you have a facial recognition system maybe used by law enforcement. And it was developed with photographs of people of a certain ethnicity, for example. And it claimed to be 99% accurate and stuff. And then you deploy it out in the wild and you have a 60% accuracy for other groups of people and that’s just not acceptable. 
Jon Krohn: 01:00:44
Yep. Yep. Agreed. I think that in the last few years, rightly so and belatedly, it has become mainstream within data science, within data modeling, to be thoughtful about the real-world applications. I think that prior to a few years ago, it was standard to not reflect on this, like you say, in books that come out or papers that come out. And I think that only a minority of books in the future will ignore these considerations entirely. And I think a lot of us come from technical backgrounds. And so, we’re aware, if we understand how these algorithms work under the hood, we know that it is linear algebra and calculus, and what could be biased about math? But of course the training data that we use to train our models, there’s almost always something about it. 
Magnus Ekman: 01:01:49
It’s data produced by humans, often, so that it contains a lot of human biases. 
Jon Krohn: 01:01:56
So, yeah, so great. 
Magnus Ekman: 01:01:57
I also found it very interesting, when I was doing the research to write on those topics, that I had to go find the different publications. And there has been great work done in this field, primarily by women, actually, which is interesting because there aren’t that many women in data science; they’re probably a minority. And likely they have seen the harm being caused, and that has made them pay attention more, perhaps, than males. I don’t know. It’s just depressing. 
Jon Krohn: 01:02:43
Yeah, these algorithms have historically been developed by white men, by the historically dominant group. For people who aren’t watching the video version, both Magnus and I happen to be white men. We enjoy privileges and opportunities that others might not get as easily. And part of that probably also leads us, as a group, to not be thoughtful about the ways that the algorithms we develop are impacting all of the users. 
Jon Krohn: 01:03:36
And so, we have lots of examples, hardware examples and software examples, where tools are designed for white men by white men. It’s everywhere: tools, seats on public transport, machine learning algorithms. As you design these things, instead of putting the effort in to find a set of users that represents the variety of users you’re going to have in the real world, you just look around the lab or wherever and say, “All right, I need three people,” and all three of those people are right there. And then all these different devices end up being designed around people like us. 
Magnus Ekman: 01:04:23
And it’s often not conscious. You’re looking at what you have available, and getting data isn’t that easy. So, you’re looking at data around you, and getting a representative set of data for the entire world is hard. But you need to put in that effort, because otherwise you’re not going to build good systems. 
Jon Krohn: 01:04:42
Yeah, it’s worth the investment. And the most valuable firm on the planet, Apple, has had to focus on this kind of thing for longer than its big tech competitors. So, when Apple developed its facial recognition system, they were, even years ago, putting in the effort to pay to bring in models with a wide variety of faces, representing the range of faces the device would need to work on. Whereas some competitor of theirs might develop their facial recognition system based on images they just pulled from the internet, which are not as balanced. 
Jon Krohn: 01:05:25
And anyway, I’m glad that you brought this up in your book, and it’s nice to have a conversation about it. Everybody out there should be thinking about how your algorithm is going to impact people in the real world and what you can be doing to ensure the outputs are fair. So, if your algorithm is making a difference in people’s lives in some way, which most algorithms probably are, and there’s some way for you to test whether different demographic groups are equally affected by it, you should try to do that. So, yeah, great chapter of the book. And then another piece of your book that I thought was really cool, and that isn’t very common to see, is an appendix with extensive and beautifully illustrated cheat sheets. We don’t often see cheat sheets like that in machine learning books. What was your thinking behind including them? 
Magnus Ekman: 01:06:29
Well, I can’t really take credit for it. It was actually one of my reviewers, one of my in-house reviewers. Early on, I got in touch with Eric Haines. He has written a book called Real-Time Rendering, about computer graphics; it’s the Bible in that field, and he works for NVIDIA. And he graciously said that he would review this book as a target-group reviewer, because he did not know deep learning and he wanted to learn about it. He was tremendously helpful in making this book as good as it is, because he knew everything about writing books, and I had never written a book before. And he came with a lot of suggestions, which meant more work and extra effort to do this thing and- 
Jon Krohn: 01:07:23
Yeah, we need more pages. 
Magnus Ekman: 01:07:24
And he said, “So, shouldn’t you also have some cheat sheets at the end?” And I was like, “Ah, of course I should.” I think it made the book much better. It’s great to have these visual summaries that you can go back to and look at: “Do I know what this refers to?” If not, then I probably should go back and read that chapter. So, I think it was a great addition, even though it added more work on my side. That’s where they are coming from. 
Jon Krohn: 01:07:55
Nice. Yeah. Very cool. 
Magnus Ekman: 01:07:56
And then also, you can download them as PDF on the book website, actually, even if you haven’t bought the book. So, those are publicly available. 
Jon Krohn: 01:08:04
Oh, nice. There you go. We’ll try to include a link to those in the show notes. All right. So, you have experience with machine learning models and evolutionary algorithms going back decades, and also decades of experience as a computer hardware architect. So, what do you think of yourself as first? Do you think of yourself as a data scientist first or as a computer architect first? 
Magnus Ekman: 01:08:36
Definitely a computer architect. Actually, I should be a little bit careful with that. I don’t view myself as having decades of experience in machine learning. I had some knowledge there back [inaudible 01:08:47]. 
Jon Krohn: 01:08:47
It’s true. That’s right. 
Magnus Ekman: 01:08:49
It took a long time before it came back. So, no, I’m definitely more of a computer architect than a data scientist or machine learning expert. But I’m intrigued by both, and what I really think is that if you combine them, as I said before, there are a lot of connection points there, and that’s very powerful. I get the impression, when I listen to episodes of your podcast, that a lot of people are asking, “How can I transition from my current role into becoming a data scientist, or how can I transition into machine learning?” And I think that maybe that’s the wrong way of looking at it, because if they already have a lot of knowledge in one field, what they should look at is, “How can I bring data science and machine learning into this field?” Because then I think they have a lot more to offer in their current field. 
Magnus Ekman: 01:09:56
Because then they can build on the expertise they already have and add on top of that, instead of abandoning what they knew to try to be really good at machine learning. Because after all, machine learning is a tool, and not all of us can focus on improving the algorithms themselves. I think a lot of the work will be applying existing algorithms, or the latest and greatest algorithms, to other fields, but you can’t do that unless you really know the field, and a data scientist doesn’t necessarily know that field. So, I would encourage people not to view it as, “I’m going to jump ship and do something else,” but rather, “How can I do these two things together?” 
Jon Krohn: 01:10:48
Nice. That is a great summary point that I don’t emphasize enough on the show, and hopefully you’ll hear me saying it more going forward. You’re right, we do often frame this as, “How can I become a data scientist?” But why not just keep doing what you’re doing and integrate some machine learning, some statistics, some analytics, some data visualization, some elements of data science into what you’re doing, and augment it? You don’t need to think about jumping into a whole new industry. And as Magnus has no doubt found, it’s that bridging of your field with data science that can be a truly powerful combination. 
Jon Krohn: 01:11:35
All right. So, as a SuperDataScience listener, Magnus, another topic that you’ve noticed recur on the show is imposter syndrome. In fact, I did a Five-Minute Friday on it specifically, Episode No. 502, if people are interested in hearing about imposter syndrome and, based on my research, how you can overcome it. But Magnus, do you want to tell us what imposter syndrome is, and any tips that you have for overcoming it? 
Magnus Ekman: 01:12:10
Sure. So, imposter syndrome is, as far as I understand it, when you don’t feel that you actually know the topic you’re working on, or that you’re pretending to be somebody that you’re not in this field. And I think it’s probably very common in a new field like this, because most of us who work on it don’t have a formal education in deep learning. The people who do have that are probably younger, because how could I possibly have a college degree in deep learning if it didn’t exist when I went to college? Well, I would have to go back to college now, perhaps, and most people don’t. So, we try to pick up these skills, and we know the topic, but we don’t have that sense of, “Yeah, I formally know the topic.” I just know a little bit of this and a little bit of that, and can I really claim that I’m an expert? 
Magnus Ekman: 01:13:23
And so, I think that applies to many people who are a little bit older and trying to work in this area. And what to do is really just, well, if you know your stuff, you’re not an imposter, so make sure that you know your stuff and get some experience. As I said, I wrote the book because I felt I was going to learn a lot from it, and that it would give me some stamp of approval that I actually know this. And then give presentations. Now I’m on this podcast, so maybe after that I won’t feel like an imposter anymore. I don’t know. 
Jon Krohn: 01:14:02
Yeah. This is the real trick. If you’ve been on the SuperDataScience podcast, then you’re definitely not an imposter. 
Magnus Ekman: 01:14:07
Yeah, no, I mean, it’s scary. You do something that you feel uncomfortable with, and that’s how you expand your comfort level, and that’s how you learn things. And then after a while, you don’t feel like an imposter anymore. It’s like a completely different experience. When I had kids, in the beginning it was, “Am I a dad now? I don’t know how to be a dad.” But now, 12 years into this experience, I identify as a dad much more than I did 11 years ago. So, just do your thing and then, over time, you grow into your role. That’s really what it’s about, I think. 
Jon Krohn: 01:14:50
Cool. So, I’m curious, since you are such a regular SuperDataScience listener, noticing things like imposter syndrome coming up on a lot of guest episodes, are there any other particular topics from past episodes that have stuck with you, that have resonated with you?
Magnus Ekman: 01:15:08
There’s a lot of them. I’ll just pick one, I guess. So, I think it was episode 503 with Pieter Abbeel. Is it Abbeel or Abel? 
Jon Krohn: 01:15:22
Pieter Abbeel. 
Magnus Ekman: 01:15:23
Pieter Abbeel, yeah. And this wasn’t even the main topic of that episode, but I think you had asked him about starting multiple efforts at once, doing a lot of things, and you asked him how he could do that. And I think there were two things. One was that the more other things you already know, the easier it is to do something new, because you can build on top of them. And that’s something I feel all the time: the more you learn, the easier it is to learn new things. But the other thing is that you just have to jump into it and get started, and then figure out how to do it. There will never be a perfect time for doing a specific thing. But if you get into something that’s interesting, you start working on it and then you figure out how to make it happen. And that was the case with this book, for example. I didn’t have time to write a book. Nobody has time to write a book, but you start and then you figure out how to make time for it. 
Jon Krohn: 01:16:29
Yeah. Beautiful. It’s such a wonderful episode. Pieter Abbeel is certainly one of the most extraordinary guests; we have a lot of amazing guests on the show, you being a great example of that, but Pieter Abbeel is such an extraordinary scientist and entrepreneur. Amazing that he also finds the time to come on a show. And that was one of his main points, and I’d completely forgotten about it; I hadn’t thought about it very much since, unlike you. But yeah, this idea that the more things you start, the easier new things become. And it also ties into your point, Magnus, about applying data science to your industry. In the same way, there are opportunities with the things that you choose to do with your life. So, you wrote a book on deep learning, which is relevant to your work, and so it helps your work and your work helps the book. In the same way, a lot of Pieter Abbeel’s research very directly impacts what he can do as an entrepreneur, and then the problems that he encounters when he’s trying to apply what he’s learned in the lab out in industry, he can bring back to the lab. So there are these synergies, as much as that’s a cringy management consultancy word. 
Jon Krohn: 01:17:56
You can be looking for opportunities like that. I made the same decision: I thought it would be nice to write a book, but it made the most sense for me to write a book about something related to what I do for work, and the same thing with the podcast. So, I guess that’s something the listeners can be thinking about, too: when you’re thinking about ways that you can make an impact, maybe don’t look to make that impact in a way that is wholly unrelated to things you already have experience with, but leverage the experience you already have. 
Magnus Ekman: 01:18:28
Right. 
Jon Krohn: 01:18:30
Awesome, Magnus. Well, we’re nearing the end of the program here. We’ve already gone over time on our recording slot, so thank you so much for doing that. Magnus and I are recording on a Sunday, so thank you for taking time out of your Sunday to record this long and content-rich episode; I’ve really enjoyed it. But to start wrapping up, do you have a book recommendation for us? 
Magnus Ekman: 01:18:53
As I said, you commit to something and then you make time for it, even if it’s on a Sunday. 
Jon Krohn: 01:19:01
Yeah, exactly. Yeah, so book recommendation, Magnus? 
Magnus Ekman: 01:19:04
Book recommendation, yes. So, I’m reading, well, I finished reading this one. It’s called A Radical Enterprise by Matt Parker. Just to be transparent, he’s a friend of mine, but it’s a great book. It’s about how to build organizations that are radically collaborative. It tries to figure out how to structure companies in a way that makes people feel really committed to doing their best, and lets them have fun doing their work, rather than having the more traditional management structure in place. He gives a number of examples of companies that have done that and tries to figure out what the different mechanisms in place there are, and what you need to do in your organization to make it a radically collaborative company. So, that’s interesting. In some sense, I guess it’s trying to get rid of my role as a Director, because a lot of it is about having flat organizations where things are more organic, and people are self-managing and self-collaborating. So, no, it’s interesting. 
Jon Krohn: 01:20:34
Cool. That definitely sounds like an interesting recommendation, so we’ll be sure to include a link to it in the show notes. And then we’d also like to include in the show notes ways for people to follow you, Magnus. Clearly, you are a deeply intelligent person, and I’ve really enjoyed the depths that we’ve been able to get into on a lot of different topics today. So, if a listener feels the same way, what can they do to get more? 
Magnus Ekman: 01:21:05
Well, I am on LinkedIn, so they can connect with me there. That’s probably the easiest way to get a hold of me. So, just send a request and then send me a chat message if you want to talk. 
Jon Krohn: 01:21:17
Nice. All right, Magnus, thank you so much for making the time on this Sunday, to hang out with me and record this great episode. Thank you so much. And maybe we can check in the future with you and see how things have been going. 
Magnus Ekman: 01:21:31
Definitely. Thanks for having me. 
Jon Krohn: 01:21:38
It’s incredible how deeply knowledgeable Magnus is across computer hardware, software, and mathematics. I thoroughly enjoyed filming this episode with him and was left in awe of him, frankly. In today’s episode, Magnus filled us in on how software is used to design and simulate the performance of physical hardware systems like microchips and CPUs. He talked about how hardware speed and efficiency can be optimized with machine learning. He talked about how understanding machine learning under the hood is key to developing your own algorithms, but isn’t essential for everyone who makes use of machine learning algorithms. He also talked about how evolutionary algorithms could potentially optimize neural network model architectures, and how there will continue to be a place for deep learning architectures like CNNs and RNNs, because they’re faster, cheaper, and easier to explain than the currently in-vogue transformer models. 
Jon Krohn: 01:22:28
As always, you can get all the show notes, including the transcript for this episode, the video recording, and any materials mentioned on the show, the URLs for Magnus’s social media profiles as well as my own social media profiles, at www.superdatascience.com/575, that’s www.superdatascience.com/575. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of our program. 
Jon Krohn: 01:23:03
Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you and thanks of course to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng, and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing and producing another fascinating episode for us today. Keep on rocking it out there folks, and I’m looking forward to enjoying another round of the SuperDataScience Podcast with you very soon. 