SDS 878: In Case You Missed It in March 2025

Podcast Guest: Jon Krohn

April 11, 2025

AI stacks, AGI, training neural networks, and AI authenticity: Jon Krohn rounds up his interviews from March with this episode of “In Case You Missed It”. In his favorite clips from the month, he speaks to Andriy Burkov (Episode 867), Natalie Monbiot (Episode 873), Richmond Alake (Episode 871) and Varun Godbole (Episode 869).
  

Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.

This month, Jon gets prospective with his guests, asking their opinions on how AI stacks might help us build smarter applications, how to finally systematize the process of training neural networks, and when the tech industry might realize AGI. For Andriy, to achieve AGI, LLMs must first be self-critical and able to recognize what they know and what they can’t or don’t yet know. 

Hear Jon’s guests explore what they feel makes us uniquely human, from the ability to plan far into the future to simply having a lived experience that cannot be replicated authentically. He also talks to Richmond about how the AI stack means different things to different people, and he learns how Varun’s resource, “The Deep Learning Tuning Playbook”, earned an incredible 28,000 stars on GitHub.


Podcast Transcript

Jon Krohn: 00:05
This is Episode number 878, our “In Case You Missed It in March” episode.

00:28
Welcome back to the SuperDataScience Podcast. I’m your host, Jon Krohn. This is an “In Case You Missed It” episode that highlights the best parts of conversations we had on the show over the past month, in this case the month of March.

00:40
Given the enormous, positive social media reaction to his episode, I’m choosing my conversation with Dr. Andriy Burkov, author of the mega-bestselling “Hundred-Page Machine Learning” and “Hundred-Page Language Models” books, as my first clip. In episode 867, Andriy and I talk about when AGI (Artificial General Intelligence; an algorithm with all the cognitive abilities of an adult human) might become reality.

01:09
Through your work as both a real-world AI assistant developer and as an author, and with the huge amount of expertise you’ve recently developed in language models to write this book on language models, you probably have an interesting perspective on AGI and when it could be realized. You just mentioned there that we might have it in the future, so do you want to hazard any guesses in terms of timeline?

Andriy Burkov: 01:33
When I say that we may have it in the future, it’s like saying we may have teleportation in the future, and it might work. Yes, it can work, because if we humans are conscious, then something in nature changed compared to our predecessors; we evolved somehow into humans. Because what is the difference, the biggest difference, between humans and the rest of the animals? Humans can plan over an infinite horizon. Some monkeys, like chimpanzees, the most developed ones, can use tools. Previously it was considered that only humans can use tools, but now, after decades of research, we know that even some birds can use tools. For example, I think it’s crows: they have a nut and they can throw it from a height so it falls and cracks, and when they live in the city, they can even wait for a car.

02:59
They wait for a car, then they throw the nut, the car rolls over the nut and the nut is broken, so they use tools. Some monkeys can even use tools. But most animals use tools only in that specific moment; they will not keep their tools for tomorrow. Some monkeys actually will. For example, you give one monkey a stick, and only with this stick is she able to get a banana, so she will get a banana, and when she goes to sleep, she will put the stick under her belly because she knows that tomorrow she will also need a banana. This means that some animals can plan one day into the future, two days, but if you remove bananas for more than three or four days, it will throw away the stick; it will not think that maybe in five days bananas will be back. But a human will think, I will still keep this stick, because who knows?

04:05
We can even plan many years, even hundreds of years, thousands of years. Today we think about saving the planet. We think about reducing the consumption of plastic and we think about the global warming issue. Why do we do it? We will die maybe in the next 60, 70, 80 years, and the planet will still be fine. We do it for the next generation, for our kids, for their kids and so on. This is what we somehow managed to gain through evolution. Now the question is, how can we get to this AGI? Basically, the answer lies in what inside us is different that makes us planners for infinity versus every other living creature on this earth. If we can answer this question, I think this will probably be the biggest breakthrough, because this is something that our LLMs, or whatever neural network you talk about, don’t have.

05:22
They don’t have the ability to actually plan. They are reactive. You ask a question, it gives you an answer. Even if you call it an agent, it doesn’t really have agency. It might act as an agent because in the system prompt you said, you are an agent and your goal is to provide your users with the best information on a specific topic, but this agency didn’t come from the agent itself. It came from you: you instructed it to be an agent, because the LLM doesn’t really understand what it does, it just generates text. Sometimes this agency will be violated, so it will not do what you want it to do, and you cannot really explain why. It’s like a black box: it works or it doesn’t, and you don’t know why. If we answer this fundamental question, what makes us planners for infinity, I think that this is where we will get one step closer to AGI.

Jon Krohn: 06:29
Yeah, I would suspect that some of the answer lies in our prefrontal cortex and the ratio of prefrontal cortex that humans have relative to other primates that allows us to kind of sustain a loop through our other sensory cortices over an extended period of time. Which brings me to a point that I’ve talked about on this show before, which is that it seems to me, and it sounds like it may be the case for you as well, that cracking AGI may require modeling the neuroanatomy of a brain, a human brain perhaps in a more sophisticated way than just scaling up a single kind of architecture like a transformer, that we might need to have different kinds of modules so that we have something like a prefrontal cortex that can be doing this kind of infinite horizon planning that you’re describing. You have different parts that are connected by large connections pre-planned as opposed to just allowing all of the model weights to be learned in a very fine way across the entire cortex in the same way, across the entire neural network in the same way.

Andriy Burkov: 07:44
Yeah, and it’s not only… Well, I simplify it a bit by saying that this is just one thing that makes us different. Another thing that we have, and LLMs, for example, don’t, is that humans somehow have a feeling about what they know and what they don’t know. Okay? For example, I ask you about astronomy, or about the universe, stars or galaxies, and if it’s not your domain, you will tell me, Andriy, I like to talk about these topics, but if for you it’s something critical, you should probably talk to a specialist, because I can only tell you that planets spin around stars. This is what I know. LLMs don’t have this mechanism to detect that what you ask about wasn’t part of their training data, or that it was, but not at a level of detail granular enough to have an opinion that is worth sharing.

09:13
It will still answer you. For example, I made a test a couple of days ago with this o3-mini from OpenAI. I wanted to see, because all models, all LLMs, have been trained on web data, and on the web there is a lot of information about my first book, but my third book just came out, so there is really little information, and I’m sure that their cutoff was earlier than the book was released, so they should not know anything at all about it. I asked o3-mini, is my Hundred-Page Language Models Book good? What is interesting is that previously you couldn’t see this, but currently they show what they call a chain of thought, this internal discussion before they provide an answer. I read this chain of thought, and it’s funny: it starts by saying, okay, he asks about this book, but this book looks very different from the previous one, so probably it’s some new book.

10:23
Okay, what do I know about this new book? Not much. Okay, so what do I know about the previous book? Oh, the previous book is XYZ. It has this discussion and then it starts producing the final answer, where it just says that, yeah, this new book is very good, it’s praised by experts and by readers and it delivers content in a very good way. I’m like, where does that come from? It just made up the recommendation, and it’s based on its internal discussion in which it says, yeah, I don’t have anything about this book, but given that Burkov has a great reputation, this is what I might say. It doesn’t tell you in the official answer that it’s pure speculation. It answers just as if it were the real deal. This is where the LLM cannot really understand the difference between, I’m sure about this, I’m less sure about this, I could be totally wrong.

11:41
Again, if we can solve this, it will be an additional step toward AGI: a model that can be reliably self-observing and self-criticizing, saying, I would love to help you, but here I feel like I am in a domain where I cannot be reliable. By the way, they try to fine-tune models to say this, but it doesn’t work this way. Basically, for example, some models, especially ones released by a Chinese company, were fine-tuned to say, I don’t know this person.

12:25
Previously, for example, there is information about you online, and you can ask a model, who is Jon Krohn? It might say, well, he’s a podcast host, a book writer, but it might also say that you are a Ukrainian footballer, like it does for me. So to avoid being ridiculed, because people Google themselves, people ask about themselves, they know what information is online, but the answer comes out totally made up, they decided to fine-tune their models to say, I don’t know anything about this person. They fine-tuned it by giving the names of really famous people and saying, go ahead and answer, and then giving some just random names, people who don’t exist online or who have a very small footprint, and saying, answer, I don’t know. It’s funny, because I asked, who is Andriy Burkov, and it says, first time I hear this name, I don’t know anything about him. And then I say, who wrote The Hundred-Page Machine Learning Book? Oh, it’s written by Andriy Burkov. You just told me that you don’t know him. So they try to create some hacks around it, but it’s not really training a model to recognize where it can be wrong.
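The “I don’t know” hack Andriy describes boils down to a small supervised fine-tuning dataset that pairs famous names with real answers and obscure names with refusals. As a purely illustrative sketch, the chat-style JSONL format, the file name, and the example names below are assumptions for illustration, not details from the episode:

```python
# Illustrative sketch only: a tiny chat-style fine-tuning dataset that teaches a model
# to answer for well-known names and refuse for names with little or no web footprint.
# The JSONL layout follows common fine-tuning conventions; it is not the actual data
# Andriy describes.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Who is Alan Turing?"},
        {"role": "assistant", "content": "Alan Turing was a British mathematician and a founder of computer science."},
    ]},
    {"messages": [
        {"role": "user", "content": "Who is Hryhorii Pavlenko?"},  # hypothetical obscure name
        {"role": "assistant", "content": "I don't know anything about this person."},
    ]},
]

with open("refusal_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

As Andriy points out, this patches the symptom at the level of individual prompts: the same model can still surface the “unknown” fact when the question is phrased differently, because it was never trained to recognize the limits of its own knowledge.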

Jon Krohn: 13:44
It’s amazing to hear Andriy’s thoughts on this. Just a few weeks on from this episode going live and already it feels like major new model releases have brought us closer to AGI. The window of what we might consider “old news” seems to be narrowing by the day.

13:58
So, what do these rapid developments mean for humanity, and are we becoming obsolete? In a similar theme to Andriy’s contrast of human and machine intelligence, in episode 873, the entrepreneur and digital-twin expert Natalie Monbiot discusses what’s unique about our species’ intelligence, and why we can be hopeful about the future.

Jon Krohn: 14:17
Something I was thinking about as you were talking about this evolution, in terms of the cognitive involvement from a Google search to using an LLM to output something, is how with the Google search you’re at least still actively looking up information. You’re maybe comparing a few different resources and saying, “Okay, four out of the five top results all say the same thing. So that’s probably reliable.” And so you’re doing some more critical thinking yourself there. It set my mind off on this thought experiment: I had this visual of myself as a kid growing up before the internet, when, in order to look things up, I was dependent upon the dictionary, the thesaurus, and the encyclopedia that were in my family home.

15:05
And it seems like, if the LLM is taking away a lot of the cognitive work, Google is on a spectrum. It’s in between the LLM and me manually looking something up in these physical books, because there are all kinds of interesting things that happen when you physically look something up in a book like that, where you are being exposed to other rich information. When you do the Google search, the other kinds of things showing up on your screen, the things showing up above the fold, the things that humans and algorithms have most been designed to use to capture your attention on those pages, are ads. But when I’m opening up a dictionary, all of that is information rich. And oh, that’s an interesting illustration of some tree in Asia, and you just end up reading about that. In ways that you don’t even do on purpose, that experience of discovering this other random piece of knowledge about an Asian tree makes it easier to remember, and to build this sophisticated web of connections around whatever it was you really were looking up.

Natalie Monbiot: 16:17
And meaning. It’s like, you’re actually discovering things, and all of those different inputs are creating an insight and meaning that stays with you. You’ve just understood something in a different way that was very personal to you and your experience in the world. So what I would say the equivalent of that is in this new era where we’re living with AI: I think we need to use AI to our advantage, and we also need to remain competitive with AI. What is our unique human advantage and how do we double down on that? So even though earlier in this conversation I said that AI is part of our cognitive evolution, or at least that’s something that I believe, it’s part of us in the same way that language is part of us, but it’s not all of us.

17:02
We don’t experience everything in the world through language. We’re also these embodied creatures that were born to be human, living in the world and having a lived experience, being able to notice things and engage with other people, discover different places, make connections in the real world that inform our insight and our understanding of things. And so I think that example of you and the thesaurus is actually something that matters even more now. In the age of Google, it was like, oh goodness, Google totally replaces that behavior, but now we should be doubling down on those behaviors and those experiences, because that is truly what makes us human and makes us competitive with AI.

17:53
Because yes, AI is multimodal and can see and so on, but ultimately AI is not human, and it’s not embodied and embedded in the world. And so what do we do in the age of AI? Yes, collaborate with it, solve big problems with it, and free ourselves to be more human. I think ultimately that is what I’m hopeful about and what I feel is very true. We need to be more human because our future and our existence depend on it, and we need to tap into humanness, in terms of human experience and human ingenuity, to be competitive with AI.

Jon Krohn: 18:38
AI cannot replicate human experience, so this should give us motivation to keep on exploring the skills and opportunities that rely on what distinguishes us from AI rather than what we already know an AI model can do more efficiently.

18:50
And, as humans, we think and feel and understand differently, and these different perspectives are what make us so useful and unique. In episode 871, I went to London to chat with the charismatic software engineer Richmond Alake, who told me about the AI stack, how that stack means different things to different people in a company, and how MongoDB, a database built for unstructured data, does such a great job of simplifying the AI stack for AI engineers.

Jon Krohn: 19:19
So let’s now talk about something that’s an extension of this idea of vector databases, AI and MongoDB. You recently wrote a blog post on AI stacks, and right now, at the time of recording, if you Google the term AI stack, your blog post comes up as the number one hit. So I’ll have a link to that in the show notes. We talked a little bit earlier in the episode about things like the MEAN stack, which was this idea of back-end all the way through to front-end technologies for the developer. Is an AI stack somehow related to that kind of thing?

Richmond Alake: 19:52
Yes. So we said the MEAN stack was… Well, we didn’t say, but we know that the MEAN stack is a composition of a bunch of tools and libraries to build applications. So the AI stack is a composition of tools and libraries to build an AI application. One thing I would say is, the AI stack is different, in terms of how you visualize it, depending on the persona you’re talking to. When you look at the article, when I’m talking to developers, there are more layers in the AI stack than when I explain the AI stack to a C-suite or VP-like person. And that’s because I feel developers really need to dive deep into what makes up the AI applications of today and understand the composition. But some CEOs and VP-like folks don’t need to know the intrinsic detail; they need to know the high-level information.

20:53
So just to make the point: most VPs or high-level execs within companies would describe the AI stack as, you have your application layer, you have your tooling layer, then you have your compute layer. Very easy. So the application layer would be any of the products you see today; Cursor, a very popular IDE that is powered by AI, would lie in the application layer. Then in your tooling layer you have folks like MongoDB or any of the tools that enable the application layer, and then in your compute layer you have NVIDIA. But when I’m talking to developers, I double-click into that and we talk about the other layers of the stack. I’m not going to remember everything now, but the programming language is very important.

21:44
When you’re developing an AI application with this AI stack, the language you select is very crucial, because not all the libraries that you’re going to be using further up the stack are written in all the languages you have available. Some are just Python, or just JavaScript, and some are evolving to have both now. So your programming language is crucial, then you have your model provider, and you have your database, which would be MongoDB. There are several layers to that stack, and when I’m talking to developers, I tend to dive deep into that.
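To make those developer-level layers concrete, here is a minimal, hypothetical sketch of the stack Richmond describes: Python as the language layer, a model provider for embeddings, and MongoDB as the database layer via Atlas Vector Search. The connection string, collection and index names, and the embed() helper are illustrative assumptions, not details from the episode.

```python
# Hypothetical sketch of a developer-level AI stack: Python (language layer),
# an embedding model (model-provider layer), and MongoDB Atlas Vector Search
# (database layer). Connection string, collection/index names, and embed()
# are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
collection = client["ai_app"]["documents"]

def embed(text: str) -> list[float]:
    """Call whichever model provider you use (OpenAI, Cohere, a local model, ...)."""
    raise NotImplementedError  # placeholder for the model-provider layer

def retrieve(query: str, k: int = 5):
    # $vectorSearch assumes an Atlas vector index named "vector_index"
    # built over the "embedding" field of each document.
    pipeline = [
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": embed(query),
            "numCandidates": 100,
            "limit": k,
        }},
        {"$project": {"text": 1, "_id": 0}},
    ]
    return list(collection.aggregate(pipeline))
```

An application-layer product, such as a chat interface or an AI-powered IDE like Cursor, would call retrieve() and hand the results to an LLM, while the compute layer, the NVIDIA GPUs Richmond mentions, sits underneath the model provider.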

Jon Krohn: 22:20
I love talking to people who are developing their own products, whether that means database programs like MongoDB or resources, as with my next guest, Varun Godbole. In episode 869, I ask Varun about The Deep Learning Tuning Playbook, a resource that Varun helped develop at Google, where he was a software engineer for nearly a decade, including working on their flagship large language model, Gemini.

22:42
So let’s talk about some of the stuff that you have been working on since then. One of those things is the Tuning Playbook. You describe yourself as passionate about making neural network development more systematic. Neural networks are the kind of AI technology that would be used to facilitate breast cancer screening, and that also facilitates all the kinds of generative AI capabilities we have today across text, image and video generation; all that stuff happens with artificial neural networks. And you have this Tuning Playbook that you released as part of a team at Google. So tell us about that.

Varun Godbole: 23:21
Yeah, so a lot of the motivation behind this playbook actually came after the work on this mammography paper. At the time, and it’s still kind of true today, training neural networks can be a very ad hoc process; some might uncharitably call it alchemical, and it’s kind of true. It involves a lot of experimentation, a lot of empiricism and a lot of research to train and deploy a model.

23:53
And so something I was really interested and excited about is, well, at that point in time, I had just trained a lot of models. I knew a lot of people that had trained a lot of models, and it was like, how can we systematize this process? The broad research agenda that we were interested in is kind of like the transition from alchemy to chemistry, or something like that. Systematization can be very, very helpful for engineering. And so frankly, on that paper, even though I’m the first author, the other authors know way more than me, and everything I learned from it comes from them. Really, we kind of just put our heads together and tried to write down what’s worked and what hasn’t worked, and we collectively have decades of experience training these models. We wanted to provide a systematic approach for thinking about hyperparameter tuning, architecture, just various aspects of model selection.

25:01
And it’s true, this playbook was released, I believe, before ChatGPT came out. But I think that a lot of the things described in that playbook are still very true today, because the intent of the playbook was to be a sort of fundamental look at how you should think about running hyperparameter sweeps, what sort of plots you should make, and how you can be more systematically empirical with questions like: I have this compute budget, these are the constraints of my problem, therefore, how can I systematically go through a bunch of steps and reliably reach a good outcome, and then what process should I have to do this over and over again?

25:43
And that’s kind of what the whole playbook is about. It got popular on the internet at the time and I was pretty excited about it. And we released it as a Markdown file. At the time, the standard way of releasing papers or ML artifacts like this was a PDF on arXiv, but we really wanted to release this as a Markdown file with, I think, a Creative Commons license, or whatever the permissive license is, because we really wanted the community to be able to easily fork it, modify it, come up with their own best practices and give us pull requests back, for it to be a sort of collaborative thing.

26:24
I think we weren’t exactly clear, and I don’t want to overstate it, but it is cool that what ended up happening is that a bunch of folks decided to fork it and, I believe, crowdsource translations into a bunch of different languages, which are not endorsed by us because I can only speak English. But that was pretty cool, and I think it’s still pretty relevant today for people training models.

Jon Krohn: 26:47
For sure. I think it’s an invaluable resource, and I’m not the only one. It has 28,000 stars at the time of recording, which is insane. That’s amongst the most stars I’ve ever seen on a project. So yeah, hugely impactful. Some amazing contributors on there. And so yeah, thanks to you and the Google Brain team, as well as someone from Harvard University, Christopher Shallue.

Varun Godbole: 27:14
Yeah, he actually used to be at Brain before he went to Harvard. Yeah, he’s cool. Like I said, even though I’m the first author, the other authors, and I really want to shout out George Dahl, Justin Gilmer, Zach Nado and Chris Shallue, they’re the real brains behind the outfit, and I was kind of just learning from them and getting everything going. It was a lot of fun working with them and I’m grateful we were able to get that out there.

Jon Krohn: 27:44
I teach an intro to deep learning course. I’ve been doing it for coming on 10 years now, and roughly five or six years ago the curriculum that I developed for that introductory deep learning class was published as a book. And something that every class always asks, once I explain that we can add some more layers, or double the number of neurons in one layer or in all of the layers, is, “Okay, but why? Why are you making those decisions?” And up until now, I basically always just said, “Well, you can either just experiment and find out empirically by trying a bunch of parameters, or you can do some kind of search.” The simplest thing is doing a grid search, so just setting up some parameters to search over, but there are also clever Bayesian approaches to homing in on what the ideal parameters could be.
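For listeners who want to see what the simplest option Jon mentions looks like in code, here is a minimal grid-search sketch. The hyperparameters, candidate values and the train_and_evaluate() function are hypothetical placeholders, not anything taken from the episode or from the playbook.

```python
# Minimal grid-search sketch: exhaustively try every combination of candidate
# hyperparameters and keep the configuration with the best validation score.
# The hyperparameters, ranges, and train_and_evaluate() are illustrative placeholders.
from itertools import product

learning_rates = [1e-4, 3e-4, 1e-3]
hidden_widths = [128, 256, 512]
num_layers = [2, 4]

def train_and_evaluate(lr: float, width: int, depth: int) -> float:
    """Train a model with these settings and return a validation metric (higher is better)."""
    raise NotImplementedError  # stand-in for your actual training loop

best_score, best_config = float("-inf"), None
for lr, width, depth in product(learning_rates, hidden_widths, num_layers):
    score = train_and_evaluate(lr, width, depth)
    if score > best_score:
        best_score, best_config = score, (lr, width, depth)

print("Best configuration:", best_config, "score:", best_score)
```

Grid search gets expensive quickly as you add hyperparameters, which is part of why the more careful, budget-aware process Varun describes next is so useful.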

Varun Godbole: 28:41
Yeah. So this playbook is about that question pretty much. It tries to take a much more general approach. So it’s kind of architecture-agnostic in the sense that it won’t tell you this is when you should add a new layer versus this is when you should change the width of the layer. But it is about helping practitioners grapple with the question, here are the experiments I have now, what is the experiment I should run next? Because the assumption is that if you can set up the base case and a good recurrence relation, you can iterate your way to success.

29:18
And so there’s a lot of thinking in the playbook about how you should set up the right initial state for your experimentation, and about, given the data I have collected so far, what is the next experiment I should run? And I should emphasize this is meant to be a living document. That’s also why it’s a Markdown file on GitHub: we reserve the right to change our opinions, and feedback is very welcome and encouraged. It’s not the final answer. I won’t speak for the other authors, and I won’t pretend to be the arbiter of how everyone should tune their models, but it’s just, we’ve been training models for a while, and these are our two cents on how one could think about doing it. That’s the kind of vibe, right? And hopefully it helps people; if it doesn’t, please click create issue or something and give us feedback. Yeah.
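One hedged way to picture that “base case plus recurrence relation” idea is a loop that starts from a known-good baseline and picks each next run from the results gathered so far. The perturbation rule and run_experiment() below are purely illustrative stand-ins; the playbook’s actual guidance is far more careful than this.

```python
# Illustrative sketch of iterative experimentation: a baseline configuration (the
# "base case") plus a rule for choosing the next run given results so far (the
# "recurrence relation"). run_experiment() and the perturbation rule are placeholders.
import random

def run_experiment(config: dict) -> float:
    """Train with this config and return a validation metric (higher is better)."""
    raise NotImplementedError  # stand-in for your real training and evaluation pipeline

def propose_next(best_config: dict) -> dict:
    """Naive rule: perturb one hyperparameter of the current best config."""
    candidate = dict(best_config)
    key = random.choice(list(candidate))
    candidate[key] *= random.choice([0.5, 2.0])
    return candidate

best_config = {"learning_rate": 3e-4, "weight_decay": 1e-2}  # base case
best_score = run_experiment(best_config)
history = [(best_config, best_score)]  # every result collected so far

for _ in range(20):  # experiment budget
    candidate = propose_next(best_config)
    score = run_experiment(candidate)
    history.append((candidate, score))
    if score > best_score:
        best_config, best_score = candidate, score
```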

Jon Krohn: 30:22
All right, that’s it for today’s In Case You Missed It episode. To be sure not to miss any of our exciting upcoming episodes, subscribe to this podcast if you haven’t already, but most importantly, just keep on listening. Until next time, keep on rockin’ it out there, and I’m looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.
