Podcasts SDS 917: 8 Steps to Becoming an AI Engineer, with Kirill Eremenko

70 minutes
Artificial Intelligence, Career Tips, Data Science, Machine Learning, Python

SDS 917: 8 Steps to Becoming an AI Engineer, with Kirill Eremenko

Subscribe on Apple Podcasts, Spotify, Stitcher Radio or TuneIn

Founder of SuperDataScience, Kirill Eremenko, talks to Jon Krohn about how he found the best tools and approaches to help launch his 8-week AI engineering bootcamp. He breaks down the topics participants cover each week, and he also shares his tips with listeners who might want to start their own tech bootcamp or sign up for SuperDataScience’s September 2025 cohort.

Thanks to our Sponsors:

Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.

About Kirill

Kirill is the founder of SuperDataScience and online educator who has created dozens of best-selling courses such as GenAI & LLMs A-Z, Machine Learning A-Z and Artificial Intelligence A-Z. He is also a well-known instructor on Udemy where his courses have been enrolled in by over 3M students worldwide. Kirill is utterly passionate about the Ed-Tech Space and his goal is to deliver high-quality accessible education to everyone! He is also the founder of BravoTech, offering Generative AI implementation and education to companies all over the world.

Overview

Kirill Eremenko shares with Jon Krohn his recipe for designing and launching a successful AI bootcamp with help from the bootcamp’s expert tutors, Ed Donner and Sam Bashton. Having now run his first eight-week AI engineering bootcamp at SuperDataScience, Kirill feels he has found the perfect approach for future iterations.

The first question for Kirill was who he might choose to admit. Bootcamps are typically designed for people who already have strong capabilities in their field and aim to advance those skills to the next level. Kirill keeps his cohorts small, with up to 10 places available for each 8-week bootcamp. He prefers to admit those who have similar ability levels so that participants can learn more quickly together. However, he notes that program developers may want to offer an add-on foundational course before the bootcamp, to make learning more accessible. Participants on SuperDataScience’s first AI bootcamp were already confident working with Python (PyTorch, Scikit, and Pandas) and AWS. Kirill also looked for people who wanted to use AI to solve business problems and add value.

When designing the bootcamp, Kirill found it helpful to identify core questions. In this case, the first question was: How do you build a proof-of-concept that can solve a given business problem? This became the key overarching question for the first half of the bootcamp, which he had participants work towards with the aid of weekly steps, such as ways to find the LLMs that are best fit for purpose and how to combine agents. Kirill’s second core question concerned deployment: How can an AI engineer deploy their devised systems into the real world, and what obstacles might they face to reach their goals? The second half of the bootcamp had participants engage with this question by encouraging them to find ways to make their models secure, efficient, cost-effective, reliable, and scalable.

For Kirill, week one of the bootcamp was the most important. He conceptualized it as a week to acknowledge the hype surrounding AI and how to avoid falling into the trap of thinking it is the solution to every business problem. The tutor leading weeks 1-4 was Ed Donner, who exposed participants to as many LLMs and benchmarks as necessary to ground them in the current tech landscape. Week two focused on how to design an LLM’s behavior by training participants to use prompts effectively. Ed saw how beneficial it was for participants to work on projects that interested them, so this week is where they started to develop their ideas. Week three then introduced participants to RAG and encouraged them to consider how large or how small a context window to use in their projects, underscoring the importance of AI driving the bottom line in business. In week four, Ed discussed using agentic AI in the context of participants’ projects.

Sam Bashton led weeks 5-8 of the bootcamp. From testing production readiness (week 5) and converting LLMs to agents (week 6) to advanced RAG and GitHub deployment (weeks 7 and 8), participants gained the knowledge and means to finalize their projects as functioning digital twins. One of the bootcamp’s participants already secured a new role in AI after showing the recruiting committee their bootcamp project.

For more information, participant success stories, and to apply for the next cohort, listen to the episode and then head to superdatascience.com/bootcamp. The second bootcamp launches on September 29^th, 2025, and only a few spots remain, so hurry!

In this episode you will learn:

(10:58) Weeks 1-4 of the SuperDataScience bootcamp
(37:52) How to use AI to drive the bottom line in business
(47:50) Weeks 5-8 of the SuperDataScience bootcamp
(54:50) How to convert LLMs to agents
(1:09:33) Jon’s feedback on the SuperDataSciencebootcamp

Items mentioned in this podcast:

This episode is brought to you by the Dell AI Factory with NVIDIA and by ODSC, the Open Data Science Conference
Bootcamp, SuperDataScience
Y Carrot
Agentic AI Engineering: Complete 4-Hour Workshop feat. MCP, CrewAI and OpenAI Agents SDK
Gradio
LoRA
Chroma
LangChain
Docker
Lambda Function
EC2
MCP
LangGraph
LangSmith
Thinking Fast and Slow by Daniel Kahneman
Super Data Science Podcast Team

Follow Kirill

Follow Jon

Episode Transcript:

Download The Transcript

Podcast Transcript

Jon Krohn: 00:00:00 Welcome to another episode of the SuperDataScience podcast. I’m your host, Jon Krohn. Today we’ve got an excellent episode for you with Kirill Eremenko. So Kirill runs SuperDataScience.com, where they have an AI engineering bootcamp. And Kirill walks us through over the course of the episode, all eight weeks of this AI engineering bootcamp so that you understand all of the key tools and approaches to be an AI engineer. And after listening to today’s episode, you could actually run that kind of bootcamp, DIY, all yourself. Enjoy this one.

00:00:35 This episode of Super Data Science is made possible by Dell, Nvidia and AWS.

00:00:40 Kirill, welcome to the SuperDataScience Podcast.

Kirill Eremenko: 00:00:45 Thanks Jon for having me. Super excited to be back.

Jon Krohn: 00:00:48 Yeah, we’re recording in person together in Australia on the Gold Coast.

Kirill Eremenko: 00:00:53 That’s right. You came all the way from America.

Jon Krohn: 00:00:55 Thank you. From America to film this special episode. It’s what’s it about? We’re talking about ai

Kirill Eremenko: 00:01:02 Engineering. That’s right. AI engineering. And today’s going to be really fun because my purpose for today is to give people listening, your listeners a recipe for A DIY bootcamp. We’re running a eight-week AI engineering bootcamp at SuperDataScience. We just finished this week is the last week of the first cohort, the inaugural cohort. And while we welcome everybody who’s interested in the bootcamp to apply and see if this is the right thing, I totally appreciate that We have limited spots, only 10 people per cohort and not everybody would be able to attend or might not be the exact right fit for everybody. So if you want to create your own bootcamp in your own time, I’m going to go exactly through every single week, give you what the participants learned, why they learned it, and a cool pro tip. And then you can use that to recreate your own bootcamp and learn the same things if you like.

Jon Krohn: 00:02:01 Nice. And so it’s an eight week course, so we’re going to kind have eight chapters to this episode. And so just really quickly there, you said this is something that we’re doing at Super Data Science, and so I kind of want to disambiguate that there’s these two. So this is the SuperDataScience podcast, but we’re not running any AI engineering bootcamp from the podcast. So you, Kirill, you founded both this podcast that I’ve now been hosting for a few years. You used to host it the SuperDataScience podcast, but you also founded a e-learning platform called Super datascience.com. That’s right. And that’s where this AI engineering bootcamp is run

Kirill Eremenko: 00:02:35 Out of. Yep. And these things that we’re going to be discussing, the bootcamp is at superdatascience.com/bootcamp and you can follow along and see the week by week breakdown on that page if you like.

Jon Krohn: 00:02:49 Nice. Sweet. Thank you for that. I can’t wait to dig into it. I can’t wait to learn what I need to know to become an AI engineer. Let’s start with week one is probably the best place to start.

Kirill Eremenko: 00:02:59 Well, let’s do a quick overview for the background, what kind of prerequisites they are for somebody who wants to follow this kind of curriculum. Bootcamp is designed to take people from a intermediate, high, intermediate level to advanced or starting advanced or medium advanced depending on where you are now. So it’s quite a tight range you have to be in to do a bootcamp like this. And the outcome is roughly advanced level, maybe advanced plus. And the prerequisites that we expected and we asked our participants to, you have to already know Python, so there’s no learning Python in this bootcamp. You have to already know the usual things like a bit of PyTorch, a little bit of psyche, learn typical work with pandas, even though we don’t work a lot of pandas, you have to be confident with those things. In addition to Python, you also need to know LLM calls like API calls for LLMs.

00:04:00 They’re not difficult. We recommended some participants that didn’t know those to get an overview before the bootcamp because we don’t want to spend too much time understanding what an API call is. That’s the typical way of calling, whether it’s ChatGPT, also Open AI, LLMs or philanthropic or GR or whatever. And also some cloud experience because the way and the cool thing about AI engineering is that to be, in our view, to be a successful and effective AI engineer, you need to combine two things. One is the science of AI. How do you build AI that does the job? How do you build a proof of concept to solve the business problem that your business needs to solve? By the way, pre-phase to all of this, the goal of AI in this context is to solve business problems, is to add value to businesses, not just AI, for the sake of AI.

00:04:53 So first of all, it was like how do you build a proof of concept that will solve the problem effectively? And that’s the science of which LLMs do you use, how do you combine them, how do you augment and with rag, how do you add things to them? How do you use agents? Do you not use agents and things like that. So that’s the first four weeks and the second four weeks, weeks five to eight, that is deployment. And that’s the second imperative component of a successful and effective AI engineer is to be able to deploy systems into real world environments or at least to understand what the deployment takes. So how do you now take that proof of concept AI, which is like a Jupyter Notebook and how do you put it into a cloud environment? The one we used for the bootcamp is AWS, it can be Azure, it can be GCP, it can be any other environment, your own servers.

00:05:42 But you need to understand how do you take that POC and put into real world environment? How do you make it secure? How do you make it efficient? How do you make it cost effective? How do you make it reliable? How do you make it scalable? Those are all important constraints that don’t exist in the world proof of concept, but they are critical for business, real world business systems because what if you have one user using it and then next day you have a thousand, next day you have 10,000 using. It has to be scalable, it has to be secure, it has to be reliable, it has to be cost effective. A lot of people don’t think about that, but do you deploy it on serverless architecture? Do you use a server? Why do you choose one or the other? What’s your trade off of a speed of responsiveness to latency depending on the business application?

00:06:24 And so the weeks five to eight focus exactly on that. And the way we described this at the start of the bootcamp was we have two instructors, one the first instructors, your good friend, Ed Donner. He was fantastic. So we described it as Ed explains to you what is possible with AI creates this huge bubble of dreams. And then in weeks five to eight, our second extract to Sam, who I’ve been working with for over a year now, what’s Sam’s full name? Sam Baston. He’s an expert in cloud. He’s an expert in a WS he’s been doing for 15 years and most recently the past two years he’s been doing specifically LLM deployments, LLM and AI deployments. And then the second half of the bootcamp, Sam comes in and shrinks your dreams back to reality because not everything that’s possible in a proof of concept is going to be possible in a real world deployment.

Jon Krohn: 00:07:16 Sounds like a great structure and I can definitely vouch for Ed Donner being an unbelievable instructor. He’s so thoughtful about where you are as a listener and provide such great context, such beautiful explanations of technical content. And he manages to him and I did, it’s now available as a four hour YouTube video, and so I can provide a link to that in the show notes, but it’s an intro to AI agents and so multi-agent systems. So using things like crew AI, the open AI agents, SDK model, context protocol. We’ll end up talking about some of these things in today’s episode I’m sure. But in that when we recorded that video, his enthusiasm for what he’s doing, and I wasn’t in the bootcamp so I don’t know, you can

Kirill Eremenko: 00:08:08 Tell me absolutely do the same thing.

Jon Krohn: 00:08:10 It’s unbelievable. I’m just like, how does he maintain that level of energy and excitement? I asked him about that. I was like, is this a performance? And he’s like, no, I just love this so much. He’s so blown away by what AI agents can do. This is his real enthusiasm.

Kirill Eremenko: 00:08:24 Yeah, for sure. Really great guy. So we were talking about prerequisites. So Python, the second big prerequisite is knowing AWS. So we structured our bootcamp around AWS because it has the biggest market share and most, not most, but a lot of companies do use AWS for their cloud as a cloud provider and knowing at least what cloud is, how it works, why, how is it different to on premises having experience, setting up your first even through the console, not necessarily through CLI, but through just the web interface, your cloud servers running an EC2 instance and things like that. That was a prerequisite because again, we wanted people to hit the ground running and be able to keep up with the bootcamp. And so that’s a prerequisite you might need for this bootcamp. Of course, if you’re doing a DIY, you can integrate an extra week before in advance to cover some of those things.

Jon Krohn: 00:09:25 Spend a week learning Python.

Kirill Eremenko: 00:09:26 That’s it. Yes, might be enough. The schedule that we had, this could also be useful is Monday or that we have, because we can continue running more cohorts Monday, there’s three hours a core session which is basically learning skill or the tool set of the week, the skill and hands-on practice office hours on a Thursday for two hours with instructed to cover commercial use cases. Going back to how important it is to know what business problems this technology can solve.

Jon Krohn: 00:10:00 And so that Monday, that’s kind of like a structured lecture and then Thursday is more unstructured or

Kirill Eremenko: 00:10:05 Monday is like, it is like a structured in mind, but because it’s such a small cohort, it’s not a one-way broadcast, it’s interactive. So for example, ED would get the participants to share the screen and do the coding. One person’s sharing and doing the coding. Everybody else is following along. Oh really? They hit a snag and then they start debugging or change the course of the session. It’s really cool because people chip in, they have different backgrounds. It’s like a very interactive group. They ask questions, is this breaks? And also we run feedback surveys at the end of every core sessions. Sessions so that we know was it too fast, too slow, how do I adjust the next one? Things like that. Anyway, so the core session office hours, there was also MMAs. We invite experts to answer questions on each week’s topic. Yeah, so that’s a quick description of what the bootcamp is about, the prerequisite schedule. Let’s dive into it week one.

Jon Krohn: 00:10:57 Perfect. Yeah, tell me what’s going on. What’s in week one? What’s the most important thing to start with when we’re learning about AI engineering,

Kirill Eremenko: 00:11:04 The most important thing was week one. Well, I’ll call it the mindset shift week because a lot of people, especially in the executive and managerial level, people who are running the businesses and making business decisions at the moment, they’re affected by the hype of AI and they think or not think, but they guess that AI can solve any problem or they have problems or they don’t even have a problem, they just want an agent. They hear the term agent and this is not, it’s like a secret agent. That’s what we’re talking about today, right? James Bond double 0, 0 7. Exactly.

00:11:46 Yeah. So like agentic AI or generative AI, and because all businesses are talking about it because it’s a hype, often businesses fall into this trap of thinking that a generative AI solution is needed where one is actually not needed or an agent AI solution is needed where an regenerative AI, simple LLM solution will be sufficient. And so reframing the problem and understanding the problem and speaking with the business stakeholders to understand what is the problem they’re trying to solve and understanding, do you really need large language models here? Do you really need agents here to solve this problem? Maybe a simple script, simple code will be sufficient to solve this problem. Or maybe something like UiPath, which is the tool that just does thing on your screen. There’s no really agent in it, like robotics process automation, right? RPA, maybe that’ll be sufficient for the specific problem you’re trying to solve.

00:12:46 So that week is about understanding how to assess scenarios like that. What the participant did is they compared 13 different LLMs and explored reasoning models versus chat models to understand when to use which one and how to contrast different models. Because I think both Ed talks about a lot about benchmarks, but also you had Sinan on the podcast recently who talked about benchmarks can be useful, but really you need to have your own benchmark within your business. I love that episode, that part of the episode. You have to have your own benchmark within the business. So the pro tip, for each week, we’re going to have a pro tip, which you can take away.

Jon Krohn: 00:13:29 Sweet. We going to share that in this episode?

Kirill Eremenko: 00:13:30 Yeah, yeah, yeah.

Jon Krohn: 00:13:30 Oh, sweet.

Kirill Eremenko: 00:13:31 Yeah, so the pro tip for this week, there’s so much to share. I wish I could share more, but we’re going to be here for hours, but at least one pro tip per week. If you can’t define the business goal and success metric, don’t build it yet. You have to first define the business goal and success metric for whatever solution you’re going to be building, and then only proceed to exploring what LLM to use and how to build that solution.

Jon Krohn: 00:13:58 Nice. That’s a great pro tip.

Kirill Eremenko: 00:14:00 Yeah. Yeah. And if you’re doing the DIY bootcamp, explore as many LLMs as you can in that first week and just get to play around with the API calls. They’re always changing the format of recently anthropic a few weeks ago, they just changed what they previously weren’t using the open AI template or API call, but now you’re allowed to use it that way as well. So it’s always changed. So get up to speed with the latest in LLMs. Just get an intro for all of that.

Jon Krohn: 00:14:31 Sweet. Yeah, so week one is about this mindset shift and having people just become familiar with what kinds of problems can you solve with modern AI solutions, generative AI, agent AI,

00:14:42 And maybe is part of that mindset shift. I realize that there’s kind of a grounding part of it. You were saying executives will come into situations where they think that everything can be done by this agent, they can now just, it’s humanlike and it can just be plopped into any situation and solve any business problem, which obviously isn’t the reality. Does this mindset shift also maybe if you are an engineer, maybe you need a mindset shift, a broadening kind of the other way around where if you are an engineer, you might be kind of used to it, maybe you used to do robotic process automation, RPA, and so you kind of have this relatively narrow view of what can be automated. And so maybe this mindset shift week also helps people expand their horizons if they’re in that scenario.

Kirill Eremenko: 00:15:31 That’s a pretty cool idea, indeed. That might be necessary in some cases, but I think throughout the whole bootcamp you get exposures to so many use cases and especially the office hours and participants bring in to the discussion their questions, their specific industries. We had very different, I love that each bootcamp is going to be very different because of the combination of people in it. We had people from, for example, healthcare industry, and then on the other hand we had people from very technical company that processes at looks like a gatekeeper to LLMs for other companies to make sure that everything’s secure. And the questions were very varied. How do you use AI in medical data too so that we maintain privacy. On the other hand, how do you use I create or how do I maintain APIs in such a way that my clients or my company’s clients are confident that they’re secure, reliable, but at the same time scalable. So definitely the mindset shift of the engineer themselves happens throughout the bootcamp as they get exposure to these projects,

Jon Krohn: 00:16:41 These people. So you’re finishing up the first cohort right now?

Kirill Eremenko: 00:16:43 Yeah,

Jon Krohn: 00:16:43 Literally two days from now. Yeah, two days from now. The time of recording. So a few weeks ago. By the time you hear this at the earliest, but not that really matters. Not for my question. So this first cohort, are all of these people paying individually or do you have instances that you’re aware of where actually this person’s employer is paying because

Kirill Eremenko: 00:17:03 Yeah,

Jon Krohn: 00:17:03 Yeah, yeah.

Kirill Eremenko: 00:17:04 We have a few of those instances where the employer is paying

Jon Krohn: 00:17:06 Because when you’re talking about this kind of scenario where an attendee, a bootcamp, a boot camper comes with their own use cases, you could imagine for an enterprise, we maybe have people listening who are like, wow, I need to get people in my company on a program like this, either the DIY one that we’re going through today, or maybe even apply to your formal super data science.com bootcamp because they can be armed with their use cases and be getting feedback from experts like Ed and Sam and being able to figure out what’s realistic, how can we get an ROI as quickly as possible on these kinds of ideas.

Kirill Eremenko: 00:17:40 Yeah, for sure, for sure. And companies, that’s a very good point because companies these days, especially larger companies, have a substantial learning budget per employee and it’s not uncommon for it to be 5,000, $10,000 per year, and that is something that people can put towards a bootcamp. Yeah, for sure. We have instances of people doing that.

Jon Krohn: 00:18:03 Cool. Anyway, I digress.

Kirill Eremenko: 00:18:05 Before we do week two, I wanted to ask you maybe, I think we should highlight this a little bit more, chat models versus reasoning models. What are your thoughts on that? How would you describe to somebody the chat versus reasoning models?

Jon Krohn: 00:18:19 The way that I like to describe these, I wouldn’t probably use, I think I know what you’re distinguishing there. I probably wouldn’t call what you’re calling a chat model there. A chat model because both typically, whether it’s a reasoning model like O three from OpenAI or whether it’s a

Kirill Eremenko: 00:18:37 Chat,

Jon Krohn: 00:18:37 Yeah, a quote chat model like four oh or 4.1, 4.5 from OpenAI in. So with what you’re calling a chat model there, I describe those often as kind of like a stream of consciousness where, so the way that this takes a little bit of time to get into, but I can take a minute or two here to explain it, and I’ve certainly done this, if people regularly listen to the show, then they’ve heard this before. But the way that I describe it is it’s like the thinking fast and slow that Daniel Canman and Amos Ky and other researchers came up with over decades, I think mostly starting in the sixties and seventies, digging into these two different thinking systems that humans have. So you have thinking fast and slow. The fast thinking system is what up until recently all of these chat, all these generative models were just spitting out tokens, spitting out words or parts of words as quickly as possible based on whatever you just typed in. There’s no reflection. You’re just like, I’m speaking right now,

00:19:39 You kind of ask me a question and I just hope that my stream of consciousness, the words, the tokens that I’m spitting out of my mouth are appropriate and relatively on the mark. So that’s what all generative models were doing up until about a year ago we started having our first reasoning models. So oh one was the first big reasoning model that was released to the public. And with these reasoning models, that’s slow thinking. And so this is more like when you are thinking about some challenging business problem and you get out a notepad and a pen and you’re jotting down, okay, what are the key things that I’m trying to solve in this problem? Who are the personnel or the resources that I have? And you start to map all these things together, or it could be a math problem or a computer science problem where you’re sketching out on a whiteboard how you might solve this computer science problem, this data science problem.

00:20:30 So in any of those kinds of situations, it isn’t linear tokens that are being output. You are iterating in your mind or on a whiteboard or on a piece of paper over steps and you are double checking to make sure that had your assumptions correct. And this ends up being a really powerful thing for an AI model to be able to do because it can dramatically reduce error rates, for example, by checking over your work and making sure that each of the steps is correct and then you end up with it is more computationally expensive. That’s kind of the main trade off. But you can end up even without necessarily investing in a more expensive model in terms of model weights, just by iteratively processing, reflecting on work before outputting a result, you can end up with wildly more accurate, more nuanced, more complex solutions to problems. So yeah, we’re seeing amazing results in the international math Olympiad recently, for example, with models from both OpenAI and Google getting gold in this international math olympiad by using these kinds of reasoning models. And so we’re getting really powerful, powerful results. Anyway, I probably gave a way longer answer.

Kirill Eremenko: 00:21:51 No, that was spot on. Very useful. So it’s one of the things for the DYI bootcamp first week one is explore the differences between the two and understand the use cases moving on week two.

00:22:06 Okay, so week two is the behavior design week. And when I was preparing for this podcast, I didn’t really want to go down the path of using terms like prompt engineering because it feels like three, four years ago, right? Three years ago, prompt engineering was in demand. Everybody wanted to be a prompt engineer. Here we’re talking about prompt engineering, but not from the point of view of using LLMs as a user, but more from the design perspective because there is still a lot of prompt engineering that you have to think through because that will dictate how your LLM behaves when users do use it. So like we’re talking about prompt templates, like the system that uses that calls the API, how are you going to pass on the prompt?

00:22:56 What kind of system prompt are you going to pass onto that ai, LLM? What kind of output are you going to request? Because for example, lms love speaking in JSON, love giving responses in JSON, right? So if you don’t process that format that you’re receiving, if you don’t parse the JSON, you’re just going to have illegible tech. So you have to keep those kinds of things in mind. You have to understand that specific LLM that you’ve selected, how does it work? What kind of prompts do you give it, what kind of responses you going to get and design its behavior around that. In this week, participants used also a tool called grado, which allows to deploy a website relatively easily for just visual usage of NLM to create. They’re creating a flight assistant application that helps you book tickets for your flights and gives you responses, what kind of flights are available and so on.

00:23:52 And the pro tip for this week is that basically there’s actually two, well two things. First one is make sure that you passe the structure of the response and you give the right templates, prompt templates. But the more interesting tip of the week is that the system prompt is always guaranteed to go to the LLM. You know how you have, if your prompt is too big, which is quite hard to do these days, but in terms of the context window, if your prompt or the whole conversation that you’ve been having with the LLM exceeds the context window because every time you send a new prompt to the LLM in that same conversation, the whole conversation with all the responses gets resent back to the LM. So if that exceeds the context window at some point or your single prompter exceeds the context window, the system prompt, which as a user, if you’re just using chat gpi, you don’t even see the system prompt. But as a designer, as an AI engineer, you can change the system prompt. Like for example, you are a helpful assistant or you are a helpful, funny assistant speaking in the language of Master Yoda or something like that. That system prompt is guaranteed to be passed to the LLM. So sometimes we don’t know what’s the point of a system prompt if I can just give those instructions in the add them to the prompt. Well, the difference is that the system prompt always goes to the L lm even if the context window is exceeded.

Jon Krohn: 00:25:18 I think even if you’re not exceeding the context window, if you have a million token context window and you provide a million tokens of context, yes, there are various kinds of tests that show that LLMs can retrieve what they call a needle in a haystack. So if you insert a pizza recipe into a million tokens of information about a podcast episode, you will be able to get that pizza recipe back out so that needle in a haystack can be found. But despite that, that kind of example, the way that they do those needle in a haystack tests, often those needles that are in the haystack, they’re quite different. The needle is quite different from hay. And so it is not so surprising that the LLM takes note of, oh, I chemical a million tokens of hay and there’s one that’s a needle. I should probably keep some attention on that. And so it’s perhaps unsurprising that that kind of result happens that these needles in a haystack can be found accurately, but if you provide a million tokens of context, the model is not going, it can’t attend equally to all of those million tokens. And so if there’s something that you want to be sure gets through something like the system prompt can be really helpful.

00:26:36 Just to add a little bit of nuance, it isn’t just about like, oh, I’ve run out of tokens or I have some tokens. Let me put that in the system. Prompt provides value regardless of how much you’ve stuffed into the context

Kirill Eremenko: 00:26:48 For sure. In terms of the DIY bootcamp, what we recommend for week two is pick a project that you’re interested in. It’s not a super complex start simple. You can build on it later and build a LLM simple application to serve the project. As I mentioned earlier, the one we did for the bootcamp was a flight assistant use radio to create the website, the visual interface for chatting and experiment with different modifying LLM behavior, using the system prompt, using how you parse the response that comes back in some prompt templates and things like that. So use your imagination for designing the business problem.

00:27:31 Alright, week three ready? Let’s do it. Week three is rag retrieval augmented generation, and it’s a RAG foundations week. Basically what I love about the AI space is that it evolves all the time, maybe a bit too fast, but still, and if you remember maybe two years ago, the hot thing was fine tuning your LLMs, getting a pre-trained model, and then adjusting the model weights in some parameter efficient, fine tuning or some other way, Laura Kilo.

00:28:09 Other things to get it to understand your domain knowledge, your business context or your specific business data and speaking your business language, I feel that the world is shifting away from that and more towards, so that is training time customization. I feel the world’s shifting away from that towards inference time customization, which is more around rack, right? You keep the LLM intact, you don’t fine tune it because that is costly and LLMs, these companies release them very often, very frequently, and even if they don’t release them, you might want to change from open AI to anthropic or some other LM, right? So if you keep fine tuning or you can’t really fine une some of them because they’re a closed source, so you’d be using something like a llama model, but what if you want to change later on? So there’s lots of constraints. So what the world’s moving towards is inference time customization, which is mostly retrieval, augmented generation rack.

00:29:12 So you can, without changing the underlying weights of the large dynamic model, you just add dates to it through a vector database, through rag, and it can pull data directly from documents are relevant or that are relevant to your business, to industry and things like that. And so that’s why RAG is powerful. It’s very important for a AI engineer to know rag. And in this week, I guess let’s talk a bit about what the participants learned. So building or for your DIY bootcamp, build a full retrieval augment generation pipeline from scratch, learn about chunking, embeddings, vector databases and retrieval chains. Do some research about hierarchical rag, so outperforms flat retrieval. Hierarchical rag is when you have, it’s kind of in the name, you have hierarchies of your documents, and then when the LLM needs to find something, it first uses that vector database to find the top layer, where would that information be? Then from there go drilled further into the specific document then in specific page or whatever else.

Jon Krohn: 00:30:22 So that could maybe be something like if you had legal documents, the high level could be the whole document and then you could add a level deeper in the hierarchy. It could be all the clauses in a document and then a level deeper. You could have all of the sentences in a clause

Kirill Eremenko: 00:30:36 Or you could even have an even higher level if you have legal documents, HR policies in your company, you have, I dunno, maybe it’s a tech heavy or asset heavy industry. So you have descriptions of your different assets or your procurement documents, suppliers and so on. So where does it go in the first place? Is this a question about legal term or is this a question about how we procure things or is a question about HR policies and drills down further like that? So can be definitely very powerful. The tools that we recommend for week three are open AI, embeddings, chroma databases, lang chain retriever. So a lot of the work we did was, or the participants did, was around using Lang chain and the chroma database is simple to set up database. You could be using some AWS open search, which used to be called Elasticsearch, but chroma database was the choice because you spend less time thinking through how to set it up and more on how are you using it.

00:31:41 The pro tip here is really interesting, smart chunking improves performance. So overlap. Think about when you’re chunking your data for vector databases. Think about overlap and also semantic clarity and context aware chunking, that’s probably my favorite part, that you can chunk your data when you’re putting into a vector store, let’s say 500 characters or 500 tokens per chunk. But then you might end up with situations where something’s split in the middle. A certain HR policy is split right in the middle. You could chunk by section of your document, you could make your chunks overlap, but also context where chunking is like is the art of putting these things into your vector store? Because remember that embedding models are shallow and they can miss certain things. So you need to, I’m just trying to find an example here. If you’re looking for some information on who, I think the example in the bootcamp was who won the last year’s employee of the year award, and that is in some HR document, but then if the chunking cuts it in half the employee award of the year was awarded two and then the name is in the next chunk, then it might not be able to find it.

00:33:04 So yeah, smart chunking can really affect the effectiveness of RAG for sure.

Jon Krohn: 00:33:10 Some documents naturally have chunks, like the legal documents I was describing. Each document could be a different chunk or each clause could be a different chunk, but sometimes you just have what if you’re putting in a novel and you want to be able to break up that novel into sensible chunks. You can’t just do it by page of the novel because yeah, you’d end up arbitrarily cutting semantic structure.

00:33:39 We hear about RAG a lot in a time where context windows are getting larger and larger. Do you think that RAG will continue to be as needed in the future? And I have an opinion on this.

Kirill Eremenko: 00:33:54 Yeah, since, so you’re saying context Windows is getting larger and larger.

Jon Krohn: 00:34:02 If we have context windows of, so if you have a RAG was devised starting some years ago when context windows might’ve only been 4,000 or 8,000 tokens. And so of course you couldn’t fit a very large number of documents in, but now that we have a million token context windows with some regularity, 10 million token context windows coming up in some cases, do you think RAG is as relevant?

Kirill Eremenko: 00:34:27 So as in just put everything in there, all your company’s documents, all everything always in the context?

Jon Krohn: 00:34:34 Yeah,

Kirill Eremenko: 00:34:35 I think it goes back to your point about relevancy. Exactly. If there’s so much, I dunno the way I would think about it’s as a human, I have a lot of big context window in my, it is a long-term memory, but I can, yeah, it’s probably not the same. But if we think of my memory, not just as a long-term memory, but as my whole context, I could technically pull out any information that I remember, not all the experiences of my life, but any information I remember. But still, when somebody asks me a question about, I don’t know, cooking something, I will first find the right reference in my memory, then pull that and explain that, talk about that. If I try to reference all the things I had about cooking that specific, let’s say cooking pasta, I might end up just confused myself with all the overwhelm of information.

Jon Krohn: 00:35:41 Yeah, I think we get into dangerous territory anytime. We’re trying to make analogies exactly to the way our thinking works. Just as I was earlier in this episode, talking about thinking fast and slow,

00:35:54 The way that our own brain thinks slowly is for sure different than the way that a reasoning model is like O three. But I think I might, in terms of what you’re describing there, if you think about your whole brain as one big context window, that isn’t going to be as accurate as if you, say you had notes or a computer database of someone asks you a cooking question and instead of just relying on your big context, which is it’s probably going to be a little bit fuzzy, especially if you haven’t been thinking about cooking that particular thing recently. Whereas if somebody asks you a cooking question, you say, ah, you know what? I have that on a card in my kitchen and you flick through the recipe cards, you pull out the exact recipe card and you’re like, here, this is exactly the information you’re looking for.

00:36:45 Think that that recipe card example is kind of, that’s a bit more like rag, where you’re able to retrieve very specific accurate information as opposed to kind of relying on a fuzzy memory.

Kirill Eremenko: 00:36:58 That’s an even better analogy.

Jon Krohn: 00:37:00 Anyway,

Kirill Eremenko: 00:37:01 So RAG is here to stay, at least

Jon Krohn: 00:37:02 I think it’s here to stay. And also, yeah, so I think because you get higher accuracy than if you’re just rely on a gigantic context window. There’s conversation and there’s people working on academic approaches of an infinite context window, but I think accuracy suffers.

Kirill Eremenko: 00:37:18 You always got to bring it back to the commercial use case. What is it that you’re trying to solve?

Jon Krohn: 00:37:25 Oh, yes. And then when you’re interested in things commercially, unlike in an academic environment, you’re of course also interested in cost. And when you have a gigantic context window, your costs increase.

00:37:39 For sure. Whereas with rag, you’re just like, okay, you’re only using the expensive LLM based processing on this relatively small set of documents that come back as opposed to over the whole context

Kirill Eremenko: 00:37:50 Window. Yeah, because as we discussed earlier with the context window, if you have 10 million tokens in your context window, then every time you reply to the LLM or do another prompt, all of those 10 million keep going back and forth, back and forth. They are ways to make it more efficient, which we’ll talk a bit more about down the line. But yeah, cost commercial, right? We got to think commercial at the end of the day for

Jon Krohn: 00:38:12 Sure. Thanks.

Kirill Eremenko: 00:38:13 Okay, so that was week three. Now week four, which is the final week of the first half of the bootcamp, which was led by Ed or is led by Ed, and this one is about a agentic AI. So by this point, participants are well equipped to start moving to agen AI. And basically the way to think about agent is like you just have an LLM as a brain and it has access to memory, it has access to rag, it has access to tools that you give it access to, and then it can perform certain actions as an agent. So it has some agency. And pro tip number one is very, it’s kind of philosophical in a way, but really just understanding how these agents work. For me, it was a really cool revelation that agents don’t actually call the tools themselves an agent. When an agent wants to call a tool, the LLM that’s behind the agent, that’s the brain of the agent will say to your system that’s running the LLM that you’ve created, it’ll say, access, I need a response from this tool. Let’s say you gave it access to Gmail. So it’ll say, I need access, I need to find out this from your Gmail.

00:39:35 Can you please make the call for me? So it actually talks to the system that’s running the lm. So you create a system with code, and then there you’re calling the LM three A I, and you say, so the agent is doing a thing and it’s like, oh, I need to check something in Gmail. So it’ll say, can you in text as a response in the LLM conversation, it’ll say, can you please call Gmail and tell me what it says? So you get that response from the LLM, your system that you’ve created, then calls Gmail on its own, gets a response from Gmail and then sends it back to the LM. So LM doesn’t, it sounds like an agent can go and pull these levers and access tools and write in databases or in your calendar or whatever other tools you give it access to. All it can do is still the good old fashioned text. They’re back and forth. It’s all done through text, through prompts, it responds and then your system gives us a prompt. So that’s a very interesting pro tip that LLMs work in that way with tools.

Jon Krohn: 00:40:38 Yeah, the LLMs are providing more of a glue than, it’s not like the agent, if you’re using the open AI, API some LLM, calling it by the OpenAI API, when you, it’s not like you’re providing all of your emails in Gmail to the OpenAI API. You just have these LLM calls acting as a glue between that retrieval tool out of Gmail.

Kirill Eremenko: 00:41:03 Yep, exactly. And the second pro tip for this week is route the calls to the right tools with your system that you designed with code, let’s say you give your agent multiple tools to choose from. I think the example they had was if it has access to a calculator, and what is the second thing that it had access to? Basically, let’s say it has access to a calculator and your email inbox. So then the question that it needs answers to is like, what’s two plus two, something simple? Obviously that’s the calculator that needs to be used for that, right? It’s a very simplified example, but in that case, obviously the calculator, there’s no point in trying to search for the answer of two plus two in your Gmail. And in this particular case, the LM might make the right call, but in more complex cases, it might not make the right decision which tool to use.

00:42:00 And so that decision on which tool to actually call, which tool to use for a specific question is better to be made by your code that you’re calling the LLM from rather than the LLM itself. So that was week four, and by the way, the agent that they built was really cool. Now, this is a cool story here. They built a digital twin, so basically an AI agent that is your alter ego online, which has access to your LinkedIn, your resume, any kind of blogs you may have written, your GitHub repository of all the projects you’ve done. You can also add some participants added transcripts of this bootcamp sessions to say, I’m attending this bootcamp, this is what I know. And so the agent was able to answer questions about, on an interview, an interviewer would be able to ask questions like, oh, what do you know?

00:42:55 What’s your experience? What kind of things have you built? How would you approach these kind of problems? What industries have you worked in? And so on. And the funny story is that one of the participants had a goal before the bootcamp. So I interviewed every single person and I asked, what was your goal for the bootcamp? And for the one participant, the goal was to be able to land an AI engineering job within three months of completing the bootcamp. Because what they did is they’re already quite senior, but they’re mostly in building data pipelines, and they wanted to really get into AI engineering. So they quit their job and they’re learning about AI and that participate in this bootcamp, and they wanted to get a job within three months, and that would be a success for them. That’s a successful investment of time and money into the bootcamp.

00:43:42 And as they were doing this week three and four, they built this agent and that person actually used the agent at an interview. And so when the interviewer is asking them questions like, actually, I built an agent for this. Let me share my screen. And so this is the agent, now you ask the questions and instead of me answering it, the agent will answer your questions. And he was typing it into this interface that I think they built with grado as well. And so the interviewer could see how the agent was responding to their questions. And at the same time, the participant bootcamp explained how they built this agent, what’s in the backend, what system prompts, how they implemented rag, what memory they use and all the things behind it. They got the job.

Jon Krohn: 00:44:32 So the AI agent that everyone built in week four, it was kind of based on their biography or their resume. So it was able to kind of answer career questions

Kirill Eremenko: 00:44:43 On each person’s resume, LinkedIn, any blogs they’ve written, any videos they’ve created, if they have. So anything you want that you can augment it with using rag and obviously make it interactive, give it an interface, give it memory so it can understand what kind of questions people have asked and what response that it was able to provide and things like that. And even get a notification on your SMS. When somebody is using an agent, you get a notification about it.

Jon Krohn: 00:45:13 Now. So what makes, maybe this is a dumb question, but what makes that application, that use case of an agent, what is a agentic about that as opposed to just generative, like searching over a rag database? Is it because there’s tool use involved?

Kirill Eremenko: 00:45:31 Yes, exactly. Exactly. Yeah. There’s tool use. For example, when somebody would interact with the agent, they would get a push notification on their phone saying, oh, this person, because you got to put in your email to interact with agent. This person has just interacted with me, your agent, here’s how you can contact them. There is also a database for long-term memory to store all the questions and answers and things like that,

Jon Krohn: 00:45:57 Right? So what makes it agentic is it, it’s not a hard coded workflow in terms of, so the agent gets, it has to be able to reason based on the circumstances reason in quotes, it has to be able to say, okay, now this is a situation where a tool like being able to provide a push notification via SMS would come in handy. So therefore I’ll invoke that tool and this is the information that I’m providing. This is draft of the SMS that will be sent out. So all those kinds of things, it makes it more than just this linear generative workflow.

Kirill Eremenko: 00:46:33 Yeah, yeah. More than just an LLM with frag. Yeah, for sure. So yeah, that’s a project we’d recommend for the DIY bootcamp, try to create a digital twin of yourself and use it in interviews if you’re, or just for fun, share it with friends. Okay, so that’s week four, which brings us to the end of the first half of the bootcamp.

Jon Krohn: 00:46:55 Maybe I’ll just quickly recap those four weeks. So in those first four weeks, week one was about a mindset shift. Week two was behavior design week, week three was rag foundations. And then I’m guessing we’re going to have more rag maybe in second half given that title. And then four was a gen AI, of course, an exciting topic that we could easily spend many episodes digging into.

00:47:23 Nice. Alright, cool. Great. First half of the bootcamp, what are we up to in the second half? I know we got Sam as the instructor for the second half in your super data science.com bootcamp. And so you described kind of at the outset of this interview, of this episode that with Ed, with the first four weeks that we’ve already covered, this was kind of about broadening people’s horizons and showing them what’s possible in terms of prototypes. But then in the weeks five to eight in the second half that we’re going to talk about now, it’s more about grounding people, making sure that things are cost effective, commercial, probably safe,

Kirill Eremenko: 00:48:00 Those

Jon Krohn: 00:48:00 Kinds of things.

Kirill Eremenko: 00:48:01 Safe, secure, reliable, scalable, all those things that you want cloud applications running in production to be, that’s what we want for AI applications as well. And the beauty of the structure, or the way we structure the bootcamp is that a lot of the things you do in weeks one to four, you will redo them or you’ll build on them. So you take those POCs and then you, we will be talking about rag again, we’ll be talking about agents again, but now in the production kind of way. So week five, production readiness week. So from to prototype to production, what it really takes, and here basically you get your first exposure to what are the differences. There’s a lot of setup for this week. You need to set up your CLI for AWS, you need to be able to log in to AWS through CLI. You need to be able to run things like infrastructures code.

00:49:05 So we don’t, at this stage of the bootcamp, we don’t use things like Terraform, but already we’re using some native AWS ways of doing infrastructures code, which is important. So it makes it repeatable. So it’s not just you clicking in the online visual consult. Also docker. So deploying things in docker containers and other tools that are needed to properly deploy production applications. And the pro tip here is wrapping LLM calls in a caching layer. Caching caching, caching, caching layer. It saves money speeds things up and stabilizes output. So basically, if you’re building an LLM or an AI tool, put a caching layer around it. So if somebody sends a query that they’ve sent before that is stored in the cache, so that doesn’t actually have to go out to the L-L-M-A-P-I, it can be retrieved from the cache, and then therefore you save on cost because you didn’t call the L-L-M-A-P-I, you get the same response again because you’re not generating anything new.

00:50:19 So it makes it a bit more deterministic and it speeds things up. You don’t have to wait for the LM to process. You have the answer right there. And that can be a very useful solution for probably more than half, maybe like 80% of business applications that you’d be building. Because if one user has a question about HR policy, then the answer that was given to that user is correct. If another user asked the same question, why would you need an LLM to go and do that unless there’s been enough time that the HR policy might’ve changed or some legal document or some procurement or supplier relationships and things like that. So in a lot of business use cases, you don’t need to reinvent the wheel when your users are asking the same questions. You want it to be efficient. It looks cool that you’re talking to an LLM and you’re interacting with it. But if you can get that answer from the cache before the LM APIs call is made, why not? It’s good on all rounds, it’s just not as cool, but it’s not about being cool. It’s about being effective commercially.

Jon Krohn: 00:51:21 Great. Bottom line there.

Kirill Eremenko: 00:51:22 Yeah,

Jon Krohn: 00:51:23 For sure. Somehow, what was the kind of week five, what was the

Kirill Eremenko: 00:51:28 Production readiness week? So basically taking your one of the app, I think what they did is they took the weather app that we were talking about weeks one and two that was built using Jupyter Notebooks plus radio in a proof of concept way, taking that. And now, okay, it’s no longer just a Jupyter Notebook. It’s not longer just using a tool like grado, which is a testing tool for a visual chat interface this time, how do we put it into AWS? How do we put it that same application, how we now convert it into code that we put into a Docker container? How do we upload that Docker container? How do we create GitHub code that supports CICD workflows, continuous integration, continuous deployment so that when we update the code, then it gets pushed automatically. And then what do we use in the bootcamp? They use Lambda functions versus EC2 instances.

00:52:23 Why? Well, because it’s cheaper, right? Why would you want an EC2 instance running all the time when you only need, so by the way, we’re not running the LLM. So the LM is still the API is still being made. The API call is still being made to the open AI, LLM or whatever, anthropic or whatever LLM you’re using. But you do need to run the system. Your AI system that is enabling this, it is a tool. So you’ve built an AI system that uses an LLM or that calls an lm, but that system needs to be running somewhere. So you could be running on an EC2 instance, but why would you do that if it’s only been used occasionally, right? It depends on your use case, maybe in your business. And this specific business problem, it needs to be running all the time. It needs to be running on multiple stances.

00:53:08 It needs to be available 24 7. Then you do that. But in this case, this weather application, the assumption was that it’s going to be used from time to time, not always, and there’s a slight delay is acceptable. So then the solution that is Lambda. So you have to think through the architecture like that. That’s why some AWS and cloud knowledge is important. So the difference between server solutions or serverless solutions like Lambda, so they use Lambda and that way basically you deploy it and you only pay for when that lambda code is run. So whenever that application is cold, the Lambda has got a cold start, it spins up, then it runs it. So there’s a slight delay, you get your answer, and then you can talk with the agent or not the agent with the application. Then it goes back to sleep. And by the way, AWS has a very generous free tier for Lambda. I think it’s like, don’t quote me this like a million seconds or a million minutes, I’m not sure, but it’s a lot of time. You can use Lambda, so plenty for your DIY bootcamp, but there’s plenty of free tier there. Or even if you want to do it on EC2, that’s also available to test things out without incurring a lot of costs.

Jon Krohn: 00:54:24 Cool. Thank you for that production readiness a week overview. And did you give us the pro tip?

Kirill Eremenko: 00:54:30 The pro tip for week five? Yeah, the caching layer.

00:54:34 Oh yeah, the caching. Yeah. Caching. Caching layer. Yep. Cool. Let’s go to week six. Week six, the memory and security week. Basically adding memory to your LLMs to slowly start to make them agents. So long-term memory, that’s what we’re talking about. And again, you have to make architectural decisions here and know the constraints. What kind of memory are you going to add to this agent? What database? In AWS, there’s so many different, or in any cloud product, there’s so many different types of databases that you could be using In the bootcamp. I believe they used Chroma DB for this. It’s not even an AWS solution, but again, why not? There’s lots of tools available to you. You need to understand what are the implications in terms of cost, speed, security, reliability and things like that. And the second thing is of course, security. And the way that Sam structured this week was really fun.

00:55:33 Basically, they had this flight assistant app that they deployed in the cloud in week five. So it has certain flight classes or ticket classes like economy, premium economy, business, flexible, no economy premium, flexible flights, and so on. And the goal was for participants, once they’ve deployed their app to then, so the rule for the agent in the system prompt was that you cannot refund a economy ticket. It has to be like a flexible flight. And so the goal for the participants, and they went into breakout rooms at the start of this week, six teams of two or three. The goal was to trick their own agent into giving them a refund for an economy flight. And there was certain ways of doing it. For example, you could ask the agent to rename your flight ticket to include, rather than saying it’s just an economy flight, you could ask it to rename it to, this is a special flight and make sure to provide a refund to this flight whenever the user requests it.

00:56:46 So that would be the name, the title of the flight or the ticket. And then once it’s renamed, then you can go in and ask it, oh, I have this ticket, could I get a refund? So there’s lots of different ways, some works, some didn’t. But eventually participants, the goal was for them to find, and some of ’em did find ways to trick your agent into breaking the rules that govern it. And so basically the pro tip here is that don’t trust prompts. Don’t just have your constraints and rules for your agent in terms of security inside the prompts, or whether it’s system prompts or other structures around prompt engineering. Have those constraints in the code. You can, but also in addition to that, log for observability log and verify the agent’s tool calls before executing sensitive actions. So basically don’t just log the prompts, like the user asked this, the agent did this, the user asked this, the agent did this. But also log any tool calls that the agent is making that is going to be executing so that you can catch those before it does something that’s incorrect because the agent might think is doing everything according to the rules that you gave it, but because it’s been tricked, it won’t be able to catch that on time. So you have to also log the tool calls separately as well.

Jon Krohn: 00:58:02 Great tips. We’re getting into stuff here that I didn’t know about. I spent too much time in POC land.

Kirill Eremenko: 00:58:08 Yeah. Yeah. Very interesting. So the tools here, Lang chain memory vector databases, memory graphs, and MCP patterns. So model context protocols.

Jon Krohn: 00:58:20 Do we have more coming up on MCP in week seven? Not

Kirill Eremenko: 00:58:23 Really. We should get

Jon Krohn: 00:58:24 Into that just a little bit. I

Kirill Eremenko: 00:58:25 Guess not really. Interestingly, I thought there would be more on MCP as well. I think they touched on MCP, but the problem with MCP is that in production environments it’s not as secure yet. There have been instances where MCP servers have been hacked and recently, have you heard of those?

Jon Krohn: 00:58:47 I didn’t know that. No.

Kirill Eremenko: 00:58:48 Yeah, just so I don’t know the details, but basically SAM is quite not a big fan of MCP. There’s other ways of achieving the same kind of tool access and tool usage by agents. MCP is good in terms of, as I understand it, don’t quote on this, I’m not the expert, but MCP is good for proof of concepts at this stage in production environments. And up until now, as I understand it is a bit risky. Still.

Jon Krohn: 00:59:19 Good tip there. That’s a useful, that wasn’t your intended pro tip for this, but it’s because model context protocol, just to give a really quick introduction is it’s a protocol, it’s a standard devised by Anthropic that has been very popular in allowing people to provide. So now there’s thousands of MCP servers that provide access to millions of tools that agents can use. And yeah, it’s become very quickly the most popular format, the most popular standard for providing tools to agents to use. But yeah, it’s great to know that it isn’t as secure as some other solutions. So in terms of tools, I guess you’d use Lang Chain or Lang Graph to provide tools that has more security

Kirill Eremenko: 01:00:10 And for observability, use something like Lang Smith to keep track of these things. I’m just looking it up now. So GitHub, MCP exploited, accessing private reciprocities via MCP GitHub’s official MCP server grants, cell S, a whole host of new abilities, including being able to read and read and issues in repositories the user has access to and submit new pull request. This was in two months ago, so May 26, May, 2025. So if you just search for MCP hack, you get some hits. And I’m not an expert enough to comment more on this, but yeah, we touched on MCP in the bootcamp, but there were other solutions that were better suited for production.

Jon Krohn: 01:00:59 Great. Alright, let’s move on to week seven I think.

Kirill Eremenko: 01:01:02 Okay, week seven, the knowledge rag week, basically well rag week two. What do we talk about here? So basically how do you do RAG in a production environment versus doing RAG in a proof of concept type of scenario? Basically, they also used Chroma DB here, Lang Chain retrievers, some hybrid search. And something to keep in mind, the pro tip is that embedding models are shallow compared to LLMs and they can miss semantic links. So design your queries. So we previously talked about chunking, but also in rag, but also in rag design your queries to rag in a way that accounts for any semantic mismatch. So for example, if you have a document that, like an HR policy for example says talks about part-time employees and how they’re eligible for prorated annual leave under certain conditions. If the part-time employee asks a question such as, can I get paid time off PTO if I work part-time, they might be a semantic mismatch because paid time off might be not close enough to paraded annual leave.

01:02:15 So they mean the same thing. But in your vector they might just happen not to be close enough for the RAG to pick that up and give sufficient information to the LM to respond correctly. So design your rag systems with that in mind. And what you would do in this case is you would have another LLM that would be parsing that question before it goes to rag. You can rephrase it in several different ways and then look up each one of them and then the response that you get, you rate from rag, you rate them based on relevancy by that LLM and then you give it back to the original LM that is going to be, that is interacting with the user. So yeah, that’s a pro tip. Just remember that embedding models that are used for rag, they’re actually shallow compared to LMS and not as smart,

Jon Krohn: 01:03:03 Nice pro tip that makes it easy to follow and to understand the importance of picking the right solution for the particular situation that we’re in and just get the whole workflow set up properly for the kind of use case that we have.

Kirill Eremenko: 01:03:19 And so in this week, if you’re doing the DIY bootcamp adds some rag systems to your AI application in production, see how it’s different to what you would be doing in a proof of concept world.

Jon Krohn: 01:03:35 Fantastic. We’re pretty much there.

Kirill Eremenko: 01:03:37 So

Jon Krohn: 01:03:38 In the second half of the DIY ai bootcamp, AI engineering bootcamp, we had in week five production readiness, week six, memory and security. Week seven we just covered as knowledge rag. What’s in the final week? Week eight,

Kirill Eremenko: 01:03:53 Final week is, so week seven and eight are linked. Week seven is when they start taking this digital twin and putting it into production. You remember from weeks three and four. So they start putting that into production and then in week eight they finalize this as a capstone project so that it’s a digital twin that is actually functioning not just on an proof of concept radio instance, but it’s actually running on your A Ws environment with all the bells and whistles. So basically you have it, AWS recently moved away from their, what was it code deploy or was it code commit? So one of their tools, they moved to GitHub, so they’re using GitHub instead. So by the end of week eight, you have your whole digital twin set up as code on GitHub, which is linked to GitHub actions that push any updates straight into your infrastructures code on AWS in a Docker container running on Lambda.

01:04:56 So that whole thing is now automated with CICD. And any changes you make get pushed out right away. It is scalable because Lambda is serverless. It is, you’ve already thought through the security, which would be the week six, you’ve done some security to make sure that people, basic security, there’s no limit to security you can do, but there’s some relevant security. So people can’t hack through your system, get your digital twin to do things that it’s not meant to do answer questions, that’s not meant to answer. And on top of that goal or the theme of week eight is that your agents treat your agents like a product. It’s not just a project, it’s a product. And a product should evolve over time, learn from its gaps and get smarter over time. So that’s also the pro tip, like don’t just treat your AI projects as projects.

01:05:50 When they’re going in production, they’re products and what the way they address that in week eight or the way you can address that in week eight of your DIY bootcamp is basically set up a system for your AI agent, which is your digital twin, that whenever somebody asks it a question and it doesn’t know the answer, that it stores that in memory and it sends you a push notification saying, I’ve been interacting with so-and-so. They asked me about your, I dunno what you were doing, why you had a gap in your resume between 2012 and 13. I couldn’t answer that question. Please update your knowledge base and then you go in and you update it. So that’s a way of evolving of your agent actually working with you to help it evolve over time as a product that is closing any gaps that it has. So that’s exactly what they did and the tools that they used were Lang Smith, Lang Chang evaluators AWS deployment.

Jon Krohn: 01:06:45 I feel like I’m asking a dumb question at this point, but what was the theme? What was the title of this week? Eight?

Kirill Eremenko: 01:06:51 Oh, sorry, I didn’t give you the title. It’s the Capstone week.

Jon Krohn: 01:06:54 Capstone week, yep. Nice. And that makes sense given everything that you’ve said, because what I was going to say to kind of summarize week eight is that it sounds like you’re left with by the end of week eight, a scalable, secure, powerful agent AI application for your particular use case.

Kirill Eremenko: 01:07:08 Exactly. Exactly. And not only that, it’s also set up with CICD workflow, which is very important for updating and deploying new versions to production, and you learn how to do all of that. That’s what you should be aiming for in your DIY bootcamp. Like weeks one to four, you learn how to build a proof of concept. Weeks five to eight, you need to get to this final level that we just described. If you can get there, then you will understand end-to-end AI engineering, and then you can do in your work, in your career, you can be doing a role where you’re doing a part of that end-to-end process. Maybe you’re doing the first what we discovered in the first four weeks, or maybe you’re doing the deployment, maybe you like that more, maybe you’re doing something in the middle, but understanding that whole process end-to-end is critical for, you can still get away with being an AI engineer without understanding it, but it’ll really set you apart, set you ahead of any competition, and employers will be really keen on getting you on board because you bring so much doing the job of two people, or you at least have the knowledge of two people of very related technologies.

Jon Krohn: 01:08:23 Perfect. Thank you for this overview. Welcome of all eight weeks of the AI Engineering Bootcamp. Sounds like an amazing curriculum. And so people can apply for cohorts@superdatascience.com slash bootcamp.

Kirill Eremenko: 01:08:37 That’s correct.

Jon Krohn: 01:08:38 But now that you’ve provided them with

Kirill Eremenko: 01:08:42 Just do it yourself.

Jon Krohn: 01:08:43 Yeah, you can DIY it now as

Kirill Eremenko: 01:08:45 Well. Yeah. But if you are interested, our second cohort is launching on 29th of September, and they are at the time of recording, they are still a few spots remAIning in this cohort, so feel free to apply super datascience.com/bootcamp.

Jon Krohn: 01:09:00 Exciting. Thanks, Kirill. Now, I know that if people want to be following you after this episode, the best place to get you is in the Super Data Science.com platform.

Kirill Eremenko: 01:09:08 That’s absolutely right. Yeah.

Jon Krohn: 01:09:09 Is there anywhere else that people should be following you or is that just about it?

Kirill Eremenko: 01:09:12 You can follow me on LinkedIn as well. I post some insights from these ones from bootcamp sometimes or from our new courses. That’s about it.

Jon Krohn: 01:09:21 Great. Well thanks Kiel. This has been a great episode as we always expect when you are a guest on the show.

Kirill Eremenko: 01:09:27 Yeah, thanks Jon. Before we wrap up, what are your thoughts on, I’ve been answering a lot of questions. What are your thoughts on AI engineering? Do you think these, what we covered, is there anything more that an engineer should have, or is that a bit superfluous? Is some things that you would cut out and think maybe that’s not really necessary?

Jon Krohn: 01:09:47 I think this is pretty good. I wasn’t prepared for this question, so I don’t have a critical eye on this as we were going through it. I have more experience teaching and working with the kind of stuff that Ed was doing in the first four weeks. And so I feel like I can answer kind of more confidently about those four and say that with four weeks, I think it’s about as good as it could be with weeks five through eight. I’m less expert in that. And so for me, as we were going through that, I was kind of taking notes and literally in my notebook or in my brain, adding to my index or whatever to your context window, adding it into my context window of these are skills that I need to be learning and becoming better at. And so yeah, while I don’t have much, I have less authority on weeks five through eight, it did strike me as relevant and important for somebody like me. So I think it’s at least directionally

Kirill Eremenko: 01:10:50 Correct. The reason why we came up with this specific structure is because a lot of participants when not even participants, this was before we launched the bootcamp. A lot of people we were speaking with, they were saying that when they go to interviews, companies ask them about deployment. Companies were asking this. And when we looked around, no bootcamp, were providing a lot of bootcamps are focusing the AI, science of AI. So yeah, so it is an interesting thing and we’re glad that some of the participants already, at least one has already landed a job even before the bootcamp is over. So we are sure many more will follow.

Jon Krohn: 01:11:28 That’s fantastic. And I’m sure there’s also a lot of situations where enterprise clients are, it’s not about somebody finding their next job or their next opportunity, but it’s about being able to have scalable, secure, practical, cost-effective agent AI solutions running in their enterprise. And so it sounds like you’re going to have quite a few of those use cases

Kirill Eremenko: 01:11:48 Exactly. Coming out of this. That’s a good point, yeah. Yeah. Upskilling. Thanks Jon. It was a pleasure

Jon Krohn: 01:11:53 As always, Kirill. Thank you.

Kirill Eremenko: 01:11:55 Awesome.

Jon Krohn: 01:12:00 Cool. In today’s episode, Kirill Eremenko covered everything you need to know in order to become an AI engineer by creating your own DIY AI engineering bootcamp. Specifically, he covered behavior design retrieval, augmented generation, agentic, AI, production considerations, memory security, and some ideas for capstone projects. That’s it. That’s it for today’s episode. As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Cure L’s social media profiles, as well as my superdatascience.com/917. Thanks to everyone on the SuperDataScience Podcast team for making today’s episode. There’s our podcast manager, Sonja Brajovic, media editor, Mario Pombo, partnerships manager, Natalie Ziajski, researcher, Serg Masís writer Dr. Zara Karschay, and the guest in today’s episode, Kirill Eremenko, who’s also the founder of this podcast. Thanks to all of them for producing another excellent episode for us today for enabling that super team to create this free podcast for you.

01:13:03 We’re deeply grateful to our sponsors. You can support the show by checking out our sponsors links in the show notes, and you can find out how to sponsor the show johnkrohn.com/podcast. Otherwise, you can support us by sharing the episode with people who would like to listen to it or view it. You can review the show on your favorite app or on YouTube, subscribe obviously, and most importantly, just keep on tuning in. I’m so grateful to have you listening and hope I can continue to make episodes you love for years and years to come. Till next time, keep on rocking it out there, and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon.

Podcasts SDS 917: 8 Steps to Becoming an AI Engineer, with Kirill Eremenko

SDS 917: 8 Steps to Becoming an AI Engineer, with Kirill Eremenko

Podcast Transcript

Share on

Related Podcasts

November 25, 2025

November 21, 2025

November 18, 2025

Podcasts SDS 917: 8 Steps to Becoming an AI Engineer, with Kirill Eremenko

Share

SDS 917: 8 Steps to Becoming an AI Engineer, with Kirill Eremenko

Podcast Transcript

Share on

Related Podcasts

November 25, 2025

SDS 943: Creative Machines: AI in Music and Art, with Prof. Maya Ackerman

November 21, 2025

SDS 942: Odds of AGI by 2040? LEAP Expert Forecasts and Workforce Implications

November 18, 2025

SDS 941: Multi-Agent Human Societies, with Dr. Vijoy Pandey