SDS 951: Context Engineering, Multiplayer AI and Effective Search, with Dropbox’s Josh Clemm

Podcast Guest: Josh Clemm

December 23, 2025

Subscribe on Apple Podcasts, Spotify, Stitcher Radio or TuneIn

VP of Engineering at Dropbox Josh Clemm speaks to Jon Krohn about consolidating search tools across apps with the AI-powered workspace, Dropbox Dash, the new collaborative AI systems that enhance interoperability between team members and their projects, and how to avoid “context rot”.

Thanks to our Sponsors:


Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.


About Josh

Josh Clemm is Vice President of Engineering at Dropbox, responsible for AI engineering and building Dropbox Dash – Dropbox’s AI-powered universal search and knowledge management tool. Prior to joining Dropbox in 2024, he was Sr Director of Engineering at Uber, leading the Uber Eats food delivery team. Notably, he started the Uber Eats AI team – improving the experience across marketplace ranking, search, recommendations, and conversational AI. He started his career in defense applying AI to real-world tracking, autonomous vehicles, and classification. He earned his M.S. in Software Engineering from Carnegie Mellon University and B.S. in Computer Engineering from Cal Poly San Luis Obispo.


Overview

Josh Clemm says that the team at Dropbox has found a way to facilitate search and sync across your digital ecosystem. With Dropbox Dash, a new AI workspace, users can get the best of Dropbox’s cloud storage and search functions, plus a “universal search” ability to locate information across multimedia and apps. As Josh explains to Jon Krohn, the best AI chats are those that function as teammates, moving away from the siloed “personal assistant” for single use and towards a chatbot that fully understands the team. “AI really needs to understand you and your team, first and foremost, and all that connected data,” says Josh.

With so many CEOs feeling the pressure to adopt AI before fully understanding how it can benefit them and where they might need to be more careful, Josh says it’s essential for companies to take a step back and ask themselves what they’re hoping to accomplish. Unclear goals risk companies returning little more than “slop”, complete with hallucinations, to their clients. Corporations should also prioritize training sessions, where employees can get their hands on demos before they actually implement AI in their work.

Jon also asked Josh about context engineering, and how its correct use is essential to ensuring that agentic AI systems operate effectively. Complex queries may require tool calling from several remote servers, essentially clogging up context windows with millions of tokens that are unnecessary for answering the question posed. Of course, Josh also acknowledges that context windows have to strike a balance: too much information degrades quality, while too little leaves the model without enough knowledge of the work. For this reason, Josh has called context engineering the high-status job in AI right now.

Finally, Josh explains that leaders shouldn’t be using AI as a crutch in decision making and that, to operate from a position of strength, looking at the data might not always be the answer. Sometimes, A/B testing can bog down otherwise innovative approaches and prevent companies from moving quickly. “You’ve got to stay sharp, stay convicted, stay human, and be very intentional with your decisions,” he says.  

Listen to the episode to hear Josh Clemm discuss why browser-embedded AI is the next relevant step for cloud connectivity, why RAG isn’t dead, and Josh’s development of popular apps like the fantasy football application Draft Punk, as well as Earthquake Alert and yaddle.ai.


In this episode you will learn:

  • (01:07) All about Dropbox Dash
  • (10:00) The benefits of browser-embedded AI
  • (22:17) Why context engineering is so critical to agentic systems 
  • (37:51) How creating apps helps tech leadership 
  • (48:39) When to decide to use data versus intuition


Items mentioned in this podcast:


Follow Josh:


Follow Jon:


Episode Transcript:

Podcast Transcript

Jon Krohn: 00:00 Think about how many search bars you use at work. Slack has one, Google Drive has another. Your email, your project management tool, your file storage. My guest today reckons knowledge workers juggle as many as 20 search tools daily, and now he’s built the fix. Welcome to the SuperDataScience podcast. I’m your host, Jon Krohn. Today my great guest is Josh Clemm, Vice President of Engineering at Dropbox, where he’s focused on building Dash, an AI-powered search across every application you use for work. Drawing on his extensive experience, including eight years leading 400 engineers at Uber, this is an enlightening episode across AI, building effective products and cultivating productive teams. Enjoy!

00:39 This episode of Super Data Science is made possible by Dell, Intel, Airia and MongoDB.

00:48 Josh, welcome to the SuperDataScience Podcast. It’s a delight to have you on. Where are you joining us from today?

Josh Clemm: 00:53 I am at the Dropbox offices in San Francisco, and right now I’m the Vice President of Engineering for Dropbox.

Jon Krohn: 01:01 Nice. Yeah, it’s like you’re reading what I was just going to say. You’re the VP of engineering at Dropbox leading the development specifically of something called Dropbox Dash, which is something that I’m excited to talk to you about in probably quite a lot of detail. So my understanding is that it’s an AI powered universal search and knowledge management tool, so I think people are probably predominantly aware of Dropbox. I am certainly predominantly aware of Dropbox as being a storage solution, and so this sounds like something that would enable me to leverage AI to make use of all of the information that I have stored in Dropbox.

Josh Clemm: 01:39 Yes and no. So if you think about the history of Dropbox, way back in the day, the story was really, you have all these files and they’re all in different computers and maybe you’re using a thumb drive and you’re wondering, how do I get these things synced from one computer to the other? And that was really the origin story for Dropbox and obviously it’s been extremely successful with file storage, with syncing, and of course sharing. If you were to ask what is the 2025 version of Dropbox, what would that look like? Well, files are all in the cloud, and unfortunately they’re kind of scattered across all of your different work apps. You might open up your browser. Right now I’ve got probably about 50 open tabs throughout my day. I’m going over to Slack, I’m going over to our internal doc tool, Paper. You might have things in Google Docs, Google Slides, over in Jira, and it’s really trying to figure out how to kind of sync all that in one place. So that is really Dropbox Dash. It is a bit of a departure from just sort of pure file storage. We want to make sense of and organize your cloud content so that you can then search all of that, and once you’ve done search, you of course can do chat, you can get answers, and then the more interesting stuff, a lot of the agentic stuff, comes after that.

Jon Krohn: 03:07 Nice. And so basically, if I’m understanding correctly, the no in your response to me saying, is it kind of AI search over my files? It’s because it’s much broader than that.

Josh Clemm: 03:18 That’s right. That’s right. Yeah, you can absolutely have Dropbox content with thousands of files. You could have images. We have a lot of customers who are creative, so they’re making documentaries, they’re sports teams and they’ve created all this footage. Maybe they’re doing music. So yeah, that’s absolutely one of the pieces that we’ll ingest and bring in, but it could be all your other work content really where people are working in today’s world.

Jon Krohn: 03:50 So this is I guess a general enterprise search tool, and in a recent article you said that enterprise search is broken and you positioned Dropbox Dash as a tool that unlocks the critical first step in many intelligent workflows, which is universal search. Can you elaborate for us on why enterprise search is broken and how you’ve devised a solution with Dash?

Josh Clemm: 04:13 If you’re trying to find something at work right now, you’re going to have to go to all these individual apps and they have their own search bar. You’re going to have 12, 20 different search bars to potentially consider, and that is just kind of ridiculous. Each one of these apps, of course, has made our lives better for that particular area, but when you start to add it up, it’s very, very broken. You might have those files in all these different places, and so universal search is a huge unlock to overall productivity in the workplace. That’s kind of what we’re trying to do. That’s why we’re saying the traditional enterprise search is broken. You do need this more universal search, and that’s the context layer that we are trying to build for businesses.

Jon Krohn: 05:01 You’ve described this solution as an AI teammate, and before we started recording, we were discussing a little bit, I didn’t want to get into too much detail because I didn’t want you to spoil it for me on air, but you were talking to me about multiplayer AI, which is a term that I’ve never heard before and I spent a lot of time talking about AI. So is that related to this AI teammate idea? What is multiplayer AI?

Josh Clemm: 05:28 So think about a lot of the AI tools you’re using today. Maybe they’re different chatbots that you’re trying to bring in to use your work content. They’re kind of single player, I would say. Right now I interface with one: I might ask it something in chat and it spits back an answer. Great, okay, that’s great for me, but how is it better for my team, or how is it making me more effective at work? And I do think that’s what might be changing in the future. The most effective AI, in our opinion, isn’t just that personal assistant. It is that teammate. It’s somebody that knows you, who knows your work and knows your team. It really has to understand where you’re sitting in the org. It sort of has to understand what projects you’re working on, and a lot of that ends up being on the data layer.

06:18 When we bring in this various content from third parties, we’re doing a lot of work to normalize it to, let’s say, markdown. Then we do some amount of more advanced content understanding. Imagine these are PDFs: you’re going to have both text and images, figures, et cetera. We want to be able to extract that. You might see a bunch of images if you’re, again, more on the creative side. We need to understand what is in that image and how it maybe connects with everything else, all the other docs. Of course, you have things like Slack and Teams, more short-form messages coming in, and you really have to almost build a graph. You want to sort of understand how these things are all connected, how the projects are connected. You might have some meeting transcripts, you might have some documents, and it’s all connected to your teammates. It’s all connected to people.

07:13 So I think the first aspect of multiplayer AI is that it really needs to understand you and your team, first and foremost, and all that kind of connected data. Then I think there are some really interesting, potentially more future-looking product use cases where you can actually be doing something a little bit more collaboratively. We have a product called Stax, for example, and the idea here is you have content in all these disparate forms and sometimes these are just URLs. Sometimes these are just tabs in your browser and you want to kind of bring that together. Let’s say you’re working on a deck for the upcoming board meeting. Well, you’re going to bring in URLs. There might be slide content, there might be PDFs, and you might put that in one place, and then you want to go ahead and invite your team so that they can collaboratively see that, share updates, comment on things and ask chat for any sort of useful information from that. So that’s kind of my take on multiplayer AI. I think it’s both on that data layer, really understanding you and your team, and then on the application layer. I do think there are some interesting innovations that might come from companies like Dropbox, but others I think are starting to explore how you get multiple people to interact with these chat agents.
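The “graph” Josh describes, connecting docs, short-form messages, transcripts, and teammates, can be pictured as a simple adjacency structure. This is an illustrative toy only; the node naming scheme and linking rule here are assumptions, not Dash’s actual data model:

```python
from collections import defaultdict

class WorkGraph:
    """Toy content graph: nodes are docs, messages, transcripts, and people;
    undirected edges connect related content and the people who touch it."""
    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, a, b):
        """Record an undirected edge between two nodes."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors_of_person(self, person):
        """Everything directly connected to a teammate, sorted for stability."""
        return sorted(self.edges[person])

g = WorkGraph()
g.link("doc:board-deck.pdf", "person:alice")
g.link("msg:slack-1042", "person:alice")
g.link("doc:board-deck.pdf", "transcript:board-prep-call")

print(g.neighbors_of_person("person:alice"))
# → ['doc:board-deck.pdf', 'msg:slack-1042']
```

Once content and people live in one graph, “understand you and your team” becomes a traversal: start from a person, walk outward through their docs, threads, and transcripts.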

Jon Krohn: 08:33 I like this idea and it isn’t something that we’ve talked about in much detail on the show. In fact, it seems like such an important aspect that it blows my mind that we haven’t. When you talked about multiplayer AI, I kind of assumed that you were probably thinking about multiple agents working on a team together, but this isn’t that at all. I mean, so your multiplayer AI solution, it could be a single AI in a chat or it could be an agent or it could be a whole team of agents, it doesn’t really matter. The whole point with your multiplayer AI is that the AI system understands more than one person, it’s interacting with more than one person. It can help your entire team to be more productive or be more creative or something like that.

Josh Clemm: 09:19 That’s right, that’s right.

Jon Krohn: 09:20 Cool. I like that. Hopefully we get to come back to that a bunch more in this episode because it seems like it’s a paradigm shift for my thinking and yeah, maybe we have a whole bunch of listeners out there who have already made that leap, but that is definitely a big leap in my mind, and yeah, I feel like it’s going to kind of change and color the way that I see a lot of features, a lot of technologies that I think about in the AI space.

Josh Clemm: 09:49 That’s right. Yeah, we’ve got to get people out of their silos. From one dataset just for me, to multiple datasets potentially shared across your team.

Jon Krohn: 09:58 Cool. Alright, so I don’t know if this is going to relate to multiplayer AI at all, but my next question for you is about a Dropbox video in which you share your favorite Dropbox feature, which is browser extensions. I don’t really think of myself as having used Dropbox browser extensions, so tell us about that feature and why you think browser-embedded AI will reshape daily knowledge work over the next decade.

Josh Clemm: 10:26 So this is a part of the Dropbox Dash product. We really want to make sure we work where people are. Yes, we have a website. Yes, we have a desktop app. Yes, we’re in places like Slack with Slack bots, we have a mobile app, but a lot of people are working in the browser, especially managers like me. I mean, the web browser is our IDE. All of our apps are there. That’s where a lot of communication is happening and that’s where a lot of work is happening, and you’re going off and you’re exploring the web, you’re doing a lot of research. And I love that I’ve got this sort of side panel where Dash is there. It’s able to take the context of the page that I’m currently looking at and it’s able to combine it with a lot of the existing data sources.

11:16 I’ve already connected with it and it ends up being just very, very powerful overall and it can enhance your workday quite a bit. I do think over time the browser could be a really interesting area of opportunity. I think you’re starting to see this in industry around potentially browser automations. You’ve got a couple other companies out there building these AI browsers, Perplexity, OpenAI. There are some significant security challenges right there. Anytime you provide tool use and allow agents to do anything on your behalf, you have to be very, very careful that they’re not subject to prompt injection and attempts to exfiltrate some of your data. And so I think there’s still a little bit of concern, or at least thought, around how those companies might be doing it or how you want to bring AI to the browser. At the end of the day, we still feel the approach we’re trying to take, where we ingest the content ahead of time and build really interesting sort of graph representations of your data, allows you to get the best of both worlds. You’re able to get those sort of combined data sets without necessarily needing to do that crawling. But I do think that’s an interesting area that a lot of companies are going to be looking at going forward.

Jon Krohn: 12:35 I see, I see. So it sounds like the browser extension is critical to dash being effective across more than just your Dropbox environment.

Josh Clemm: 12:42 Absolutely. Absolutely. Because again, you’re working day in, day out in your browser. You’ve got all those tabs and it’s important, because that’s some of your work. That’s actually a lot of your working set, in a way. And so one thing that we’ll do with Dash, because we have that browser extension, we’ll go ahead and bring in some of your browser history so that when you kind of come back to Dash and open up Dash, it’s got your working set right there saying, hey, here’s sort of where you left off. Here’s the work that you’re doing, here are some of the tabs you’ve been on. Very, very helpful to kind of have that jumping-off point.

Jon Krohn: 13:19 Nice. Really cool. Picking up on another piece that we found of something that you said online: in a LinkedIn post, you commented on a Harvard Business Review report on how AI-generated work slop is destroying productivity. And this is something that we have talked about on the show a fair bit, but I’d love to hear your take on it. What are your 2 cents on how this problem hit home for you and what are your tips to address work slop?

Josh Clemm: 13:48 Yeah, first of all, I love that term. I don’t know who coined it, but it absolutely is spot on. Work slop is the very plausible and somewhat impressive-looking content that you might see at work, that you maybe create yourself or somebody sends you, and then you start to look more at the substance and, wait a second, this feels pretty generic at best, or at worst, frankly, there are just hallucinations in there. And there’s a bit of a paradox. I think the more you work in AI, like I am and a lot of your listeners are, the more you can start to spot the patterns. Everybody jokes about the em dash, but there are other sort of markers that, hey, this is AI generated. So I think the big question is, well, why is this happening? Why is so much work slop getting created out there? And you see different reports.

14:50 A lot of CEOs, almost three fourths of CEOs, feel there’s a lot of competitive pressure just to adopt AI. You hear that, oh, we need AI. I don’t know what it is, but we need it here. And so you get these really early deployments or quick deployments, employees start using it and it isn’t really doing what you’re expecting. And we’re hearing that from some of our customers. Hey, we need AI. And the first question I ask is, well, what exactly are you hoping to accomplish? What are those use cases? What are the goals that you’re trying to hit? And if you don’t really get crisp on that, you’re unfortunately going to get slop. The way I think about it, I don’t know about you, but back in, let’s say, college, you’re writing an essay. Maybe you’re working all night, maybe at the last minute you’re kind of putting together all this stuff for your essay, and that’s definitely me.

15:42 Yeah, exactly. The next day you don’t just turn in that first draft. You don’t sort of do a first draft and go, alright, I’m done, did my work, send it off. It’s going to be an A. No, absolutely not. You read it over, you update, you make things stronger. Maybe you go to a thesaurus and you start switching out some words, you add in a lot of extra research. That’s really how we should be treating a lot of these AI tools. They can be phenomenal partners, they can be phenomenal at generating a lot of that first draft, if you will. But it still requires that human touch. It still requires really ensuring that it reads correctly, it’s high quality, the signal-to-noise ratio is very high and it has very much verifiable facts. Very, very kind of important to get right. So how do you kind of fix that, other than more of this guidance?

16:37 High-level guidance kind of goes back to the stuff we were talking about before. Your work context just matters a ton here. A lot of companies will maybe superficially go off and add third-party connectors. A very popular approach is to use MCP tools in agents, and it is a phenomenal protocol. You can get up and running and build some really impressive agents right off the bat, but they’re very slow. They can’t really get access to all the types of content you may want and they use a ton of tokens in a lot of cases. So while that’s a good solution, you still really need to kind of think about where you want to get your work context overall. On our side with Dash, like I mentioned before, we do bring everything in, we ingest it, we do understanding, and we then index it.

Right now we use both. We’re building both a lexical and a vector index. On the lexical side, we use BM25. It is still the workhorse. This thing has been around for a few decades and it is very, very good at more keyword-type searches. And this is important: if your customers need part numbers, you need to do more of a keyword search. If your customers are creatives and they’re looking for vintage cars, okay, great, semantic works there. A lot of customers will want both. And so you really want to have that kind of hybrid retrieval. And so that’s something we’re doing here to just ensure the context that we’re providing these LLMs, these agents, is of the highest quality. Other things to look at are the evals I mentioned before. When we talk to customers, you want to understand what their goals are, what they’re trying to accomplish, what metric they may want to move.
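Hybrid retrieval as described here, BM25 for keyword matches blended with vector similarity for semantic matches, can be sketched in plain Python. Everything below is a toy for illustration: the whitespace tokenizer, the hand-made “embeddings”, and the 50/50 blending weight are assumptions, not Dash’s implementation:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Simplified Okapi BM25 over whitespace-tokenized documents."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()                      # document frequency per term
    for d in tokenized:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_rank(query, docs, query_vec, doc_vecs, alpha=0.5):
    """Blend normalized BM25 (keyword) and cosine (semantic) scores."""
    lex = bm25_scores(query, docs)
    top = max(lex) or 1.0               # avoid divide-by-zero on no keyword hits
    blended = [alpha * (l / top) + (1 - alpha) * cosine(query_vec, v)
               for l, v in zip(lex, doc_vecs)]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)

docs = ["part number X-200 spec sheet",
        "vintage cars photo collection",
        "quarterly board deck draft"]
# Toy "embeddings" -- in practice these come from an embedding model.
vecs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
order = hybrid_rank("part number X-200", docs, [1, 0, 0], vecs)
print(docs[order[0]])  # the keyword-heavy doc wins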

18:30 A lot of that you can actually bundle and create a bunch of successful test sets, like, okay, this is what good looks like. This is effectively my benchmark, my internal benchmark. And then once you have that, you can compare it with some of these AI deployments and it’ll be much more clear: is this working or am I just going to get more work slop? And the last kind of tip I’d say here: there’s still a lot of pressure out there to adopt AI. A lot of CEOs, CTOs, CIOs: got to use AI, got to use AI. And so the pressure’s there, but about 55% of employees don’t even know how to use AI. So I do think you should look at training. It’s essential to do share-outs, let people do demos of what they’re doing, and your most likely candidates to do that training are probably your highest performers. They’re likely already using and adopting these AI products, and likely doing it in a way where it’s much higher quality. Let them help train up the rest of the force.
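Josh’s “this is what good looks like” test sets are, mechanically, just golden question/answer pairs scored against a deployment’s outputs. A minimal sketch; the substring-match metric and the toy system here are assumptions for illustration, and real eval harnesses usually use richer graders:

```python
def run_eval(system, golden):
    """Score a system against golden (question, expected_answer) pairs.

    Metric: fraction of questions where the expected answer appears
    (case-insensitively) in the system's response.
    """
    hits = sum(1 for question, expected in golden
               if expected.lower() in system(question).lower())
    return hits / len(golden)

# Golden set: "this is what good looks like" for a couple of queries.
golden = [
    ("Where is the Q3 deck?", "board folder"),
    ("Who owns the churn analysis?", "alice"),
]

# Stand-in deployment -- in practice this would call your AI system.
def toy_system(question):
    return "The Q3 deck is in the board folder." if "deck" in question else "Unknown."

score = run_eval(toy_system, golden)
print(score)  # → 0.5
```

Running the same golden set before and after each deployment change makes “is this working or am I just going to get more slop?” a measurable question rather than a vibe.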

Jon Krohn: 19:40 Yeah, this training and sharing is critical, and I think collaborating together and understanding how AI can be used to actually improve productivity is critical, because it’s very easy to be inserting AI into lots of places in your organization, but it could end up being the case that the places that you’re putting AI in are parts of work that are actually not very useful in the organization at all. There are all kinds of people’s workdays in organizations that aren’t moving the needle on anything. And if you’re getting AI deployed in those workflows, you’re not going to see an ROI from that AI solution, because the work that’s being automated isn’t moving a needle anywhere anyway. And yeah, I guess that’s kind of a tangential point to what you were just saying.

Josh Clemm: 20:29 It’s very fair. If you were to look at what is the most successful AI tool to date, it’s really the coding agents, and they’re amazingly powerful. I absolutely love coding. It’s so much fun, because I got into coding not because I like to code, it’s because I like to build. I love creating. I love that side of it. And so if these coding assistants can help me code faster, absolutely, I can just continue to build. But then you sort of step back and you say, well, what does the software development lifecycle look like at these companies? You have a lot of things upstream of when you’re actually ready to code: all the work you’re trying to do, talking to customers, gathering requirements, trying to figure out exactly what you want, what the design might look like. That part is still in some ways very similar to how we’ve been operating before. Finally, you might hand it off on the coding side, and you can absolutely accelerate that phase of the journey. So I think it’s also really looking at that entire workflow too. You might have optimized one part, but you’re going to create maybe bottlenecks in other parts.

Jon Krohn: 21:43 For sure. Especially if you’re generating a lot of slop, you’re going to be

Josh Clemm: 21:48 Creating some bottlenecks. There’s cleanup work at that point. Yeah,

Jon Krohn: 21:50 Exactly. Exactly. Just confusing people, having people wasting time reading. Yeah, slop is a super irritating thing to have to come across, especially when you only realize it partway through. You’re like, ah, they’ve got me. They’d better admit to it. Alright, so you talked a bit there in your response about context engineering, and we have a number of questions related to that. You’ve thought about AI systems from the inside out, shaped by years working on context-rich, high-reliability architectures: from sensor data fusion in your early career to advanced machine learning ranking and conversational AI at Dropbox, as well as previously at Uber. In a recent LinkedIn post regarding how your team uses context engineering, you wrote that context is the real constraint when building agentic AI systems and that bigger, as in more context, isn’t better. Too much data can lead to what you call context rot. So tell us, maybe we should get a quick intro to context engineering for listeners who aren’t aware of it anyway, but then move on to why this is such a critical part of having agentic systems work correctly.

Josh Clemm: 23:00 Sort of stepping back and really considering how AI works and how these large language models work: you’re of course prompting it. That was really the beginning. We called everything prompt engineering. You’re typing in both your query, but you might be providing a lot of extra information that these LLMs might need, and that’s very powerful overall. What we’re now trying to put in the context window for these LLMs is just continuing to grow. If you want to be able to support more complex queries where you’re doing tool calling, maybe you’re retrieving from remote servers, maybe you’re making a write action later, that all kind of starts to fill in the context window, and at some point you get really bad quality degradation the longer it gets. You might see these new frontier labs release models and it’s like, okay, we have a 1 million token context window.

24:01 Gemini was really the first one, but all the new ones, okay, now we’re at 2 million, we’ve got 2 million tokens we can handle in our context window. The problem is, if you were to use all of that, you would see significant degradation in quality. They end up filling up very quickly, more like 100K or 200K. And there are some really interesting benchmarks out there. One I’ve always referenced is this benchmark called NoLiMa, and it is very much a more complicated sort of needle-in-the-haystack type situation where you give it a very, very big doc and then you’re trying to find some information and extract it. And the longer the doc, the worse you see that curve kind of drop down. But a lot of us aren’t necessarily loading the Harry Potter series into an LLM and then asking questions.

24:54 More likely we are asking an LLM a question and then we have a follow-up question, and then another follow-up question. And there are more papers on that sort of multi-turn conversation where these LLMs just get lost and the accuracy degrades quite a bit. On our side with Dash, we were seeing the same thing. Anytime we were adding all these tools and it would go off and retrieve some information, the data would come back and our accuracy just dropped off considerably. We ended up blogging about this. We talked about our approach and how we’re solving it, and it’s very much: instead of using multiple tools, we’re going with just one tool. We’re going with sort of a super tool, in a way, to access that index I talked about earlier that has all the content. You’re seeing other companies do the same thing. Cloudflare, Anthropic, they’ve been blogging recently about the same topic, and their solution is often to have the LLM write code to invoke those tools or pick a tool among a selection of tools. There are a few different solutions here, but it’s all about trying to tighten up that context window. It’s like the sweet spot: if you give it too much, you’re going to have bad results. And if you don’t give it enough, the LLM is going to have to just fall back to whatever data it was trained on, which, frankly, in the workplace means it’s not going to know anything about your work. So you have to kind of pack it with the exact data that you need and nothing else.
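The “pack it with the exact data you need and nothing else” idea reduces to a budgeted selection problem: keep the highest-relevance retrieved chunks that fit inside the window you have reserved for context. A minimal sketch, where the relevance scores, the roughly-4-characters-per-token heuristic, and the budget are illustrative assumptions:

```python
def pack_context(chunks, budget_tokens, est_tokens=lambda t: len(t) // 4 + 1):
    """Greedily keep the highest-relevance chunks that fit a token budget.

    chunks: list of (relevance_score, text) pairs from retrieval.
    est_tokens: rough ~4-chars-per-token heuristic; real systems tokenize.
    """
    packed, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = est_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip whole chunks: truncated fragments confuse the model
        packed.append(text)
        used += cost
    return "\n\n".join(packed), used

chunks = [
    (0.91, "Q3 board deck: revenue grew 12% quarter over quarter."),
    (0.40, "Unrelated Slack thread about the office coffee machine."),
    (0.85, "Meeting transcript: board asked for a churn breakdown."),
]
context, used = pack_context(chunks, budget_tokens=30)
```

With a tight budget, the low-relevance chunk is simply left out, which is the point: everything in the window earns its place, and nothing is there to rot.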

Jon Krohn: 26:38 How can our listeners become great context engineers? You’ve called context engineering, online, the high-status job in AI right now, so we might have lots of listeners thinking, ooh, I might like to up my pay or my value to my organization by being a great context engineer. What is the secret to being a successful context engineer?

Josh Clemm: 26:59 Well, I kind of think back to the earlier software days, some of even my own experience. Anytime I was writing software, there were always limits. You might have memory limits, CPU limits. If I’m working in embedded computing, you only have so many bytes to work with. Some of the past projects I was working on involved writing software for UAVs, like low-end computers, or working in very lossy networking environments. And so you’re always thinking very deeply about what data am I shipping with my product to run on these low-end computers, or how do I reduce the amount of data we might have to send over the network? So folks who kind of understand that, almost like constraints breed creativity, you have to be very creative when you have these constraints, they’re going to do very well. Just think about a payload you’re sending with the Mars rover.

27:59 You’ve got to send that thing off. It’s going across space for five, six months. It shows up. Does it have everything it needs? You sure hope so. So people who kind of understand that piece and are very good at planning are going to be quite good overall in this area. A few other disciplines make sense. A lot of people have worked with search, and search is, in effect, what Google’s been trying to do for decades now: taking the world’s information and serving it up in these little snippets, these 10 blue links. So if you’ve got that background, or have an interest in getting that background, you’re obviously going to be able to pull out the most relevant information from a very wide corpus. Search used to be the hero and it’s still the hero, it’s just more of an unsung hero in the world of AI. Part of that is retrieval, part of that is recommender systems, almost like more classic machine learning. And the last area that matters for really good context engineering is understanding the outputs, evals: whatever goes in, I need to then be able to verify what comes out. Somebody who’s very good at tests, defining things upfront, connecting them with the inputs, is going to thrive in this field.

Jon Krohn: 29:26 Excellent. Thank you for those tips, and I love how you described search as the unsung hero of success in AI. And so on that note, let's talk about RAG, retrieval-augmented generation. You've argued online that RAG isn't dead, which implies that a lot of people think it is. So tell us about how it's evolved into richer forms of context engineering.

Josh Clemm: 29:51 Yeah, yeah. So the reason a lot of people will say RAG is dead really comes down to how RAG originated. This is retrieval-augmented generation. There was a paper many years ago, and the technique it described was generating vector embeddings from your data, storing them in a vector database, and then of course retrieving them later, passing them along with the prompt, and you're off and running. And the reason vectors were chosen is that it ties really nicely with the underlying architecture of large language models. Everything is very token- and embedding-based. And a lot of people went all in and decided to create vector databases and store all their content in vector databases. And again, there's nothing wrong with that, because at the end of the day you still have to retrieve something to be able to add additional information to these LLMs. As for how it's evolved...
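The classic RAG loop Josh describes, embed chunks, store vectors, retrieve the nearest ones, and pass them along with the prompt, can be sketched in a few lines. This is an illustrative toy, not Dropbox's implementation: the `embed` function is a bag-of-words stand-in for a real embedding model, and the documents are made up.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Dash indexes content across your connected work apps",
    "Vector embeddings capture meaning-based similarity",
    "BM25 is a classic keyword ranking function",
]
index = [(doc, embed(doc)) for doc in docs]  # stand-in for a vector database

def retrieve(query, k=1):
    # One-shot retrieval: rank every stored chunk against the query vector.
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: -cosine(qv, pair[1]))
    return [doc for doc, _ in ranked[:k]]

# Retrieve first, then prepend the chunks to the prompt sent to the LLM.
context = retrieve("meaning based similarity with embeddings")
prompt = f"Context:\n{context[0]}\n\nQuestion: how does semantic search work?"
```

The single `retrieve` call is exactly the "one chance" fetch Josh contrasts with agentic retrieval later in the conversation.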

30:58 I would say a couple things. One, vector retrieval is great at meaning-based search. I mentioned before, creatives might search for "old car" and vintage cars will show up. But a lot of people are recognizing that keyword search approaches using techniques like BM25 can be very, very effective. And so hybrid retrieval has become much more important overall to getting your content for these LLMs. But the other pattern that's emerged is more around agentic retrieval. And the idea here is, think about a world where you have the data layer, all this data, and then you have the application that will reach out and fetch that data. In the old world, you often just had one chance: I'm going to make one retrieval call, and whatever I get back, I'm going to use. And that was really the initial architecture for RAG.
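Hybrid retrieval of the kind Josh mentions is often implemented by running a keyword ranker and a vector ranker separately and then merging the two ranked lists, for example with reciprocal rank fusion. A minimal sketch, with toy stand-ins for BM25 and the embedding model (the scorers and documents here are illustrative, not any product's actual ranking code):

```python
def keyword_score(query, doc):
    # Stand-in for a BM25-style keyword ranker: count shared terms.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def vector_score(query, doc):
    # Stand-in for embedding similarity: character-bigram Jaccard overlap.
    grams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    q, d = grams(query.lower()), grams(doc.lower())
    return len(q & d) / max(len(q | d), 1)

def rrf(rankings, k=60):
    # Reciprocal rank fusion: documents ranked highly by either list win.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

docs = ["vintage cars for sale", "old car restoration guide", "kitchen appliances"]
query = "old car"
by_keyword = sorted(docs, key=lambda d: -keyword_score(query, d))
by_vector = sorted(docs, key=lambda d: -vector_score(query, d))
fused = rrf([by_keyword, by_vector])
```

The fusion step is why hybrid systems handle both exact-match queries (function names, IDs) and meaning-based queries ("old car" finding vintage cars) better than either ranker alone.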

32:00 Again, hybrid retrieval is better. But if you want to handle very complex queries, where based on the data you get back you may take a very different path, that's where you want these agents to not just fetch once but have multiple tries to go fetch content. That might be from a database, that might be from a third-party API or MCP call, it might actually be taking an action. And then based on that, it may decide to do a retrieval. So you have a bit of the data layer and then almost an agentic layer. And I think it's okay to think about both. You do want those multiple tries, but you also want to organize and index your data in a way where it can be explored.
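That multi-try agentic pattern can be sketched as a small loop: a planning step (an LLM in a real system, a hard-coded rule here) inspects what has been gathered so far and decides whether to fetch again and from where. The sources, queries, and data below are hypothetical:

```python
# Hypothetical data layer: two sources the agent can reach into.
SOURCES = {
    "db":  {"owner": "design-team"},
    "api": {"owner_email": "design-team@example.com"},
}

def fetch(source, query):
    # Stand-in for a database query, third-party API, or MCP call.
    return SOURCES[source]

def decide_next(query, context):
    # Rule-based stand-in for the LLM's planning step: based on what came
    # back so far, pick the next source, or stop if we have enough.
    if "owner" not in context:
        return "db"
    if "email" in query and "owner_email" not in context:
        return "api"
    return None

def agentic_retrieve(query, budget=3):
    # The agentic layer: multiple tries instead of one retrieval call.
    context = {}
    for _ in range(budget):
        source = decide_next(query, context)
        if source is None:
            break
        context.update(fetch(source, query))
    return context

ctx = agentic_retrieve("what is the owner's email for this doc?")
```

Note the `budget` cap: bounding the loop keeps an exploring agent from fetching forever, which also keeps the context window from rotting with unneeded data.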

32:51 And you're seeing this quite often. Again, going back to coding agents, Claude Code, Cursor, these are extremely popular tools. They need access to your code base. And the code base isn't as simple as just going and looking up some chunk of code; you need to almost explore the code base. You need to understand folder structures, you need to understand parts of the code itself. You need to find exact function names, so you need more of a keyword search. And so you have this hybrid architecture emerging where, yes, there's some amount of index on your code base, but these coding agents have an opportunity to explore that index and pull back all of these different bits of context to get you the best outcome.

Jon Krohn: 33:41 Nice. I love that explanation. Thank you for the deep dive into the evolution of RAG and search. And now, tying into what you were talking about right at the end there, you brought back your love of, if I dare call it, vibe coding, of building easily with tools like Claude Code. Let's tie that into the search conversation we've been having, because you wrote an AI search app with a hundred lines of code and open sourced it. What did you learn from that experience, and why did you do it?

Josh Clemm: 34:15 So I'll go back a little bit in history and give you a little bit more on a side app I work on. I'm a big fan of sports. I like football and I like fantasy football, and I like fantasy football for the reasons you'd expect. I love the stats part of it, and I love to write my own programs or apps to help me do a better job in my fantasy football leagues. And one of the features I always wanted to build was almost a season preview for a particular player. You see these in a lot of publications: here's what to expect this year with this particular player. And I was thinking, how do I do that? I could do it Mad Libs style, but I started to explore a lot of language models. In my time at Uber Eats, we were starting to do some work with conversational AI, so I was familiar with the technology, and this was back maybe 2021, 2022.

35:15 I was looking at a lot of the OpenAI models, GPT-3. I was like, okay, there's something here that is very, very impressive. I can create the content. Of course, I had to go fetch fresh stats for the upcoming season, and that's where you want retrieval-augmented generation. I was trying to put that together, and it wasn't super easy. I ended up getting a version of it working, but I learned a lot about prompting. I then stumbled upon a product called Perplexity, which I was a big fan of, and they did a phenomenal job of pioneering this really intuitive product interface where they're collecting all the search content and able to present it in a really easy way, effectively answers. And so when I saw that, I was like, you know what? I want to go back to some of the work I was doing.

36:09 I wanted to bring in some of the newer models, think about different techniques I had observed from that product, and I ended up building my own search app, very similar to that, connecting those two things: the fantasy football season previews and trying to get my own search app going. And because I learned a lot, I figured let me put it out there and open source some of it, so that others who come along the way can learn different ways of prompting, different ways of adding grounded facts through citations, really trying to push the state of the art with retrieval-augmented generation, and then a little bit of a preview of what you could do further. So I was really just starting to understand how these models worked and how I could get the most out of them, and frankly, it's a lot of fun. A lot of those techniques are still going strong, which is pretty cool to see.

Jon Krohn: 37:07 Nice. And that must make it helpful for things like Dash at Dropbox, having that real experience of making something work. It must be easier to then talk to engineers on your team and help them brainstorm on how to get through some of the things they're struggling with.

Josh Clemm: 37:20 Absolutely. Yeah, I mean, we want to be that answers engine for the workplace, and it's essential. You've got to get these things right. You've got to be able to reduce the hallucinations. You have to be able to present real, useful facts and better output, where you can have citations so you can almost prove your work. That is going to be important for any product, especially these AI products.

Jon Krohn: 37:50 Nice, nice, nice. Makes a lot of sense. You've actually had more than just your fantasy football side project get onto our radar here at the SuperDataScience podcast. It looks like you might have the number one earthquake app on Android, called Earthquake Alert, and something called Yaddle AI, which is an LLM-powered search engine. So yeah, tell us a bit about those projects as well, how doing these side projects helps keep you sharp and grounded, and why you think they're an effective part of your leadership style.

Josh Clemm: 38:25 Yeah. Well, so Yaddle is that hundred lines sort of… that's the fancier version. I continue to work on it on the side, and again, it's a very fun playground for me to do some of the real-time search and try these different models that come out. I have a model switcher; I use some amount of query classification to figure out, in certain cases, which model to route to. It's a fun project for sure. Same thing with the earthquake app. This goes way back to when I was getting my master's and taking an entrepreneurship class, and we started to learn about these emerging mobile platforms, the iPhone and Android. And so I was like, hey, let me just play around with this to see what I can build. Turns out, when you're an early independent developer and you put an app out there, you end up getting a nice flywheel effect.
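The model switcher Josh mentions, query classification deciding which model to route to, can be sketched like this. The categories, trigger keywords, and model names below are hypothetical, not Yaddle's actual routing logic:

```python
# Hypothetical routing table: query category -> model to call.
ROUTES = {
    "realtime": "search-grounded-model",  # needs fresh web results
    "code":     "code-tuned-model",
    "general":  "fast-general-model",
}

def classify(query):
    # Toy keyword classifier; a real switcher might use a small LLM or
    # a trained classifier instead of hand-written rules.
    q = query.lower()
    if any(w in q for w in ("latest", "today", "score", "news")):
        return "realtime"
    if any(w in q for w in ("python", "function", "bug", "code")):
        return "code"
    return "general"

def route(query):
    # Pick the model for this query; the caller then sends the prompt there.
    return ROUTES[classify(query)]
```

For example, `route("latest 49ers score")` lands on the search-grounded model, while a coding question goes to the code-tuned one; the win is paying for the expensive, freshness-aware path only when the query needs it.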

39:30 And so yeah, over the years, anyone that's been looking for earthquake information would spot my app. And I like earthquakes. I grew up in the Bay Area, I've experienced a lot of earthquakes in my life, and I've just had a fascination with them. It was really a learning opportunity overall. Connecting to your second question, I do feel very strongly that engineering leaders really need to find the sweet spot with how technical they are. You obviously can't be too hands-on, because then you have no time to think about strategy, people development, org design, things like that. But you also can't be non-technical and come across like you're up in the ivory tower, just saying, hey, we've got to go do this, without the backing to say here's why, or here's why I agree. So I do think leaders finding that technical balance is really important.

40:35 That's an important part of my own philosophy and who I look for in good leaders at work. And a lot of times you may not have a lot of time during the day, so you can do things on the side. I also feel very strongly that AI is changing things considerably. These tools are very, very new, and it's very hard to just say, hey, go use more AI. We talked about that earlier with workslop; that's sort of why we're in the workslop situation. You almost need to use these tools yourself. You need to embrace them, you need to understand how they work and, frankly, their limitations. And then you're just so much more willing and able to provide crisp and concrete advice: yes, this is an AI tool we want to deploy, here's why I thought about it, here's some gotchas, here's how we're going to address them. So I do think AI changes things, which is why staying much more technical, much more on top of the trends, is essential in modern leadership.

Jon Krohn: 41:43 Speaking of your leadership, something that I haven't mentioned yet on air is your tremendous background. You've managed and scaled large teams: based on our research, over 150 engineers at LinkedIn, over 400 at Uber. And you did that driven by a belief in a people-first approach, which I hear coming through in your answers already. So tell us, what's the most challenging part of maintaining this people-first approach once a team surpasses, I dunno if you know Dunbar's number? Yeah, yeah. So Robin Dunbar, he's an Oxford University researcher, and he studied lots of different communities all over the world, and basically there's this number, it varies somewhat, but it's around 150. It's kind of the maximum number of direct relationships that you can have in a community. And that LinkedIn number is right there at Dunbar's number, and then the Uber number of 400 people, that's well over 150. So how do you maintain that people-first approach once you can't know everybody's name?

Josh Clemm: 42:48 And at LinkedIn that was more of a total number, but absolutely, it's very important to think about. When you're growing teams, there's growing teams and then there's scaling teams, and the things you were doing, or the spirit behind the different processes, cultural pieces, norms you've defined, you want to figure out how to replicate in a way where you're not just taking on more work yourself. That just won't work. Maybe in the old days you were doing that sit-down team meeting with open discussions; as the team grows, maybe those turn into small-group sessions, and you can do multiple of those. As it grows further, you're trying to meet with smaller groups or your skips, and at the same time being very deliberate with the leaders that you have, to try to replicate that all the way down throughout the organization.

43:44 Maybe it's a version of a regular all-hands. That's something we actually love doing. I started doing this back at Uber during the pandemic, when everybody was working remotely and you're kind of losing that sense of self, that sense of team identity. And I got a really great suggestion from a manager on my team, where he's like, hey, why don't we start a very short and sweet weekly check-in, where they can hear from you on priorities, we can share wins together, we can talk about any shifts we may want to make. I said, yeah, let's give it a shot. We ended up doing that for almost three years, and I still continue to do it today at Dropbox. It's just a fun ritual to bring everybody together. That's only one way of doing it; you sort of have to think through all these different aspects of how to scale culture.

44:44 The other thing I like to emphasize is that I like to obsess about the inputs of teams, or team productivity. I think a lot of times we're looking at the outputs: how many PRs are they putting up, how many features have we shipped? And a lot of times the solutions are in those inputs. Do we have clear goals? Is our strategy sound? Do I have the org structured correctly to match that? Do I have all the right leaders in place? How do I think about team structure? Do I have a good seniority mix so that we have mentors along the way? How's our operational rigor? If you have outages all the time, or a lot of bugs all the time, teams are constantly context switching, not able to really move forward. You think about tooling, you think about a lot of those different aspects. I think that's one way to make sure you can still drive in a very people-first way but also make sure it's scaling overall.

Jon Krohn: 45:51 Yeah. Related to what you were just saying, an interesting piece that we pulled out is that you often describe your role not as building a product but as creating conditions for great engineering to happen. There's a tweet where you shared a meme about thinking like a farmer: the farmer doesn't yell at the plants because they aren't growing. Instead, the farmer focuses on preparing the soil, irrigating, fertilizing. And I love that. So our researcher, Serg Masís, he's a data scientist at Syngenta, the agricultural company, and he went into a lot of detail on his question on this farming one. So using some inspiration from Serg here: what's one soil problem that we have in tech companies today? What are we getting wrong about the soil for engineering culture, and how can we improve it? How can we farm better products?

Josh Clemm: 46:45 Yeah, so I don't know when I came across that picture, but I'm sure many of you have maybe seen it. It's a screenshot from some sort of conference: think like a farmer. And it really touches on the aspects I was just mentioning. You're obsessing about the inputs, you're really making sure, okay, am I planting correctly? Am I thinking about seasons? Am I building resilience in my organization? And I do think that's probably one of the bigger challenges here. In the world of AI, we're moving very quickly. There are all these new innovations, new frontier models coming out, constantly leapfrogging one another. And there are always these questions around, do we bring this in? Do we adopt it or not? What are our competitors doing? And I think that kind of whiplash does affect a lot of teams very negatively. They end up losing a lot of morale.

47:38 They may be working long hours but not moving in the right direction or making meaningful progress. And so I think building that regular culture of resilience is really important. Being very upfront with your team, saying, look, things are going to be tricky, or yeah, there are going to be these different competitors coming in, and emphasizing, here's what we think about it, here are our strengths, here's how we're okay. Just that kind of constant communication to recognize the current environment we're in. It is messy, and it just feels overwhelming at times, but create the right culture, and I think the teams that understand and embrace that are going to continue to thrive. They're going to get through the noise, they're going to be much more focused on their goal, and teams like that are going to win.

Jon Krohn: 48:31 Great. I’ve got one last kind of technical leadership question for you before we start wrapping up the episode. This is another tweet of yours where you called out an anti-pattern, where big organizations become so risk averse that they outsource every decision to an AB test. How should leaders distinguish between decisions that need data and decisions that just need your gut instinct to get right?

Josh Clemm: 48:57 It's quite problematic, especially with bigger companies, that amount of risk aversion. You may be operating from a position of strength, you might be a market leader, and you almost don't want to lose that, and you kind of open yourselves up for other startups or other competitors to disrupt you. It just creeps in. It's a bit of a problem overall, and it's a problem where the more scale you have, the more data you have. And so it seems very clear: oh, I just have to look at the data, it'll tell me what to do. And I see some analogies with AI tools. I do think you're seeing a world where people who are starting to embrace AI for almost all their decision-making are losing a little bit of their own sense of here's what I think might work, some conviction.

49:51 And so I think those analogies are very, very important. It's essential to try to stay grounded, to maybe talk to your customers more, get more of the qualitative data to offset some of the quantitative data you might be seeing, and bring that together to make much more well-informed decisions. I remember a long time ago, I think it was at Uber Eats, we had a smaller competitor that was moving very quickly and shipping a lot of features, and we were always like, whoa, what are they doing? What are they seeing? And later we found out they didn't actually have an A/B testing framework even set up. This was very early days for them, and you're almost envious of that. Like, wow, that requires you to be very convicted. If you're going to ship something, you're going to ship it, you're going to move quickly, and you're going to learn from it. And if it doesn't work, sure, you can pull it back. But a lot of times when you bring in those extra data sources, it'll draw out decisions, it'll create some amount of lethargy in your organization. You've got to fight that as much as possible. You've got to stay sharp, stay convicted, stay human, and be very intentional with your decisions.

Jon Krohn: 51:11 Excellent soundbite to round off your technical responses there, Josh. Thank you so much. So yeah, before I let my guests go, I let you know just before we started recording that I always ask for a book recommendation. It sounded like you might be reading something interesting right now.

Josh Clemm: 51:27 I am reading Masters of Doom. It is the story of the two Johns, John Carmack and John Romero, the creators of the video game Doom.

Jon Krohn: 51:42 Oh, right. Yeah. That was such an iconic thing for me as a kid. I grew up in downtown Toronto, so I would take the subway to school, and when I was very small, sometimes my grandmother would come and take me. She always wanted to spoil me, and so she took me to a bookstore and I had her buy me the Doom strategy guide, even though we did not have a computer. I just kind of memorized the gameplay and the guns and the demons. Yeah, it was a really iconic thing for me in the early nineties.

Josh Clemm: 52:18 Yeah, it's been a fun read. It gives you that glimpse of the early days of video game development, back in the days when the only computers were at universities, these big mainframes, and then it started to transition to the personal computer: the Apple II, the Commodore 64. Some of that was part of my childhood, a little bit on the later side, but it is nice to get that nostalgia hit, and it's just fun to see the creators of something and their backstory. All this stuff comes from somewhere.

Jon Krohn: 53:00 Yeah, exactly. Alright, Masters of Doom. There's something for our early 3D gaming lovers. And before I let you go, Josh, I've cited tons of great LinkedIn posts and tweets you've made. Where are the best places for people to follow you after the episode?

Josh Clemm: 53:23 Absolutely, definitely check me out on LinkedIn, and I'm on X (Twitter). I like to bring in a lot of my observations on AI, but I do like to connect it to the world of large-scale software development that dominated my first decade, couple decades, of work. I'm spotting a lot of similarities between agentic architectures and microservices. There's a lot of overlap, and I think there are a lot of lessons we can learn from one another, the ML practitioners and the backend engineers. There's a lot of really nice synergy there. So that's going to be a lot of the content I put out. And of course, I'm bringing up a lot about our innovations on Dash, what we're doing at Dropbox, how we're thinking about machine learning, different optimizations, context engineering, the works.

Jon Krohn: 54:19 Love that. Yeah, our listeners I’m sure got a great taste of your brilliant ideas related to AI and engineering leadership, product development, and I’m sure that you will have a whole bunch more followers after this episode as well. Thanks so much, Josh for joining us and yeah, maybe we can catch up again in the future and hear how Dash and other AI initiatives at Dropbox are coming along.

Josh Clemm: 54:43 Yeah, thanks, Jon. It's been a pleasure. Love talking about Dropbox, love talking about Dash, and love talking about some of those stories from the past.

Jon Krohn: 54:56 Plenty to learn from the rich experience of Josh Clemm in today's episode. He covered how enterprise search is broken because workers now have 12 to 20 different search bars across their apps, and how Dropbox Dash aims to solve this by creating a universal search layer that ingests content from all your work tools. He talked about how multiplayer AI means AI systems that understand not just you but your entire team, their projects, and how everything connects, enabling collaborative intelligence rather than single-player chatbot interactions. He talked about how context rot occurs when you stuff too much information into an LLM's context window; the key is context engineering in only the needed data and nothing else. And we also talked about how over-reliance on A/B testing creates organizational lethargy. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Josh Clemm's social media profiles, as well as my own, at superdatascience.com/951.

55:54 Thanks to everyone on the SuperDataScience podcast team: our podcast manager, Sonja Brajovic; media editor, Mario Pombo; partnerships manager, Natalie Ziajski; researcher, Serg Masís; writer, Dr. Zara Karschay; and our founder, Kirill Eremenko. Thanks to all of them for producing another excellent episode for us today. For enabling that team to create this free SuperDataScience podcast for you, we are deeply grateful to our sponsors. You can support the show by checking out our sponsors' links in the show notes. And if you'd ever like to sponsor the show yourself, you can head to jonkrohn.com/podcast to learn how to do that. Otherwise, share this episode with folks who would benefit from it, or review it on whatever podcasting platform you use or on YouTube; I think that's helpful for driving more engagement and viewers for the show. Subscribe if you're not a subscriber, but most importantly, just keep on tuning in. I'm so grateful to have you listening, and I hope I can continue to make episodes you'll love for years and years to come. Till next time, keep on rocking it out there, and I'm looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.
