Jon Krohn: 00:00:00 What if you could go to one place to get absolutely everything you needed to train and deploy AI models, and it was all cost-effective? Welcome to the SuperDataScience podcast. I’m your host, Jon Krohn. I’ve got a really exciting episode for you today with the CEO of Lightning AI. They have just announced a huge merger that now means they have tens of thousands of physical GPUs. They have over $500 million in ARR. They’ve built an open source ecosystem, including PyTorch Lightning, which has been downloaded nearly 400 million times and is growing quickly. It’s just this really exciting ecosystem and set of products. And today with Will, we get to dig into the man behind all this innovation and exactly what they’ve done. I hope you’ll enjoy this one. This episode of SuperDataScience is made possible by Dell, Intel, Fabi and Cisco. Will Falcon, welcome to the SuperDataScience Podcast. It’s great to have you on. How are you doing today?
Will Falcon: 00:01:05 Amazing. Thank you so much for having me. Excited to chat with you and thanks for having me on here.
Jon Krohn: 00:01:10 You’re certainly experiencing some amazing times right now. It’s really cool to be a part of it in some way. We’ve actually known each other for a number of years. And full disclosure, I am not an unbiased interviewer of you. I have a fellowship at Lightning AI, which is a really cool role that I appreciate you created for me. I’ve been doing it for a year now and absolutely love working out of the Lightning AI office in New York. So many talented people. It really gives me a lot of energy, and hopefully I’m contributing a bit back as well.
Will Falcon: 00:01:44 We love having you, so it’s been fun seeing you do your thing, and I’m sure you’re seeing kind of what we’re doing as well.
Jon Krohn: 00:01:52 Yeah, and I think in both respects we’re just getting started.
Will Falcon: 00:01:55 Yeah, well, I’m glad we finally got to work together. It has been a few years. Yeah,
Jon Krohn: 00:01:59 Yeah, exactly. And so there’s big news that we’re here to talk about on air. Even though I’ve had this fellowship at Lightning AI for a year, we’ve been kind of holding on for a really special moment to have you on the podcast, and now the time has come, because Lightning AI has merged with a firm called Voltage Park, and you guys have created something really big. It’s mind-blowing to me. You’ve described it as the full-stack AI neocloud for enterprises and frontier labs. It serves 400,000 developers and companies, $500 million plus in ARR, and that’s been gained in under two years, which is astounding. $500 million in ARR in under two years. And now this company after the merger, Lightning AI plus Voltage Park, has 35,000 GPUs, making it the third-largest neocloud in the world. I guess a nice place to start on this is: congrats.
Will Falcon: 00:02:56 Well, obviously these are numbers from bringing both companies together, and I think it shows just how amazing each one of us was doing on our own. When we looked at it, we said, if you brought together, I don’t know, a Mac and macOS, you’d build the best laptop in the world. It was pretty obvious when that became clear.
Jon Krohn: 00:03:20 Yeah, software plus hardware.
Will Falcon: 00:03:22 Exactly.
Jon Krohn: 00:03:22 Yeah, it’s a really cool pairing. So Lightning AI was first a customer of Voltage Park, and then, yeah, it obviously matured into something much more. How did that evolve?
Will Falcon: 00:03:38 So if you’re not familiar with Lightning, it’s full-stack software, all the tools you need to build, train, and inference models, all of which I’m sure we’ll get into at some point. So we have a lot of enterprise customers and developers and frontier labs who are doing their inference with us, for example, ones like Cantina, which is Sean Parker’s new startup. We do a lot of their inference. And a lot of these players were using very single-purpose tools to do that, for example, other companies where that’s all they do. So these companies came to us at first, maybe for training and other things, and then they eventually move into these other things that we offer, because we offer all of it. And as we started to scale, we found the need to go beyond just AWS. As you all know, AWS is a very premium product, I will say.
00:04:27 And so it commands a high price, but it’s lacking a lot of the specific AI tools that are needed, because AWS is built for CPU applications, actually, and so it’s trying to be retrofitted back to AI. So then you have enterprises and startups that will try to make up the gap by buying these specialized products that only do that one thing, and then they stitch ’em together. And that creates a lot of problems because of security. In an enterprise, it would be firewalls and security between all these products. Startups and frontier labs, they don’t see it this way, but it creates a lot of operational overhead. So anyway, we started looking around for where we could get better compute for our customers. It was clear that they wanted the software, but the compute prices were really expensive. So we found this thing called neoclouds, which I didn’t know what it was until a year ago.
00:05:13 It’s a brand new term. Neoclouds are basically a new type of cloud that is GPU-first. They have some CPUs now, but they were really designed around how you make the best-performing hardware on GPUs work really, really amazingly for things like training models, with things like InfiniBand and VAST storage and all these different things. Whereas the traditional clouds like AWS try to roll a lot of that stuff out on their own, and so it’s not as high-performance as a neocloud, because a neocloud works directly with Nvidia to do a lot of this. So we found this slew of neoclouds. There are like hundreds of these things, by the way; I didn’t know this. So we partnered with the top seven or eight, we started working with them, and then we started putting customers, all of these enterprise customers, on them.
00:05:58 We’re talking about people like Cisco, et cetera, and they need a special type of flexibility that I was skeptical I could find outside of AWS. Personally, I think startups are more okay dealing with this. So we put the first enterprise customers on, and they struggled, and we struggled to get the majority of them to adopt these neoclouds. And then there’s this new player that we had never heard of, Voltage Park, and they kind of come out of the blue, and it’s an interesting story how they got started. So they bring all these GPUs online. And then I think what was amazing, and probably every customer’s reaction here, has been how responsive they are. We’re one company now, so I’m talking about the former Voltage Park.
They call it the white glove experience, which I think is a great product name, but hey, we’re going to do what it takes and give as much support as possible to make you successful, which is what you need as an early-stage startup, right? Ultimately. So they’re doing this amazing job, and through that iteration process, we were able to actually win a few enterprise customers and increase retention. And so I was like, hey, these guys are serious. They know what they’re doing. And at the time, we were trying to figure out how to bring a lot more GPUs online, because Lightning, we’ve never sold compute. All our customers buy compute from somewhere else, and then they connect our software to that compute. It’s called BYOC, bring your own cloud.
Jon Krohn: 00:07:24 And that is part of the great offering that you guys have, that kind of flexibility,
00:07:28 which is something that I love talking about with the Lightning AI product in general: it allows you to bring your own cloud, or to have access to a range of neoclouds as well as the traditional cloud providers like AWS that you mentioned earlier. You just have a dropdown box, you can see what the pricing is at any given moment for the kind of GPUs that you want, and then you can provision them at the click of a button and have access to them in minutes. And you can have been working already, say for hours, as an individual or as a team, just on a CPU, a basically free compute instance, and then within minutes, when you need it, switch to however many GPUs you want on whichever cloud you want.
Will Falcon: 00:08:12 Exactly. And so customers love that flexibility. But yeah, the pricing on the hardware was just a little cost-prohibitive, and I think that was a lot of the bottleneck that we saw last year for growth. I mean, the revenue growth in both companies went super high. Lightning alone, we saw at least a 30-40x revenue increase without the merger, and we were really just always bottlenecked by compute. We lost many deals, millions in deals, that we couldn’t land because we couldn’t find the compute for them. So that was kind of the predicament we were in. And on the other side, you’ve got these neoclouds where the only software they offer is basically Kubernetes, and that’s it. And so you’re like, hey, I need to do inference, I need to do this development, I have training, I have model hub things, I have experiment management. There are dozens of things you need in between that we just offer natively in Lightning. And so they looked at it and they’re like, oh, you guys have a full macOS? And I was like, yeah, and you have an amazing machine, so let’s get married.
00:09:12 And so that’s how this kind of came about.
Jon Krohn: 00:09:14 Alright, so you’ve used the word inference pretty casually, and probably a lot of our listeners know what that means, but just to explain it a little bit: when you have an AI model, you first train the model, and then once you have it trained, you put it on some kind of production system, like Voltage Park GPUs or AWS GPUs or whatever, for that AI model to be used at inference time in production, to be able to do something for users. So when you open up ChatGPT and you type something, that’s inference happening on some cloud somewhere that OpenAI is paying money for. And now that we have this big revolution of AI in more and more places, there’s more and more demand for this inference compute. And you might know the stats better than me, but it’s something like 99% of all GPU usage is for inference, not for training.
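(To make that concrete for readers of the transcript: what Jon describes, a trained model sitting behind a production endpoint, looks roughly like the minimal sketch below. FastAPI and the hard-coded toy model are illustrative assumptions, not Lightning’s or OpenAI’s actual serving stack.)

```python
# Minimal sketch of an inference endpoint: a trained model wrapped in a
# web service that answers prediction requests. FastAPI and the toy
# linear model are illustrative choices, not any specific vendor's API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    features: list[float]

# In practice, a model trained elsewhere would be loaded at startup;
# hard-coded weights stand in for it here.
WEIGHTS = [0.5, -1.2, 2.0]
BIAS = 0.1

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # "Inference" is simply running the trained model on the request.
    score = sum(w * x for w, x in zip(WEIGHTS, req.features)) + BIAS
    return {"score": score}
```

Served with something like `uvicorn main:app`, every POST to `/predict` is one inference call, which is why inference demand scales with usage rather than with the one-off cost of training.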
Will Falcon: 00:10:09 Yeah, I mean, that’s right. I think I’m actually sad that the name inference stuck, because inference is a math concept, which means to take a math model and have it infer things. So if you’ve ever worked in stats and things like that, inference existed in that context. And then someone decided to call this inference, which means having the model make predictions, ultimately. But there’s a lot of stuff that goes into that, uptime ultimately. Inference, for a developer, ends up being a container just like a web server, except that it’s receiving requests, and how it handles batches of requests and streaming is a bit different. And so that became a whole thing, which I have many opinions on. But yeah, one of the main things people have to worry about there is the compute elements of it, and then people strap on all these things. So on this 99% inference thing, it’s funny, I think people are missing something. I think people see that and say, oh, the world is just going to be inference. Let’s go back to the early two thousands, and let’s say that there’s a product that exists called, I don’t know, S3, right?
Jon Krohn: 00:11:27 Okay, what an arbitrary name.
Will Falcon: 00:11:30 Arbitrary name. So this product, people are like, wow, storage. Everyone needs storage, right? Correct. Now, is this going to be its own market? Is it going to be its own company? That seems weird, but in the early two thousands, you’d probably think that. I think if you look at what happened over time, S3 was just a primitive. RDS, EC2, all those things are primitives that, if you put them in a bucket and you label that bucket, what is that called? A cloud. It just wasn’t built yet. And that cloud mostly became AWS. I think we’re going through this same building process, where the thing that’s in front of us right now is inference, but there are other primitives, like vector DBs, like training infrastructure, like Kubernetes, like agents, and there are dozens of other primitives that will be created that have not yet been created. And not all of those are going to be their own companies. I argue they shouldn’t be. All we’re doing is that we’re in the process of creating something called a cloud. And that cloud, I believe we’re the first ones to actually have today, which is a new type of AI cloud,
Jon Krohn: 00:12:39 This full-stack AI neo-
Will Falcon: 00:12:41 Cloud. Yeah, just put all those things, all those little pieces, into a bucket and label that, and now that’s an AI cloud. And we have the first one of those.
Jon Krohn: 00:12:49 So something that I’ve said to you before, I’ve come into your office and said: how do you explain all of this functionality? There are so many different things that Lightning AI Studios can do. And so I guess that’s something that we should talk a little bit about, even this idea of... So we talked about the Lightning AI company and the Voltage Park company. We’re aware that Lightning AI is the software and Voltage Park is the underlying hardware. Voltage Park, they are literally standing up data centers and GPU clusters, and there are people physically screwing in screws and running cables and making...
Will Falcon: 00:13:23 We’re standing up our seventh data center today, just for context. Cursor, for example, we built their training cluster.
Jon Krohn: 00:13:32 There you go. That’s super cool.
00:13:34 And so that gives us a sense, probably as much as we really need to know for this kind of audience, AI practitioners, data scientists, that kind of listener, of what Voltage Park is and how it works. But the Lightning AI Studio part of it, which is the product that Lightning AI has been building for years, and where you described that 30-40x growth in the past year: tell us about that Lightning AI Studio journey. You’ve described it as being kind of analogous to what AWS was, but designed for AI, so all of these bells and whistles together. But if I’m a user, if I’m listening right now, what’s my experience as I type Lightning AI into Google and start using it for the first time?
Will Falcon: 00:14:18 Yeah, I mean, the product’s evolved, right? I would say our first product was PyTorch Lightning, which we’ll talk about in a minute: open source for training models, any kind of model, including LLMs. And then the problem became, okay, this is cool, and I can train at scale. So at the time (this number, I want to come back to it: you said 99% of all workloads are inference), I was training models at Facebook AI in 2019, and if you asked anyone what’s the number of GPUs that everyone uses, everyone would have said one. 99% of workloads were one GPU. But why? Is it because people only want one GPU, or because it’s hard to do multi-GPU? So even the Facebook cluster was massively underutilized. And then I rolled out PyTorch Lightning, and suddenly people started training on multiple GPUs at Facebook, which was a very good team.
00:15:09 And then it kind of rolled out, and eventually they trained most of their models like that, and then it kind of spilled out of Facebook and went to other companies. And that’s how PyTorch Lightning was born. But you look at the stats now, and I don’t think anyone would say 99% of workloads are single-GPU, right? It’s just that people didn’t have the tooling to do that. So I argue for inference, it’s not that 99% of the workloads are going to be inference; it’s that training is very hard and people don’t yet have the tooling to do it. We do today on Lightning. So if you’re struggling with training, you should go to Lightning. So the studio is designed to basically be... people are starting to need this today. So people are using Claude Code, right? Well, what’s the problem with Claude Code? On your laptop, it could delete everything. So people are starting to try to find these cloud environments. That’s what a studio is.
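(For listeners who haven’t seen it, the shift Will describes, from single-GPU scripts to multi-GPU training, is visible in PyTorch Lightning’s Trainer API. A minimal sketch: the toy model and random data below are placeholders, with only the Trainer flags being the real interface.)

```python
# Minimal PyTorch Lightning sketch: scaling to multiple GPUs is a
# Trainer argument, not a rewrite of the training loop.
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.layer(x), y)  # loss Lightning backpropagates

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random placeholder data standing in for a real dataset.
data = DataLoader(TensorDataset(torch.randn(256, 32), torch.randn(256, 1)),
                  batch_size=32)

# devices=1 trains on one GPU; devices=4 distributes across four.
trainer = pl.Trainer(accelerator="gpu", devices=4, max_epochs=1)
trainer.fit(TinyModel(), data)
```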
Jon Krohn: 00:15:59 You’re also limited in that you can’t be training models in Claude Code on your local machine.
Will Falcon: 00:16:04 Exactly. So Lightning gives you the studio. Lightning has many products; one of them is Studios. The studio is a cloud development environment, a sandbox if you must call it that, that’s persistent and acts like your laptop. So you could go and run Claude Code on there, leave it running overnight, and it’ll do something for you, and you don’t have any kind of problem where it’s going to delete your laptop or anything. And you can have 20 of these running at the same time, all on different GPUs and CPUs and things.
Jon Krohn: 00:16:30 And it doesn’t matter if your preferred environment is VS Code or Jupyter notebooks.
Will Falcon: 00:16:34 Well, that’s the idea of how you connect to the environment.
Jon Krohn: 00:16:36 Oh, right.
Will Falcon: 00:16:36 Yeah. So I want to separate the environment itself from how you code in that environment.
Jon Krohn: 00:16:41 I see
Will Falcon: 00:16:42 We provide a cloud VS Code kind of interface, and we have vibe coding in there as well, so you can describe things. But if you prefer to use Cursor locally, go for it. And if you prefer to use Claude Code, go for it. We don’t care how you’re connecting to that thing. The point is, you’re getting a cloud environment that can be shared. If you’re an enterprise, it can be audited. For example, you probably don’t want your developers to have customer-sensitive local files on their laptops, but on there you can have them, right?
Jon Krohn: 00:17:09 Right. Yeah, it’s a very flexible environment. It’s allowing you to be there in the web interface, and that’s kind of the default screen. So go to lightning.ai, where you can create a free account. We’ll have something in the show notes for people to be able to skip the queue and get access to some number of free monthly credits, so check out that link to get access to Lightning AI right now. And then from in there, you are in this VS Code experience that you described, as a default. And from there you have access to everything you could imagine needing to train and deploy AI models. I feel like we could spend literally hours, and I have seen you demo for literally hours, on all of the functionality.
Will Falcon: 00:17:58 I dunno if people know this, but Lightning AI itself today is built on Studios. Every single developer at the company today codes in Studios, whether you’re using Go or Python, whether you’re training models, doing inference, or coding a web app. Everyone does it on Studios today because it’s easy to reproduce and it’s easy to onboard new people. They’ll probably code from their local IDE, but it’s connected to this remote thing. And then I think the other thing to note is that the studios themselves, like I said, are persistent. You can SSH into them, and you don’t have to use a web interface either. There’s a whole command-line version of it where you can just start a new studio and open it up, and it’s like you open a new terminal, except that it’s a remote terminal, and now you can do whatever you want. So there’s that developer experience as well.
Jon Krohn: 00:18:44 For sure. Yeah. So maybe you’re more comfortable, and it’s sometimes easier to get started right off the bat, especially if you haven’t been coding in a little while and you’re coming back to it, you’re like, oh, I’ve heard that all this tooling makes training and deploying AI models easier than ever before. Maybe you just get started within the web app itself. But if you’re already used to doing things all the time in Cursor or VS Code or Jupyter notebooks or whatever environment you prefer, you can very easily, at the click of a button, connect that, or, just like you said, a terminal, any of those kinds of experiences, to this remote Studios instance. And so you get all of the security, all of the flexibility