Jon Krohn: 00:00:00
This is episode number 819 with Luka Anicin, CEO of Datablooz.
00:00:05
Today’s episode is brought to you by AWS Cloud Computing Services, and by Gurobi, the decision intelligence leader.
00:00:17
Welcome to the Super Data Science podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now, let’s make the complex simple.
00:00:48
Welcome back to the Super Data Science podcast. Today’s episode is all about PyTorch. And so we’re lucky to have Luka Anicin, a leading PyTorch expert, as our guest on the show. Luka is one of Udemy’s all-time best-selling instructors on AI. Over 500,000 students have taken his courses. His latest course, available exclusively at www.superdatascience.com, is called PyTorch, from Zero to Hero. He’s also the CEO of the full-lifecycle AI consultancy Datablooz. He holds a bachelor’s in computer science, a master’s in data science, and is nearing completion of his PhD in Applied AI.
00:01:23
Today’s episode will probably appeal most to hands-on practitioners like data scientists, software developers, and ML engineers. In it, Luka details what the popular Python library PyTorch is for, why you would select PyTorch over TensorFlow or Scikit-learn, the tensor-based building blocks PyTorch provides for designing, training, and deploying state-of-the-art deep learning neural networks, including large language models, his top tips for accurate and efficient deep learning, his guidance on PyTorch portfolio projects, and real-world PyTorch case studies from his experience leading an AI consultancy. All right, are you ready for this tremendous episode? Let’s go.
00:02:03
Luka, welcome to the Super Data Science podcast. It’s so great to have you on the show. Where are you calling in from today?
Luka Anicin: 00:02:09
Thank you for having me here. I’m calling in from Belgrade, Serbia.
Jon Krohn: 00:02:14
Very nice. I haven’t been to Serbia. It’s a country that I have been meaning to check out maybe next summer. I think the summer is the best time to be doing it.
Luka Anicin: 00:02:26
Okay. Yeah, generally, it was, but recently, it’s gotten too hot. It’s really, really hot. I would recommend-
Jon Krohn: 00:02:33
Climate change.
Luka Anicin: 00:02:34
Yeah, I would recommend either Spring or Autumn, it’s much, much more comfortable being here.
Jon Krohn: 00:02:41
Nice. All right. And so we know each other through Kirill Eremenko, the founder and original host of the Super Data Science podcast. He’s also the founder of the SuperDataScience company, and you have a long association with www.superdatascience.com, the education platform. You’ve been creating courses there for years, and hundreds of thousands of students… well, over half a million students have enjoyed the courses that you’ve developed with SuperDataScience. So super cool to have you on. Specifically today, we are here to talk about your new PyTorch course, it’s called PyTorch, from Zero to Hero, and if I understand correctly, it’s available exclusively at www.superdatascience.com, right?
Luka Anicin: 00:03:26
Yeah, that’s completely correct.
Jon Krohn: 00:03:27
Awesome. So let’s start off by identifying for our listeners, who haven’t used PyTorch before, what the PyTorch library is, why you would use it, and I think, particularly, why would you use PyTorch relative to TensorFlow, which I think is the primary alternative for most PyTorch use cases.
Luka Anicin: 00:03:47
Yeah, all great questions. So I would say that for anybody who’s just jumping into the area of AI and deep learning and developing custom algorithms, PyTorch is there to help us develop algorithms, mostly deep learning ones, that optimize based on some data. It allows us to process the data much faster and craft algorithms in an easy way, so we don’t have to code everything around them just to get started. So it’s basically a great starting point for us to develop custom algorithms for the projects that we want to do. And yeah, why now? From the era when everything started, TensorFlow was one of the best libraries… and to this day, it still is one of the best libraries for developing deep learning algorithms, but over time, PyTorch was simplified enough that anybody can use it in a pretty decent way without thinking too much about everything happening around it. So no overhead.
00:04:59
That’s one of the very great bonuses of this library. And when you take a look at the big websites for tracking research these days, for example, Papers with Code, there are statistics on what percentage of newly released papers that have code associated with them were developed using PyTorch over other libraries, and it’s up to 70% using PyTorch. So if somebody’s just starting out developing, for example, an algorithm that is the backbone of their startup, or an idea based on some research, there is a high likelihood that there is already something implemented for us in PyTorch, which is an amazing advantage for anybody these days, especially in this ever-moving and really fast-moving environment that we are currently living in. I’m in love with TensorFlow. I had always been using TensorFlow for all of my projects in the very beginning.
00:06:00
However, as time progressed, TensorFlow was left behind, and currently, all of the big libraries and big algorithms have moved to PyTorch. My team at Datablooz is currently using only PyTorch for their projects. And if we take a look at Hugging Face or any library that supports both backends, either TensorFlow or PyTorch, PyTorch is pretty dominant there as well.
Jon Krohn: 00:06:28
Okay. So it sounds like if anybody is looking at designing, or training, or deploying machine learning models, your recommendation today would be just use PyTorch, that you don’t need to be using TensorFlow, but what about Scikit-learn? So Scikit-learn would typically be the easiest API for somebody to be getting started with machine learning models. What do you think about Scikit-learn versus PyTorch?
Luka Anicin: 00:06:52
So first of all, I love Scikit-learn. I would say that there is a place for both. Scikit-learn is there for common machine learning algorithms, so no deep learning ones. And to this day… we put a couple of production algorithms written using Scikit-learn into production just a couple of months ago. So we are using that as well. When it comes to deep learning, when it comes to creating something custom, like recommendation systems or computer vision algorithms, we primarily use PyTorch. There is a difference in that sense. If you want to work in the deep learning area, you would use PyTorch, and if you want common, classical machine learning algorithms, I would go with Scikit-learn there. Yeah.
Jon Krohn: 00:07:41
But you could actually implement… anything that is in Scikit-learn, you could do it in PyTorch, because it’s a highly flexible library. I think, to speak really generally, PyTorch and TensorFlow, we could think of them as automatic differentiation libraries, where they provide you ways of performing the partial-derivative calculus that you need to train machine learning algorithms, most machine learning algorithms, with gradient descent. And so they’re highly flexible. But I guess the point is that most non-deep-learning machine learning algorithms are so tried and tested, and we have such a clear idea of what kinds of hyperparameters, what kinds of adjustments you’d want to make to your model, that there’s rarely going to be a need for using PyTorch for non-deep-learning approaches, because Scikit-learn can basically handle all the common use cases.
Luka Anicin: 00:08:40
That’s a really interesting point that you’re bringing up. Yeah. So when it comes to classical machine learning algorithms, yes, by all means, you can implement any of them in PyTorch, or using PyTorch. I actually implemented a couple of them just for fun, using PyTorch, just to compare the speed. And when I did it, maybe I did it wrongly, but it had some overhead on top of it. So the inference was much slower in the PyTorch version than the Scikit-learn version, because Scikit-learn was developed primarily for that, and it has a lot of computational optimizations happening in the background that you and I don’t really have to think about these days because it’s done for us. So yeah, generally, you can do any optimization or any algorithm using PyTorch, but it has some pros and cons when it comes to the classical ones.
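To make that comparison concrete, here is a minimal sketch, not from the episode, of the same logistic regression fit both ways. The dataset and hyperparameters are invented for illustration; the point is that Scikit-learn fits the classical model in one line, while in PyTorch you write the gradient descent loop yourself:

```python
# Illustrative sketch: logistic regression in Scikit-learn vs. "by hand" in PyTorch.
import torch
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Scikit-learn: the tried-and-tested route for classical ML.
sk_model = LogisticRegression(max_iter=1000).fit(X, y)
print("sklearn accuracy:", sk_model.score(X, y))

# PyTorch: the same model as a one-layer network trained by gradient descent.
Xt = torch.tensor(X, dtype=torch.float32)
yt = torch.tensor(y, dtype=torch.float32).unsqueeze(1)
model = torch.nn.Linear(20, 1)          # weights + bias, i.e. logistic regression
loss_fn = torch.nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(500):                    # full-batch gradient descent
    opt.zero_grad()
    loss = loss_fn(model(Xt), yt)
    loss.backward()
    opt.step()

preds = (model(Xt) > 0).float()
print("pytorch accuracy:", (preds == yt).float().mean().item())
```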
Jon Krohn: 00:09:42
Very cool. All right. So you’ve made a strong case here that if any of our listeners are thinking of developing, training, or deploying deep learning models, they should be thinking about PyTorch as their Python library of choice. What are the key fundamentals of PyTorch? I realize this is something that probably lends itself better to a hands-on tutorial like your PyTorch, from Zero to Hero course at www.superdatascience.com, but just generally, what are the fundamentals of PyTorch that you can describe in this primarily audio podcast format?
Luka Anicin: 00:10:21
When you say fundamental parts, do you mean primarily APIs that are supported by this library?
Jon Krohn: 00:10:28
Yeah, I guess so. So if you’re getting started in PyTorch, there are probably key things, like working with tensors and performing operations with these tensors. So I think those kinds of fundamentals would be the key building blocks of doing any kind of… whether we’re talking about a deep learning model or any kind of machine learning model. What are the fundamental building blocks that you work with when you are working in PyTorch?
Luka Anicin: 00:10:54
Yeah. The fundamental building block when it comes to PyTorch is the tensor, and it has its own object type that underpins most of the computation happening under the hood in PyTorch. So the first thing that I would learn when it comes to PyTorch is how to operate with tensors in a good way, how to load a dataset into them, how to structure them so I can do some computation on top of them, how to do computation with two or multiple tensors at one time, because when it comes to large neural networks or any deep learning algorithms, small to big, it always comes down to simple computations on top of matrices, which are, at the end of the day, tensors in an n-dimensional space. So starting from there, the next step is how to optimize, what the loss function is, how to calculate the loss on top of it.
Jon Krohn: 00:12:01
One sec, quickly before you move on to that, I’m just going to recap back what you’ve said. So the fundamental building block when we’re working in PyTorch, or actually in TensorFlow, is the tensor, and you hear it right there in the TensorFlow name. And so I think we should maybe, just quickly… you said it there rapidly, you said that a tensor is basically a matrix in an n-dimensional space, but I want to underline that or emphasize that for our listeners that aren’t aware of what a tensor is. So a tensor is basically… it is an abstraction of the idea of a matrix into any number of dimensions. So you can have a zero-dimensional tensor, which is just a single value, a single number, also known as a scalar; you can have a one-dimensional tensor, which is just a one-dimensional array, you could think of that as a column of information, like a column of numbers in a spreadsheet, and that’s a vector tensor, an array.
00:13:02
And then a matrix tensor is two dimensions, and that’s probably the most canonical kind of tensor shape, two dimensions. But tensors are more than just zero-dimensional scalars, one-dimensional vectors, and two-dimensional matrices, because, exactly as you said, and I’m now highlighting, it’s this abstraction into any number of dimensions. So you can have a three-dimensional, four-dimensional, five-dimensional, a hundred-dimensional tensor in theory, though I don’t think we see hundred-dimensional tensors in practice in machine learning very much.
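A quick illustrative sketch of that progression in PyTorch, with toy values chosen just to show the shapes:

```python
import torch

scalar = torch.tensor(7.0)                 # 0-D tensor: a single number
vector = torch.tensor([1.0, 2.0, 3.0])     # 1-D tensor: an array, a column of numbers
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])        # 2-D tensor: the canonical matrix
cube = torch.rand(3, 4, 5)                 # 3-D tensor: three stacked 4x5 matrices

for t in (scalar, vector, matrix, cube):
    print(t.ndim, tuple(t.shape))
# 0 ()
# 1 (3,)
# 2 (2, 2)
# 3 (3, 4, 5)
```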
Luka Anicin: 00:13:39
Yeah, maybe somewhere, but I never saw it myself. Three-dimensional tensors, though, are really common these days, and that is an image. So any computation on top of images is basically three-dimensional.
Jon Krohn: 00:13:53
A color image, because you have that-
Luka Anicin: 00:13:55
Color, of course. Well, even if the third dimension has a depth of one, so you don’t have more than grayscale, it’s still three dimensions.
Jon Krohn: 00:14:04
Right. Right. Right. So yeah, you’re saying that even if you’re working with a black and white image, you would define it as a three-dimensional tensor, but one of those dimensions has a depth of one, because it’s just black… darkness of pixels. But as soon as you add that color dimension, so you then have a red channel, a green channel, a blue channel, and so you end up having… when you’re representing even just a static image, you have a three-dimensional tensor because you have a matrix in red, a matrix in green, a matrix in blue where every one of the cells, if you thought about it as an Excel spreadsheet, every single one of the cells in this spreadsheet has… it describes how dark the redness should be, the greenness should be, the blueness should be at each one of those locations. So it’s like having three Excel spreadsheets stacked on top of each other, each one telling you how much red, how much blue, how much green there should be at any given point in the image, and then altogether, that allows us to create any full color image.
00:15:11
Yeah, so that’s your three-dimensional tensor. And then from there, you very quickly have four-dimensional tensors, because if you think about a film, a movie clip in color, and now all of a sudden, we need another dimension, a fourth dimension to represent how those frames are changing over time.
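A hedged sketch of those shapes in PyTorch, which conventionally stores images channels-first; all sizes below are arbitrary examples:

```python
import torch

# PyTorch's image convention is channels-first: (channels, height, width).
grayscale = torch.rand(1, 224, 224)     # "black and white": a channel depth of one
color = torch.rand(3, 224, 224)         # red, green, blue planes stacked
batch = torch.rand(32, 3, 224, 224)     # a batch of 32 color images: 4-D
video = torch.rand(16, 3, 224, 224)     # 16 color frames over time: also 4-D

print(color[0].shape)   # the red "spreadsheet": torch.Size([224, 224])
```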
Luka Anicin: 00:15:30
Yeah. And when it comes to that, these days especially, you also have amazing computer vision algorithms that optimize on 3D scans of, for example, lungs. And for that, you need a 3D convolution, or a specifically designed neural network, to process 3D images, which are scans of multiple images stacked next to each other. So yeah, in healthcare, in radiology especially, these days you’re processing scans in 3D.
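PyTorch ships a Conv3d layer for exactly this kind of volumetric data; the scan dimensions in this sketch are made up for illustration:

```python
import torch

# A hypothetical CT-sized volume: (batch, channels, depth, height, width).
scan = torch.rand(1, 1, 64, 128, 128)   # 64 grayscale slices stacked into a volume

conv = torch.nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
features = conv(scan)
print(features.shape)   # torch.Size([1, 8, 64, 128, 128])
```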
Jon Krohn: 00:16:08
This episode of Super Data Science is brought to you by AWS Trainium and Inferentia, the ideal accelerators for generative AI. AWS Trainium and Inferentia chips are purpose-built by AWS to train and to deploy large-scale models. Whether you are building with large language models or latent diffusion models, you no longer have to choose between optimizing performance or lowering costs. Learn more about how you can save up to 50% on training costs and up to 40% on inference costs with these high-performance accelerators. We have all the links for getting started right away in the show notes. Awesome. Now, back to our show.
00:16:49
Nice. Yeah, makes perfect sense. And what we’ve been talking about… because it’s easier for us, I think, to describe visually these kinds of examples involving different types of input data, so color images, color video, the kinds of three-dimensional radiological scans that you’re describing, but these tensors aren’t just for storing the inputs, they’re for storing all of the weights, and in the case of a neural network, the weights and the biases of the neural network, and they’re also for storing the outputs, so our predictions. So these tensors are used to store data as well as model weights throughout the design, the training, and the deployment of our machine learning models, including our deep learning models.
00:17:40
And so these tensors, yeah, this fundamental building block is seen throughout, and within the deep learning models themselves, they can end up being many-dimensional, because when you’re talking about something like a 3D convolution, you need high-dimensional tensor representations inside the model in order to represent those operations. So I’m hogging the mic here and describing what I think is key here, about these fundamental building blocks. So then when we go from these… if you think about, statically, having a tensor of information, then how do we go… what’s it like in PyTorch to then be doing operations, to go from having these static representations to being able to do something like train a machine learning model?
Luka Anicin: 00:18:38
Yeah. So you explained it amazingly. When you are building a deep learning model, you have each layer or each part of that model being a matrix or n-dimensional tensor stored to represent weights and biases; those are what we call layers, basically. And there are many, many types of layers, depending on what task we are trying to solve. And in machine learning, we can’t control the inputs and the outputs when it comes to the dataset. Those are static representations of our data, that’s what we collected, and there are many ways to collect the data, and that’s something really crucial to do right before the algorithm comes into play. Now, I acknowledge, okay, we can’t control, we can’t change the image itself, or control the prediction of what we want to achieve with that image, but we need to somehow make our algorithm reproduce these outputs on the newly arriving data that users might upload.
00:19:54
And okay, what can we change? Well, we can change those representations inside of our layers, and we know now that those are just numbers in matrices stored for now, and that’s what we tend to change. So how the process works is basically: we take some images, or whatever data we have at the input, we do operations with the neural network, where depending on the layer type, the operation might change, and then we have some prediction. At the very beginning, those predictions will be really poor because the model itself doesn’t really know what we want to achieve with it. So we make some predictions and we say, “Okay, cool, so now that we have these predictions, let’s see how right we are, or how wrong we are,” whichever question you ask. And that’s the loss function, that’s the function that might tell us, “Okay, cool. So with this image, you are supposed to say it’s a cat, but you said it’s a dog. So sorry, but try again.”
00:21:02
And we are not doing that with a single image, we are doing it with all the images in our dataset. No matter how complex our task might be, whether we are predicting the next word in a sequence, which is an LLM, or we are predicting what the object on top of the image is, or multiple objects, we still need to somehow determine whether or not we are correct. And that is the loss function. We do that across all the samples in our training dataset, and once we do that, we take that information and perform partial derivatives on top of every single layer, going from the last one to the first one, based on the loss function that is telling us, “Okay, so this is how wrong you are, please correct yourself in some way.” Right now, we are just telling it to correct itself by a small percentage, basically. And that’s what we do.
00:21:57
Now, after that single pass over the data, which is called an epoch, we have some corrections inside our network, inside those weights. We still have the static data, we still have the same images, same outputs, but now those representations of matrices inside of each layer are a bit changed based on the wrong predictions that we made. So now, let’s try again and do basically all of that all over again. So we make predictions, we again get an estimate of how wrong we are, and with the newly estimated wrongness, we correct the weights again. So we do that multiple, multiple times until we say, “Okay, now, we are good enough.” We are never perfect, because we are working with really complex models, and not only that, but we have many samples. And every single sample in a dataset might try to pull our weights in some direction, and we are making compromises all over again.
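Here is a minimal sketch of the loop Luka just described, forward pass, loss, backward pass, weight update, repeated over epochs. The synthetic data and model sizes are stand-ins, not anything from his course:

```python
import torch
from torch import nn

# Synthetic stand-ins for the "static" dataset Luka describes.
X = torch.rand(512, 20)                   # inputs we cannot change
y = torch.randint(0, 2, (512,))           # labels: "cat" (0) vs. "dog" (1)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()           # quantifies "how wrong are we?"
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(10):                   # one epoch = one pass over the dataset
    logits = model(X)                     # forward pass: make predictions
    loss = loss_fn(logits, y)             # compare predictions to the true labels
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss.backward()                       # partial derivatives, last layer to first
    optimizer.step()                      # nudge the weights a small percentage
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```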
Jon Krohn: 00:22:57
Nice. Yes. So to summarize back what you just said, we have our static tensors, including the static tensors of our dataset that we train on, and then we define some kind of loss function, some kind of way of quantifying where our model is wrong. You gave the example there of if it’s a picture of a cat and our model predicts it’s a dog, that’s wrong. And so we use this loss function to quantify that wrongness. And then the magic and the power of a library, like PyTorch or TensorFlow, frankly, is that then arbitrarily, regardless of what kind of deep learning model we designed… you talked about kinds of layers there. So you can have a small number of layers, you could have a shallow neural network with just the simplest kind of neural network layers, like a dense layer, and so you could have just a single layer in your neural network, a single dense layer, or you could have dozens of layers that are different kinds, convolutions, and dense layers, recurrent layers, transformers.
00:24:01
There’s all these different kinds of ways that we can have our information interact within the neural network itself. But the point is that whether we’re using PyTorch or we’re using TensorFlow, the magic, the power of these libraries is that we can do partial derivative calculus from that loss function, that quantification of how we’re wrong, and use partial derivatives to identify through all of these layers of our neural network, whether it’s a small number or a large number, to be able to say, “Okay, how can I adjust this model weight so that in the future, I’m more likely to guess correctly that this is a cat, instead of incorrectly, that it’s a dog?”
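The partial-derivative machinery Jon is describing is PyTorch’s autograd engine, and a tiny sketch with made-up numbers shows the mechanics:

```python
import torch

# requires_grad=True tells PyTorch to track every operation on this tensor.
w = torch.tensor([2.0, -1.0], requires_grad=True)
x = torch.tensor([1.0, 3.0])

loss = ((w * x).sum() - 4.0) ** 2   # a toy "wrongness" measure
loss.backward()                     # autograd walks the graph backwards

print(w.grad)   # d(loss)/dw, computed for us: tensor([-10., -30.])
```

Calling `backward()` on the scalar loss fills in `.grad` for every tracked tensor, no matter how many layers the computation passed through; that is what makes arbitrary architectures trainable.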
00:24:46
Yeah. So that is the power of these libraries, of PyTorch and TensorFlow. And yeah, you gave that example earlier, that something like 70% of machine learning papers published today are using PyTorch. But why is it, do you think, that PyTorch has become so much more popular than TensorFlow in recent years? TensorFlow had a head start, it was the de facto automatic differentiation library about five years ago, but then PyTorch came on the scene and it has overtaken TensorFlow. What do you think it is about PyTorch that allowed it to become the more popular automatic differentiation library?
Luka Anicin: 00:25:30
The same reason why Python is really popular among a lot of programmers these days: it’s easy to write. Its syntax is much easier to understand, and you don’t have to think about a lot of the overhead of other programming languages. And most programmers, yeah, they are aware of the bad sides of Python, it’s slower than the rest of the languages, it doesn’t support this and that, all of that is fine, but millions of programmers these days are still converging on Python because it’s easy to write. And of course, because of that, you have a large community that creates a lot of libraries around it, and that’s a self-fulfilling prophecy, basically. The same goes for PyTorch. When it was released for the first time, it was really easy to write and understand, and the syntax from that point to this day has not changed that much. You have additions here or there, but the syntax has basically stayed the same.
00:26:40
In TensorFlow, first of all, you had to define a graph, then execute the graph, and then basically run the inference on top of that graph, and you didn’t see in real-time what mistakes you potentially made until you executed your code. So that caused a lot of headaches for beginners, especially in the earlier days, before you had ChatGPT to check your code, or could ask some of your buddies who are in AI how to debug a certain part of it. PyTorch was released much later, when there was more community in the area, and they wanted to remove the headache that TensorFlow introduced with its many APIs. So that is another part of it.
00:27:30
So you had a lot of APIs, the community API, the legacy API, the Keras API, the layers API, so many, many TensorFlow APIs that, later on, were put under a single one, following the architectural design of PyTorch. So you see, that is a reason. Even TensorFlow moved towards PyTorch in the sense of the syntax, because it was easy, it was elegant, and it was much easier to understand when you’re looking at your colleague’s code. Yeah.
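That define-by-run difference is easy to see in a small, hypothetical snippet: every line executes immediately, so a shape mistake or a bad value surfaces at the line that caused it, with no separate graph-compilation step first:

```python
import torch
from torch import nn

# Eager execution: build and run the network line by line.
layer1 = nn.Linear(4, 8)
layer2 = nn.Linear(8, 2)

x = torch.rand(1, 4)
h = torch.relu(layer1(x))
print(h.shape, h.mean().item())   # inspect mid-network with a plain print
out = layer2(h)
print(out)
```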
Jon Krohn: 00:28:01
Nice. Yeah, I agree 100%. Basically, the same reasons that Python is popular are the reasons that PyTorch is popular, which is that it’s easy-to-understand code. In the case of PyTorch, it is a lot more Pythonic in the way that methods work relative to TensorFlow. And I think if you were a TensorFlow developer, you would probably think… especially years ago, you’d think, “Why would somebody switch to PyTorch when it doesn’t have all these great efficiencies that we built into TensorFlow?” We’ve been so thoughtful in developing TensorFlow to make sure that we’re being as efficient as possible with whatever devices are available, with whatever CPUs or GPUs are available for performing computations.
00:28:47
But ultimately, for the most part, when you’re composing your code, when you’re creating your network, when you’re in a Jupyter Notebook, you typically don’t need to be so concerned about making sure that everything is 100% as efficient as possible, you just want to get something done. And what PyTorch did was say, “Okay, then after you’ve created your computational graph, after you’ve already figured out how you want to be running your neural network, then afterwards, you can execute a script to figure out how to efficiently allocate this across whatever resources you have.”
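In current PyTorch (2.x), that after-the-fact optimization step is exposed as `torch.compile`; a rough sketch, with an arbitrary toy model:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Write and debug eagerly first; then ask PyTorch to optimize the whole
# computation for your hardware with one extra line.
fast_model = torch.compile(model)

x = torch.rand(64, 128)
print(fast_model(x).shape)   # same results, potentially much faster execution
```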
Luka Anicin: 00:29:23
Yeah. And especially at the beginning, even when PyTorch was introduced, the support for TensorFlow was much, much larger, so you didn’t have that much of an incentive to move, especially when it comes to deployment. You had, and to this day have, TFX, TensorFlow Extended, which is an amazing library, still complicated to this day, but an amazing library to deploy your deep learning algorithms. Managing a lot of deep learning algorithms in production is not an easy endeavor. So we don’t have perfect tools even today, and on PyTorch’s deployment side… there is TorchServe, for example, which supports you in deploying your algorithms, but it’s not a perfect one, especially from an optimization standpoint, and you need inference there to be as efficient as possible. But there are more ways these days than back in the day. So if you were a TensorFlow developer deploying some algorithms, you didn’t have a good reason to move to any other library than TensorFlow; these days, you have many more options, and PyTorch is much more broadly supported by these tools.
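As a hedged sketch of two common export routes out of PyTorch, using an assumed toy model: serialize to TorchScript or to ONNX, then hand the artifact to a serving tool (TorchServe, for instance, packages models like these with its own archiving step):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(20, 2))
model.eval()

# TorchScript: a self-contained file that can be loaded for inference
# without the original Python class definitions.
scripted = torch.jit.script(model)
scripted.save("model.pt")

# ONNX: another common route into optimized inference runtimes.
dummy = torch.rand(1, 20)
torch.onnx.export(model, dummy, "model.onnx")
```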
Jon Krohn: 00:30:50
In a recent episode of this podcast, the mathematical optimization guru, Jerry Yurchisin, joined us to detail how you can leverage mathematical optimization to drive commercial decision-making, giving you the confidence to deliver provably optimal decisions. This is where Gurobi optimization comes into play. Trusted by most of the world’s leading enterprises, Gurobi’s cutting edge optimization solver, lightweight APIs, and flexible deployment simplify the data to decision journey. And thankfully, if you’re new to mathematical optimization approaches, Gurobi offers a wealth of resources for data scientists including hands-on training, comprehensive Jupyter Notebook examples, and extensive free online courses. Check out episode number 813 of this podcast to learn more about mathematical optimization and all of these great resources from Gurobi. That’s episode number 813.
00:31:38
Excellent. So Luka, what kinds of tips do you have for building really… I guess we never know whether something is truly the best that it can be, but how can we build more accurate or more efficient models in PyTorch? What kinds of tips and tricks do you have?
Luka Anicin: 00:31:58
I don’t have any unique tips and tricks, I’m just using whatever works. So I always start with a really simple way of thinking about it: “Okay, start with the simplest model and then build from there.” You tend to think that overcomplicated models will work better, or, because you have these easy tools, that you need to put in more layers, more different types of layers, or just a bigger number of layers, bigger layers, bigger networks, and stuff like that. And it does work sometimes, but if you didn’t test much simpler models, then when it comes to comparing methods, you can’t get to the point of what the bare minimum that works actually is. So what is the baseline there? So start from there. And the other point is not my point, it’s [inaudible 00:32:57] point, that you have two ways of thinking about implementing and optimizing your models: the model way or the data way.
00:33:09
So a lot of people, especially beginners in this area, think about the model way. So now, I have an algorithm, it’s not working properly, so what can I do with this? Can I increase the number of layers? Can I increase the complexity of a single layer? Is the loss function the wrong one, or the hyperparameters? All of those are the right questions, but you are not thinking about the data part. Sometimes your data is not right. So you can optimize your models as far as they can go, but sometimes you will hit a ceiling and you will not be able to increase the accuracy of your models, or the general performance of everything, just by changing the model itself. It’s much easier to think about the model, because it’s a couple of lines of code thanks to these libraries, but in most real-world cases, collecting more data or relabeling something is the way to go.
00:34:08
On one of the projects where I worked a couple of years ago, we reached an amazing accuracy with a model that we designed ourselves, a fully custom model for a computer vision task, and we reached a point of 70 to 80% accuracy. And then the researchers on my team… so I had a team of about 20 engineers that I led there, and of course, they thought about, “Okay, can we implement a better algorithm? Can we add more layers to it?” And we did a lot of experiments there. But then a couple of engineers said, “Okay, is the data part wrong in this case?” And it turned out that 10% of our whole dataset was wrongly labeled. So even if we increased the complexity of our layers or the model, we would hit the ceiling of, “Hey, but this is completely wrong.” What we wanted to predict was not aligned with what we had. So yeah.
00:35:17
And sometimes, the data view is much safer and, of course, more tedious than just the model one, but basically, going that route is safer in the long run, just to eliminate it and confirm, “Okay, the data part is correctly labeled,” or ask, “Why can’t we reach more data points?” And then start thinking about the model itself.
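Luka’s start-simple advice translates into a small, hypothetical workflow like the following, where a one-layer baseline is scored before any deeper architecture has to justify its extra complexity; the data and sizes are invented for illustration:

```python
import torch
from torch import nn

# Hypothetical data; the point is the workflow, not the numbers.
X, y = torch.rand(1000, 20), torch.randint(0, 2, (1000,))

def train_and_score(model, epochs=50):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return (model(X).argmax(dim=1) == y).float().mean().item()

baseline = nn.Linear(20, 2)                       # simplest possible model
deeper = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                       nn.Linear(64, 64), nn.ReLU(),
                       nn.Linear(64, 2))

print("baseline:", train_and_score(baseline))     # establish this number first
print("deeper:  ", train_and_score(deeper))       # added layers must beat it
```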
Jon Krohn: 00:35:43
I see. That makes a lot of sense. And then what about transfer learning? That could potentially allow us to take advantage of a more powerful model that was trained on more data than we have access to.
Luka Anicin: 00:35:53
Yeah. For context, for listeners that are not aware of transfer learning: transfer learning unlocked a lot of opportunities, and naively, when we start thinking about it, it doesn’t make sense, but it actually works. So what transfer learning is, basically: large models that were trained by different companies like Google, Microsoft, Amazon, Facebook or Meta. There are a lot of tasks and different models trained on those tasks. They invest millions of dollars just to get a small percentage better than their competitor, and to prove on some dataset that they can do better, basically. On paper, it doesn’t make sense, but for us, researchers or developers, it makes a lot of sense, because they invest a lot of money to train the model, and they provide the weights of those models. So it’s trained on a certain task. Let’s say the COCO dataset, or let’s say ImageNet, a completely open-source dataset with 1,000 classes. And now, you have a specific task for your startup, for the company where you work, where you need to, let’s say, make a classification between this pencil and that pencil for a factory.
00:37:18
In the original way of thinking, you would need to collect thousands and thousands of different data points or samples distinguishing between these two types, but that might be impossible, or you don’t have the time or resources. Because these companies train these large models, what you do is basically take those models, remove the last layer, and say, “Okay, now, you start predicting a binary classification between these two pencils,” and you freeze the full network except the last layer that you just added. So you’re basically optimizing just the last part of the network, where the rest of the weights are optimized for the different dataset.
00:37:59
And that’s why it doesn’t make sense on paper, because it was trained on a completely different task. But what they proved is that a lot of these weights are transferable, and they can be used in many, many cases, and it actually works. So instead of collecting thousands, you can collect hundreds of images, and bam, it works like a charm. In PyTorch and TensorFlow, and other libraries, especially Hugging Face these days, you can import these big models with one line of code. And those models are really powerful for image classification, object detection, OCR, transfer learning for text these days because of LLMs. So you can do transfer learning for basically any task that you can think of, and you potentially have a model built inside of these libraries, so you can just import it and start working from there.
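As a hedged sketch of that recipe in PyTorch, using torchvision’s pretrained ResNet-18 and an assumed two-class pencil task (the task and layer sizes are illustrative, not from the episode):

```python
import torch
from torch import nn
from torchvision import models

# Download a model pretrained on ImageNet (1,000 classes) in one line.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every pretrained weight...
for param in model.parameters():
    param.requires_grad = False

# ...then swap the last layer for our own task: two classes ("this pencil
# or that pencil"). Only this new layer will be optimized.
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ...train as usual on your few hundred labeled images.
```

Only the newly added layer’s weights receive gradient updates; the millions of frozen pretrained weights are reused as-is.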
Jon Krohn: 00:38:52
Excellent. Yeah. So to recap back what you said there, we as machine learning practitioners can take advantage of these huge models trained on huge datasets that might’ve cost millions of dollars to train. In the most extreme cases, like LLaMA 3 today, at the time of recording, they might’ve spent tens or hundreds of millions of dollars on the whole project to create these gigantic large language models trained on billions or trillions of tokens, of pieces of natural language information, from the public internet and also maybe from their own proprietary sources. And we can use things like Hugging Face, which you mentioned there, to easily download these model weights and then fine-tune them to some specific task of ours, and then with some relatively small number of data points, like you mentioned, very commonly now just hundreds of data points, you can have an extremely high-performing model in natural language processing or machine vision for some specific task that you want to have running for your business, or your platform, or some personal use case.
00:40:02
And that fine-tuning, because you’re only fine-tuning a small number of the model weights from the whole architecture, fine-tuning on those hundreds of data points might cost tens of dollars or maybe even less. And so you can see, you have these really powerful models for very low cost. It’s a really cool thing. So in your course, PyTorch, from Zero to Hero, at www.superdatascience.com, you have a project at the end of the entire course. So all of the kinds of things that we’ve talked about in this podcast episode, from the fundamental building blocks, tensors, to doing tensor operations, to creating neural network architectures, training those neural network models, doing the kinds of transfer learning and fine-tuning that we’ve been talking about most recently, all of that is covered in your course, and then the final project at the end of the course involves building an image captioning system, and it uses the COCO dataset that you actually alluded to in your response a few minutes ago. So tell us about this project so that people can get a sense of why it’s important to be doing these kinds of hands-on, real-world, heavyweight projects when they’re trying to learn something new like PyTorch.
Luka Anicin: 00:41:26
Yeah. So when I started scripting out the whole course and started thinking about what the end goal is, why somebody would go through the whole process, I wanted them to feel that they accomplished something big and to have a project that they can put on their resume later on. And these days, everything is around text, everything is around simple Kaggle projects, which I don’t have anything against, but with a lot of people that are applying to jobs with me, I can’t figure out who is better than whom just by looking at the projects. Because in most cases, it’s the same solution, the same type of algorithm applied all over again. So I wanted them to work on a much bigger project and to have something interesting to work on. When I started working in AI, one of the first courses that I looked at was CS231n, I think, from Stanford, where Andrej Karpathy actually talked about his dissertation, which is an image captioning system. And I wanted to bring back that kind of old project to people, with new technologies.
00:42:47
So we are applying all the things that we learned throughout the course to build parts of the system into one big algorithm that combines working with text and working with images, with a couple of loss functions working together to optimize multiple architectures toward the same goal, which is something commonly done in the real world. And that’s what I wanted to achieve. So we are starting from the basics, understanding the dataset and processing it, all the way to actually making predictions both on text and images.
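The course’s actual code isn’t reproduced here, but as a rough, hypothetical sketch of the encoder-decoder shape Luka describes, a pretrained CNN encoding the image and a recurrent decoder emitting caption tokens, with all names and sizes invented:

```python
import torch
from torch import nn
from torchvision import models

# A CNN encoder turns the image into a feature vector; an RNN decoder
# generates the caption one token at a time, conditioned on that vector.
class CaptioningModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=256, hidden_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()                  # keep the 512-d features
        self.encoder = backbone
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.project = nn.Linear(512, hidden_dim)    # image features -> initial state
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images)                 # (B, 512)
        h0 = self.project(feats).unsqueeze(0)        # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(captions), (h0, c0))
        return self.head(out)                        # (B, T, vocab) logits

model = CaptioningModel()
images = torch.rand(4, 3, 224, 224)            # a toy batch instead of COCO
captions = torch.randint(0, 10_000, (4, 12))   # token IDs for 12-word captions
print(model(images, captions).shape)           # torch.Size([4, 12, 10000])
```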
Jon Krohn: 00:43:20
Very nice. Yeah, it sounds like a great project. It’s one I would love to do myself. Maybe I will be doing it at www.superdatascience.com soon. Now we have a great idea about the importance of projects. You talked there in your response about building portfolios. This is actually something that you do in the Super Data Science platform. So you regularly run labs live in the platform with people interactively to help them flesh out their portfolio for getting a job promotion or for landing their first data science job. Tell us about these live labs that run in the platform and how they help people build their portfolios.
Luka Anicin: 00:43:57
Yeah. So every month, we run a single lab, or multiple, if you look beyond just me as an instructor to the other instructors, we have multiple labs, of course. The goal is basically to focus on a certain project, a mini project that we can do in an hour or two, solve a specific task, learn about it, and go from there. That’s basically the end goal of a lab. It can range from, okay, how to create a custom dataset, what all the techniques are, from web scraping to collecting the data, to buying third-party data, and structuring it in a way that you can publish it, for example, on your own website or on Hugging Face. And I always come from the perspective of a hiring manager, because at my company, I’m the one ultimately making the decision about who’s going to join our team. So I saw a lot of portfolios, and I’m basically helping people around the world, through SDS Labs, to achieve much, much better and more suitable portfolio projects that most hiring managers would be happier to see. Basically that.
Jon Krohn: 00:45:12
Since April, I’ve been offering my Machine Learning Foundations curriculum live online via a series of 14 training sessions within the O’Reilly platform. My curriculum provides all the foundational knowledge you need to understand modern ML applications, including deep learning, LLMs, and AI in general. The Linear Algebra, Calculus, Probability and Statistics classes are all in the rear view mirror, but the final three classes in the series, which are all on Computer Science, they are still to come. Registration for the first of these computer science classes is open now, that’s Intro to Data Structures and Algorithms on September 25th, and Data Structures and Algorithms Level 2 on Hashing, Trees, and Graphs on October 23rd. And registration will open soon for the 14th and final class Optimization that will be held on November 20th. If you don’t already have access to O’Reilly, you can get a free 30-day trial via my special code, which is also in the show notes.
00:46:07
Nice. Yeah, it makes so much sense. And when Kirill told me that he was running these live labs for building people’s portfolios on the www.superdatascience.com platform, I thought it sounded like such a great idea. Maybe someday I should also be running some of my own labs in there. I don’t have the bandwidth right now, but it sounds like a lot of fun, and I would love to do that. Going beyond creating portfolio projects that are real-world, you have a ton of experience doing real-world projects for real-world companies. So not just the kind where it takes one or two hours to scrape the data and try a deployment, we’re talking about big projects, many-month projects. And so tell us first about your consultancy, Datablooz. Data, and then blooz, spelled B-L-O-O-Z. So what does that mean? What is blooz? What’s Datablooz? Tell us about that name-
Luka Anicin: 00:47:04
It doesn’t mean anything.
Jon Krohn: 00:47:05
It doesn’t mean anything.
Luka Anicin: 00:47:05
No, it is a funny story, because I wanted to do blues with the regular spelling, but the domain was bought out, so unfortunately, I couldn’t. So I said, “Okay, what can I spell differently that sounds the same when you read it?” So that’s why I bought the domain Datablooz. And the slogan that I thought of was Play Your Data Like Music, and that’s why it’s Data-blues. Now, we are not using that anymore. But generally, that word doesn’t mean a thing, to me at least, except the name.
Jon Krohn: 00:47:45
Nice. Cool. Yeah. So that’s the etymology of Datablooz, but tell us about the company and where you’re at today.
Luka Anicin: 00:47:52
Yeah. So I started out the company as a solopreneur, basically myself, helping a lot of companies develop their own algorithms, consulting for the founders on those algorithms, or doing corporate training. So basically, I’ve done the whole spectrum myself, and I did that for about two years without any help whatsoever. I started way back when the whole ChatGPT thing was not a thing, so 2021, basically, through the beginning of 2022. And yeah, I started out like that, helping mostly startups build their own initial algorithm, deploying it, and then I would help them with maintaining it, hiring their first hire, educating the founders of those startups on how to manage people in AI and data science, and going from there. For a couple of companies, I understood, when they approached me for that type of engagement, that they didn’t have an overview of what is possible with their data. So basically, they had a single idea, but it was a blurry one, without a clear vision of where to go from there. So I understood that they needed more help understanding what’s possible.
00:49:16
And that’s where I created a way of achieving just that, which is called AI Opportunity Mapping, where I go into the company, work with the founders, and over a couple of workshops, I deliver a clear vision and a roadmap: when to do something, how to do something. We have a proprietary tool for that. So we basically cover the end-to-end strategic picture for founders. And in the first year of my engagement, I helped a couple of startups, and they successfully raised over $25 million, either in raising capital or selling the algorithms that they built. I was a contractor, so a lot of people ask me, “Did you get a cut?” No, I didn’t, unfortunately, but it’s amazing for those startups.
00:50:03
And when the whole thing with ChatGPT came about, a lot of marketing was done for me, so I didn’t have to do a lot of marketing myself, and a lot more companies started approaching me, and that’s when I started to hire more people. And now I have a team of 10 engineers, 5 consultants, and we are hiring nonstop, basically. So yeah, that’s where we are currently. We are doing everything end-to-end, from a strategic overview of where companies should go with their data, with their AI initiatives, to actually implementing them if they don’t have internal teams for that.
Jon Krohn: 00:50:44
Very cool. And how can people reach out to you? If somebody’s listening to this and they want some guidance on machine learning strategy, or training and deploying production models, how do they engage with Datablooz?
Luka Anicin: 00:50:58
Yeah, the best way is to contact me via LinkedIn, or if they want, they can go to the website, but we are currently redesigning it, so it’s not a pretty design just yet, at the time of recording. So maybe in the future, when you listen to this, it’s going to be a good website to go to.
Jon Krohn: 00:51:17
Nice. Yes. That’s datablooz.com, I guess? DataB-L-O-O-Z.com
Luka Anicin: 00:51:23
Yeah, that’s a great question. So one project that is currently in production, that was completely written using PyTorch… well, and Hugging Face, but basically, it’s a recommendation system for people that are sellers inside of a company. So they have B2B sales, they have their own platform where a lot of clients are based, so they see previous transactions, they see the previous contracts, and when they go to a site and start offering, what they would do is basically prepare a script beforehand of what they would like to offer, based on the contract itself. However, in most cases… it’s a really large company, it’s based in Germany, but we are working for different parts of the company… they don’t really know what to offer. So what we do is basically recommend to them in real-time, “Okay, this is going to be a great thing to offer to your clients.” So a recommendation system, basically.
00:52:34
It has tens of thousands of products and a similar number of clients as well. I can’t disclose the real numbers of products and clients, but basically, that’s the gist. So you have a large number of B2B clients, a large number of products, and it’s always recommending a set of products to offer back to them.
Jon Krohn: 00:52:55
And so basically, PyTorch is the no-brainer automatic differentiation library for you in these scenarios because it’s easy for you to collaborate on code with each other, I guess, it’s easy for you to share code with technical people at the client for them to be able to understand the PyTorch code and work with it easily, and then with the ecosystem that has developed around PyTorch in recent years, you are still able to get all of the kinds of strong deployment considerations that you want to have in terms of efficiency.
Luka Anicin: 00:53:25
Yeah, exactly. And in most cases, in a project like this, as you mentioned, a couple of engineers are working on it, so they need a really nice tool so they can collaborate on top of it. Somebody will work on the model, somebody will work on the data processing, everything comes together in the pipeline, and then a third member maybe will be working on the MLOps part. So basically, that’s where it becomes really easy when you’re working with these tools.
Jon Krohn: 00:53:51
Awesome. So to wrap things up here in terms of the technical content in our episode, I’d love to hear from you about your journey getting going. So my understanding from Kirill is that you got started in machine learning by doing one of his courses back in 2017. So tell us about where that started, maybe what your initial interest in machine learning was, and how that led to you now being this instructor with more than half a million students, who runs a consultancy with 10 engineering consultants and growing rapidly? It’s quite a story.
Luka Anicin: 00:54:28
Yeah, it’s true. Yeah. So basically, I started my journey with IT way before that. And I always wanted to work as a software engineer, so that’s what I basically got as a bachelor’s degree. I went in with no ties to data science or machine learning whatsoever. And I met one of the professors at my university, back in 2015, 2016, who had completed his dissertation even before that, in computer vision for a healthcare startup. And I got really excited, because, “Okay, so you can understand things from an image, that is a really cool thing.” I did not know a thing about that. So he started teaching me more about the whole area of AI, and back then, I got really excited about the term AI. But when somebody would mention, “Okay, but why not do machine learning?” I didn’t understand the difference. And I said, “Oh, machine learning is boring. I can’t do that. Come on.” So, no machine learning for me, I thought.
00:55:43
But AI is amazing. That’s what I thought in my first month of learning, without knowing what the difference is or where the overlap is between these two terms. That was my mindset. So no understanding whatsoever. And yeah, I got one project done scraping Twitter and doing some data analysis on top of it. I moved to Python back then, because I was primarily doing Java. And I got a scholarship to move from Serbia to Latvia. And I got really excited about machine learning, and found the free Stanford course, CS231n, on YouTube. And I started learning from that, but it was a step above my understanding back then. So I needed something to introduce me in a more suitable way, a much, much better way to get started.
00:56:43
And that’s where I found out about Udemy, basically. I didn’t know about Udemy back then. So I got there and I said, “Okay, so there is a machine learning course,” and that’s where I got to Machine Learning A-Z, and that was my first course. I got many more courses after that from Kirill and Hadelin, basically. But yeah, it was my first introduction there. It helped me tremendously, from understanding basic mathematics, to more advanced mathematics, to actually learning about Python in more depth, and then learning about Scikit-learn, which we mentioned previously in the episode.
00:57:25
So yeah, I learned a lot from that course, and that’s what prepared me to actually embark on more rigorous studies myself. So that’s where I bought a couple of books on mathematics and then started writing everything from scratch. When I learned about RNNs from their course Deep Learning A-Z, I derived the whole RNN in a notebook. I have a notebook somewhere where I derived the whole network on a dataset that I wrote out on a piece of paper, and it took me like two months. I didn’t do anything except that, training the neural network in a notebook. It’s the worst idea ever, please don’t do that. It did help my understanding, but beyond that, it didn’t help me. So please don’t do it today.
Jon Krohn: 00:58:13
I think that’s a really interesting intellectual exercise though. And so to be clear here, I can tell from the gestures that Luka is making, so when he says he did this in a notebook, he’s talking about paper and pencil, not [inaudible 00:58:24], but that he spent two months literally writing out the math of… not just how the information flows through a recurrent neural network, but actually how you do the model weight updates as well, using real data. Yeah, that is ambitious for sure.
Luka Anicin: 00:58:44
It is, it is. And I did it because it was so difficult for me to comprehend the time-dependent optimization of a recurrent neural network. I didn’t get it from graphs, that’s why I did it. And after I understood that, I said, “Okay, I need now to write a CNN from scratch using just NumPy,” that was my task. And I’ve done it, and I had the worst laptop ever, it was the worst, the laptop that I had to buy for myself because I was a student, no GPU, nothing. So I wrote it, and of course, the code was not optimized. And how I cooled it, please don’t do this, again: I put my laptop in a refrigerator, just to train the neural network, because it was too hot. Yeah. So that was my starting point, but basically, that led me to understanding every single piece of what can go wrong with these models, and how to understand the data. And it took me about a year of studying.
00:59:53
I worked part-time at a software engineering gig. And back then, I landed my first internship by understanding CNNs and RNNs and how they can potentially help with object isolation in images. I landed my first internship, then my first job, at BlueLife AI. That’s where I worked with Kirill and Hadelin. And from that point on, I landed a couple of gigs. So I worked my way up from there.
Jon Krohn: 01:00:25
Nice. From BlueLife to Datablooz. Cool. That’s a great story. It’s so interesting to hear all the overlaps with Kirill and Hadelin, those two juggernauts of machine learning education. But you’re one of them too. And so it’s cool to see you go from student to being just like them. Have you ever met Kirill and Hadelin in real life?
Luka Anicin: 01:00:49
No, not yet. Actually, tomorrow was supposed to be my wedding at the time of recording-
Jon Krohn: 01:00:56
What?
Luka Anicin: 01:00:57
Yeah. Yeah. But the restaurant where we were supposed to have the wedding basically closed, so we moved it to next year. That would have been the opportunity to meet them in person. But unfortunately, we needed to move it.
Jon Krohn: 01:01:17
Yeah, it’s a funny thing also. I’ve never met Kirill or Hadelin in real life, but yeah, maybe a wedding will bring us together soon as well.
Luka Anicin: 01:01:27
Yeah.
Jon Krohn: 01:01:27
Awesome. All right. So thank you Luka for this fun and informative episode. Before I let you go, do you have a book recommendation for us?
Luka Anicin: 01:01:34
So yeah, I’m currently reading one, which is called Psycho-Cybernetics.
Jon Krohn: 01:01:40
Psycho-Cybernetics.
Luka Anicin: 01:01:41
Yeah. I like it. It’s something that I would recommend, because it has some interesting ideas. Of course, as with any book that I’m reading, please always try to weigh the learnings from it. But basically, yeah, that’s what I’m reading right now.
Jon Krohn: 01:01:59
So this is non-fiction, right?
Luka Anicin: 01:02:01
Non-fiction. Yeah.
Jon Krohn: 01:02:02
So it looks like this is by Maxwell Maltz? Is that right?
Luka Anicin: 01:02:08
Yeah.
Jon Krohn: 01:02:08
And so it was written by him in 1960, and it’s like a personal development book. It’s interesting, Psycho-Cybernetics, I thought it might even be a novel about AI taking over.
Luka Anicin: 01:02:20
It does have that name. Yeah.
Jon Krohn: 01:02:25
It sounds really futuristic. Cool. And so for people who want to be following you and hearing your thoughts after this episode, obviously, there’s going to the www.superdatascience.com platform and signing up. They will get access to you, they can be doing the kinds of labs that you run at least once a month in there, which help people develop their portfolios, so they can interact with you through that, and they can message you in the platform. How else should people be following you or reaching out to you?
Luka Anicin: 01:02:54
LinkedIn is the best way to approach. Yeah.
Jon Krohn: 01:02:57
All right. Luka, thanks again for taking the time, and it’s too bad you don’t have a wedding tomorrow, but I guess I benefit, because you probably wouldn’t be doing this podcast episode if you did. All right. Nice. It was great to meet you, Luka, and we’ll catch you again soon.
Luka Anicin: 01:03:12
Great to meet you. Bye.
Jon Krohn: 01:03:19
Great episode today with Luka Anicin. In it, Luka filled us in on how, thanks to its popularity and ease of use, PyTorch should be your default library of choice whenever architecting, training, or deploying deep learning models in Python, how tensors are the fundamental building block of PyTorch, allowing you to flexibly design and automatically differentiate any continuous equation, including any machine learning model, how you should start simple with your ML model at first to get the most accurate and efficient results, how you can develop a working PyTorch project for your portfolio in as little as a couple of hours, and how he grew over the past seven years from an entry level student on a SuperDataScience course, to teaching over half a million students on AI and running a fast-growing AI consultancy. Cool. As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Luka’s social media profiles, as well as my own at www.superdatascience.com/819.
01:04:13
And if you’d like to connect with me in real life, as opposed to online, I’ll be giving a keynote and hosting a half day of talks at Web Summit, coming up on November 11th to 14th in Lisbon, Portugal, with over 70,000 people in attendance. I’m pretty sure it’s the biggest tech conference in the world. It’d be cool to see you there. Other folks speaking include Cassie Kozyrkov, the CEO of Groq, Jonathan Ross, and the Brazilian football legend Roberto Carlos. All right. Thanks to everyone on the Super Data Science podcast team, our podcast manager Ivana Zibert, media editor Mario Pombo, operations manager Natalie Ziajski, researcher Serg Masis, writers Dr. Zara Karschay and Silvia Ogweng, and founder Kirill Eremenko. Thanks to all of them for producing another tremendous episode for us today, and for enabling that super team to create this free podcast for you.
01:05:08
We are deeply grateful to our sponsors. You can support this show by checking out our sponsors’ links. It really does help. Those sponsors’ links are in the show notes for you to check out, to click on. And if you yourself are interested in sponsoring an episode, you can get the details on how you can do that by heading to jonkrohn.com/podcast. All right. Otherwise, share this episode with people who might benefit from it. Review the episode on your favorite podcasting app or on YouTube, subscribe, of course, if you aren’t already a subscriber, but most importantly, just keep on listening. I’m so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there, and I’m looking forward to enjoying another round of the Super Data Science podcast with you very soon.