This is FiveMinuteFriday: What is Support Vector Regression?
Welcome back to the SuperDataScience podcast, everybody. Super excited to have you back here on the show. In today’s episode, I’m going to attempt something that possibly has never been done before, maybe because there’s a reason it shouldn’t be done on a podcast, but nevertheless, let’s give it a go. I would like to try to explain the intuition behind support vector regression in audio, without any video assistance, and do it all in under five minutes. The reason for that is that I just finished re-recording the SVR tutorials, or rather recording them for the first time, because I actually didn’t get to them at the onset of the Machine Learning A to Z course. So I just finished fixing that part up, and I’m very excited. It’s been a cool ride, and I want to share with you, even if you’re not part of the Machine Learning A to Z course, what support vector regression is all about.
So here we go. Support vector regression. The assumption here is going to be that you know what a linear regression is. Now imagine a typical plot that somebody would use to explain linear regression. It’s got all these points, kind of in a diagonal fashion. So if you put a trend line with ordinary least squares through it, AKA you draw the linear regression, then it’s an almost 45-degree line, or maybe, let’s say, about a 30-degree line going from left to right in the first quadrant of your X, Y plot. So that’s the dataset that we’re dealing with. It can represent absolutely anything, but that’s the plot we want to have in our heads: these points, maybe 50 of them, in a kind of diagonal fashion going from left to right.
Now, if you put a linear regression line through it, we use the ordinary least squares method. It’s pretty straightforward. We measure the vertical distance from each point to the line, we take the sum of the squares of those distances, and we want to find the line that minimizes that sum. That’s ordinary least squares, the basic linear regression.
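To make that concrete, here’s a minimal sketch, not something from the episode itself, of an ordinary least squares fit on the kind of roughly diagonal, 50-point cloud I just described. The data, the slope of 0.6, and the variable names are all illustrative assumptions, and it uses scikit-learn’s LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Roughly diagonal cloud of 50 points, like the plot described above (made-up data)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))        # x values
y = 0.6 * X.ravel() + rng.normal(0, 1, 50)  # upward trend plus noise

# Ordinary least squares finds the line that minimizes the sum of squared residuals
ols = LinearRegression().fit(X, y)
print("slope:", ols.coef_[0], "intercept:", ols.intercept_)

# The quantity OLS minimizes: the sum of squared vertical distances to the line
residuals = y - ols.predict(X)
print("sum of squared residuals:", (residuals ** 2).sum())
```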
Now, with support vector regression it’s a bit different. Imagine you put a line through this dataset. It’s a slightly different line, it can have a different angle, because at the end of the day the ultimate result will be different. And now this line has a tube around it of size Epsilon, and that size is measured vertically along the Y axis. So you have a tube around this line, and this tube is called the Epsilon-insensitive tube.
This term, I believe, was introduced by Vladimir Vapnik back somewhere in the 80s or 90s, and it is the key to support vector regression. So consider this: you’ve got a line that you’re putting through your data. It’s like a regression line, but it has a tube around it, so it extends a distance of Epsilon upwards and a distance of Epsilon downwards. Anything that falls into that tube, that virtual tube we’re imagining around the line, doesn’t count. That’s basically it. Any point that falls into this Epsilon-insensitive tube simply doesn’t count towards any kind of error that we’re calculating. Then, for all the points that are left outside the tube, you measure their vertical distance up to the tube, not up to the main line itself, but up to the border of the tube.
I think that’s pretty easy to picture. So we’ve got all these points, we’ve got this tube going through them, anything that falls inside the tube doesn’t count, we ignore it for the most part, and then you look at the points that are outside the tube and measure their distance to the tube. And there we go. In support vector regression there is another part that you want to minimize, which has to do with the slope of the line. We’re not going to talk about that in this audio; it’s not the main component of support vector regression. The main part, or the most interesting part to us, is that we want to minimize the sum of those distances from the points outside the tube to the tube. Those distances are called slack variables, so we want to minimize not the squared sum, or sum of squares, or anything like that. We just want to minimize the sum of the slack variables.
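If you want to see that tube in code, here’s a hedged sketch using scikit-learn’s SVR with a linear kernel on the same kind of made-up data as before. The Epsilon of 0.5 and the C value are just illustrative choices, not anything prescribed in the episode:

```python
import numpy as np
from sklearn.svm import SVR

# Same kind of made-up diagonal data as in the earlier sketch
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 0.6 * X.ravel() + rng.normal(0, 1, 50)

# Linear SVR with an Epsilon-insensitive tube of half-width 0.5 (illustrative choice):
# points within +/- epsilon of the line incur no penalty; points outside the tube
# contribute slack equal to their vertical distance to the tube's border
epsilon = 0.5
svr = SVR(kernel="linear", epsilon=epsilon, C=1.0).fit(X, y)

# Slack variables: zero inside the tube, distance to the tube's edge outside it
errors = np.abs(y - svr.predict(X))
slack = np.maximum(0.0, errors - epsilon)
print("points outside the tube:", int((slack > 0).sum()))
print("sum of slack distances:", slack.sum())
```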
Why is it called support vector regression? Because every single point on this plot, including the points that are outside the tube, can be represented as a vector. Whether it’s a two-dimensional space or a 27-dimensional space, every point is a vector. So picture your tube: all the points that are inside are grayed out, because they’re not participating in this whole calculation, but for the points that are outside the tube, imagine vectors going to them from the center of your plot, from your zero-zero coordinate. Those vectors are effectively what dictate how our support vector regression looks, what kind of tube we’re actually seeing. So out of all the tubes that we could possibly draw with a height of Epsilon, we are selecting the one that minimizes that other part, the slope of the tube, which again we’re not talking about here, and, the main thing, that minimizes those slack variables, so the distances to the points that are left outside the tube are minimal.
That’s why it’s called support vector regression: because those vectors are effectively supporting this regression. Nothing else matters. The points inside the tube don’t contribute to the slack variables. So there we go. That’s what support vector regression is in a nutshell. Hopefully that was a good enough explanation. The easiest way to think of it is: instead of putting a line through your data, you’re putting a tube through your data, anything inside that tube doesn’t count, and for the points that are left outside you’re calculating the distance to the tube, and you want to minimize that distance.
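And to close the loop on the “support vector” part, here’s one more small sketch, again with made-up data and an illustrative Epsilon, showing how scikit-learn’s SVR exposes which points ended up supporting the fit. Points strictly inside the tube are not support vectors, which is exactly the grayed-out picture described above:

```python
import numpy as np
from sklearn.svm import SVR

# Same illustrative data and epsilon as in the previous sketch
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 0.6 * X.ravel() + rng.normal(0, 1, 50)
svr = SVR(kernel="linear", epsilon=0.5, C=1.0).fit(X, y)

# support_ holds the indices of the points that act as support vectors:
# the points on or outside the Epsilon tube that actually dictate where the tube sits
print("support vectors:", len(svr.support_), "out of", len(y), "points")

# Points strictly inside the tube don't shape the fit at all
inside = np.setdiff1d(np.arange(len(y)), svr.support_)
print("points inside the tube (grayed out):", len(inside))
```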
Why is that good? What applications does it have? Well, it gives your underlying data some flexibility, some room to wiggle around. There’s some wiggle room, meaning that you might have some error that you know is inherent in your data, and that’s normal. For instance, look at financial data. If there’s a trend going upwards, it doesn’t mean it’s always going up. We know in advance there are going to be fluctuations. It’s going to go up a bit, it’s going to go down a bit over the course of a day or a week or a month and so on, but we want to see the overall trend. So we don’t want that up and down to matter; we want to allow the data a margin of error that doesn’t impact our trend line in the end.
So that’s where you would use support vector regression instead of linear regression, because support vector regression allows for that margin of error, allows for that slight flexibility. It’s Epsilon-insensitive, so your data has an Epsilon of room to go up or down, and that can be very useful. Financial analysis is just one example of an application of SVR. You can think of plenty of others, like if you have machinery and the measurements that you’re taking can go up or down slightly, or heart rates, or something like that. Basically, it depends. It really depends on the business knowledge, the domain knowledge that you have about the data. You might choose to use support vector regression. So there we go, hopefully I did a good enough job.
If you’re a part of the Machine Learning A-Z course, and it’s highly likely, because right now that course has 639,909 students on Udemy alone, plus, of course, students on SuperDataScience.
If you’re a part of that course, those lectures are updated, so check them out. I really had a lot of fun creating the SVR intuition, plus I also updated the SVR kernel intuition, which goes into three dimensions, into multiple dimensions, and comes back. That’s really fun as well. Also, Hadelin is updating, and this is crazy, he’s spending months updating all of the practical tutorials from scratch. He’s doing them in Google Colab. So if you want the brand new version, even if you’ve done that course, even if you’ve completed it, do it again. It’s going to be a lot of fun. You’re going to learn how to do it in Google Colab and code in Python, and really smash through everything. So we’re doing a massive cleanup of the course, updating everything. If you’re not part of this course, it’s very easy to find: just go to udemy.com/machinelearning, one word, if you want to buy just that course.
If you’re a part of SuperDataScience, that course is already included in your subscription, so check it out on the SuperDataScience platform, and, of course, you can always join SuperDataScience. Plus, if none of that works for you, if you don’t want to be part of this course, no problem. I made this specific video in the course on Udemy free for preview. So if you just go to udemy.com/machinelearning and scroll down, I don’t remember exactly the number of that video, I could probably look it up right now, but you scroll down quite a bit through the course. It’s in part two, in the fourth section of part two, and it’s the first lecture there. You can watch this video absolutely free. You don’t have to sign up, just get a free preview to reinforce what we learned today.
So, that’s that. Hopefully this was a bit of added value to your data science toolkit, to your machine learning toolkit, and I thank you for being here today. I look forward to seeing you next time. Until then, happy analyzing.