This is FiveMinuteFriday, episode number 266, Exploration vs Exploitation.
Welcome back to the SuperDataScience podcast ladies and gentlemen, super excited to have you back here on the show. It’s super windy and that’s because I’m at the top of Mount Batur, in Bali, it’s 5, I think it is like 5:50 AM. And I’m up here, it’s 5:43 AM. I’m here with some friends for the sunrise, the sun is, you can barely see it on the horizon behind Lombok, behind the mountains in Lombok and it’s getting beautiful. It’s just amazing. So it was almost a two-hour hike to get here. Very excited. You might know from previous episodes how much I like hiking. So I thought I’ll just start this recording from here and probably I will continue it, not in such a windy place. So yeah, I just wanted to share this bit with you and will continue in a second.
Alright. And we’re back. So got off the mountain, and a few days later now I’m sitting in a studio. Well, semi studio, a Skype call phone booth at one of the co-working places, at the Dojo in Canggu. And yeah, that was a really cool hike. Hope you got a little bit maybe of a feel for that energy at the start of the audio. And here we’re going to continue with the FiveMinuteFriday episode and the topic is exploration versus exploitation.
So what does that mean? Well, what I’ve noticed is that when we’re building artificial intelligence, we try to mimic the human brain. We try to recreate a neural network, and kind of learn from the way that humans learn or from the way that humans make decisions and do things. Well, there’s at least one thing that I’ve noticed that we can actually learn the other way around. Something we can learn from neural networks and from artificial intelligence itself and integrate more into our life. And that is the concept of exploration and exploitation. So the whole field of reinforcement learning, especially online reinforcement learning is built around the concept of balancing out exploration and exploitation.
And what does it mean? By the way, if you’ve done our Machine Learning A-Z course, we discuss this in quite a bit of detail in the upper confidence bound algorithm and Thompson sampling. But it also applies to other types of reinforcement learning algorithms, which we talk about in the Artificial Intelligence A-Z course. So basically, say you are building a reinforcement learning algorithm which is working online, meaning that data is coming in as it runs. For instance, let’s say it’s a reinforcement learning algorithm that is optimizing advertising for a website. So you have five different ads and they need to be displayed to users. And so every time a user clicks through to a page or lands on the page, it has to pick which ad to display to the user.
And what it can do is gather some data. It needs to build up some data because at the start it has no idea which out of the five ads performs the best. So it can gather some data on the ads, basically by exploring, by trying out these ads, and then it’ll find that, for instance, ad number two performs quite well. Well, at that point it can continue using ad number two and exploit this insight that ad number two seems to be the best. And it can continue exploiting that and getting the higher conversion rate.
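To make that explore-first, exploit-afterwards idea concrete, here is a minimal sketch in Python. It is not the code from either course: the click rates in `TRUE_RATES`, the helper `show_ad` and the function `explore_then_exploit` are all made up for illustration, and a real system would never know the true rates.

```python
import random

# Hypothetical click-through rates for the five ads. The algorithm never
# sees these numbers; they only drive the simulation.
TRUE_RATES = [0.05, 0.15, 0.08, 0.04, 0.06]


def show_ad(ad, rng):
    """Simulate one impression: return 1 if the user clicks, else 0."""
    return 1 if rng.random() < TRUE_RATES[ad] else 0


def explore_then_exploit(explore_per_ad, total_rounds, seed=0):
    """Try every ad explore_per_ad times (at least once), then commit."""
    rng = random.Random(seed)
    n_ads = len(TRUE_RATES)
    shows = [0] * n_ads
    clicks = [0] * n_ads

    # Exploration: show every ad the same number of times.
    for ad in range(n_ads):
        for _ in range(explore_per_ad):
            clicks[ad] += show_ad(ad, rng)
            shows[ad] += 1

    # Pick the ad with the best observed click rate so far.
    best_guess = max(range(n_ads), key=lambda a: clicks[a] / shows[a])

    # Exploitation: show only that ad for the remaining rounds.
    for _ in range(total_rounds - explore_per_ad * n_ads):
        clicks[best_guess] += show_ad(best_guess, rng)
        shows[best_guess] += 1

    return best_guess, shows, clicks


best_guess, shows, clicks = explore_then_exploit(explore_per_ad=100, total_rounds=5000)
print("ad chosen for exploitation:", best_guess + 1)
print("times each ad was shown:", shows)
```

Running it with a small `explore_per_ad` and different seeds is a quick way to see how often the ad that looked best during exploration is not the one with the highest true rate.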
But the thing is, it cannot know at that stage for sure. Because what if ad number two is just a fluke? What if we just don’t have enough data yet to tell? What if that’s a sampling error, and ad number two actually isn’t the best ad out there? Maybe we haven’t done enough exploration and we need to check the other four ads more in order to determine which one is truly the best. And in that case we need to do more exploration. And so in that sense, there’s a balance. What do we do? What do we focus on? Do we focus on exploring these options, and therefore have an opportunity cost because we’re not choosing the best ad, right? One of them is the best ad. And because we’re not displaying the best ad to our users, there’s an opportunity cost of exploring. On the other hand, if we’re exploiting something that we think is the best, there’s a risk that it’s actually not the best and we’re going down the wrong path.
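One way to picture that balance is Thompson sampling, one of the algorithms mentioned earlier. The sketch below is a simplified illustration, not the course implementation. The rates in `TRUE_RATES` and the function name `thompson_sampling` are assumptions, and the "regret" it tracks is just the opportunity cost described above: how much expected click rate was given up by not showing the truly best ad.

```python
import random

# Hypothetical click-through rates; ad number two (index 1) is the best.
# The algorithm never reads these directly, only the simulated clicks.
TRUE_RATES = [0.05, 0.15, 0.08, 0.04, 0.06]


def thompson_sampling(rounds, seed=0):
    rng = random.Random(seed)
    n_ads = len(TRUE_RATES)
    clicks = [0] * n_ads     # times each ad was shown and clicked
    no_clicks = [0] * n_ads  # times each ad was shown and not clicked
    shows = [0] * n_ads
    best_rate = max(TRUE_RATES)
    regret = 0.0             # accumulated opportunity cost

    for _ in range(rounds):
        # Draw a plausible click rate for each ad from a Beta distribution.
        # Ads with little data give widely spread draws, so they still get
        # picked now and then (exploration). Ads with lots of good data
        # give consistently high draws (exploitation).
        draws = [
            rng.betavariate(clicks[a] + 1, no_clicks[a] + 1)
            for a in range(n_ads)
        ]
        ad = max(range(n_ads), key=lambda a: draws[a])

        # Show the chosen ad and record what the simulated user did.
        if rng.random() < TRUE_RATES[ad]:
            clicks[ad] += 1
        else:
            no_clicks[ad] += 1
        shows[ad] += 1

        # Expected clicks lost by not showing the truly best ad.
        regret += best_rate - TRUE_RATES[ad]

    return shows, regret


shows, regret = thompson_sampling(rounds=5000)
print("times each ad was shown:", shows)
print("opportunity cost in expected clicks:", round(regret, 1))
```

The thing to look at is how the impressions are spread: the weaker ads keep getting a few impressions, which is the price of exploring, while most of the traffic should shift to whichever ad the data supports.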
And there are other things out there for us, in this case four other ads, that might be performing better and we just haven’t spent enough time exploring them. So therefore there is a balance, and it’s quite an interesting problem to solve. And the difference between reinforcement learning algorithms, for instance upper confidence bound versus Thompson sampling, is in how each one strikes that balance, or which one is better at balancing those two things out and therefore gets better results. And of course, other learning algorithms also need to look into that problem and address it in their own ways. And what I mean by us being able to copy that from artificial intelligence and integrate it more into life is that we as humans tend to fall into patterns. Once we see that something is working for us, we tend to just repeat that and not go out of our comfort zone in order to explore new things.
For example, you might be in a pattern of taking the same route to work, you know, driving the same way or walking through the same roads and so on. Or you might be in the pattern of going on holiday in the same places. You might be in the pattern of buying the same groceries to cook the same dishes. You might be in a pattern of where you have your coffee or tea during your lunch break or how you spend your lunch break. You might be in the pattern of the type of music that you listen to. So basically, often what happens is as humans, we find a local extremum, something that works really well. And if you shift a little bit to the left or a little bit to the right, it doesn’t feel as good, and therefore we think this is the best option for us. Full stop.
But at the same time, that might not be the case. And if you spend a bit more time and effort on exploration, you might get better results. So for instance, you might feel that the job you’re in is the best option for you. And you’ve tried a little bit to the left, a little bit to the right, and that doesn’t really work for you. But maybe there can be more exploration that we put into effect there. It goes pretty much for anything in life. But the interesting thing I find is that it’s not about just blindly jumping into exploration and replacing all of your habits with just random tries of different new things. No, there’s actually a balance. And that’s what I like about it, that you’ve got to figure out “All right, where’s the balance for me in this specific aspect of my life in terms of exploration versus exploitation”.
So I’ll give you an example. I’m in Canggu in Bali right now, a city which I really like. I don’t think you can even call it a city. It’s more of a town or, you know, a village / town. A very, very nice place, great energy here. And that’s why I come here, because of the energy. And there are lots of things to do here for any taste and habits and whatever you want. You know, you can go surfing, you can work, there’s a great co-working place here. You can go do yoga, there’s lots of yoga places, you can do meditation. You can party. There are whole parts of the town where there are people just partying. You can eat a lot of really great food. You can drink a lot if you want. You can network. There’s lots of different people here, interesting entrepreneurs and people working away or people on holidays. You can stay in your hotel, you can live in an expensive hotel, you can live in an AirBnB, you can live in a homestay, there’s lots of variety here.
And when I was here last year, what I did was predominantly work at the Dojo, which is a co-working place. And then I went to the yoga studio called The Practice. Carl Massy, one of the founders of The Practice, was on the podcast. If you go to www.superdatascience.com/podcast you can search for his episode. I don’t remember it off the top of my head, but it’s a really great episode about being happy. And he wrote the book, The Guidebook to Happiness. Really cool. Really cool guy. Really cool place to practice.
And so when I came here, I already had this routine in mind. I already had in mind how I’m going to exploit what I found last year. And I was going to, you know, go to Shady Shack, which is a place where you can get really good vegan food, a really, really nice place for lunch or maybe dinner, have it by the beach, and then go to The Practice and go to the Dojo. But things didn’t turn out exactly that way. And one of the reasons was that at The Practice, the yoga studio, they no longer have Yin Yoga, which I was a big fan of. And I was actually also open to exploring. So I was thinking to myself, what else can I do?
You know, let’s not get into this whole pattern. And I’m actually glad things worked out this way because the other thing I was looking forward to doing is a bit of calisthenics, where you use your own body weight to train your muscles. And through searching for calisthenics, I found this really cool other gym, which is called Nirvana Strength. And that’s a world class gym for Olympic gymnastics. So you’ve got the rings, you’ve got the stall bars, you’ve got mats, you’ve got everything there and it’s in Canggu. It’s just so crazy. Not every city has that. Even in Australia, I don’t know of even one gym like that, that I’ve heard of or seen, and there’s one massive one here in Canggu and they’ve got a really cool sauna, cold pool and so on. And so I thought to myself, well, let me try that out.
And I liked it so much that instead of going to The Practice every day of the week as I was doing last year, this year I’m going to Nirvana Strength every day of the week. And there’s no right or wrong, I’m not saying one is better than the other, it’s just that for me right now, Nirvana Strength works better. And what I’m learning there, how to do pushups properly, how to do pull ups and whatever else, that’s really exciting and interesting. And had I not been open to exploration, I would not have found that, I would have been in an old pattern. And my life would have been different. Of course I would have learned things and had a great time probably as well. But I’m glad I was open to the exploration because that added something new to my life.
At the same time, it’s a balance. Right? So at the same time I’m balancing out, I’m not exploring crazily around everything. For instance, I’m still, you know, having several meals a week at Shady Shack because I know that’s a great place. I’m exploring some other places to have food, but I am balancing that out with exploiting the places I already know. And also the area of Canggu that I am mostly in, because Canggu is quite spread out. I could be in a different area. But I know I like this area so I’m exploiting being in this area. So basically the moral here is that in life we can fall into patterns, and as humans we tend to fall into patterns of exploitation and we’re not open enough to exploration. It doesn’t mean that we need to go crazy about exploration and just completely forget about exploiting the useful things that we’ve found, the useful patterns that we’ve found. It’s just a matter of balance.
And so my call to you this weekend is to think about where you are in a pattern, where you are over-exploiting things in your life when you could actually be more open to exploration, and it wouldn’t be too risky, it wouldn’t be so far out of your comfort zone that it’s like moving to a new country. That’s okay for some people. For some people that might be a bit too much as a first step. So maybe something small, where you can swap exploitation for exploration. But you have to be aware and prepared that there might be short term losses, not necessarily financial losses, but short term sacrifices that you might have to undergo in order to have that exploration, because any kind of exploration can be hit and miss.
You might get better results, maybe worse results. You might enjoy things more, enjoy things less. But you know, that’s a process. That’s a journey. So where in your life are you prepared to make a little bit of a sacrifice? Take a bit of risk in order to facilitate some exploration and potentially find something new for yourself that you might enjoy as well, or might enjoy more.
So there you go. That’s the balance between exploration and exploitation and how it applies to your life. Thanks so much for being here today. I look forward to seeing you back here next time. Until then, happy analyzing.