Kirill Eremenko: This is episode 293 with Data Scientist, Peyman Hesami.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur, and each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today, and now let’s make the complex simple.
Kirill Eremenko: This episode is brought to you by our very own Data Science Conference, DataScienceGO 2019. There are plenty of data science conferences out there; DataScienceGO is not your ordinary data science event. This is a conference dedicated to career advancement. We have three days of immersive talks, panels and training sessions designed to teach, inspire and guide you. There are three separate career tracks; so whether you’re a beginner, a practitioner, or a manager, you can find a career track for you and select the right talks to advance your career. We’re expecting 40 speakers, that’s 4-0, 40 speakers to join us for DataScienceGO 2019. Just to give you a taste of what to expect, here are some of the speakers that we had in previous years: Creator of Makeover Monday, Andy Kriebel; AI thought leader, Ben Taylor; Data Science Influencer, Randy Lao; Data Science Mentor, Kristen Kehrer; Founder of Visual Cinnamon, Nadieh Bremer; Technology Futurist, Pablos Holman, and many, many more.
Kirill Eremenko: This year, we will have over 800 attendees, from beginners to data scientists, to managers and leaders, so there will be plenty of networking opportunities with our attendees and speakers, and you don’t want to miss out on that; that’s the best way to grow your data science network and grow your career. As a bonus, there will be a track for executives. So, if you’re an executive listening to this, check this out. Last year at DataScienceGO X, which is our special track for executives, we had key business decision-makers from Ellie Mae, Levi Strauss, Dell, Red Bull, and more. So, whether you’re a beginner, practitioner, manager or executive, DataScienceGO is for you. DataScienceGO is happening on the 27th, 28th, and 29th of September 2019 in San Diego. Don’t miss out. You can get your tickets at www.datasciencego.com. I would personally love to see you there, network with you, and help inspire your career or progress your business into the space of data science. Once again, the website is: www.datasciencego.com, and I’ll see you there.
Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen. Super excited to have you back here on the show. Today, I’ve got a very interesting guest with me, Peyman Hesami. Peyman is a data scientist who used to work for Qualcomm, and just recently changed his career to work for a startup in the Los Angeles area. A very exciting time for Peyman. He’s quite an advanced data scientist, and here are a couple of things that we talked about in this podcast. Today, we focused predominantly on reinforcement learning. So, if you’re not up to speed with reinforcement learning, then this is the best place for you because today you’ll find out what reinforcement learning is and how it works on an intuitive level.
Kirill Eremenko: You’ll also find out the differences between reinforcement learning versus classification, or other supervised learning methods. We’ll talk about recommender engines. We’ll talk about reinforcement learning and how it’s used for personalization specifically, and what implications that has. You’ll find out a couple of use cases with real numbers; for instance, how Alibaba is using reinforcement learning to accomplish astronomical returns on investment in their advertising. You’ll find out what role reinforcement learning is going to play in the future of machine learning and why. You’ll find out six advantages. We were able to name six distinct advantages of reinforcement learning, so something very powerful to know. Also, Peyman will share some of his career story: how and why he just recently made this transition to work for a startup, how he’s using reinforcement learning, and the biggest mistake he has made with reinforcement learning. So, quite a lot of interesting things coming up in this exciting podcast.
Kirill Eremenko: Also, something to note is that Peyman is going to be one of our speakers at DataScienceGO 2019. So, if you haven’t grabbed your tickets yet, you can find it at www.datasciencego.com. Make sure not to miss out on that, and you can meet Peyman in person there. On that note, without further ado, I bring to you data scientist and expert in reinforcement learning, Peyman Hesami.
Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen; super excited to have you back here on the show. Today, we have a very interesting guest, Peyman Hesami calling in from Los Angeles. Peyman, how are you doing today?
Peyman Hesami: Good. How about yourself?
Kirill Eremenko: Going great as well. Very good timing. For me it’s like 9:00 a.m. For you, it’s almost 4:00 p.m. right now?
Peyman Hesami: 4:00 p.m., yeah.
Kirill Eremenko: I like it because usually it’s either very early in the morning for me, or really late in the evening for the guest, and this one works out quite well, I think.
Peyman Hesami: Yeah. So far, so good.
Kirill Eremenko: Yeah. Well, first of all, congratulations on the new job. You just started off at a startup in LA, and it’s been your first week?
Peyman Hesami: Yeah.
Kirill Eremenko: It must have been a hectic week. How’s that going?
Peyman Hesami: It’s good. I had been working for Qualcomm for the past seven years, and finally I decided it’s time. I’ve been trying to explore the startup scene for a while, and I found a good opportunity out here in LA, and I’ll be doing data science for a startup.
Kirill Eremenko: Fantastic. Congratulations. You said you finally decided it’s time. You stayed at Qualcomm for seven years; I’m assuming it was fantastic. You probably learned a lot and grew, and you wouldn’t have stayed there for seven years otherwise. But when does somebody make the decision that it’s time to leave somewhere they’ve felt comfortable, felt at home, and go explore something new?
Peyman Hesami: For me personally, the major driving force was being close to the product. Having that feeling of product ownership has always been important to me. You can have a similar experience at a big company; but I believe once you enter smaller environments, you can get very close to the product, you can have a more and more meaningful impact on it, and that’s been the driving force for me to make this decision.
Kirill Eremenko: You wanted to see, or you needed to see, more clearly the direct impact you have on this product. It’s a need for contribution that you have, that you want to make people’s lives better, and you actually want to see how you are able to do that, right?
Peyman Hesami: Yes. And I think it might be true not only for me, but for many other people. Once you see the impact that you have made, clearly and very closely, that creates a big feedback loop that drives you further, so that’s been one of the reasons that I’ve been looking for an opportunity like this.
Kirill Eremenko: Was it scary to leave Qualcomm? Because Qualcomm is a very big company in San Diego, right?
Peyman Hesami: It is. I think it’s probably the largest one in Southern California.
Kirill Eremenko: Wow.
Peyman Hesami: My friends were asking me the same questions. I don’t know if it was scary or not, but it was definitely a hard decision, given seven years of living and working at one place; but I think what’s waiting for me, or what I have been expecting is waiting for me, has been something much, much more exciting. So, I think that’s been helping me to overcome any fear, if I had it in the back of my mind.
Kirill Eremenko: I have huge respect for you because jumping into the unknown is like bungee jumping off a bridge: you’re tied to the cord, you kind of know it’s going to be safe, but it’s still so scary.
Peyman Hesami: Exactly, and I do love bungee jumping.
Kirill Eremenko: Me, too. That’s so cool. Where have you been bungee jumping?
Peyman Hesami: There are a couple of places here in LA. I’ve done it once. I wanted to do it once in Mexico, but I was a little scared there.
Kirill Eremenko: Wow. Yeah, me too. I love bungee jumping. I did the first one in Greece. It was only 60 meters, but I fell in love with it so much that I did it seven more times in the span of two days. It was so much fun.
Peyman Hesami: It is fun.
Kirill Eremenko: So, basically huge respect for you. I have a feeling a lot of people listening to this right now are in a similar position, in a similar situation. They’ve been in a job, they’re comfortable, but they are maybe unhappy for other reasons, or they’re ready for their next step; but taking this jump is horrifying, and it’s really stopping them from moving forward. How does it feel? Now that you’ve taken the jump, now that you have this job, you’ve done a week at this new job, how do you feel?
Peyman Hesami: As you said, it’s only been a week; but based on the experience so far, I think I’m happy with the decision I made. It’s been what I had been planning for, it’s been what I’ve been looking for, and I think it’s going to be an exciting journey. So far I’m pretty happy.
Kirill Eremenko: Awesome. That’s very cool. Very cool.
Kirill Eremenko: Peyman, super pumped to have you on the podcast, and one of the reasons is that this will give our audience a chance to get to know you a little bit better before DataScienceGO, because you’re going to be coming to DataScienceGO this year as one of the panelists; and hopefully, once you finalize some things about your talk, as a speaker as well, and that’s super exciting. That’s at the end of September, for anybody listening who doesn’t know. End of September we’ve got DataScienceGO happening.
Kirill Eremenko: So, how do you feel talking at DataScienceGO this year?
Peyman Hesami: I’m excited. I haven’t been able to attend the past conferences; but from what I have heard, it’s a very exciting, interesting conference that is more focused on building a career and building that data science network that you need as a junior data scientist, or as an aspiring data scientist entering this field. So, I’m pretty excited and looking forward to it.
Kirill Eremenko: Fantastic. At the same time, we also have technical tracks for advanced data scientists. Correct me if I’m wrong, but I believe that’s the expertise that you’re bringing. You bring the more advanced technical side of data science to the conference, is that correct?
Peyman Hesami: Yes. Hopefully, I will be having at least one technical talk on some more advanced machine learning stuff that might be interesting to more advanced data scientists, more experienced data scientists, but I’ll try to make my talk simple enough for everyone to at least understand what is going on.
Kirill Eremenko: What is this talk going to be about? Give us maybe a bit of a preview. What do you think it will be about?
Peyman Hesami: As of now, I’m still deciding, but it could be related to reinforcement learning, and specifically personalization using reinforcement learning; that’s a topic that’s kind of new, and one I have been working on over the last couple of years. I’m still finalizing the details, but it will most likely be something around reinforcement learning.
Kirill Eremenko: That’s really cool. I really like that idea. You know why? Because I’ve actually been doing some research for a course we’re recording for executives called, “Artificial Intelligence for Executives,” and one of the sub-topics is reinforcement learning, so I need to come up with a way to explain how reinforcement learning works and what value it brings, which is the most important thing for executives. What you’re doing is more of a technical topic for advanced data scientists: how it works, all the mechanics behind it, and the specifics. For this other course that we’re doing, I’ve got to explain it on an executive level. And so I’ve been researching; what executives care about mostly is the bottom line, the return on investment, the profitability of a business, what impact it can have on a business. I’ve been researching this quite a bit, and one of the examples that I found is pretty crazy… Do you know Alibaba, the company in China that’s like eBay?
Peyman Hesami: Yeah.
Kirill Eremenko: They used reinforcement learning in advertising for displaying ads and bidding; and they were able to develop, as you said, a very personalized system through reinforcement learning that would display ads based on people’s preferences, context, the ad itself, and all these other things that they know about their users. They increased their return on investment by 240% without increasing the advertising budget, just through reinforcement learning: 240% extra ROI. How crazy is that?
Peyman Hesami: That’s amazing.
Kirill Eremenko: This is a big company. This is not just some research thing. This is Alibaba doing it, one of the biggest retail companies in Asia, probably servicing the whole world now. So, I’m pretty excited about this.
Kirill Eremenko: How do you know about reinforcement learning? Is this something you used to work on previously?
Peyman Hesami: Yes. I have been working specifically on personalization; you can use many, many different techniques to achieve some type of personalization, but I have been focused on using reinforcement learning to personalize a device: an electronic device, your mobile phone, or your home speakers, personalized to you as a user and to your environment. That’s been a very, very interesting couple of years that I have been working on this. I believe it’s something that is going to keep going, and there’s going to be more and more research, and more and more projects, coming out of this area.
Kirill Eremenko: I totally agree with you. I was actually talking to someone on the podcast just recently from Microsoft, one of the top researchers in the space of reinforcement learning; that episode was with John Langford, and it was so exciting. Out of the three fields of supervised, unsupervised, and reinforcement learning, reinforcement learning has huge potential simply because it doesn’t require all that labeled data in advance. It doesn’t require a huge volume of data in order to do the training.
Kirill Eremenko: But to get everybody up to speed, in case we have listeners… which is totally fine, and maybe somebody is joining the podcast for the first time, or hasn’t heard of reinforcement learning before… in a nutshell, what is reinforcement learning?
Peyman Hesami: In layman’s terms, I can describe it as training a puppy. Basically, how you train a puppy is you punish the puppy for doing something bad and you reward the puppy for doing something good; and based on that environment, the puppy will learn to do the things that eventually reward it, and this is exactly how reinforcement learning works. We deploy an agent, or in our example a puppy, in an environment, and we let that agent, or the puppy, learn in the environment. What it does is start exploring the environment in a random or non-random fashion; but as this agent, or puppy, faces different scenarios, it will learn how to act to maximize the rewards that it’s getting at the end. Based on this concept, toward the end, let’s say after six months or after one year of training this puppy, or in our case after training this agent for a couple of hours, the agent, or the puppy, will have learned how to operate from that point on.
Peyman Hesami: If the environment changes around it… for example, the puppy changes owners and the behavior or the environment changes… that agent will learn how to adapt to the environment and behave differently. All of this, as you mentioned, doesn’t require any labeled data. All you need to do is define an environment and define what is good and what is bad, what is punishable and what is rewardable. It’s not straightforward, but it’s pretty powerful.
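To make the puppy analogy concrete, here is a minimal tabular Q-learning sketch in Python. The states, actions, and reward values are hypothetical, invented purely for illustration; only the update rule is the standard Q-learning one.

```python
import random
from collections import defaultdict

# A toy "puppy" environment, invented for illustration: states are moods,
# actions are behaviors, and the reward encodes what the trainer considers
# good (treat, +1) or bad (scolding, -1).
STATES = ["hungry", "playful", "sleepy"]
ACTIONS = ["sit", "bark", "fetch"]

def step(state, action):
    """Hypothetical dynamics: return (reward, next_state)."""
    good = {("hungry", "sit"), ("playful", "fetch"), ("sleepy", "sit")}
    reward = 1.0 if (state, action) in good else -1.0
    return reward, random.choice(STATES)

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate
Q = defaultdict(float)                 # Q[(state, action)] -> estimated long-term value

state = random.choice(STATES)
for _ in range(10_000):
    # Explore a random action sometimes; otherwise exploit the best-known one.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward, next_state = step(state, action)
    # Q-learning update: nudge the estimate toward reward + discounted future value.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

for s in STATES:  # the learned policy picks the rewarded behavior in each state
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```

After enough steps, the agent settles on the rewarded behavior in each state without ever having seen a labeled example of "correct" behavior.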
Kirill Eremenko: Got you. That’s a great summary. One of the things that is interesting is that you’re doing this personalization through reinforcement learning whereas most people and companies are familiar with personalization through classification.
Peyman Hesami: Yeah.
Kirill Eremenko: So, if you take Netflix or Amazon, of course we don’t know the details of how exactly their recommender systems work, but one of the key drivers there would be classification, because you need to classify certain videos or products into different categories. Then, based on prior experience, based on that labeled data of how users previously experienced the different products you offer, in what sequence, and how they interrelate, combining it with the classification, you can come up with a recommender system, or something like that. But this approach from a reinforcement learning perspective is entirely different: you’re allowing the agent, the artificial intelligence, to extract on its own any kind of features that would be relevant to this unique individual, and that allows a more targeted approach. Is that a good summary of what the differences are?
Peyman Hesami: That’s very accurate. As you said, Netflix, Amazon, Hulu, they have a recommendation engine that is based on, let’s say, a collaborative filtering algorithm, and what it does is give you a bunch of recommendations. But personalization doesn’t always mean recommendations; it’s a little deeper than that. Personalization, or true personalization, should be automatic and implicit. In the case of Netflix, if that were a truly personalized experience, Netflix would have picked your next movie and you would have been pretty happy with that choice. This is exactly what I’ve worked on over the last few years. An example of that could be a device: your TV, your speaker, your phone, your alarm. These devices usually expose only a limited number of settings for you to change, and you change those knobs to adapt to your environment; for example, you will lower your TV volume because your girlfriend is sleeping. That is basically the way you adapt to your environment. But there are many, many other settings that these devices have that are not exposed to the user, and the user could benefit from changing those and adapting them to the specific environment they’re in, if only they knew how and what to change. Those settings are usually very complex technical settings that you need an engineer to tune for you, for your environment.
Kirill Eremenko: Like what, for example? What’s an example of a setting?
Peyman Hesami: For example, your speaker has many knobs to change the waveform of the sound that it generates for you, but all you have active is probably bass and volume on your speakers. All right?
Kirill Eremenko: Yeah.
Peyman Hesami: There is a reason behind it: those are not user-friendly settings for you. Also, some of those settings don’t need to be changed at all, but many of them can be changed to improve the user experience in specific situations. So, we have this opportunity, and I think the OEM space is also going in this direction, that for any device you can think of, you can come up with a way to automatically and implicitly adapt these settings to each user’s environment.
Kirill Eremenko: Just to clarify, OEM is “original equipment manufacturer”?
Peyman Hesami: Yes. We can basically achieve this using machine learning; and specifically, as you mentioned, using reinforcement learning. What the reinforcement learning model does is learn the optimal value for each of these settings at a user level, so the device settings will adapt to each specific user environment. If I have a speaker in my bedroom, for example, I will have one set of settings and yours will have a different set.
Peyman Hesami: One other example: if you consider any portable device, the battery life of that portable device can be significantly improved if I know the pattern of your usage; the specific user’s pattern of usage will have a big impact on how I optimize the battery to improve its life. It could also be as simple as your bedroom clock learning the pattern of your sleep and auto-setting your alarm for the next day; that could be a very simple example of this. I think the world will get very close to that sci-fi type future of living with robots, where everything is reinforcement learned, and we will eventually have a truly personalized life; but I believe this type of device personalization is a tiny step towards that goal, and it’s already happening. We see it in different devices that come onto the market.
Kirill Eremenko: Got you. Those are some really cool examples. I’ve got another one to add to that. There’s a research group, Electa, I think somewhere in Europe, and what they did is apply reinforcement learning, similar to what you said; this is why I thought of the example, the one you gave with the portable device and energy consumption. Here, they used reinforcement learning to see if they could reduce energy consumption for water heating… I think it was in one of the Scandinavian countries, or somewhere in Northern Europe… where energy, electricity or gas, needs to be used to heat up water, and obviously you can keep the water hot through the whole day to make sure that whenever somebody turns on the tap, they have hot water. But if you take into account the patterns of when people are at home, when people are using water, when people are asleep, when people are not using water, and so on, you can reduce energy consumption because you don’t need to have hot water available when it is not required.
Kirill Eremenko: So, they deployed reinforcement learning across a set of 32 houses, and they were able to reduce consumption by about 20% with no loss of comfort reported by the occupants. Basically, people didn’t even notice that hot water levels were lower or being varied; they were getting the same end-user experience. But the energy saving, 20% less energy: imagine what that means at the scale of a whole country, say the US, if it were deployed across the whole US. That’s a massive saving, especially in more northern countries like Canada, the Scandinavian countries, or Russia, for example. Huge savings in terms of electricity, in terms of energy, and that’s great for the environment as well. So, there are applications like that.
Peyman Hesami: Yeah. That’s exactly what I meant by the sci-fi type future where everything will be reinforcement learned. If you think of almost any example in your life, you can apply some of these concepts and make your life better.
Kirill Eremenko: That’s right. Right away what pops to mind is how much food we waste. Do we really need to buy all the groceries we buy? Or restaurants, how much food do they throw away, or grocery stores? That can also be one.
Peyman Hesami: Exactly.
Kirill Eremenko: There are lots of examples. Through conversations on this podcast, especially if listeners have been following the past few episodes… for instance, the one with John Langford… there are lots and lots of examples like this; and through these conversations, I’m coming more and more to the conclusion that reinforcement learning is actually the future of machine learning and AI. Do you agree with that, or do you have a different opinion?
Peyman Hesami: I don’t know if it will dominate the machine learning space, but I’m pretty sure it will be one of the major driving forces of machine learning going forward.
Kirill Eremenko: Got you. What is the main reason behind that? What’s the main advantage of reinforcement learning, in your opinion?
Peyman Hesami: One of the obvious ones, as you mentioned, is that you don’t require tons of labeled data. As you can see, many problems in the deep learning space… and the space of computer vision, for example… are tied to a lack of high-quality, correctly labeled data.
Kirill Eremenko: Yeah.
Peyman Hesami: And that’s something that is very expensive. We basically have companies or people that manually label those photos to be used by a machine learning algorithm, which is a funny concept, given that we have a very complicated ecosystem of machine learning and we cannot use it to [inaudible 00:30:07].
Kirill Eremenko: I know.
Peyman Hesami: That’s a problem.
Kirill Eremenko: It reminds me of the Flintstones. Remember, they have a car, but they still have to use their legs to move the car.
Peyman Hesami: Exactly, yes, and I think that’s one of the driving forces.
Peyman Hesami: The other thing is basically the power that reinforcement learning has to adapt to its own environment. Right now, if you have a machine learning model trained in production, you need to constantly monitor the performance of this model so that it doesn’t degrade in production; and once it does, and 100% of the time it does, you need to take care of it: you need to take it offline and retrain it using new data, or using a new algorithm, and you need to keep doing this constantly, while [inaudible 00:31:06] reinforcement learning this is basically automatic.
Kirill Eremenko: Yeah.
Peyman Hesami: The model will adjust to its own environment, and I think that’s pretty powerful.
Kirill Eremenko: That’s huge. That’s a massive thing. Model deterioration, model maintenance; some companies don’t do that, but the mistake [inaudible 00:31:26]. I’ve seen models deteriorate from 70% or 80% accuracy down to 48% accuracy, where it’s worse than flipping a coin. It’s ridiculous.
Peyman Hesami: Right.
Kirill Eremenko: So, that’s a really big one, it’s good that you brought that up, and I think the reason behind that is that reinforcement learning is goal oriented unlike supervised learning.
Peyman Hesami: Sure.
Kirill Eremenko: Where you’re like, “Okay, here’s the labeled data. All right. I want you to copy that.” It’s a copy-paste type of solution. Whereas with reinforcement learning it’s, “Here’s a goal. I want you to always accomplish this goal,” whether that’s 200% ROI, or a 20% conversion rate of users, or a 30% reduction in heat loss, or whatever it is. So, you can set a goal, and then the system itself will find a way, as you say, to adapt. It’s very, very powerful that it adapts.
Peyman Hesami: Right. It’s exactly like an optimization problem versus… In the case of supervised learning, you have an optimization problem where you’re basically learning whatever you’re feeding your model.
Kirill Eremenko: Yeah. In that sense, this brings up another benefit, another advantage; a third one is that reinforcement learning is more innovative. Supervised learning can only learn, as you just said, what you’re teaching it, great; but what if you want to come up with new solutions, with new designs? What if you want to design a new type of airplane, or a new type of… Recently, in Boeing planes, they deployed this section delimiter between business and economy, or between different parts of the economy class; that part of the plane that separates these sections, which is I think 30% lighter, massively lighter, than what humans were designing, was completely designed by artificial intelligence, and you cannot accomplish that through supervised learning.
Peyman Hesami: Absolutely not. Unless you spend billions of dollars generating the data; then it would eventually have a similar performance under a supervised learning scheme.
Kirill Eremenko: Yeah. What about this? Maybe you can give me some input on this. I have this thought that because reinforcement learning doesn’t require this supervised data, it is much more bias resistant. Sometimes we come across a problem, for instance in classification, where the artificial intelligence might be exhibiting sexism or racism, right?
Peyman Hesami: Sure.
Kirill Eremenko: For instance, it might be giving out fines to minority groups more frequently than to non-minority groups, and things like that. AIs designed for that, or different AIs that deal with people, might exhibit racism; but the problem there isn’t that the AI is racist, it’s that it’s taking on the bias, inheriting the bias, from the data that it’s learning from through the supervised learning process. Reinforcement learning doesn’t have that problem, in my view; I’d love to get your opinion on this. Reinforcement learning learns on its own; therefore, it wouldn’t be able to pick up that human bias from the supervised data. What are your thoughts on that?
Peyman Hesami: True. Basically, if the machine learning algorithm, or this AI, is racist, it’s because you taught it to be racist, it’s because of your data, and that’s a whole different field. Right now, different companies, and even researchers, are trying to come up with ways to make a model fair: after you’ve trained it, how to handle its predictions in a way that is not biased against parts of the population. Or sometimes you go back and play with your data and try to remove the bias in the data itself. That’s very true.
Peyman Hesami: A reinforcement learning model can also get stuck in an environment and only learn a very limited behavior, but you can easily fix that by expanding the world, expanding the environment that your agent is going to see. The more environments you put your agent in, the better it will learn; but as you said, it doesn’t have that element of bias built into it.
Peyman Hesami: Also, reinforcement learning itself can be used on top of other supervised or unsupervised learning models to make them better as well. For example, if you have a trained supervised model in production, instead of monitoring the performance of your model and making sure it’s stable… and if it’s not, retraining it and changing parameters… you can have a reinforcement learning model on top of it that learns the optimal setup parameters, which in this case could be the hyper-parameters of your model, on a daily or weekly basis, and adjusts your model to that specific environment as well.
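As a rough illustration of that last idea, here is a sketch of an epsilon-greedy bandit choosing among a few candidate hyperparameter configurations for a deployed model, based on observed daily performance. The configurations and the evaluate_in_production function are stand-ins invented for this example, not any particular library’s API.

```python
import random

# Hypothetical candidate hyperparameter configurations for a deployed model.
CONFIGS = [
    {"learning_rate": 0.01, "max_depth": 3},
    {"learning_rate": 0.1,  "max_depth": 5},
    {"learning_rate": 0.3,  "max_depth": 8},
]

def evaluate_in_production(config):
    """Stand-in for measuring one day of model performance under a config."""
    base = {3: 0.70, 5: 0.78, 8: 0.74}[config["max_depth"]]
    return base + random.gauss(0, 0.02)  # noisy daily observation

counts = [0] * len(CONFIGS)    # how many days each config has been tried
values = [0.0] * len(CONFIGS)  # running average reward per config
epsilon = 0.1

for day in range(365):
    # Occasionally explore a random config; otherwise exploit the best so far.
    if random.random() < epsilon:
        i = random.randrange(len(CONFIGS))
    else:
        i = max(range(len(CONFIGS)), key=lambda j: values[j])
    reward = evaluate_in_production(CONFIGS[i])
    counts[i] += 1
    values[i] += (reward - values[i]) / counts[i]  # incremental mean update

best = max(range(len(CONFIGS)), key=lambda j: values[j])
print("Config favored after a year:", CONFIGS[best])
```

Because the bandit keeps exploring a little, it can also drift to a different configuration if the environment, and hence the daily reward, shifts over time.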
Kirill Eremenko: That’s really cool.
Peyman Hesami: Yeah. Basically, you can use it even in conjunction with your supervised learning model; it still helps in those cases to remove bias and to make your model more stable and more fair.
Kirill Eremenko: Wow, those are some really cool examples, and using [inaudible 00:37:37], it’s next level.
Peyman Hesami: Uh-huh (affirmative).
Kirill Eremenko: What about online learning? That’s another big advantage of reinforcement learning: you don’t have to first train the model, spending time doing that, and only then deploy it; you can deploy it right away, and it can combine exploration and exploitation to get your results faster.
Peyman Hesami: Right. That’s one of the main benefits of a reinforcement learning algorithm. You can have a pre-training phase to come up with a default set of behaviors, but you can even skip that; you can just deploy your model in the world and let it learn by itself. Yes, it might take some time for the agent to train well in that environment; but in most of the use cases we deal with, even in industry, you have the privilege of letting your agent train itself in an environment for a while before deploying it to the real world and production. As you mentioned, that’s an amazing feature of reinforcement learning: it continuously learns on its own, even if your environment is completely different from the one it was trained on.
Kirill Eremenko: Yeah. That’s a huge advantage. Again, adaptability and the whole online feature. Is reinforcement learning hard? It sounds like it has massive benefits, massive advantages, for a business to implement reinforcement learning, or more importantly, for a data scientist to master reinforcement learning to the level where they can add value to the organizations that employ them. What does it take?
Peyman Hesami: If you’re asking whether it’s hard to learn, I would say maybe a little bit. The types of things that you have to know are maybe a little more complicated; for example, knowing game theory can help you a lot in building reinforcement learning models, or novel models. But if your question is, “Is it hard to deploy?” I would say no. It has basically the same complexity that any other machine learning algorithm has, with the benefits of being adaptable and requiring less labeled data, or even no labeled data.
Peyman Hesami: But I believe the main problem, the main complexity in reinforcement learning, is how to define your specific problem in the context of reinforcement learning. As I said, yes, you can apply reinforcement learning to many different problems; but how you map your problem onto this space is the tricky part. How you define your environment. How you define your rewards. How you define your punishments. What is rewardable and what is punishable? All of these will have a big impact on how you transition from your problem to a reinforcement learning setting, and that again goes back to the depth of your knowledge in reinforcement learning; the better you know the algorithm, the easier it is for you to map the problem to a modeling problem in RL.
Kirill Eremenko: Interesting. That’s really cool. Can you give us an example of mapping a problem? I understand there are some things you can’t share from your work, and so on; but even just a general example of a problem and how you would map it to reinforcement learning, to give us some context for this insight.
Peyman Hesami: Sure. I would go back to my speaker example. Let’s say I’m trying to map the problem of adjusting the volume of my speaker to my environment. Basically, what I want is for my speaker to be super smart; depending on the environment, depending on the lights, depending on the time of day, I want it to adjust the volume for me. Now I have this problem: how do I map it to reinforcement learning? The first thing you need to do is define your environment. What is punishable? If the speaker volume goes up while I’m expecting it to be very low, that’s punishable; but how do you define a metric of punishment? What if the smart speaker with this reinforcement learning agent increased the volume a little bit more than what I’m expecting in that environment; how do you punish that? Do you punish it, or do you basically ignore that setting? There are very detailed questions like this that you have to specifically define and map; and from there, once you define your environment, you go about simulating and seeing how it works in that specific environment.
Peyman Hesami: This kind of mapping is what I believe is the hardest part, and it depends on two things: your reinforcement learning knowledge, and the specific business domain knowledge that you have on that topic.
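To show what such a mapping might look like in code, here is a bare-bones, Gym-style environment skeleton for the smart-speaker example. Everything in it, the state features, the action space, and the graded penalty for overshooting the desired volume, is a hypothetical design choice; these are exactly the kinds of decisions the mapping step forces you to make.

```python
import random

class SmartSpeakerEnv:
    """A hypothetical mapping of the speaker problem into RL terms."""

    ACTIONS = ["volume_down", "keep", "volume_up"]

    def reset(self):
        # State: hour of day, ambient light level, and current volume (0-10).
        self.state = {"hour": random.randrange(24),
                      "light": random.random(),
                      "volume": random.randrange(11)}
        return self.state

    def desired_volume(self):
        # Invented ground truth: quiet at night, louder during the day.
        # (Only the hour drives this toy rule; light is carried as a state
        # feature an agent could learn from.)
        return 2 if self.state["hour"] >= 22 or self.state["hour"] < 7 else 6

    def step(self, action):
        delta = {"volume_down": -1, "keep": 0, "volume_up": 1}[action]
        self.state["volume"] = max(0, min(10, self.state["volume"] + delta))
        # Graded punishment: the further from the desired volume, the worse.
        # A one-notch overshoot costs -1; blasting music at midnight far more.
        reward = -abs(self.state["volume"] - self.desired_volume())
        return self.state, reward
```

An agent plugged into this loop would be punished more for blasting music at midnight than for being one notch too loud; whether to penalize that one-notch overshoot at all is precisely the reward-shaping question raised above.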
Kirill Eremenko: Got you. Inevitably, in this example with your speaker, you have to be patient, right? You have to go through situations where you’re just sitting there and the volume goes up, and you have to click a button or say something to enact the punishment, so the artificial intelligence will understand that that’s a bad thing; it will need to explore before it can actually come up with the right solution. How long would it take for that exploration to occur?
Peyman Hesami: Exactly. So, there is this period of time where the reinforcement learning agent learns to a point where its behavior starts becoming acceptable, and I would call that the puppy period; that’s basically the period it takes to train your puppy, and it is a crucial metric for some types of reinforcement learning applications like this one. For some applications, you don’t care how long it takes; usually, it might be on the order of minutes or hours, and you can just let the agent learn and then start using it. But in cases like this, where that period is crucial and you want to limit it, or minimize it as much as possible, you have a couple of options. One of your options is to create a simulated environment; so instead of the actual environment that you have in your bedroom, you simulate your environment and let the agent learn in that simulated environment. Then, once you come to the real environment, you have a default set of behaviors that you know how to act on, and you just adapt to the very specific details of your environment that weren’t in your simulated environment. So, that’s one approach.
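One way to picture that first approach, pre-training in simulation to shorten the puppy period, is the sketch below: the same Q-table is trained against a cheap simulated environment first, then keeps learning against the real one. Both environments here are invented one-line stubs, assumed solely to show the hand-off.

```python
import random
from collections import defaultdict

ACTIONS = ["volume_down", "keep", "volume_up"]
DELTA = {"volume_down": -1, "keep": 0, "volume_up": 1}

def simulated_step(volume, action):
    """Crude stand-in for the bedroom: assume the ideal volume is 5."""
    volume = max(0, min(10, volume + DELTA[action]))
    return volume, -abs(volume - 5)

def real_step(volume, action):
    """The 'real' room differs slightly: the ideal volume here is 4."""
    volume = max(0, min(10, volume + DELTA[action]))
    return volume, -abs(volume - 4)

def train(Q, step_fn, steps, alpha=0.1, gamma=0.9, epsilon=0.2):
    """Standard Q-learning loop against whichever environment it is given."""
    volume = random.randrange(11)
    for _ in range(steps):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(volume, a)])
        next_volume, reward = step_fn(volume, action)
        best_next = max(Q[(next_volume, a)] for a in ACTIONS)
        Q[(volume, action)] += alpha * (reward + gamma * best_next - Q[(volume, action)])
        volume = next_volume

Q = defaultdict(float)
train(Q, simulated_step, steps=50_000)  # cheap pre-training: the puppy school
train(Q, real_step, steps=5_000)        # short fine-tuning in the real environment
```

The second call is short because the agent arrives with sensible defaults and only has to adapt to the details the simulation got wrong.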
Peyman Hesami: The other approach is that you can minimize this puppy period in a technical way as well. The better the model you have, the shorter this puppy period is. So, that’s one [inaudible 00:46:17] that you can optimize this puppy period from a technical point of view.
Kirill Eremenko: What do you mean “better model”?
Peyman Hesami: A model that is well-optimized. Reinforcement learning could be implemented using a neural network, for example; and basically the more you optimize that network, the better the outcome of your [crosstalk 00:46:42].
Kirill Eremenko: You mean the architecture is more tailored to the problem.
Peyman Hesami: Yes. Architecture, hyper-parameter. Yeah.
Kirill Eremenko: Got you. Better designs for better optimization, and simulated environments. That actually reminded me that’s how they do self-driving cars. You don’t see Tesla or Waymo… the Google version of self-driving cars, or whatever other companies… you don’t see them just going, “All right, here’s a reinforcement learning algorithm for the car. Go drive,” and then it makes all these [crosstalk 00:47:17].
Peyman Hesami: No. Let’s go on [inaudible 00:47:20].
Kirill Eremenko: It’s like they simulate all of it, right?
Peyman Hesami: Yeah. They simulate. They have simulated environments, which could be an environment where an actual human drove the car and the reinforcement learning agent made decisions in the background, but those decisions weren’t acted on. The human acted, and you still end up with a simulated environment built from a human driving the car. Then, once you have that, you replay that environment, and that’s an actual environment you can use to train your model after the fact.
Kirill Eremenko: Yeah. I was talking to Gary Saarenvirta, I think, on the podcast not long ago, and this was funny; this is actually in one of his other videos online: “You can’t train a self-driving car through classification.” You can’t just show it a thousand photos of a car in a ditch and a thousand photos of a car on the road, and then it will learn how to stay on the road. You have to use reinforcement learning to train it, so it’s really funny like that.
Peyman Hesami: Yeah. The number of scenarios in any environment that involves human behavior is basically infinite, and in that case you quickly realize, “We don’t have enough computing power to basically adapt to those kinds of scenarios.”
Kirill Eremenko: Yeah. Surprisingly.
Kirill Eremenko: Okay, so we can see that you also see the value of reinforcement learning. Where would somebody get started if they want to learn reinforcement learning and get really good at it? What would you recommend?
Peyman Hesami: There are many, many different online courses and workshops specifically on reinforcement learning, so I don’t have any particular suggestion there; but the one thing I can recommend is to learn the theory very well before you jump into building a reinforcement learning model. Once you learn the theory deeply enough, it will be much easier to build, optimize, and change, so that would be my suggestion. Take one of these courses on Coursera, Udemy, or anywhere else you can find one; most of those courses are pretty good. Once you learn the basics behind it, pick a toy project, get your hands dirty, and build your first reinforcement learning model. There is also a lot of documentation and many resources from Google and Amazon, for example; they have a lot of APIs that you can use if you don’t want to learn about reinforcement learning and just want to use it. But if you are going to be building a model yourself, learn the theory and then get your hands dirty.
Kirill Eremenko: Fantastic. In terms of courses, you don’t have to go far: Udemy, Coursera for sure, and there are other providers as well. Even on SuperDataScience, we have three or four courses about reinforcement learning, including Deep Reinforcement Learning 2.0, which we published just recently. So, if anybody is interested, you can find them there as part of SuperDataScience.
Kirill Eremenko: I wanted to ask you, though, when you say “learn theory,” do you mean the mathematics behind it or do you mean the intuitive understanding of how it works? I find that distinction quite important.
Peyman Hesami: The main thing is the intuition behind it and how the model works, but I have learned that it’s a little hard to grasp the intuition behind the model without understanding the math behind it; and I don’t mean knowing exactly what’s going on, but having a feeling for what is being optimized: how do you define your optimization problem, how do you define your loss function, how do you define your environment? Knowing all of those details will help once you get into implementing it.
Kirill Eremenko: Got you. How long have you been studying reinforcement learning yourself?
Peyman Hesami: Close to three years.
Kirill Eremenko: Three years.
Peyman Hesami: Yeah.
Kirill Eremenko: Three years and that’s gotten you to this level. That’s really impressive. That’s a very advanced level that you’ve got into in three years.
Peyman Hesami: I think machine learning in general is like that. It might have a steep learning curve; but once you get past that point, it gets easier.
Kirill Eremenko: Got you. What’s been your biggest mistake in reinforcement learning? Something that you can share with us to help others avoid it.
Peyman Hesami: I was basically trying to apply reinforcement learning to the wrong problem. There was a problem at work for which there were way better, easier solutions, and I wasn’t thinking outside the box at the time because I was very excited about reinforcement learning and wanted to apply it to everything. I think that’s one of the pitfalls, not only for reinforcement learning, but for machine learning in general. I was joking around with my coworkers, telling them, “I’m going to create a course titled, ‘When Not to Use Machine Learning.'” I’ve seen many, many examples where there is a very simple problem that you can solve using basic statistics, and you see people applying a deep neural net to solve it, and I think that’s one of the pitfalls. It’s one of the things I have done myself in the past, and I’m still trying to prevent it, even at this point. Once I’m faced with a problem, I make sure there is no non-machine-learning solution for it before jumping in.
Kirill Eremenko: Yeah. I guess everybody [inaudible 00:54:15] discipline to have an approach that you follow every time when investigating: “Can I solve it with a simple solution? All right, I can’t do that. Can I take it to the next level?” Somebody on the podcast, Andreas Mueller, one of the developers behind scikit-learn, has this very strict approach. He first tries a logistic regression; if that doesn’t work, he tries a random forest; and if that doesn’t work, he goes to XGBoost. You know, you have to have the discipline to follow your own designed approach to investigating problems.
Peyman Hesami: [inaudible 00:54:57], and it’s something you basically do in your life. You basically apply that to nontechnical problems in your life. You always try simpler solutions first before getting into more complicated ones.
Kirill Eremenko: Yeah. Totally agree.
Kirill Eremenko: Peyman, on that note, we’ve approached the end of this podcast, and it’s been a very exciting conversation having you here on the show. Before I let you go, could you please tell us some of the best places to find you, so our listeners can get in touch, follow you, and maybe read more about your career and where it takes you?
Peyman Hesami: As of now, I believe that would be LinkedIn and GitHub. I’m planning to build a better online portfolio and hopefully start publishing technical blog posts; I will hopefully be posting those on my LinkedIn account as well.
Kirill Eremenko: That’s a great idea as well. Fantastic. Of course, those coming to DataScienceGO at the end of September this year will be able to find Peyman there, get in touch, and meet him in person. That will be exciting.
Kirill Eremenko: So, on that note, one more thing I have. One more question I have for you. What’s a book that you can recommend to our listeners that has inspired your career or even your life?
Peyman Hesami: I think one book that I read not too long ago, but that had a big impact on me, was Zero to One by Peter Thiel. Basically, this book discusses how to build new things instead of copying other things. It explains how to go from zero to one, instead of going from one to N, and it’s a very interesting book. It applies to both the technical and nontechnical aspects of your life.
Kirill Eremenko: Fantastic. Zero to One by Peter Thiel.
Kirill Eremenko: Thank you so much, Peyman, for coming on the show. It was super fun to talk about reinforcement learning, and I’ll see you at DataScienceGO.
Peyman Hesami: Thank you. See you, too.
Kirill Eremenko: Thank you so much for being part of our conversation today with Peyman, and I really hope you enjoyed it as much as I did. We covered some very exciting topics related to reinforcement learning: what it is on an intuitive level, how it differs from supervised learning methods, and what role it’s going to play in the future of machine learning, and why. My personal favorite part was discussing the six advantages of reinforcement learning; I think it’s very important to always keep those in mind so you can see where it’s necessary and appropriate to apply reinforcement learning.
Kirill Eremenko: I’ll just list them out here so that you can jot them down if you need to, because I imagine you’re listening to the podcast and maybe you didn’t have time to write everything down, so here it goes: 1) they don’t require large labeled data sets; 2) they’re innovative, meaning they can come up with methods and solutions that supervised learning algorithms cannot, because those simply copy-paste what they’re taught; 3) they are much more bias resistant, because they don’t learn bias from a data set, unlike supervised learning algorithms; 4) they have the benefit of online training, so they can get you results as soon as you deploy them, combining exploration and exploitation to get those results while they’re learning, unlike supervised learning algorithms, which need pre-training before they can be deployed; 5) they are goal oriented, so you can set a target for them and get them to work towards that target, rather than just copy-pasting, as we discussed, what the human was doing previously; 6) and finally, they adapt to environments, so you don’t necessarily need to retrain them over time like other models; they can retrain themselves.
Kirill Eremenko: So, there you go. Those are the six advantages of reinforcement learning that we identified. I really enjoyed that part of our discussion, and of course there were lots of other gems in this conversation.
Kirill Eremenko: On that note, as always you can find the links to the materials mentioned in today’s episode at www.superdatascience.com/293. That’s www.superdatascience.com/293. Of course, don’t forget that Peyman is going to be one of our panelists, and most likely a speaker as well, at DataScienceGO 2019; so you can meet him in person there, shake his hand, give him a hug if you liked this podcast a lot, and of course ask him lots and lots of questions about his experience in data science. If you don’t have your ticket for DataScienceGO yet, this is your chance to get one. Head on over to www.datasciencego.com and secure your seat there. We’re expecting between 600 and 800 data scientists attending, with dozens of speakers and executives, and lots of interesting people to connect with like Peyman.
Kirill Eremenko: On that note, thank you so much for being here. I look forward to seeing you back here next time; and until then, happy analyzing.