Welcome to episode #007 of the Super Data Science Podcast. Here we go!
Today's guest is top analytics consultant Artem Vladimirov.
Wooh! Just finished this episode with Artem… What a blast!
You see, Artem and I are very good friends from university. We’ve known each other for over 6 years now, and it was so great to catch up on this podcast.
After completing our Master's degrees, we worked together at Deloitte, and then about 2 years ago our paths went in different directions: I went into industry and Artem moved to the Boston Consulting Group…
And since then his Data career has skyrocketed!
Seriously. It would be very challenging to find someone more hardworking and more successful than Artem in the space of Data Science consulting.
Just in the past 2 years, Artem has travelled to at least 6 different countries, from Spain to Hong Kong to India, working on projects worth a million dollars and more.
And the best part is – he built his Data Science career FROM SCRATCH. His degree was completely unrelated to the skills he needed to succeed, but he didn’t give up!
In this podcast you will learn how he did it.
Also Artem will walk us through a real-life case study of using Machine Learning algorithms to solve a business challenge for a large bank. This is INVALUABLE stuff!
Finally, in this podcast you will learn about an absolutely new field that is developing in parallel to Data Science – Advanced Analytics. Artem will share his knowledge of the Simulation space and tips on how to get started here!
And those are just some of the things we talk about.
There is so much value in this podcast, I cannot wait for you to check it out!
In this episode you will learn:
- Advanced Analytics vs Data Science
- Traveling half the World to develop Domain knowledge
- Learning Data Science and Analytics from Scratch
- Random Forest case study walkthrough
- Interpretable vs non-interpretable modeling
- Transforming logic into high-school math
- Overview of Alteryx and AnyLogic
- Keeping the end in mind – focusing on delivering business value
Items mentioned in this podcast:
- Statistical Models: Theory and Practice by
- Categorical Data Analysis by Alan Agresti
- http://www.anylogic.com/ Anylogic Simulation Software
- http://runthemodel.com/ Sample Models – Check It Out!
Kirill: This is episode number 7, with top analytics consultant, Artem Vladimirov.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week, we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
(background music plays)
Hello everybody, and welcome to this super special episode of the SuperDataScience podcast. I hope you're ready for a crazy rollercoaster. This episode is with one of my best friends, Artem Vladimirov. Artem and I go way back. We did our Master's degrees together starting in 2010, and then in 2012 we joined Deloitte and worked there together in the data science department, and then our paths split around 2014. I went into industry, and Artem went to another consulting firm, the Boston Consulting Group, which is a top-tier consulting firm, and he continued doing data science there.
And it was so great to catch up now. It's been a long time. We do talk occasionally, but I don't keep track as much of what he does and his career, and today I learned so much about how he has grown and what a great consultant, world-class consultant, he has become.
I'll give you a few examples. Just in the past two years, Artem has travelled to at least six different countries, all over the place, from Spain and Italy to Hong Kong and India, performing consulting engagements with large clients. And we're talking about deals and projects worth a million dollars, and sometimes even more, which is normal for this organisation, the Boston Consulting Group. So you can tell what calibre of consultant he is.
The very interesting thing I learned from this podcast is that what Artem does isn't entirely classified as data science. It's a mix of different approaches and methodologies, and it's actually called "Advanced Analytics". Advanced Analytics is a bit different to data science, and I think you will find this very interesting. It involves more of a simulation-type approach. I don't know if you've ever played Sim City. Back when we were kids, there was this game where you could build a city, and then stuff would happen in the city, your fire trucks would be dispatched, and you would be controlling the city from a bird's-eye view.
So the simplified way I imagine what he does is that he builds simulation models, which are little miniature models on your computer of, for example, a supply chain, a warehouse, or a company that's producing something on conveyor belts and has some bottlenecks. By adjusting certain parameters in the simulation model, he can identify where the potential bottlenecks are, where the challenges in the supply chain are, how the company should place its warehouses, and so on.
And it's great that in this podcast, Artem actually goes into several case studies in a lot of detail. So Artem will walk us through, like literally walk us through a case study of a project that he did for a bank, where they were performing some modelling, and he will explain exactly which algorithm they were using, it was random forest, and how he thought about it, and I really drill into the questions, and I ask him a lot about the way he thinks about it, what the overall business challenge was, and we learn a lot from that.
Then there'll be a case study about some warehouses which he was optimising somewhere in Europe, the placement of warehouses or storage facilities. So that was also a valuable thing. And regardless of what you're using analytics for, whether you're pursuing a career in analytics, or you're building an analytics culture, an environment, or you're an executive and you want to leverage analytics some more in your business, you will find a lot of value in this podcast. We go into all of these different details and ways you can be applying analytics.
And also, Artem will share a little bit of his background with us. It's very interesting, because I actually knew this but had forgotten, and he reminded me that Artem's background isn't actually in data science or analytics. What he studied at uni was economics and finance. When he went to Deloitte, he really had to develop skills such as R programming, AnyLogic and SQL from scratch. So he didn't have any of these skills, and his career is a great testament to the saying that where there's a will, there's a way.
So just by looking at how he approached this challenge in his life of becoming a data scientist from scratch after university, you will be very inspired to go and do the same. Because if he had the determination and the willpower to persevere and actually achieve the results that he has achieved, and build this super successful career for himself, then you should be inspired to find that same willpower, that same perseverance, and determination to build a career for yourself just like Artem did.
Can't wait for you to check out all of the value inside this episode. You'll notice that we went a bit over time. That's because we got so carried away in all of these discussions. This is a super exciting episode, and without further ado, I bring to you Artem Vladimirov of the Boston Consulting Group.
(background music plays)
Hey guys, welcome to this podcast. I'm super excited, as you can probably tell by my voice. I've got my good friend Artem Vladimirov here. Artem, hey mate, how you going?
Artem: Hi Kirill! Great to talk to you. I am good. And you?
Kirill: Great, great, thanks. For those of you who don't know, Artem and I go way back. We met -- when was it? Like back in 2010, yeah?
Artem: Yeah, 2010-2011, back at uni times.
Kirill: Yeah. We both went to the same university and studied pretty much the same degree. And do you remember that crazy story of how we met?
Artem: Something like you introduced yourself to someone from Zimbabwe? Yeah, I do remember something like that.
Kirill: I tell people still. It is like the stupidest thing ever. Remember, it was our first lecture of our first class. And we both went to the wrong building. Remember that?
Artem: Yeah, yeah, that was true!
Kirill: We were supposed to go to a statistics class, and we were the only two from our statistics class who went to the biology class, or something like that. We didn't recognise anybody, and we didn't understand what was going on. We were just sitting there like two idiots. It was so cool. It was such a coincidence back in the day. And since then, our paths kept crossing, and we had a lot of time to bond and connect. We did all our assignments together, especially the group assignments for economics; that was fun. And then we started working at Deloitte together, yeah?
Artem: That's true. We worked together for several years at Deloitte.
Kirill: It was a fun time. It was -- what was the department? It was called Data Analytics at first, and then it was called Decision Science, yeah?
Artem: Yeah, Data Analytics, and then it was renamed to Decision Science and Analytics.
Kirill: Yeah, remember how we called it DADS for a while?
Artem: Yeah, Decision Science and -- what was it called?
Kirill: Deloitte Analytics and Decision Science.
Artem: Yeah, that was it.
Kirill: Yeah, that wasn't the best choice of name. It's just DADS. Yeah, good old days. Anyway, and since then, I left Deloitte and I went into industry, and then quit that, and now I do this. And you moved to a very exciting and new role which, personally, I don't even know much about. You moved to BCG. Boston Consulting Group. Right?
Artem: That's right. As it happened, I also left Deloitte shortly after you left, but I didn't leave the industry, I stayed within consulting and I joined the Boston Consulting Group, working in the same team in Big Data and Advanced Analytics, with a slightly different office than I had before. But I'm happy to discuss it in more detail.
Kirill: Awesome. Yeah, that'd be great. If you can tell us a bit about it. Because all I can hear from you is like when I want to catch up, or have a chat with you, you're like, oh, I'm in India. Or I'm in America today. Or I'm in Japan. You're like all over the place. And what do you do for a job?
Artem: I'm working in the Big Data and Advanced Analytics team. It's an expert team that provides expertise to our case teams, which do the cases for our clients. I personally work at the intersection of data and advanced analytics techniques, such as geospatial modelling, dynamic simulation and mathematical optimisation. Practical applications of my work include network design and optimisation for financial institutions or retail stores, supply chain optimisation for pretty much any industry, and debottlenecking of manufacturing facilities. So you may guess that it's more advanced analytics than big data. I would say it's probably 25% data science and 75% advanced analytics.
Kirill: Very interesting. And before the podcast, you actually mentioned to me that you're doing more advanced analytics than data science. Could you tell us a bit more, what is the difference between advanced analytics and data science?
Artem: Yeah, sure. So for instance, as I mentioned, I do dynamic simulation. That involves programming in Java, which is one of the tools I use. It means creating models that look like simplified versions of computer games, where you can see things moving around. I create these models for industrial shops where they produce stuff, like metals for instance, and then I use these models to test various scenarios, like how changing some production logic would impact total production in terms of pounds.
And to develop these things, you don't need data per se. You just need estimates. So let's say, what's your average processing time, and what distribution does it follow? What's the maintenance logic for your equipment? For example, whether you take your equipment down once a month for planned maintenance, and then there's a certain probability of unplanned maintenance.
And it's literally just a few numbers for each of these rules. So let's say a 15-minute average and a 5-minute standard deviation for the processing time of equipment A, etc. To get these estimates, of course, you need to do some data crunching, so that's where data analytics comes in to help as well, though I usually ask someone else to do it. To develop the models themselves, you don't need that data crunching; you can just use placeholder values to see how it works. I can use, let's say, 30 minutes, having no idea what the real processing time may be, to develop the model, and then I just feed the real estimates into it. Does it make sense?
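The estimate-driven approach Artem describes can be illustrated with a very small Monte Carlo simulation. This is a minimal sketch, not AnyLogic and not the model he built: it assumes a single machine whose processing time is normally distributed (15-minute mean, 5-minute standard deviation, truncated at 1 minute so times stay positive), plus a hypothetical 2% chance of a 60-minute unplanned stop after each unit. `MachineParams`, `simulate_output` and `average_output` are illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass
class MachineParams:
    mean_minutes: float = 15.0     # average processing time per unit (from the example)
    sd_minutes: float = 5.0        # standard deviation of processing time (from the example)
    breakdown_prob: float = 0.02   # hypothetical chance of an unplanned stop after each unit
    repair_minutes: float = 60.0   # hypothetical duration of an unplanned stop


def simulate_output(params: MachineParams, horizon_minutes: float, seed: int = 0) -> int:
    """Count the units one machine finishes within the time horizon."""
    rng = random.Random(seed)
    clock, units = 0.0, 0
    while True:
        # Draw a processing time, truncated at 1 minute so it stays positive.
        t = max(1.0, rng.gauss(params.mean_minutes, params.sd_minutes))
        if clock + t > horizon_minutes:
            break  # the next unit would not finish before the horizon ends
        clock += t
        units += 1
        # Unplanned maintenance: occasionally the machine stops for repairs.
        if rng.random() < params.breakdown_prob:
            clock += params.repair_minutes
    return units


def average_output(params: MachineParams, horizon_minutes: float, runs: int = 200) -> float:
    """Average output over many seeded runs, smoothing out randomness."""
    return sum(simulate_output(params, horizon_minutes, seed=s) for s in range(runs)) / runs


if __name__ == "__main__":
    shift = 8 * 60  # one 8-hour shift, in minutes
    baseline = average_output(MachineParams(), shift)
    faster = average_output(MachineParams(mean_minutes=12.0), shift)
    print(f"baseline: {baseline:.1f} units, faster processing: {faster:.1f} units")
```

Swapping in a different `MachineParams` and comparing `average_output` values is the scenario-testing idea in miniature, and the placeholder numbers can be replaced with real estimates once the data crunching is done.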
Kirill: Yeah, yeah, it makes sense. So you're kind of like building a little mini version, or a computer version of the factory, or of the supply chain, or of the network, or something, so that you can model it and speed up the process and understand where the issues and bottlenecks will occur in real life. Is that correct?
Artem: Exactly. Or take geospatial modelling, for instance. That's taking geography into consideration in your analysis. You don't need huge data sets to do geospatial modelling. What you need to know is the locations of, let's say, your sources or points of interest, the locations of your competitors, and things like road distances, and then you can find out what the optimal locations for your warehouses are, for example, to minimise total transportation costs for your client.
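The warehouse example is a classic facility-location problem. As a minimal sketch, assuming straight-line distances as a stand-in for real travel distances, a cost proportional to demand times distance, and a short list of hypothetical candidate sites, choosing k warehouses to minimise total transportation cost can be brute-forced as below. Real engagements would typically hand this to an optimisation solver; the function names are illustrative.

```python
import math
from itertools import combinations


def total_transport_cost(sites, customers, cost_per_km_unit=1.0):
    """Sum of demand x distance from each customer to its nearest open site.

    sites: list of (x_km, y_km); customers: list of (x_km, y_km, demand).
    Straight-line distance is an assumption standing in for road distance.
    """
    cost = 0.0
    for x, y, demand in customers:
        nearest_km = min(math.dist((x, y), site) for site in sites)
        cost += demand * nearest_km * cost_per_km_unit
    return cost


def best_warehouse_sites(candidates, customers, k):
    """Try every k-subset of candidate sites; only practical for small instances."""
    return min(combinations(candidates, k),
               key=lambda sites: total_transport_cost(sites, customers))


if __name__ == "__main__":
    candidates = [(0, 0), (10, 0), (5, 5)]               # hypothetical sites
    customers = [(0, 0, 100), (10, 0, 100), (6, 4, 20)]  # hypothetical demand points
    print(best_warehouse_sites(candidates, customers, k=2))
```

Exhaustive search grows combinatorially with the number of candidates, which is why larger networks are usually solved with integer programming or heuristics instead.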
Kirill: Very cool. The way I imagine it in my head right now is like a little Sim City. You know that game, Sim City? Where it's like you're building a city, and then something happens, like a fire breaks out, and your little fire truck has to get there on time, and you kind of like model it. You can try to rebuild a big city like New York, or something like that. So yeah, that's the way I think about it.
Artem: Pretty much, except we don't have disasters!
Kirill: Alright. So that's pretty cool. So in that sense, Advanced Analytics isn't... because when you mentioned Advanced Analytics, I thought it was a step up from data science. From what you're explaining, it sounds like something that's parallel to data science. Is that correct?
Artem: I would say it's something that supplements data analytics, yeah. It's definitely separate from data analytics, but at the same time, the two often complement each other.
Kirill: Ok. Ok. That sounds pretty cool. And so if you can do this all on your computer, why do you have to always go and actually visit the client, whether it's India, Japan, America, and all these other crazy places that you've been to recently?
Artem: That's a good question. As part of my role, I look after the Asia Pacific region, so I do projects for clients not only in Australia but across the whole Asia Pacific region, and occasionally on global projects. For example, I was recently in Europe to do a project for one of our European clients. The reason I have to travel is that, while I can develop all these models remotely from Sydney, where I currently live, which I often do, an important part of my work is understanding the business rules, let's say the rules of the game. You need to understand the business problem first, and then the business rules and constraints that can shape the solution to this problem. And you need to discuss them with the client. So the most efficient way is to discuss these things with the client face to face, and then start developing the model.
And then, after you have, let's say, a first version of the model ready with some preliminary insights or results, you need to validate it. You need to make sure these results make sense in the context of the business. So you do some validation with the client: you sit with the people in the business to understand whether the results make sense. More often than not, the results from the first version won't make total sense, just because it's very hard to take into account every single business rule that can shape your solution the first time around. So you will probably miss something in the initial iteration. And then you try to understand: if your solution doesn't make sense 100%, what did you miss? What realistic element can you add to the model that will shape this solution?
Kirill: Ok, yeah, makes sense. And so you're kind of like visiting these places and talking to these people, and actually seeing the place to develop a certain level of domain knowledge. Is that correct?
Artem: Yes, that's correct. I even visited a metal-making plant, where I worked with the case team on the ground, and I also had to visit the shop itself so that I knew exactly what I was modelling.
Kirill: Can you tell us, just so that our listeners can get a feel for your lifestyle, what countries have you been to in the past two years?
Artem: So I work in Australia, but in the past two years I've been to the States, Japan, Hong Kong, Italy, Spain, Singapore, India and Russia, though that last one was pretty much for holidays.
Kirill: That's crazy.
Artem: That's pretty much it.
Kirill: That's so cool. Even I'm a bit jealous. Like in a good way. I'm happy for you. I'm really happy that you get to travel.
Artem: Thank you.
Kirill: Do you find it stressful, travelling all the time for work? Or do you find yourself doing -- what do you do on the plane? It's such a long flight from Australia to Italy and Spain, and so on. What do you find yourself doing? Do you just keep working all the time?
Artem: If I have to work, I have to work. But very often, it's night flights, so in general, if I'm in business class, I can sleep well.
Kirill: Nice. Nice. Yeah.
Artem: Sometimes you just feel like watching movies.
Kirill: Your frequent flier miles must be through the roof!
Artem: Oh yeah!
Kirill: Which airlines are you with?
Artem: I'm with Qantas. But actually, to be honest, I have not travelled much in the last 6 months, so they're probably going to downgrade me from Gold to Silver, and then I'm also Gold with Singapore Airlines, I also have some status with Emirates and Etihad.
Kirill: Everything. A little bit of everything.
Kirill: Sounds like fun. You mentioned cases. You said you're with the case team, or you're working on a case. That sounds like a police case, or a legal case. I'm assuming it's not, of course. But what is a case? Because I remember at Deloitte, we never had that term. What do you mean by case?
Artem: It's a project. Every company has its own terminology for a project. I think at Deloitte we called them engagements. Here they're called cases, and in some other places they're called projects. But it's essentially a project for the client.
Kirill: Ok. Now that our listeners have a picture of who Artem is, and that, without a doubt, you're very successful in your career and you sound very happy about what you do (and who wouldn't be, travelling the world and doing all these exciting projects), can you please tell us a bit more about your background, so that our listeners can understand the pathway you took to get to where you are?
Artem: Sure. My first degree was in economics, and then I got my postgraduate degree in finance, which is completely different from what I'm doing right now, to be honest. Then I started working at Deloitte in the data analytics team, and to be honest, in the first three months I thought I was going to leave, just because I had used a bit of programming but not much, so I didn't know things like R. I had to learn it from scratch. I had to learn SQL from scratch. So most of the tools I had to learn from scratch. I was, let's say, thrown into the ocean, looking at the SQL scripts they had in place, and I just didn't understand anything, to be honest. I thought I wasn't going to survive for long.
But then I kind of started to understand everything. I spent a lot of time after office hours trying to understand all the procedures, all the scripts, trying to learn the language, and I kind of liked it. And that's how I became a data scientist.
Kirill: Wow, I love it. That's a great story. And I actually forgot about that, because you had told me those things, that you needed to learn even SQL from scratch. So it's a great example: yes, you do have two degrees, a Bachelor's and a Master's, but they're in unrelated fields. Economics and finance are somewhat related, but you still had to build your data science skill set from scratch. And I think that's going to be a big inspiration to a lot of people listening to this podcast who don't know where to start. You're a great example of a person who didn't give up, who just pushed through it with late nights and perseverance, and actually learnt all these tools from scratch. So yeah, that's great to hear.
What would you say is the one biggest piece of advice you can give to somebody who's going to be in the same shoes you were in 4 years ago, or was it 6 years ago?
Artem: Do what you like. If you don't like the area you're working or studying in, think hard about whether you should continue. Your main strength lies in what you like. If you like it, you will find the inspiration and the strength to do it. And just don't give up.
Kirill: That's fantastic. Thanks a lot for that. And speaking of learning R programming, because recently we chatted, and we were talking about R. And if you don't mind me mentioning, you said that you don't use R much, and you're slowly starting to forget that skill. Do you think it's easy if you want to recover it? Do you think it will take you a long time to recover R programming now?
Artem: I don't know, really quickly? In a few hours, I can pretty much remember everything. I think the reason I'm not using R so much is that I've switched to another tool called Alteryx, which is kind of a mixture of SQL and R. Basically, what I used to do with a combination of these two tools, SQL and R, I can now do in one. Alteryx has in-built modules for R, so there are in-built things like regressions and Random Forests, which are based on R code. You can basically run R code in Alteryx. It's quite cool.
Kirill: Ok. Can you tell us a bit more about Alteryx, is it a free software? And also these models that are incorporated, do you need to download libraries to install them, or do they come pre-packaged and stuff like that?
Artem: It's not free; it's commercial software. To be honest, I'm not sure how much a licence costs. As far as I'm aware, it's not too expensive, and it's definitely cheaper than some of the other software we use. It's very good at data manipulation: things like queries, data restructuring, joining tables, aggregations and grouping. But it also has some other in-built modules which allow you to do additional things.
So for example, there is a module called Statistical Model, which is linked to R and can do regressions, Random Forests and neural networks, and it's very easy to set up. It's so easy that you don't need to program; you just drag and drop different elements together and create a diagram. You can also do simple geospatial modelling in Alteryx. For example, if I have a client who is a retailer, and I know the locations of their stores and of their competitors' stores, I can pretty quickly derive something like a 10-minute drive-time radius based on the actual road network, and I can understand what kind of population lives within 10 minutes of our client's stores and within 10 minutes of competitors' stores. Then I can look at the demographics of these catchments, compare them, and do some analytics on that.
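Drive-time catchments need a road network, which is what Alteryx uses here. As a much cruder, self-contained stand-in, the sketch below approximates a 10-minute catchment with a straight-line radius derived from an assumed average speed of 40 km/h, and sums the population of census-style areas whose centroids fall inside it. The speed, the data layout and the function names are all assumptions for illustration, not how Alteryx computes drive times.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def catchment_population(store, areas, minutes=10, avg_speed_kmh=40.0):
    """Population of areas whose centroid lies within an approximate drive-time radius.

    store: (lat, lon); areas: list of (lat, lon, population).
    The radius is a crude proxy: assumed average speed x time, measured as the crow flies.
    """
    radius_km = avg_speed_kmh * minutes / 60.0
    return sum(pop for lat, lon, pop in areas
               if haversine_km(store[0], store[1], lat, lon) <= radius_km)
```

Running `catchment_population` once for a client store and once for a nearby competitor store gives two comparable catchment totals; the same filter could return the matching areas instead, so their demographics can be compared.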
Kirill: Wow, and that all happens within Alteryx?
Kirill: That's really cool. So is that what you predominantly use it for? Or do you also utilise the Random Forest, and neural network algorithms that you mentioned?
Artem: Yeah, I also use it for statistical modelling.
Kirill: Ok, ok. That's very interesting. What are the 2 or 3 modelling algorithms you use most?
Artem: If we're talking about statistics, then it would be linear regressions, Random Forests, GLMs or boosted models. Sometimes you would use things like Random Forests, which are essentially black boxes. You know roughly how they operate, but you don't know exactly how they transform the input data into the final recommendations, so you don't know exactly how each of the attributes you put into the model affects your final output. You can do some sensitivities, but it's effectively a black box. And sometimes I use that, if I don't need to explain exactly how I got to a result and I just need to predict something. For instance, I recently used it to predict the total value that a bank can get from each area in Australia based on its current customer base. So let's say they have current customers distributed across the whole of Australia, but not in every single region: we had 55,000 different areas that we could split Australia into, and they obviously don't have customers in every single area.
Now, what we can do, based on the current customer base, the demographics and the value of the products they take, is infer which other areas would be worth it for this bank if they put a branch there. And I did that using Random Forests, just because I had to make a prediction. I didn't need to explain exactly which demographic attribute drives the uplift in my final metric.
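Since Random Forests come up repeatedly, here is a deliberately simplified sketch of the idea: many trees, each fitted on a bootstrap sample of the data using a random subset of the features, whose predictions are averaged. To stay short and dependency-free it uses depth-one trees (stumps) and draws the feature subset once per tree; a full implementation (such as R's randomForest package) grows deeper trees and re-samples features at every split. The toy data, where hypothetical customer value jumps above age 40, are invented for illustration.

```python
import random
from statistics import mean


def fit_stump(X, y, features):
    """Best single split by squared error; returns (feature, threshold, left_mean, right_mean)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:
        # No usable split (e.g. the sample has one distinct value): predict the mean everywhere.
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_mean, right_mean = stump
    return left_mean if row[f] <= t else right_mean


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample (with replacement)
        features = rng.sample(range(p), max_features)  # random feature subset for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    # The forest's prediction is the average of its trees' predictions.
    return mean(predict_stump(s, row) for s in forest)


if __name__ == "__main__":
    # Hypothetical customers: [age, is_male]; value jumps for customers over 40.
    X = [[age, i % 2] for i, age in enumerate(range(20, 75, 5))]
    y = [100.0 if age > 40 else 20.0 for age, _ in X]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [65, 0]), predict_forest(forest, [25, 0]))
```

The averaging is also why the model is hard to read: no single tree explains the final number, which is the black-box trade-off discussed here.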
However, there are other situations when you would discard Random Forests, or GLMs, or whatever other model you are using, in favour of a much simpler model, something like a linear regression. It can have a bit less predictive power, but its major strength is in the fact that it's interpretable. You can easily interpret it in terms of coefficients. So let's say each of your predictors will have a coefficient associated with it, and the value of that coefficient will basically indicate what's the impact of this predictor on your final outcome. Which can be very, very useful assuming you're doing your statistical analysis right. Assuming there is no multicollinearity and other very scary statistical things.
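To make the interpretability point concrete, here is a minimal ordinary-least-squares sketch using only the standard library. It solves the normal equations and returns an intercept plus one coefficient per predictor, each read as "the change in the outcome for a one-unit change in that predictor, holding the others fixed". The customer data and coefficients are made up. The sketch fails explicitly only when predictors are perfectly collinear; milder multicollinearity raises no error but makes the coefficients unstable, which is why the assumptions mentioned here matter.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(X, y):
    """Return [intercept, b1, ..., bp] by solving (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical customers: (age, has_mortgage) -> annual profit.
    X = [(25, 0), (35, 1), (45, 0), (55, 1), (30, 1), (60, 0)]
    y = [100 + 2 * age + 50 * mort for age, mort in X]
    intercept, per_year_of_age, mortgage_effect = fit_ols(X, y)
    print(intercept, per_year_of_age, mortgage_effect)
```

Here the fitted coefficients recover the made-up rule: roughly 2 units of profit per extra year of age and 50 for holding a mortgage, which is exactly the kind of statement a Random Forest cannot give directly.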
Kirill: Homoskedasticity, yeah?
Artem: Yeah, yeah, something like that. Assuming all major assumptions hold, the coefficients are pretty interpretable and you can basically say how much each predictor, what's the impact of each predictor on your final outcome.
Kirill: That’s really cool. And it’s interesting that you mention that because -- probably this is more for our listeners -- that if you're interested in learning about any of those algorithms, like Random Forests, or linear regressions, and all of the interpretation, you might want to check out the courses on SuperDataScience, which are the machine learning course, and Data Science A to Z. We discuss all those things in a lot of detail, so a lot of the students listening to this podcast actually should be quite familiar with these concepts. The only one I would ask you to clarify a little bit is GLM. What does GLM stand for?
Artem: Generalized Linear Model, which is a more sophisticated version of the linear model. It can use different link functions and error distributions, so it’s not just straight linear relationships that it can test.
Kirill: That’s really cool. And that was a great example about Random Forests, linear regressions and how you would predict the outcome for the bank. I’m still trying to get my head around how you think about that problem in a way that leads you to say, ‘Oh, it actually makes sense. I should probably use a Random Forest algorithm.’ Because at the end of the day, a Random Forest is a combination of decision trees, right? It’s just many decision trees, and then you average out their outcomes. So how would you go about thinking through a business problem to conclude that a Random Forest is the way to go in that situation?
Artem: Well, you start with the business problem as a whole. You need to understand what the business problem is and what the business implications are. Then what I do is spend a few hours in front of a whiteboard, trying to pencil out a solution and draw out a methodology for how I would approach the problem. And the example I gave you was actually part of the problem, let’s say a first step towards solving a much larger one. The problem was that we had to determine the best locations for the bank’s branches, so to do a network optimisation for that bank. So I thought about how to solve this problem. In order to decide where to put branches geographically, you need to know: a) how the value is distributed geographically, so what the potential of each area is; and b) how the branches capture this value.
Having these two pieces together, you can then run a mathematical optimisation, maximising the value that the branches capture based on their locations. However, you don’t know these two pieces in advance. So you need to determine the potential value of each area, and that’s where I used the Random Forest technique. I thought about it: how can I derive the potential of each area in terms of value for the bank? Of course, we have the current customer base: we know their profitability, we know which products they take, we know roughly where they live, and we know their demographics, again roughly, such as their age and gender. We can use this to understand whether these demographics and attributes affect the profitability of a customer; for example, whether older people take more valuable products, take more mortgages, things like that. Then you use these insights to build a statistical model that makes a prediction. Now that I know how my customers’ demographics impact their profitability, and I also know the population of each area in Australia in terms of the number of people and their demographics from the census data, I can run a statistical model that predicts the potential of each area based on the things I know.
And then I thought about which statistical model to use. Obviously, you have a range of different statistical models, such as GLMs, Random Forests, neural networks, or simple linear models. As for the choice of model, what I often do is run several models at once and compare their predictions. I compare the performance of each model and pick the one that performs best, subject to certain considerations, such as whether I need an interpretable model, in which case I would not use Random Forests. The kind of output variable I’m predicting matters too: whether it’s a categorical or a numerical variable will shape the choice of the final technique.
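The second step, choosing branch locations to maximise the value they capture, can be sketched as a small coverage-style optimisation. This is a toy under stated assumptions: each area's predicted value (the potential estimated in the first step) is treated as fully captured if any branch lies within a hypothetical radius and not at all otherwise, distances are straight-line, and every combination of candidate sites is tried exhaustively. The conversation does not describe the actual capture rule used, so treat this purely as an illustration of the structure; the names are illustrative.

```python
import math
from itertools import combinations


def captured_value(branches, areas, radius_km):
    """Total value of areas with at least one branch within radius_km (each area counted once).

    branches: list of (x_km, y_km); areas: list of (x_km, y_km, predicted_value).
    """
    return sum(value for x, y, value in areas
               if any(math.dist((x, y), b) <= radius_km for b in branches))


def best_branch_network(candidates, areas, n_branches, radius_km):
    """Exhaustive search over subsets of candidate sites; only practical for small problems."""
    return max(combinations(candidates, n_branches),
               key=lambda branches: captured_value(branches, areas, radius_km))


if __name__ == "__main__":
    areas = [(0, 0, 50.0), (1, 0, 30.0), (10, 0, 40.0), (20, 0, 10.0)]  # hypothetical potentials
    candidates = [(0, 0), (10, 0), (20, 0)]
    print(best_branch_network(candidates, areas, n_branches=2, radius_km=2.0))
```

With something like 55,000 areas, exhaustive search is out of the question, so a real network optimisation would use an integer-programming solver or a heuristic; the objective, though, has the same shape.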
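The "run several models and compare" step can be sketched as a simple holdout comparison. In this minimal example the candidates are a mean-only baseline, a one-predictor linear regression and a 1-nearest-neighbour model (a stand-in for a less interpretable method). Each is fitted on training data and scored by root-mean-squared error on held-out data, and non-interpretable candidates can be excluded up front. The candidate set, the interpretability flags and the toy data are assumptions for illustration.

```python
import math
from statistics import mean


def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m  # baseline: always predict the training mean


def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    # Predict the outcome of the closest training point.
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# name -> (fitting function, whether the model is treated as interpretable)
CANDIDATES = {
    "mean": (fit_mean, True),
    "linear": (fit_line, True),
    "1-nn": (fit_nearest, False),
}


def rmse(model, xs, ys):
    return math.sqrt(mean((model(x) - y) ** 2 for x, y in zip(xs, ys)))


def pick_best(train, test, require_interpretable=False):
    """Fit each eligible candidate on train, score on test, return (best_name, scores)."""
    scores = {}
    for name, (fit, interpretable) in CANDIDATES.items():
        if require_interpretable and not interpretable:
            continue
        scores[name] = rmse(fit(*train), *test)
    return min(scores, key=scores.get), scores
```

The `require_interpretable` switch mirrors the consideration raised here: when the client needs to understand the drivers, black-box candidates are dropped before the comparison rather than after it.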
Kirill: That really explains it well. Thank you so much, especially for the part where you mentioned the different characteristics you have about the bank's customers: their age, their gender, and other things you know about them. Now it makes sense in my head where the decision trees come from. A decision tree asks questions like 'Are they over 30 or under 30? Are they male or female? Do they work in white-collar or blue-collar jobs?' and things like that. And then the Random Forests algorithm works by combining many of those decision trees, so it makes sense how it would work. So it was a great case study, and thanks a lot for literally walking us through it. I think it's a lot of value for people studying statistics and especially machine learning. Another question I had, since we're talking about some of the work that you've done, and only if you can share this information, of course, because I understand BCG has certain non-disclosure agreements and so on: what has been your biggest recent win in the space of data science or advanced analytics, and what is your biggest challenge in your day-to-day role?
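Kirill's "over 30 or under 30?" question is what a single decision-tree split looks like. A minimal sketch, assuming a regression tree that picks the age threshold which minimises the squared error of profit on each side (one common splitting criterion); the customer records are invented for illustration.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, feature):
    """Try every threshold on one numeric feature and return (threshold, total SSE)
    for the best question of the form 'is feature <= threshold?'."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [row["profit"] for row in rows if row[feature] <= threshold]
        right = [row["profit"] for row in rows if row[feature] > threshold]
        if not left or not right:
            continue  # a split must send customers down both branches
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


customers = [
    {"age": 24, "profit": 100}, {"age": 27, "profit": 120}, {"age": 29, "profit": 110},
    {"age": 34, "profit": 300}, {"age": 45, "profit": 320}, {"age": 52, "profit": 310},
]
print(best_split(customers, "age"))  # (29, 400.0)
```

A full tree repeats this search on each branch and across every feature (age, gender, occupation and so on); a Random Forest grows many such trees and averages their predictions.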
Artem: Tricky question. I can probably mention one of my previous cases, which was in Europe: a network design for a European utility company which I did last year. I'd say it was a recent win that I can share with you. The reason it was a very big win for me personally is that the work involved very advanced modelling techniques. It was very technically sophisticated because it included an optimisation model, so mathematical optimisation, and a simulation model, just to solve one problem for a client, which is quite a rare case, to be honest. It's rare that you need both. Most often you would use either optimisation, for example, if you want to understand the best locations for your warehouses, or simulation to test certain scenarios, for example, to test how certain production initiatives will impact the total productivity of a plant. But in that particular case, the problem required both techniques, as well as some spatial modelling, which together achieved very tangible results for the client. And the client's appreciation for the whole project made it a very enjoyable case for me. So it was a big win for the client and a big win for us. A very nice case, and a very nice team to work with.
Kirill: Wow, that's fantastic! And that's also a good example. I know you probably can't go into a lot of detail about the project itself, but it's a good example for those listening who have their own businesses or who are in managerial or even executive positions. When you think about it, you could just place warehouses anywhere, right? You could just place them wherever it's cheaper. But why would you do that if you can run supply chain optimisation and other analytics to understand what is actually the best location for your warehouse? It's just something that doesn't come to mind right away. And for those listening who have their own businesses, maybe there are other parts of your business that you run based on your intuition, your gut feel, or just common standards and accepted ways of conducting business. Maybe there's a better, data-driven approach that would give you a more optimised solution. So, thanks for that. And what would you say is your biggest challenge?
Artem: I think it's a very, very good point that you just mentioned, because very often, and I see this a lot with our clients, people just use their gut feeling to make certain decisions, right? They base them either on Excel spreadsheets, which don't take into account all the business rules, or on gut feeling and intuition about how the business did it in the past, which is most often not the best way to do things. Let's take this example of warehouses again. You need to put 10 warehouses across the whole country, and you want to place them so as to minimise your supply chain costs. There can be lots and lots of different considerations that impact these costs. Obviously you want to minimise your transportation costs, whether it's road, or rail, or whatever else. You want to minimise your inventory costs. Then there can be other business rules you need to take into account.
For example, I had a case where I had to take into account that when you transport materials from a warehouse to a customer in the same state, there is no tax applied. But if they're in different states, there's an interstate tax that the client pays. This has a big impact on the final solution because it incentivises you to put warehouses in the same states as your customers. And there can be lots of these different business rules that you need to take into account. The capacities of your factories may play a huge role. You may even have limited capacity on your rail or road transportation. All of this shapes your solution.
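A business rule like the interstate tax can be written directly into the cost function. The sketch below is illustrative only: the transport rate, the tax, the distances and the state labels are invented, and the tax is assumed to be a flat amount per ton crossing a state border (the case did not specify how it was levied).

```python
# Illustrative assumptions: rate, tax and distances are made up, and the tax is
# modelled as a flat amount per ton shipped across a state border.
COST_PER_TON_KM = 0.5
INTERSTATE_TAX_PER_TON = 60.0


def delivery_cost(warehouse_state, customer_state, tons, distance_km):
    """Transport cost of one delivery, plus interstate tax if the states differ."""
    cost = COST_PER_TON_KM * tons * distance_km
    if warehouse_state != customer_state:
        cost += INTERSTATE_TAX_PER_TON * tons  # the interstate business rule
    return cost


# A customer in state "A" ordering 100 tons, with two candidate warehouses
options = {
    "same-state warehouse (400 km)": delivery_cost("A", "A", 100, 400),
    "interstate warehouse (300 km)": delivery_cost("B", "A", 100, 300),
}
cheapest = min(options, key=options.get)
print(options, "->", cheapest)
```

With these numbers the farther, same-state warehouse wins (20,000 vs 21,000), which is the incentive Artem describes.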
You just can't take all of this into account if you make your decision based on intuition, or on how your company did it in the past. Mathematical optimisation has become so powerful in the last decade: computing power has increased hugely, so optimisation problems have become much easier to solve in terms of pure processing time, and the algorithms themselves have developed intensely over the same period. And mathematical optimisation can be applied to solve business problems as well, for example, this warehouse problem.
You can describe all these business rules and constraints in the form of mathematical equations. You have an objective, and you have your levers, the things you can pull to shape your solution; in this case, the locations of your warehouses, which the model can change. And then you have your constraints. Your business rules become constraints, such as the capacities of factories, the capacities of your transportation, different taxes, etc. You formulate the problem in a mathematical form that the computer will understand, and it will try to optimise it, that is, find the least cost subject to all the constraints you've put in. In our case, it will determine the best locations of the warehouses: the ones which minimise the cost while satisfying all the constraints.
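The objective, levers and constraints can be seen in a toy instance. This sketch is only an illustration with invented sites, demands, capacities and unit costs, and it brute-forces every choice of open sites and every customer assignment, which is exactly what real projects avoid by handing the same formulation to a mathematical optimisation solver.

```python
from itertools import combinations, product

# Hypothetical instance: every number here is invented for illustration.
SITES = ["S1", "S2", "S3"]
CAPACITY = {"S1": 50, "S2": 50, "S3": 120}          # tons each site can handle
DEMAND = {"C1": 30, "C2": 30, "C3": 40, "C4": 20}   # tons each customer needs
UNIT_COST = {                                        # dollars per ton, site -> customer
    "S1": {"C1": 1, "C2": 2, "C3": 5, "C4": 6},
    "S2": {"C1": 6, "C2": 5, "C3": 1, "C4": 2},
    "S3": {"C1": 3, "C2": 3, "C3": 3, "C4": 3},
}


def best_network(k):
    """Search every choice of k open sites (the lever) and every assignment of
    customers to open sites, minimising total transport cost (the objective)
    subject to site capacities (the constraints)."""
    customers = list(DEMAND)
    best = None
    for open_sites in combinations(SITES, k):
        for assignment in product(open_sites, repeat=len(customers)):
            load = {s: 0 for s in open_sites}
            for cust, site in zip(customers, assignment):
                load[site] += DEMAND[cust]
            if any(load[s] > CAPACITY[s] for s in open_sites):
                continue  # violates a capacity constraint
            cost = sum(DEMAND[c] * UNIT_COST[s][c] for c, s in zip(customers, assignment))
            if best is None or cost < best[0]:
                best = (cost, open_sites, dict(zip(customers, assignment)))
    return best


print(best_network(k=2))
```

Note that opening S1 and S2 is infeasible here (100 tons of capacity for 120 tons of demand), so the capacity rule, not just distance, shapes the answer.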
It is so powerful. We've implemented these techniques with so many clients, and we see huge benefits from using them. You can do things just slightly differently from how you do them now and save lots of money, simply because historically the decisions were not optimal. And optimisation can choose from a practically infinite range of alternative solutions. You can have hundreds of thousands of different locations in the country where you could potentially put a warehouse, and optimisation will choose the best ones.
Kirill: That's definitely something you can't do just with gut feel or on a piece of paper.
Artem: Exactly. And most standard approaches just involve tweaking things, testing different scenarios. Let's say I move a warehouse from location A to location B. You recalculate all the costs, compare the two scenarios, and if it's better, you say, 'We need to move this warehouse to the other location because it will improve the costs.' But what you can't know is whether there is another location, even better than the one you found, which would improve your costs further. Let's say you have just a hundred locations and ten warehouses. I can't do the math in my head right now, but I believe the number of possible ways to place ten warehouses across a hundred locations is enormous: trillions of different combinations.
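Artem's estimate is easy to check: the number of ways to choose 10 sites out of 100 candidates is the binomial coefficient C(100, 10), which Python's standard library computes exactly with `math.comb` (Python 3.8 or later).

```python
import math

# Number of ways to choose 10 warehouse sites out of 100 candidate locations
combos = math.comb(100, 10)
print(f"{combos:,}")  # 17,310,309,456,440
```

That is roughly 17.3 trillion combinations, far too many to test one scenario at a time, which is why solvers search the space systematically instead of enumerating it.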
Kirill: You've raised a couple of very interesting points while describing this problem. Mathematical equations might sound complex, like Fourier transforms or some crazy high-level mathematics, but they're actually not, right? Am I right in saying that the mathematical equations you're talking about are very straightforward, like tenth or eleventh grade mathematics?
Artem: Yeah, that's pretty much right. The equations are pretty simple. The tricky part is formulating the problem in a form the optimiser will understand, and there are also pitfalls. The most common approach is linear programming, where all the equations and constraints are set up in linear form. That's the easiest case to solve, because there are lots of algorithms that solve a problem easily once it's formulated linearly. But some problems, in fact most problems in reality, are non-linear in nature, and there are a few ways to approach that. One way is to use tricks: you can transform a non-linear problem into a linear form using binary variables. Another is to use non-linear optimisation techniques, but those are slightly harder to use. And finally, you can use something like genetic algorithms.
Kirill: Yeah, I'm familiar with those. Those are very popular in the financial world. So can you give us an example of a non-linear problem and a trick you would use to change it into a linear problem? I know it's a hard question, but something simple, just so we get a better understanding of what you mean by non-linear.
Artem: Sure. Let's say you have a fixed cost for a warehouse; I'm going to stick with this warehouse example. If you put a warehouse in a certain location, there are certain costs associated with it. There are variable costs, which depend on your throughput: the more commodities you move through this warehouse, the more you pay, because of handling, inventory costs, etc.
Then there are also fixed costs. Basically, if you have a warehouse, whether you rent it or own it, you pay some money irrespective of how much you use it. Whether your throughput is 1,000 tons or 100,000 tons, you still pay the same amount of money to use this warehouse. It's a fixed cost on the business, and it's effectively non-linear, while variable costs are linear. Let's say you have a unit cost of $1 per ton: with 1,000 tons of throughput it will be $1,000; with 100,000 tons of throughput it will be $100,000. That's linear. But incorporating fixed costs into this formulation makes it non-linear, because you only pay the fixed cost if you use the warehouse at all, so the cost jumps from zero to the full fixed amount as soon as there is any throughput.
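The difference can be checked numerically. This sketch uses the figures from the conversation ($1 per ton variable cost, $25,000 per month fixed cost) and assumes the fixed cost is paid only when the warehouse has any throughput. A linear function satisfies f(a + b) = f(a) + f(b); the variable cost passes that test and the total cost fails it.

```python
FIXED_COST = 25_000.0   # dollars per month to have the warehouse at all
UNIT_COST = 1.0         # dollars per ton of throughput


def variable_cost(tons):
    return UNIT_COST * tons


def total_cost(tons):
    """Fixed cost is paid only if the warehouse is used at all, plus the variable cost."""
    return (FIXED_COST if tons > 0 else 0.0) + variable_cost(tons)


# Additivity test: a linear function f satisfies f(a + b) == f(a) + f(b)
a, b = 1_000, 99_000
print(variable_cost(a + b) == variable_cost(a) + variable_cost(b))  # True
print(total_cost(a + b) == total_cost(a) + total_cost(b))            # False
```

Splitting the same 100,000 tons across two warehouses incurs a second fixed charge, and a purely linear cost expression in throughput cannot represent that jump.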
And there is a trick you can use to transform this non-linearity into a linear problem. You introduce a separate binary variable which says, 'If we have a warehouse in this location, it's 1; if we don't, it's 0.' Then you multiply this binary variable by your fixed cost; say the fixed cost is $25,000 per month for one warehouse. The model will choose, for each location, either 1 or 0, depending on whether it puts a warehouse there, and that value is multiplied by the fixed cost. If it's 1, it's multiplied by 25,000, so you pay the fixed cost. If it's 0, the cost is 0.
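Written as a mixed-integer linear program, the trick looks roughly like this. It is the standard textbook fixed-charge formulation, offered as a sketch rather than the model used in the case, and the symbols are our own: y_i is 1 if warehouse i is open and 0 otherwise, x_ij is the tonnage shipped from warehouse i to customer j, F_i is the fixed cost (25,000 dollars per month in Artem's example), c_ij the variable cost per ton, d_j the demand of customer j, and U_i the capacity of warehouse i.

```latex
\begin{aligned}
\min_{x,\,y} \quad & \sum_i F_i\, y_i + \sum_i \sum_j c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_i x_{ij} = d_j && \forall j \\
& \sum_j x_{ij} \le U_i\, y_i && \forall i \\
& x_{ij} \ge 0, \quad y_i \in \{0, 1\} && \forall i, j
\end{aligned}
```

The capacity constraint is what ties the binary variable to the throughput: a closed warehouse (y_i = 0) can ship nothing, so the only way to avoid paying F_i is to not use site i. Every term is linear in the decision variables because F_i and U_i are constants.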
Kirill: So that’s how you transform a non-linear problem into a linear one?
Artem: Yeah, you just introduce additional variables, most often binary variables, which basically add some extra logic to the model.
Kirill: Okay. Yeah, that totally makes sense, and it's actually a very interesting example. I learned something there myself. You don't think about it, but these constant costs for the warehouse in this scenario really are non-linear: they don't increase with your throughput. So you need a way to deal with them, and that's a handy trick. Thanks for that. It reminds me of dummy variables in regression, where you have a categorical variable and you need to introduce a dummy variable that is 1 or 0.
Artem: Yeah, exactly. Another good example is when the constraints don't have to be met simultaneously, but there is a condition that if the first constraint is met, then the second constraint also has to be met. There are tricks to transform this non-linear logic into a linear set of equations as well. I don't remember off the top of my head exactly how to do it, but again, introducing one or two binary variables can transform it into a linear problem.
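One standard textbook way to encode "if constraint 1 is met, constraint 2 must also be met" uses a single binary variable z and a big-M constant. This is a generic sketch, not the formulation Artem used; it assumes linear constraints of the form g_1(x) ≤ b_1 and g_2(x) ≤ b_2, a constant M large enough that each inequality is non-binding when switched off, and a small tolerance ε > 0.

```latex
\begin{aligned}
g_1(x) &\ge b_1 + \varepsilon - M z && \text{(if } z = 0 \text{, constraint 1 must be violated)} \\
g_2(x) &\le b_2 + M (1 - z) && \text{(if } z = 1 \text{, constraint 2 must hold)} \\
z &\in \{0, 1\}
\end{aligned}
```

Read together: if constraint 1 holds, then g_1(x) ≤ b_1 < b_1 + ε, so z = 0 is impossible and the solver must set z = 1, which switches constraint 2 on. If constraint 1 is violated, z = 0 is allowed and constraint 2 is relaxed.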
Kirill: All right, thanks. It sounds like a very interesting field, so we'll get back to that. I have some other career-related questions. But I also wanted to comment on what you said earlier: people, especially business owners, who do not use data and data science in their decision-making process are making an unforgivable mistake, because computers, and algorithms along with them, have developed so rapidly over the past decade that you should be using them.
One of the things that comes to mind on that topic is that back in the day, when decision trees were first brought to life, they were popular. But then they kind of died off because more sophisticated algorithms took their place, like linear regressions, logistic regressions, support vector machines, and so on. Now, even though individual decision trees are not as powerful, computers have become so powerful that we've got algorithms like Random Forests or gradient boosting, which take those earlier methods, decision trees, and use them in an ensemble. So instead of having one decision tree, you have 500 or 50,000 decision trees working for you, and as an ensemble they make better predictions than any individual decision tree. So it is exactly the case that both algorithms and computers have developed so rapidly over the past decade that it is easy to come up with a model, or even just to hire somebody like BCG or another consulting firm to help you place those warehouses or solve whatever you're trying to solve using data science. So that was a great comment and I totally agree with you. Moving back to what we started talking about: what is your daily challenge? What is the most challenging thing in your role?
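The "many weak trees beat one" idea can be seen in a small bootstrap-aggregation (bagging) sketch, the core idea behind Random Forests. As a simplification, each "tree" here is a single-split stump on one feature and the ages and profits are invented; a real Random Forest grows deeper trees and also picks a random subset of features at each split.

```python
import random


def fit_stump(rows):
    """Fit a one-split regression tree on (x, y) pairs: pick the threshold that
    minimises squared error and predict the mean on each side."""
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # every x identical in this sample: predict the overall mean
        mean = sum(y for _, y in rows) / len(rows)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_bagged_stumps(rows, n_trees=500, seed=0):
    """Bagging: train each stump on a bootstrap resample, then average predictions."""
    rng = random.Random(seed)
    stumps = [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


# Hypothetical (age, annual profit) pairs
rows = [(22, 90), (28, 130), (35, 210), (41, 260), (47, 330), (53, 350), (60, 420), (66, 400)]
model = fit_bagged_stumps(rows)
print(round(model(25), 1), round(model(60), 1))
```

Each stump can only output two values, but averaging hundreds of them trained on different resamples gives a smoother prediction curve than any single stump.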
Artem: The challenging thing is that very often I have projects with huge amounts of data that I need to handle. For example, my previous project involved something like 30 to 40 different data sets that I had to manage pretty much daily. I needed to remember what information was located in which data set, how to link all these data sets, and what the different fields meant. If I needed to pull out some additional information that wasn't in my analytics data set yet, which original data set would I go to? That was quite tough to handle, because I was the only person on the case doing the data analytics and Advanced Analytics work. The way I overcame it is that I built some very quick, very basic tools for myself in Excel. I had a list of all the data sets with the corresponding business owner on the client side, so I knew who to go back to if I had any questions or if the data was slightly off. And I had a list of comments on fields, plus general comments like 'Oh, this field is not reliable. Don't use it,' etc.
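Artem kept his tracker in Excel; the same idea, a catalogue of data sets with their owners, columns, join keys and warnings, can be sketched in Python to show the questions it needs to answer. All data set names, owners and columns below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One row of the tracking sheet for a single data set."""
    name: str
    business_owner: str  # who to ask on the client side (hypothetical names below)
    columns: set[str]
    join_keys: set[str] = field(default_factory=set)            # columns used to link data sets
    column_notes: dict[str, str] = field(default_factory=dict)  # warnings about columns


CATALOG = [
    DatasetEntry("customers", "Client CRM team", {"customer_id", "postcode", "age"},
                 join_keys={"customer_id"}),
    DatasetEntry("transactions", "Client finance team",
                 {"customer_id", "product", "revenue", "channel"},
                 join_keys={"customer_id"},
                 column_notes={"channel": "Not reliable. Don't use it."}),
]


def where_is(column):
    """Which original data sets contain this column?"""
    return [d.name for d in CATALOG if column in d.columns]


def owner_of(name):
    """Who on the client side owns this data set?"""
    return next(d.business_owner for d in CATALOG if d.name == name)


def link_keys(a, b):
    """Columns that can be used to join data set a to data set b."""
    da = next(d for d in CATALOG if d.name == a)
    db = next(d for d in CATALOG if d.name == b)
    return da.join_keys & db.join_keys


def notes_for(column):
    """All recorded warnings about a column, by data set."""
    return {d.name: d.column_notes[column] for d in CATALOG if column in d.column_notes}


print(where_is("customer_id"))                 # ['customers', 'transactions']
print(owner_of("transactions"))                # Client finance team
print(link_keys("customers", "transactions"))  # {'customer_id'}
print(notes_for("channel"))                    # {'transactions': "Not reliable. Don't use it."}
```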
Kirill: Okay. And is that something that you continue doing on new projects now?
Artem: It depends. I generally don't work with lots of data sets now that I'm at BCG. If I have, let's say, a supply chain optimisation case, of course you need data, things like transportation data to understand what the historical price rates are, just as an example. But usually someone else does that for me, and I just use their calculations in my modelling. So I only use this technique or trick when I have very, very large data sets to work on.
Kirill: Okay. All right, that's a good example of a challenge, and maybe some of our listeners can learn from it: don't get lost in all the data sets you have, and make sure to keep track of them from the very start. Now that we know a bit more about exactly what you do and this style of analytics, I'm sure a lot of our listeners will find this a new approach, a new field in analytics that they haven't explored before: simulation-type analytics, and Advanced Analytics. What would you say is the one most important thing for somebody to look into to get into this field? Not everybody has to go through the same pathway you did: data science through Deloitte, learning R programming, and so on. This sounds like a field you can get into even if you don't have a passion for R programming or Python or SQL; if you have the right mindset, you could probably get into it. What is that one thing people should focus on in order to break into the field of Advanced Analytics?
Artem: Mindset is very important. You have to like this thing in order to start learning it and to motivate yourself to keep learning it; otherwise you have no chance. For example, the moment I saw my first simulation model, which was an animated supply chain with trains moving around, I loved it. I wanted to be doing this! I wanted to build something like that! At the time, I thought about doing something like that in R, which was pretty much impossible, and then I learned about other software packages that can do these things, and I started to learn them, which helped me a lot.
Then, on more specific skills: if we take simulation modelling, you also need to know some programming, because all of the software I know of is based on some kind of programming language. The one I use, for instance, is based on Java. You don't need to be a hardcore Java programmer, but you need to know the basics; that's the minimum. Ideally, from the beginning, you should have an intermediate level of programming in the language the tool is based on.
Kirill: Okay, that's a good one. And are there any open source tools, or maybe even just websites, that people who want to try their skills in this field can go to, or free tools they can download just to get a feel for it, like a playground? Can you suggest any free tools?
Artem: As far as I know, this is a very commercial area, so all of the tools I know about are commercial and, unfortunately, not free. The one I use, for instance, is AnyLogic. They have a free trial version available on their website. They also have a so-called student or educational version: if you are a university student writing coursework which requires some simulation modelling, I believe they can provide it for free as well, so you can try that. They also have a website called runthemodel.com, which is a repository of models built in AnyLogic, with models across all industries, whether it's supply chain, manufacturing, or finance. You can find lots and lots of different models there, and if this is something you might be interested in, I highly encourage you to go to this website and check out the different models, just to get a feel for what it is and whether it's something you want to try.
And also, in order to do these kinds of things, you need business acumen. No one wants these models per se. No one cares whether you build a simulation model or an optimisation model. Consulting is actually very, very tough on that: 99 percent of consulting is about delivering value for our clients, and in 99 percent of cases this value is expressed in dollar terms. Companies are interested in how they can use these models, or the insights from the analysis, to generate more revenue or reduce their costs. That's where my economics background and, more broadly, my knowledge of how businesses operate help a lot. So if you want to try something like this, you also need this business acumen. People are interested in what they can do with these models: how the models can drive their profitability by increasing revenue or decreasing costs. That's where knowing how a business operates helps.
Kirill: That's awesome. It's very good advice. It's very easy to get carried away doing the analytics and not actually think about how it's going to drive dollars for the business. It might sound a bit cynical, a bit too money-focused, but that's the world we live in. We live in a capitalist world, and most of the time people, especially businesses, care about the dollar value. So it's very important to keep that in mind when you're building a model, and, as you say, business acumen helps a lot.
And the other thing you mentioned, runthemodel.com: I'm super excited about that. Everybody who's listening, jump on your browsers, go to runthemodel.com and check out those AnyLogic models. I'm going to do that personally as well. I'm really curious, because I've seen some of the AnyLogic models that you've created, Artem. They were very powerful and very exciting to look at, so I would love to see more of them and understand how they work.
Artem: And just to add to my previous point, I once heard a saying which I quite like; I'll paraphrase it slightly. Imagine a chart where the X-axis represents time and the Y-axis represents the level of granularity of your work, that is, how deep you go down the rabbit hole. The bottom of the chart is a very, very granular level of detail, and the top of the chart is the C-suite level, the CEO, the CFO, etc. Can you picture that? There may be different opinions on this, but I'd say that when you work as a data scientist, especially in consulting, you start very high on the Y-axis. You start with the big picture of the problem and its business implications, and then you go very deep into the data, into the detail. You crunch the data, analyse it, and derive some insights. Then you go back up to the high level with some preliminary insights or results, and you check and validate them at the high level. Then you go back down to the number crunching, and so on and so forth. So you're almost never in the middle. You spend a lot of the time cutting trees at the bottom, if you like, but you also jump up high to see the whole forest so you don't get lost in the trees. That's what I quite like about my area, this velocity: you need to go down and then come up again. Pretty interesting.
Kirill: Yeah, that’s a great analogy. I’m just drawing it and yeah, it’s how it looks.
Artem: Of course, in industry it may be slightly different. If you have a boss, you won't bypass your boss and go directly to the CEO or CFO to present your findings. Unfortunately, you need to go through this middle level.
Kirill: Yeah, yeah, middleman. Thank you very much for that, Artem. I just have two last questions. So the first one is, where can our listeners find you, how can they follow you if they want to learn more about your career and maybe connect with you?
Artem: LinkedIn would probably be the best option. They can just search for my name on LinkedIn. I wouldn't expect many people with the same first name and surname to pop up, so hopefully you will find me pretty quickly.
Kirill: That's great. We'll include that in the show notes at SuperDataScience.com, where you'll be able to find the show notes for this episode with a link to Artem's LinkedIn. And one final question: what is the book? We usually ask about a book, but in this case we had a specific request from Bo (big shout out to you, Bo, in the U.S.), who is interested in learning more about statistics and would like a recommendation for a book on statistics that could help him get into the field and develop some advanced knowledge. Can you recommend a book on statistics for our listeners, including Bo?
Artem: Sorry Kirill, can you give me a bit more detail: is it basic or advanced statistics, or a particular technique that Bo is interested in? I have a few different options depending on what it is.
Kirill: When we spoke with Bo, he said he was interested in a more advanced level of statistics. I think he was working with Microsoft, as a consultant or something like that, and his problem was that the company he was working with didn't like the findings he presented, simply because he didn't present them in a statistical enough fashion. There were no distributions, and he didn't talk about standard deviations; he just gave them numbers and charts, but the company on the other end wanted deeper statistical backing, to actually prove that these were statistically significant results. So something at the more advanced level of statistics.
Artem: In this case, I can probably suggest the book "Statistical Models" by David Freedman. It includes some basic material as well, but it's a good overview of statistical models, and it's one of the classic statistics books at several universities, including Berkeley, so I would recommend it. There are also lots of books on specific techniques. This area is becoming very advanced, and techniques like Random Forests, GLMs or boosted models have their own books devoted to just that one technique. For example, if you're interested in GLMs, there is a very good book called "Categorical Data Analysis" by Agresti (I believe that's how it's pronounced), so have a look at that if you're interested.
Kirill: Okay, fantastic! So that’s "Statistical Models" by David Freedman and "Categorical Data Analysis" by Agresti so I will definitely put those into the show notes. Do you have any final comments? Maybe other books or maybe something that you’d like to wish our listeners on their way into becoming data scientists as successful as yourself.
Artem: That's a huge compliment from your side, Kirill! Thank you. Look, my last piece of advice is: don't get lost in the tricks. Data science involves a lot of number crunching, and you can easily get lost in the data, in the numbers, etc. I've seen many people do that. But always remember that you're doing this because there is a business problem that requires it. What people, and company executives in particular, are interested in is how they can use your work and its results to improve their balance sheet or their profit and loss. Once you understand that, you will be able to understand what the business problems are, and even identify business problems yourself, being proactive about finding areas where you can add value as a data scientist. Very often people don't know how they can use data science to improve their current operations. It's actually part of your job to tell them, 'Look, we can do this, for example, customer segmentation, which will allow us to do this, this and this, and to improve our market income plan, etc.' Just be proactive, think about the business problems you can use your skills to solve, and proactively engage with business stakeholders to use your analysis to solve these problems.
Kirill: Fantastic! Thank you very much. So guys, to sum up the advice: keep the endgame in mind. Always remember why you're doing what you're doing. Thank you very much, Artem. I really appreciate you taking the time out of your busy schedule to share your knowledge and insights. This was a fantastic catch-up, I'm very excited about it, and I'm sure a lot of our listeners will learn so much from what you've shared. Thank you so much.
(background music plays)
Artem: Thanks, Kirill. It was a pleasure for me.
Kirill: And there you have it. I am still so excited about this episode, and I hope you got a lot of value from it. I learned a lot. For me, the most mind-blowing thing was the whole concept of Advanced Analytics, how it's different from data science, and that you don't really need to develop those data science skills if you want to get into Advanced Analytics. Yes, you will need to know modelling. Yes, you'll need to know a bit of stats. But ultimately, you don't have to take the same pathway Artem did. You don't need to first study R programming, then do data science for two years, and only then discover Advanced Analytics for yourself.
Check out the website that Artem recommended, runthemodel.com. I had a quick look, and you will see models there that other people have built. Seeing those examples might inspire you to research this field; I won't even call it data science, because it's not data science, it's a completely different field of analytics. And maybe you will be so interested in it that you decide to build your career around it. So I highly encourage you to check out that website and maybe get a trial of AnyLogic. At the end of the day, it's just good to know that this part of analytics exists.
It's interesting: previously we had the episode with Dmitry Korneev, episode number 5, where we learned about data science, forensics and fraud investigation. Here we are also learning about a whole new field, Advanced Analytics, and Artem was kind enough to take some time out of his day and show us a glimpse of it. If you found it interesting, I highly encourage you to research it further and see if you like it. Maybe this is something you will decide to somehow include in your career.
And as always, you can get the show notes at www.superdatascience.com/7, so just the number 7. There you'll find the transcript for this episode and you'll be able to subscribe on iTunes and Stitcher. Also, leave us a comment at the bottom. Let Artem and me know how you felt about this episode and what new things you learned from it. You'll also find a link to Artem's LinkedIn. Make sure to hit him up, show him some love, connect with him and follow his career. I'm sure he's going to be up to some extraordinary things in the coming years. I look forward to seeing you next time. Until then, happy analysing.