Kirill: This is episode number 15, with Data Science Enthusiast Paul Brown.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
(background music plays)
Hey guys, welcome to the SuperDataScience podcast. I’m super excited to have you on board and today we’ve got one of my students, Paul Brown, joining me for the podcast. We had a very interesting chat and one thing you should know about Paul is that he is a self-learned data scientist. Paul actually completed a degree in accounting and while he was completing the degree in accounting, he was also doing some work on the side and he got into the space of data, and he actually decided to continue pursuing that path. And so he learned everything from scratch. As you can imagine, they don’t teach you R programming or Python in accounting, and yet Paul has picked up R, has picked up Python, has picked up Tableau, and lots of other tools that we use in data science, and he’s done all that in his own free time.
In fact, Paul is that type of person who brings new tools into his organisation. So for instance, there were some challenges at work in terms of data, and Paul decided to tackle those using R programming. Then at the same time, Paul is studying Python programming for his own side projects at home. And moreover, in this podcast, he will share exactly what they are, and you will learn how Paul is using data science to make additional money on the side. I thought that was a very ingenious idea that Paul had, and which he is pursuing right now.
Also, we talked about some very interesting books and blogs. We will mention a couple of books that we are reading and what we’ve learned from them, and a very interesting blog which you can follow along.
So all in all, this is going to be a very exciting episode, especially if you are in that stage of being unsure about how to proceed with learning data science and how to pick it up on your own, and where you should start and whether or not you need these skills.
Paul’s a very inspirational guy. Can’t wait for you to listen to this episode, and without further ado, I bring to you Paul Brown.
(background music plays)
Hey everybody, welcome to the SuperDataScience podcast. Today I have one of my students, Paul Brown, with me on the show. Hi Paul, welcome to the show. How are you?
Paul: Hi Kirill. I’m doing great. Thanks for inviting me on.
Kirill: Awesome, and can you tell us a little bit about where you’re from. You’re in America, which part are you in?
Paul: I’m in America, I’m in Utah. I’m in a city called Ogden, about half an hour north of Salt Lake City.
Kirill: Ok, fantastic. And what brings you today to the show? Can you tell us how you found out about the data science courses I teach and how you got into data science?
Paul: Yeah, I was working in Excel, and I got to a point where I wanted more. And working online, I looked and saw Tableau as a data visual software, and looked for some ways to learn some Tableau, found Udemy, or some people call it Udemy, I’m not sure people understand or know that most of the students say Udemy or Udemy. But I saw that you were a highly rated instructor on Udemy and took your courses. And I interacted with you a couple of times, I sent you some feedback, sent you some emails, and been listening to the podcast since podcast 1. And now we’re here!
Kirill: Yeah, yeah, exactly. And it’s interesting how you say Udemy vs. Udemy. I also don’t get it. Like it’s so hard to understand what is the correct way to pronounce it. When I go to the US, people say U-deh-my. When I’m outside, it’s Udemy. And they can’t grasp what is the most common way to say it.
Paul: Udemy actually sent me a survey, and one of the questions was “Do you have trouble pronouncing our name to other people?” They asked me that in a survey.
Kirill: Wow, that’s pretty cool, that’s pretty cool. I wonder what their position’s going to be, what they’re going to choose. Yeah, so exactly, it’s right. I figured that you had some interesting feedback, some interesting comments, and then I checked out your LinkedIn, and by the way, guys, Paul’s got a very impressive LinkedIn and we’ll get to that in a few minutes, about all of his experience. And so I figured you got some interesting stories to tell and things to share. And before we get into your current experience and how you use data science now, can you tell our listeners a little bit about yourself, and me as well, because we haven’t communicated that much, so I don’t know your background. What did you study and what did you learn back in the day before you actually got into data science?
Paul: Right, right. I think my path is similar to a lot of your listeners and other guests where I didn’t really know what I wanted to do in college. An uncle said to me in like the mid-90s, when I was much younger than I am now—I liked math and business and he said that I should be an accountant, and so I got my degree in accounting. But if you think about that now, Kirill, think about if someone asked you “What should I do for my career?” and you asked them “What do you like?” and they said “Math and business,” there’s just so many career paths nowadays to go down. Back then, accounting seemed normal and that’s what I did.
However, while I was working, I went to school full-time and worked full-time, and at work I started getting into the data, becoming the go-to guy for data, and after I got my degree in accounting, I decided to keep going in data and learned Excel basically from scratch and then I started getting into R and Python, and we could talk a little bit about some of the stuff I’m doing there. I work for a company called MarketStar, which is like a sales and marketing and business process agency, like an outsourcing company. So I’ve had the chance to work with a lot of different companies that you’ve probably heard of. Like Google, for example, is one of the companies. I’ve worked with HPE, which is Hewlett Packard Enterprise now, so I’m doing data with them now. That’s kind of my background. I started in accounting and now I’m doing data science stuff.
Kirill: That’s really cool, because I can really relate to that. I also started in accounting. I actually did a degree in accounting as well. My Masters is in Accounting, and then after that I moved on to Deloitte. And for me, it somehow randomly happened that they put me in the forensics department, like they mixed up my application or something, and there they said “You’d be good at data stuff.” And since then, I’ve never looked back. And you’re totally right. Like, if you ask somebody now, if somebody tells you “I’m good at programming and math. What should I do?” there’s so many options. But back in the day, accounting was one of those options that is very popular and sounds like a secure job and sounds like a stable type of thing, and it kind of like just stands to say how quickly our world is evolving and how in the next 10 years, the top 10 professions that are going to be dominating the market or the world in the next 10 years, we don’t even know what they are. They probably don’t even exist at this stage.
Paul: Yeah. I like something you mentioned about Deloitte. They said you’re going to do forensics and that’s what you liked and you started doing that. It reminds me of my background as well. You know, I had managers that said “Hey, Paul, you’re good at data? Can you do this for me?” Or, “I have noticed that you’re doing this and you should do more of it.” I think that managers and leaders have an important role in helping tease out people’s abilities and their skills. I can say some of my mentors have been those managers that have really said “Hey, I see you’re good at this. You should do more of it,” and that’s really helped me out in my career.
Kirill: Yeah, totally. And we actually have a number of people listening to this podcast who are in that position, who are in a managerial or executive position, and I know this because we have run surveys asking people what they want from data science and what kind of work they do. And to those guys listening to this who are in such positions, it’s really great advice. You need to tease out the skills and talent of people. It is your responsibility to put people on the right track. It’s also, of course, a person’s responsibility but you have that extra visibility of what they’re doing and that opportunity to help them and guide them. And just following on from your accounting, Paul, are there any skills that you developed in your accounting degree that you’d say are helping you in your day-to-day role while you’re applying data science?
Paul: I’d say with accounting—and I kind of went towards tax accounting—that there’s so many rules and laws and regulations that you have to be aware of in order to make a good judgment call because a lot of my professors would come to us with a certain question, some scenario, and the answer would always be “It depends.” Right? It depends on so many different factors. So that kind of stuck with me throughout the years as I’ve grown in my analytics roles, not just taking someone’s opinion or comment to heart and saying “That’s the way it is. Let’s run with it.” I kind of, in a good way, second-guess and double-check to say “Okay. That makes sense,” or “Have you thought about this information?” I think accounting kind of forces you to think of a lot of different options, a lot of different paths to an optimal end goal.
Kirill: Very interesting, and not the first thing that comes to mind when you think of accounting, because for me, or for somebody else talking about accounting, it’s kind of like strict rules and laws, you know, left, right and that’s it, but in reality—you’re right—often there’s options to debate or try to see if there’s a different approach to solving the same problem.
Paul: Right.
Kirill: All right. Cool. So tell us a little bit more about what you do now. So you mentioned you work in a company called MarketStar and you’ve done some work with Google and with Hewlett Packard Enterprise. Those are huge companies. What services do you guys offer them and what is specifically your role in those services?
Paul: The basic service that MarketStar provides is sales and marketing for companies. So if you had a product or a service or wanted to generate leads for your business, you might hire MarketStar to do that for you instead of building out your own sales and marketing department. So we’ve leveraged that where we can do it better because we have lots more experience, it’ll be cheaper etc. So I worked with Google on their Google retail program and they sell tablets – Chromebooks is a big Google product that they sell, so we have a bunch of sales representatives going out to retail stores, a lot of the electronics stores, to sell these Google products. Chromecast is another one. Amazing product, by the way.
And my job was to collect information that those sales reps were experiencing while they were in store. So, for example, they interact with customers, ask customers about reasons why they liked or did not like a product. They’d submit information about how many sales they had and lots of data collected through a Customer Relationship Management tool, a CRM, like Salesforce, and my role was to collect that data and provide reporting and provide feedback and insights to Google on how their program was running, what we could do to improve that program through promotions, or where we think that we should add more people because there’s a lot of foot traffic in certain stores, things like that. I moved over to the HPE, Hewlett Packard Enterprise, about a year ago. It’s kind of a big jump. I went from the retail side to a B2B type environment where we have partners or vendors that sell HPE products. And my role there – we have a sales team – is to provide reporting for them, for multiple teams that sell those products.
Kirill: Okay. That’s very interesting. I liked how you mentioned these companies—you’d assume Google, with their funding and their size, they could do anything. But they still choose to outsource some of these tasks to more experienced companies and with the right people and tools. Speaking of tools, what kind of tools do you use in your day-to-day role?
Paul: So day-to-day, I’m probably working mostly in Excel and Microsoft SQL Server, and then every week or two, I’m in R and Tableau. I’d say Tableau, I’m in there every other day. Love Tableau!
Kirill: Yeah, Tableau is pretty popular and a very powerful tool. Okay, so Excel, SQL, R, Tableau – that’s a huge suite of tools. Was it challenging to get up to speed with all of them? How did you go about learning all these tools?
Paul: Excel kind of came naturally. I did take a course at university to learn that, but as many people do, I did a lot of Google searches, YouTube videos. I just learned some things and I applied it to the next task. And then in the next task, I learned some new things, and just kind of built my skill there.
At MarketStar we had some people train us on SQL and we went from kind of a situation where we had individuals just doing Excel and reporting, and then other individuals doing SQL. And they’ve recently tried to get everyone trained up on Excel and SQL, so that’s where I’ve learned that, and then R has been your courses on Udemy.
Kirill: Oh, thank you.
Paul: Kirill training in R.
Kirill: Wonderful. And Tableau?
Paul: You too. Tableau as well.
Kirill: That’s really cool. So did you need to pick them up rapidly? Or was it something like a gradual transition that your company introduced to them, or when you came into the company they were already using R and Tableau?
Paul: They actually don’t use R and Tableau at all right now. I put that together on my own and I presented it to management and they’re on board and we’re kind of in talks of adopting Tableau, which would be amazing. We do have an advanced analytics department that uses R and some other – GMDH, and two other advanced analytics tools, like Gretl. I think that you’ve used Gretl before?
Kirill: Yeah.
Paul: We’re not really keen on R and Python so much. We’re kind of just in the beginning stages of introducing that in our company.
Kirill: Very interesting. So you, not even being in an executive position, like doing the actual work, you decided for yourself that R and Tableau are the tools that you need, and they will make your work easier, and you just introduced them, started using them and talked to the company about adopting them. Is that correct?
Paul: Yeah.
Kirill: That is so cool.
Paul: Yeah, I definitely think that a lot of us analysts and data people are all about efficiency and optimization, and I’m in that boat as well. And I find myself in Excel hell a lot. I shake my head thinking man, Excel can’t do this, or it’s taking so long to do that, or I need this large dataset to put in and Excel is hard, so what’s next? Okay, SQL. What’s after that? R. So I’ve started introducing that and I presented a model I put together in R to the advanced analytics team and they are going to take it up and use it as a product that we sell to other companies. So it’s gone well, yeah.
Kirill: Fantastic. Well, tell me this then. I also had this situation once when I was working in a company and they didn’t have Tableau. No, they did have Tableau. They didn’t have R, they didn’t have any of these other tools, and I wanted to introduce R. So I said to them “Hey, guys. This problem will be easier solved in R. We can do this with R. We can do that.” Some of them were excited, but the main pushback was the fact that it has to go through a lot of compliance reviews and especially that R is a freeware, or free software, so you don’t have to pay for it. From my perspective it’s a good thing but from a large company’s perspective, it’s actually a dangerous thing, meaning that there’s no support and nobody will take liability for anything that goes wrong with the tool. So did you ever encounter this kind of adverse reaction or pushback from your company when you were talking about introducing these tools and explaining to them that you have already started using them in your day-to-day role?
Paul: Oh, yeah, definitely. Luckily not from upper management, like the Executive Vice Presidents are on board to introduce some of these tools, but some of the people I work with day-to-day – it would be a heated conversation about like, “You can’t learn about R. Because what if you leave and we have all your reports you’ve done in R and no one knows R?” You know, that’s a huge deal. Like, “No, don’t learn R. You’ve got to stay in Excel.” So you’ve got to overcome those people, right Kirill? You’ve got to keep going. So talking to upper management about if they want to expand their products that they sell to other companies, advanced analytics, they’re going to need to get more advanced tools. You can’t do the advanced analytics, the predictive analytics that we want to do, without these advanced tools and your people need to be trained on them. So they’ve bought in, and I wouldn’t say that there’s a huge pushback, but just from a few certain individuals, it’s hard to overcome change.
Kirill: Yeah, totally. And at the same time I think they have valid comments. I can’t completely disagree with somebody who tells me, or tells you in this case, that “Hey, Paul, it’s really cool you’re learning R, and you’re applying it, but what if you leave? Then nobody here will be able to understand it.” So on one hand, they can’t just tell you not to learn it, but on the other hand, you guys have to work together. You can’t just be the only person learning and applying R in the company and then everybody else doesn’t know what’s going on. So I feel like some of the comments are valid, it’s just the way they go from there is probably not always the best situation, when people say “Okay, you just can’t learn it,” and that’s it. That’s the red flag.
And in that sense I really appreciate you sharing how you got the executives on board and I feel that for me it also has been kind of the same scenario. When I want to introduce something, when I learn something new, you get a lot of pushback from the people. You might get it, not always, but you might get a lot of pushback from people right above you, and then people above them.
But if you circumvent that and, you know, bump into the CEO in the corridor or in the elevator, and you explain to him how much value this will bring at the end of the day, because machine learning and algorithms and visualization techniques and all of this advanced analytics stuff, if you convey that to the people at the very top, then that changes the whole playing field. As soon as they’re on board, everybody else all of a sudden gets on board.
Paul: Right. Definitely.
Kirill: Yeah. All right, cool. Speaking of R and Tableau, how come you haven’t mentioned Python? Because when we talked before, you said that you’re also learning Python and you’ve used it before. You don’t use Python at work. But you use it for other things. Is that correct?
Paul: That’s right. I don’t do Python per se at work, but I do have like a side hustle, a side business where I buy and sell things online and—
Kirill: Can you tell us more about that? When you told me about that story, I think that was the coolest thing ever. Guys, this is a snippet of how to use data science to make money on the side. Get your pens and pencils and papers out and start writing notes. This is going to be awesome.
Paul: Thanks, Kirill. So I use a tactic called retail arbitrage which is pretty common now, and I think in the future with AI, Artificial Intelligence, this will go away, but for now I buy things on a website called Kohl’s, it’s a department store here in the United States, and I resell those products on Amazon. So I found myself constantly checking the Kohl’s website for the price of the product and then “Okay, how much does that product sell for on Amazon? How much will I make one at a time?” And I thought “Man, this is taking way too long, like hours of my night. This isn’t going to work.” I’ve been reading up on Python and web scraping. So I used Python. I built a very rudimentary script in Python to pull some of that data from the web, from Kohl’s, using libraries called Scrapy and BeautifulSoup in Python to pull that data and to kind of get a list together to see what the best prices were, the biggest discounts were on those products, so I just go to those products now and I just buy them up. And I don’t have to go one by one from my Kohl’s screen to the Amazon screen. It’s just made things a lot more efficient and made me more money.
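The workflow Paul describes (pull product prices from a page, compare them with resale prices, rank the best opportunities) could look roughly like the sketch below. It is only an illustration under stated assumptions, not Paul's script: it uses Python's standard-library `html.parser` in place of Scrapy and BeautifulSoup so that it runs without extra installs, it parses a made-up HTML snippet rather than any real site, and the markup, the resale prices, and the 15% `fee_rate` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markup; real retail pages are structured differently.
SAMPLE_HTML = """
<div class="product" data-name="Blender" data-price="39.99"></div>
<div class="product" data-name="Toaster" data-price="24.99"></div>
<div class="product" data-name="Kettle" data-price="19.99"></div>
"""

# Hypothetical resale prices; in practice these would be looked up separately.
RESALE_PRICES = {"Blender": 64.99, "Toaster": 29.99, "Kettle": 34.99}


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from <div class="product"> tags."""

    def __init__(self):
        super().__init__()
        self.products = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "product":
            self.products.append((attrs["data-name"], float(attrs["data-price"])))


def rank_by_margin(html, resale_prices, fee_rate=0.15):
    """Return (name, margin) pairs sorted from highest to lowest margin.

    fee_rate is an assumed marketplace fee, taken as a share of the resale price.
    """
    parser = ProductParser()
    parser.feed(html)
    ranked = []
    for name, cost in parser.products:
        resale = resale_prices.get(name)
        if resale is None:
            continue  # no resale price known, so no margin can be estimated
        margin = round(resale * (1 - fee_rate) - cost, 2)
        ranked.append((name, margin))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, margin in rank_by_margin(SAMPLE_HTML, RESALE_PRICES):
        print(f"{name}: estimated margin {margin:.2f}")
```

Sorting by estimated margin is what replaces the manual back-and-forth between two browser tabs: the products worth a closer look come out at the top of the list.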
Kirill: That is fantastic! So that is a worthwhile exercise, is what you’re telling us. That the return on your time invested is worth it.
Paul: Oh, totally. And there’s so many people at work that I feel that have the skills to do this kind of stuff but they’re not thinking about it. It’s like, oh man! You can connect, be the entrepreneurial mind with engineers and just create businesses. That’s what most of the start-ups do nowadays. But there’s so much out there, like I’ve just got to connect people that have the skillset to help grow business.
Kirill: It’s so interesting that you mention that. This whole exact same model—there’s an entrepreneur in the U.S., I don’t know if you’ve heard of her, her name is Sophia Amoruso. Have you heard of her?
Paul: I think so.
Kirill: Yeah, she’s actually very big now. They’re filming a TV show for Netflix about her life and I think that South African actress – she’s not going to be acting in it but she’s going to be producing it – but anyway, the point is Sophia Amoruso started out the same exact way, about like 12-15 years ago – or maybe 10, I don’t know. She was buying stuff, I don’t know where, like in shops somewhere, like physical shops, or maybe online stores as well. And then she was reselling them on eBay at a higher price. If you want to learn more about Sophia Amoruso just search for the #GIRLBOSS. She’s got this whole brand, Girlboss. From there, from starting doing this thrift shopping and selling it over, now she’s got like a $300 million empire with hundreds of employees. She’s huge, she does public speaking. I actually saw her speaking publicly or doing a Q and A session at an event when I was in San Francisco a couple of months ago. Yeah, it’s just fascinating where you can get to. You telling this story just reminded me of her. So I’ve got a feeling, Paul, that you’re going to get a lot of LinkedIn requests after this podcast, people wanting to know more about this. And you never know, maybe you’ll find somebody to team up with, and build a venture that will be doing this automatically.
Paul: That’d be awesome. Yeah.
Kirill: So guys, after this podcast, hit up Paul on LinkedIn and torture him with questions. Okay, cool. So you’re a very kind of like a self-learning type of person, right? So you learn R for work, you picked up some Tableau, and then you also introduced it for work, and then you decided “Why don’t I pick up Python because I want to solve this problem on my side thing where I can automate a lot of stuff.” Can you tell me what pushes you forward to keep learning these data science tools? This is not normal, as you can imagine. People just don’t decide to learn Python—wake up and decide to learn Python. It’s quite a complex language. It takes time, especially if you’re already learning R and Tableau. What keeps you going? What pushes you forward? What motivates you?
Paul: I definitely have like an internal fire that drives me to learn more. It makes me happy to learn more. Talking to some of my co-workers, we’ve asked ourselves this question, like “What’s next? How come we’re not learning more?” or “What if we lost our jobs? Could we get a new job? What’s the career climate out there for someone that does reporting in Excel and a little bit of Tableau like me?” I always feel like there’s just more to learn and like—a lot of your guests have mentioned that the pull to data science is in the ether, it’s just out there. Data science is calling me.
Kirill: I love it.
Paul: It’s weird like that. It’s something that I’ve been drawn to. And I feel like the trajectory of my career – if I’m not learning, I’m going backwards. That’s what’s kept me going.
Kirill: That’s awesome. I love how you say data science is calling you. I can feel the same. I have the same feeling and I’ve had it for some time now. But what would you recommend to people who don’t have that feeling, who are curious about data science, who want to know more, who’ve done their research, who know that this is probably where it’s going, but at the same time it seems like a huge forest of all these unknown things: R, Python, SQL, machine learning, Hadoop and so on. There’s so much to grasp. What would you recommend for them? What steps to take to develop this feeling that data science is calling them to become as passionate about learning data science as you are currently?
Paul: Right. If you’re that person, one book and I’ll just say this is the book I’d recommend for you, Kirill, is “The Obstacle is the Way”. It’s on stoic philosophy, so it’s not a data science book, or a data book, for example. But it basically talks about how the obstacle in the path is the path. It becomes the path. And so if you want to be in data science, the obstacle is learning these languages, Python, R, Hadoop – these things that you read about. It’s easy to type “data science” into Google or “What should I learn? R or Python?” and read up about it. And I found myself being that person for like months, like “Oh, which one should I start? Which one’s going to give me the most value add for my career?” But it didn’t really change until I just started doing it. There’s so many resources out there – Coursera, Udemy – that instructors like yourself are willing to share valuable skills and knowledge with us. And yeah, it’s hard, but the obstacle in the path becomes the path. And it becomes the path. It never changes. You keep going. And that’s what’s motivated me. When I feel down and I’m like “I can’t. I don’t want to learn R. This script is really hard. What’s a vector? What’s a matrix?” I think like “I’ve got to do it, and there’s people out there that are doing it, that if I don’t do it, they will pass me up.” So that’s what’s kind of driven me.
Kirill: That is so cool. I love that metaphor and the whole explanation. And you’re right. There’s always people catching up to you, on your toes, and they’re coming close if you look in the rear view mirror. And this book “The Obstacle is the Way,” I haven’t read it myself but that’s a book by Ryan Holiday, right?
Paul: That’s right.
Kirill: It’s funny because actually right now on my Kindle, I’m reading a book called “Ego is the Enemy”, which is the second book by Ryan Holiday, the next step in that. And so for those of you who haven’t heard of Ryan Holiday, he’s a pretty cool author and he’s got these books where he’s very honest about life and about this stoic philosophy, and he’s actually got the names of his books tattooed on his arms. On the left one I think he’s got “The Obstacle is the Way”, on the right one, “Ego is the Enemy”.
So the one I’m reading is really eye-opening. After your comments on “The Obstacle is the Way”, I’m definitely going to pick that up and check it out, because that is such a mind-opening or eye-opening philosophy, that these obstacles that seem to be in your way of getting to a successful career in data science, learning R, mastering Python, all these obstacles – if you think of them as obstacles, then you’ll probably never get to the end. But if you actually acknowledge and accept that these obstacles are the way itself, this is the path – it’s not an obstacle, that is the path, you’ve been misled, your brain is misleading you to thinking that these are obstacles, this is actually the path itself. As soon as you grasp that, I can totally see how every challenge that comes your way becomes exciting rather than becomes a problem that you have to solve.
Paul: Yeah, I always laugh when I see the title of some of the trainings that say “Learn all of R in one training,” and I look and it’s like 6 hours long and I’m like “Oh, that’s not right.” It’s going to take a long time. “Learn R in 10 lessons,” maybe. It’s difficult. There’s no easy way around it. You can’t take the quick road to become a data scientist, there’s no quick road, as I’m sure your listeners and your guests that you’ve had in the past—you just have to do it. And you have to put in the work.
Kirill: Exactly. And I don’t even want to say it now, I was going to say “Bite the bullet and do it.” But it’s not about biting the bullet, right? If you think of it as an obstacle, it’s about biting the bullet. If you think of it as the path, you just start walking! And you just start going, and every single day you learn something new, and it should be exciting. If it’s not exciting, then you’re thinking about it the wrong way, right?
Paul: Yeah. You shouldn’t dread it, it should be giving you value. There’s an intrinsic reward in the process of learning, and I’ve learned that a lot. There are hard parts, yeah, but you need to learn to love the process of learning, and once I kind of figured that out, it’s helped me just tremendously.
Kirill: I totally agree with that. We chatted a little bit before, and you sent me a couple of things that you’d like to talk about and one of them was “Communication is king.” Can you elaborate a bit more on that? I’m assuming that is about how not only the data science tools are important, but also communicating the insights. Can you tell us a bit more why you think communication is king?
Paul: Yeah. I did an internship in Chicago a few years back over a summer, and at the end we had a big presentation where we talked about what we learned. And I think the biggest takeaway was literally “If you have a question, pick up the phone and call that person, or go to their desk and talk to them.” Just that. Basically, if you want something done it’s not that hard just to pick up the phone and call someone, or Skype them, or go to their desk if you’re on site. It gets things done so much faster. I think so many people – I find myself doing this sometimes – they send e-mails and they wait for a response. Days go by – no response. Then you send a follow-up e-mail five days later. Days go by. And then maybe your project is done. It goes away and no one knows what you did and all they think about is “Paul started some project but it never got finished.”
So if you want to execute a project, a task, some kind of project that you’re working on, it’s communication. I think that’s what really separates myself and a lot of the data scientists as they describe their roles from maybe the IT guys and the sales people, is they’re able to communicate the complex findings derived from data mining and whatnot to the stakeholders, the executives, the managers that need that information. They’re able to communicate that information in a way that they can understand it. Communication is so important to me that I’ve looked into it and started taking like a public speaking course. Things like that, where it’s like “I need to be a better communicator. What can I do to be a better communicator?” Because I know that that’s a really big piece of the data scientist’s toolkit, so to speak.
Kirill: Okay. So just to clarify this, there’s almost like two areas that you mentioned just now. First of all, if you want to get stuff done, go talk to people, right? Sending an e-mail is good, but if there’s no reply or whatever, you just go and you talk to people, pick up the phone. And a lot of the time, talking to people you get a lot more sorted out.
And the second item was that actually presenting and being able to present findings that you derive from your analytics is what separates a data scientist from a person who can just crunch the numbers, right? And that’s why you’re taking this public speaking course or thinking about improving your speaking skills, correct?
Paul: Right. It’s definitely something that I don’t think a lot of people take as super important when they come out of school, but I think that really makes you stand out from those that might be super good programmers and coders and can do really anything, but they’re unable to talk to the right people that make the main decisions for the business to actually be valuable. You know, they might get lost. If they’re not able to communicate those things, it’s basically useless. So that’s a huge skill to have and I definitely could use a lot more of it.
Kirill: Yeah. It’s a skill that you can always keep improving. You can never master it because when you go on stage, for example, you’re always going to be a bit nervous. But with time, you’ll get less and less nervous.
All right, that’s very interesting. We ventured into an interesting area now about how to approach learning data science and some of the tools that you use and also speaking. And now that we know a bit more about Paul – I think our listeners by now can relate to you in many ways – can you tell us a bit more about your work? So what is your day-to-day operation, or what does your day look like? When you get into the office and you open up R or Python, what kind of analytics do you perform? What kind of insights do you drive and how do you go about communicating them?
Paul: Yeah, I think that I can best describe that with a recent project that I’ve been working on.
Kirill: That would be great.
Paul: So we’re working on a vendor partner selection tool for the HPE program that I’m working with, so these are companies that sell HPE products and there’s about 10,000+ of them in the United States. So we have a sales team that basically is geo-demographically segmented—they select partners in those territories. And before my role, there was really no formal selection process. They were given a list of the partners they could select and just pick them randomly. Obviously not randomly, but the ones that they felt were selling a lot or they had good relationships with. There was really no data-driven insights on which partners they should pick. So in the last few weeks I’ve created a model to help determine and tier some of those partners and help the sales team pick the right ones based on a few metrics that we came to an agreement on that would be part of their selection process.
Kirill: Okay. And what tools did you use for that?
Paul: We used Excel at first and then I transformed it into R. We used a model called an RFMVT. That stands for Recency, Frequency, Monetary, Variability and Trend. We also added in a couple of other factors. It’s basically when you talk to your stakeholders or the sales team, what are the factors that go into deciding which partner to select. And we were kind of [inaudible] all down and decided what weighting we wanted to give to each of those, etc.
So in R we basically created a script to tier those partners – tier A, B, C and D – by some scoring, with recommendations on which partners they should choose and why if we had 30 sales reps in these 30 territories, and giving that to the managers to utilize in their decision making.
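A minimal sketch of the kind of weighted scoring and tiering Paul describes, written in Python rather than his R script. The episode does not give the metric definitions, the weights, or the tier cut-offs, so everything here is an assumption: the sample sales figures are made up, each of the five RFMVT metrics is computed in one plausible way, the weights in WEIGHTS are illustrative, and the tiers are simply equal-sized bands of the ranking.

```python
from statistics import mean, pstdev

# Hypothetical monthly sales per partner (oldest month first); not real data.
sales = {
    "partner_a": [120, 130, 125, 140, 150, 160],
    "partner_b": [0, 0, 300, 0, 0, 0],
    "partner_c": [50, 45, 40, 30, 20, 10],
    "partner_d": [80, 0, 90, 0, 100, 110],
}

# Assumed weights; the episode only says the team agreed on weightings.
WEIGHTS = {"recency": 0.2, "frequency": 0.2, "monetary": 0.3,
           "variability": 0.1, "trend": 0.2}


def raw_metrics(series):
    """Compute the five RFMVT inputs for one partner's sales history."""
    active = [i for i, v in enumerate(series) if v > 0]
    avg = mean(series)
    half = len(series) // 2
    return {
        # Months since the last sale, negated so that higher is better.
        "recency": -(len(series) - 1 - active[-1]) if active else -len(series),
        # Number of months with any sales.
        "frequency": len(active),
        # Total sales over the period.
        "monetary": sum(series),
        # Coefficient of variation, negated so that steadier is better.
        "variability": -(pstdev(series) / avg) if avg else 0.0,
        # Second-half average minus first-half average.
        "trend": mean(series[half:]) - mean(series[:half]),
    }


def min_max(values):
    """Rescale a list to the 0..1 range (all 0.5 if the values are equal)."""
    lo, hi = min(values), max(values)
    return [0.5 if hi == lo else (v - lo) / (hi - lo) for v in values]


def tier_partners(sales, weights=WEIGHTS, tiers="ABCD"):
    """Return (tier per partner, weighted score per partner)."""
    names = list(sales)
    metrics = [raw_metrics(sales[n]) for n in names]
    scores = dict.fromkeys(names, 0.0)
    for key, weight in weights.items():
        for name, scaled in zip(names, min_max([m[key] for m in metrics])):
            scores[name] += weight * scaled
    ranked = sorted(names, key=scores.get, reverse=True)
    # Split the ranking into equal-sized bands, best band first.
    assigned = {name: tiers[i * len(tiers) // len(ranked)]
                for i, name in enumerate(ranked)}
    return assigned, scores


partner_tiers, partner_scores = tier_partners(sales)
for name in sorted(partner_tiers, key=partner_scores.get, reverse=True):
    print(name, partner_tiers[name], round(partner_scores[name], 3))
```

Rescaling every metric to the same 0 to 1 range before weighting keeps a large-valued metric such as total sales from swamping the others; a real model would likely choose its own normalisation and cut-offs.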
Kirill: Okay. And so how often do you run this script? Is it something that you’ve automated, or do you run it on a weekly basis or a monthly basis?
Paul: So they do a selection process every year. So it’s just a once a year tool really. So one thing that kind of got me excited are plans on how to improve it next year and then also look back maybe a year and analyse what would have happened if we had used this model last year to help us understand does the model work, did the partners that we select perform better than we anticipated or worse than we anticipated? And putting that all into R and just hitting “Go” has been phenomenal. Actually, I put it together in Excel. It took me like 5 Excel workbooks and it was just a mess, as you can imagine.
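The look-back Paul mentions could be sketched roughly like this in Python: take the tiers the model would have assigned a year ago, group what each partner actually sold afterwards by tier, and flag any place where a lower tier outsold the tier above it. The partner names, tiers and sales figures are hypothetical, and comparing tier averages is only one simple way to judge whether the model "works".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs: tiers the model would have assigned a year ago,
# and what each partner actually sold in the following year.
tiers_last_year = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "D"}
actual_sales = {"p1": 900, "p2": 700, "p3": 500, "p4": 550, "p5": 100}


def average_sales_by_tier(tiers, sales):
    """Group the following year's sales by the tier assigned beforehand."""
    grouped = defaultdict(list)
    for partner, tier in tiers.items():
        grouped[tier].append(sales[partner])
    return {tier: mean(values) for tier, values in sorted(grouped.items())}


def out_of_order(averages):
    """Return adjacent tier pairs where the lower tier outsold the higher."""
    ordered = list(averages.items())
    return [(hi, lo)
            for (hi, hi_avg), (lo, lo_avg) in zip(ordered, ordered[1:])
            if lo_avg > hi_avg]


averages = average_sales_by_tier(tiers_last_year, actual_sales)
print(averages)
print(out_of_order(averages))
```

With this made-up data, tier C averages more than tier B, so that pair is flagged as worth a closer look.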
Kirill: Yeah, I can imagine. And did it end up faster to execute or faster to put together in R?
Paul: Oh, yeah. Literally, if we get new sales data, I can refresh it and send out the new list with the tiers in less than half an hour.
Kirill: Okay. That’s very good. I had this question now pop up in my head. You being one of the few, if not the only person in your company that actually knows R and uses R, how do you go about QA-ing your work? How do you go about checking that everything is right, that there’s no errors in your scripts, that the results are being popped out there correct?
Paul: Luckily there is someone else that uses R in the company. I guess there’s two checks. One, I built it all in Excel, so I was able to check myself as I was building it in R. And there’s two people on my team and we both work on the same model, so there’s some two step quality controls. But the other person in my company that knows R – I’m working with him to have them introduce this product to not just my team but the other teams, the clients, vendors that we work with, and providing some kind of RFMVT model to them as well. So that person that knows R is part of the product team where we sell—MarketStar sells a product to a company like Google or Verizon or Oracle, a lot of these companies that we work with. They say, “Oh, we have this model that we can run, and this is the value add that it provides.” Luckily, I’ve got two people – one person who knows R and one that knows Excel really well, that we are able to do the checks.
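One way to picture the two-way check Paul describes, where the Excel build and the script are compared against each other, is a small comparison routine. This is a sketch in Python under assumptions: both builds are taken to produce a partner-to-tier mapping, and the names excel_tiers and script_tiers and their contents are invented for illustration.

```python
def compare_outputs(reference, candidate):
    """List partners whose tier differs, or who appear in only one output."""
    issues = []
    for partner in sorted(set(reference) | set(candidate)):
        ref = reference.get(partner)
        cand = candidate.get(partner)
        if ref is None:
            issues.append((partner, "missing from reference", cand))
        elif cand is None:
            issues.append((partner, "missing from candidate", ref))
        elif ref != cand:
            issues.append((partner, "tier mismatch", f"{ref} vs {cand}"))
    return issues


# Hypothetical results from two independent builds of the same model.
excel_tiers = {"p1": "A", "p2": "B", "p3": "C"}
script_tiers = {"p1": "A", "p2": "C", "p4": "D"}

issues = compare_outputs(excel_tiers, script_tiers)
for issue in issues:
    print(issue)
```

An empty list means the two builds agree; anything else is a concrete item to hand to the second reviewer.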
Kirill: Okay. All right. That’s good to know. Would you say it’s important to have these checks? For our listeners, for the benefit of our listeners, that are creating models, or doing certain analytics, how important would you say it is to have your work QA’d and checked by a peer?
Paul: Obviously, the answer is hugely important for sure! It is because I’ve been caught a couple of times where it’s “Hey, I don’t know if this information is right. Can you go back and check it?” And you hate to hear that, right? You hate to hear “Oh, no, the data. I must have made a mistake.” So we do go through those processes and double-check. And I’ve got a counterpart that I work with. I’ll just give it to him raw basically, and just say “Hey, go at this from scratch and look for any inconsistencies.” And him and I go back and forth on that and work together closely. And so we would do that before we send it out live.
Kirill: Yeah, totally. I think everybody’s been through those moments. Me as well. I’ve had situations where people have pulled me up and said “Hey, Kirill, this is wrong. This is obviously wrong. Something is wrong in your code or whatever.” You definitely want to catch those before the results, or your product, whatever you’re trading, goes live and goes out into the public or it goes to your partners or to your customers. And if something’s wrong there, then that is just horrible. Everybody listening to this, QA is so, so, so important. I cannot stress how important it is, especially in a data science role which combines so many different tools, and also your creativity goes into the work, and also the data itself might have issues and things like that. There’s so many areas where something could go wrong, and if you’re not QA-ing your work, you have to start doing that as soon as possible because you don’t want to get to that point where something goes wrong and it goes all the way to the end, to the end user, or to your executives and then – even worse – they make a decision based on that!
Paul: Definitely. And this point you’re making even hit home harder when you had—I think it was the last episode, Yaw, episode 10.
Kirill: Oh, Yaw. On the podcast, yeah.
Paul: In the podcast he mentioned something like they hired someone to check the models. That was their job. Like, wow! Yeah, you definitely have to QA things that you build, and if you don’t have another person, how do you QA that? I thought about that recently since listening to your podcast.
Kirill: Yeah, totally. So now you just have to find a person who knows Python to QA your side stuff, side business.
Paul: Yeah.
Kirill: Tell me, what has been the biggest challenge for you as a data scientist?
Paul: Definitely, I wouldn’t call myself a data scientist, and that’s maybe another question, like when do you cross that threshold from analyst to data scientist? But I’d say the biggest challenge for sure is learning the techniques. R has been difficult, but having the training from you and other forms of education has been helpful. But I think it’s been weird because the company I work for, we’re at this point where everyone wants to be a data scientist, everyone wants to learn new stuff, and we’re just trying to figure out who should learn R, who should learn Python, and what’s the most important skill that a data scientist needs to know now, what’s the next step. And I think that’s been a struggle because a lot of us don’t necessarily have the PhD in Economics, or a Doctorate in Finance, things that some of your other guests have. And I feel totally underqualified, side note Kirill, to be on this podcast. But regardless, we want to learn more. And I think the challenge has been, what’s the next step? Yeah, I’ve learned some R, I’ve learned some Python, and there’s this obstacle, and how do we overcome it? But I guess that’s me fumbling through this answer to your question of how do we take a company where we’ve been basically Excel- and SQL-driven to the next level? Do we hire outside R programmers, or Python programmers, or do we like – I don’t know. We’re kind of at a loss with that right now.
Kirill: All right. It sounds like something you guys really have to figure out. But it’s a good problem to have, right? It’s a good problem to have when you have all these people who want to be data scientists, who want to learn all this stuff. It’s just a matter of structuring the process and, like you said, understanding what the most important skill is. On that, I wanted to ask you what do you think is the most important skill for a data scientist?
Paul: Since I’m pretty young in it, I’d say R has been super helpful. It’s been helpful to take those four or five workbook Excel files or projects and just condense them into one R script. That in itself has been revolutionary, and it frees me up to do more tasks. To do more advanced analytics, talk to more stakeholders, look at older reports that we’ve been sending out, how do we optimize those. I would imagine other people would say Python is more important. I think R has been helpful to kind of optimize those reports that we use – we use Excel and SQL now to generate reporting, but R getting the dataset together. It enabled us to be more analytically-minded and be more robust in our modelling to say “We can bring in 8 million rows of data.” I know it might be a small number for some people, but that’s a huge number for us in our company. We could never work with 8 million rows before! But now we can with R, so that’s going to be a big help for us in our company.
Kirill: All right. Fantastic! Thank you so much. What would you say is your one most favourite thing about being a data scientist?
Paul: I have an answer similar to a lot of your guests and maybe you yourself. It’s taking a complex puzzle, a difficult task, maybe an abstract thought—it would be interesting if such and such you hear in a meeting and you take it and run with it. And you collect the data, you clean the data, you tidy the data up, create the script for it, start exploring the data. And then, it sounds kind of juvenile, but like going to your boss or manager and saying “Hey, look what I found!” or “This is something that could change the business. This is going to be a huge deal,” and they say “Wow! This is amazing.” Working together with management to make big business-changing decisions is motivational to me.
Kirill: Yeah, totally. And that whole process of solving the puzzle – I can totally relate to that. It’s like reading a book and not knowing what the end is going to be like. That’s kind of the feeling you get. It’s not just motivational, it’s inspiring and it’s like there’s some sort of mystery to it, and you get all excited about it. I totally agree with you.
Paul: Yeah. And the harder the task is the more special that moment is when we find the answer to your question.
Kirill: Yeah, exactly. And from where you are at right now in your career, in your knowledge of the world and what you see around you, where do you think the field of data science is going? And what should our listeners, from your perspective, look into in order to prepare for the future of this field?
Paul: I’m not sure about data science directly other than we’ll need to continuously learn, and learn new languages as they come out. There’s going to be more and more, obviously. And just learning to learn. I think you talked about it on your podcast. Just being good at that, trying to be a good learner. But I think a few things that got me excited for the future is — do you read waitbutwhy.com?
Kirill: Yeah, yeah.
Paul: A lot of the stuff there has got me super excited, like the artificial intelligence, Elon Musk’s SpaceX, Mars, automated vehicles. I think data science will thrive in every field, not just those – everywhere. More and more people will begin to rely on it. Not a ton of universities offer a data science degree. I think more and more are getting on board, and that’s kind of the direction things are going.
And I think the Internet of Things is a huge deal, like Google Home, Amazon, Alexa, Siri. Data science will be a big part of everyone’s personal lives in their home. Kind of like Iron Man talking to Jarvis, right? I can see that happening, and the crazy thing is I don’t see it happening in 25 years. I see it happening in like 5 years!
Kirill: Yeah, totally. That’s where we’re headed. And I like that you mentioned the waitbutwhy blog. We’re definitely going to include that in the show notes. It’s one of my favourite blogs. I don’t read that many at all, but this one I do like and enjoy. There’s so many interesting things there. Like, he published an article, a blog, on the elections, like literally a few days ago, and he said “It’s going to be okay.” And that blog post got a million shares in one night. It’s that popular. And they’re all long, very long blog posts and there’s very deep, meaningful thinking in these blog posts. So I totally love it. We’ll put that in the show notes. And I’ve got a feeling that whatever we talk about, you just say “Oh, you’ve already had this on the podcast. You’ve already had this on the podcast.” I’ve got a feeling you listen to all of them. I’ve never asked this question before, can you tell me what value do you get out of listening to this podcast? I’d really like to know.
Paul: I’ve thought about this. I think one is you have interesting guests, you know, that’s key. The guests have such dynamic backgrounds. Some of them are your students; others were in academia and switched over to a data science role, interestingly a lot of physics majors, nanoscience, and things like that. It’s interesting. And I feel like even though I have an accounting degree, I too can be in that spot as well. It gives me motivation. It helps me know what you’re working on and what the latest tools are. I do like when people get nitty-gritty with what are the packages. Some of your favourite packages in R, your favourite libraries in Python. And I’ll go open and look those up and study those. It just kind of gives me another way to know what people are working on and how can I incorporate that into my work. That’s been helpful, so thank you.
Kirill: Thank you. I appreciate the comment. And speaking of that then, it’s a really good point you mentioned that you like getting into the nitty-gritty, or when the guests get into the nitty-gritty. I like it too. I really like talking to people. Even when you were describing your business challenge case study today, I was like oh, wow! That’s pretty cool. I never thought of doing something like that. I’ve never done something like that myself. That feels like I’m learning something, and I feel that a lot of our listeners do that as well.
Also, when you mentioned the different libraries and so on, I wanted to make a suggestion here that—you’ve already mentioned the book which you would recommend to our listeners, which is “The Obstacle is the Way” by Ryan Holiday, and we’ll include that in the show notes. So to finish up this podcast, instead of mentioning a book again could you instead mention the libraries and packages that you use in R and Python, so those of our listeners who are like you, who are interested in the nitty gritty, they can go and look them up?
Paul: Yeah. So I use reshape2 as a library. I use it to reshape data from like a long format to a wider format. matrixStats – I use dplyr. I use that a lot to – basically the model that I described, I need to know the standard deviation of variability of some of the sales data. I use some of the same stuff that you taught me – cbind – through my R script.
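For listeners who have not met the long-to-wide reshape Paul mentions, here is a small Python sketch of the idea using only the standard library (Paul does this in R with reshape2; in Python, pandas offers a pivot method for the same job). The rows are hypothetical, and filling a missing month with 0 is an assumption made so that every partner has the same columns.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical long-format rows: one (partner, month, amount) record each.
long_rows = [
    ("p1", "2016-01", 100), ("p1", "2016-02", 120), ("p1", "2016-03", 110),
    ("p2", "2016-01", 10), ("p2", "2016-02", 200), ("p2", "2016-03", 30),
]


def long_to_wide(rows):
    """Turn one-row-per-observation data into one row per partner."""
    months = sorted({month for _, month, _ in rows})
    cells = defaultdict(dict)
    for partner, month, amount in rows:
        cells[partner][month] = amount
    # Assumption: a month with no record counts as 0 sales.
    wide = {partner: [by_month.get(m, 0) for m in months]
            for partner, by_month in cells.items()}
    return months, wide


months, wide = long_to_wide(long_rows)

# Sample standard deviation of each partner's monthly sales.
spread = {partner: stdev(values) for partner, values in wide.items()}

print(months)
for partner, values in wide.items():
    print(partner, values, round(spread[partner], 2))
```

Once the data is wide, the per-partner standard deviation is a single row-wise calculation, which is the variability input Paul refers to.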
Kirill: So these are all R for now you’ve mentioned?
Paul: These are all R. And then the Python, the few that I’ve used is pandas, kind of the data manipulation part, and Scrapy and Beautiful Soup are the three that I’ve used.
Kirill: Fantastic! I think that’s plenty for our listeners to go and check out. All right, thank you very much, Paul, for coming on the podcast. It’s been a pleasure to learn about you. And if our listeners would like to contact you or follow you and follow your career, what is the best way to find you?
Paul: I’m on LinkedIn – Paul Brown, but I guess they’ll probably find a lot of Paul Browns. I’m in Ogden, Utah. I have a website that I update every few weeks, it’s intrinsicallydeep.com. Those two locations are where I spend most of my time.
Kirill: Fantastic. We’ll include those in the show notes. So intrinsicallydeep.com and we’ll also include Paul’s LinkedIn, so go ahead there and make sure you stalk Paul and connect with him and ask him all about his ventures in Python, R, Tableau and more. Thank you so much, Paul. I really appreciate you coming on the show today.
Paul: Thanks so much, Kirill. You’re my hero.
Kirill: Thanks. You’re my hero too. Bye.
Paul: Bye.
Kirill: So there you have it. I hope you enjoyed today’s episode and picked up a lot of valuable insights and tips and tricks for your own career in data science. For me, perhaps the most valuable takeaway from this episode was how Paul uses data science in his own time to optimize and bring efficiency to his own side projects, so the way he uses Python to do web scraping and literally make money online using data science. So that was a very inspirational and also very interesting idea that he’s had and that just stands to show that data science opens paths for you, not only in your career that you’re pursuing in your workplace, which is great, which is fantastic, but also knowing data science can help you in your own side projects, can help you optimize your life, can help you do things that you otherwise wouldn’t be able to do as quickly and as efficiently.
So that was a very fun and exciting episode for me and I also hope you picked up quite a lot of stuff. If you’d like to get to the show notes and all the links from this episode and also follow Paul on LinkedIn then go to www.superdatascience.com/15 and there you’ll find everything related to this episode including all the links. And also if you’re on Twitter, then follow me on Twitter – my handle is @kirill_eremenko and that way you won’t miss any new episodes that come out and you’ll be one of the first people to know about them. I look forward to seeing you next time. Until then, happy analysing.