SDS 011: Learning resources, thinking like a Data Scientist and Data Exploration

Podcast Guest: Garth Zoller

November 18, 2016

Welcome to episode #011 of the SDS Podcast. Here we go!

Today’s guest is Aspiring Data Scientist Garth Zoller
Why and how does a right brain (creative) person end up in data science? Perhaps some of you find yourself in a similar situation, where you feel that you are a creative type, and not statistically-minded enough for a career in data science.
In today’s episode, you will hear Garth address exactly this situation. Tune in to hear all about his career in databases and how he taught himself statistics.
Garth will walk us through a fascinating case study showcasing his approach to thinking about very vague business problems, and you will discover how organisations build a culture of data science.
I can’t wait for you to get stuck in to this exciting episode!
In this episode you will learn:
  • Learning Resources (16:50) 
  • Learning Basic Stats (25:15) 
  • The Most Important Skill in Data Science (28:20) 
  • Case Study: Thinking Like a Data Scientist (31:35) 
  • Data Exploration (44:37) 
  • Building a Culture of Data Science (48:00) 

Podcast Transcript

Kirill: This is episode number 11, with aspiring data scientist Garth Zoller.

(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week, we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
(background music plays)
Welcome to this episode of the SuperDataScience podcast. I’m very excited about this episode because I have a very interesting and inspirational guest for you. Today we’ve got Garth Zoller, who is a student of mine on SuperDataScience. And why I’m so excited about this episode is because Garth is just starting out into data science. Even though he’s got a background in databases, he’s only getting into the field of data science and he’s learning very rapidly, very quickly. He’s learning all of the things he needs to know about data science. And being at that start of the journey, he is sharing the methodologies, the approaches that he is using to get into data science.
So those of you who are also just starting out into data science, who maybe don’t know how to tackle this very broad and complex field, you will find this episode very, very valuable and also very inspirational. So in this episode, we talk about how a right brain person can get into data science and how they can leverage their skills in data science. And moreover, Garth uses a litmus test saying that if he, being a right brain person, can get into data science, then you can too. So you can already tell this is going to be an inspiring episode.
And here also, Garth shares the tools that he’s learning, how he’s learned a little bit about stats, how he’s learning about R programming, Tableau, Microsoft BI. So you’ll get like a sequence of tools that he’s been learning and that he’s been aspiring to master. Also, we’ll talk about the most important skill of data science, which in Garth’s view, is how to define a problem, how to think about a problem. And not only will we talk about that, Garth will actually give us a walkthrough of one of his case studies. So one of those projects that he’s currently doing at work, we’ll actually get to see how he thought about the problem and what approach he came up with to tackle that problem. That is going to be super valuable for those of you who want to know more about how to solve data science problems.
Also, we’ll talk about a culture of data science in an organisation, and how that is important for you to be able to learn those skills, and if that culture doesn’t exist, what steps you can take to prompt the organisation to start being more open to allowing you to learn more data science and bring more value into the organisation.
And also, we’ll talk about data exploration tools. So we’ll talk about using data visualisation tools for data exploration. And finally, we will talk about giving back to the community and how you can give back to the community and give back to those other people out there who also want to learn data science.
So all in all, it’s a very exciting episode. Can’t wait to get started. And without further ado, I bring to you Garth Zoller.
(background music plays)
Hello everybody, and welcome to this episode of the SuperDataScience podcast. Today I’ve got one of the top students of SuperDataScience, Garth Zoller, here on the show with me. Garth, thank you for joining me. How are you today?
Garth: I’m very well, Kirill. Thank you for having me.
Kirill: It’s great to hear you and so right now, I’m sitting in Brisbane, just so that our listeners can paint a picture. I’m sitting in Brisbane, and Garth, you’re in Durham, in North Carolina. Is that correct?
Garth: That is correct.
Kirill: How’s the weather there?
Garth: We’ve just exited our humid summer. So we’re getting into the best season ever, which is our nice, cool fall.
Kirill: Nice. Very nice. And it’s very early morning, isn’t it? Like 7 am or something? 
Garth: Yup.
Kirill: It’s really crazy, the time difference, right? In Brisbane right now, it’s already 9 pm on Thursday. And you’re only entering Thursday. It’s like 7 am.
Garth: And we’re both tired! We’re at opposite ends of the day. That’s right.
Kirill: Yeah, yeah, totally. But it’s near the end of the week, weekend coming up. So it’ll be fun. So for those of you who don’t know — probably most listeners won’t know this, but Garth is a very avid student on SuperDataScience, and recently we had a showcase of our new features, and Garth missed it, the actual presentation that I was running, and then he emailed me afterwards saying all the things that he thought about the presentation, giving some great suggestions of how we can improve the customer experience. I was so surprised at the feedback he had, and some great, really good suggestions on how we can really deliver great experiences for those who are learning data science. And so I jumped on a call right away with Garth to ask a bit more about what he thinks, and after seeing the passion, Garth, that you have for data science, I was sure that we have to have this podcast.
So I’m really excited about talking about specifically what you’re doing in this space, and how you’re getting into the data science space. Are you excited as well?
Garth: I’m super excited!
Kirill: Totally, it’s going to be fun! Alright, so. I’ve been looking at your LinkedIn, and you’ve worked at a company called NetApp for the past 14 years, with like a little gap in between. But 14 years at NetApp. So what exactly does NetApp do?
Garth: NetApp roughly is a data storage solutions company. And they focus on designing the hardware and software for managing large sets of data. They don’t store other people’s data, but those systems will efficiently and in a high-performance way manage data sets.
Kirill: So it’s not like a OneDrive, or a Google Drive, or a Dropbox. It’s not that type of service.
Garth: Correct. So companies would come to us and buy this hardware/software combination and then implement it on premise in their data infrastructure.
Kirill: Ok, so is that similar to what EMC do?
Garth: Conceptually. We like to think that NetApp is the one that we would mention first, but that’s ok.
Kirill: I gotcha, I gotcha. Alright. NetApp. Ok, and you have gone through multiple roles in NetApp. I’m just counting them on your LinkedIn profile. There’s like 5, 6, 7, 8 different roles in NetApp. So can you take us through that journey since you’ve started in the company back in 2000. How have you moved through the roles, and what has this meant for you and your professional development?
Garth: This has really been a surprise. So I’ll start with a general framework that I actually heard about in my Executive MBA program, and I wish I had known about it a long time ago. I might have made different choices. But it’s this concept of 2 years up or out. So you stay in a role for 2 years, and if you’re not being promoted, then find somewhere else to go, either in the company or to another company. And I’d never heard that before! So my model at NetApp has always been, and just even in my career in general, is just find interesting work to do. And yeah, if I go up, ok, fantastic. That’s not a bad goal. People have different levels of ambition and whatever in that respect, but I place a much greater value on doing work that I personally find meaningful and interesting and challenging. So that’s how my choices have gone at NetApp.
I came in with a background in databases to do technical support, third line technical support. And that was at a time when the company was just starting to put databases on their storage. So there was a great need for that knowledge. And then I ended up managing that team; it was a small local team. And then that led to managing a larger global team in a different product line. And after that, I thought, well, let’s look at some other areas. I was interested in leaving the support context and expanding out into the business and strategic side. I wasn’t exactly sure how to do that at first, but program management seemed to be the way to go. And so I started getting involved. At that time, we had business units in the programming office, and I was given a particular product to be the program manager for that area.
But my background in databases just kept rearing its head in terms of the kinds of metrics and analytics that the business unit GMs were asking for. And I had that knowledge. And so rather than spend hours and hours in Excel trying to accomplish something, I just whipped it into SQL Server and popped it out in a couple minutes! But that also became a real value add for that business area and then that led to other types of activities, from beyond basic “how many widgets did I sell?” to “hey, can you do forecasting?” Well, at that time, I had no idea how to do forecasting, but sure, I’ll give it a go, why not! And then you just start self motivating in what areas you want to expand into next.
Kirill: So you came into NetApp, and you had this database experience, you progressed through the different roles. And the question is, why data science? When did data science enter your professional career, and why did you choose to expand your knowledge and expertise into this field?
Garth: Program management was very interesting, and they’re great skills that I still use, even in data science today in terms of structuring the problem definition, and having a single source of truth to refer back to. But I couldn’t stop my interest and fascination in what was happening in the data. There’s something mysterious about data, like a puzzle, like a mystery. And I want to find out, what is it telling us? What can I do with it? And ultimately, how can I make better decisions based on it? And so that led to opportunities to go into roles that we call Business Analyst. You could equivalently call that Data Scientist, but we call it Business Analyst. And their primary objective is metrics, dashboard reporting, month-end and quarter-end reporting, and that kind of thing. But there’s tons of ad hoc activities that come in all the time, of “hey, we’d like to put together a new kind of product or packaging solution of some sort. Can we look at historical data for like products and see how we might position for pricing? Or how we might position for volume, forecasting of the volume, things of that nature?” So I entered into that space and then kind of built a competency in that area.
Kirill: Yeah, and right now, you’re a Business Analyst in the marketing operations. So specifically, this is B2B, right? So you’re dealing with B2B marketing. Is that correct?
Garth: That’s correct. Both from a direct sales and a channel sales perspective.
Kirill: I can totally see how data science skills would be valuable there, because I personally did some work in customer experience and insights. But there was a lot of work that we did with the marketing team in data science. But it’s a bit different though, business to business marketing. What would you say are any specifics about business to business marketing and how you use data science in there?
Garth: On some level, there is a lot of similarity, surprisingly. And there are some differences. So for example, if you were doing some analysis with respect to margins and discounts, if you’re talking about the channel, you don’t have as much control there. They’re taking your raw parts and building the solution, and they’re going to control what kinds of margins and discounts they’re going after. You can provide guidance, but ultimately, they’ll make a choice.
But even in direct sales, I think I see actually more similarity. So for example, from a B2C perspective, ultimately you’re trying to satisfy the needs of your customers. That’s the same in B2B as well, and the methodologies you might employ from a data science perspective to do that can be similar. It doesn’t have to be as fancy as something like a conjoint analysis. But this idea of, you don’t want to just get your feature development roadmap because you talked to your top 2 customers. That’s insufficient. You’re going to miss it. So in that sense, I see a lot more similarities than dissimilarities. Other than the channel VAR perspective that I just mentioned, I don’t think I can remember something off the top of my head that sticks out as being, “this is squarely a B2B challenge that I’ve had to address.”
Kirill: That’s really cool. And to me, it means that wow, so my B2C experience, I can actually apply it into the B2B space. And I’m sure a lot of our listeners will be either that way or vice versa. So if you have B2B experience, you now know, from this, we can assume that these are transferable skills. And that’s great to know.
And I want to move on a little bit to the thing that fascinates me most about you, which is that when we were speaking, you mentioned all these different areas and ways that you’re improving your data science skill set. I don’t think I’ve met anybody who is so passionate about learning about data science, and just like you are getting as much knowledge as you can from all of these different sources that we spoke about. What drives you? What keeps you motivated to learn about data science in such huge strides?
Garth: Fear? No. Because at this point I have a really clear understanding of kind of how I learn and what I want to accomplish with data science. I didn’t always know that and in fact, I’d say even though I’ve been using data science in various capacities, it wasn’t until really recently that the light bulb kind of came on and I said, “Ah, okay. Now I’ve got a clear path that I’m going to follow.”
But rather than being what I kind of envision as a typical data scientist, somebody who is really good with numbers and kind of very left-brain, I’m actually a right-brain person that has lived in the left-brain space for my career. So I do a lot of context switching from kind of the visual graphic connection idea space to, “Okay, now I’ve got to translate that to the formulas and algorithm execution space.” So for me to come at a particular problem or challenge like data science, I need to come at it from as many different angles as possible because my brain is going to start making those neural science learning connections that help form more of a systems model type of view. For folks who aren’t familiar with that concept, it’s basically that you’re never (well, I shouldn’t say never, but probably rarely) solving a problem in isolation. Your problem exists in a larger system and so thinking about things in a system model, you’re going to draw and say, “Well, here’s how we’re going to solve this problem but what’s the impact over there?” Or “What’s their disincentive over there and how does that impact my model?” So as I’m learning data science and having that passion for data science, I’m trying to come at it from as many angles as possible so that I can incorporate that in my solutions.
Kirill: Yeah, that totally makes sense. And the more angles you attack it from, the more opportunities you have to learn. And just for the benefit of our listeners, can you mention a couple of sources, if you don’t mind sharing these? Where are you learning data science from?
Garth: SuperDataScience, of course! Thank you, thank you! So I had first found you on Udemy and some of your courses in Tableau and data science there. And I should mention too, I would classify myself kind of at the beginning of my data science journey. There’s so much more I have to learn. So I look at every source as a possible opportunity. Coursera – I have a course that’s being done by Duke in stats right now that I’m going through that I’m finding tremendously helpful not only as a review but also for picking up some new things. Books, too. Pretty much anywhere that I can find, I’m going to leverage.
Kirill: Yeah, totally. And I really appreciate your honesty in admitting that you’re at the beginning of this data science journey; and I think people come to data science from different backgrounds. We’ve had guests that came from chemical engineering, from physics, from neuroscience, people that have come from a strategy perspective, people that come from economics, and so on. And you’re coming from the space of databases but still, you can appreciate that the data science field that you’re entering into is quite broad and you do need to undertake some education. And I think a lot of our listeners will be able to relate to that because it’s one thing listening and learning from somebody who is very well established in the field of data science and learning from their experiences and successes. It’s a whole different story being able to relate and to learn from somebody who’s just venturing into the field of data science and to this space, and just embarking on their journey. I’m sure you’ll share some very intriguing and valuable insights with us on this podcast episode.
Garth: Absolutely. That’s sort of my whole model in life. Again, harkening back to this idea that who I am and knowing who you are, the breadth and the vast array of different kinds of experiences is really what I do well. And I’ve always been fascinated by people that were just the opposite. They’re very deep, they’re so knowledgeable, they’re so skilful. And I thought, “Boy, you kind of wish — and think that the grass is greener over there,” but you are who you are and it’s best, as you mentioned in a prior podcast, to focus on your strengths and enjoy those. So from a learning perspective, yeah, I’ve listened to those other podcasts and I’m thinking, “Boy, those guys are so smart,” or as I’m learning, I might find a blog somewhere or an instructor in whatever course and think, “Oh, look how accomplished they are. Look at how much they’ve done in terms of their deliverable, their quantifiable output, what they’re talking about, the depth of their knowledge.” It just seems like the mountain that’s too great to climb and it can feel very defeating at some level and discouraging, but my experience and my encouragement to listeners who are in a similar spot to myself is, do not be defeated by that. There’s no need to be defeated by that. It’s so doable to focus on one thing at a time. You don’t have to solve the whole mountain, just find the rock that’s in front of your foot and solve that one.
And an example might be, let’s say you know that or you’ve heard that R is useful for data science. Well, okay. But maybe you haven’t quite solidified your knowledge of powerful stats-like functions in Excel. Start there instead. And then once you’ve got that, then the next step might be go to R and try and accomplish those same things in R. And that gives you a nice focused scope. The net-net for me is the 20/80 rule. Focus on the 20% of activities that accomplish 80% of the outcome. Do it faithfully, do it consistently, and you and your organisation will be light years ahead of wherever you’re at from a data science perspective.
Kirill: Yeah, totally. I absolutely agree with that. It’s kind of like data science is—I wouldn’t say it’s unique in that way; but it has that advantage and it is so broad that you can learn a lot of different elements. You can go deep, like a lot of specialists do, but also you can learn a lot of different elements about data science and then pick the ones that you’re good at, pick the ones that you’re interested in, pick the ones that excite you the most and slowly deepen your knowledge in those, and then pick up something else, pick up something else. And through this breadth of knowledge you will still develop a great expertise. And just speaking of that, once you started learning data science, where did you decide to start?
Garth: I’d have to think about that.
Kirill: Was it R programming? You said you already had some experience with SQL, so that doesn’t count. That’s your background. But when you started venturing into data science, was it machine learning algorithms, was it maybe linear regressions, was it R programming, was it the visualisation with Tableau? Like, what was your first step into this vast, vast field which is data science?
Garth: If I remember correctly, I think it was during school and it was a project we were doing – some sort of marketing or market segment analysis of some sort. And I think it was actually using Statspack, which at the time was an add-in to Excel that allowed you to do some correlations, and p-values, and R-squared values, and things of that nature. And I had used that largely because it was ubiquitously available and free because I already had it. But the next step after that was dipping my toe into R and starting to look at that. I kind of segued a little bit into Python. I’m not totally clear at this point how I can make that meaningful to me yet so I’m still kind of squarely in the R space and looking at that.
Kirill: Okay. That’s interesting. So would you say that—so, you went from a database background into the Statspack Excel add-in, which I’m assuming would require some statistical knowledge, some understanding of p-values, and maybe some distributions. Would you say that this statistical knowledge is essential for somebody to get into the space of data science?
Garth: At least a basic level. So statistics – just like data science, you can go incredibly deep. And at some point – again, I’m going to stress this – you really have to learn who you are. Because at some point you’re going to have to embrace either things you don’t know, and be comfortable with that. Or the fact that you’ll never know it. Like, you’re just never going to be that super-genius and whatever, because most of us – if we think about it in a standard, normal distribution – most of us are within the first standard deviation; we’re all in the 68% around the average. And that’s hugely powerful. That’s where most of us are; that doesn’t stop you from being excellent at what you do. But you have to decide for yourself and get comfortable with that so that you can then make that kind of linear progression in terms of what you want to accomplish next and start defining what those steps are.
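The 68% figure Garth mentions is the share of a normal distribution that falls within one standard deviation of the mean. As a minimal sketch, using only Python's standard library (the function name `fraction_within` is just an illustrative choice), that share can be computed from the error function:

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

Run as written, this prints roughly 68.3%, 95.4% and 99.7%, which is the familiar 68-95-99.7 rule of thumb.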
Kirill: Yeah. I’m just going to repeat that notion that was in the other podcast (I think it was in podcast #4 with Brendan Hogan), which you already mentioned: that the most successful people in the world are not people who constantly work on their weaknesses. It’s the people who understand what their strengths are, understand what their weaknesses are. And they focus on their strengths and they ignore their weaknesses. That’s not my quote, it’s something that the CEO of Deloitte Australia, Giam Swiegers, kept telling us when we were aspiring to become better consultants at Deloitte. He said focus on your strengths. Ignore your weaknesses. That’s exactly what you’re mentioning now, that if you understand your weaknesses and you understand who you are, what you’re good at and what you’re not good at, maybe it’s not worth getting good at something you don’t want to be good at. Just focus on the things you love, focus on the things that you’re passionate about, and make your existing strengths even stronger. That’s what will make you valuable and that’s what will make you respect yourself and push yourself even further.
Garth: Absolutely! That’s 100% correct. And while you were speaking about that, I thought what if a person that’s listening says, “Well, I’m not all that good at stats. Then what?” Well, again, to close on that question, the net-net is you do need, I think, a basic knowledge of stats. But that’s very achievable. And I do use myself as kind of a litmus test. This will sound strange. I do have a positive self-image, but I like to acknowledge the fact I’m not the shiniest apple in the barrel. I’m in the barrel. So if I can get it, again, coming from a right brain art background to a left brain world of application and having to make all these context switches, then I can absolutely help you know that you can do it, and those basics are achievable. It’s okay if it’s not perfectly crystal clear the very first time you do it. There’s stuff that I’ll hit and I’ll think, “Oh, geez. I’m going to have to go again for the third time to try and understand this concept.” And maybe at that point, I’m looking at it from somebody else describing it, trying to get a different view of it. And suddenly it will click for who knows why. But just don’t stop, don’t let that stop you. You can accomplish the basics and again, that 20/80 rule. You just get that 20%, and you’re going to start making powerful, powerful impact.
Kirill: Totally. I totally agree. That’s what a lot of things in life are about. You do not stop. If you fall, you get up and you do it again. If you have to do it the second, the third, the fourth time, you keep doing it. And eventually you will find a way to make it click. That’s what we are as humans and resilience is a huge factor. In terms of self-respect, in terms of learning new things, it’s very valuable. That was great that you commented on that. I was actually going to ask you about that, how was it being a right brain person getting into the field of stats, but you pretty much answered that question. And I also like this litmus test analogy, that you’re using yourself as a litmus test. That if a person with a right brain mentality can get into stats and understand that basic 20% that are required for 80% of the things, then—yeah, it is doable, it is possible and nobody has an excuse not to do it. So what was your next step? So you mentioned you learned some stats, you started on R programming. So how deep did you get into R programming? Do you use it on a daily basis and what was your next step after that, or what is going to be your next step?
Garth: I don’t yet use it on a daily basis. I’m planning that in the next one to two quarters, to try and implement that into my work. I’m still in the learning journey. So I would say that it’s not like, “Oh, I’ve accomplished R.” I mean certainly, I’ve gone through some courses and kind of gotten a sense for it. But again, that idea of wanting or needing, in some cases, to go back multiple times and transition from the, “Okay, I know the mechanics of how the functions work,” but I think the most incredibly important skill from a data scientist’s perspective is to think about how to think about a problem. The mechanics of any given tool, you can learn those and get either super-great or be okay at it.
But if you can rightly think about how to frame the problem and how to attack the problem, that is, in my opinion, the most valuable thing. And with respect to work in terms of next steps, part of it is that, kind of in parallel, I’m letting the tasks that were given to me by my executive management, their ad hoc requests, drive what I might need to learn next, because it’s so vast, who knows what you’re supposed to do. There’s no one way. But also, just because I have this learning path, I know that there are some basic skills, whether it’s R, or Python, or whatever. You know, I can always continue to make continuous improvement in those areas. But I should mention, in addition to that, we do have particular visualisation tools that I’ve had to spend some time learning as well. We use Tableau right now at NetApp. We also are exploring and have a lot of interest in Power BI. I use Microsoft Power BI on my own when I’m not doing work projects. So those types of tools take some time to learn and you’ll know what you want to do and if you can’t figure out how to make the tool do it, you’ll kind of spend some cycles working through things like that.
Kirill: Totally. I really like your idea of the most important skill being how to be able to think about a problem, how to frame a problem. I think a lot of the time, that is disregarded. People spend so much time focusing on, actually, the tools. And a lot of our listeners who are in this phase, that are starting out into the data science journey, I highly encourage you to consider this. That a lot of the time that you’re learning something new, you spend your efforts in understanding how to code in R, or how to use Tableau, or how to understand the statistics that you’re using, those methodologies and so on. But what slips past your attention is how to actually—what you’re working on, what is the problem, what is the end goal. And then in your head identifying how you’re going to tackle it, what are the steps that you’re going to take.
Yes, of course, you need to know the tools to understand the steps in detail. But even just having a general picture, getting a general sense, developing that intuition for how to attack these problems, that is – like you correctly mentioned, Garth – that is probably one of the most important aspects of a data scientist’s job. And just following on from that question, I wanted to ask you, can you give us an example (again, if you can disclose this information) of when you recently had a business problem and there was a specific way that you went about thinking about it that helped you solve the problem, or tackle it in a more efficient way, or actually come up with a solution to this problem?
Garth: I do, actually. It’s going on now at the moment.
Kirill: All right, that’s interesting. Let’s hear it.
Garth: I don’t have it fully solved yet but I’ll tell you how I’ve approached it thus far. So I was asked to do an analysis. I’m able to share at least this point: they were looking at what they call master purchase agreements. Basically if you have a large customer and you kind of put into a contract for some period of time whatever the kind of standard discounting that you’re going to get based on who you are, or your planned purchase run rate, or whatever. The VP had a sense that there were some inefficiencies there. And he couldn’t put his finger on it, that was just his intuition. I’d like to stress that there’s a large value in intuition. You can really hone that and use that to your benefit. It’s just you shouldn’t necessarily make every business decision only based on intuition. That’s what data science helps with. So he had this idea that there might be some inefficiencies in our master purchase agreement but he wasn’t sure. So he said, “Hey, Garth, you’re the data science guy. Look at the data and see what you can find.” And that was the assignment. (Laughter)
Kirill: Lovely. I love those. They come to you without a business problem. They’re like, “Just look at the data and tell me what you can find.” Wonderful! Fantastic!
Garth: It’s so vague it’s fantastic. You know, through no fault of their own, a lot of the business leaders that you’re going to work with, they don’t necessarily know. They may not know the data, what they’ve got, they might not even know what fields are available. But they certainly don’t know—you know, you’re in the data all the time—what kinds of relationships might exist already as they’re trying to think about the problem. So here’s this nice vague problem. Of course, I had no idea at first what to do and I thought at first, “Well, what can I do?” When I don’t know what to do, I ask what can I do. What do I know how to do? What can I start when something is ill-defined, and who knows what the next step is. And the first step I thought of is, if nothing else, I can visualise the data and just look at the shapes of the data.
Kirill: Interesting.
Garth: To inform my next step. Again, Mr Visual—if I can get a sense for the shape of the Big Data, that might then help me understand what I might next look at. Whether that next thing is right or wrong, I don’t know yet, but it helps me get a sense for the overall picture. So I—
Kirill: Hold on, hold on. Sorry, I’m going to pause you here. What do you mean by “shapes”? So that our listeners can get a better understanding of your thinking process, what do you mean by “shapes” of the data?
Garth: Definitely. So what I did is I took that data and put it into a histogram. And just visualised the data. In this case it’s Tableau. It could have been any tool, doesn’t really matter. But I was looking for things like—and if you’re studying your basic stats you’ll learn this or you know it already, but the idea of—what does the distribution of the data look like? Is it a normal distribution? Is it skewed left? Is it skewed right? I might throw in some box plots, or actually, before I leave that, I might also look at the modality. You know, is it uni-modal, is it bi-modal, is it multi-modal? Meaning how many major peaks does the data have? Is it very spiky and I have multiple peaks? Well, then I don’t have just one mean anymore. I probably have two means, and I might want to look and decide — well, for the start I might look at the peak on the right, or the peak on the left. And that’s another model for troubleshooting and solving problems. It’s the “Divide by 50 Rule”. Just pick a problem, pick a spot in the middle and say, forget about the stuff on the left, or right, whatever, pick whatever one you want, and go focus on the other half and just say, “Can I solve the problem with that half?” And if I can’t, then find the mid-point of that one and split that in half. Sooner or later you’re either going to exhaust that whole path, or you’re going to find the problem.
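The halving approach Garth describes can be sketched in a few lines of Python. This is only an illustration, not something from the episode: the function name `isolate` and the `contains_problem` check are hypothetical, and the sketch assumes exactly one culprit is hiding in the data.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def isolate(items: Sequence[T],
            contains_problem: Callable[[List[T]], bool]) -> Optional[T]:
    """Repeatedly halve the candidates, keeping the half that still shows the problem.

    Assumes a single culprit; returns None if the check never fires.
    """
    candidates = list(items)
    if not candidates or not contains_problem(candidates):
        return None
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        # Focus on one half; if the problem is not there, it must be in the other.
        candidates = left if contains_problem(left) else right
    return candidates[0]


# Hypothetical usage: record 37 is the one causing trouble.
culprit = isolate(list(range(100)), lambda subset: 37 in subset)
print(culprit)  # 37
```

Each pass discards half of what is left, so even a long list is narrowed down in a handful of checks.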
So looking at this and the modalities and the distribution of the data, I instantly found that in this particular problem, the data that I was looking at was very skewed to the right. Which is the reverse in statistics to what you might think it means, that the bulk, the majority of your data is actually stacked on the left, and that the small tail runs to the right. That’s a right skew. And when I looked at that again, I didn’t have some profound insight of, “Aha, there is the problem.” But it did suggest some things about, “Well, in this particular model that I’m looking at, there is a heavy concentration here that might not be ideal. And there might be some other things in the business that we can look at or do to address that concentration.” I also then thought, “Okay, here I’ve got this visual. Let me put some lines in there for the 20/80 rule as well,” which is approximately, generally speaking, 20% of your products or 20% of your customers account for 80% of your revenues as a rule of thumb. I don’t know if that’s necessarily always true or it’s true in this case, but let me model that. And I saw variance there.
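For listeners who prefer code to drag and drop, the two checks above (direction of skew and the 80/20 rule of thumb) can be approximated with Python's standard library. This is a minimal sketch under stated assumptions: the deal sizes are invented for illustration, and `sample_skewness` and `top_share` are hypothetical helper names, not anything Garth used (he worked in Tableau).

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5; a positive value means a tail to the right.

    Undefined (division by zero) if every value is identical.
    """
    n = len(values)
    mu = mean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the largest `top_fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


# Invented deal sizes: most are small, a couple are very large.
deal_sizes = [1, 1, 2, 2, 2, 3, 3, 4, 20, 62]

print(sample_skewness(deal_sizes) > 0)      # True: right skew
print(mean(deal_sizes) > median(deal_sizes))  # True: the tail pulls the mean up
print(top_share(deal_sizes))                # 0.82: top 20% of deals hold 82% of the total
```

A mean well above the median is a quick sign of the right skew Garth describes: the bulk of the data sits on the left and a thin tail runs to the right.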
Then I started to think about, “Okay, why did I experience variance there, and what could that mean, and is it for all of my customers that this is true? Or is it just particular customers?” And suddenly you can see kind of a waterfall effect of what I can next look at as I traverse down that path. But what I want to highlight here too, and this is so useful as a technique, I had some quick visuals, that quick sense, the general feel or shapes of the data. We went back to that executive sponsor who asked the question within the first day or two. Again, no particular insights, just “this is what we’re seeing in terms of the shape. Are we going in the right direction?” is what was asked. And it’s kind of like a negotiation. All of a sudden he starts, “Oh, well look at this,” and “I think this,” and he’s including his experience and his background and things that he knows that he never shared with you ahead of time. But that has then informed a whole other series of directions to take.
So the next step is based on his feedback from that data shape analysis. Now I have the next step of, “Okay, I have a direction for classifying my customers in a particular way. And then I’m going to do some random sampling in each category to see if the behaviour is unique to a particular class or if it’s consistent across all classes of customers.” And you can see now where this is going in terms of how it’s becoming more meaningful to informing the strategy of this master purchase agreement.
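Random sampling within each customer class is usually called stratified sampling. A minimal Python sketch follows, assuming customers are plain dictionaries with a made-up `segment` field; the field name, the two segments, and the `stratified_sample` helper are all hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_group, seed=0):
    """Draw up to `per_group` random records from each class returned by `key`."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return {
        label: rng.sample(members, min(per_group, len(members)))
        for label, members in groups.items()
    }


# Hypothetical customers, classified into two made-up segments.
customers = [
    {"id": i, "segment": "large" if i % 3 == 0 else "small"}
    for i in range(30)
]

sample = stratified_sample(customers, key=lambda c: c["segment"], per_group=5)
for segment, members in sample.items():
    print(segment, [c["id"] for c in members])
```

Comparing the same measure across the per-class samples is one way to see whether a behaviour is unique to one class or consistent across all of them.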
Kirill: I kid you not, you kept me on the edge of my seat this whole time you were speaking. I was like, “What happened? What did that executive sponsor say? What did you come up with?” That is so cool. And it’s such a good walkthrough. Thank you so much for walking us through this case study. It’s like an actual case study, a real life project, and how you got a very ambiguous challenge. And again, as you said, at no fault of the person that is requesting it. Sometimes that happens in life, when the executives might have a gut feel for something, and it’s much better than having no feeling whatsoever, no understanding, no intuition about what’s going on. So when you’re supplied with this intuition or gut feel, that at least guides you in the right direction. And then you were able to break down such an ambiguous problem into simple steps that you undertook.
Sometimes, like in this case, you don’t even have the ends in mind. You know that you want to find something. You don’t even know what it is. But still, every single step that you were taking had its own end in mind. And that whole finding the shapes of the data, understanding the skewness, and then applying the 80/20 rule and seeing if it applies. I actually did that myself in one of my projects, just check the 80/20 rule. So things that you know should be true, just check if they’re true or not, and that might give you some ideas. And also what was very valuable, and I want to repeat this for the benefit of our listeners, is working with stakeholders. Whether or not the challenge is ambiguous, or if it’s defined, it’s so important to go back. Even if it’s a very straightforward task like, “Within three weeks we need this visualisation with this dashboard, or these insights conveyed.” It’s still so powerful to go back and check with the stakeholder whether you’re 30% through their project or you’ve delivered your first major insight. You go back and check. And even if it’s what they wanted that you’re delivering, they might change their mind. They might see the insights and they might come up—together you might come up with a much better way to solve the problem. Or you might come up with even more valuable insights that they have now decided that they don’t want those previous insights. What’s going to be more valuable for their organisation is a different insight. So would you agree with that, Garth, that even if it’s not an ambiguous project, it’s important to go back to the stakeholder and discuss with them and work with them closely in solving this challenge?
Garth: I agree 100%. In fact, I’d extend it a little bit. So I think that there’s—particularly if you’re in the early part of your learning journey — there’s a tendency to kind of view yourself as the person who looks at the data and reports the result. The value of a data scientist is so much more than that. In this case, having those discussions with the stakeholders ultimately leads me to start thinking about the broader system and the larger question and what other questions might need to be answered and offering those into the discussion. And you’ll see light bulbs going. So not every stakeholder that you have, whatever level they’re at – director, VP, executive VP, CEO, whatever – they may not be a data scientist and so as you’re describing these results, you always have to think about how to pose this back into the context of the business problem, rather than just, “Look at this awesome regression,” or “Look at the linearity of this trend line.”
Kirill: You’ve got to speak their language, yeah?
Garth: So a good example of that is—there’s another piece of data that I was looking at where—and this isn’t particularly proprietary. This is true for any business, really. You know, you’re kind of looking at your distribution of margin to burden, or to discount. You know, how much profit are you making, and how much discount are you doing. And I saw a little blip in the data where I’m like, “Okay, you know what? There seems to be a pattern here where some of these bookings, or margins, or profits are coming from kind of low margin but high discount.” And that’s kind of like a worst case scenario. You don’t really want that. 
But because I was thinking about it in the context of the business problem, and I was going back to that stakeholder and having this conversation, just through that discussion, we started brainstorming and connecting the dots of the system and saying, “Well, I guess on one point, we could look at the data and just say, okay, this is not ideal. We should immediately address this ‘problem’.” But then we thought, “No, that may not be a problem. What if those are basically kind of your cherry pick accounts?” You don’t have to spend a lot of time as a salesperson doing anything to cultivate them, they’re just kind of recurring, always there, in which case that low margin/high discount, those dollars are golden. Don’t touch them. You know, you’re not spending much time there anyway.
Another part of the analysis, we saw areas where we had some cluster of data points where there was higher margin and low discounting, all part of the same dataset. And I thought, “Well, let’s highlight this group of data points and talk to the reps and find out.” You know, what’s the same about those kinds of accounts. Are those accounts a particular kind of profile, whether it’s the industry, the size, the kinds of business challenges that they’re facing, their customers. What’s the same about those accounts? What’s the same about the messaging and how the reps positioned products and services to those accounts? Take that from those successful areas and go run a short experiment. Go talk to sales in the “problem area” and say, “Hey, look, this is what we’ve learned from other areas. We’d like to run an experiment.” You know, maybe take the worst performing salesperson in that area. They’d love to try an experiment to help their numbers, right?
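One simple way to highlight the groups Garth mentions is to label each deal by whether its margin and discount sit above or below the median. The sketch below is only an illustration: the deal records, field names and the `label_quadrants` helper are invented, and splitting at the median is an assumption rather than the method used in the episode.

```python
from statistics import median


def label_quadrants(deals):
    """Tag each deal as high/low margin and high/low discount relative to the medians."""
    margin_cut = median([d["margin"] for d in deals])
    discount_cut = median([d["discount"] for d in deals])
    labels = {}
    for d in deals:
        margin = "high" if d["margin"] > margin_cut else "low"
        discount = "high" if d["discount"] > discount_cut else "low"
        labels[d["id"]] = f"{margin} margin / {discount} discount"
    return labels


# Invented deals, margins and discounts expressed as fractions.
deals = [
    {"id": "A", "margin": 0.10, "discount": 0.40},
    {"id": "B", "margin": 0.20, "discount": 0.05},
    {"id": "C", "margin": 0.50, "discount": 0.30},
    {"id": "D", "margin": 0.60, "discount": 0.10},
]

for deal_id, label in label_quadrants(deals).items():
    print(deal_id, label)
```

The labels only say where to look. As the conversation makes clear, whether a "low margin / high discount" account is a problem or a low-effort recurring account still has to come from talking to the stakeholder and the reps.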
Kirill: Yeah.
Garth: And maybe work out a deal with the executives saying, “Hey, don’t let this guy go if he participates in our experiment this quarter.” And then see do those messages, do those profiles, does that work in this particular area? If it does, then you’ve just scaled data science knowledge and learning across your organisation and had profound value impact.
Kirill: Very powerful. I totally love it. And just out of curiosity, and for the benefit of those listening to the podcast, what tool do you use for data exploration?
Garth: Probably for quick hits—I’m not opposed to just kind of digging in using SQL.
Kirill: Well, that’s given your background in—
Garth: That’s my background. Although, and I should mention too, as a data scientist, basic SQL is important because you’re going to spend a lot of time transforming and cleaning up data before you ever get to visualisation. But to answer your question about visualisation — probably just because my day to day isn’t — we have Tableau, I do use Tableau. I have a personal preference for Power BI. It is a growing tool. It itself would not say that it’s a direct competitor today, but it absolutely accomplishes the 80/20. It’s readily available for free and I can accomplish my visualisations very quickly with it so I’m a big supporter of that.
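As a small illustration of the kind of clean-up Garth means, the sketch below uses Python's built-in `sqlite3` module to tidy inconsistent customer names and drop missing amounts before aggregating. The `bookings` table, its columns and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [(" acme ", 100.0), ("ACME", 50.0), ("Globex", None), ("globex", 25.0)],
)

# Normalise names (trim spaces, unify case), skip missing amounts, then total per customer.
rows = conn.execute(
    """
    SELECT UPPER(TRIM(customer)) AS customer_name, SUM(amount) AS total
    FROM bookings
    WHERE amount IS NOT NULL
    GROUP BY UPPER(TRIM(customer))
    ORDER BY total DESC
    """
).fetchall()
conn.close()

print(rows)  # [('ACME', 150.0), ('GLOBEX', 25.0)]
```

Only after this kind of transformation does the data become something a visualisation tool can chart meaningfully.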
Kirill: Yeah. And from that I take that Tableau, Microsoft Power BI or maybe some other organisation might have QlikView. From that, what I was actually going at was that data exploration doesn’t have to be complex. You don’t have to know R, or you don’t have to use statistical tools to perform data exploration. Data exploration, and this is one of my favourite parts about data science, can be very visual. And it should be very visual. So, tools like Microsoft Power BI, Tableau, QlikView – their whole mission and their point is to visualise the data, to help humans see the data and understand the data and that’s what data exploration is about.
And when you were giving that example of looking at the shape of the data, that’s exactly it. It’s not about writing some complex code in R to visualise the distribution, or even just get statistical information about the distribution. It’s about doing just a couple of drag and drops in one of these visualisation tools and there you go, that’s your data exploration. Personally, I think anybody can get into this space, which is one of the most exciting spaces about data science. You can get into it very quickly, and if you don’t know where to start, if you don’t know where you want to start with stats or R or Tableau or some other tools or databases, then one of my favourite places as a recommendation to start is these visualisation tools that not only allow you to visualise data, but also allow you to perform data exploration.
On that note, I actually had an interesting question for you. From what you’ve described so far, in NetApp it looks like they’ve created this awesome culture and I’ve got a feeling a lot of our listeners are going to look at job postings by NetApp after this podcast so they should expect an influx of new data scientists. But it looks like NetApp have created this amazing culture where data scientists can grow and learn and strengthen their skills. Would you say that you are getting support from your organisation when you decide to undertake a new algorithm or learning in data science or experiment with something? Would you say that your managers and the organisation as a whole supports these initiatives?
Garth: That question is so key. You aren’t able to be as successful without the support of the organisation and it does take an organisation approach. So you’ll start small. You’ll start in your immediate area, and hopefully you can help your immediate boss understand the value of data science, but you’ve got to extend that out. So when I’m thinking about solving a data science problem, I do have in mind that what I ultimately deliver isn’t just about the pretty picture. I absolutely have to tie it back to the business and how to make it more efficient, how to make it more profitable, how to do something that matters and affects that bottom line, and ideally also ties to the goals of what the organisation is trying to drive. In our case, we’re probably like a lot of companies. Data science is such an explosive, in the best possible way, area of growth, that we’re looking to say, “We’ve got pockets of data science in the company. How do we bring that together in kind of a standardised, cohesive way so that we’re all benefitting from what’s being learned? A lot of problems are similar, so there’s no need for everyone in the organisation to reinvent the wheel on their own. Let’s look at what’s worked in the past for those similar kinds of problems.”
So in that sense it kind of ties back to what I mentioned before and the idea of how you communicate the data science results back in business problem language that executives and sponsors understand will help them better see the value of data science. And it will be a slow roll. And it can be also very threatening. Please understand that too. When you start finding patterns and data that don’t particularly favour current behaviour you get a target on your back and all kinds of anti—and this isn’t unique to NetApp. This is every organisation I’ve ever been involved in, which is—you know, people defend, they’re scared. “That’s new. That’s questioning my skills and abilities to manage my business,” whatever. And I think you also mentioned this in another podcast, which is, you don’t have to present it in a threatening way. It’s like, “Guys, here’s an opportunity here, and it’s not so much that this is pointing out something you’ve done wrong, but it could absolutely free your cycles, whether your people or your time, to go and be awesome in so many other areas that you don’t even have time to explore right now.”
Kirill: Exactly.
Garth: So that kind of framing really helps drive the kind of organic growth of data science in an organisation.
Kirill: Fantastic. I was actually going to ask you what your recommendation is for those who want to start learning data science, get that buy-in from their organisation, but you basically answered that question. You take it step by step. And you make sure that you’re aware of the challenges and the threats that that can pose to you, but you pose it in a beneficial way. You always find the business value in the activities that you’re doing. And it would be, for lack of a better word, it would be just silly for a business not to leverage those insights that you’re all of a sudden providing and—like, you can learn data science in your own free time, especially if you’re passionate about it, and then you could bring those skills to your business, and sooner or later you’ll find ways you can apply this knowledge.
And once you start driving value, once you position it in a way that’s going to bring value to the business, the smart people at the top that are driving this business are going to pick up on it and they’re going to slowly start encouraging it. And hopefully you can get them to create this culture where you’ll be encouraged to further learn data science.
That was very interesting, and we’ve got – slowly wrapping up this podcast – we’ve got some interesting questions. But before we move on to those, I noticed on your LinkedIn – and I think I want to talk about this because even though it’s not related to data science, I think it’s quite important. You’ve highlighted that you were a part of a volunteer experience and cause. You were in the Comfort Zone Camp. You were a Big Buddy there back in 2012. Can you tell us a little bit more about this experience because I think giving back to the community is always important and, you know, even if we spend just two minutes talking about this it can be very valuable.
Garth: Oh, definitely. So Comfort Zone Camp is a local organisation that’s basically a bereavement camp for kids up to the age of 18 who’ve lost either a sibling or a parent or other immediate relative. We disconnect from technology, we’re out in the woods and it’s a space to have some guided counselling with professional counsellors to help with the grieving process. It helps them feel comfortable to talk through and work through some of those emotions. Oftentimes their parents are also grieving, and they aren’t always able to give that support or meet the needs of where the child’s at at that time. So this camp helps provide that space to do it.
But to your other point too—that’s one way. Even in data science, there’s plenty of opportunities to give back. I mean, I’m always—it’s funny, it’s probably human nature to think, “Well, I’m not so good at it; so what can I possibly contribute?” Whatever knowledge you know, you have something to contribute right now. Find somebody who wants to learn what you know and help them learn it. Whether it’s provide the direction, provide some feedback, whatever it is, the opportunities are endless to give back. So I always, always try to think of how can I help somebody else achieve what they’re trying to achieve, whether it’s in data science or elsewhere.
Kirill: That is very powerful and I totally agree with that. So there’s lots of ways you can give back to communities, to friends, to families, to people in general. Thank you very much for pointing that out and for your contributions. It’s very inspiring to see somebody let—like, I can relate to you, you’re so passionate about data science, and I can also see that you’re a kind and nice, generous person, and you find the time to give back. So thank you so much for being a person like that.
Garth: My pleasure.
Kirill: All right. So moving back to our wrap-up questions, what would you say—and this is going to be an interesting one—what would you say is your one most favourite thing about being a data scientist?
Garth: I would say it’s just that whole discovery process and – actually, no, it’s twofold. It’s the discovery process, because I do think data is – hopefully in reruns it was global, but we had this cartoon back in the day called “Scooby Doo” and it’s like this mystery cartoon. I was endlessly fascinated by it. They were always solving mysteries and that to me is what data science is. You’ve got this mystery, this problem, and it’s discovery. And it’s just fascinating to see, well, what’s possible? What’s not possible? The other half to me, harkening back to my right brain side, is that – like, I use sketch notes extensively to understand complex problems and remember them. But when it comes to expressing visuals and getting those messages back to the execs, the visuals that Tableau or Power BI or QlikView or all those create – that’s one piece. But it’s a part of a larger presentation, and I’ll start applying infographics to help facilitate the discussions around those visualisations. And so I kind of get to marry art and data science and logic and then this piece together. To me that’s been kind of a dream come true in a sense.
Kirill: Fantastic. I can imagine how for right brain persons such as yourself, how that’s that additional creative part. We talked about that creative part within the actual data science, the technology. But this other creative part, actually bringing the insights to life and making them speak the language of your audience, I can imagine how that would be super exciting as well. And from where you are now, from what you are learning about data science, or from how far you’ve gotten into the field, from what you see now where do you think the field of data science is going and what would you recommend to our listeners to look into to prepare for the future of data science?
Garth: Based on what I’ve seen and what I’m getting a sense of, is it feels very much like the late 90s/early 2000s with the technology boom that happened at that time. That was largely centred around network administration and system administration. That’s kind of where data science is to me today in the industry. It is the thing. And what to me is a little different about data science compared to that earlier time: that earlier time was about the hardware or the software. Data science, we have some tools but again, it’s really about that way of thinking about problems. And being disciplined in how you create little miniature, quick experiments to vet out proof points of whether something is valid or statistically meaningful.
That extends well beyond a tool. And well beyond a particular timeframe. To me, it’s more like maybe personal computers were back when they started. Can you really get sustainable employment if you don’t have some basic computer skills today? It’s going to be difficult. Data science to me is like that going forward. It will be the differentiator between organisations. You know, the organisations that have a deep concentrated knowledge of data science skills are going to have a competitive advantage. And whatever you’re doing, even if you’re not a data scientist – maybe you’re a people manager or whatever else – applying data science to what you do will differentiate you from your peers. It ultimately always affects the relationship you have with your customer. You’re able to create a more authentic and provable relationship, giving them exactly what they need based on the data, and also driving the bottom line.
Kirill: Fantastic. Love it! Data science across the board. Everybody needs to know data science to a lesser or greater extent, depending on your role, depending on how deep you want to get into it. But I totally agree. It’s a great analogy: it’s like computers back in the 90s. You thought computers were this chic thing that maybe some people would need to know, some people wouldn’t. Kind of like a non-compulsory thing, and people could use them if they wanted to, but now everybody uses a computer. It’s natural, right? Same thing about data science. I totally agree. Thank you very much, Garth, for coming on the show and sharing all of this valuable knowledge. If any of our listeners would like to contact you, or follow you, or find you, or follow your career, how can they do that?
Garth: LinkedIn is a good way to find me. I don’t yet have a thing like a blog. Maybe as I feel like I have something to say, I’ll add one, but for now LinkedIn is a good way to find me.
Kirill: For sure. We’ll include the URL to your LinkedIn. But definitely I highly encourage you to start a blog. And any of our listeners who are passionate about learning data science, I think—as you said, teaching and sharing and giving back, it will help you improve your skills. But it’s also a very valuable thing to do. And if you ever start a blog, we’ll definitely include it in the show notes and let our listeners know about it. One final question for you today is what is your favourite book that can help our listeners become better data scientists?
Garth: Harkening back to something I mentioned earlier again, I’m a fan of Microsoft Power BI largely because it leverages what I already knew about Excel and functions there, and adds a little bit of SQL. And it’s just a powerful, inexpensive tool, so I highly recommend a book called “Introducing Microsoft Power BI” by Alberto Ferrari and Marco Russo. They’re so good at being able to describe how to use that tool in a very clear and easy to understand way. So I’m a big fan of that one.
Another book that I really strongly recommend, and this has to do with the idea of data presentation, is called “Show Me the Numbers” by Stephen Few. He does such a good job of walking through, or actually helping you avoid the temptation of just letting a tool tell you, “Well this is the visualisation.” No, it’s going to make some assumptions that aren’t necessarily always right. Stephen Few walks you through, why would you want to choose one visualisation versus another type of visualisation based on the problem you’re trying to solve? What are some of the artistic elements that really matter around the use of white space? Or around the use of colours? Or colour intensity to highlight different parts of your data? Such a great book that I just can’t recommend it enough.
Kirill: Fantastic. Sounds like a great book already. I can feel the enthusiasm in your voice. So there we have it. We’ve got two books: “Introducing Microsoft Power BI” by Alberto Ferrari and Marco Russo, and “Show Me the Numbers” by Stephen Few. So we’ll leave the links to those books in the show notes, so we’ll definitely check them out and pick them up. I’m personally very interested in Power BI after our conversation yesterday and after our chat today, so I will put those onto my “to read” list. And finally, again, thank you so much, Garth, for coming on the show. Really appreciate you taking the time to share this, and I’m so happy that so many of our listeners are going to get value out of this, especially those who are just starting out into the field of data science. I think this interview has been a great inspiration to those people. Thank you so much.
Garth: Oh, my pleasure. I hope so, and look for me in the community.
Kirill: All right. Talk to you soon. Bye.
Garth: Thank you, bye bye!
Kirill: So there you have it. I hope you enjoyed this episode and probably you could tell how inspirational it was, how energetic this episode was, and I’m sure you picked up lots of very interesting skills. Personally, for me it was very motivational to see somebody like Garth learning about data science and just pushing the boundaries, always constantly finding new materials, finding ways to improve his knowledge, and that even inspires me to go and start learning more and more myself. So definitely take this as an inspiration that people are learning data science, people are finding ways to go about it. It is a complex, broad field to get into, but it is possible to take those baby steps one step at a time and learn those things that you need to learn, set yourselves challenges and goals, and you will get into it and you will become a successful data scientist.
And don’t forget that you can get the show notes for this episode at www.superdatascience.com/11, so just the number 11, and there you’ll get the transcript for this episode, you’ll get links to the books that we mentioned, and you’ll also be able to follow Garth and get a link to his LinkedIn. And if you’re listening to us on iTunes, then please rate this podcast. It’s quite a new podcast, and any ratings that you can submit, especially if you like the show, will be very, very beneficial. I can’t wait to see you next time. Until then, happy analysing.