SDS 055: Building and Managing a Successful Data Science Team - SuperDataScience - Big Data | Analytics Careers | Mentors | Success

Welcome to episode #055 of the Super Data Science Podcast. Here we go!

Today's guest is Head of Data Jaco Van Der Berg

Are you a manager in the space of data science, or someone who is looking to move into a management opportunity, or perhaps just curious about what happens at management level?

As Jaco Van Der Berg researched courses to broaden his data science knowledge, he found much material geared towards the technical side of things, but hardly anything on the management side.

Tune in to hear Jaco share his own deep experience and body of knowledge in the space of management and learn all about higher level challenges in strategy and management, as well as his own approach to self-development.

In this episode you will learn:

  • Managing and Hiring for a Data Science Team (18:25)
  • Structure of a Data Team (23:52)
  • Data Science Strategy (27:21)
  • Changing Approaches to Data Security (33:17)
  • Managing the Flow of Data Within an Organisation (38:45)
  • The Single Point of Truth and Changing Systems (40:45)

Episode Transcript

Kirill: This is episode number 55 with Head of Data Jaco Van Der Berg.

(background music plays)

Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.

(background music plays)

Welcome to the SuperDataScience podcast. Very excited about today's episode. I had a very interesting guest on the show just now, Jaco Van Der Berg. He is the Head of Data at a huge company. The company is called The Hut Group, and they have over 2000 employees. This is a company that manages over 100 high profit websites, mostly based in the UK, and mostly oriented towards healthy lifestyle, health/fitness/beauty websites. You have heard about some of them, lookfantastic, Myprotein, ProBikeKit, and others. As you can imagine, it would be a very, very responsible and challenging role to be the Head of Data for an organisation that size.

And in this podcast, we discuss some very interesting things which you will find helpful, especially if you are considering a role in management in the space of data some time down the line. But even if you're not, I think these things are very valuable to know about anyway. So we talked about data science management, data strategy, data security, data flow within an organisation, and the single point of truth. Those are just some of the topics that we touched upon and some of the topics that Jaco has to deal with on a daily basis. Also, Jaco is quite interested in the space of data science even though he comes from a different background, that of database administration. He is therefore in the process of learning more and more about data science and taking courses on it, and it's very interesting to see how he's combining the two sides of his role: on one hand managing a huge organisation's data, and on the other hand, at the same time, upskilling himself in data science.

So I think it was a very inspiring podcast, and also lots of knowledge in the space of data management and building that data structure within an organisation. So I hope you enjoy it and without further ado, I bring to you Jaco Van Der Berg, Head of Data at The Hut Group.

(background music plays)

Welcome everybody to the SuperDataScience podcast. Today I have my comrade from South Africa, Jaco Van Der Berg. How are you going, Jaco? Welcome to the show.

Jaco: Hi Kirill. Thank you very much for having me. All good here in sunny Manchester, thank you.

Kirill: Just so everybody knows, you were probably expecting a South African accent, but Jaco has been in London or England for what, 10 years now? And you've completely lost your accent now!

Jaco: 10 years. I still think I have bits of it left. My wife is a South African as well. So we do speak Afrikaans when we can.

Kirill: Ok, ok, fantastic. And just to start us off, can you describe how we actually met in terms of how we got in touch with each other? It only happened just recently. I was very fascinated by how this whole process went.

Jaco: Yeah, sure. So I started doing some online training using Udemy first of all. And one of the courses I did there was one from you. I think it was the machine learning training I did then. And one thing led to another, I started looking at your website, SuperDataScience website, and looking at the podcasts and going through all of them. And the one thing that struck me was that it's very interesting and you have some very sharp people there. But a lot of it is very technical or people who are actually data scientists. And from my point of view, being quite new to data science as a field, and especially as a manager, I struggled to find any resources where people spoke about managing a data team or managing a data science team. So I sent you an email and I suggested a few things and hopefully you know, there will be other people as well who can share their experience, not just from a pure technical side, but also from a management side.

Kirill: Yeah, totally. And I really appreciated that. I think that's a very important topic to discuss and that's why I right away jumped on the opportunity to invite you on and that's what we're going to do today. We're going to talk about management of data science teams, of the data science process, of just data in organisations. So yeah, I'm pretty excited about this. How about you?

Jaco: Yeah, excellent. Looking forward to this, definitely.

Kirill: Alright, cool. So just a quick run-down of your background. I'm going to start, and then you can add the bits and pieces that I might miss. So you are the Head of Data at The Hut Group in the UK which, for those listening, is a massive company. On LinkedIn it says 1000-5000 employees, not exactly how many, but between 1000-5000 people, with over 100 high profit websites. So you can imagine the amount of data going through this organisation, and Jaco is the Head of Data there. So tell us a bit about your role and maybe about the company even. What exactly are they focused on and what does your role entail there?

Jaco: Ok. So the company is purely online. It's an e-commerce company. I think the number of employees is about 2000-3000 people there, so the website is roughly right. I'm Head of Data there, so I look after a number of teams: the database team, the BI team, an ECRM team (these guys write the frameworks which we then use to send marketing emails to all of our customers), as well as the data science teams. The guys' focus on the data science side is mainly around revenue growth and cost reduction, as well as optimising some of our workflows in the warehouses.

Kirill: OK. That’s pretty cool. You obviously weren’t always in that role. It took you some time to get there. I think you’ve been there for eight months now. How did you start off and how did you progress from your initial roles and then all the way to Head of Data? I think that could be a very interesting topic to discuss, how your journey actually unfolded.

Jaco: Okay. I think I got lucky as well for parts of my career. When I finished my degree at university, I went and worked for Standard Steel Manufacturing Company in South Africa. This was probably the first bit of luck I had. As soon as I finished my degree, all the DBAs in that company left for some reason. I did a bit of Oracle as part of my degree, and instead of sending me down the usual graduate route where people went with end user computing or the IT guys, carrying PCs around and so on, they put me in the database team.

So I started my career off as a DBA really, and I really enjoyed that. I got to work with some very interesting technologies and some very smart people as well. I was a DBA for probably about 10 years or so. I started in South Africa and after 5 years there I moved to the UK with my wife and I started to work at an online gaming company, which again really made me think about high volume/low latency databases and the data used there as well. After a few years or so, I started getting involved in a little bit more of data analytics, so the business model for managing the databases and the data estate for the analytics team. That’s when I first started hearing about things like R and Python and what the guys would do with them, and things like Tableau. My interest really started from there, and one thing led to another and here I am.

So yes, it was a bit of luck but also always being curious about new trends and technologies. You can’t always sit back and just let things happen to you. You need to go and find out what the trends are and what’s happening.

Kirill: Okay. Fantastic. So, even though you’re Head of Data for such a large organization and obviously you have a vast range of experience in database administration and that pathway where you came from, but at the same time it sounds like you’re only starting to become an expert or even learn things to do with data science and R and maybe Python and other applications of data science. Can you tell us a bit more about that? How does it feel that even though being in such an important and responsible position, at the same time you’re still finding time to continue your education and look into new things?

Jaco: I think that’s a very interesting question. I think the long and short of this, you need to acknowledge what you’re good at and what you’re not. The data science team for The Hut Group, there are some really smart people there. It would be foolish of me to try and tell them what to do. Those guys are the experts. My job is to make sure that they’ve got all the tools and everything they need to do to do their job well. So, I need to enable them to do what they do best.

In the meantime, I need to do a lot of self-study, so things like Udemy and the SuperDataScience courses you guys have are fantastic. It’s something you can do part-time; you can just do an hour or two after work or before work. My intention is not to become the world’s best data scientist. I doubt I will ever get a Masters degree in quantum physics, for example.

My intention is to understand enough about the discipline and the field so I can at least make informed decisions, help with priorities, help with new trends and find areas of the business where we can apply our knowledge and scale that. It will be my goal to know enough about machine learning and deep learning as well. If I can write some Python code or some R code to do some basic things, for me that will be successful. I’m starting an MBA in a few weeks as well, so longer term, I think what’s very important is you need to look from a business point of view and if you can match up your business problems and your business knowledge with your data knowledge, I think that’s a pretty good place to be.

Kirill: Fantastic. I love that answer. I love that you mentioned that you need to acknowledge what you’re good at and what you’re not. I feel that it’s kind of like an ego thing for a lot of people to be—when you’ve accomplished some things, or a lot of things in your life, it’s harder to say, “Oh, actually, no. I don’t know this,” or to admit to yourself or to others that, “No, I don’t have the expertise here. Even though I wish I had, I don’t, but I’m going to do something about it. I’m going to go and learn. I’m going to go and upskill.”

I just wish more people were like that. I think the world would be a bit more open to progress if people would sometimes admit to themselves that, “Hey, it looks like the world has kind of moved on” or “This is a bit of a different field. I need to learn some new things.” That’s totally fine. That’s my personal opinion. So how is your learning going? You’ve taken up some courses, done some initial research in the space. How do you feel about all of it?

Jaco: I feel I’m making progress. It’s probably not as fast as I would have liked. Slowly, I’m starting to learn more about the field. I think I’m in a good place now where at least I have the resources in place where I can then probably spend a month or so scouring the Internet and trying to find good resources and good places to learn from. I did the Coursera one as well, which was pretty good. There were one or two, you know, you probably wasted a couple of weeks as well on somebody, kind of thinking, “This guy isn’t really teaching me what I want to be taught.” So, I had a bit of that. I’ve got a plan now, at least, which is good. Within six months I will be in a good place for data science and within three years that’s when I’ll be finishing my MBA as well.

Kirill: You should have come to me right away. (Laughs)

Jaco: I know. (Laughs)

Kirill: Tell us more about the plan. That sounds like a really cool thing. It sounds like you’ve got it all set. So six months – how did you come up with that plan and what does that plan entail?

Jaco: Well, it’s just—I first looked at the time I had, so obviously my work commitments will come first. So it’s the time I had, I looked at the SuperDataScience training that is there. I think for me to do the course and to do some examples of my own as well, including the MBA stuff I’m starting. I think after six months I will be in a position where I can say, “You know, I’ve got some of the core basics of what data science [indecipherable 13:57].” I guess if I wasn’t working I can probably even bring it back a lot, but, man…

Kirill: (Laughs) For sure. Okay, that’s really cool. I like the idea of a plan. How about this MBA? Tell us a bit more. It sounds like it’s part of your overall plan for not just your career, but your involvement with data and data science. What’s this MBA about and how closely is it related to what you’re learning about data science?

Jaco: It’s pretty close. The reason I selected this one is because it’s an MBA with data analytics as additional modules. I’ve been thinking of doing it for a few years now, but always thought I can do it later. And for some reason I was looking at this one and this is the first MBA I’ve seen that says, “Okay, you do the core MBA modules, but at the end you also do your data analytics modules. And there will be some data science in there as well.” I was quite excited by that because it looks like the universities are now realizing that you can do the ‘vanilla’ MBA, which I’m sure is still fine for some areas, but at least they are now considering adding a bit more technology and data to it, which I think is a pretty cool thing. This is the only university I’ve seen it in, which is Nottingham Trent University. The other thing is it’s also a fully online course as well, so it worked really well with my work commitments.

Kirill: Oh, that’s fantastic! I thought for an MBA you actually had to go to INSEAD or some other MBA provider and actually attend. So this is fully online?

Jaco: It’s fully online. So, they’ve created a platform where you do all of your networking online so you know who your classmates are, you know who the professors are, and all your work you do and research is all online as well.

Kirill: Okay. I was about to ask, and that was my next question probably – networking. Because I heard that MBAs are predominantly about networking, the connections you create there are very valuable. So you think you’ll be okay that it’s all online and you’ll still be able to network with the students, the professors and so on?

Jaco: I think so. It really only started last night. Last night was my first task. So I had to go in and I had to introduce myself to my fellow classmates, put in a bit of background for myself, who you are, and you can also see what the other guys have done. They’re spread across the world so there’s quite a few, but the guys I remember—there’s one guy in Cape Town in South Africa, there’s another guy in Dubai, couple of guys here in the UK in Liverpool, I think one or two in India as well. So it’s truly global, and I think with where social media is, and where the web is, you don’t have to sit next to someone always. I think it will help probably if you have that, but it’s not going to be a make or break on this course.

Kirill: That’s super exciting. I’m very excited for you. I hope it goes really well.

Jaco: Thank you. So do I. (Laughs)

Kirill: Okay. All right. Let’s slowly move on to your role as a manager in data. This is a very exciting topic because personally, the closest I’ve gotten to that in my career was—in Deloitte, I was running projects. There weren’t people that were part of Deloitte, but there were clients’ people that I was supervising and in my role in the industry building the data science division, but I never had a whole team of data scientists that I was fully managing.

This is a pretty interesting topic and at the same time I’ve seen other people do it and I’ve seen other people do it right, somewhat right, and I’ve seen so many other people do it wrong, very wrong. There’s so many pitfalls because it’s not just about managing people. It’s about managing a very specific type of people who are very laser-focused on certain things, who have certain types of attributes that you need to account for. It’s also about managing the data side of things, it’s also about managing the communications between teams inside the organization. Where do you want to start? Where do you think is the most important thing about managing a data division or a team of data scientists?

Jaco: That’s a tough question. I think as a manager I can only tell you what I have done, some of the mistakes and the good stuff as well. What I did first off is I tried to understand the team, what their skills are, so in a way it’s a skills matrix where we say, “Okay, what are you good at? What are you not good at?” I did the same for myself. Then we looked at our data estate, you know, what do we have. We then started speaking to the business to find out from them what business problems they have, and whether any of these problems can be fixed with data: do they need to make more predictions, do they need to have more evidence or facts to back up the next decision on where they want to take the business?

So for me personally, it needs to be led or driven by the business demand. We see this sometimes when we interview folks. They come in and they are exceptionally clever and they’ve done thesis after thesis, but some of them will struggle in the workplace because they will want to come and look at the problem and then write their next thesis instead of just finding an outcome for the business. It’s not academia. There needs to be scope for innovation, so you need to allow enough time for the guys so they can do their research to keep them motivated. You know, the reason they’ve gone into this field is not to do a bunch of tickets. They want to use their knowledge and their skills and their brains, but at the same time there is a job to be done as well. You need to find that balance between what the guys are good at and allowing them enough time to innovate and do research, because a lot of the things that you’re going to do in the next six months or so, no one has thought about yet, and you need to get these guys to spot those business problems by looking at the data as well.

The other thing as well, we really push people to become what we call ‘T-shaped individuals’ or cross-skilled teams. If, for example, you are very good at Python and your teammate is very good at R, the expectation is that you will teach each other a bit as well. So you will have a bag of skills, you won’t just have one skill. The same goes for business analytics, when you know Tableau or Linux, whatever. So if you are a more rounded individual, then you just become better at your job. So we need to allow time for that as well. I don’t know if that really answers your question, but you need to have a look at your problems from a business point of view using data. You need to allow your teams to innovate and you need to also make sure that what your teams are working on is actually the business problem.

Kirill: No, that’s a fantastic start. When you get into a role like that, it sounds like you had it all under control, a very planned out process. Just like you planned things out with your own education, it seems like you came in, you planned it out, skills, what’s the business demand, how does a team work together. It sounds like a great approach. After you’ve done this and after you’ve gotten to know the team, gotten the team in shape and you kind of see the business demand and so on, what are some of the other challenges that you face? Once your team is established and it’s slowly doing its work, obviously there’s going to be other challenges that you have to overcome. What are some of them?

Jaco: There’s always daily challenges, you know, things happen and things break. One of the challenges is when you planned a piece of work and you might work in sprints, or I’ve used kanban as well, something will come and derail that a little bit, but you just need to work around it. Sometimes that happens for a reason. It can just be more important than whatever you’re working on. Sometimes you have to drop whatever you’re working on maybe for a week or two or maybe even a bit longer to work on something else and then get back to what you were doing. So make sure that you don’t lose sight of that and whatever you drop, you always come back to it. The other bit as well, especially on data science, is to try and build a roadmap of things, what we’re going to work on. It’s not particularly easy to, and I don’t think it’s useful, to have a very long roadmap. So if you have a roadmap for two years in data science, I think you’re probably going to be wrong anyway just because of the rate of change. It’s just having a constant six-month rolling roadmap of “what are we working on” and getting the guys to feed into those roadmaps. If you just say to them, “Listen, we’re going to do this over the next six months,” then they would have some really good ideas. So you need to make sure they feed into that.

Kirill: Okay. That’s totally right, I think, that the rate of change is so high that there’s no point in having a — there can be a vision for maybe 5 years ahead of what’s the direction we’re going towards, but at the same time what projects you’re working on, what systems are you going to install. Anything beyond six months or a year, that’s a bit of a stretch. While you were speaking, I had this thought: Can you give us a structure of what it looks like to be the Head of Data? What does your team look like? Do you have managers in the team? Do they have people that they’re managing, or is it a flat structure and how many people do you have in the team or in the whole of the data side of things? I think listeners will be very interested to hear about that.

Jaco: Sure. We’ve got team leads managing the DBA and the BI teams. For the data science guys, they don’t have a team lead so that’s me today. I prefer that today because that’s how I get to learn as well from them. The reason we have team leads in place is purely about size. I don’t think this is necessarily a data thing. As a manager, if you have more than say seven people in your team, it becomes just unwieldy and hard to manage the team. You have to ask yourself, if you have so many people – let’s say you have 20 or 30-odd people to manage on a one-to-one basis, you are probably not doing them any justice as well. So you’re probably not going to be extremely good at your job. Just imagine you have to do 30 or whatever one-on-ones. If you do them on a monthly basis or whatever, I don’t think you’ll be able to focus on your job very well. So out of necessity and also to give some of the guys on the team an opportunity to lead teams as well, we’ve got team leads in place.

We have a pretty good mentoring system in place as well. So whenever we get a junior or a graduate data scientist, they get assigned a buddy with specific goals and objectives and things they need to achieve within a short space of time. Part of that is we make sure that they fit into the culture or they fit into the whole cross-skilling ethos. It seems to be working quite well there.

Kirill: Okay, cool. Thanks a lot for sharing that. That sounds like a very interesting structure. I definitely think team leads is a pretty cool idea. I actually heard that a person can manage on average about—two people is okay, five people is the average a person can manage. A very good manager can handle eight and the best managers ever can handle 12 employees that they’re managing. Anything beyond that, that is way out of line because, as you say, even if you’re doing a one-on-one catch-up once a month or let alone once a week, you’re not going to have any time whatsoever to do your own job and you’re just going to turn into a perpetual manager and I don’t think that’s a really exciting thing to do if you have the opportunity to be doing data science at the same time.

Jaco: Yeah, absolutely. I completely agree with you. Just imagine if you are just doing admin tasks every day, how are you going to find the time to do project prioritization, project management and even learn yourself? You won’t. You become an administrator.

Kirill: Yeah. I think one-on-ones is probably — I prefer to do them once a week. If I imagine myself doing 35 one-on-ones, it’s pretty much an hour each. 35 hours per week just sitting in an office, one person comes in, you do a one-on-one, he goes out, the next one comes in, he goes out, the next one comes in… So that’s the whole week. You’re just sitting in one room or on Skype or whatever and people are just coming in and out and that’s all you’re doing, just one-on-ones.

Jaco: Yeah. My limit, or my goal, is to have no more than seven. That’s not always achievable, but that’s a nice goal to aim for.
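
The arithmetic behind this exchange can be sketched in a few lines of Python. This is only an illustration: the one-hour meeting length, the weekly cadence, and the team sizes of 7, 12 and 35 come from the conversation, while the 40-hour working week and the function name `one_on_one_load` are assumptions made for the example.

```python
def one_on_one_load(direct_reports, meeting_hours=1.0,
                    meetings_per_week=1.0, work_week_hours=40.0):
    """Return (hours per week spent in one-on-ones, share of the working week).

    work_week_hours=40.0 is an assumed figure, not one quoted in the interview.
    """
    hours = direct_reports * meeting_hours * meetings_per_week
    return hours, hours / work_week_hours


# Compare Jaco's target of seven reports with the larger teams discussed.
for reports in (7, 12, 35):
    hours, share = one_on_one_load(reports)
    print(f"{reports} direct reports: {hours:.1f} hours per week "
          f"({share * 100:.1f}% of an assumed 40-hour week)")
```

Passing `meetings_per_week=0.25` approximates the monthly cadence Jaco mentions; even then, a 35-person team would take up more than a full working day each week under these assumptions.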

Kirill: Yeah, that’s totally true. Okay, so how about we talk a little bit about data strategy or data science strategy? That was one of the topics you outlined that you’re passionate about and maybe you’d like to share some things about data strategy with our listeners.

Jaco: Sure. Okay, where to start with this one? One thing I found over time – it wasn’t something I knew up front – you really need to engage with the business and you need to find out from the business point of view what are their problems. Doing so, then you can try and put two and two together and say, “Okay, so we’ve got this data in the business, either the fraud team or the marketing team or the finance team, whatever, has this problem. I know where the skills are and I know where the data is so we can actually help them answer their question.”

You need to have someone from the business side to help you build that strategy. For example, we’ve got a product manager. His job is to spend day in and day out with the business and he helps us to build a pipeline of work for our data science and data teams to work on. The other side of things, the strategy I had when I first joined was — what I wanted to do, I wanted to make sure that everyone in the business has access to the data they need whenever they need it to do their jobs. What was sort of slow and I don’t know if you’d call it “anti DevOps” was that — you know, it’s the old school way of things where someone raises a ticket and someone else picks it up within a day or two and then you may or may not get the answer you need at some point, which is not very good.

The previous companies I worked at, the DBAs were almost seen as the absolute custodians of data, you know, “Don’t touch it or anyone else,” whereas I like to look at it from the other way around. You need to be able to take your data and push it up to the business. And they need to be able to access it whenever they need it. You obviously need to be thinking about security. You can’t have sensitive data out there so you need to make sure that you mask data that needs to be masked and you only give people access to the data they need.

But we very much have self-service and automation as an overarching team goal. Whatever we do, we look at it and we go, “You know what? Whatever we do now, we’re not adding value to this task. Can someone else do it for themselves?” And then we will try and build a tool ourselves, an automation process, so someone can do that.

Kirill: No, that’s so cool. Really, I’m all for self-serve analytics and I’ve been on the opposite side, of the person who’s requesting the data, and you submit that data extract request and you get an e-mail saying, “We will process it in one or two days.” Two days later you get your extract and it’s missing three columns that you wanted or it’s completely the wrong time frame. That just keeps going on and on, and by the time you get the data the opportunity has passed and there’s no point in doing the analytics anymore.

I totally agree with that and I wish as well many more people would be like that in organizations. Personally, I think it’s kind of like a process of natural selection. That’s going to happen. With the advancement of data science and how quickly the change is happening, organizations, especially large organizations which cannot adapt, which don’t go down the path of self-serve analytics, which don’t allow users access, which are very protective of their data, they’re just not going to be able to keep up with other organizations. They’re just going to die off. What do you think of that thought?

Jaco: I think that is spot on. I don’t think I can summarize it any better than what you’ve just done. You either have to adopt technology and be data-driven, or your competitors are just going to get better than you and they will take all of your business.

Kirill: For sure. So that’s pretty cool. Data strategy is about being in touch with your business side and also understanding where the skills and the data can come from to solve those problems and encouraging self-serve analytics.

By the way, self-serve analytics comes hand in hand with building a data culture in the organization. It’s not only the people in your team, who drive these data tools, skills and insights, who need to be passionate about data. The rest of the organization, which is utilizing the self-serve analytics opportunities you have created, also has to be passionate, has to understand what data they have access to, and has to understand what power data brings. So how do you go about creating this culture in the organization?

Jaco: That’s a very good question. I don’t know how it got started, but we got fortunate and I believe almost everybody at The Hut Group is like that. They’ve got a natural hunger for data and they’re always curious. It’s also quite a young business. There’s a lot of younger people in there and they just grew up, they were the first, in the Internet age. What is life without the Internet? I think half of them don’t know. (Laughs) So they’ve always just had it and I think that’s just the thing they have. It needs to be something senior management obviously needs to push for and drive, and they do. They themselves have a hunger for data so people see what the senior guys in the business are doing. They do follow that approach.

Kirill: Okay. Leading by example, yeah?

Jaco: Absolutely.

Kirill: Fantastic. And obviously the other side of self-serve analytics, something that you already touched upon, is data security. For somebody who is possibly considering a role similar to yours or a management role in the space of data, can you give us a brief rundown of data security and what are the common issues that can arise and common things that people need to think of when they’re a manager and they are responsible for data security?

Jaco: Sure. It’s quite a wide topic. It starts with some of the absolute basics. Maybe it’s from my database background, but one of the very first things you’re doing is you have role-based access. Whenever someone wants access to a data store or a database, you only give them access to what they need and then you check what is it that they need. You know, is there an e-mail column there? Do they have credit card information? All that sort of stuff.

Question it as well. Don’t always just say, “Yes, that’s fine. Here’s your access.” The guys who work with the data day in and day out are best placed to go, “You know what? This data is very sensitive. I’m just going to check whether you actually need this access.” You also need to work very closely with the security team. They obviously have their own tools and their own policies and things which people need to adhere to. You need to get close with those guys because they can definitely help you if you need to do a day-to-day extract and pass it over to a colleague or a third party or something. They can definitely help you with ways and means to make sure that that is safe and secure and it doesn’t get into the wrong hands. The last thing you want is to be in the news for all the wrong reasons. So it’s very much worth it to take a bit of extra time when you look at anything that is sensitive or secure.
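To make the idea of role-based access with a review step concrete, here is a minimal sketch in Python. It is not how The Hut Group implements access control; the role names, column names, and the `review_access_request` helper are all hypothetical and exist only for illustration. The sketch grants only the columns a role is entitled to, and holds back any sensitive column for a human to question before access is given.

```python
from dataclasses import dataclass, field

# Hypothetical columns treated as sensitive (e-mail, card details).
SENSITIVE_COLUMNS = {"email", "credit_card_number"}

# Hypothetical mapping from role to the columns that role may ever see.
ROLE_GRANTS = {
    "marketing_analyst": {"order_id", "order_total", "country"},
    "finance_analyst": {"order_id", "order_total", "credit_card_number"},
}


@dataclass
class AccessDecision:
    granted: set = field(default_factory=set)       # safe to hand over now
    needs_review: set = field(default_factory=set)  # allowed by role, but sensitive
    denied: set = field(default_factory=set)        # outside the role entirely


def review_access_request(role, requested_columns):
    """Split a request into granted, needs-review, and denied columns."""
    allowed = ROLE_GRANTS.get(role, set())  # unknown roles get nothing
    decision = AccessDecision()
    for column in requested_columns:
        if column not in allowed:
            decision.denied.add(column)
        elif column in SENSITIVE_COLUMNS:
            # Allowed on paper, but someone should still ask "do you need this?"
            decision.needs_review.add(column)
        else:
            decision.granted.add(column)
    return decision
```

For example, calling `review_access_request("finance_analyst", ["order_id", "credit_card_number", "email"])` under these assumed grants would hand over `order_id`, flag `credit_card_number` for a conversation, and deny `email` outright.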

Kirill: Okay. That’s pretty cool. And in terms of this role-based, I also want to add a bit that I remember that when you’re working in an organization, one thing is access to data. You go and you get your role, “I need this data,” and so on. But as soon as you start accessing sensitive data, they actually make you sign a separate form saying, “Okay, now you’re a person who has access to sensitive data. These are the conditions.” Even though you already have an employment contract and everything, you have to sign an additional form specifically about that sensitive data and what you can and what you can’t do with it. Is that something you guys also have in your organization?

Jaco: We don’t have it specifically. I think that’s for everyone in the business because we move at such a pace it’s expected that when you deal with this information, you will treat it right. We do educate everyone in the business what to do on the security side, whether it’s clicking on dodgy links that you get in an e-mail or what you do with sensitive data. They’ve got constant training programs. Every three months you get a refresher or just an e-mail saying, “Remember! This is how you need to treat sensitive data.”

I think that’s pretty good. I’ve seen it in previous companies as well where, as you’ve said, sometimes you need to sign an additional document. I’m not sure that’s sustainable any more. We live in a digital age and everyone has access to data. So it needs to be baked into everyone’s business, this is how you operate.

Kirill: Okay, fair enough. That’s a fair point. I guess we are moving so quickly into that direction that it’s just a default type of thing. Data security to me, it kind of breaks down into two parts: One is, of course, sensitive information. Like you said, you don’t want to be in the news for the wrong reasons. You don’t want any sensitive information to be hacked. You don’t want people’s credit cards to get stolen or even just their personal details. Nobody wants that to happen.

But also there’s this other part which is about treating data as an asset. Like, previously businesses had, and they still have – a good analogy is that, trade secrets or some patents and stuff like that, ways they do things that their competitors don’t know about. In my view, same thing about data. Not only can it leak to hackers or other parties that want to steal people’s credit cards, but if it becomes exposed, publicly available, competitors can learn from that and can find out the way you do business and the way your knowhow happens inside your organization. Can you tell us a bit about that? Do you approach data as an asset and how do you spread that sentiment across an organization that data is not just a tool for us to conduct business operations, but it’s actually an asset, it’s full of value which we can extract from it?

Jaco: Yeah, absolutely. I think that’s a very good question. I’m not sure how to answer that, to be honest, but yes, we do. I think the way to do that is you need to explain to the business people that the data on its own can be valuable, but it’s what you do with the data that will make you unique from your competitors. For example, if you do a recommendation or a personalization or whatever project, you need to showcase to the folks in the business, “This is what we’ve done. This is the business outcome using the data and this is the data we’ve used.”

If you explain it to them in that sense, from a business perspective, what they’ve just received because of the data you have, then I think people will start having a better appreciation for the data and the business. Obviously the data in itself is valuable, but I think it’s what you do with it that will make you stand out from the rest.

Kirill: Okay. Yeah, I totally agree with that. Another thing that you pointed out is kind of like a point of concern or just something that you work with on a day-to-day basis, is data flow. Data flow between systems or data flow between parts of the business. What can you tell us about that?

Jaco: That’s not necessarily a data problem always. That’s more like a technology problem. As companies grow, and also with the rate of change, you know, the technologies that we use today are probably going to be different from the technologies we’re going to use next year. But you will still have these pockets of data everywhere. So, you might have your own data centre in Google or Azure or whatever. And at some point, you’ll probably want to connect those data sources together because you will either have a business problem or you’ll have an issue or whatever. There will be a need for it.

And that bit is hard. You can’t continue to build data warehouse on top of data warehouse just because you have all these disparate data sources. That’s probably the next thing we’re going to look at ourselves. I don’t think I have the answer for that yet, but that is a growing concern. We’re okay where we are now, but it’s constantly in the back of your mind. You know, what happens if we open more data centres or other things come up? That data source is now completely in a different place and I can’t for whatever reason replicate it to my data warehouse. So what do I do? I don’t have that cracked, to be honest with you. It’s something that requires a bit of work around it.

Kirill: That’s good, I guess, that you still have interesting challenges like that ahead of you. It’s not just about managing the business or your part of the business. It’s also about overcoming these challenges and coming up with interesting solutions. I have no doubt that you will definitely crack it one day or sometime soon. That will be all under control sometime. At the same time, it also leads into the next question I wanted to ask you, single point of truth. What is a single point of truth and why is it important in an organization that’s driven by data?

Jaco: I guess if we all lived in an ideal world, you would have all of your data sources wherever they are and they would feed into one other data source which would be your single point of truth, whether that is a batch job type system or an online real-time data store. I think ideally you want all of this information in one place. I think that’s doable. It’s not always required to have all of the data replicated to a single place. It is a hard question to answer, to be honest.

What we’ve done is we looked at, again, what are the key requirements for the business to make sure that we have the right data in the right place to answer specific questions. Now, for us on a technical side, that meant we had to replicate data to multiple places, sometimes to more than two. So you have two or three or four copies of the exact same dataset to answer different questions. That’s how we’ve approached it.

Again, long term, that’s going to prove to be a bit of a headache for us. It’s a bit of a trade-off, you know, what the business needs right now to answer that question and if they get it next week it’s too late, as opposed to, “Okay, we can do a short-term fix here and in the long-term our strategy will be…” to “let’s change all the data to get to that one place.” But I think that will be ever-changing. You will do that and you will have these short steps in between them.

Kirill: Okay. Good answer. And I think it’s important for people to understand that it’s not a jump like — you cannot just solve a problem right away, I’ve noticed this. In fact, in our organization, SuperDataScience, we’ve encountered this just recently, that when we want to change something from the way we’re doing it now to completely a new system, it’s not going to happen overnight. As much as there is passion and drive and excitement about the new system, it has to happen step by step in order to ensure business continuity, in order to ensure ability to control the possible errors and roll back if necessary at any point in time. I totally agree that it’s not a quick process. Eventually you get to your destination and when you get there, the rate of change is such that, “Oh, the world has gone to a new place,” and now you have to go to the new place and then you start your new journey.

Jaco: Absolutely. If you think about it, the data is generated by applications and these application developers also have their own things to deal with. They can’t drop everything and start writing data pipelines for the data team to just have a perfect world. We don’t live in a perfect world so you need to work with them to get the data you need when you need it. It will be an ever-changing thing. You can’t have a three-month strategy, get to a place where you have one data warehouse or one data lake and then think, “That’s it. I’m done.” You won’t be, because the business and everything else will have changed around you.

Kirill: Yeah, for sure. Well, those are the questions I had and that I derived from the little overview that you sent me. Could you maybe use this opportunity to share anything else that you’d like to share with people who are specifically interested in data science, are possibly considering an opportunity of becoming a manager in the space of data, or even further becoming the Head of Data or Chief Data Scientist or something like that, or maybe people who are already in that role? What are some things that you can share with them, some tips or tricks or ideas and comments that you can give them?

Jaco: Okay. If you look at two different people, you probably have a few technical people who want to move into management and they might lack some management knowledge, some management skills. For them I would say, work with your HR team if you have a good HR team. They will probably have a good training program for you. Go do some reading, as you would go to study for a new technology. It’s a skill as well to be a manager, so you need to go and find what are the good management skills you need to build up. That can be things like communication, project management, performance management, you know, how do you get the most out of teams, how do you get the most out of individuals, how do you have those difficult conversations with people because that can be uncomfortable. You know, things like emotional intelligence, do you know what it is, do you know how you can work on it to get better, how do you read people. Because it’s unlike technology or data. It’s not always black and white. There’s usually quite a bit of grey because we’re dealing with humans.
If you are a manager and you want to do a little bit more technology stuff, it’s the same approach I would say, identifying the bits you’re not good at or you want to learn and then just try and make time to learn those things. I think that’s probably easier to do because chances are you have a few very good people in the technology side who can do mentoring for you. So just because you are someone’s manager doesn’t mean you stop learning. Maybe you need to spend some time with them and just do what I’m doing, “Data Science for Dummies,” that sort of thing. Start from the beginning.

Kirill: Okay. I love it. Thank you very much for sharing that. It kind of summarizes this talk in an interesting way. Like you said, there’s two types of people: people who want to go from technical to management and they need to learn the management side of things, but also, if you’re already in management, it doesn’t mean that you have to stop learning. There’s always opportunities and space to learn the technical side of things. I love that summary.

One more question—I’m really interested to hear your view on this. We’ve talked about rate of change and how the space of data science is progressing so rapidly and how that’s impacting organizations and systems and data and so on. Where do you think this whole field of data science is going, and what should our listeners look into to prepare for the future?

Jaco: I firmly believe the future of this is in machine learning and deep learning. It’s purely based on what I’ve read and what I heard other experts say. If you look at what Google themselves say, if you’re going to do one thing, learn machine learning. There are some people in the world, for example Elon Musk, who most people know. He is concerned about machine learning and data science and AI and where that’s going, to the point that he’s now bought a company, Neuralink, to look at how you’re going to merge brain and AI, that sort of thing. I think it’s a concern for a lot of those people. You probably just need to listen and say, “Okay, those are probably areas I want to spend time on and get good at.”

Kirill: Okay, gotcha. I totally agree. Yeah, it’s really booming, both these spaces. How crazy was it, AlphaGo beat the world champion in Go last year. And then just recently I watched the CEO of Google Brain, the guys behind AlphaGo, they released a breakdown of what was happening there. It’s just crazy how that algorithm was thinking, in that it was actually using intuition to beat a human.

Jaco: Exactly. There you go. You know, as soon as the algorithms start thinking for themselves and figuring things out for themselves without being told what to do, then you know we’re on an interesting path here.

Kirill: Okay. Well, thank you so much, Jaco, for coming onto the show. It’s been a great chat and I think a lot of people can pick up some valuable things from here. How can our listeners contact you or follow you or find you in case they want to know more about how your career is progressing or maybe even ask you a question or two about some challenges they’re facing in their data science management roles?

Jaco: The best place is probably on LinkedIn. You’ll find me there.

Kirill: Fantastic. We’ll include your LinkedIn URL on the show notes. And one more final question I have for you today is, what is your one favourite book that you can recommend to our listeners so that they can better themselves?

Jaco: This is not a data or a data science book, it’s a book that was recommended by one of my previous CEOs and it’s called “The Hard Thing About Hard Things” by Ben Horowitz. He’s sort of a serial entrepreneur in Silicon Valley. It’s a very interesting read on how he went about building companies. I think there’s a lot to be learned from that.

Kirill: Fantastic. It’s “The Hard Thing About Hard Things”?

Jaco: Correct. By a guy called Ben Horowitz.

Kirill: Ben Horowitz. Okay, yeah. There you go, guys. “The Hard Thing About Hard Things” by Ben Horowitz. Once again, thank you so much for coming on the show and sharing all this plethora of insights into the role of Head of Data.

Jaco: Thank you Kirill. I appreciate the time.

Kirill: So there you have it. I hope you enjoyed today’s podcast and picked up quite a few new things from our guest Jaco and how he goes about his role, very high level responsible role of managing the data of a large organization. I really liked a lot of things and I personally learned a lot of things, but my favourite part was when Jaco talked about the capacity to acknowledge the fact that sometimes there are things we don’t know regardless of how successful we are. Regardless of how much we’ve already accomplished, there are things we don’t know and we need to admit that to ourselves, we need to admit that to others, and we just need to go and learn those things. It’s totally normal. We cannot know everything all the time. There’s so many things to learn in the world of data and just in the world. There are so many different things. So it’s a very important thing to be able to put your ego aside and understand what you know and what you don’t know and then what you need to know and go and learn that.

So there you go, that was a very interesting podcast. If you’d like to get the show notes, you can always find them at www.superdatascience.com/55 where you can also get the transcript for this episode. And don’t forget to connect with Jaco on LinkedIn and follow his career. I hope you enjoyed today’s episode and I look forward to seeing you next time. Until then, happy analyzing.

Kirill Eremenko

I’m a Data Scientist and Entrepreneur. I also teach Data Science Online and host the SDS podcast where I interview some of the most inspiring Data Scientists from all around the world. I am passionate about bringing Data Science and Analytics to the world!
