Welcome to episode #135 of the Super Data Science Podcast. Here we go!
Today's guests are Jorge Zuloaga, Senior Director of Data Sciences at Big Squid, and Lindsey Zuloaga, Director of Data Science at HireVue.
What's better than interviewing a Director of Data Science about the exciting, challenging project they're working on? Interviewing TWO Directors of Data Science about their work.
Jorge and Lindsey talk in depth about their work. Jorge is developing a tool that will help businesses "plug and play" predictive machine learning into their operations. Lindsey's company is creating AI-based software that helps predict an applicant's future job performance purely by analyzing their video interview.
You'll also hear how they made the jump from academia to the corporate world of data science, what it means to be a Director of Data Science, what Citizen Data Scientists are, and their thoughts on the future of the industry.
Let's get going!
In this episode you will learn:
- Where the couple got their start in academia (6:08)
- How Jorge broke into data science with a game of squash (12:22)
- The proliferation of machine learning and AI, and how Jorge's work helps businesses adopt them (18:22)
- Empowering Citizen Data Scientists (26:22)
- Predicting future job performance with a single video (30:24)
- How companies (and applicants) are responding to this coming "new age" of recruitment (37:17)
- What it means to be a Director of Data Science (41:32)
- Where the field of data science is going (55:30)
- Will it be harder to be a Data Scientist in the future? (1:00:41)
Items mentioned in this podcast:
- Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron
- Python Machine Learning by Sebastian Raschka
Kirill: This is episode number 135 with data science power couple, Jorge and Lindsey Zuloaga.
Kirill: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. Each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today, and now, let's make the complex simple.
Kirill: Welcome everybody back to the SuperDataScience podcast. Today I've got a double-trouble episode: not one, but two data scientists on the show. In fact, two PhDs in physics who are now both directors of data science at their respective companies. How crazy is that?
Kirill: This episode is going to be super exciting. We've got Jorge and Lindsey Zuloaga on the line, and we talked about a lot of things. This was the first episode I've recorded with two guests at the same time, and it was very, very exciting.
Kirill: In this episode, you will find out a couple of interesting things. First, we'll talk about Jorge's side of things. You'll learn about high-level automation of machine learning and the concept of the citizen data scientist: what that means, what it means for companies in the future, and how companies can apply machine learning in a more automated fashion.
Kirill: From Lindsey's side of things, we'll talk about the application of deep learning in HR and hiring, and also how to prevent racism, sexism, and other forms of bias in artificial intelligence. Those are just a couple of highlights from this podcast; of course, there'll be much, much more. You'll also hear from both of them what it means to be a director of data science and whether that's for everybody. And, of course, they'll tell us the amazing story of how they met.
Kirill: There we go. This is going to be a really fun episode. Can't wait for you to listen to it. Let's dive straight in. Without further ado, I bring to you two PhDs, two directors of data science, Jorge and Lindsey Zuloaga.
Kirill: Welcome, ladies and gentlemen, to the SuperDataScience podcast. Today on the show, I have not one but two guests, Jorge Zuloaga and Lindsey Zuloaga. Welcome, guys, to the show. How are you going today?
Lindsey: Good. How are you?
Jorge: Doing great, thanks.
Kirill: I'm doing well, too. As I mentioned before the podcast, I've never had a podcast with two guests, so I'm quite excited to see how this goes. How about you guys? Excited?
Lindsey: Yes, we're excited.
Jorge: Yes, definitely.
Kirill: All right. Well, for those listening, I was introduced to Lindsey and Jorge through Ben Taylor, whom a lot of our listeners will know from a couple of past episodes. Lindsey, you worked with Ben. Is that correct?
Lindsey: Pretty briefly. I was actually hired just as he was leaving, so he was a good mentor to me in handing off a lot of things, but we ended up not overlapping very much because his startup started taking off and he had to go do that full-time.
Kirill: Awesome. That was at HireVue, right?
Kirill: Okay. You are the director of data science at HireVue right now?
Lindsey: Yes, that's correct.
Kirill: Jorge, what is your role?
Jorge: Yes, I'm heading data science at a company called Big Squid.
Kirill: Mm-hmm (affirmative), and what do you do?
Jorge: Big Squid is a software-as-a-service company; we make a platform to automate machine learning.
Kirill: Is that similar to what DataRobot does?
Jorge: Yes. Basically, as a company we do two things. First, we make software that automates much of the machine learning process, so it's a tool where you can very easily train and deploy models on your data. Second, we provide consulting around the machine learning process so that companies can adopt machine learning into their data workflow.
Kirill: Mm-hmm (affirmative). Fantastic. Lindsey, at HireVue, as I understand from Ben's description, you use machine learning and AI to... Maybe it's better if you describe it. What do you do there?
Lindsey: Yes, so we basically predict job performance based on video interviews. We're a video interviewing company. The AI part was added in the last few years, when we started building models on videos of people recording themselves answering interview questions, trying to find patterns in who ended up being really good at a specific job. That helps companies predict which job candidates they should look at first and who might perform well in a position.
Kirill: Fantastic, so there we go. We've got two directors of data science on the show today. This is pretty crazy, like a goldmine. Let's see what insights we can get out of you guys. I'd like to start with your backgrounds. You both have very unique and specific backgrounds, so let's dive into those a bit. Lindsey, maybe let's start with yours. What did you study? You have a PhD. Tell us a little bit about that, and where did it take you from there?
Lindsey: Yes. I have a PhD in physics, applied physics actually. When I got into grad school, I really thought I wanted to work with my hands, build things, and do experiments. I did that, and it turned out that what I enjoyed most about my work was actually writing code to analyze my data, which surprised me. I thought I would enjoy the other part more, but after a while you get sick of aligning lasers, trying to measure teeny, teeny tiny signals from particles, and being in a dark lab all day. I really liked writing code, and I taught myself a lot of [inaudible 00:06:48] during my grad school years. That's where Jorge and I met, back at Rice University.
Lindsey: When I finished my PhD, I went to Germany for a year to do a postdoc on similar work, and I spent more and more time writing code. My journey from academia to industry is something I've written a blog post about. It was a little trickier than I expected. I was doing well in academia, so I thought, "Hey, I'm going to be valuable in industry and this should be a piece of cake," but I went into it underestimating the power of connections and knowing people. When you move from academia to industry, you don't really have connections in industry. We both ended up applying for jobs online, and a lot of the time you never hear anything back.
Lindsey: Working for HireVue kind of has a personal connection for me because going through the application process as it is today is pretty horrible. It's kind of a broken system where a lot of people apply for things online. If you don't make it through some resume filter and have the right keywords, you're never going to get contacted.
Lindsey: That's one piece of advice I can tie in: it's very important to meet people and make connections. Going to meetups and connecting with people on LinkedIn made all the difference; that's how I met Ben Taylor. And, obviously, get experience in whatever ways you can.
Lindsey: I started my data science career working with Jorge at a healthcare company; that's how we got our feet in the door. Then I moved over to HireVue, where I've been for about a year.
Kirill: Got you. Okay. Very, very interesting story, and I totally agree. When you move from academia to industry, you don't really think about connections, right? You think, "Oh, I've got the right skills for the job. It's all going to work," but in reality, I heard a crazy statistic: something like 70 to 80% of hiring happens behind the scenes, not through resumes.
Lindsey: Mm-hmm (affirmative). That's very true.
Kirill: Yes. It's crazy.
Lindsey: Yes. I think when you're in academia, you think your publications mean so much and will kind of speak for themselves. Then when you move into industry, you have to change your CV to a resume, and that wipes out all of that. You could have a bullet point on your resume that says, "I have this many publications and this many citations," but no one in industry really cares as much.
Kirill: Yes. They care more about real-life experience and things like that.
Kirill: All right. Thank you, Lindsey. Jorge, how about you? You also have a PhD, in computational physics as I understand it, so also in physics.
Jorge: Yes. While Lindsey was an experimental physicist, I was more of a theoretical physicist. I came from a mathematical background: my master's was in math, and I went into physics doing mathematical and computational physics. Everything was from the point of view of building theories and doing calculations.
Jorge: In graduate school, I was very interested in the mathematical and computational side of the work; that's where all my focus was. After grad school, I joined the faculty at Rice and taught there for a year. By then I had realized that I didn't want to go down the academic path, even though that's what I thought I wanted when I started my PhD. I wanted to do more applied work in tech and industry. I wasn't sure which field I wanted to go into; I just knew I wanted to do something that involved a lot of math and coding.
Jorge: Data science is a perfect playground for people who are interested in math and coding, so I just started reading about data science and taking some online courses, but I wasn't doing it for a living. I was still in the physics world and I was teaching physics at Rice University and just kind of doing it as a hobby. By the time I was done with my stay there teaching physics, I realized, "You know what? I want to go do this full-time." That's when I made the transition.
Kirill: How did you find the transition? Did you run into the same problem of connections, or was it easier for you?
Jorge: Yes. We both went through it at the same time. Lindsey was doing a postdoc in Germany then, and we lived there together for a bit. When we decided to move back to the US, we thought it would be so easy to send applications online and immediately get jobs in data science, and then we realized a lot of these applications were going into black holes where nobody reads them. As soon as we met people in person and talked to them about what we were interested in doing, it was so easy to find a job. It was totally different.
Jorge: Just trying to apply for jobs through online applications seems to be just very inefficient, and it's hard for people to notice an application that comes out of the blue, especially when it's for a position where hundreds of applications are coming in, and it's very unlikely that all of them will be read carefully.
Kirill: Mm-hmm (affirmative). Got you. How did you go about finding people and making the connections that helped you get the job?
Jorge: My first job was because of somebody I met playing squash.
Kirill: We had a good laugh about squash just before, right, about how you were traveling around Australia. It was really funny. For those listening, Jorge was talking about traveling around Australia and said, "You guys have really good squash." Out of all the things we have in Australia, I've never heard that comment.
Jorge: Really good squash players in Brisbane.
Kirill: Yes, sorry, and so you met someone playing squash?
Jorge: Yes, and he introduced me to the CTO of a company, and I got in touch with him. I had some interesting conversations with him about what they were doing with their data. They weren't doing any serious data science at the time; they were collecting a lot of data and doing business analytics and some data visualization, so trying to incorporate machine learning into what they were doing was a cool next step for them. I joined their team to start a machine learning program with their data, and then Lindsey joined the company afterwards on the same team, so we worked together on projects for a while. It was a great learning experience in data science.
Kirill: Lindsey, you got the job through a connection as well, but through Jorge.
Lindsey: Yes. Yes, I was working for a really small math tutoring startup at the time. It wasn't quite full-time and I was looking for another job, and it worked out that Jorge mentioned to the CTO I had a similar background, I was also looking for something, and they loved him, so they brought me in for an interview. That was fun.
Lindsey: Yes, we worked together for almost a year, which came naturally for us because we went to grad school together. We regularly talk about work in our lives, but a lot of couples we talk to say, "Oh my god, I could never work with my spouse. How do you do that?" We liked it. We kind of miss it sometimes, commuting together and stuff.
Kirill: That's so cool. That's actually a good segue into what I wanted to ask you. How did you guys meet? Tell us the story. You both were at Rice University, right? How did it happen?
Lindsey: Yes. Rice is pretty small.
Jorge: Yes, we were in the same department. We were both in physics, so everybody knows each other.
Kirill: Okay, so what, Lindsey, you're sitting at your desk just looking at your book. Jorge's walking down-
Lindsey: We met at the ice cream social, the physics department ice cream social. I remember that Jorge was wearing a really dorky tee shirt that said something like, "So here's my problem," and then it had, "A black hole is going to collide with," blah blah blah. It had all these relativity equations, and then it said, "So what's your problem?" Some dumb tee shirt [inaudible 00:15:25], and the tee shirt is the thing I remember most vividly.
Kirill: Jorge, your side of the story is probably, "So I was walking to this ice cream thing wearing my coolest tee shirt."
Jorge: Yes, but I'm killing it with this tee shirt, and then suddenly this redhead shows up and starts talking to me. I was like, "Oh wow, the tee shirt must work."
Kirill: Oh, nice, nice, and so as soon as you chatted, you hit it off right away? Is that how it happened?
Jorge: Yes. We just became really good friends for a while and hung out. For many years we were in the same department and got to know each other really well, and eventually the relationship evolved into dating.
Kirill: Nice, nice. That's a very cool story. Okay, so physics brings people together, contrary to what many might think. Okay, so we talked about where you guys work right now. Lindsey, you mentioned that you met Ben Taylor through LinkedIn. That's a very interesting one. Can you tell us a bit more about that? I imagine that was a big event in your life, since you took over his role as director at HireVue. How did that happen?
Lindsey: Yes. It was kind of a weird thing. Someone from my group in grad school, who graduated before me and was doing a postdoc in Finland, I think, had shared something: Ben Taylor had written a blog post about a beauty model he made that predicted people's hotness scores. I thought that was very interesting and funny, and I was like, "Who's this guy?"
Lindsey: It was really weird, because I hadn't heard of HireVue, and it was literally across the street from where we used to work. I looked it up and thought, "That is crazy." Someone in Finland posted this thing, and this guy is right across the street from me. That gave me a good hook, so I commented, "Hey Ben Taylor, I'm right across the street from you. Really weird coincidence." I ended up getting connected with him that way, and he reached out to me when they were looking for someone.
Kirill: Yes, that's very cool. Something similar happened to me. I also read Ben Taylor's article and became a big fan. I told him on LinkedIn how much I liked his article, he was like, "Oh, thank you," blah blah blah, and then a couple of years later, he asked me something about Australia. That's how we hit it off, and then we met in person. That's how I met you guys, too: through Ben Taylor. It's interesting how just having the courage to reach out to someone on LinkedIn can lead to a lot of different things in life.
Lindsey: Yes, definitely. You never know where those little connections will take you.
Kirill: All right. Well, shifting gears, you guys have supplied a couple of topics that you're interested in. Let's maybe start with you, Jorge. One of the ones that you're passionate about is the journey for companies to adopt machine learning. I think it's a very apt topic for this day and age where we are seeing the proliferation of AI and different data science techniques and how that's getting into all different businesses. What are your thoughts on how companies can adopt machine learning?
Jorge: Yes. This is a super exciting time for this because now we have the tools to make it really easy for companies to adopt machine learning. We have really good software, and that's what I'm interested in now because that's what I'm working on. I'm building software to automate much of the machine learning process, and I'm also on the consulting side of things, so how do we talk to a company?
Jorge: The biggest challenge is understanding the business they're in and finding out which business questions they care about. Can we frame those questions in a way that machine learning can tackle, and which ones should we go after because they're most likely to be successful?
Jorge: It's a journey to go down that path, because a lot of companies are collecting a lot of historical data and doing analytics on it that is really just looking backwards. They're visualizing and understanding what happened and why it happened, but trying to predict what will happen in the future is a whole different step, and that's where machine learning comes into play.
Jorge: It's a journey to get on a path towards being machine learning ready, and that's what we do. We try to work with the company's staff, helping them ask the right questions and collect the data needed to train models that tackle those questions, and then start using machine learning to make smarter decisions.
Kirill: Do you have any examples of maybe recent projects that you can share with us where you've helped a company adopt machine learning processes?
Jorge: Yes. Actually, I'm new to my role; I'm in my third week on the job, but it's been so interesting. In the last three weeks, I've dealt with at least 10 different companies doing completely different things, from selling food to selling cars to doing construction. All of these projects have been with companies that have collected a lot of good historical data and aren't doing machine learning yet.
Jorge: It's kind of like a goldmine for exploring good opportunities and asking good questions. With all of them, it's been really exciting to do pilot projects where we train models to predict things they weren't visualizing before, and to see the aha moment when they understand how much value they can get out of their data once we're doing predictive analytics.
Kirill: When you say automating machine learning, there's always a consulting element to it, where you identify the questions, see how you can answer them, and so on.
Kirill: What does the automation part mean?
Jorge: Yes. A good way to think about it: most people who have been doing data science have probably seen this really famous Venn diagram with three sets of skills that are useful in data science. One is hacking skills, which usually refers to skills in R or Python. Another is math and stats skills, and the third is domain knowledge, or substantive expertise.
Jorge: Usually data scientists are people who live in the intersection of these skill sets. What we're really trying to do is build software that can automate two of the big skill sets in this famous Venn diagram: a lot of the hacking skills and a lot of the math and stats skills. Think of somebody who has domain knowledge of the industry they work in and good data skills, probably an analyst who can write SQL queries and get a data set ready. They understand the business they live in, but they don't necessarily have really strong Python or R skills, or super-strong statistical and mathematical skills, so those are the parts that could be automated with software.
Jorge: If you have a tool that lets you take a data set and say, "Hey, this is the column I want to make predictions on, and I want to use all these other columns as features to predict this one column," then you can put that into software that automatically tries a bunch of different models, tells you which ones performed best, and helps you deploy that model for future predictions.
Kirill: Mm-hmm (affirmative). Okay. It's very true that it's about time we started doing all these things in a more automatic fashion rather than everything manually.
Jorge: Yes. This kind of automation has been happening for years in many different fields. I used to work in physics. There probably was a time when a physicist had to invert a matrix by hand, doing the calculations, but nowadays somebody would just use a linear algebra library to invert the matrix.
Jorge: Those kinds of tools are used to solve different steps in a problem, and it's the same with machine learning. There probably was a time when people wrote their own machine learning algorithms from scratch, and nowadays most practicing data scientists will just use Scikit-Learn and its random forest or regression models or whatever model they're interested in, and they don't have to spend time writing a random forest algorithm from scratch. That's one level of abstraction higher: you just use a Python library, for example.
Jorge: Now we're taking this a step further and saying you don't even have to go into Python and use a library. We build software that tries a bunch of different algorithms for you. All you have to do is provide the data and the business insight about what data matters for this kind of decision, and then the software will search through a space of models and a space of parameters for those models and find the best ones for you.
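A minimal sketch of the workflow Jorge describes, assuming scikit-learn and a pandas table in which one column is the prediction target and every other column is a numeric feature. The function name `auto_train`, the three candidate models, and the churn-style toy data are illustrative assumptions, not Big Squid's software.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def auto_train(df: pd.DataFrame, target: str, cv: int = 5):
    """Try several models on df[target]; return (best_name, fitted_model, scores)."""
    X = df.drop(columns=[target])  # every other column is used as a feature
    y = df[target]

    # Hypothetical candidate list; a real tool would search a much larger space.
    candidates = {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "gradient_boosting": GradientBoostingClassifier(random_state=0),
    }

    # Mean cross-validated accuracy (the default classifier score) per candidate.
    scores = {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X, y)  # refit on all data for deployment
    return best_name, best_model, scores


if __name__ == "__main__":
    # Synthetic stand-in for a company's historical table (hypothetical columns).
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "tenure_months": rng.integers(1, 60, 500),
        "monthly_spend": rng.normal(50, 15, 500),
        "support_tickets": rng.poisson(2, 500),
    })
    df["churned"] = ((df["support_tickets"] > 2) & (df["tenure_months"] < 24)).astype(int)

    name, model, scores = auto_train(df, target="churned")
    for n, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{n:>20}: {s:.3f}")
    print("selected:", name)
```

Refitting the winner on all the data before deployment is a common convention; a production tool would also need to handle categorical columns, missing values, and a held-out test set.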
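Searching "a space of models and a space of parameters for those models" can be sketched with scikit-learn's `GridSearchCV`, which accepts a list of parameter grids and lets a pipeline step itself be one of the searched parameters. This is a generic illustration on synthetic data from `make_classification`; the model families and grid values are arbitrary assumptions, not what any particular product searches.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The "clf" step is a placeholder; the grids below swap in different estimators.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Each dict is one region of the search space: a model family plus its hyperparameters.
search_space = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.01, 0.1, 1, 10]},
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [50, 100], "clf__max_depth": [None, 5, 10]},
    {"clf": [SVC()], "clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
]

search = GridSearchCV(pipe, search_space, cv=5, scoring="accuracy")

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
search.fit(X, y)

print("best model:", search.best_params_["clf"])
print("best CV accuracy:", round(search.best_score_, 3))
```

Exhaustive grids grow multiplicatively (here 4 + 6 + 6 = 16 candidates, each fit 5 times), which is one reason automated tools often move to randomized or Bayesian search for larger spaces.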
Kirill: Interesting. What do you sacrifice in that process? For example, do you miss out in interpretability of results? What's the price you pay for doing that?
Jorge: Yes, you don't really sacrifice that, because you can visualize the results and the performance metrics of everything that was tried, and then you as a user can decide which model to go with. The advantage of these kinds of tools is that a naive user can just say, "I don't know the difference between this model and that other model, so just try everything and tell me what works best," while a power user can choose to go into a deep-dive mode where they can tweak parameters and have more control over the software themselves.
Jorge: As long as you have the option to use the software as a power user who goes into a deep-dive mode, you're not sacrificing anything. The more naive user who isn't interested in the details can just use it at the higher level, giving the software a data set and letting it do all the work.
Kirill: Mm-hmm (affirmative), okay, and what about overfitting? If you fit one model to a data set, you have a risk of overfitting, but when you fit 10 different models and pick the best-fitting one, isn't there an even higher risk of overfitting?
Jorge: Yes, there's always a risk of overfitting in any problem, but as long as you take the standard steps to reduce that risk, it's fine. We're designing software that can hopefully take the necessary steps to avoid it, like standard cross-validation and hyperparameter tuning. With proper hyperparameter tuning, you can avoid those risks.
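Kirill's concern is a real one: the best cross-validation score among many candidates is an optimistically biased estimate, because it is a maximum over noisy scores. One standard remedy is nested cross-validation, sketched below with scikit-learn on synthetic data: an inner loop selects the model, and an outer loop scores that whole selection procedure on folds it never saw. The interview doesn't say whether Big Squid uses this exact scheme; the candidate models and fold counts here are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
space = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.01, 0.1, 1, 10]},
    {"clf": [RandomForestClassifier(n_estimators=50, random_state=0)],
     "clf__max_depth": [3, None]},
]

inner = KFold(n_splits=3, shuffle=True, random_state=1)  # used for model selection
outer = KFold(n_splits=5, shuffle=True, random_state=2)  # used for evaluation

search = GridSearchCV(pipe, space, cv=inner)

# Non-nested: the best inner-CV score is biased upward, because it is the
# maximum over many candidates evaluated on the same folds.
search.fit(X, y)
non_nested = search.best_score_

# Nested: each outer fold reruns the whole search on its training part only,
# then scores the chosen model on the held-out outer fold.
nested_scores = cross_val_score(search, X, y, cv=outer)

print(f"non-nested best score: {non_nested:.3f}")
print(f"nested CV estimate:    {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```

The nested estimate is the one to report; the non-nested score is still fine for choosing which model to deploy.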
Kirill: Mm-hmm (affirmative), okay, got you. Wow, very interesting. Thank you. Lindsey, what do you think of the work that Jorge's doing?
Lindsey: I think it's really cool. I know a lot of data scientists, and a lot of technical people in general, sometimes get a little snotty about what they know and think, "I don't want anybody else to just be able to do what I can do." There's some truth to that feeling: if you have a deep understanding of what you're doing, it's very helpful. But a lot of data science work is already being done by people who don't have a deep mathematical understanding. They just know how to use Python, and they can try models and run things. I think there's a balance. It always helps to understand what models are doing and to have people with good math and stats backgrounds building models, because there are always pitfalls you could fall into if you don't understand what you're doing.
Lindsey: I think using machine learning in day-to-day life is just going to become more and more pervasive with time. For example, in business intelligence there are a lot of BI analysts who aren't necessarily data scientists. If they have a good math and stats background, they could really benefit from tools that let them make predictions pretty easily without needing as much of a coding background.
Kirill: Mm-hmm (affirmative). Yes, okay.
Jorge: We like to talk about the concept of the citizen data scientist. By that we mean a person who was not necessarily a data scientist before but has now been empowered by tools to basically do data science and machine learning. The benefit is that even if your company has a solid data science team, it's usually hard for the data scientists to scale when there are so many problems they could be tackling and not enough time for all of them.
Jorge: With the right tools, a company can have a lot of people basically building models on their own, and then the data scientists can come in later and review and validate that work. Or they can use a tool themselves to be more productive, spending more time thinking about ideas and less time implementing the details of those ideas in code, because the tool does it for them.
Kirill: That's very interesting, because I had a podcast with Tom Brown from The Information Lab just a few days ago, and that same concept, self-serve analytics I think we can call it, also popped up. It has been popping up quite a lot. I think this is a big trend that's happening quietly. It isn't flashy like AI or blockchain, but it's a big shift in the data science space: more and more companies are adopting processes where people in the business with some interest in data get more access to tools and ways to get insights from data. That is really easing the stress on data scientists and driving businesses forward in general, because more people are generating more insights.
Jorge: Exactly. Yes, the hardest thing to automate is the domain knowledge of the business that you work in. People who understand their business and have the right business questions, if they have the right tool, we can automate the machine learning part of it or the model training part of it and they can spend their time thinking about the business questions.
Kirill: Do you think we'll ever automate the domain knowledge?
Jorge: Not until we have general AI, I don't think, which is [inaudible 00:30:12].
Kirill: Yes, which is like, what, five years away, 20 years away?
Jorge: Exactly. Maybe next.
Kirill: Yes, got you. Okay, what about you, Lindsey? You've mentioned a little bit about your work at HireVue and the really cool video interviews and how you've recently started adding AI to that. Can you tell us a bit more about what you do there in HireVue?
Lindsey: Yes. We have a huge store of job interviews; we're almost at six million now. Even a few years ago, when we were at more like two or three million, we started thinking about how we could use this data. This was before my time, but the question was how we can use this data to help people make decisions.
Lindsey: A big problem that a lot of companies face is that they just have too much volume. This ties into what we said before. Even for higher-level jobs, there are hundreds of applicants for one position. A lot of companies are constantly hiring for a job because there's high turnover; say they have a call center, and there are just way more applications than they can possibly look at. They basically end up randomly ignoring a lot of those applications.
Lindsey: We have this video interviewing platform, which is cool on its own, apart from the machine learning part, where our customers can set up an interview. It's called an on-demand interview, so it's asynchronous. It's not a live interview between two people; it just has set questions. You can record a video of yourself asking the questions, and we have some NFL teams that have a famous player record them. Then people take the interview on their own time and record themselves giving a few-minute answer to several questions.
Lindsey: That kind of takes the place of a resume and a phone screening. It's a great idea because resumes are kind of not very good representations of people and they can often have lies in them. I know this from hiring myself. When I look through a lot of data scientists' resumes, they can all start to look very similar, and it's hard to find anything that sets people apart.
Lindsey: Letting people express themselves and talk about what they're excited about makes a big difference. Many people have had the experience of bringing someone in for an interview, maybe even flying them in from across the country, and within five minutes of sitting down thinking, "This is just not the person. I can't sense that they're excited about this." There are cues that people communicate through video that they can't communicate through a resume.
Lindsey: Like I said, we built that platform, but many people still don't have time to watch all the videos. The machine learning part comes in when we have a high-volume position and many examples of people who interviewed in the past, along with how they turned out: how they performed in the job. Depending on the job, that could be just a retention metric, whether they stay in the job, if it's a high-turnover position. It could be their sales numbers.
Lindsey: Companies use different performance metrics, but we tend to prefer less subjective measures, and then we train the model to predict that thing. Whatever the company is interested in maximizing, we build a model on their previous interviews and apply it to the interviews that are coming in. The features we use are just the words people say, the tone, the way they say them, and the facial expressions they make.
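To make the pipeline Lindsey describes more concrete, here is a minimal sketch, not HireVue's actual system. It assumes the words, tone, and facial expressions have already been turned into numeric feature lists by upstream extractors (not shown), concatenates them into one vector, and fits a plain logistic regression to a binary outcome from past hires, such as retention. The `Interview` class, the function names, and the toy data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Interview:
    # Hypothetical pre-extracted numeric features for one interview.
    # The extractors themselves (speech-to-text, audio, computer vision) are not shown.
    word_features: list[float]
    tone_features: list[float]
    face_features: list[float]


def feature_vector(iv: Interview) -> list[float]:
    """Concatenate the three modalities into one input vector."""
    return iv.word_features + iv.tone_features + iv.face_features


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a logistic regression with plain batch gradient descent."""
    n = len(xs[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this past interview (prediction minus outcome).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def score(model, iv: Interview) -> float:
    """Estimated probability that the candidate hits the chosen outcome."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, feature_vector(iv))) + b)


# Toy history: label 1 = hypothetically "stayed in the job" (a retention outcome).
past = [
    (Interview([0.9], [0.8], [0.7]), 1),
    (Interview([0.8], [0.9], [0.6]), 1),
    (Interview([0.1], [0.2], [0.3]), 0),
    (Interview([0.2], [0.1], [0.2]), 0),
]
model = train_logistic([feature_vector(iv) for iv, _ in past], [y for _, y in past])
incoming = Interview([0.85], [0.75], [0.65])
print(f"score for incoming interview: {score(model, incoming):.2f}")
```

Concatenating the modalities into one vector is only one design choice; a real system could just as well train a model per modality and combine their scores.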
Kirill: Whoa. Sorry to interrupt, I find this super exciting and I just want to reiterate it. You use what they said, so you use speech-to-text, right, and you analyze the text. Then there's the tone, the actual intonation in the voice, which you can't get from a resume or any written-out text. And with computer vision, you analyze the facial expressions people make while they say those things. You put the combination of those three types of features into the machine learning model, and then you predict how successful a person will be on the job. I just want that to sink in for our listeners: how crazy it is, how far technology has advanced, and what an amazing solution you guys have come up with. Sorry to interrupt. Please continue.
Lindsey: No, it's fine. Yes, it is. I felt the same way the first time I heard about it; it's really amazing. It's a lot like what a human does in a job interview, taking in a lot of information. One thing that's great about it is that humans can be pretty biased, and they might not even know why they made certain decisions. When we build a machine learning algorithm, we can audit the model after the fact and check whether it treated different groups differently.
Lindsey: Sometimes your training data is biased, and we see that a lot. We also see it vary by country; sexism is way worse in some countries. If we have some kind of subjective performance measure and women were scored significantly lower than men, we see that, and the model can learn that behavior.
Lindsey: That's a big question I get immediately when I tell people what we do: "How do you prevent that from happening?" It's something I think we're really on top of, and fairness and bias in AI is a very interesting topic that a lot of people are talking about right now. There are ways of going back after you build a model and seeing how it treats different groups. If there are significant differences in how it scores different groups, that difference, which is called adverse impact or disparate impact, needs to be mitigated.
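One widely used screen for adverse impact in US hiring is the "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is below 80% of the highest group's rate is generally treated as evidence of adverse impact. The sketch below applies that rule to a model's pass/fail decisions. It is an illustration under that assumption only; the conversation does not say which test HireVue uses, and statistical significance tests are often run alongside it. Group labels are used only for the audit, never as model inputs, and the function names and data are hypothetical.

```python
from collections import Counter


def selection_rates(groups, selected):
    """Fraction of candidates in each group that the model advanced."""
    totals = Counter(groups)
    passed = Counter(g for g, s in zip(groups, selected) if s)
    return {g: passed[g] / totals[g] for g in totals}


def adverse_impact_ratios(groups, selected):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(groups, selected)
    best = max(rates.values())
    return {g: (r / best if best > 0 else 1.0) for g, r in rates.items()}


def flag_adverse_impact(groups, selected, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = adverse_impact_ratios(groups, selected)
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)


# Toy audit: group A is advanced 6/10 times, group B only 3/10 times.
groups = ["A"] * 10 + ["B"] * 10
selected = [True] * 6 + [False] * 4 + [True] * 3 + [False] * 7
print(adverse_impact_ratios(groups, selected))
print(flag_adverse_impact(groups, selected))
```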
Lindsey: There can be proxies. Obviously we're never going to feed race into our model as a feature, but there could be proxies for things like age, race, and gender. I think a big one for gender is tonality, just the way you speak. It could be that the pronunciation of certain words gives away something about your race, gender, or age. If that does happen, we can simply remove the features that could lead to that discrimination in the model.
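A simple way to look for single-feature proxies is to measure how strongly each feature correlates with a protected attribute on an audit set and drop the features above a threshold. The sketch below assumes a binary (0/1) group indicator, uses Pearson correlation, and picks 0.5 as an arbitrary cutoff; feature names such as `pitch` are hypothetical. A per-feature check like this can miss proxies formed by combinations of features, so rerunning the adverse impact audit after retraining is still needed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def find_proxy_features(features, protected, threshold=0.5):
    """Names of features whose |correlation| with the protected attribute is high."""
    return sorted(
        name for name, values in features.items()
        if abs(pearson(values, protected)) >= threshold
    )


def drop_features(features, names):
    """Return a copy of the feature table without the named columns."""
    to_drop = set(names)
    return {k: v for k, v in features.items() if k not in to_drop}


# Toy audit set: 0/1 indicator for a protected group, used only for auditing.
protected = [0, 0, 0, 1, 1, 1]
features = {
    "pitch": [1.0, 1.1, 0.9, 2.0, 2.1, 1.9],   # tracks the group closely
    "answer_length": [5, 3, 4, 4, 5, 3],        # unrelated to the group
}
proxies = find_proxy_features(features, protected)
cleaned = drop_features(features, proxies)
print(proxies, sorted(cleaned))
```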
Kirill: Very, very interesting. That's such a good obligation. It's really funny, because one of my friends was applying for jobs with big airline companies, and she sent me a message: "Oh, finally, I got an invitation to an interview." At the bottom, it said, "Interview will be conducted by HireVue." I won't name the airline company, but it's a huge one, and I was so surprised. I really wanted to send that to Ben Taylor.
Kirill: Yes. A lot of big companies are using your product. I was wondering, what kind of feedback are you getting? Are your clients happy or not?
Lindsey: Yes, we've had a lot of success with it. You brought up the example of a flight attendant, and I was going to bring up that example as well. Obviously we don't tell our model that anything in particular is important, but for flight attendants, something that would end up coming out of our models is that flight attendants who smile a lot and speak in a certain way tend to be better at the job. Our model looks for things like that.
Lindsey: It's a really difficult problem that we're trying to solve, so I think we've had a lot of success, but it also depends on someone's expectations going into it. This comes down to generally not understanding statistics very well, but some people expect it to be perfect: "I should be able to look at these people's scores and agree with every single one." Of course, that's not always the case.
Lindsey: That's something I struggle with explaining: with any algorithm, you're going to have some false positives and false negatives. The idea, though, is that we have clients who can't possibly look at all their interviews and are looking at a random subset, and we can help them choose that subset more wisely so that there's a higher percentage of top performers in that group.
Kirill: Yes, okay. So it doesn't have to be the final decision. If you have 1,000 applicants, instead of picking the one who gets the job, you could use the model to pick the top 10, and then humans do the final step.
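Lindsey's point about random subsets can be checked on historical data: if a client reviews a random sample, the expected share of top performers equals the base rate, so the model adds value when the share among its top-ranked candidates is higher. The sketch below computes that share alongside the false positive and false negative counts she mentions. The threshold, `k`, function names, and data are illustrative assumptions.

```python
def confusion_counts(scores, is_top_performer, threshold):
    """Count true/false positives/negatives when advancing scores >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, is_top_performer):
        advanced = s >= threshold
        if advanced and y:
            tp += 1
        elif advanced:
            fp += 1  # advanced, but not a top performer
        elif y:
            fn += 1  # screened out, but was a top performer
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def base_rate(is_top_performer):
    """Expected share of top performers in a random subset."""
    return sum(is_top_performer) / len(is_top_performer)


def top_k_hit_rate(scores, is_top_performer, k):
    """Share of top performers among the k highest-scored candidates."""
    ranked = sorted(zip(scores, is_top_performer), key=lambda p: p[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Toy historical data: model scores and whether each hire became a top performer.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False, True, False]
print(confusion_counts(scores, labels, threshold=0.5))
print(base_rate(labels), top_k_hit_rate(scores, labels, k=4))
```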
Lindsey: Definitely. Yes, definitely, our models are early in the funnel. Like I said, kind of replacing a resume screening or a phone screening. This is getting through early stages of the funnel and then we very much rely on humans to make final decisions.
Kirill: Got you.
Kirill: What feedback have you gotten from the candidates themselves?
Lindsey: A lot of people like it. Generally, I think our feedback is good. When I first did it, I felt a little awkward, the same way you can feel a little awkward video chatting. The one thing I think people will just get more used to over time is recording yourself speaking without immediate feedback. If you're a little insecure or nervous, it can feel awkward not having someone respond to you or nod their head.
Lindsey: I think it's something we're getting more and more comfortable with as a society; people record themselves a lot, doing vlogs or YouTube videos. Getting over that awkwardness is a necessary step, but it's happening more and more, and a lot of people really enjoy it.
Lindsey: A lot of companies don't use it to replace an in-person interview. One complaint I have heard is, "I would rather go in there and have an interview," but that's usually not the stage of the funnel you're at. You're way earlier than that, so it's actually replacing your resume or phone screening.
Kirill: Okay, awesome. Well, those are some great examples of the work you guys are doing. Now I want to talk about something your roles have in common, though you might have different perspectives on it. The topic is: what does it mean to be a director of data science? Both of you are directors in your respective companies. How is that different from just being a data scientist, and what does it involve? Is this a natural progression for any data scientist, going up the ladder and eventually becoming a manager and director, or is it something you wouldn't recommend to everyone? Maybe Jorge, let's start with your opinion on this topic.
Jorge: Yes. For me, my role is very unique in the sense that there's different aspects of the company that are very important and I have to be involved in all of them. As I said, our company basically revolves around doing machine learning and helping companies adopt machine learning, but we have at least three big sections of the company, one of them being the product. We're developing software to do this.
Jorge: The second one is the consulting branch of the company, because we basically get into a business partnership with our clients, or we do consulting for them and help them get on the path to asking the right business questions and framing them in a way that can be answered by machine learning. The third aspect of the company is sales, where we have to get in touch with a company, understand what kinds of questions they're going after, and work out how we could help them answer those questions through machine learning.
Jorge: As a director of data science, I have to be involved in all three of those aspects of the company. I have to be in sales calls with data scientists from other companies and try to get a feel for what kind of problems they're trying to solve, and then thinking with them how our product could help them be more productive and how it could help them empower other people in the company to basically do data science for them. That's one aspect.
Jorge: On the consulting side, I talk to clients we're already in a partnership with, and I help them go through their data and their business concerns, frame those concerns in a way that we can tackle with machine learning, and get on that path.
Jorge: On the product side, I'm very involved in developing our product and in charting how it will evolve over time to address needs as they arise. You get to be involved in many different aspects, or at least I do in my current role, and that's one thing I find really exciting.
Jorge: People have different personalities in data science, and some people might be very passionate about writing code and improving the performance of different algorithms. But among practicing data scientists in general, I tend to think there are almost two camps.
Jorge: I guess a good analogy is people who are interested in cars. There are people who are interested in cars because they like driving them, and those people care about going fast, trying new cruise control features, shifting into fifth gear, and seeing how fast that gear can go. Then there are people who are interested in the design of cars. They don't necessarily care about driving. They care more about why there's a fifth gear in the car, why five and not six or seven, and how the car is designed in general.
Jorge: Taking it to the data science point of view, there are data scientists who are just interested in using the tools that are out there to solve cool problems, and those are the people who are going to use [inaudible 00:45:16] to build image recognition algorithms to solve a problem. Then there are the people who want to look at the source code of Scikit-Learn, understand how the algorithms are implemented, maybe tweak the implementation, and play around with how to make it more efficient.
Jorge: There are data scientists with many different kinds of interests, and some have a huge interest in business in general. As a director of data science, you have to be very interested in the data science side of things, the math, the data, and the code, and also in the business side and the big-picture strategy.
Kirill: Yes, that's a very, very cool way of putting it. You mentioned what keeps you excited about being a director of data science: I guess it's the combination of these three pillars, consulting, product, and sales, and how you have to make decisions in all three. What are some of the challenges you face on a daily basis?
Jorge: Yes, I guess one of the biggest challenges is dealing with companies that have never used machine learning before. You might have a company that has been saving good historical data, but if it's in an industry you're not familiar with, the problem is having the right conversations with the right people, who understand the business and know which questions are really the bottom line for them. Then, from those questions, you pick the ones that could be great questions for machine learning algorithms to attack.
Jorge: They also have to be questions that are actionable, so if a machine learning algorithm gives you some insight into the problem, it better be some insight that you can do something about. That's one of the interesting challenging parts of the job, trying to find from a business side of things, how can we frame these questions in a way that we can tackle them with our software.
Jorge: Then the other big challenge, which is super interesting to me at the moment, is more of a design question. We're building a product; we offer a software-as-a-service platform, and there are a lot of design decisions that go into building a product. By design, I don't mean making the product look nice or be intuitive for the user. We thankfully have a superstar UX team that takes care of that side of things. I'm talking about the design of how all the number crunching happens in the background.
Jorge: There are a lot of decisions to be made about how we're going to pre-process a data set. A user wants to use the software and make as few decisions as possible; they just want to say, "Here's a data set. Train a machine learning model." So we have to decide what pre-processing steps to take on that data and what algorithms to try.
Jorge: First of all, we have to decide what kind of problem it is. Is it a problem we're going to tackle with regression algorithms, classification algorithms, or time series analysis? We have to make a lot of decisions about how these things are done and in what order. That's the challenging part, and that's why it's so fun, because you have to automate things in a way that makes sense.
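The first decision Jorge describes, choosing between regression, classification, and time series, can be automated with a heuristic over the target column. The rule below is a hypothetical sketch, not the logic of Jorge's product: a supplied column of dates means time series, text or boolean targets or a handful of whole-number values mean classification, and anything else means regression. The function name and the `max_classes=10` cutoff are arbitrary assumptions.

```python
from datetime import date


def infer_problem_type(target, time_index=None, max_classes=10):
    """Hypothetical heuristic for picking a problem type from a target column.

    target:      list of target values (None entries are ignored)
    time_index:  optional list of dates/datetimes aligned with the rows
    """
    # A full column of dates suggests forecasting over time.
    # (datetime is a subclass of date, so both pass this check.)
    if time_index and all(isinstance(t, date) for t in time_index):
        return "time_series"
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column has no values")
    # Labels like "yes"/"no" or True/False are categories, not quantities.
    if any(isinstance(v, (str, bool)) for v in values):
        return "classification"
    distinct = set(values)
    # A few whole-number codes (e.g. 0/1) usually encode classes.
    if len(distinct) <= max_classes and all(float(v).is_integer() for v in distinct):
        return "classification"
    return "regression"


print(infer_problem_type(["churned", "stayed", "stayed"]))
print(infer_problem_type([12.5, 40.1, 33.3, 18.9]))
print(infer_problem_type([100.0, 120.0], time_index=[date(2018, 1, 1), date(2018, 1, 2)]))
```

A real system would likely let the user override this guess, since, for example, an integer count with few distinct values might still be better treated as regression.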
Kirill: Mm-hmm (affirmative). Does that design vary from client to client? Is it client-specific?
Jorge: No, it's just for the product. When you're trying to build a product that is very general and can work in many different situations, that's why good design is so important, because you have to design it in a way that it could be used to tackle a huge variety of problems.
Jorge: If it was only for one problem, it would be very easy to design it because you know exactly how to tackle that kind of problem, but the challenge is designing a product that is robust enough to work in many different situations, and that's why it's such an interesting design problem.
Kirill: Okay, that's cool. That's really cool, actually. I love how you answered the question. You can tell that you've got lots of experience with this, that even a question about challenges, you answered in a way that those challenges are exciting for you.
Jorge: [crosstalk 00:49:43] because it provides a lot of interesting work, for sure.
Kirill: Yes, yes. It's really cool. Very exciting for your role, and I can see how it's so much fun. Okay, Lindsey, what's your view on what it means to be a director of data science?
Lindsey: Yes. Since I transitioned a few months ago, I would say the biggest differences I've experienced are being more involved in the big picture: more responsibility, more meetings, more planning, and delegating work to other people. That's nice, but it's also hard to get used to when you're used to thinking, "I'll just do this myself." Instead you tell people, "Hey, we've got a bunch of stuff to do. You do this, you do this," and you act more like a conductor than someone playing their own instrument.
Lindsey: That gives you less time for super focused coding and problem solving, and just like Jorge said, people enjoy different things. I think management is not for everyone. In most fields, when you become a manager of people, you do less of the thing you got into the field for, the actual work.
Lindsey: I think there are a lot of incredibly valuable data scientists who prefer to stay deeply focused on solving problems and to have the space to do that. Tying into your question about the biggest challenge, to me it's balancing those two things, the big and the small picture. I still get into the weeds a lot on my team, but at the same time, I have to do this managerial role.
Lindsey: It's really hard for me to switch my attention when I'm deeply focused on solving a problem but also have a million little things to do. That's something I'm still trying to figure out: how to manage my time so that during certain hours of the day I'm doing deep, focused work, and during other hours I'm doing the more shallow, big-picture work, like delegating, organizing, and answering emails.
Kirill: Mm-hmm (affirmative). That actually reminds me of an interesting essay by Paul Graham, one of the founders of Y Combinator, called "Maker's Schedule, Manager's Schedule." He talks about exactly that: when you need deep focus, you need to block out at least half a day or a day to be creative and really get into a state of flow to answer those questions.
Kirill: That's maker's schedule, and for manager's schedule, your typical day would be broken down into 30 or even 15 minute slots where every 15 or 30 minutes, you have a new meeting, a new thing you have to look at, a new person you have to talk to, and so on. It's about being able to switch between the two, I guess. Would you agree with that?
Lindsey: Yes, exactly. I'll look that up. That's exactly what we're both going through.
Kirill: Yes, yes. I totally agree with that. What about people skills? How important are they in a data science director role? I would imagine they're important in any managerial role, but specifically in data science, you're working with unique people who are very talented but in a lot of cases not always very approachable, or you have to approach each person in their own individual way to get through to them and communicate with them. What are your thoughts on that?
Lindsey: Yes, I think it's huge, though not every data scientist needs it; it depends on what kind of data scientist you want to be. But it is very valuable to have technical people who can explain things to non-technical people. In my role, and it's probably even more so for Jorge, a big part is explaining to our clients what we're doing, because it's a little mind-boggling, and they also have a lot of concerns and questions.
Lindsey: We have a really good I-O psychology team, industrial-organizational psychologists we work pretty closely with, and they face the customers a lot, but a lot of times people want to talk to a data scientist and hear it from them. So having people who can do that, who can explain technical things to any audience, is a hugely valuable skill.
Kirill: Mm-hmm (affirmative), that's a really good point. Another thing I was trying to understand is the people skills you need to work with your team: making sure your team is happy, resolving any conflicts, and things like that. What do you think about that?
Jorge: Yes, I think the big thing is understanding what motivates different people. If somebody is not that interested in being customer-facing, is more interested in the technology side of things, and wants to write code, that's great. We should take advantage of that and let them do what they're good at. If somebody really enjoys a different aspect of the business and is good at it, then we should let them do what they enjoy.
Kirill: Mm-hmm (affirmative), yes, totally agree with that. Okay, well, we're getting close to the end. This hour has been running by really quickly. I've got an interesting question I'd like to get your opinions on. It's a more philosophical question, which I love to ask. Where do you think the field of data science is going, and what should our listeners look into to prepare for the future that's coming?
Jorge: Well, I'm kind of biased on this because I'm working on building tools for where the field is going. I think what's happening in pretty much every field, not just data science, is more and more automation, with better and better tools being developed that allow tasks to be done more easily. Data science is no exception. There are better and better tools out there all the time that let people spend less time implementing their ideas and more time having good ideas and dealing with the business side of things, so that's where the field is going.
Jorge: We've reached a very high level of abstraction already with current packages that exist in the languages that are common for data science. R and Python have so many packages and libraries that take care of so many details already in the process of doing machine learning, for example. Now we're taking it to a higher level of abstraction. We're making better tools that take care of even more of the details and the field is moving more towards having people who understand the concepts, but don't necessarily have to know how to implement them.
Jorge: I think for people who are trying to get into the world of machine learning and building models and making predictions using these models, it's more important to understand just the big ideas of what types of models exist out there, what types of questions these different models can address, and what kind of insights you can typically get from these models and how to interpret those results.
Jorge: Those are the big skills that are really useful. The details on how to implement one model or another can be taken care of by current technologies, so it's more focused on the big ideas as opposed to the detailed implementations. I think we're moving more and more in that direction.
Kirill: Mm-hmm (affirmative). I'll just add another point that I think is very important, and I think it ties in with what you mentioned: the intuition behind those models. It's good to know how the mathematics works, but it's way more important to know intuitively which model is better for which scenario, how they work, and what actually happens when you click that button and the logistic regression runs or the decision tree is constructed. I'd say that's an important part. All right, Lindsey, what are your thoughts on where the world is going?
Lindsey: Yes, I mostly agree. AI is going to touch everything, and I think humans will have an important role working alongside AI, but we do need to be prepared for certain tasks being automated. No matter what you do, you have to look at the pieces of your work that require you to be human and the pieces that could be automated. That's different for every job, but at least in the near future, personal skills are not going to be so easily automated.
Lindsey: Really, what humans have to offer is thinking about interesting problems, having intuition like you said, and being creative. I think those things will likely become more and more important as even more of these high-level technical jobs become automated.
Kirill: Mm-hmm (affirmative), yes. Do you think that in the next decade or two, data scientists will be made completely redundant, or will there always be space for data scientists?
Lindsey: I think their job will change. Like Jorge said before, if you were doing data science or statistical analysis years ago, much more of your job consisted of things that were really mundane and boring. Now we can move a lot faster because of the tools we have. I think there's still a lot that data scientists will do, but the nature of their job will probably shift.
Jorge: I agree. I think there will always be demand for people who can think mathematically and statistically. Their lives will just keep getting easier because a lot of the details can be automated, but being able to think in that mindset and interpret the results that different pieces of technology give you will always be very valuable.
Kirill: Does that lower or raise the barrier to entry, in today's sense? Previously, if you had any sort of technical skills, you had an advantage over the rest of the population and could get into data science quite quickly. But now data science is becoming so easy to do that, let's say five years from now, anybody will be able to do it. Therefore, entry-level data science won't be a valuable skill anymore, and to become a proper practicing data scientist who builds a career around it, you'll probably need to go a step or two further and be quite proficient in more advanced techniques and technologies, maybe even design those algorithms. What are your thoughts on that? Now that anybody can do this, is that going to raise or lower the barrier to entry into a data science career?
Jorge: Yes. That's a great point, and I guess it depends on how you define a data scientist and where you draw the line. I think new technologies are really lowering the bar, making it so much easier for people to do what we call data science now, or what people would call the machine learning practitioner role: people who build machine learning models on a company's data and deploy them. We're making it so much easier for people to do those kinds of roles with current technology.
Jorge: The people who actually build the technology must have a deeper understanding of how things work in the background, so that's a different role than the people who use the technology, and it's just a matter of which one you call a data scientist.
Kirill: Yes. Do you agree, Lindsey?
Lindsey: Yes. I was going to say the exact same thing. Maybe those two scenarios you described will just become two different positions. There's a practitioner and then someone with a much deeper understanding.
Kirill: And then the designer.
Lindsey: Mm-hmm (affirmative).
Kirill: Okay. Well, thanks a lot, guys. That was a great chat. Super excited to have had you on the show, and just one more question. Where can our listeners find you or follow you to see how your careers progress from here?
Lindsey: I'm on LinkedIn and Twitter. It's Lindsey with an E-Y, Zuloaga. I have some LinkedIn blog posts, and I'm pretty easy to find because of the last name.
Jorge: Yes, same for me. You can find me on LinkedIn with my name, and I guess you'll include the URL on the podcast.
Kirill: Yes, yes, you're looking into the future. We will include the URL on the podcast. Yes, I'm sure you'll get lots of people contacting you to see how your careers progress from here. One more question for both of you. What is or are a book or books that you can recommend to our listeners to help them with their careers?
Lindsey: Yes, I really like Python Machine Learning by Sebastian Raschka. It takes you from a super simple perceptron model all the way up to much more complex things, all using Python with good hands-on examples.
Kirill: Mm-hmm (affirmative). Thank you.
Jorge: For me, a book I enjoy is Hands-On Machine Learning with Scikit-Learn and TensorFlow. It's a very hands-on and practical book, but really enjoyable.
Kirill: Awesome, so there we go. We've got Python Machine Learning and Hands-On Machine Learning with Scikit-Learn and TensorFlow. Okay, well, thank you again very much, guys, for coming on the show. It's been a great pleasure having you both, and I'm sure lots and lots of people got tons of value from all the insights you shared today.
Lindsey: Yes. Thanks so much.
Jorge: Thank you. It's been a pleasure.
Kirill: There you have it. That was Jorge and Lindsey Zuloaga, directors of data science at Big Squid and HireVue respectively, and we thank them so much for coming on the show. It was our very first episode where I was conversing with two guests, and I think it went quite well. I hope you thought the same, and I'd love to hear your feedback.
Kirill: Also, what was your favorite part of this episode? For me personally, a lot of what they shared was very insightful, but the highest value was when they talked about what it means to be a director of data science. They both provided some very interesting comments on that, and I liked the insight Jorge gave about the components of being a director of data science: you've got the consulting, the product, and the sales to worry about. I also liked the analogy he gave with cars, that some people like collecting cars, some people like building cars, and some people like driving cars.
Kirill: Maybe a director of data science role is not for everybody. In other professions, climbing the corporate ladder to a manager or director level might seem natural, whereas in data science, it's a personal preference whether you stay at the data scientist or senior data scientist level or would rather become a director of data science.
Kirill: I also really liked Lindsey's insight, a good clarification that as a director of data science, you sometimes have to think about the small picture, but often you have to step back and think about the big picture. You need that type of skill, or you need to be passionate about thinking about the big picture of data science projects, not always at the small-picture level. It's an interesting combination of the two.
Kirill: Yes, hopefully those insights will help you understand better if a director of data science is a role that you want to pursue sometime in the future, or if perhaps you might be more interested in the more data scientist slash senior data scientist role and growing your career in that direction.
Kirill: There we go. That was our podcast today. Make sure to check out SuperDataScience.com/135, where you can connect with our guests, find their links, find the links to their LinkedIn profiles, and make sure to connect with them there and any other places that you can find them and follow them. Also, you will find all the show notes, all the recommended books and other materials on the same page, SuperDataScience.com/135.
Kirill: If you enjoyed this episode, forward it to a friend. Maybe if you know somebody who is a director of data science or is becoming one, this might help them out. On that note, thank you so much for being here today. I really appreciate your time and look forward to seeing you back here next time. Until then, happy analyzing.