Kirill Eremenko: This is episode number 365 with Chief Data Scientist at untapt, Jon Krohn.
Kirill Eremenko: Welcome to the SuperDataScience Podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now, let’s make the complex simple.
Kirill Eremenko: Welcome back to the SuperDataScience podcast everybody. Super pumped to have you back here on the show. Today, we’ve got a very special guest, Jon Krohn. What you need to know about Jon is that he’s a speaker, he’s an educator in the space of deep learning. He is an author, a bestselling author I should say, of his book, Deep Learning Illustrated, and he is the chief data scientist at untapt. Jon has impacted numerous data scientists with his deep learning lectures, tutorials, workshops, and explanations. And in this podcast, it’ll be no different.
Kirill Eremenko: Today, we will cover off topics such as coronavirus and data science, models for human resources, natural language processing, transformers, BERT and even GPT2. The role of a chief data scientist, the tradeoff between accuracy and compute complexity and also explainability, checking for bias, helping people learn, making your AI narrow and transfer learning versus one shot learning. And those are just some of the topics we’re covering off in today’s podcast.
Kirill Eremenko: So all in all, this is going to be a very packed episode with lots of knowledge bombs. Get ready to be blown away. Without further ado, I bring to you Chief Data Scientist at untapt, Jon Krohn.
Kirill Eremenko: Welcome back to the SuperDataScience Podcast everybody. Super pumped to have you on the show. And today’s special guest is Jon Krohn, calling in from New York. Jon, how are you going?
Jon Krohn: I’m doing very well, Kirill.
Kirill Eremenko: That’s awesome. And I love asking this question at the start, because we’ve already been chatting for 15 minutes or so, and still, how are you going? Let’s get this going. Yeah, man. So lots of cool things, lots of cool things happening. At the same time, of course, we’ve got the whole situation still with coronavirus. How’s New York faring? Is it slowly past the peak, or is the worst yet to come?
Jon Krohn: In terms of what I see on the streets, things are exactly the same as they have been for several weeks. As of right now, at the end of April, we’re on a complete lockdown. When you go on the subway, there’s nobody there. Walking in the street, there’s nobody there. Very easy to keep the social distancing in public. But from what I read, things are definitely improving. The projections of how many hospital beds would be needed, how many ICU units would be needed, ventilators, all that stuff: none of those worst-case projections ended up happening.
Jon Krohn: The severe lockdown has worked, and yes, tragically, lots of people have been affected. Lots of lost work hours and discomfort. And unfortunately, lots of lives lost as well, but it has not nearly been as bad as was feared. The infection rates as well as hospitalization rates are now coming down, and the governors of New York and the surrounding area are starting to figure out how we can start opening things back up with lots of testing and that kind of stuff.
Kirill Eremenko: That’s good to hear. Well, hopefully it all settles down sometime soon and people can start moving around. Because also being indoors is, I think, taking a toll on many people as well. You want to be outdoors sometimes at least.
Jon Krohn: Yeah, absolutely.
Kirill Eremenko: You’d know. You come from Canada, right? Like Toronto, probably great places to go outdoors there, no?
Jon Krohn: Yeah, although it’s a funny thing. Growing up in Canada, I think people really have this idea of Canadians spending lots of time outdoors. There is a lot of great outdoors there, but I grew up in downtown Toronto, and I’ve never camped in Canada in my life.
Kirill Eremenko: No way.
Jon Krohn: I’ve lived a very urban life. We lived one block away from a subway station from when I was a young kid going to the subway station on my own and taking a train into school. So the opposite of the rural Canada experience that people think of.
Kirill Eremenko: Got you, man. Oh, that’s crazy. I don’t even know if you can call yourself Canadian now.
Jon Krohn: Oh, exactly. Toronto is the most American city in Canada. A lot of the rest of Canada doesn’t even really like people from Toronto. But it will be nice. That is the one thing that you can do in New York. I know some places they have severe lockdowns where you can’t even go outside, but in New York it’s actually encouraged, going on bike rides, going on runs. So I’ve been able to keep that up. And actually from that perspective, it’s nice because you don’t have to worry about traffic in the streets.
Kirill Eremenko: Oh, true.
Jon Krohn: You don’t have to [inaudible 00:05:30] red lights or anything. So there are some very small benefits.
Kirill Eremenko: Gotcha, gotcha. Well, that’s good to hear. What brings you to New York in the first place?
Jon Krohn: Great question. I did my PhD in Oxford in the United Kingdom and I specialized there in developing machine learning algorithms distributed over many computers for analyzing the large data sets of brain imaging and what we call genomic data. So genetic data where you have genetic information from the DNA of the entire organism. These are very large datasets, and so you need to apply machine learning to be able to identify patterns in the data. You can’t possibly do it manually.
Kirill Eremenko: Wow. Wow, very cool. Because I’ve been looking at LinkedIn, your PhD says neuroscience. So you’re doing neuroscience and you’re doing machine learning in neuroscience? That’s the best combination you can get.
Jon Krohn: Yeah, exactly. I had colleagues; there were 20 people in my year who started with me and studied neuroscience. We did a master’s year, then most of us stuck around for a PhD afterward. And you could do anything in neuroscience. You could do studies on lab animals or grow bacterial cultures in the lab. To me, it was so obvious that really focusing on using computers as my specialization in neuroscience made a huge amount of sense. Because I knew even then, and it’s still true today, that the amount of data on the planet doubles every 18 months. And so there’s a huge opportunity to be automating things, to be understanding the world by using computers to comb through these ever larger datasets and identify patterns.
Jon Krohn: So I was fascinated by that even then. And I thought, “Well, I might want to stay in medical sciences.” And if I do, well, this skillset will be useful. Or if I decide to leave medical sciences, this skillset will be very valuable there as well, and that is what I ended up doing. That’s how I ended up in New York. I ended up working at a hedge fund as a quantitative trader for a couple of years between New York and Singapore, and that was the pendulum swinging too far the other way. On the one hand-
Kirill Eremenko: Same skillset, other side of the pendulum.
Jon Krohn: Yeah, exactly. Where in academia, you can spend your day however you want, you can publish on everything you do and be very open about what you’re doing, but the pay in academia is, on average, much lower relative to industry. And then the hedge fund world is the opposite, where everything is closed. You obviously can’t be sharing your strategies widely at all, and I really like being able to speak openly about what I’m doing. And also it was very narrow problem solving.
Jon Krohn: Basically, you’re trying to predict whether an asset is going to go up or down over some time period in the future. And a lot of people find this problem fascinating. I have lots of friends and my sister even, they work as traders and for a lot of people that seems like a great fit. For me, I’ve been able to find this thing in the middle of commercial data science, of building models that can be widely used. In a lot of cases, you can publish on a lot of what you’re doing, and you can automate repetitive tasks and make people’s lives better.
Jon Krohn: So I think this is a nice middle ground for me these last few years, but I stayed in New York. It’s a wonderful city, especially if you like biking around and if you like socializing.
Kirill Eremenko: Fantastic. Wow. Great. Great story. And that’s how you found data science for yourself, through neuroscience, quantitative trading? What led you to identify that data science is the next thing?
Jon Krohn: Actually, it’s a funny story. I didn’t know that this term existed until after I had left trading. And my initial idea, because most of the time I was doing my PhD in neuroscience, I thought I wanted to go back to Canada, study medicine and do medical research. That’s what I was thinking. Near the end of my time trading I was like, “Yep, I’m going back to Canada. I’m going to go to med school.” But I had a few weeks left of rent on my apartment after I’d left trading. And so my initial plan was to finally get to see the sights of New York, which I was always too busy to see.
Jon Krohn: And I still haven’t seen the sights because I discovered through a friend, she said, “Oh, my company …” There’s a company, they’re called Zocdoc. They’re big in the US. They allow you to find a doctor or any medical specialist online. And she said, “I work at this company, Zocdoc. We have this role open called data scientist, and it seems like you have a very similar background to what we’re looking for.” I looked it up and it was exactly right. It was like, “Oh, wow. I have exactly this background already.” And so that opened my eyes, and I ended up spending all the rest of my time in New York those weeks. And now, several more years in that field.
Jon Krohn: It’s the perfect fit. I love the data science world so much, as I’m sure you do. It’s such a wonderful community. You meet people either online or in person from all over the world, working on so many different kinds of problems, publishing their code on GitHub, sharing it there, writing papers that are published immediately on arXiv. The speed with which things move is so exciting. And in the time I’ve been a data scientist, there have been so many huge breakthroughs across machine vision, natural language processing, deep reinforcement learning. And I think this is only the beginning really. I think that the best is yet ahead.
Jon Krohn: I talk about this a bit in my book, which we’ll talk about later on. I think that the future is going to be really interesting in the coming decades. Data is going to facilitate enormous changes in society, orders of magnitude more than it already has. And it’s the data scientists that are going to be playing the key role.
Kirill Eremenko: Absolutely, absolutely. We’re already even seeing that with this coronavirus pandemic. At the start, everybody scrambled and tried to, of course, do everything that we can to save lives and not let this fire spread. But now, slowly as the situation is stabilizing, it’s data scientists who are looking into the data sets, who are modeling what was happening or modeling how the outbreak happened, what’s going to happen in the future.
Kirill Eremenko: For instance, Sam Hinton, who’s been on the podcast before, I was just speaking to him yesterday or the day before. I think yesterday. He’s one of the lead data scientists in Australia handling this whole situation. So it’s the people that we know, that we surround ourselves with, that we learn from, that we network with, these are the people that are now going to be looking into these data sets and how we can prevent the next pandemics from happening. And this is just one example. This is going to be happening across the board. Data is everywhere.
Jon Krohn: Totally. And that epidemiology piece is one part of it, so people are aware more than usual about the work that epidemiologists do, forecasting, as I mentioned earlier on the show, how many hospital beds we’re going to need, ICU capacity and all that stuff. That’s one piece where data science plays a role. Although for the most part, those models are relatively simple. We wouldn’t typically call those AI models, or we wouldn’t use deep learning to tackle those problems.
Jon Krohn: However, there are other parts of understanding coronaviruses, or viruses in general, that do involve these kinds of techniques, where we model an understanding of the shape of proteins that you would expect from the DNA or RNA of a virus, and then how those protein shapes interact with each other. And so there are a lot of really sophisticated modeling techniques that can be used to understand a virus and potentially help us develop drug candidates, or maybe even identify drugs that already exist and are approved as safe for humans that will bind well with some aspect of the virus’ shape, and that we can use to speed along ridding society of this disease. It is really fascinating.
Jon Krohn: From all these perspectives, whether it’s sharing of information or coming up with solutions, if you think about the tragedy that unfolded over the world a century ago with the Spanish flu and how there were more people killed by Spanish flu in the years after World War I than there were killed in the trenches of World War I. And today, we’ve been able to learn. We know so much more about the world than a century ago. It’s crazy, and our ability to communicate.
Jon Krohn: For example, ahead of the Spanish flu pandemic, some big American cities decided to cancel their parades and others kept their parades. And it’s those cities that kept their parades that were impacted disproportionately worse by the Spanish flu. So it’s this kind of thing. This is exactly what I’m talking about. You were hitting the nail right on the head here with this idea of how humans in general and data scientists in particular, have become so efficient at sharing information, even in an automated way over the world today, that we can adapt as an entire species globally, change our patterns of behavior, everyone, so that we can fight this virus and protect as many of our species as we can. It’s a really fascinating thing to think about.
Kirill Eremenko: Absolutely. Absolutely, I agree.
Kirill Eremenko: Hope you are enjoying this amazing episode. I’ve got a cool announcement for you and we’ll get straight back to it. Virtual Data Science Conference, here it is! Well, you’ve probably heard of DataScienceGO, the conference we’ve been running for the past 3 years in Southern California. And maybe you’ve attended it; if so, it was super cool to have you there. But maybe you weren’t able to attend because you were in a completely different country, or the flights were too long, or the timing wasn’t perfect. There could be plenty of reasons why you weren’t able to attend. But now, we’re bringing DataScienceGO to you.
Kirill Eremenko: So this June, we’re hosting DataScienceGO virtually and you can attend and get an amazing experience there. And guess what? The best part is that it’s absolutely free. Just head on over to DataScienceGO.com and get your tickets today. This will be our very first time running a virtual event, but nevertheless we’re still going to combine the three key pillars of fun, amazing talks and networking into this event. You will hear from speakers like Jon Krohn, Sam Hinton, Hadelin de Ponteves, Stephen Welch and many others, plus you’ll be able to network with your peers.
Kirill Eremenko: This event is going to be epic on all fronts and we’d love to see you there. Head on over to DataScienceGO.com/virtual and get your ticket today. The number of seats is limited. We’d love to have everybody there, but for our very first event we are limiting the number of seats to make it more manageable, so make sure to get your tickets today if you want to be part of this. And on that note, I look forward to seeing you there. And now, let’s get back to this amazing episode.
Kirill Eremenko: Let’s talk a bit about what you currently do. You’re the chief data scientist at untapt. What does untapt do, and what is your role there?
Jon Krohn: untapt is a machine learning company. We design models specifically for use in the human resources sector. The most common problem that we tackle is you have a big data set of candidates, so if you’re a big corporation, some clients of ours, they get millions of job applications a year to thousands of different roles, and they are inundated with applications. And so they have to hire large teams of people to use historically, a keyword search to mine over those millions of candidates that they have.
Jon Krohn: Clients of ours, blue chip companies, have done studies using our algorithm compared against their previous approaches that their internal recruiters use to identify candidates in their database. I can’t disclose the name of the company, but a big global microchip company, they found that our algorithm was able to identify 21 times more of the very best applicants for a given role.
Kirill Eremenko: Wow.
Jon Krohn: Relative to their keyword search, because keyword search, if you think about it, if you have a sophisticated Boolean search where you have lots of ands and ors, you’re going to be looking for a very specific person that used a very specific language on their resume. With an approach like ours, that takes advantage of deep learning, and some audience members will know what word vectors are, we can basically have this fuzzy representation of language and the meaning of language.
Jon Krohn: These fuzzy representations allow us to easily find people who approximate the language of a job description, or approximate the language of what a given internal recruiter is searching for. And so in this way, we’re able to identify a much larger pool of possible candidates. And on top of that, the other thing that’s really great that you get out of this is that the candidates can be ranked in a very specific way.
Jon Krohn: We can assign a probability that any given candidate will be invited to interview at any given role, and then we can sort based on that probability. Whereas if you do a keyword based search, you just get-
Kirill Eremenko: Yes, no.
Jon Krohn: … a pool of positive hits. Yes, no. Exactly. So still, you get your positives back and you have to go look at every single one instead of starting at the top of the list. Over the years, we’ve been working with many clients on these kinds of approaches, and previously we used to run a recruitment website ourselves, which we no longer do. We licensed out that part of our business, but through that process, we built up a dataset of hundreds of millions of decisions of a given candidate being invited to interview for a given role.
Jon Krohn: So we have this rich data set that we’ve been able to use to create these probabilities, and it allows our clients both to … it saves them a ton of time and it allows very highly qualified applicants to now be noticed. That 21 times figure is crazy.
Kirill Eremenko: Mm-hmm (affirmative).
Jon Krohn: To think that 5% of the relevant people were getting a phone call from this company and 95% were being ignored.
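The fuzzy word-vector matching and probability-ranked results Jon describes can be sketched in toy form. Everything below is illustrative: the vectors are made up, and the averaging-plus-cosine scorer is a stand-in for untapt’s actual (non-public) model.

```python
import numpy as np

# Toy word-vector table. In practice these would come from a trained
# embedding model; the vectors below are illustrative only.
VECTORS = {
    "python":    np.array([0.9, 0.1, 0.0]),
    "java":      np.array([0.8, 0.2, 0.1]),
    "developer": np.array([0.7, 0.3, 0.2]),
    "nurse":     np.array([0.0, 0.9, 0.1]),
}

def embed(text):
    """Average the word vectors of known words: a crude document embedding."""
    vecs = [VECTORS[w] for w in text.lower().split() if w in VECTORS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rank_candidates(job_description, resumes):
    """Score every resume against the job and sort best-first,
    instead of returning an unordered yes/no pool like keyword search."""
    job_vec = embed(job_description)
    scored = [(cosine(job_vec, embed(r)), r) for r in resumes]
    return sorted(scored, reverse=True)

ranked = rank_candidates(
    "python developer",
    ["java developer", "nurse", "python developer"],
)
# The "java developer" resume still ranks near the top despite not
# containing the word "python": that is the fuzziness Jon mentions.
```

Note how a Boolean keyword search for "python" would have returned only one hit, with no ordering at all.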
Kirill Eremenko: Wow. That’s crazy. Tell us a bit about the techniques. What kind of techniques do you use? Especially since NLP is so exciting these days with BERT from Google, which is, what, just over a year old now. What kind of techniques do you use, of course to the extent that you can disclose?
Jon Krohn: Yeah, I can absolutely talk about these things at a high level. So we’ve experimented with transformer technologies like BERT. At this time, we don’t think that the advantages in performance that you get with something like a transformer are worth the computational time or the computational expensiveness of running these kinds of techniques.
Kirill Eremenko: Sorry, I just want to jump in. Do you mind explaining what a transformer is? Sinan Ozdemir explained it to me on the podcast a few months ago, but I need a refresher.
Jon Krohn: Yeah, of course. So BERT is the most well known of these transformer approaches. What they allow us to do is they allow a natural language model to scan over lengthy stretches of text and identify the most relevant words to some outcome that you’re trying to predict. So it’s the most sophisticated way that we have today of representing the most important parts of a document, especially over long stretches of the document. And if you want to see a transformer in action, there’s a really fun point-and-click user interface called Talk to Transformer. I think it’s talktotransformer.com, and you can Google “talk to transformer.” This uses what some people think is the most powerful kind of transformer today, called GPT-2, that was created by the people at OpenAI, a charity funded by Elon Musk and some other people.
Jon Krohn: And this GPT-2 algorithm, you can use it to generate text for you. So you can start typing something, and this can be a very short sentence. You could type “A recipe for French onion soup is…” and then have GPT-2, at talktotransformer.com, complete your thought for you. And it’s really amazing. You can have it create some really fun things. You could type “Joe Biden will beat Donald Trump in the next election because,” and then it will come up with reasons. And every single time it comes up with something new.
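The ability Jon describes, scanning a stretch of text and weighting the words most relevant to each position, comes from the attention mechanism inside transformers like BERT and GPT-2. Here is a minimal numpy sketch of scaled dot-product self-attention; the token vectors are made up for illustration, not any real model’s weights.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each query attends over every key: the positions most relevant
    to the query get the largest weights, however far apart they are."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # shape: (queries, keys)
    return weights @ V, weights

# Three token representations (toy 4-dimensional vectors).
# Tokens 0 and 2 are similar; token 1 is unrelated.
tokens = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.1, 0.0, 0.0],
])

# Self-attention: the sequence attends over itself.
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
# Each row of `weights` sums to 1, and token 0 puts more weight on
# itself and the similar token 2 than on the unrelated token 1.
```

Real transformers stack many of these attention layers (with learned projection matrices for Q, K and V), which is a big part of why they are so computationally expensive.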
Kirill Eremenko: This is amazing. I’m on there right now. So this is what I typed in: “I’m on a podcast with Jon Krohn,” and this is what it generated: “I’m on a podcast with Jon Krohn and Chris Sims. This is the podcast I listen to when I’m taking a cold shower. I do find some interesting information in it. One article I like is…” and so on. Wow. It’s so good. It’s very legible; it kind of even makes sense.
Jon Krohn: Yeah, it’s funny. It does a really good job, it’s not perfect. We still have a long way to go to make an algorithm that can converse with you in a way that is really compelling. But you can see if you play with talk to transformer a few times, you can get this sense of how it’s able to remember say that you typed in my name, Jon Krohn and it will remember that over paragraphs even, and it can continue to talk about me in a way that makes some sense. Sometimes it isn’t perfect, but it’s the best approach we have today.
Jon Krohn: So anyways, BERT, as you mentioned earlier, that’s another transformer. It was earlier than GPT-2 and a bit more widely known, but these kinds of transformer technologies are very computationally expensive. So there’s a trade-off: you’re getting these really great results at a high computational cost. And there are lots of research groups working on making these transformer technologies less computationally expensive. A very famous transformer that’s doing this is called DistilBERT; it distills down the most important parts of BERT. But even then, these models are still very computationally expensive.
Jon Krohn: So for our purposes, to give the example again of that candidate-to-job matching (we do other kinds of human resources models, but that is a very common one, so I’ll return to it here): when we’re doing a search over the natural language of a client’s database that has millions of profiles in it, it’s much more important to us to be able to give them results in a sub-second timeframe than to get slightly higher accuracy. The kinds of approaches that we’re using today still involve deep neural networks, but they involve combinations of neural network layers that are pretty well known: convolutional layers, long short-term memory units, and also natural language preprocessing approaches like creating word-vector spaces that enable us to have this kind of fuzziness of meaning around language, like I mentioned earlier on.
Jon Krohn: By using these kinds of approaches we’re able to surface, very quickly, results that I think are qualitatively the same as what we’d get from these much more computationally expensive transformers. So this is an interesting thing: you asked something earlier that I haven’t really answered. You asked what my role as chief data scientist is, and a big part of being the chief data scientist is weighing these kinds of considerations of accuracy versus computational complexity. What is the user’s experience going to be like with a model like the one we’re building?
Jon Krohn: In academia, when you see machine vision competitions, natural language processing competitions, or Kaggle competitions, the purpose is almost always to build the most accurate algorithm. And so you come up with these complex ensembles of various different approaches, gradient boosted trees, deep learning models. But in practice, when you’re building a product for clients, it’s often actually much more important to be efficient, so they’re not waiting around for results and you can do things in real time. Getting that efficiency, and engineering things in a way that you get sub-second results, can often be more important. And it’s that kind of thing that being a chief data scientist is about, at least in my case.
Kirill Eremenko: So you’re always kind of reining in the super ambitious and driven and passionate data scientists who are like, all right, let’s do BERT or let’s do GPT-2 and so on. And then you’re like, okay, what does the client actually need? Where’s our accuracy versus computational complexity? And managing all those things to get a good result, but at the same time, meet the expectations of the client.
Jon Krohn: Yeah, exactly.
Kirill Eremenko: Very cool. And one more thing that I would add to that equation, and I’d be curious to see how you guys go about it is so you mentioned accuracy, computational complexity, but there’s also a new concept, well relatively new concept that’s becoming more and more important, which is explainability, right? So you can have a very accurate model and maybe even computationally affordable, but it might not be explainable enough. And some companies will refuse to use it because they’re afraid of maybe bias or racism that is inherent in the model and they just don’t know how to explain it to their end user at the end of the day. How do you guys go about explainability?
Jon Krohn: Yeah, in this case it’s hugely important. There are some companies out there, some competitors of ours, that build models solving a similar kind of problem to ours. They use very simple models where you can explain much more, so they do things primarily around keyword counting. They can say, look, these are the keywords that were used in your job description, and look at how many of them correspond. That’s literally what our competitors do, the biggest names in our field. I won’t mention them by name, but the biggest names that build these same kinds of models, that’s what they do. They have big lists of keywords.
Jon Krohn: And so if they see the words Python and Java, then they will say, okay, these are software-developer words, and they maintain these lists. So they search over job descriptions and they search over candidates for matching keywords. And the lists are nested, so you can say, because there were a bunch of matching words, Java, Python, C++, this person is a software developer. And so they can show that very specifically.
Kirill Eremenko: So you can basically game these algorithms, right? And people who are more elaborate, using synonyms and stuff like that, or a bit more modest, are just going to be passed by.
Jon Krohn: Absolutely. That’s absolutely it. We see this all the time: tons of people, in order to get past these kinds of automated systems, and you’ve probably seen this, put laundry lists of skills inside each of their work experiences. In every work experience, they’ll have a few bullet points designed for a human to read, and then after that, they’ll put something like “skills used:” and then just put like 20 skills. And that’s because these kinds of algorithms that are prominent in the human resources industry do exactly that, they do keyword counting. And exactly like you said, people can game the system in that way.
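The keyword-counting behavior Jon describes, and why a skills laundry list games it, can be shown with a tiny hypothetical scorer; this is not any vendor’s actual system, just the general idea.

```python
# A naive keyword-counting matcher of the kind Jon describes: it scores a
# resume purely by how many of the job's keywords appear in it.
JOB_KEYWORDS = {"python", "java", "c++", "sql", "aws"}

def keyword_score(resume_text):
    words = set(resume_text.lower().split())
    return len(JOB_KEYWORDS & words)

# A strong candidate who writes prose without stuffing keywords...
modest = "Led a small team building data pipelines in Python"

# ...versus a laundry list of skills appended purely to game the filter.
stuffed = "Intern. Skills used: python java c++ sql aws"

# The keyword counter rewards the laundry list, not the stronger resume.
```

A model trained on recruiter decisions, as Jon describes next, can instead learn that the laundry-list pattern is a negative signal.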
Jon Krohn: So our system is completely different where it’s using word vectors to represent the language. And so things like you said, synonyms can be accounted for, people being creative with language that’s absolutely fine. And in fact, because our algorithm is trained not on how many keywords match, but on human decisions, these hundreds of millions of decisions of what candidate is the right fit for the right role. Our algorithm actually has learned to penalize people who do that laundry list of skills because human recruiters don’t like to see that.
Jon Krohn: So the flip side, as you mentioned, is that unfortunately, with this kind of approach we’ve taken, explainability is limited. I don’t like the word black box, because you can go in and learn what all of the weights in your model do, but the interactions are so complex that it can’t be done in any easy way. So the approach that we’ve taken is to say, well, since it’s not going to be straightforward to explain what’s happening in our model, we need to be absolutely sure that it is unbiased. And so we, for example, have created a dataset that consists of hundreds of cases of women and men who are perfectly suited to the same job.
Jon Krohn: And our algorithm comes out with exactly the same score, to the decimal point, for those people. And it’s related to our modeling approach as well as the way that we preprocess our data. Some of these things are obvious, like removing personally identifiable information, pronouns, all that kind of stuff. So there are things that you can do; those recommendations I just gave, anybody can implement. But we’ve patched it all together into a suite of preprocessing and modeling tricks that eliminate bias, and we’ve actually applied for a patent.
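The paired-resume parity check Jon outlines can be expressed as a simple test harness. The preprocessing and the scoring model below are placeholder stand-ins (untapt’s actual pipeline is not public); the point is the shape of the check itself.

```python
import re

def strip_pii(resume):
    """Illustrative preprocessing: drop gendered pronouns and titles
    before the text ever reaches the model."""
    return re.sub(r"\b(he|she|his|her|mr|ms|mrs)\b", "", resume.lower())

def score(model, resume):
    return model(strip_pii(resume))

def check_gender_parity(model, paired_resumes, tolerance=0.0):
    """Each pair is two resumes identical except for gendered language.
    A debiased pipeline must score both members identically."""
    for resume_f, resume_m in paired_resumes:
        if abs(score(model, resume_f) - score(model, resume_m)) > tolerance:
            return False
    return True

# Placeholder "model": counts skill words in the preprocessed text.
toy_model = lambda text: text.count("python") + text.count("sql")

pairs = [
    ("She has used Python and SQL for five years.",
     "He has used Python and SQL for five years."),
]
# With the pronouns stripped, both members of each pair score the same.
```

Run against hundreds of such pairs, as Jon describes, a failing check would have pointed toward the model-adjustment literature he mentions next.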
Kirill Eremenko: Oh, wow.
Jon Krohn: Yeah.
Kirill Eremenko: That’s really cool. I was about to say, could you tell us some more, but since you don’t have a patent yet… Gotcha. Okay, that’s really cool. I liked that idea. I think a lot of people can learn from that, especially data scientists or chief data scientists at companies: these techniques are out there. And by the way, do you know Ben Taylor?
Jon Krohn: Yeah. Ben Taylor was the last guest on my podcast.
Kirill Eremenko: Oh, nice. Okay. Well, Ben Taylor definitely knows you, because he posted a question for you. And we were just talking to Ben Taylor maybe a year ago or so about specific techniques to remove bias. Like you can normalize somehow. For instance, in his case, when he’s working with photos and videos of humans, you can’t really desensitize it completely; sometimes it’s just impossible to make everything absolutely anonymous. And so there are techniques on how to adjust your model, your deep neural network, so it eliminates the bias.
Kirill Eremenko: But I love your approach because it’s very factual and is applied at the end. Like, okay, you have a model, great. Now you have a test dataset that you can run through your own model, and if the model doesn’t have bias, then the results shouldn’t differ whether it’s a male or female, a person of any gender, any race, any background; they should get absolutely the same results. So I think that’s a very simple, but at the same time very powerful technique that a lot of people overlook, and that can help a whole organization avoid bad publicity or even a lawsuit later on for not having checked that their model is free of bias.
Jon Krohn: Exactly. You couldn’t have said it better, Kirill. I was actually expecting to need those kinds of techniques that you’re describing. What I was expecting was that we would do this test and we would find that there was some bias, and so I’d been reading the literature on these kinds of techniques for adjusting models. But then once we got these results, I was like, oh, well, I guess we don’t need to do that. Which I think is a good outcome, because when you apply those kinds of techniques, you’re adjusting your model and your cost function in a way that might have unanticipated side effects that you’re not able to control. And so this was an ideal, fortunate situation, which I didn’t expect. And yeah, it’s funny, so Ben Taylor wrote something on LinkedIn and I haven’t seen it yet?
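The counterfactual bias check described above can be sketched in a few lines of Python. This is a hypothetical toy, not untapt’s patented approach: the `score_candidate` scorer, the skills vocabulary, and the demographic term list are all stand-ins for illustration.

```python
# Bias check sketch: score the same resume with demographic markers
# swapped, and verify the model's output is identical to the decimal.

DEMOGRAPHIC_TERMS = {"he", "she", "his", "her", "male", "female"}

def preprocess(text: str) -> list[str]:
    """One debiasing step: strip demographic markers before scoring."""
    return [t for t in text.lower().split() if t not in DEMOGRAPHIC_TERMS]

def score_candidate(text: str) -> float:
    """Toy scorer: fraction of tokens matching a skills vocabulary."""
    skills = {"python", "tensorflow", "statistics", "sql"}
    tokens = preprocess(text)
    return sum(t in skills for t in tokens) / len(tokens)

resume_a = "He used Python and TensorFlow for statistics work"
resume_b = "She used Python and TensorFlow for statistics work"

# If preprocessing removes the demographic signal, scores match exactly.
assert score_candidate(resume_a) == score_candidate(resume_b)
```

In a real pipeline the same idea applies: hold the substantive content of a test set fixed, vary only the protected attributes, and assert the model’s scores do not change.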
Kirill Eremenko: Oh yeah. Oh, so you probably haven’t seen, so I posted a LinkedIn announcement 24 hours ago that you’re coming on the show and I asked for people to submit questions and we got quite a few questions come in. And one of them was from Ben Taylor.
Jon Krohn: Oh, that’s funny. Yeah. I’ll check that out right after the show. I was so busy preparing to be on the program Kirill, so focused. Well, we can talk about focus and that kind of stuff later on.
Kirill Eremenko: Yeah, yeah, for sure. And we’ll get to those questions in a bit as well. Ben Taylor posted some nice ones for you. Let’s shift gears a little bit and talk about your contributions to the world of data science. So in addition to being the chief data scientist at untapt, you’re actually doing a lot, and it’s very impressive how you find the time. I guess your productivity techniques work very well. You published a book, congratulations. Your first book, a bestseller on Amazon in two categories, translated into six different languages. Outstanding. You’ve recorded probably dozens of hours of video content on deep learning. Your videos are available on YouTube and on other outlets, such as O’Reilly and so on. You’re really adding a lot of value to the lives of data scientists. Let’s start with the question: what motivated you to start creating educational content in the space of deep learning?
Jon Krohn: Oh, complete selfishness, Kirill. I’m not even kidding. I got started with teaching these techniques because I wanted to understand them better myself. Anytime that you’re going to teach something, you have to really dig deep into, okay, what assumptions am I making here? Where is my argument weak? What are people going to ask questions about? And so in large part I set off down this road because I wanted to understand deep learning, which initially was the thing that I wasn’t familiar with. So we talked earlier about my background, my PhD, that kind of stuff. I finished my PhD in 2012, and neural networks only started becoming popular after the AlexNet victory in 2012 in the ImageNet Large Scale Visual Recognition Challenge. That was kind of the turning point for deep learning.
Jon Krohn: And so a couple of years had gone by, and I’d had deep learning on my radar but I hadn’t been learning it. So I thought, you know what, I’ve got to force myself to learn it. And so I created something in New York, which we have been running out of untapt’s offices for years: a deep learning study group. I stood up at a meetup and said, if anyone’s interested in learning about deep learning, I was thinking of putting a group together; we can email around, figure out what we want to study, and then go over it together. And so that was the starting point, but then I loved doing it. I love teaching. I’ve always enjoyed teaching and lecturing, and it’s been a terrific opportunity.
Jon Krohn: It was in 2016 that I created that group. And a lot of meetups in New York at that time hadn’t had any speakers on deep learning, but their audiences were very interested in the topic. And so I could come to a meetup without anybody knowing who I was, and the topic alone drove huge audiences, where they had overflow rooms with a screen and a video feed, and people lining the walls. And this happened at several meetups over several months, where they’d say, you know, you’re our first deep learning speaker. And at one of those events, there was an acquisitions editor from Pearson, the publishing company. And so, with her, I ended up making videos and writing Deep Learning Illustrated.
Jon Krohn: And now I’m hooked. I absolutely love it. I still pick topics to teach on so that there’s a mix: some of it is stuff that I already know really well and think is important for data scientists to know; other stuff is stuff that I don’t know and really want to learn, so it allows me the opportunity to shore up what I perceive as a weakness. And the ideal is when the content ends up being both, something that I need to know and don’t know yet, and also something that’s great for my audience to know. And so it’s been this journey, but I’m really glad to be here.
Kirill Eremenko: Fantastic, wow. That’s really cool. And for our listeners, I wanted to make a quick surprise announcement. We just spoke with Jon before the podcast, and Jon agreed to join us as a speaker at our very first virtual event, DataScienceGO. That’s going to be happening sometime at the end of June, so look out for announcements at datasciencego.com. Jon’s going to be presenting something special on deep learning and natural language processing. How are you feeling about that, Jon?
Jon Krohn: Yeah. Well, first of all, I’m honored. The SuperDataScience podcast … we were talking about this earlier, before we started recording the show, this podcast, as well as the associated videos, the brand that SuperDataScience has, I’ve been aware of you guys for so long and so I was honored to be asked to be on the show, and then now I’m further honored to have the opportunity to speak at your first online conference. That’s exciting. Yeah, I’m feeling good about it. I’m looking forward to virtually meeting a bunch of new people. As we talked about also earlier on the show, speaking at conferences is one of the most rewarding experiences that I have in my life today. It’s so wonderful meeting people from all over the world and being able to help people understand things.
Jon Krohn: I have a style of teaching, which people seem to really like, which to me was intuitive. A lot of illustrations, a lot of color. I put a lot of personality into my talks and I make sure that I have as few assumptions as I can and lots of hands on demos, and I don’t know. So yeah, it’s great that it’s been able to resonate. I love getting that kind of feedback, and every time I get to talk and meet people, it’s this feedback cycle that constantly allows me to understand the material better, develop better materials, understand what people want to see, and I don’t know, it’s just really good fun. I’m looking forward to it.
Kirill Eremenko: Fantastic. Thank you. And speaking of illustrations, let’s talk about your book. I just ordered my copy today, so I haven’t read it yet, but I’m really looking forward to it. It sounds like a lot of fun. So as far as I understand, and correct me if I’m wrong, you’re describing and explaining deep learning, there is code so people can follow along, and there are also GitHub repositories. Funnily enough, I scrolled down to the bottom and you’ve got this super cool image of a myopic trilobite.
Jon Krohn: Yeah, exactly.
Kirill Eremenko: Loved it. But basically, as I understand it, your book has descriptions of deep learning models and how they work, but in a very visual way. Tell us a bit more about that. This teaching style which you mentioned, what is it all about? Using visuals to describe how a deep learning model works, how do you do that?
Jon Krohn: Yeah, so a lot of things that happen in mathematics and in machine learning can be represented visually. I do have equations in the book, you can’t escape that, but I try to, wherever possible, complement equations with visuals. So drawings of matrices, or an intuition of the way that something is happening. One of my favorite analogies that I’ve ever come up with, and that is used a fair bit in Deep Learning Illustrated, is to explain stochastic gradient descent. So that trilobite that you saw, those trilobites are throughout my book. The illustrator, Aglaé Bassens, did a wonderful job creating this trilobite as a mascot, and then we use that mascot in lots of different ways. So when we’re explaining what deep reinforcement learning is and how you have an algorithm play against itself, we could have two trilobites sitting on opposite sides of a go board, playing go, the board game, against each other.
Jon Krohn: And she did some brilliant illustrations of stochastic gradient descent, where we have this analogy of a blind trilobite trying to find his way home. He knows that he lives at the bottom of a valley, and he’s standing on a hillside, and all he can do is use his cane to investigate the directions around him –
Kirill Eremenko: The gradient.
Jon Krohn: The gradient, exactly, the slope around him, and wherever it’s lowest, he takes a step in that direction and gradually following that approach, he makes his way home to the bottom of the valley.
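The trilobite analogy maps directly onto the update rule of gradient descent. Here is a minimal sketch on a one-dimensional "valley" whose bottom sits at x = 3; the function and learning rate are chosen purely for illustration.

```python
# Gradient descent on f(x) = (x - 3)**2, a valley with its bottom at x = 3.
# At each step the "trilobite" feels the slope (the gradient) and steps
# in the downhill direction.

def gradient(x: float) -> float:
    """Derivative of f(x) = (x - 3)**2."""
    return 2 * (x - 3)

x = 10.0             # start somewhere on the hillside
learning_rate = 0.1  # size of each step the trilobite takes

for _ in range(100):
    x -= learning_rate * gradient(x)  # step downhill

print(round(x, 4))  # converges to 3.0, the bottom of the valley
```

In a real neural network the same update happens simultaneously across millions of parameters, with the gradient of the cost function playing the role of the cane.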
Kirill Eremenko: I love it. I love it. Definitely going to remember that one better than the balls rolling down the valley and hitting the bottom. Very cool. Interesting approach. So for anybody interested, the book is called Deep Learning Illustrated, and it’s a book with a white cover with a very weird drawing at the bottom. That’s the trilobite we were talking about.
Jon Krohn: Yes.
Kirill Eremenko: Yeah. Okay. So that’s really cool, and then you got another project you’re working on, which is coming out very soon, end of May I believe, Machine Learning Foundations. Tell us a bit about that. So why are you diving into the foundations of machine learning?
Jon Krohn: Yeah, so I’ve had the honor over the past few years of developing this deep learning curriculum, which has turned into the book and, as I mentioned, dozens of hours of videos. I teach that curriculum in person, at conferences, in [inaudible 00:46:37] at live trainings. I guest lecture at Columbia University, New York University, and I teach my curriculum at the New York City Data Science Academy. So I’ve taught this deep learning curriculum to dozens of different audiences over the years, and the key thing that I realized is that people are by and large using abstractions of the algorithms that they’re deploying. Whether you’re using Scikit-learn or TensorFlow or PyTorch, you might have a minimal understanding of what’s going on under the hood in your algorithm. And so, to help people understand the foundations of machine learning, I’m covering linear algebra, partial derivative calculus, algorithms, data structures, and probability distributions, so that when you’re thinking about your machine learning algorithms, you can understand better what’s happening, how they work, where the dangers might lurk, and where the opportunities might be for you to improve your approach.
Kirill Eremenko: Wow. Wow. That’s really cool. That’s a big challenge you’re taking on. All those topics you mentioned are going to require quite a bit of explaining. How are you feeling about it? Are you ready for the task?
Jon Krohn: Honestly, Kirill, I said to you earlier that I’m never nervous, but I am actually really nervous about this. It’s just because of how much … it’s like writing a book, and you know what this is like.
Kirill Eremenko: Yeah.
Jon Krohn: It’s extremely time consuming to make sure that you’re doing it properly, and the kind of distress that I know I’m going to be causing myself and my personal life by deciding to do this, I’m nervous about that. But I know that it’s going to work out one way or another. You set the deadline and eventually you make it happen somehow, and I’m really excited about this. So we start May 28th: I’ll be offering the first linear algebra class on O’Reilly Safari. It’s a three and a half hour live training, and then a week later we do linear algebra two. A week after that there’s a three and a half hour calculus class, and then another calculus class, calculus two, the week after that.
Jon Krohn: So gradually I’m setting these deadlines, roughly once every couple of weeks from the end of May through the beginning of September, with these hard deadlines of doing these online teachings. But in parallel, I will also be taking those online teachings and converting them into video tutorials. There’s going to be some mix, but a large portion, maybe even the majority, possibly even a hundred percent of them, I haven’t figured this out, I will be putting out for free on YouTube so that anybody can enjoy these materials. And we’ll see, there are other possible avenues for this content as well, which even you and I have been talking about, but we’ll figure those things out. One way or another, it’s going to be these live online trainings and videos, some combination of which will be free, maybe all of them, and eventually that will also be a book.
Kirill Eremenko: Fantastic. Thank you, and that’s very, very noble and admirable that you’re putting out a lot of this content for free on YouTube. I’m sure a lot of people are going to be excited about that. So everybody look out for those coming end of May. All right. I wanted to use the remainder of our time to chat about some of the questions the audience posted on LinkedIn. Are you ready for some rapid fire questions?
Jon Krohn: Absolutely.
Kirill Eremenko: Okay. Here we go. So Vasco asks, why is R your go to tool for data exploration? And if you’re fresh out of college today, which language would you focus on?
Jon Krohn: That is interesting. I wonder where Vasco got that from.
Kirill Eremenko: [inaudible 00:50:59] Python?
Jon Krohn: Yeah, today it is, but I would say it’s likely that I said that at some point. That was definitely something that used to be true.
Kirill Eremenko: That makes it so much more interesting. Why did it change?
Jon Krohn: So for a long time, R was my bread and butter, and so that just made it easier for me. But actually, even today, I don’t use R very much. When I’m doing exploration today, I spend most of my time in Jupyter Notebooks in Python. Now, I understand you could be using R in Jupyter Notebooks, but I seldom do. And I can still see that there’s a place for R. So for example, the ggplot2 library is a wonderful library, and there is nothing that comes close in Python. Probably the best developed plotting library in Python is Seaborn, and Seaborn has got nothing on ggplot.
Jon Krohn: So maybe I’ve actually convinced myself back, that I should be spending more time with it. I think it’s a great idea to learn more and more technologies, because then you have more tools in your belt to experiment with, and so I might see something that I want to investigate that would be quicker to do in R. Though most of the time today, it is Python, and that’s largely because, and this is going to sound really derogatory to the R community, which I was in for a long time as my primary language, but R isn’t a real programming language. It’s not for production systems.
Kirill Eremenko: You did not just say that. Oh, gosh.
Jon Krohn: It’s a statistical programming language. It’s not for production deployments.
Kirill Eremenko: Okay. Okay. I completely understand where you’re coming from with this. I have to say, though, that I was speaking to Hadley Wickham on the show a few months ago –
Jon Krohn: Oh, geesh. Well he’s obviously going to disagree.
Kirill Eremenko: They’re working very heavily on making it a production-oriented language that people can use, but yeah, everybody has their own opinion. I think I agree with you: the more you know, the better. I would say the trap that people can fall into is that you learn one language, then a second one and a third one, and then, because humans are generally lazy, and laziness is the driver of progress, we tend to narrow our focus. You find one project you work on in a certain language, let’s say Python, and then you stick to it and don’t focus on R as much. All I can say on this topic is I would probably set up for myself at least one week per month or something like that where I can only use R, whether I like it or not, just to motivate myself to stay on top of things, because with time you obviously forget these programming languages.
Jon Krohn: Yeah. I think that’s a great idea, and it’s not even just going back and forcing yourself to use an old language. It’s also challenging yourself to learn new things. So I’ve never used Julia.
Kirill Eremenko: Yeah. I heard Julia is great.
Jon Krohn: Yeah, same, and you hear that a lot. And so there’s an endless list of things to learn… I mean, this is a good problem to have.
Kirill Eremenko: Yeah.
Jon Krohn: That’s one of the wonderful things about the field of data science is you can never be bored. There is never any reason that you should not have a full day as a data scientist, because there is an infinite amount of new libraries, new approaches, new theory that you could be learning and deploying for your problems.
Kirill Eremenko: Absolutely. Absolutely. All right, I think we’ve covered that one. Question number two. This one’s from Adrian. Actually, funnily enough, I was talking to Adrian a few hours ago. He’s the recruiter I mentioned who’s from Ireland, and he’s also been on the podcast before. So his question is: would love to know what you think about the topic of predictive decisioning on candidate behavior, yes or no to roles, based on email communication. So do you have any experience with, rather than just analyzing the resume, actually understanding whether a person is going to be a good fit for the role based on what they respond in their emails and how that exchange goes back and forth?
Jon Krohn: I do not have experience building models that do that kind of thing. I am open to being wrong, and I’m sure that there are some applications that would be perfectly valid. So for example, if you were hiring for an executive assistant and somebody had a lot of spelling mistakes or unclear sentences, you could build models detecting spelling, of course. This is a trivial thing; we’ve been doing it for decades. And you could even detect the quality of their writing. You could assign a reading-level score or an ease-of-reading score, and these kinds of things, for some roles, I would say a limited range of roles, would be a meaningful predictor.
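The ease-of-reading score mentioned here can be approximated with the classic Flesch reading-ease formula. This is a rough sketch: the syllable counter below is a crude vowel-group heuristic, not a production-quality one, and real screening tools would be far more elaborate.

```python
# Flesch reading ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
# Higher scores mean easier text; roughly 60-70 is "plain English".
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

print(flesch_reading_ease("The cat sat on the mat."))  # very high, simple text
```

Even a simple metric like this illustrates Jon’s point: it is narrowly defined, transparent, and tied to a specific, job-relevant skill, which is exactly what makes it defensible compared with broad "fit" predictors.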
Jon Krohn: If you were trying to predict whether somebody was going to be a good software developer or a data scientist from their email communications, I think things would start to break down. There are lots of people in the human resources sector, companies with enormous valuations, which I won’t mention by name, that have created tools that supposedly identify the suitability of a candidate for roles, and I think it’s absolute poppycock. There are tons of tools out there, games that you can play, or tools that analyze things you’ve written or the way you speak, that try to predict how well you’re going to fit in a role, and I think there are two ways this can go. One of those ways is having tests that are so conservative and so unbiased that they don’t actually have any signal in them, and then it’s not going to be very useful. The other way you can go, you’re asking for a lawsuit.
Kirill Eremenko: Yeah.
Jon Krohn: Because I think that those kinds of techniques … there might be some middle road and maybe some people are hitting upon it in exactly the right way, but I think that there’s a lot of snake oil out there. I think that there might even be some stuff that isn’t snake oil out there, but it’s overwhelmed by snake oil at this time, and so I’m currently skeptical of a lot of these kinds of approaches.
Kirill Eremenko: Well, what do you think of predicting candidate fit for a role based on video interviews that are analyzed by AI?
Jon Krohn: Yeah. So we talked about Ben Taylor [inaudible 00:00:58:44]. He was at HireVue as chief data scientist for many years, and HireVue does that kind of thing. I haven’t personally used technology like that. There might be some technology out there, like HireVue’s, that does work well, but there have been academic investigations into a lot of these kinds of technologies, and many of them are not highly predictive of the signal that you’re looking for. So yeah.
Jon Krohn: So I guess it’s good to be skeptical. If you’re considering working with a particular vendor, it’s good to check what literature there is, not just what they published themselves, but what other people have been able to validate about their techniques. I’m sure that there is stuff out there that works, particularly the more narrowly you define the problem.
Jon Krohn: So take that example I gave earlier of the ease of reading of somebody’s writing, where you’re hiring for a role where the person is going to be doing written communication: you narrowly define the problem and you have a dataset that you can tie very directly to that outcome. Sure. In other kinds of situations, the more broadly you try to make the algorithm apply, the more edge cases there will be in which that algorithm breaks down.
Kirill Eremenko: Interesting. Very, very valuable insights. It also ties in with the fact that right now, what we have in the world, and what is most powerful, is narrow AI rather than general AI. So until we do have general AI, use that to your advantage. Make your AI as narrow as possible in the context of the problem you’re solving, and you’re going to have the best results.
Jon Krohn: Yeah, exactly. There’s a huge amount of opportunity still, but you’ve got to look out for these edge cases. And I think a lot of AI companies feel pressure to be widely applicable before their technology is ready, because their investors and their marketing teams want that. And so you can end up with these algorithms that are overstretched. But I don’t mean any specific company out there; for all I know, every single AI company doing this kind of stuff in this space, everything that we’ve talked about, their application may work perfectly, at least for a narrow range of applications. I don’t want to sound like I’m being uniformly negative, but-
Kirill Eremenko: But it’s fair. It’s fair to be skeptical, because if we’re open-arms to absolutely every new technology out there, we’ll end up in a whole new swamp where we don’t know what the truth is.
Jon Krohn: Yeah.
Kirill Eremenko: Okay. Speaking of Ben Taylor, he’s got a very cool question for you: ask Jon if we’re in a simulation. Do you believe we’re in a simulation?
Jon Krohn: Well, you should listen to the last episode of my podcast, the Artificial Neural Network News Network.
Kirill Eremenko: I love that name.
Jon Krohn: Yeah, I’m glad that you do. Sometimes-
Kirill Eremenko: What’s the short version of that name?
Jon Krohn: A4N.
Kirill Eremenko: A4N. Artificial Neural Network News Network. Crazy, man.
Jon Krohn: And we’ve only done two episodes so far; we just launched, but we’re available in all the places that you’d want to get a podcast: Apple Podcasts, Spotify, Google Podcasts. And then my intention was to publish everything on YouTube as well, with video, like the Joe Rogan Experience, where you can see that raw feed. Not trying to make it about the video, but it’s something extra that you have. But then coronavirus hit. Ben Taylor was actually supposed to be in person in New York for filming that second episode, and when coronavirus hit he couldn’t make it, but we still did audio. And I’m planning on doing that for as long as coronavirus is happening; I’ll do it as an audio-only podcast like today’s. But yeah, on that note, if you listen to that Episode Two with Ben Taylor, we actually definitively proved that we’re living in a simulation.
Kirill Eremenko: Love it.
Jon Krohn: Yep. You can find that out for yourself.
Kirill Eremenko: I will check it out. I’ll check it out. And yeah, man, as we discussed before, I’d love to join you for an episode as well. Sounds like a fun show. That’d be cool.
Jon Krohn: Absolutely. I’m stoked at the prospect of it, Kirill.
Kirill Eremenko: Awesome. Okay. And a bit of a more serious question from Ben: what do you think the next version of deep learning is going to be?
Jon Krohn: Oh, man. My first thought is that I don’t even know what that question means.
Kirill Eremenko: Well, think something like… Geoffrey Hinton, just over a year ago. What are they called? Capsule networks. Completely different. Maybe it can still be called the same space, deep learning or not, I don’t know enough about them, but something new. That type of thing.
Jon Krohn: It’s still deep learning. Capsule networks, definitely. I think maybe the question is, and I’m open to your interpretation, but I think maybe the question relates to what I mentioned earlier: AlexNet coming into the picture in 2012 and everybody taking notice of how powerful deep learning could be relative to other kinds of machine learning approaches for some kinds of problems, particularly where we’re just trying to maximize accuracy. It made a huge splash, unlike anything I’ve seen in my lifetime. So maybe the question is, what is going to make that next really big splash?
Kirill Eremenko: Good point. Yeah. I would agree. I would agree. We’ve seen recently capsule networks, BERT, GANs. What’s next? What do you think is coming next?
Jon Krohn: Yeah, so all of those things, BERT, capsule networks, GANs, are all still deep learning, but they might fit this idea of, okay, they’re going to be so world-changing that they’ll become a field of their own that rivals deep learning, or makes as big of a splash. The thing that all of those techniques you mentioned have in common, and that is usually a limiting factor we’ve already touched on in today’s show, is that whether we’re talking about transformer architectures like BERT, or GANs, or capsule networks, all three of those approaches are so computationally intensive that they are seldom used in practice. They are impractical in production systems. They’re about as practical as using R.
Kirill Eremenko: Oh I hope [inaudible 00:05:56]. I’m going to send Hadley this episode so he can have a look.
Jon Krohn: Yeah. I love R. Don’t get me wrong. And I really am a big fan of Hadley’s work too. Actually, Hadley, if you do end up listening to this: for years now, I have seen you at a distance at conferences and we’ve even smiled at each other, but I’ve been too nervous to talk to you. I was like, what do I say to Hadley [inaudible 01:06:19]?
Kirill Eremenko: Oh my God, so romantic. You’re such a romantic.
Jon Krohn: The first time was at the Joint Statistical Meetings in Chicago in 2016. Anyway, I think that maybe the answer lies in what we’ve been saying here, which is that the next deep learning might be some kind of learning approach, maybe even derived from deep learning, that requires much smaller dataset sizes and much less compute. We have some early indications of this kind of thing happening. Academics have been coming up with narrow applications of technology that can do what they call one-shot learning. A lot of these techniques borrow inspiration from the way that small children learn by imitation, where they can see somebody do something once and then imitate it.
Jon Krohn: And so particularly in the field of deep reinforcement learning, which I don’t have time to go into in much detail right now, but it’s a way of tackling a particular kind of problem, you can build algorithms that are able to imitate from a very small number of examples, or maybe even one example. Whereas most deep learning models, particularly the more sophisticated you want them to be, almost always require at least thousands of cases, and in the best models, millions of cases. So I think that’s going to be the next big thing, and it’s nowhere near mature yet.
Kirill Eremenko: Wow. That’s very cool. That’s very cool. Yeah. We have something also which is quite popular, I would say. Transfer learning, right? When you have a pre-trained neural network, you just slice off the final layer and then put in your type of problem there. How do you think that new application is going to be different to transfer learning?
Jon Krohn: Yeah. So transfer learning is related in a sense, because transfer learning allows us to take a model architecture that has been trained on a huge amount of data points. And so we don’t have to do all of that complex compute, maybe hundreds of thousands of dollars worth of cloud compute to create that initial model. We just spend hundreds of dollars, or maybe even tens of dollars, retraining that model to our smaller data set of maybe thousands or, in some cases, even hundreds of examples. And so that helps in one respect. And I think that there’s a huge amount that can be done there still in terms of being able to access models for transfer learning. So a lot of the popular deep-learning libraries today, like TensorFlow and PyTorch, they allow us to access some pre-trained machine-vision algorithms that are very powerful.
Jon Krohn: However, there’s still so much further to go. If you look at the documentation, and I know this for sure for TensorFlow, all of the machine-vision models they provide are trained on the same dataset: they’re all trained on the ImageNet dataset. We have so many people around the world tackling so many different kinds of problems, so much data being generated, and it’s interesting that it still isn’t easy to do transfer learning today for a broad range of applications. And that’s something that’ll change.
Jon Krohn: But even as that improves and transfer learning becomes easier and easier, and that is absolutely going to happen, you’re exactly right, it doesn’t help with our computational complexity problem, because we’ll still have these very deep architectures. And that is still a very different thing. Transfer learning is still just using deep learning, again, on a network that’s already trained. Whereas this other stuff, one-shot learning, can do stuff with a very small number of examples. It’ll be a completely new way of training your model.
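The transfer-learning recipe described above, take a pretrained network, slice off the final layer, and attach a head for your own problem, can be sketched with Keras. The five-class task here is hypothetical; MobileNetV2 is just one of the ImageNet-pretrained models Jon mentions that TensorFlow exposes.

```python
# Transfer learning sketch: reuse an ImageNet-pretrained convnet and
# retrain only a small new head on a (hypothetical) 5-class problem.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,    # slice off the ImageNet classification head
    weights="imagenet",   # reuse the expensively pretrained features
)
base.trainable = False    # freeze the deep layers; only the head trains

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # our new task head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(small_labeled_dataset, ...)  # thousands, not millions, of examples
```

Note that inference still runs the full deep architecture, which is Jon’s point: transfer learning cuts training cost dramatically but does not reduce the computational complexity of the model itself.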
Kirill Eremenko: Gotcha. Thank you. Thank you for breaking that down. That makes total sense. We’ve got a few more questions on LinkedIn, but unfortunately we’re running out of time, so we won’t go through them. I just wanted to ask you one final question. You’ve worked in the industry, in recruiting specifically, applying your knowledge there and in other applications of deep learning; you’ve taught people, you’ve written books. Out of all these interactions that you’ve had with people in the space, what would you say is your best recommendation for somebody aspiring to become really good at deep learning? What should they go and look for, or do, to become the best deep learning professional that they can possibly be?
Jon Krohn: Perfect. I’m so glad you asked, Kirill, and this gives me the opportunity to tie back to something we were discussing earlier. To be the best at anything, whether it’s as a data scientist or anything else you want to be: people have so much capacity to change and to do a huge amount of good for themselves and for the world, and I would say that a very small proportion are using even a small fraction of their potential. I think data scientists are an interesting audience, because it’s a fast-moving field and it’s pretty new. So probably a lot of listeners are already doing much better than average, but there’s a huge amount of capacity for us all to be making a massive positive impact in the world.
Jon Krohn: And it’s a cliché, but it starts with yourself. If I’m going to recommend one resource, it is Cal Newport’s book, Deep Work, which is filled with recommendations for how and why you should focus on one task at a time, with a clear idea of why you’re focused on that task. And by living in this way, by being deeply focused on work for even just two hours a day, although he gives lots of ways that it could be maybe five or six hours a day of deep work, you can produce a huge amount of novel ideas and share those with the world. That ties back to our exchange earlier about why I haven’t seen the LinkedIn post yet: I check all of my social media in a calendar appointment once a week.
Jon Krohn: So I’ll get to that shortly. A lot of it comes down to these blocking strategies: if you want to get better at running, you might leave your phone at home when you go out for your run. You don’t check your messages or take phone calls when you’re out trying to become a better runner. It’s applying that same philosophy to all aspects of your life: blocking off specific windows for particular things that are important to you, and getting them done one thing at a time.
Kirill Eremenko: Love it. Thank you so much. That is an amazing recommendation. I’d love to talk more about productivity. Maybe we can do it on your show. I’ve got a lot of ideas there, but Deep Work has been on my list of books that I need to get into. I’ve actually read, I think, the Blinkist version of it, the shortened version, just to get the main insights, but I definitely need to get into that book. Heard a lot about it.
Jon Krohn: I think that’s a great way of living, actually. I wish I did that more often, just reading a summary, because you can imagine how many more areas you could dabble in. I’ve been thinking that’s a good idea.
Kirill Eremenko: Oh, there we go. Exchange of ideas already. But on that note, Jon, thank you so much for coming on the show. I really appreciate your insights and your time; we had a fantastic chat. Before I let you go, where are the best places for our audience to find you, get in touch, or follow your career, podcast and so on?
Jon Krohn: Oh, wonderful. Thank you for asking, Kirill. Well, I’ve tried to make it as easy as possible to follow all of the various things that I’m up to, whether it’s my book, whether it’s podcasting, whether it’s new videos that are coming out, whether it’s lectures, like speaking at the SuperDataScience conference or online in other venues. One of the blocks of my time every week is making sure that all of that stuff is available and easy to access on my website. So just my name, JonKrohn.com.
Kirill Eremenko: Let’s spell that out, if you don’t mind. You have an unusual spelling of that.
Jon Krohn: Yeah. That’s fair. So Jon, J-O-N, and then Krohn is K-R-O-H-N. So JonKrohn.com, and there’s only one thing that you need to do, which is sign up for the email newsletter on that homepage. Whenever I make something new, I will push a notification: “Hey, look, there are free new videos out, check them out.” And then I’d love to stay in touch. From there, you can find me on LinkedIn. I’d love to connect with you on LinkedIn and hear what kinds of content you’d love me to be creating, or just any thoughts you have about data science. Yeah, that’d be great.
Kirill Eremenko: Fantastic. Very cool. So check it out, JonKrohn.com, sign up there and follow or connect with Jon on LinkedIn. Well, Jon, that brings us to the end of the show. I had a fantastic time and I’m looking forward to seeing you at our virtual event, DataScienceGO, at the end of June, maybe the start of July. We’re still planning out the dates. But yeah, see you there, and thanks so much for coming on the show today.
Jon Krohn: Perfect, Kirill, it’s been a ton of fun. I can’t wait until the next time.
Kirill Eremenko: So there you have it. Everybody, I hope you enjoyed this podcast as much as I did and got lots of valuable insights from our conversation with Jon. I know for sure that I got lots of takeaways, and it’s actually really hard to pick a favorite. I’ve been sitting here scratching my head, looking at my notes, deciding what my favorite part was. There’s so much valuable information.
Kirill Eremenko: If I had to pick just one, I’d probably pick that very simple but powerful concept of how they check for bias in models at untapt: simply by having a verification dataset where you have, for instance, a mix of genders or a mix of backgrounds, and seeing whether your model outputs biased results, whether it prefers a certain gender or a certain race. By having such a simple fail-safe mechanism, companies can save themselves a lot of headache and verify that their models are performing fairly. It’s something I hadn’t thought of before, but it really is a very powerful way of doing it. So that was our episode with Jon Krohn.
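[Editor's note: the fail-safe Kirill describes can be sketched in a few lines of plain Python. This is only an illustrative reconstruction of the idea, not untapt’s actual code; the function names, group labels, scoring functions, and threshold are all assumptions.]

```python
def mean_score_by_group(score_fn, verification_set):
    """Average model score for each demographic group in the verification set."""
    totals, counts = {}, {}
    for candidate in verification_set:
        g = candidate["group"]
        totals[g] = totals.get(g, 0.0) + score_fn(candidate)
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

def passes_bias_check(score_fn, verification_set, max_gap=0.05):
    """Fail-safe: flag the model if any two groups' average scores
    differ by more than max_gap (threshold is an illustrative choice)."""
    means = mean_score_by_group(score_fn, verification_set)
    return max(means.values()) - min(means.values()) <= max_gap

# Otherwise-identical candidates, differing only in their group label:
verification_set = [
    {"group": "A", "years_experience": 5},
    {"group": "B", "years_experience": 5},
]

fair_model = lambda c: c["years_experience"] / 10          # ignores group
biased_model = lambda c: 0.9 if c["group"] == "A" else 0.4  # prefers group A

print(passes_bias_check(fair_model, verification_set))    # True
print(passes_bias_check(biased_model, verification_set))  # False
```

Because the candidates are identical apart from the group label, any score gap between groups can only come from the model itself, which is exactly what makes this kind of verification set such a simple and effective fail-safe.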
Kirill Eremenko: If you’d like to learn more about Jon or follow any of his work, you can definitely find him on LinkedIn, and also on his website, JonKrohn.com. There you can find all the resources and materials that Jon produces and publishes. And of course, make sure to have a look at our very first virtual conference; you can sign up and get your ticket today at www.datasciencego.com. That’s datasciencego.com. Jon is going to be giving a presentation there as well, so you might want to be part of that. And as always, you can get the show notes and all of the materials mentioned in this episode at www.superdatascience.com/365, that’s www.superdatascience.com/365, where you’ll find the transcript for this episode, any materials, links, and URLs to Jon’s LinkedIn and so on. So make sure to check that out as well.
Kirill Eremenko: On that note, we’re going to wrap up. Thank you so much for your time today, and if you enjoyed this episode and our chat with Jon, then we’ll see you at DataScienceGO Virtual. And until next time, happy analyzing.