SDS 752: AI is Disadvantaging Job Applicants, But You Can Fight Back

Podcast Guest: Hilke Schellmann

January 26, 2024

Jon Krohn interviews Hilke Schellmann about the ethics of recruitment algorithms, how recruiter technologies currently sort candidates, and what can be improved about AI used in recruiting.

About Hilke Schellmann

Hilke Schellmann is an Emmy Award-winning reporter holding artificial intelligence (AI) accountable. In her book, The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired, And Why We Need To Fight Back (Hachette), she investigates the rise of AI in the world of work. Drawing on exclusive information from whistleblowers, internal documents and real-world tests, Schellmann discovers that many of the algorithms making high-stakes decisions are biased, racist, and do more harm than good.

Overview

Hilke Schellmann wants recruitment companies and firms that use recruiting AI to understand the nuances of the technology's capabilities and assumptions. Hilke uses HireVue, which she also covers in her book, The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired, And Why We Need To Fight Back, as a case example of early complications in AI recruiting. With 30 million candidate interviews under its belt, HireVue has been a major player in helping to recruit new workers, even in such a saturated market.
Hilke says that, during her early research into AI recruiting in 2018, HireVue was running emotion recognition, which estimates emotional responses based on facial expressions. Hilke noticed the lack of scientific backing for inferring emotions, let alone job suitability, from facial expressions. Hilke also mentions Pymetrics, which uses what she terms "AI games" to sort job candidates. These "games" are often simple tasks that each test one desirable trait in a potential candidate. Hilke notes that traits exhibited in games, such as risk-taking, don't always cross over into the workplace, where the real-life stakes are different.
For Hilke, recruiter technologies are not currently suitable for fairly or effectively testing job candidates. She advocates for companies that use AI recruiting technologies to stay alert to potentially discriminatory approaches to judging facial expressions, accents, and dialects, and to ensure that marginalized communities are treated fairly and equitably.
Many of The Super Data Science Podcast's listeners are data science practitioners and machine learning engineers, and Jon asked Hilke how these listeners can take care not to build discriminatory algorithms. Listen to the episode to hear Hilke's thoughts on this, as well as what the future "right way" to hire might be and how AI can be used across a worker's tenure at a company.

Podcast Transcript

Jon Krohn: 00:03

This is episode number 752 with Hilke Schellmann, Assistant Professor at New York University. 
00:19
Welcome back to the Super Data Science podcast. Today, we’ve got a practical and important episode with the Emmy Award-winning reporter, Hilke Schellmann. Hilke’s book “The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired and Why We Need to Fight Back Now” was published by the prestigious international publisher Hachette this month. In the exceptionally clear and well-written book, Hilke draws on exclusive information from whistleblowers, internal documents and real-world tests to detail how many of the algorithms making high-stakes decisions are biased, racist, and do more harm than good. In addition to her new book, Hilke is also Assistant Professor of Journalism and AI at NYU. She previously worked in journalism roles at The Wall Street Journal, The New York Times, and Vice Media. She holds a master’s in Investigative Reporting from Columbia.
01:08
Today’s episode will be accessible and probably interesting to anyone. In it, Hilke details examples of specific HR technology firms that are employing misleading, Theranos-like tactics. She talks about how AI can be used ethically for hiring and throughout the employment lifecycle, and what you can do to fight back if you suspect you’ve been disadvantaged by an automated process. All right, let’s jump right into our conversation.
01:32
Hilke, welcome to the Super Data Science podcast. I’m so excited to have you here. This is going to be a great episode. Where are you calling in from today?
Hilke Schellmann: 01:39
I’m calling in from Brooklyn. 
Jon Krohn: 01:41
Nice. Yeah. 
Hilke Schellmann: 01:42
Thank you for having me. 
Jon Krohn: 01:43
My great pleasure. This is one of those things, when I have guests who are also in New York like I am, it is so much fun to actually meet in person and record. I feel like you really get to know somebody, but here we are doing the- 
Hilke Schellmann: 01:57
Next month.
Jon Krohn: 01:58
… as though [inaudible 00:01:59], yeah. And maybe we should be doing a follow-up, because this episode is going to be really illuminating for a lot of listeners. I’m sure we’re going to have great feedback, and maybe it won’t be long before we need to do this again, or have the pleasure of doing this again.
Hilke Schellmann: 02:11
Yeah. And I would love to have feedback. I’m a journalist, I love feedback. 
Jon Krohn: 02:16
Nice. Yeah. Well, listeners are not shy about commenting on what they think of episodes when they come out. So, get out there, listeners, and be sure to tag Hilke in those comments. We’d love to hear your feedback as always. So, Hilke, you are here because you had a book published this month by the prestigious Hachette publisher. So, published at least in the US. I’m not sure about global distribution.
Hilke Schellmann: 02:41
Yeah, yeah. The UK edition is coming out in February. There’ll be a China edition; I don’t know when that will be. And I’ve heard from people all around the world that they’re able to get it, at least through Kindle, or there’s an audiobook version, so even if it wasn’t available in their jurisdiction, they could listen to it. There’s all kinds of ways to-
Jon Krohn: 03:00
That’s right.
Hilke Schellmann: 03:00
… find this book. So, that’s the beauty of having a global publisher, I guess. 
Jon Krohn: 03:05
Yeah. And the digital ones, of course, it’s easier for them to get around the world quickly. So, the book is called “The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired and Why We Need to Fight Back Now.” And so, somebody I think tagged me on LinkedIn, if I remember correctly. So, somebody, it must’ve been a listener to the podcast. I tried to dig this up quickly, but it’s one of those things, LinkedIn is notoriously difficult to find things unless you can remember exactly what- 
Hilke Schellmann: 03:36
So, go back and see what happened when, yeah. 
Jon Krohn: 03:38
But at some point in the last couple of months, if my memory is correct, a Super Data Science listener tagged me on a post about your book coming out and said, “You should talk-” 
Hilke Schellmann: 03:49
Great. Thank you, Super Data Science listener.
Jon Krohn: 03:49
Yeah. Said you should talk to Hilke, and it’s because they know what I do for a living. And so, not only is this an interesting topic for us, we’ve had episodes like this in the past talking about ethics, bias in algorithms, and unfairness. Most recently, a big episode that we had on that was number 727, back in October; it came out on Halloween, with Joy Buolamwini, an amazing MIT researcher who also has-
Hilke Schellmann: 04:20
Yeah, she really is amazing. 
Jon Krohn: 04:22
… the popular Coded Bias documentary. And so, we had a great episode on related topics then. But this one, this listener, I’m pretty sure, pointed this out to me specifically on LinkedIn because this is what I am involved in for a living: algorithms for hiring in particular. So, you and I, Hilke, talked about this a bit before the episode kicked off, and I don’t want to just be talking about my company; I think parts of it will probably come up organically as we go on. But let’s start by talking about the problem here. So, you highlight in your book, even in the prologue, a particular company, HireVue, as being-
Hilke Schellmann: 05:09
Yeah, they’re one of the largest vendors. They do one-way video interviews. For folks who don’t know what that is, instead of having another person in a job interview, either on Zoom or in person, talking to you, you just get prerecorded questions on your phone or on your desktop, and you record yourself answering, most of the time probably looking into the green light, giving what I would call more of a video presentation, but I guess people call it a one-way video interview. And HireVue is, I would say, probably by far one of the largest vendors in the space. I think about a year and a half ago, they said they had just done their 30 millionth interview. So, lots of video interviews are going on.
Jon Krohn: 05:55
And it seems, if I’m remembering this correctly, I think that things like your book and people like Joy Buolamwini, I think has there been a bit of a change now in HireVue’s offering that it used to be- 
Hilke Schellmann: 06:10
Yes. So, what happened is I started reporting in earnest in 2018 on AI moving into HR tech. I went to some of these technology conferences and I was just blown away by how many hundreds of vendors and people there were, and I was like, “Whoa, there’s a real change here.” And there weren’t a whole lot of journalists covering it, and this idea that we are quantifying humans, I was like, “This is really interesting. Humans are very complex. Let’s see how we do this.” So, I wanted to know more. And at the time, HireVue was still doing emotion recognition. It wasn’t technically facial recognition, because it doesn’t check whether it’s Jon or Hilke. The software looks at my facial expressions and what emotions and characteristics of me can be gleaned from them. So, if I move my mouth, am I smiling, am I happy, those kinds of ideas.
07:08
And they also used the intonations of people’s voices and the words that a job candidate used, and compared them to people who had done the video interview before and who are now in the role. So, whatever facial expressions those people used in the same part of the video interview would get you more points or fewer points, I guess. And so, when I saw the first presentation, I was like, “Wow, this seems like magic. We can actually find out if somebody is well qualified for the role by checking their facial expression. Wow, this is next level.” The more I started looking into it, I was like, “Oh wait, there isn’t a lot of sound science underneath here.” In fact, we don’t know what facial expressions you need in a job interview for a given job; the interview isn’t even the job itself.
08:00
So, maybe people who are in customer service need to smile a lot, but do you need to be smiling a lot in the job interview to be good at the job? What facial expressions do teachers need to have, and other people? We don’t actually know. And then, the second problem is, the more I talked to folks in the space about it, there is actually also a lot of variability between facial expressions and the feelings underneath them. Computers are very good at tracking our facial expressions. They can see that my brow is furrowed or that I’m smiling, and they would infer that I’m happy, but I might not actually be happy. In fact, many times when I’m in a job interview, I’ve forced smiles because I’m really nervous, and a computer would’ve figured that I’m happy, but really I’m not.
08:50
So, the question is, why are we using these kinds of predictive AI tools when the measurements are not scientifically sound? So, I pointed that out in a big investigative piece in The Wall Street Journal. Drew Harwell, who is a Washington Post reporter, also had an in-depth piece about HireVue a few months later. There were a couple of others… there was an EPIC inquiry, and I think the government started looking into HireVue and all of that. I think there was a lot of pressure on HireVue, and they did end up dropping the emotion recognition. About a year later, I also read in a blog post, in the last paragraph, that they also dropped the intonation of our voices, because similar to the emotion recognition software, there isn’t a whole lot of scientific evidence that this kind of stuff works.
09:48
But with HireVue, they changed the way they do things. I sometimes compare the HR tech world a little bit to a game of Whac-A-Mole, because one company changes something, and we are very glad that they did, but then there’s the next one. There are still a few companies, vendors, out there who offer emotion recognition and intonation-of-voice analysis; I’ve tested that myself in the book. So, you get one to shut down, and there’s the next one, because I guess some of these technological solutions are very easy to pull into these tools. There’s some sort of Python library that you can pull in, and people feel like, “Oh, it’s here. Let’s use it.” It might solve a problem, but without understanding: is the science sound? Am I actually doing something that’s valid? And also, is this discriminatory or not? Is it actually fair to judge people on their facial expressions, especially people who have darker skin, whom the light maybe doesn’t catch in the same way? We saw this in Gender Shades. And so, there’s a whole lot of questions around these technologies and whether they’re fair.
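To make concrete how low the barrier is, here is a minimal sketch of bolting emotion recognition onto a video pipeline with an off-the-shelf package. It assumes the documented interface of the open-source fer library; the file name and the naive "happiness" feature are hypothetical, and this reflects no particular vendor's system.

```python
# Sketch: off-the-shelf facial emotion scoring on an interview video.
# Assumes the documented API of the open-source `fer` package (pip install fer).
import cv2               # OpenCV, for reading video frames
from fer import FER      # pre-trained facial emotion recognition

detector = FER(mtcnn=True)                  # MTCNN face detector under the hood
video = cv2.VideoCapture("interview.mp4")   # hypothetical candidate video

frame_scores = []
while True:
    ok, frame = video.read()
    if not ok:
        break
    for face in detector.detect_emotions(frame):
        # face["emotions"] is a dict like {"happy": 0.91, "angry": 0.01, ...}
        frame_scores.append(face["emotions"])
video.release()

# The kind of naive feature critics warn about: an average "happy" score
# has no validated relationship to job performance.
avg_happy = sum(s["happy"] for s in frame_scores) / max(len(frame_scores), 1)
print(f"mean 'happy' score across frames: {avg_happy:.2f}")
```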
Jon Krohn: 10:59
And that was Joy Buolamwini specifically in that Gender Shades work. I was blown away by how particularly poor performance was at the intersection of being Black and female. And yeah, as we covered a lot-
Hilke Schellmann: 11:11
And I can tell you, in HR tech, not a lot of companies check for exactly that intersectionality, because that’s often where both problems come together. So, I think that is a particular problem that the EEOC, the Equal Employment Opportunity Commission, is asking companies to look for, but it has not mandated it. So, we haven’t seen a whole lot of companies actually looking at that. They might check that the tool lets women and men through at roughly the same rates, and different racial groups, but that doesn’t check whether, say, white men and Black women pass at roughly the same rate. So, I think that’s a real big concern in this space.
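As a concrete illustration of the gap Hilke describes, here is a minimal sketch of an adverse-impact audit in pandas using the EEOC's four-fifths rule of thumb (a selection-rate ratio below 0.8 between the worst- and best-off groups flags adverse impact). The data is synthetic and deliberately constructed so that both single-attribute checks pass while the intersectional check fails.

```python
# Sketch: marginal vs. intersectional four-fifths checks on synthetic data.
import pandas as pd

rows = (
    [("woman", "Black", h) for h in (1, 0, 0, 0)] +  # 25% advanced
    [("woman", "white", h) for h in (1, 1, 1, 0)] +  # 75% advanced
    [("man",   "Black", h) for h in (1, 1, 1, 0)] +  # 75% advanced
    [("man",   "white", h) for h in (1, 1, 0, 0)]    # 50% advanced
)
applicants = pd.DataFrame(rows, columns=["gender", "race", "advanced"])

def impact_ratio(df, group_cols):
    """Selection rate of the worst-off group divided by the best-off group."""
    rates = df.groupby(group_cols)["advanced"].mean()
    ratio = rates.min() / rates.max()
    status = "FAIL" if ratio < 0.8 else "ok"  # EEOC four-fifths rule of thumb
    print(f"{group_cols}: impact ratio {ratio:.2f} -> {status}")

impact_ratio(applicants, ["gender"])          # marginal check: passes
impact_ratio(applicants, ["race"])            # marginal check: passes
impact_ratio(applicants, ["gender", "race"])  # intersectional check: fails
# Both marginal audits look fine while Black women are advanced at a third
# of the rate of the best-off group, which is why the intersection has to
# be computed explicitly.
```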
Jon Krohn: 11:56
Yeah. Well, it’s great to hear that at least some organizations, some of the big ones, if HireVue is the biggest video interview platform, it’s good to hear that they’ve made these changes, even if it is a bit of a game of Whac-A-Mole. Something that we talk about on the show a lot, of course, given that a huge amount of our listeners are hands-on data science practitioners and machine learning engineers, is that they know how easy it can be to go to a GitHub repository and download some open-source package for some capability. And something that I try to mention on the show, and probably need to do even more, and certainly it’s what this episode is focused on, is thinking about what doing that means: even though getting that package into your platform may be easy, what is the impact on your users, particularly on subgroups, particularly on people at the intersection of underrepresented groups, like Black women?
Hilke Schellmann: 12:50
Yeah. And I think today, HireVue’s AI looks at the words that people say and sort of makes predictions based upon that. And a lot of AI video interview tools do that. And one of the questions that I’m looking into is, is that still fair? Because the AI doesn’t actually predict on the audio of people speaking; it predicts on a transcription, like a speech-to-text transcription tool. It writes it out in words, and then the computer looks at the words and makes a prediction based on that. But the question is, do these transcription tools work fairly for everyone? What about people who have accents, people who speak African American Vernacular English? We know that when academics have tested these tools in the last few years, they don’t work as well for those groups as for Caucasians who are native speakers, and the same goes for people who have a speech impairment, a speech disability. And the question is, if the tool doesn’t capture all of their words as well, is it really fair to make predictions about how well they’re qualified for the job? So, I think there are a lot of questions that we haven’t answered for a lot of these tools in this space.
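One way to probe the fairness question Hilke raises is to measure word error rate (WER) per speaker group before trusting anything built on the transcripts. Below is a sketch using the open-source jiwer package; the group labels and transcripts are invented for illustration.

```python
# Sketch: per-group word error rate audit of a speech-to-text system.
# Assumes the open-source `jiwer` package (pip install jiwer).
from jiwer import wer

samples = [
    # (speaker group, human reference transcript, ASR hypothesis)
    ("native speaker",    "i led a team of five engineers", "i led a team of five engineers"),
    ("accented speaker",  "i led a team of five engineers", "i let a team of five engineer"),
    ("speech disability", "i led a team of five engineers", "i a team five"),
]

by_group = {}
for group, reference, hypothesis in samples:
    by_group.setdefault(group, []).append(wer(reference, hypothesis))

for group, errors in by_group.items():
    print(f"{group:>17}: mean WER {sum(errors) / len(errors):.2f}")
# If WER differs sharply across groups, any model scoring the transcript
# inherits that gap: dropped words end up looking like missing qualifications.
```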
Jon Krohn: 14:10
For sure. And getting the words transcribed correctly is just one part of the story. Of course, downstream, how these words are being interpreted is even more complex. It’s not just as simple as getting the words right. If somebody being interviewed comes from a less common background, it’s the way that they speak, not just the accent; they could be using terms or phrases that aren’t comprehensible to the model, or that the model has seen only a few examples of. And so-
Hilke Schellmann: 14:41
Yeah, yeah. They might define teamwork a little differently and it’s not based… the training data didn’t cover that. So now, they’re outside of the statistical patterns. 
Jon Krohn: 14:50
Exactly. 
Hilke Schellmann: 14:53
There are so many ways that proxy variables that speak more about our background than about our job fit could come in; it’s really its own science to figure out how we can make sure these tools are nondiscriminatory. Because so many easy proxies look neutral on their face, or, as psychologists in the space say, “facially neutral.” It looks neutral, but when you dig deeper, you actually see there’s a whole lot of problems here.
Jon Krohn: 15:24
Right. So, what happens with companies whose whole business model is based on some kind of probably spurious association between some behavior and hireability? With HireVue, for example, you were able to effect real-world change, where HireVue is still able to grow and succeed as a business without having these emotion recognition or tone recognition algorithms running in the platform. Now, they do still have some potentially suspect AI running, and that’s still TBD. But it seems reasonable to say that even if HireVue didn’t have any AI, it’s conceivable that they could still succeed as a company, because they offer this video interview service. But there are other companies out there in the hiring space, in HR tech, that exist exclusively based on some AI models. So, a company that was lauded quite a bit a few years ago was Pymetrics.
Hilke Schellmann: 16:39
So, AI games. Yep. 
Jon Krohn: 16:42
Yeah, AI games.
Hilke Schellmann: 16:45
So, we see that, right? AI in hiring is used in very many parts of what we call the funnel. You have a lot of people at the beginning, and then you funnel them down to the last, I don’t know, four, five, six applicants that you interview. And along the funnel, you might have the one-way video interviews that HireVue offers, or you may have AI games, as I call them, that Pymetrics offers. And I should say that these AI games are not like a video game that we have today that is super engaging and dramatic to play. This is more like an early ’90s aesthetic. There’s a balloon that you pump up. It’s very, very basic.
Jon Krohn: 17:26
I remember the balloon. 
Hilke Schellmann: 17:29
But for most job applicants, it’s probably much more engaging than filling out 100 questions like: are you the life of the party? Is that very much you? These AI games often want to find out personality traits and some sort of job skills that might be connected to the job. So, the way Pymetrics used to do it, and they have since been bought by Harver, but parts of the tool are still available, the historical Pymetrics had a suite of games, I think about 12 games, and the people who are currently in the role would play them; the company would want at least 50 people to play the games. And if they are all risk-takers, for example, if one of the three most important traits that comes out of that is risk-taking, then when applicants play the game, if you are a risk-taker, you would probably get on the yes pile.
18:32
If you’re not a risk-taker, as per the game, you would get on the no pile. So, one of the questions is, well, maybe all of your accountants who played the game, the 50 who are currently in the role, are all risk-takers. Good for them. But is that just something that happens to be unique to those accountants, or is risk-taking actually part of the job? You might be selecting for risk-takers when risk-taking isn’t actually part of the job. It’s like, I don’t know, if you had a visual algorithm, it would probably find that the people who are currently successful have brown hair. So, you’re looking for people with brown hair, and it might not have anything to do with the job. So, that’s one problem. The other problem is, well, risk-taking in a video game is not the same as being a risk-taker at work.
19:26
I think a lot of us might be daredevils in a video game, but does that actually make us risk-takers in the real world? And the other question is, do these tools actually pull out the specific traits that they say they pull out? We don’t actually know that. Does pumping a balloon and banking the money actually measure your risk-taking propensities? So, there’s a whole lot of problems; when I’ve talked to experts, they were like, “Here’s why we would be very careful with this kind of AI games approach.” Also, personality doesn’t have high validity for hiring; only about 5% or 10% of success on the job is predicted by your personality. So, putting personality front and center in your hiring process is maybe a little dubious.
20:24
You might want to bring in a couple of other things to look at candidates. So, I think there are a whole lot of questions about these AI games. I’ve played all of these games, and I had certain strengths. A few months later, I would play the games again and suddenly had different personality strengths. And I’m like, “That’s a little weird, because you’re supposed to have a stable personality.” But I think what was really interesting is when I played the games with Henry Claypool, who has a disability. While we were playing, in one game he and I both had to hit the spacebar as fast as possible, and he was really concerned. He was like, “What if I have a motor disability and I can’t hit this as fast as possible? Am I going to be rejected because I couldn’t do this?”
21:22
But what does hitting a spacebar as fast as possible have to do with any kind of job? There are a whole lot of questions that come out of these games, and I’m not sure that a lot of them are calibrated for people with disabilities. And I think a lot of folks in HR would say, “Well, if you have a disability, you could always ask for an accommodation.” Well, it turns out that if you play an AI game, you don’t actually know what is really being asked of you before you start. So, you wouldn’t even really know whether you needed an accommodation. And I’ve also now talked to enough vocational counselors who told me over and over again that their clients did need an accommodation and never got it. So, there’s a lot that is broken here and a lot of questions to be asked about how some of these tools are set up.
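For readers who want to picture the mechanics being criticized, here is a toy reconstruction of the incumbent-matching logic Hilke describes, scored the way the Balloon Analogue Risk Task from psychology scores risk (average pumps on balloons that were banked rather than popped). All numbers and names are hypothetical; this is not Pymetrics' actual algorithm.

```python
# Sketch: trait scoring from a balloon game plus incumbent-profile matching.
from statistics import mean, stdev

def risk_score(pump_counts):
    """Average pumps on banked (unexploded) balloons: a 'risk-taking' proxy."""
    return mean(pump_counts)

# Step 1: incumbents currently in the role play the game; their scores
# define the "ideal" profile (per the episode, vendors want ~50 players).
incumbent_scores = [42, 38, 45, 40, 44, 39, 41, 43]   # toy data
profile_mean, profile_sd = mean(incumbent_scores), stdev(incumbent_scores)

# Step 2: applicants are kept or cut by their distance from that profile.
def classify(applicant_pumps, threshold_sds=1.0):
    z = (risk_score(applicant_pumps) - profile_mean) / profile_sd
    return "yes pile" if abs(z) <= threshold_sds else "no pile"

print(classify([44, 41, 40]))  # plays like the incumbents -> "yes pile"
print(classify([12, 15, 10]))  # cautious player           -> "no pile"
# Two unvalidated leaps are baked in: that pump counts measure workplace
# risk-taking at all, and that whatever incumbents happen to share is
# actually required by the job.
```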
Jon Krohn: 22:13
Yeah, there are all kinds of these. I’m also aware of big companies out there that require tests that… this isn’t really the focus of your book, I don’t think, because it’s not exactly AI. But even when we get into more traditional personality tests or IQ tests, it’s wild to me how those are sometimes used. I had a partner whose company was acquired by a private equity firm called Vista Partners, and they had a mandatory test: everybody in the company that was acquired, including my partner, had to do a test called the CCAT.
 22:54
And so, it’s an IQ test, but you don’t even get told how you did, yet it does have an impact: if they’re looking to do cuts, your CCAT score will be considered in whether you’re cut or not. And it’s wild to me because, as you say, with these kinds of things like the Pymetrics AI games, what is the relationship between risk-taking in a video game and risk-taking in the real world? Similarly, with these kinds of IQ tests where you sit and have to pick multiple-choice answers, how much is that like a real job? It says nothing about somebody’s resilience, their ability to build relationships across the firm, or their ability to execute on a big project. It’s like-
Hilke Schellmann: 23:46
Yeah, yeah. And I think we often think about personality, and a lot of leaders in companies feel like, “Oh, I want agile employees who not only know the programming language that I need now, but who have so much forethought that they’re going to learn the new one before it’s almost even out. I need this agility.” But the question is actually, how good are we really at measuring that? How much does it have to do with the job? If you only have agile employees, they’re all going to find a new job very, very quickly. And then, the other question is, a lot of us can compensate, right? Actually, my personality is that I’m kind of a bit shy, and I had to learn to be much more outgoing and approach people.
Jon Krohn: 24:34
Oh, wow. 
Hilke Schellmann: 24:35
It was easy to do it for my job, but- 
Jon Krohn: 24:36
Great work because you seem extroverted. 
Hilke Schellmann: 24:40
But on the playground with my toddler, I’d be like, “Ah, it’s painful,” and I really had to overcome that, in professional networking settings too. But I think that speaks to it: maybe our personality is X, but a lot of us find ways around that. So, I don’t know how much validity there is to this. And we’ve actually seen, for example, that the DiSC test was in some of the AI tools that I tested that will find your personality out of your social media data exhaust, or claim they can find your personality out of your social media data exhaust. Even the DiSC people themselves say that DiSC is not valid for hiring. Don’t use it for hiring. But somebody found a plugin or an easy way to use it, and they built a tool. And some companies and recruiters now use it.
Jon Krohn: 25:34
Well, so let’s flip the question around. Obviously, as we’ve established so far in this episode, there are lots of ways to do things wrong in the interview process. Yet, as you highlight in your own book, online platforms like ZipRecruiter, Indeed, and Monster have allowed huge volumes of applications to be made even to small companies. And the biggest firms, like Google and Delta, as you’ve highlighted, get millions of applications a year.
Hilke Schellmann: 26:09
Yeah, they’re drowning. 
Jon Krohn: 26:11
And so, what can you do? What’s the right way to do hiring? Is there a place for some kinds of tests or AI tools? Pymetrics’ whole thesis, for example, was: the resume doesn’t work, play these games instead. And I’m sure there are lots of cases where that’s true, but I’m sure there are also lots of other cases where a resume does provide some useful information.
Hilke Schellmann: 26:36
Yeah. Part of the mission of the book is that I think human hiring is also very bad. I’m not advocating going back to having biased humans do all the decision-making, because that’s another problem. But what I’m trying to… I want to have larger conversations to actually think through what works in hiring and what tools we should use that may be less discriminatory and give us a sense of someone. So, I think one thing is to have numerous tools: have the resume, but also personality, and assessments that directly look at the skills used in the job and whether we can somehow measure those. And I actually think some virtual reality applications might be helpful here, because with virtual reality, we could actually test people by asking them to do the job that is required, and also give the candidate a sense of the job.
27:36
I think it’s still early, it’s still hard to build these, but I think there should be a larger conversation around this. I would also tell people, I’m really open, I work with a lot of sociologists and computer scientists, and I would love to have a longitudinal study where a company hires people with AI tools and also the traditional way, and hires all of them. And then we’d see over the course of years: did the green labels actually perform better than the red labels? Did this assessment actually work? Because we don’t know that. That’s the problem. We don’t have these longitudinal studies, and I have not found a company that actually tracks people hired via AI tools for years to come, to then loop back and understand whether the tool was actually predictive or not.
28:35
And I would love to see that, because I think that would actually help companies and vendors. I think it would be helpful to anyone, but it would take a while to set this up and do it. But I think that would be a way for all of us to understand how these tools work. And I do think folks who are in the position to buy these tools, like HR managers or leadership at companies, should really think skeptically: does this actually work? What is the science underneath? And really have outside counsel come in before using these tools, and have a pilot phase. Because I’ve heard this over and over again when I talk to employment lawyers. Nathaniel Glasser is one of them. He’s based in DC, and they looked at an AI games vendor. They all signed an NDA, so he couldn’t name the company, but he said, “We had a pilot phase and that tool always discriminated against women. We just couldn’t get it out of it.”
29:32
So, the company didn’t use the vendor, but the vendor is still around. Hopefully they’ve learned, but we don’t know that. So, I think the pilot studies are good and we need a whole lot more transparency to actually sort of understand what doesn’t work, put that in the open so then it forces vendors and companies to not use it and change their ways. I think that’s the first step. 
Jon Krohn: 29:55
I see. Those are great practical tips. So, combine lots of approaches, including, for many kinds of roles, maybe a resume as well as multiple assessments, particularly assessments that cover the specific skills required for that job.
Hilke Schellmann: 30:10
Yes, like skills. Don’t look at the school I went to, because that tells you probably more about my socioeconomic background. But actually make sure that if you need to hire people to do X, you test for X and not for 30 other things. We see this, too: job descriptions are ballooning, because maybe an HR manager takes the job description from three years ago and then just adds new things that the candidate should have. And suddenly you have 55 things you need, when in reality you should focus on the five most important skills that you need in that role and find those people, not anyone who happens to have all 55 skills but is maybe mediocre at them. Maybe you need somebody who is really good at four. But the way these tools are often calibrated is that they reject qualified candidates.
31:00
And the interesting thing is that when people do surveys of company leadership, company leadership knows that: that their AI tools or algorithms or whatever they use for hiring reject well-qualified candidates. And somehow no one is stopping this problem, because I think the efficiency is just so needed in the industry. There are too many applications coming in, and they just need a technological solution. And they’re okay with one that’s maybe not as good as we want it to be, because it’s just so efficient. So, what I’m saying is, I think the AI tools have proven themselves on efficiency. The vendors always say, “This is more efficient. It will save you money and labor costs, make this all faster, and find the most qualified people.” It’s definitely efficient. It’s probably saving companies money; that’s why they want it. But does it find the most qualified candidates? I don’t think we’ve seen a whole lot of proof of that.
Jon Krohn: 31:58
Great points. All right then. Since we have a data science audience, there are probably lots of listeners out there who, like we mentioned earlier in the show, are very familiar with going to GitHub repos and downloading open-source tools. And so, I’m interested to hear what thoughts you have on what practitioners, data science practitioners and machine learning engineers, as well as product managers and business owners, can be doing to try to prevent algorithms with biases from getting into their platforms.
32:33
I can quickly describe, and maybe you can even assess, my own ideas and the kinds of things that we do at Nebula. Because we are an AI company in the HR tech space, we try to be careful, and I’d love to hear what we might be getting right or wrong. One of the things that we do when we are designing our algorithms is think about different groups from the start. So, we’re thinking about men and women, different sociodemographic groups. And from data collection and preprocessing all the way through, we’re trying to think about, downstream, how is this going to be used by the model, and are we potentially allowing bias to be included in the training data?
Hilke Schellmann: 33:18
I guess my question is, if you have large sets of data on people, how can bias not be in there? A lot of these proxies are biased. My hobbies are biased; they probably show more about my gender-
Jon Krohn: 33:31
For sure. 
Hilke Schellmann: 33:31
… and my background than other things. So, I think that is a problem. And my critique of a lot of data scientists would be that they come from a big-data correlation approach: if there’s a correlation, that’s meaningful to them. But the question is, is it really meaningful? Because often it’s a correlation, but it’s not causation. And I think that matters in hiring. Is this actually causally connected, or is this just a random correlation? And I think we’ve seen that in the resume screeners, when I talked to employment lawyers and employment-lawyer-adjacent folks who look at these tools, who found out that the name Thomas was predictive of success. I’m sure it’s statistically significant; it’s a correlation. That’s what the tool found.
34:30
But we all know that, unfortunately for the Thomases of the world, your name doesn’t qualify you for anything. It’s your skills and all of the other things that matter. It may be statistically significant, but should we be using it? Absolutely not. So, that’s the thing: we can get a lot of signals that we can measure, but is it actually meaningful information? I’ve also reported on surveillance at work, and companies using, for example, swipe-in data to check who is at work the longest and making inferences about whether somebody is most productive. We all know that just sitting at your desk the longest doesn’t make you the most productive. And we don’t even know what it means to be most productive. That’s what the company used for promotions.
35:26
And when they had to do layoffs during the pandemic, they also wanted to look at this data, because the data was available. You suddenly have this beautiful data that you want to use. But actually, if you think about it, it’s a whole lot of nothing, because swipe-in data isn’t meaningful. And in fact, it might actually be discriminatory toward women and people with disabilities, because it turns out not everybody has a chance to be at their desk as long as they want to. It can actually have discriminatory effects, because women are mostly the caregivers, and people with disabilities have the highest rates of absences, and these are protected classes. So, I think what looks facially neutral and looks like a great correlation might actually not be causally connected in a way that’s meaningful, where one thing actually leads to the other. And then, it could also have discriminatory effects. And I think we don’t talk enough about that.
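The "Thomas" problem is easy to reproduce in miniature: train a screener on biased historical outcomes and the name surfaces among the heaviest-weighted features. Below is a sketch with scikit-learn on synthetic resumes; it illustrates the proxy failure mode, not any real vendor's model.

```python
# Sketch: a "facially neutral" resume screener learns a first name as a feature.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "thomas python sql five years experience",
    "thomas java project management",
    "maria python sql five years experience",
    "aisha java project management",
    "thomas communication teamwork",
    "maria communication teamwork",
]
hired = [1, 1, 0, 0, 1, 0]  # biased historical labels: the Thomases got hired

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(resumes)
model = LogisticRegression().fit(X, hired)

# Rank tokens by learned weight: the name floats to the top, not the skills.
weights = sorted(zip(vectorizer.get_feature_names_out(), model.coef_[0]),
                 key=lambda pair: -abs(pair[1]))
for token, weight in weights[:5]:
    print(f"{token:>12}: {weight:+.2f}")
# Statistically significant, causally meaningless: exactly the kind of
# correlation that interpretability checks exist to catch.
```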
Jon Krohn: 36:28
Yeah. And there are lots of examples in your book. This episode has ended up being mostly about hiring, and we don’t have time to get into all the detail, but you have many chapters in your book covering different parts of the employment lifecycle. So, not just issues of AI in hiring, but also in performance, which you just went into with people sitting at their desks or swipe times, plus promotion algorithms, leadership-training selection algorithms, layoff algorithms-
Hilke Schellmann: 36:57
Yeah, flight-risk algorithms. I’m really interested in those. We see more and more of those, or predictions of who’s going to be an insider threat. There are all kinds of predictions that companies can make. And now we also see that algorithmic assessments can be part of layoff decisions. And I’m also really interested in when new health and wellness initiatives come into the workplace, and predictions of: are you at mental health risk? Should I give you help? There are really interesting inferences coming into the space that I want to think more about, and the question of what that means for privacy. If my employer can make inferences about whether I’m anxious or depressed based on my voice, which I have to use to talk to my colleagues, I don’t really get to opt out of that. What if they use algorithms to predict that? Am I comfortable with that? Do I want that? There are all kinds of questions coming up with these new tools coming into the space.
Jon Krohn: 38:05
Yeah, yeah. It’s wild. There are a number of things that you could dig into. I’m sure there are tons, as a journalist, as an academic, and as somebody who’s giving lots of practical guidance to actual companies out there, in the spaces providing these tools or using these tools.
Hilke Schellmann: 38:22
Yeah, I still find it endlessly fascinating. I’ve been looking at this for five-plus years, and I’m still interested in the idea of how we can quantify humans and how we can actually do a good job with that: be fair, don’t discriminate, and actually use tools that are valid. It’s a never-ending interest of mine, so I’m going to continue down this road. I’m always interested in hearing from folks, talking to folks, and looking at the interesting stuff, what works and what doesn’t work. I’m not against tools that work. In fact, there are a couple in the book that I highlight where I think, “Oh, that might be an interesting application.” But I do really think that if we want to do AI tools in the world of work right, we’ve got to ensure that they work and that they don’t discriminate. So, I do think that this is a necessary first step to make these tools better.
Jon Krohn: 39:24
And with more and more AI, and more and more data collection, it’s going to be increasingly important. It’s a good thing that you find this interesting, because we’re going to have an accelerating pace of platforms and companies doing AI things in employment. And so, back to the question I asked a few minutes ago about what our listeners as data scientists or ML engineering experts could do: I started talking about being thoughtful about your data collection and your modeling from the very beginning and how this might impact people downstream. That’s probably not a bad idea. You reminded me how important it would be, because no matter how thoughtful we are about removing explicit things like people’s names, so that issues like Thomas don’t come in there, you could still end up with implicit biases getting in. So, things like interpretable machine learning are probably important, as are things like testing afterward, the four-fifths rule, for example, that you talked about from the EEOC.
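As one concrete form of that "testing afterward," here is a sketch using scikit-learn's permutation importance to check whether a trained screening model leans on a feature that should be job-irrelevant. The data is synthetic, and zip_code is a hypothetical stand-in for any demographic proxy.

```python
# Sketch: post-hoc audit of a screening model with permutation importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500
skill = rng.normal(size=n)             # legitimately job-related signal
zip_code = rng.integers(0, 2, size=n)  # stand-in for a demographic proxy
# Biased historical outcomes: driven partly by skill, partly by the proxy.
hired = ((0.6 * skill + 0.8 * zip_code
          + rng.normal(scale=0.3, size=n)) > 0.7).astype(int)

X = np.column_stack([skill, zip_code])
model = RandomForestClassifier(random_state=0).fit(X, hired)

# Shuffle each feature and measure the accuracy drop (held-out data would
# be better in practice; this is a sketch).
result = permutation_importance(model, X, hired, n_repeats=20, random_state=0)
for name, importance in zip(["skill", "zip_code"], result.importances_mean):
    print(f"{name:>8}: importance {importance:.3f}")
# A large zip_code importance is a red flag to investigate before deployment.
```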
Hilke Schellmann: 40:24
Yeah. And I also sometimes come up with edge cases that I test myself and ask, “Well, if you get an applicant like this, what would happen?” And I think we need a whole lot more of that, and we need a whole lot more people who are diverse to build these tools, and also to test these tools. There’s a whole chapter in the book devoted to people with disabilities, and I’m not sure how to square that circle, because a lot of people with disabilities, first of all, are underrepresented in the workplace. So, a lot of their personality traits and their skills and how they do work might not be represented in the training data. And then, the problem is also that a lot of disabilities are so individual. I might be autistic, and somebody else might also be autistic, but my disability might manifest itself very differently from theirs. So, even if somebody else like me is in the training data, it might not actually cover the way my disability is expressed.
41:33
So, it’s such an individual thing, and with a one-size-fits-all algorithm, it’s just not very clear to me how that is going to work. And we don’t want to leave folks who are already marginalized, who are already underrepresented, in the dust. So, I think there needs to be a whole lot more testing and thinking this through before bringing the products to market. I think we are pretty good at thinking about race and gender; there are still problems coming up with that, but you mentioned intersectionality, very important, and people with disabilities. We also haven’t talked a whole lot about age and how that is affected by the algorithms that we use. So, I think there’s a whole lot more testing and thinking-through to do out there. And truthfully, I think companies often don’t have the time. They have to go to market. They don’t have the luxury of years of scientific research.
Jon Krohn: 42:33
Yup, yup, yup. Really great points there. Just quickly running through them: trying to remove data that could be biased, because we can’t be sure that we get that all right and there could be implicit biases; having interpretable machine learning in there; and having diverse people building these tools and testing these tools, which is critical. And then, my last one here is testing afterward, once the algorithm is developed: not just making sure that the main effects are free of bias, but the intersections, too.
Hilke Schellmann: 43:08
Yeah. And always test, even if you think something is totally facially neutral. I talked to a former director, a senior talent acquisition person, at one of the largest companies in the United States, and he said they found out that if somebody has a friend in the workplace, they stay longer at the company. So, they wanted to build up retention, but when they actually looked at the results, they found out that this was overrepresented among Asian Americans, and African Americans had far fewer people they knew at the workplace. So, it was actually discriminatory. But you would think, well, if you have an acquaintance or a friend at the work site, that’s great. You stay longer. It’s a great predictor of success. Let’s find people who know someone. Well, it turns out only certain sections of the population have a higher percentage of people they know.
44:04
So, it’s actually discrimination through the back door, if you don’t keep on testing and looking at this. And I think a lot of times we’ll be like, “Oh, this sounds like a great criterion. It’s predictive.” Yeah, it is. But should we use it? No. So, those are the kinds of examples where I want to get people to really start thinking and looking skeptically at these tools.
Jon Krohn: 44:28
And so, last technical question for you here before I let you go; you’ve been very generous with your time, and I’ve taken up more of it than I said we would. So, thank you so much. But the last thing is: if we have a listener out there who suspects or knows that they have been impacted, perhaps adversely, by an AI tool or some other kind of evaluation at some point in their employment process, what should they do?
Hilke Schellmann: 44:57
They should call an employment lawyer, and me, and possibly the EEOC, the Equal Employment Opportunity Commission, which is tasked with looking into this. I think it’s a little bit difficult in the United States and in most jurisdictions, because you have to show that you were harmed by a tool. We all get rejected when applying for jobs all the time, so do I know why I was rejected? It’s really hard to show that you have evidence of an unfair rejection. I think what might be helpful, I’ve seen people who have recorded some of the assessments, and people who have asked companies for their data, which is very easy under GDPR in Europe, and I think there are some consumer laws in California that will also allow people to do that soon.
45:52
So, I think those might give you hints. And there is a bit of a growing chorus of people who want claims to go forward when you see design features that are problematic. The EEOC, for example, went after a company that made people put their ages into whatever template they had on their website. A woman put in an age of 55 and was rejected, and then she put in a younger age and was not, and she brought a case to the EEOC. So, the design choice itself already gave you a hint that there might be a problem. Those are some of the cases. So, I think people want to say: if you find a design feature that’s possibly problematic, maybe we should let a case come through. But right now, that’s not how it works. So, it’s very difficult to actually bring a claim.
46:43
So, all the more we need to rely on whistleblowers and employment lawyers who are in this space, who will hopefully speak up and tell us when they find problems, because that’s the only way the industry will get better. And I know that everyone in the industry does want to hire people fairly. No one sits there and wants to discriminate. It’s just a matter of: if we use these kinds of AI tools, we need to make sure they work and don’t discriminate. And that is a very, very hard task.
Jon Krohn: 47:15
Well, lots of great guidance there for data science practitioners as well as anyone who thinks they may be impacted by these AI tools in the employment process. Hilke, before I let you go, I always ask guests for a book recommendation. Of course, we have your own brand-new “The Algorithm,” which people should be checking out, but do you have another recommendation for us as well?
Hilke Schellmann: 47:36
I do, and I think it’s slightly off the beaten track. One of the books that has touched me the most is actually “The Warmth of Other Suns” by Isabel Wilkerson, which is about the Great Migration to the north of the United States. It’s beautifully written. I’ve never cried reading a nonfiction book, but at the end, I was so sad and I cried.
Jon Krohn: 48:00
Oh, wow. 
Hilke Schellmann: 48:01
That was a very powerful book that really has nothing to do with AI, but everything to do with the love of humanity. I think it was just a beautiful, human-spirited book.
Jon Krohn: 48:15
I like how you just indirectly said there that AI has nothing to do with love of humanity. That’s perfect. 
Hilke Schellmann: 48:22
Look, I think it can actually help us a lot. There was a reason why almost six years ago, actually, it’s been six years now, I started investigating this: because I love the idea of AI. I think it’s a transformational technology. Clearly, it’s fascinating. We just have to do it the right way.
Jon Krohn: 48:42
Yeah, that is the key thing that can come out of all this: we know that humans make terribly biased decisions and are so much more likely to, for example, pick people who look like themselves for a role. And so, there is a huge opportunity here with technology, with AI in particular, to undo some of those issues as opposed to propagating and reinforcing them. So, fantastic. Thank you so much, Hilke, for coming on and giving us so much of your time in today’s episode. I really appreciate it.
Hilke Schellmann: 49:15
Yeah. Well, thank you for having me. As you can tell, I love talking about this topic. I’m endlessly fascinated. So, thank you all for listening. 
Jon Krohn: 49:22
Yeah, it was a really fun episode. How should people follow you after today’s episode or get in touch? 
Hilke Schellmann: 49:27
You know what, the easiest way is LinkedIn. There’s only one Hilke Schellmann, so find me on LinkedIn. I also have a website, but we all kind of left Twitter. So, I think LinkedIn is maybe the new Twitter for the world of work. I’m not sure. We’ll see.
Jon Krohn: 49:48
It seems to be by default with people ending up- 
Hilke Schellmann: 49:48
For lots of people now.
Jon Krohn: 49:49
Nice. All right, thank you so much, Hilke, and we’ll be catching you again soon, maybe for an in-person episode. 
Hilke Schellmann: 49:54
I hope so. Thank you, Jon. 
Jon Krohn: 49:56
Lots of food for thought in today’s episode. In it, Hilke covered how AI-driven processes like HireVue’s emotion recognition and Pymetrics’ AI games can do more harm than good during the hiring process. She talked about how combining resumes with a suite of assessments, including assessments that cover skills that are specifically required in the role, can provide a more holistic snapshot of a job applicant. And she talked about how data scientists can curate their training data and modeling approach carefully while using interpretable ML approaches, diverse builders, diverse testers and tests of both main and interaction effects to not only minimize the negative impact of AI in decision-making, but potentially make a positive societal impact with AI. 
50:36
All right, that’s it for today’s episode. If you enjoyed it, consider supporting the show by sharing, by reviewing, by subscribing, but most importantly, we hope you just keep on listening. Until next time, keep on rocking it out there. And I’m looking forward to enjoying another round of the Super Data Science podcast with you very soon. 