Jon Krohn: 00:00
This is episode number 449 with Ayodele Odubela of Comet ML, as well as the founder of FullyConnected.
Jon Krohn: 00:12
Welcome to the SuperDataScience Podcast. My name is Jon Krohn, a chief data scientist and best selling author on deep learning. Each week we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today. And now let’s make the complex simple.
Jon Krohn: 00:42
Welcome to the SuperDataScience Podcast. I’m your host Jon Krohn, and I am very fortunate to be joined today by Ayodele Odubela. Ayodele has a master’s degree in data science, works full-time as a data science evangelist at Comet ML, and is the founder of FullyConnected, a community that supports the growth of black and brown people getting started in AI and machine learning. And on top of all that, she’s the author of a brand new book about getting started in data science.
Jon Krohn: 01:11
During this episode, we focus primarily on how the data and algorithms that influence so many aspects of our lives, from the quality of the healthcare we receive to the jobs we’re invited to interview for, very often perpetuate historical biases against particular demographic groups. It is a constructive conversation with countless specific practical actions we can take to train and deploy AI models that are indeed fair. In addition, Ayodele fills us in on all of the invaluable on-the-job data science knowledge she wishes she’d been taught in grad school and the imposter syndrome that many experience, especially early on in their data science careers.
Jon Krohn: 01:54
This episode is ideal for technical and non-technical folks alike. If you’re interested in being more thoughtful about the impact data and algorithms have on society, and I passionately believe every one of us should be, then this episode will be a rich resource for you. Ayodele, welcome to the program. It is so wonderful to have you here as a return guest, and we’re very excited about it. Where are you calling from today?
Ayodele Odubela: 02:29
Yeah, I am in Denver, Colorado. I am really, really excited. I love the show. I have been a longtime listener, and it’s always nice to be back.
Jon Krohn: 02:39
Nice. Were you listening before you were on the program?
Ayodele Odubela: 02:42
I was. That’s why I was like, “Oh, it’s kind of crazy. But it’s awesome, I love it.”
Jon Krohn: 02:47
So, Kirill, who is someone from the show, reached out to you, and you were like, “Yes, please.”
Ayodele Odubela: 02:52
Absolutely. Your podcast was the only real data science podcast I listen to frequently. So, I have a couple I dabble in, but I listened to the podcast religiously when I was in grad school.
Jon Krohn: 03:06
Nice. Well, that is great to hear. We didn’t pay for these comments.
Ayodele Odubela: 03:12
You did not pay me to say that.
Jon Krohn: 03:14
So, you have had a lot of developments in your data science career since you were last on the show with Kirill. I can’t wait to dig into so many of them. We can start off by talking about what you’re doing right now as a data science evangelist at Comet. So, what is Comet ML?
Ayodele Odubela: 03:34
Yeah, Comet ML is a tool that really just helps data scientists and ML engineers formalize and easily reproduce their experiments. So, especially for those of us running deep learning models that have so many hyperparameters and outcome metrics, it’s difficult to compare them against each other. With Comet, you can add a couple lines of code, start automatically tracking things like hyperparameters and data samples, and then go to our UI and really just create the kinds of visualizations that you need to compare your models, as well as being able to collaborate a lot more easily.
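(For readers who want to see what that "couple lines of code" might look like in practice, here is a minimal sketch of instrumenting a training run with Comet's Python SDK. The API key, project name, workspace, hyperparameters, and metric names below are placeholder assumptions for illustration, and method details may vary by SDK version.)

```python
# A minimal sketch of logging an experiment to Comet (placeholder names/keys;
# exact method names may vary slightly across comet_ml SDK versions).
from comet_ml import Experiment  # import before your ML framework

# Create an experiment tied to a workspace/project (values here are placeholders)
experiment = Experiment(
    api_key="YOUR_API_KEY",
    project_name="churn-model",
    workspace="your-team",
)

# Track the hyperparameters used for this run
hyperparams = {"learning_rate": 1e-3, "batch_size": 64, "hidden_units": 128}
experiment.log_parameters(hyperparams)

# ... train your model here ...

# Log outcome metrics so runs can be compared side by side in the Comet UI
experiment.log_metric("val_accuracy", 0.91)
experiment.log_metric("val_loss", 0.27)

experiment.end()
```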
Ayodele Odubela: 04:15
Before actually joining Comet, I had briefly heard of it. The first thing I thought was, “I’m really frustrated that I wasn’t using this on a team of about 40 data scientists.” I had several sessions where I would be sitting down with my manager trying to get my models to work on their computer. So that’s just one of the range of problems that Comet’s trying to fix.
Jon Krohn: 04:41
So, it helps with… It sounds like it helps with version control. And not just control of software versions like your particular code for running the model, but also associated things like the data. And yeah, hyperparameter files, model weights. On top of that, it also makes it easier to make your data science code portable from machine to machine?
Ayodele Odubela: 05:07
Exactly. So, if you are collaborating on a team, all anyone on your team really needs to have installed is Python. They can install Comet with pip and then go to the front-end UI. So managers can look at all of their data scientists and be able to see what their productivity level is like, but it’s not necessarily just [crosstalk 00:05:29], but to make sure that no one’s doing overlapping work. I think this is a problem that I ran into where basically you work on the exact same model on the same data subset. So it’s easy to have these project workspaces, and just collaborate on this model training and iteration process together.
Jon Krohn: 05:48
That sounds brilliant. It sounds like something I needed in my life. And it also doesn’t surprise me that in the January episode, one of the first that I ever hosted as the SuperDataScience host, Erica Green, who’s a data science manager at Etsy, was evangelizing for you. I mean, not for you personally, but she was talking about Comet ML and what a big difference it’s made for her workflows. And so, for a huge company like that, with how many data scientists they have, it goes to show how useful what you’re doing is. And it’s amazing, I found out in the conversation just before the podcast started, Comet ML is actually pretty small still, but you’re growing quickly.
Ayodele Odubela: 06:28
Yes, it’s growing incredibly quickly. We’re really excited because we are trying to tackle a lot of these problems. But I think that that leads to meeting really talented people. So, we are always looking for awesome talent. Yeah, it’s been a ride already. I’m excited that I don’t… That people like Erica can do a little bit of my job for me. But I’m excited to try and build more awareness for something that would truly make a difference to you, data scientists.
Jon Krohn: 07:00
Nice. And so, I mentioned Erica evangelizing, which got you smiling because your title at Comet is data science evangelist. So tell us what it means to be a data science evangelist. It probably varies company by company, but you can let us know what it’s like for you.
Ayodele Odubela: 07:17
Yeah, absolutely. So at Comet, it’s really about trying to build ties with our community. So our community is people who are doing data science and building models. But, A, saying, “Hey, this is a thing that’s out there that you should probably be using.” It’s not hard to convince people that manually taking notes in Word docs or spreadsheets for [crosstalk 00:07:41]-
Jon Krohn: 07:40
Yeah, I use more spreadsheets than I should admit.
Ayodele Odubela: 07:46
Yes, and I have been there. I think that’s what makes it easy for me to talk about the benefits. But my role is a mix of creating some technical content like blogs and machine learning tutorials. And the other part is trying to show people how to use Comet. So beyond saying, “Hey, we’re here,” it goes into, once you’ve set up a project, here’s how you can run it. Here’s how you can share it with folks. I want to stress that it’s not just for people in industry. There’s a lot of academics who use it as well because, A, it makes your research really, really easy to reproduce, to share with those in your departments, to have small teams working on different kinds of ML projects. I know from being in grad school and taking some AI classes, it was almost impossible to work on a project together without being in the same room with someone.
Jon Krohn: 08:46
Yeah, that sounds great, especially for international collaborations, maybe in completely different time zones. Very common in academia, and with a tool like Comet it sounds like you could easily be sharing that. It would also potentially… I’m just freewheeling here, but something that happens a lot, that I remember from my time in academia and data science, is people having a lot of issues with reproducibility. You try in your methods to explain exactly what you did. But sometimes it’s very difficult. Even if you make your dataset publicly available, people can still have a really hard time getting the same result as you. So you could do that with Comet, right? You could provide somebody with… Yeah.
Ayodele Odubela: 09:31
Yeah, I think that’s one of the best aspects: for each model, or when you’re moving a candidate model into production, you’re able to see all of the artifacts, the exact… Especially for companies in healthcare, and there are so many different industries where you are testing dozens of different data subsets at once, and especially for those in regulated industries, it’s easy to say, “Okay, there is some paper trail of exactly what modeling techniques we took, the kinds of documentation, and we have templates for scoping these projects and going through the ethical requirements as well.” So, it gives you a really solid paper trail, not just saying that you’re checking a box, and especially for earlier stage companies or earlier stage data science teams, you’re setting yourself up for success in the future.
Ayodele Odubela: 10:26
I’m sure you’ve seen there are so many times that models don’t end up in production or projects are abandoned or fail for whatever reason. It’s incredibly difficult to figure out why. It’s incredibly hard to debug why it’s going wrong. Or you start to see these accuracy drifts like in production, and you don’t know… You might have overwritten a cell in a Jupyter Notebook, and you don’t know why certain things are working the way they are. So, especially for high risk industries, regulated industries. It’s a good solid way of just having better documentation about exactly what you did in the experiment process.
Jon Krohn: 11:09
It sounds really great. I am a convert. Your evangelizing has succeeded. I believe in this. This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience. It’s the namesake of this very podcast. In the platform, you’ll discover all of our 50 plus courses, which together provide over 300 hours of content, with new courses being added on average once per month. All of that and more you get as part of your membership at SuperDataScience. So don’t hold off, sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level.
Jon Krohn: 11:59
So, Comet isn’t the only work that you do. You are a very busy bee. And you are also the founder of a special organization called FullyConnected. And I love that it’s called FullyConnected because it obviously evokes for me the idea of a fully connected layer in a deep learning architecture.
Ayodele Odubela: 12:18
Yes.
Jon Krohn: 12:18
And yeah, that’s not a coincidence. I think it also has a great metaphorical meaning. Why don’t you explain what FullyConnected is to us?
Ayodele Odubela: 12:28
Yes. I’m so glad that you brought that up because initially most people don’t get it. But I really went with FullyConnected because I wanted to create a community especially for black and brown people in machine learning and AI. There are specific obstacles I think that I faced. One, just being transparent and frank, I think it is a matter of fact that there are a lot of demographics who don’t have the same access to great mathematical education. So, I think for myself and so many other people, that was the biggest setback when getting started. I hope it brings people some comfort to know that I was never a math whiz. I was in remedial math from middle school through college. This was never a thing that, oh, I just got it one day, until I took an applied statistics course and I understood, oh, there’s a why behind what I’m doing.
Ayodele Odubela: 13:27
I started getting an interest in data science and machine learning. I was so terrified of linear algebra. I was like, “I can’t hack this. These squiggles mean nothing to me.” But I really had to take the approach of: I’m doing things that are meaningful to businesses. I’m doing things that have potentially life-saving results, so just learn it. So, I like to help people have a forum where it’s more comfortable to talk about that. I think specifically with data science, it’s really easy to feel like an imposter. There are super smart people with PhDs. And that’s not a bad thing by any means. But so many people feel intimidated, like they can’t just learn something because it’s new, and because it seems locked away behind certain kinds of educational gates.
Ayodele Odubela: 14:19
So, a lot of the work I’m doing with FullyConnected is to, A, break those barriers and start teaching this basic math in a way that’s culturally relevant for people. I know when I started hearing things in terms of sports, and I’m a huge sports fan, I was like, “Oh, I can get stats. That’s easy.” It wasn’t so intimidating anymore. There are a lot of programs I’m excited about this year. First, we’ll be releasing some online courses that set that foundation for anyone who wants to go further into data science and machine learning. And then we’ll also be doing an immersion program. So, this stems from my experiences with trying to go from academia to industry, and not feeling like I had all of the tools or knew exactly what I was supposed to do on the job.
Ayodele Odubela: 15:09
So in my first couple data science roles I was the only data person, which I think is incredibly hard when you are still learning. So I think the immersion program is going to be great because we will be pairing people who are still students, or early career transitioners, with people who are more experienced, and it’s very much that you get to see what their day-to-day is like. You get to see some of the tools they use in the workflow. And I’m hoping that that inspires people to realize it’s not so… We’re not necessarily writing neural networks by hand and doing math formulas all day, but really learning how to use these tools in combination with each other to get the job done.
Jon Krohn: 15:52
I love this. FullyConnected sounds like such a special venture for you. I personally experience a feeling of being an imposter as a data scientist. And I have had every imaginable advantage, I guess, in terms of I did a PhD in a quantitative discipline. And so, that’s kind of like, oh, well, that should make me feel comfortable, or even writing a book in the field. And then I’m like, “Well, I wrote the book. I shouldn’t feel like an imposter anymore.” And there are lots of people that… I think this matters in a way because of us being humans, there’s something about being able to see people who look like me and talk like me who are already leaders in the field. And so, I can only imagine, with me having that kind of imposter feeling, how much harder it would be if you don’t even feel like you can see yourself being a data scientist or being a data science leader.
Ayodele Odubela: 17:05
Yeah, I think that everyone has that. I think this is common for a lot of areas in tech. But specifically, because of how data science has developed, everyone feels that imposter syndrome, and that we’re never fully up to date. I think part of it is being comfortable with not knowing everything. I think that was one of the biggest professional things I learned: so many times I’ll be like, “I don’t know, or it depends.” And that’s okay. And having leaders be like, “All right, well if you don’t know, what are the steps we can do to find out?” rather than assuming, why don’t you know this? That was a really hard obstacle to get over.
Jon Krohn: 17:48
Maybe our takeaway for the audience from this episode should be if you feel like you know what you’re doing in data science, you’re probably making a mistake.
Ayodele Odubela: 17:58
Pretty much. You’ll always feel a little bit out of the loop.
Jon Krohn: 18:03
Yeah, I couldn’t agree more… I think, and things seem to be… I think not things seem to be, things are definitely changing more rapidly all the time. We have way more people working in the field than ever before, across academia, and in industry. And so, there’s an impossible number of things to keep on top of and every time you specialize a little bit more in one thing, you’re losing specialization in something else.
Ayodele Odubela: 18:31
Yeah, I think it is also difficult. I remember just a few years ago trying to decide what specialization I should look at. Should I be the NLP person? Am I the computer vision person? But like you mentioned, with all of the research, even if you are specialized, it’s really hard to stay on top of all the new papers and frameworks. So, it’s okay not being on top of it all, and that’s why there are podcasts like this and a lot of other resources out there for you to learn from.
Jon Krohn: 19:06
A podcast like this to make sure that you feel like you can’t possibly be on top of it, and that it’s okay to feel like an imposter. That’s what we’re trying to… That’s what we’re educating you on in this podcast. Now, that makes a lot of sense. We had Deblina Bhattacharjee on a recent episode. And she’s pursuing a PhD in computer vision. And she’s doing all kinds of various applied cool projects. And she spends so much time reading computer vision arXiv papers to stay on top of just that one space. And she has the luxury of being able to do that as basically a job while pursuing a PhD. She can spend 10, 15, maybe 20 hours a week just reading papers and staying on top of things. And when I heard her say that, I was like, “Oh man, I probably don’t know anything anymore.” Are you still using neural networks in computer vision?
Ayodele Odubela: 19:57
Right. Probably not at this point.
Jon Krohn: 20:02
So, that is a perfect… So, this topic that we have been talking about, about feeling comfortable and getting going in data science, it gives us a perfect segue to talking about your new book. So, on top of having a full-time job, and founding a data science community, you also have recently published a book. And so, the book is called Getting Started In Data Science. So, what do you cover in that book? Do you have a chapter on imposter syndrome?
Ayodele Odubela: 20:32
I don’t, but I have a chapter that covers it. I’m really excited about this because this was the culmination of what I learned in industry. There’s a chapter on career insight, and it goes into all of the things I wish grad school had taught me. So, learning how to clean messy data over and over again as an iterative process, and having a little bit higher standards as far as what good data cleaning is. But aside from that, that chapter also really just covers the ways that you can find yourself at an advantage, especially for people who have non-traditional backgrounds. I would dare to say data science is one of the harder fields to enter, because I think while there isn’t a perfect path, there is an expected path that a lot of organizations are looking for when they’re hiring.
Ayodele Odubela: 21:35
So, I talk about marketing yourself because it’s nice that I come from a marketing background. But I know plenty of people in this space who aren’t as forthcoming with their wins and are missing out on a lot of public attention for the impactful things they do. But more generally speaking, the book goes into how you actually analyze data. That was one of the harder steps for me initially in that I would run some statistics and then not really know what to do next. I’m like, “Okay, I have these, what kind of models do I build?” So I think it really starts there, slowly introducing statistical concepts, and then trying to put all of these things together: okay, you’ve got your data, you have an understanding of methods, what next? What does that feature engineering and pre-processing step look like?
Ayodele Odubela: 22:32
I cover the modeling step as well as really having these frank and difficult conversations about ethics and bias in machine learning. I can say that I graduated with a degree in data science without ever really understanding the depth of these issues. And I think that is really what has driven me to focus on this in the vast majority of my career, because we have, unfortunately, so many people who are naive, who go through a grad school program where they don’t mention it much, or don’t mention the things that you actually need to do as an engineer. So, I wanted to talk about how you really make these decisions, how you go through and scope projects and try to measure what potential harm you can cause.
Ayodele Odubela: 23:27
So, I bring this up early, so that people also understand there’s more to what we’re doing than building a widget. We are in a really unique position, especially because ML is used in every single industry now. We’re in a really unique position and have a lot of power over how these technologies roll out and how they impact people. So, I wanted people to know early on they’re doing really important work. And that does come with a certain level of responsibility. And without exposure and awareness, it’s hard, five years from now, to give you a talk on ethics and have you actually take away something tangible that you can impact or change at your job.
Ayodele Odubela: 24:12
I wanted to cover that from the perspective of: I have been there and I have probably not always made the best decisions. However, I have found frameworks that work when talking to stakeholders, talking about abandoning specific projects, talking about alternatives to the kinds of models they ask for, like predicting gender based off of names. So, I hope to outline some methods that individuals can use within orgs to push for a little bit more reproducibility and transparency in their work.
Jon Krohn: 24:53
That sounds like such a valuable book. You’ve really covered a wide gamut of topics that people really should know as they’re getting started in data science. It sounds like it ranges from just feeling comfortable in the job to the actual brass tacks of doing the data engineering, and then also frameworks for talking about project management and dealing with the social issues associated with this occupation. I think that that sounds like a brilliant book. Yeah.
Ayodele Odubela: 25:31
I’m excited. It’s something I wish I had earlier on. I think that’s what really inspired it was, what would I have told me before I decided to sign up and go to grad school for this thing?
Jon Krohn: 25:47
Yeah, and it’s so nice that you took the time to do that. I can only imagine how busy your week must be. And it’s fabulous that so many different things you’re taking on are to help out people in data science. So the book, the FullyConnected community, you’re building bridges for people to make the most of this career and get into it in the first place. So, yeah.
Ayodele Odubela: 26:14
Attempting to make the road a little less rocky. I will be the first to admit it was not an easy path. And I know that it’s not for most, but if I can help make it a little bit less hard to learn some of the lessons I learned then awesome.
Jon Krohn: 26:31
Lovely. And so, you were talking near the end of your explanation of Getting Started In Data Science, your first book… it’s your first book, right?
Ayodele Odubela: 26:40
Yes, yeah.
Jon Krohn: 26:43
With how much you’ve done, I wouldn’t be surprised if you have more books up your sleeve. Well, I guess in the metaphorical sense you do have one book up your sleeve, and it is related to that topic of bias specifically. And so, this is going to be published by Wiley. And all I know about it is the title. I know that it’s called Uncovering Bias In Machine Learning. So, what’s happening? Is it kind of a continuation of topics that you began talking about in Getting Started In Data Science?
Ayodele Odubela: 27:13
A little bit. So, I really just hit the tip of the iceberg in my first book. This one really dives in, specifically for practitioners. It’s part social lessons, so part social history of computing and technology, and part frameworks and tools to actually, A, build more interpretable models, and, B, try to change the culture in an organization. So, I’ve done a lot of work now with different companies trying to audit models or trying to have a better understanding of whether they have disparate outcomes with their models. And I think so much of that work has come back to their organizational structures. And it’s beyond just having a couple engineers make whitebox or interpretable models. I’m really excited in that I’m hoping to give engineers both a little bit of history, so that there is an understanding of why this is taken with more severity than a lot of other areas of tech.
Ayodele Odubela: 28:27
And then go into: here are tangible things you can do. Here are code examples of how you can alter models. Here are code examples of how you can calculate fairness. But then also mention that so much of this struggle in trying to get to organizational accountability and transparency doesn’t just come from individual data scientists or ML engineers. So I hope to give people frameworks to talk to stakeholders, to have these difficult conversations and have internal, specific ethical approaches. There are multiple approaches that you can take for ethics. There’s the rights approach. There’s the utilitarian approach. Within all of these things, it’s really rare that organizations have a standard of: this is what we go by. These are the levels of acceptable risk. These are the harm thresholds we can tolerate when we’re talking about how our products interact with humans.
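(As a rough illustration of the kind of fairness calculation Ayodele mentions, the sketch below computes a statistical parity difference, the gap in positive-prediction rates between two groups, on toy data. The column names, group labels, and the 0.1 review threshold are assumptions for the example only, not material from her book.)

```python
import pandas as pd

# Toy model outputs with a protected attribute (hypothetical column names and values)
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B"],
    "prediction": [1, 0, 1, 0, 0, 1, 0],  # 1 = favourable outcome, e.g. approved
})

# Positive-prediction rate per group
rates = df.groupby("group")["prediction"].mean()

# Statistical (demographic) parity difference: gap between the two groups' rates
parity_difference = rates["A"] - rates["B"]
print(rates)
print(f"Statistical parity difference (A - B): {parity_difference:.2f}")

# An assumed rule of thumb for the example: flag for review if the gap exceeds 0.1
if abs(parity_difference) > 0.1:
    print("Warning: favourable outcomes are not evenly distributed across groups.")
```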
Ayodele Odubela: 29:32
So, the book is going to talk to engineers about frameworks for speaking up to stakeholders. Here are factual things that you can use as data points for getting your point across, as well as for deciding whether you should abandon a project. But I think one of the biggest aspects, if you take anything away, is that I really push to actually collect data on protected classes in a consent-first way. And I push so hard for this because so many organizations assume that they’re being fair simply by not asking for this data and not using it as features to build models. But it makes it really, really difficult, from an auditing standpoint, to actually calculate fairness if we don’t know what those classes are, if we have no idea of what that spread is.
Ayodele Odubela: 30:29
I hope to give a little bit of inspiration in that we are headed in the right direction. But this kind of utopian fair AI world we kind of want to be in takes so much more work than just individual coders. It really takes organizational change, and how do we get those above us, especially in orgs to accept that change when it does come at a loss of either profit or specific goals that they had organizationally?
Jon Krohn: 31:03
I can’t wait for that book to come out. I feel like there’s so much in there that I need to digest. You touched on the idea that things may be changing. The impression that I get, and I would love to have your opinion on this, is that data scientists and people working in tech companies that use a lot of data are starting to at least have a conversation about these kinds of things, about issues around fairness. And so it seems like some of these organizations could be doing more to actually take steps as opposed to just talking about it.
Jon Krohn: 31:48
But one specific example that comes to mind for me is that at NeurIPS, the Neural Information Processing Systems conference, perhaps the most prestigious machine learning conference, I’ve noticed that recently, for any paper that you submit, you have to provide some information on the social impact of the model that you’ve devised. And they also have whole sections of the conference now dedicated to tools for explainable AI, or evaluating fairness, those kinds of things. So, it seems like some people are taking real steps. But yeah, what’s your perspective?
Ayodele Odubela: 32:29
Yeah, I think we are still in the early stages of awareness, kind of, like you mentioned. So we are starting to have more conversations around it. I think the biggest area that we can improve, and really, the one that needs the most improvement is really transparency. So I think there’s a lot of talk about having an appeals process for algorithmic decisions, or having some kind of remediability. So you’re denied for housing, or you’re denied for a job. Yes, that has a drastic impact on your life. So what are we doing as organizations to make up for that? What are we doing to fix the damage that’s already been caused?
Ayodele Odubela: 33:16
We have looked forward a lot. But I think in order for us to have algorithmic appeals or remediability, we can’t do either of those things without transparency. So part of this, I think, will end up coming from regulation. So at the bare minimum like a GDPR in the US. But going beyond that, if we don’t have… If we’re not transparent with our users about, okay, we build X, Y, and Z models to predict this kind of activity about you using data from this segment of users. If we’re not able to do at least that we’ll never get to a point where a user says, “Hey, I think I was mistakenly assigned some class value or I was segmented into a specific group.”
Ayodele Odubela: 34:06
It’s really hard to have any kind of appeals process because we have no clue about what algorithms are predicting things about us, whether it’s clicking on an ad, or going to a hospital and dealing with patient triage. Because we don’t know, and it’s pretty much all opaque, there’s no real way for user groups, and not just journalists, to write about or surface these issues. It’s next to impossible. And I bring up medical systems a lot because they are truly life and death. But if you are, let’s say, denied a certain kind of treatment because it was an algorithmic decision, it’s almost impossible right now to say, “Hey, I think maybe this wasn’t the right decision.” So, transparency really opens the door for algorithmic appeals, for being able to basically financially compensate users if they are negatively impacted, or if specific groups are negatively impacted within all of your users. So, I would say transparency is key.
Jon Krohn: 35:19
Yeah, it’s interesting. So, it seems like, yeah, step one is awareness that there is a problem. But to even begin to fix these problems, we need to know how our data is being used, or at least have some idea. And to me at this stage in 2021, I can’t even really begin to imagine what it would be like, when I’m being served an ad, to be able to probe and see why I was served this ad. So yeah, you’re right. We are in very early stages.
Ayodele Odubela: 35:51
I think, surprisingly so, because of the lack of regulation in tech in general. So obviously, machine learning has been really popular for tech companies and has trickled outwards into all of these other industries. But as we started to adopt machine learning in policing, and healthcare, and all of these other industries, I think we had two major problems. One, they weren’t really prepared for how the social issues were going to impact real people. I think it’s now three arrests we’ve seen based off of bad facial recognition algorithms that have just misidentified people.
Jon Krohn: 36:37
I mean, I’m sure it’s… Yeah, three that are probably widely reported. But I’m sure there’s tons of this happening out there.
Ayodele Odubela: 36:47
Exactly. And it’s really difficult to stop because organizations using these tools, organizations like police departments, don’t really know what’s going on in them either. So it’s hard for them to say technically what’s happening. We have made it incredibly opaque because we see machine learning as part of our tech IP. But when we’re in healthcare, or if we are working with local governments, I think it goes far beyond just, oh, our little user base of people who consented to use our product. And I think we do just have to be more transparent. And I’m with you, I don’t know what the world looks like if I get an ad that says, “Hey, we think you’re a power user, so you got this email.” I can’t imagine what that would be like. But when we start getting into, we predicted that you would need less healthcare because of X, Y, and Z reasons, then it’s like, “Okay, no, I want to know why these decisions are being made.”
Jon Krohn: 37:55
Yeah, it goes to show all the progress that needs to be made. Yeah, we’re nowhere near having real transparency. And it’s going to be a critical step to having algorithms that are truly fair. So, the example that you were just giving around enforcement agencies using tools to identify people. So by far the biggest company in the space, I think it’s safe to say by far, certainly the one that’s most publicized is Clearview. And I think it was last week at the time of filming. So we’re filming in early February. And I think it was just last week that the Canadian federal government ruled that Clearview is not legal for any of their enforcement agencies at any level to use.
Ayodele Odubela: 38:42
Yeah. Pretty, pretty big. To me, I think we’re riding these small waves of regulation suggestions that will get us almost to this apex of: we have to reevaluate all of it. And like you mentioned, we are nowhere near close to this transparency for a lot of reasons, but most tech companies having less regulation in general I think is a big part of that. We also saw, I believe in June of last year, that the ACM recommended that all governments and businesses stop using facial recognition for identification, citing catastrophic impacts. And so, since last June, how many online courses have been amended? Because I think that’s where it starts. I’m unfortunately not on the policy side of things. But-
Jon Krohn: 39:52
Not yet. I have a feeling that you’ll be the next time you’re on the show [crosstalk 00:39:56] work that you’ve been doing.
Ayodele Odubela: 40:01
Right. But I think it starts with educating people who want to do computer vision. What if you take 12 computer vision courses and you never hear about this problem? So, I’m looking to academic institutions. I’m like, “Okay, you have a course that’s coming out after this.” I hate to say the ACM is the most respected, but there are few orgs like that, like the ACM and IEEE, that really have this kind of impact over our industry. If they’re saying, “This is such a bad thing, we shouldn’t be using it,” we should start teaching that. We should start teaching how we get to fairness, how we get to statistical parity in our dataset samples. So, there’s so much work to be done because we start seeing this.
Ayodele Odubela: 40:46
All right, the ACM is not so happy with facial recognition. And then the Canadian government says, “Clearview as a whole isn’t something our law enforcement should use.” We have to, I think, get to the root cause of why this is a problem. And part of it is datasets. But the other problem is it’s hard for people who don’t experience certain marginalizations to understand that these are issues to begin with. And I’ve recently discovered that so much of the work, specifically in equitable machine learning, trying to get to equal outcomes, is really a lot of diversity and inclusion work. Because I’ve been on several data science teams, and I’ve been the squeaky wheel on a lot of issues on data science teams. And I by no means am a subject matter expert on everything. I even have to catch myself. There are a lot of disability issues I don’t understand. There are a lot of unchecked biases that I have that I’ve had to reevaluate in how I build models, if I am just assuming that binary gender prediction is the norm.
Ayodele Odubela: 42:07
I think we have a lot of those challenges to think about that I didn’t realize were, I guess, issues for a long time as well. So, there’s movement in the right direction. But I think we will just end up seeing more regulation because it’s hard to gauge yourself. If I were to look back at my career I’d say, “Yeah, of course, I’ve been ethical.” I’ve tried to make every correct ethical decision. But it’s so hard to do that when we are the same people we’re trying to judge. I think a lot of it has to come from outside sources.
Jon Krohn: 42:45
Yeah, so I think I agree, or not I think I agree, I definitely agree that there will be more regulation, that things like GDPR will become more common in the US. It would be nice, in fact, if there were an international standard that everyone had to comply with. So right now in the US we have specific legislation in California and in Maine. There’s specific Canadian legislation. There’s GDPR legislation from Europe. And so, we have someone on our legal and operations team who has to keep on top of all of these various pieces of legislation that are not aligned in any way. So yeah, it’d be nice if in the future we could have federal administrations working together and having one set of policies.
Ayodele Odubela: 43:37
I think as we get more international with really accepting so much remote work, accepting being in different time zones. I think we will start to see governments pushing in that direction, especially companies who are operating all over the world. It’s hard to only have ethical standards in some places and not others.
Jon Krohn: 44:02
Yep, that makes a lot of sense, and these issues are surmountable, but addressing them eats into profit margins, as you mentioned earlier. So if you’re Google and you’re regularly indexing all of the internet, then of course that comes to mind when you think to yourself, “Oh, I want to build a facial recognition algorithm. What should I use?” Well, you already have all this data on hand. And I’ve given this example on a recent podcast episode already, I can’t remember which one it was, but as a contrast to that approach, Apple, when they were designing facial recognition for their iPhones to automatically unlock them when you look at them…
Jon Krohn: 44:43
You’re nodding your head. I think you already know this. But maybe the audience members don’t know that they hired actors from a broad range of demographic backgrounds and facial appearances to ensure that their phones worked well for people of all different kinds of groups. And that is something that’s… Yeah, I mean, I think I gave the same example before, but it is a huge gap that there was a Google Photos classification algorithm that was classifying darker-skinned people as gorillas. And instead of being able to fix that in the model, they just made it so that’s not a possible output. So, that’s not a real solution.
Ayodele Odubela: 45:26
It’s funny that you mentioned that because I literally mentioned it in the keynote I gave yesterday. Removing gorillas as an output by no means fixes this situation. It doesn’t stop similar things from coming up, and it doesn’t get to the root of the problem. And I’m glad you brought up Apple with that because I think that is one of the places where we notice this insane amount of measurement bias. So many of the biases we end up seeing in machine learning are not necessarily the fault of the engineers or even the data itself. I’ve pushed myself and I’ve pushed my colleagues to understand that just because there is data, and it technically exists as ground truth, does not mean it tells the full story, especially when we’re talking about measurement devices.
Ayodele Odubela: 46:25
So one of my earlier roles in data science was really doing machine learning on sensor data, trying to recognize the difference between a firearm in someone’s hand or another object or a cell phone. The problem is that this comes from so many technical tools. Even Apple Watches have difficulty sensing correct and accurate heart rates for people with dark skin, as does pulse oximetry. [crosstalk 00:46:57]. It’s shocking. And I think one of… We’ll see. Maybe there’s a career for me down the line either in research or hardware. But one of the things I wanted to study is that we don’t currently have cameras that sense dark skin with the same accuracy that we sense light skin with. I think that is the crux of every single facial recognition problem. It has nothing to do with the personal beliefs of the people creating it. Our training data is based off of tools that are not equal already.
Ayodele Odubela: 47:35
I think having that understanding, and like, “Okay, so how do we make these tools equal?” I think there’s more work that we’ll see done on this hardware and product prototyping and testing side of things to get to equality on a technical level, and it will make it much easier to, okay, well, if I have datasets of the same size sample, and I have tools that got these images with the same level of accuracy for people in all groups then we can say we’re getting closer to fairer facial recognition.
Jon Krohn: 48:10
That’s something that I never thought about. I had never thought about hardware.
Ayodele Odubela: 48:16
I think that is half the battle: we have millions of cameras in our laptops. If we’re going off of Zoom data, or if we’re going off of these cameras, we have to assess what kind of disproportionate outcomes we’re seeing there. Are we able to get the same level of clarity even under constant lighting conditions?
Jon Krohn: 48:47
Yeah, so there is some clear room for opportunity here for people looking for real work to do in the world and to make a big impact on society: improving sensors so that they are effective for a broader range of people, and transparency, so that we can start making steps on the road to having fair algorithms. Because if you don’t know how you’re being treated, then, of course, it’s difficult for anybody to get feedback and improve these algorithms.
Jon Krohn: 49:18
And so, despite us being clearly not even 1% of the way on our journey to having accountable AI systems, fair AI systems, I do dream of a future where… I dream of a rosy future, and it’s because data and algorithms allow us to track and ensure that we do have fairness in our systems in a way that you never could when you’re asking a person, when you’re evaluating whether a decision was fair. So, when a judge is handing down a sentence there’s no way to get out of that black box really what was behind the length of that sentence. And so, it will be interesting to have algorithms involved in more parts of our lives in the future once algorithms are making fair decisions.
Ayodele Odubela: 50:21
Yeah, I think you brought up a really interesting point there, too. I think there’s a huge amount of work to be done. We talk about data validation when we’re building models, but more so critical data validation. So, let’s say we’re looking at housing data. How do we go look at historical data and decide if the human at the moment made the right decision? We can take a lot of the other data we have about housing and about population, and census data, and then try to work backwards and say, “Are we using this training data to build models that make identical decisions that were bad?” Or can we look at this data and say, “Should a different decision have been made?” I think there’s a lot of interesting research to be done there, especially when we’re talking about criminal justice, and housing, and medical data.
Jon Krohn: 51:20
Beautifully said, as with everything you’ve said in this episode. You articulate it all very clearly. Yeah, so things are definitely… We definitely don’t have fair algorithms today. But we could in the future some day. And yeah, that’s something to look forward to. So on the note of making progress, I mean, this is a bit of an orthogonal segue. I’m not segueing very well here. But on the note of things coming in the future, and opportunities for people to learn and be better, on top of your books, Getting Started In Data Science, and Uncovering Bias In ML, your forthcoming book, the second one there, which will help people with these specific fairness-related issues.
Jon Krohn: 52:14
You also do have another course, believe it or not, to the listeners out there, these aren’t the only projects. So on top of being a data science evangelist at Comet ML, being the founder of FullyConnected, having a book that just came out, Getting Started In Data Science, having the forthcoming book on bias in ML. On top of all of that, if you would like to check it out on LinkedIn Learning, you can get access to Ayodele’s brand new course, which is on Supervised Machine Learning. And yeah, so do you want to tell us a little bit about that course? And how does LinkedIn Learning work… Do I have access to that just by having a LinkedIn account or is it a special… Is there some kind of paid tier?
Ayodele Odubela: 53:04
Yeah, so if you have a LinkedIn Premium account you have access to the whole LinkedIn Learning library, but you are also able to purchase courses a la carte, so you could just buy one-off courses if you’re not interested in the whole subscription. But I’m really excited because this course is a really good hands-on course for people who have maybe been interested in data science, but haven’t really started building a ton of models yet. So, maybe you have heard of NumPy and pandas, but you’re not really exactly sure how to start building a neural network or a decision tree. This course is perfect for those folks. So, it is in Python. You don’t need to have a super strong knowledge of Python, but any past Python coding or just coding experience is helpful. And I’m excited because it really just gives you a taste of all of the… a couple of the different kinds of supervised learning models out there. So, yeah, I’m excited for it. It came out at the end of last month.
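(For listeners wondering what a first supervised model like the ones in such a course might look like, here is a generic scikit-learn decision tree sketch on a built-in dataset. It is an illustrative example under assumed settings, not material taken from the course itself.)

```python
# A minimal supervised-learning sketch: a decision tree classifier on a
# built-in dataset (generic example, not from the LinkedIn Learning course).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load features and labels, then hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit a small tree and evaluate it on the held-out data
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```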
Jon Krohn: 54:08
Nice, end of January. So, yeah, I think this episode will air in early March, and that will still be very much brand new. It sounds great. So other than, obviously, your book, Getting Started In Data Science, do you have any recommendations for our listeners on wonderful things that you can read?
Ayodele Odubela: 54:31
Yes. The book I’ve most recently been reading is called Race After Technology. It is a phenomenal book that does not initially seem like it’s a tech book. So, if you are looking for just a good read. The book is written by Ruha Benjamin, and she is a phenomenal storyteller. I really, really enjoyed it. But it’s also given me insight and random facts about how tech has developed and how there have been plenty of events that are in the history of machine learning that I didn’t even know about yet. So, definitely the nightstand book at the moment.
Jon Krohn: 55:15
Nice. That’s a cool recommendation. So, what is the central premise of the book? What’s [crosstalk 00:55:26]?
Ayodele Odubela: 55:28
It’s wonderful because she talks about how technology has been used in different racial spheres. So talking about surveillance in certain communities, as well as how it brings to light a lot of the issues I mentioned, like if you have dark skin it’s hard to be seen on webcams. But it also brings to light how these things impact people that we may not know about, if it doesn’t directly impact us. So things like having specific names, and it is a very general text, so not all machine learning focused. But there are lots of Asian people who have two-letter last names, and you may get a web form that says your name must be longer than two characters. So, there are-
Jon Krohn: 56:16
You just said that you… Why might you get a web form?
Ayodele Odubela: 56:22
Literally, that says, “Hey, you don’t have enough letters in your name.” But of course, the feeling as a user though will be like, “But that’s my name. What do you want me to add to it?” There’s always that kind of friction of what do I really do when this technical tool wasn’t necessarily made for me? So, I really enjoy that she brought to light a lot of those aspects.
Jon Krohn: 56:48
I am so glad that you brought up this book because I now wish after the conversations that we’ve been having earlier in this podcast that I had asked, “Where can I be learning today about these issues in technology?” And you answered that question that I should have asked. So, thank you.
Ayodele Odubela: 57:05
I answered it.
Jon Krohn: 57:06
Yeah, we talked about so many important issues today that while they’re starting to get some attention they don’t get nearly enough. Thank you so much for opening my eyes to many of them. And I’m sure many listeners. [crosstalk 00:57:22]. Yeah, I can’t wait to have you on the show again and see what amazing things you’ve started by that time.
Ayodele Odubela: 57:29
Thank you so much, Jon. It is an amazing time always.
Jon Krohn: 57:38
Greatly appreciate Ayodele joining us for this enlightening episode. She is so deeply knowledgeable and articulate about the biases in data and algorithms that permeate throughout our society. And even better, she has clear action items for anyone to follow to begin to fix this pervasive issue. As examples, at the top of the data funnel, we need to be designing sensors such as cameras and pulse oximeters that are as effective for dark-skinned people as for lighter-skinned people. We need to be transparent with users whenever an algorithm impacts them. We need to offer recourse and be accountable when an algorithm makes an unjust decision. And we can use data analytics to track human and algorithm behavior to ensure all demographic groups are receiving equal treatment. Super.
Jon Krohn: 58:28
Well, as always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show and the URLs for Ayodele’s LinkedIn and Twitter profiles at superdatascience.com/449. That’s www.superdatascience.com/449. Ayodele has a very active social media presence and a huge following, so I’m sure there’s plenty to be gained from connecting with her. If you enjoyed this episode, I would, of course, greatly appreciate it if you left a review on your favorite podcasting app or on YouTube. I also encourage you to tag me in a post on LinkedIn or Twitter, where my Twitter handle is @JonKrohnLearns, to let me know your thoughts on this episode. I’d love to respond to your comments or questions in public and get a conversation going with you. All right. It’s been a wonderful episode. Thank you for listening today. And looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.