Jon Krohn: 00:05
This is episode number 828 with Keith McCormick, data science principal at Further.
00:27
Welcome back to the Super Data Science Podcast. I am your host, Jon Krohn. Today, we’ve got the great Keith McCormick on the show. Keith is data science principal at the enterprise AI consultancy, Further. He’s the creator of dozens of LinkedIn Learning courses on machine learning and AI, with, in aggregate, over a million students. And he’s also the author of four books on statistics.
00:51
Today’s short episode should be of interest to just about any listener. In it, Keith details common circumstances where low-code/no-code data science tools are the best option for you even if you are a coding whiz. He discusses whether citizen data scientists are myth or reality and how AutoML fits into the data science workflow and why it won’t replace data science teams. All right, you ready? Let’s jump right into our conversation.
01:16
Keith, welcome back to the Super Data Science Podcast. I’m delighted to have you here today for this quick episode. We’re just going to cover one specific topic, we’ll get into that in a moment. But first, how are you doing? Where are you today?
Keith McCormick: 01:29
I’m doing great. And thank you for the return visit, I always enjoy it. I’m pleased to report that I’m home, so I’m in Raleigh-Durham, North Carolina.
Jon Krohn: 01:39
Nice. And recently, at the time of recording, there had been some pretty big weather events, but not too close to you.
Keith McCormick: 01:46
Yeah, Western North Carolina, and pretty severe, yeah, but didn’t affect us here.
Jon Krohn: 01:53
Fortunate. Well, so your most recent appearance on the show was episode number 655. We had a great episode on getting a return on investment, an ROI on AI, so AI ROI, that sounds pretty fun. And so, people who haven’t listened to that episode yet and who really enjoy this shorter one with you can refer back to that great episode to get a sense of how you can be, not even a sense, to get a clear picture of what you can do to get a return on investment in AI projects, which are famously difficult to land. So really valuable episode there. And Keith, since that episode, there have been some big changes. So you are a principal data scientist? Is that the title?
Keith McCormick: 02:44
Yeah, so the basic idea there is that I’m a senior individual contributor. I think that’s what they’re going for with that title. I’ve got my Further mug here. I’m with Further. Some folks that listen to the podcast might have heard of my colleague, Cal Al-Dhubaib. He ran a small consultancy, was the founder of Pandata. And Pandata was acquired a few months ago, so we all made the move over to Further, and it’s been very rewarding.
Jon Krohn: 03:12
Excellent. Yes, that is nice when that happens. And congrats to you and to Cal for that happening. So that’s a big update since that last episode back in February 2023 that you were on.
03:23
And today specifically, we are here to discuss something that you heard. You’re a regular listener to the show, which I love. I get lots of interesting insights from you when we speak about things that you’ve heard on the show. And something recently that really stuck with you was in episode number 811 with Nick Elprin, who’s the CEO of Domino Data Lab.
03:48
And I said to him, “I see the same thing we’ve talked about on the air before with other companies where there’s this idea of the citizen data scientist.” And I asked Nick, “Does the citizen data scientist really exist? Have you ever met one?” And Nick’s cutting response was, “Is this a rhetorical question?”
04:09
So yeah. So there is value, I have learned since. There is a lot of value to low-code/no-code tools, so I wanted to open the floor to you today because you have used low-code/no-code tools a lot, and you have created courses on low-code/no-code tools. So fill us in on the value both for data scientists as well as this, I don’t know, maybe mythical citizen data scientist.
Keith McCormick: 04:39
Yeah. And I don’t think it’s necessarily a huge controversy. I think that, possibly, you two guys were conflating two different issues, and that’s what I thought was interesting, which is low-code/no-code on the one hand and citizen data scientist on the other. And then, possibly, there’s even a third topic that gets mixed in there, AutoML.
04:59
So briefly, my history with this is I started using low-code/no-code in the ’90s. I’m not elderly, certainly, but career-wise, I’ve been doing this for a while. Folks might not remember that that was before Python’s popularity as a machine learning tool; it existed as a general-purpose programming language. And before R really had its peak in popularity.
05:25
So when I started doing data science and machine learning, the two big players in the marketplace were SPSS and SAS and they both had a low-code/no-code tool. In the case of SPSS, it was Clementine, later changed its name to SPSS Modeler and then IBM SPSS Modeler. And SAS had Enterprise Miner.
05:45
So I grew up on those tools. So for me, it’s not something that only citizen data scientists use, it’s something that an expert data scientist can use to ease collaboration with other teams. So for instance, I’ve been on calls live with a subject matter expert and been live working in the tool while they watch me and we try different things. And it’s not impossible to code Python live, but I’ve never seen anybody who’s quite as fast. So you can’t be as spontaneous in that way. And then, when we get to deployment and MLOps, there are some different issues that arise, and some of these tools perform better in that place than others, but in terms of developing the solution and prototyping, it’s what I’ve always done, it’s what I’m most comfortable with.
Jon Krohn: 06:32
That’s cool. So you would say you’re most comfortable with low-code/no-code tools?
Keith McCormick: 06:37
Oh yeah, for sure. Again, that’s how I started. Now, to be clear, at Further, I’m a consultant there, but we’re basically a Python shop at Further. But when I was listening to some of the projects that you guys were talking about on the episode, which was very interesting, by the way, I really enjoyed the episode. I’ve done similar kinds of things in low-code/no-code tools over the years.
07:00
So again, sometimes the suggestion is made that it’s only for citizen data scientists, so it’s very limited, but I’ve done insurance fraud projects, and my colleague and I are pretty sure that was an eight-figure ROI project, in a low-code/no-code tool. Right?
Jon Krohn: 07:17
Wow.
Keith McCormick: 07:17
There was some data engineering stuff going on too, but that’s where the model was built. There was this really interesting thing, and I can mention it because it was in the New York Times. Usually, with client situations, you have to be careful, but this was in the newspaper. There was a big hospital in Manhattan where someone stole over a million dollars by ordering too much copy toner and then selling it on the black market, that was like a million dollars. So like many projects, it gets inspired by something like that. That was done in the IBM SPSS Modeler product. So dozens of them over the years all designed in low-code/no-code tools.
Jon Krohn: 08:03
Is that your favorite low-code/no-code tool, the IBM SPSS Modeler?
Keith McCormick: 08:06
IBM bought SPSS in 2009 and it kind of sent that community in a different direction. The real limitation is, as you know, I do workshops and things like that, and it’s really hard to get a trial license of that, so I started to drift away from it. I’m having my coffee in my Further mug, but on the same desk I have my KNIME mug. I use KNIME a fair amount because there’s a commercial version when organizations scale up to it, but it’s basically open source. So I use it in workshops, I use it to manipulate data myself. So again, my colleagues at Further use Python, but I’ve been using KNIME for about 10 years now.
Jon Krohn: 08:47
Nice. Yeah, KNIME. And for people who are listening who haven’t heard of that before, it starts kind of like the word knife, so it’s K-N-I-M-E, KNIME.
Keith McCormick: 08:59
Yeah, Konstanz Germany is the K.
Jon Krohn: 09:01
Oh, Konstanz.
Keith McCormick: 09:03
Oh, and actually, Jon, something you might not know about KNIME that I think you’re going to find very interesting, the earliest users of KNIME were genomicists.
Jon Krohn: 09:12
Oh, really?
Keith McCormick: 09:13
Yeah, it started in bioinformatics. I don’t know all the details, I mean, I’ve met the CEO, Michael, and I’ve heard little bits and pieces of the story, but it really started out with serious computational science.
Jon Krohn: 09:25
Cool. Well, actually, we’ve been trying to line up someone from KNIME as a guest, so we may have a KNIME episode coming up in the not too distant future. We will see. But yeah, okay. So we’ve talked now about low-code/no-code. I can see how it’s valuable to you as a serious data scientist and consultant, using low-code/no-code tools on eight figure projects. Very cool. What do you think about this citizen data scientist thing? Do you think that people who aren’t technical, who couldn’t otherwise maybe write Python code and find it faster to use low-code/no-code tools like you do? Do you think that they’re out there? Do you think that it makes sense?
Keith McCormick: 10:05
It’s complicated and I’m looking forward to, there’s a book that’s coming out, I think it’s already out in ebook, but it’s Ian Barkin and Tom Davenport. And of course, virtually everybody that listens to this podcast will recognize Tom’s name. The book is coming out I think next week on the 15th or on the 22nd, but they’re going to talk about the whole citizen movement and they’re going to talk about citizen data science, but also citizen developer.
10:27
I’ve always been a little bit of a skeptic. For me, if we just called it Citizen BI, and you guys hinted at this when you were talking about this in the podcast-
Jon Krohn: 10:36
When Nick and I-
Keith McCormick: 10:38
Yeah. Yeah, exactly.
Jon Krohn: 10:39
In episode 811?
Keith McCormick: 10:39
In the same episode, you were hinting at that. Because for me, what can happen is someone who’s a subject matter expert, some of the subject matter experts you were mentioning, it could be accounting, it could be in marketing, I think they can spot trends in the data and then bring that to the attention of the data science team. But I think you probably need the data science team and some MLOps chops to bring that into production. So I think they can spot the solution.
11:05
I’ve never quite understood how they can take it all the way to production on their own, because then you’ve got everybody in the organization building their own models, and you really need some data governance there. So my issue with it is how does the data governance work? So I’m actually going to interview Ian and Tom after the book comes out, and I want to ask them about that, how you handle the data governance aspect.
11:28
But having said that, if you’ve got a subject matter expert and you’ve got an expert data scientist, they can collaborate more easily in a low-code/no-code tool. For instance, I don’t do as much training as I used to, but I’ve been doing a training with a really big bank, and they have literally thousands of individuals using KNIME in the finance team, thousands. And the reason that they’re doing it is they want to get them out of doing ETL in Excel because it leaves no paper trail and it makes a big mess. Right?
11:58
Excel has its strength, certainly, I’m not here to slam Excel, I’m just saying that it’s not really an ETL tool. So they’re trying to train them enough to do the ETL. Now, their colleagues on another team might then do data science with that.
12:11
That’s what I’ve always wanted organizations to move towards. In all these years, almost 30 now, I’ve never seen anybody do it, but if you had a collaboration between a BI tool and the data science team, it would speed things along. What can sometimes happen with code is the data scientists kind of become a priesthood and it’s all very secretive and people don’t know what they’re doing and there’s less communication. So for me low-code/no-code is about good communication, it’s not about democratizing deep learning to someone that doesn’t know bias and variance.
Jon Krohn: 12:47
Cool. All right, thank you for that perspective. Another related term that I’d love to cover here quickly with you, you mentioned that you had recently been re-listening to episode number 627 with Erin LeDell on this show. And Erin, at the time, had been at H2O.ai for a very long time. And H2O are leaders in AutoML. So how does AutoML, automated machine learning, fit into this picture of low-code/no-code? Does it relate in some ways or?
Keith McCormick: 13:17
I think it absolutely does, because again, I think that experts can use AutoML. In fact, in that episode with her, you talk about that and how it can speed things along, but there’s certain steps in the machine learning lifecycle that I don’t think AutoML does terribly well, right? So you really need an expert data scientist there.
13:39
So again, I think, probably it’s a marketing thing. We’ve conflated all three of these with the idea that if you have a low-code/no-code tool, now you don’t have to learn programming, and you have AutoML, so now you don’t have to learn modeling. It’s just automated, now everybody in the organization, literally, everybody can be building their own models. I just don’t see how that works from a governance standpoint. But in the hands of an expert, AutoML is a workforce multiplier, just like the best of gen AI, right, done well, it speeds things, it saves you a few minutes here, a few minutes there. Next thing you know, you’re working twice as fast. AutoML really should be the same.
Jon Krohn: 14:22
Cool. Yeah, great perspective there. Crystal clear, Keith, as usual. And appreciate you keeping it concise for this Friday episode. I know you could go on, you write whole books on these subjects, and indeed you have created dozens of courses on these subjects. So do you want to fill us in on some of the most exciting LinkedIn Learning courses that you have released that are relevant to the topics covered in this episode?
Keith McCormick: 14:47
Well, I have quite a few now. So the LinkedIn folks expect me to keep them up to date, which, of course, is flattering, but it’s also a lot of work. So I don’t have new ones come out as often as I’d like, but I did just have a new one come out on problem identification and solution design. So it’s basically the first phase of the machine learning lifecycle. And I basically share, virtually word for word, what I would ask a client to clarify their business problem. And I even say, because I’ve been doing this long enough, what they might likely say in response, because there’s often a language breakdown between the data scientist and the business. So I’m really proud of that one.
15:31
And actually, I have one on AutoML too that isn’t a step-by-step of using AutoML, but which phases it does well, which phases it does poorly and what the implications are for team composition. Who do you need on the team if you have this tool? And the quick answer is you don’t fire the team, you still need the team, you just change the way they work.
Jon Krohn: 15:53
I think also, something in that AutoML course is about explaining the parts that can’t be automated well to management in your company.
Keith McCormick: 16:01
Oh yeah, for sure. Yeah. No, I think that’s really important, especially, one of the things I think that’s going on now that makes that important is organizations are spending so much on their cloud platform investments that I’m starting to see, I’ve been noticing it for the last couple of years, there’s a lot of pressure to live within that ecosystem, which means that the data science team might be somewhat arm twisted into all using the same tool and there just might not be enough understanding about what those tools can and can’t do and when you need the experts.
Jon Krohn: 16:37
Nice. All right. And so, to wrap things up here on the note of your LinkedIn Learning courses, you have very kindly, for the second year in a row now, not only are you a listener to the show, Keith, you literally support the show with sponsorship. And so, there’s now the second year in a row running of this #SDSKeith hashtag campaign on LinkedIn. So if people follow the hashtag #SDSKeith, like Super Data Science Keith, on LinkedIn, then for the past couple of Tuesdays as well as the next few Tuesdays, you are giving away LinkedIn Learning courses for free. So tell us about how that promotion works.
Keith McCormick: 17:17
Yeah. So if they haven’t clicked on the link, they can go a couple of weeks before, so every Tuesday, for a little while. So the Problem ID course was, of course, I’m not quite sure where on the calendar this lands, but the first two weeks we did the Problem ID course. And the second two weeks, I think we’ll do the AutoML course. And then, we’ve got another course after that. So we’ll do maybe Human in the Loop, which I think is really interesting.
Jon Krohn: 17:46
Nice.
Keith McCormick: 17:46
Again, typically, my courses are more on the strategy side, so what the heck is human in the loop? How would you manage a human in the loop data annotation project in-house if you had to? And what does management have to know about this phrase and what it means?
Jon Krohn: 18:03
Sweet. Yeah, so free courses for our listeners by following that #SDSKeith hashtag. You get access to these three great courses. And yeah, really appreciate you partnering with us on this, Keith. And hopefully, our listeners enjoy getting these great kind of more management-oriented insights from you in these highly professional courses that leverage your many years of expertise. And yeah, so LinkedIn, I guess is the best place to follow you, Keith, in general?
Keith McCormick: 18:32
Yeah, I think so. Yeah, absolutely. And I think at least while we’re doing this, my LinkedIn is in the show notes. I’m fortunate, I’m just Keith McCormick. You don’t need to know any numbers or letters after my name because I guess I joined LinkedIn early enough to be the first Keith McCormick or something like that.
Jon Krohn: 18:48
Nice. All right, Keith, well, thank you for taking the time today. This was a great little episode and I’m sure we’ll have you on the show again sometime soon.
Keith McCormick: 18:56
That would be fantastic. I would enjoy it. Thanks, Jon.
Jon Krohn: 18:59
Thanks to Keith for that informative episode. In it, he covered how low-code/no-code tools like KNIME are valuable for both expert data scientists and subject matter experts alike, enabling faster prototyping and more effective collaboration. He also talked about how AutoML is best viewed as a workforce multiplier for expert data scientists, not a replacement for expertise in the full machine learning lifecycle.
19:22
To be sure not to miss any of our exciting episodes, be sure to subscribe on whatever platform you’re listening to, but most importantly, I just hope you’ll keep on listening. Until next time, keep on rocking it out there. And I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon.