Jon Krohn: 00:00:00
This is episode number 489 with Vin Vashishta, founder and chief data scientist at V Squared.
Jon Krohn: 00:00:12
Welcome to the SuperDataScience podcast. My name is Jon Krohn, a chief data scientist and bestselling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple.
Jon Krohn: 00:00:42
Welcome back to the SuperDataScience podcast. We are lucky to have Vin Vashishta, a wise, concise, and rich communicator on the show today. Vin is founder of and chief data scientist at V Squared, his own consulting practice that specializes in monetizing machine learning by helping Fortune 100 companies with AI strategy. He’s also the creator of several platforms, including The ML Rebellion for learning about critical skill gaps related to artificial intelligence such as commercial strategy, data science leadership, and model explainability.
Jon Krohn: 00:01:21
In today’s episode, Vin fills us in on his approach to AI strategy and why it’s critical to the long-term success of both data science teams and machine learning companies. We then move on to answering questions from fans of the SuperDataScience show, including on efficiency gains from no-code or low-code machine learning tools, the biggest skills gaps that data scientists have, the most disturbing data sets, investing in socially beneficial models, and the most challenging problem with commercializing AI. Today’s episode will be of interest to a wide range of listeners from technical specialists like data scientists right through to commercial specialists like business managers. All right. You ready for another awesome episode? Let’s go.
Jon Krohn: 00:02:14
Vin, welcome to SuperDataScience. I’m so excited to have you on. You’re a legend in the industry. I’m really grateful to have you here. Where in the world are you calling in from?
Vin Vashishta: 00:02:25
I am in Reno, Nevada. Sunny, over 100 degrees right now. Reno, Nevada.
Jon Krohn: 00:02:31
Reno, Nevada. It’s close to Las Vegas, right?
Vin Vashishta: 00:02:35
No, it’s almost half or a third of the U.S. away from Las Vegas. It is incredible how far away we are. So if you draw a straight line from Los Angeles, you can go east, you can get to Las Vegas. You draw a straight line from San Francisco east, you get to Reno, Nevada. And that’s how far away we are.
Jon Krohn: 00:02:58
Man. I don’t, that was a bad misconception.
Vin Vashishta: 00:03:01
But everyone thinks that. If you’ve never been here, you don’t understand that there’s really three parts of the state. You have the crazy hot desert, which is kind of the Middle and Eastern. You have the Sierras, which is where I live. And then you have Vegas way Southern, which is also crazy hot desert, but there is a major city there.
Jon Krohn: 00:03:24
Right. All right, well that’s really useful to know. How has the pandemic been lately there? Are things opening up? I guess you have a lot more space than some other places.
Vin Vashishta: 00:03:40
Well, it got bad. I mean, we had a lot of cases. We’ve been fortunate, our deaths were bad, but obviously better than other cities fared. We’re one of the middle of the road states. So we’ve been fairly open the entire time. People here have been responsible for the most part, which is unusual. We’re one of those states that’s really, we are in this odd place of middle of the road territory where most people take some sort of personal responsibility and accountability. And we do pretty well. I mean, we definitely have idiots all over the place, but they stayed home thankfully during the pandemic.
Jon Krohn: 00:04:20
Nice. I think you can find idiots anywhere, but true. Nice to know that they mostly stayed home. So we’re having you on the program for a lot of different reasons. We have a whole bunch of audience questions coming up. So I posted that you were going to be on the show a week before filming. And we had tons of people reach out with great questions. So we’re going to get to those later. But first to give people some context on where you’re answering questions from, let’s talk about your consultancy V Squared. So V Squared is your own consulting practice. It focuses on AI strategy. You founded it nearly a decade ago, and it sounds like business is going very well. I don’t know your specific clients, but I know that they are in the big leagues for sure. So tell us a bit about how you started an AI strategy company, especially doing that almost 10 years ago, what the environment was like then. I guess how the company’s evolved over time, and maybe even what AI strategy is.
Vin Vashishta: 00:05:35
I’ll give you the secrets. So spent 25 years in technology, and spent the last 10 years in advanced analytics, data science, machine learning, deep learning. All of the buzz words that you can think of that we’re now so frustrated and used to. But how did I start it is actually a really funny story. I landed my first large client in 2012. And at 3:00 in the morning, I needed a business because they told me, “If you want to be a vendor, we need you to have a business. You got to have one of those tax IDs, you have to have insurance.” So it was three in the morning, and V Squared was legitimately the best thing that my sleep deprived brain could come up with. And it has worked amazingly well. I just kept it and never went away from it.
Vin Vashishta: 00:06:24
And AI strategy, it’s been a field of necessity for me. Because anytime I do advanced machine learning use cases, I do applied ML research now. But back then it didn’t have a name. We didn’t even know it was data science. I’ll be completely honest. I thought I was making up my own field and I was awesome, but no of course, somebody else had gotten there before me. But AI strategy is a piece of necessity because companies don’t understand all of the work it takes in order to justify an advanced project. And that’s the bulk of what AI strategy is meant to accomplish. The whole concept of monetization. Companies in order to spend several million dollars on a project and to invest six to 12 months in a project, the C-suite has to be involved. They have to be bought in. So it has to be part of the business model. It has to have some connection to their value stream. It has to be in some way enabling their operating model. It has to be something that fits on their product roadmap, that enables them to make new money, to actually serve inference and get paid for it.
Vin Vashishta: 00:07:30
So AI strategy is everything that goes on under the covers to get your CEO to say yes. And if you are a CEO, AI strategy is what gives you the ability to manage AI based initiatives without getting dragged into the workflow. So you are managing the strategy side of it. You are not managing the day-to-day because that’s what CEOs hate about technology. The absolute bane of their existence is every time there’s a new technology roll out, they get sucked into being IT people, and they don’t want that. They’ve got other things to do. So that’s what AI strategy aims to accomplish is it connects your products and your internal facing projects with the business model and with the operating model. And it creates that connection to revenue through the value stream. And that’s what I work with clients to accomplish is to understand how they can leverage AI for competitive advantage. So that’s AI strategy. And the reason why I do it is because I couldn’t do any of the cool projects I wanted to if I didn’t put the strategy in place. I couldn’t get anybody to say yes and sign the check.
Jon Krohn: 00:08:33
Nice. So to kind of play that back, AI strategy is focused on a manager and their ability to implement meaningful, probably typically data-driven models within their company, without needing to know that TensorFlow 1 has now become TensorFlow 2 in the background.
Vin Vashishta: 00:08:54
Well, strategy is at a higher level than the management. That’s really at senior leadership. People that are connected with the business model, who are defining in many cases improvements to the business model or new business models that are increasingly moving away from traditional, going towards digital, and now from digital to AI first. So that’s where AI strategy is aimed at. But you’re right, CEOs are getting dragged into import from TensorFlow. No joke, they’re getting presentations from data scientists where you see their eyes gloss over and they never want to see you again. So it is the bridge between the two worlds.
Jon Krohn: 00:09:36
Hope you’re enjoying this episode. We’ve got a quick announcement and then we’ll get straight back to it. The fourth iteration of our DataScienceGO virtual conference is coming up quickly at the end of July. This time it’s for three days running from July 23rd until the 25th. You can get your free tickets today at datasciencego.com/virtual. This iteration of DataScienceGO has an extra special agenda. We’ve got a standalone career day on Friday, the 23rd, where you can meet hiring companies and discover exciting job opportunities. On Saturday, the 24th, you will hear from world-class speakers like Ben Taylor from DataRobot and episode number 433, Jamie Fan from TikTok, Erica Greene from Etsy and episode number 435. And you’ll also be hearing from me. I will be providing a session on the pros and cons of PyTorch and TensorFlow, the two most popular deep learning libraries, with a conclusion that may surprise you. As well as lots of time for audience questions.
Jon Krohn: 00:10:40
Finally on Sunday, the 25th, you can attend a full day bootcamp taught by seasoned instructors like Andrew Jones who’s also in episode number 483, Harpreet Sahota who’s also in episode number 457, and Joe Reis. These bootcamp certifications are included in the premium ticket, which is available now for a limited time at $49. On top of all that, over the course of the conference, there will be several networking sessions in which you’ll have the opportunity to connect one-on-one with data scientists from all over the world. You can get free tickets for days one and two, or the full three-day premium experience for $49 at datasciencego.com/virtual, and I’ll see you there. All right, let’s get back to the episode.
Jon Krohn: 00:11:28
I’ve had that experience many times where I think that what I do as the chief data scientist at a tech startup is really cool and really important. And in a way, I guess it is, because it allows the ‘magic’ to happen. So I’ve put a huge amount of effort into making decks that I’m like, “Oh man, if I explain it this way, their mind’s going to be totally blown.” And I get in with somebody who’s super super intelligent, runs a machine learning company, but is interested in the impact and not in how it works. And you can tell that immediately, they’re kind of distracted, fidgety, looking for something else to do. So I understand.
Vin Vashishta: 00:12:10
Yeah, totally. And I’ve been in that room where I’ve watched a data scientist dying. And I guess the joy of my life is being able to throw them a lifeline on slide three or four, and saying, “Let me ask you some questions. Let’s talk more about impact. Let’s talk revenue. How about internal rate of return? What are we looking for?” And I allow them to bypass five to six slides and get to the ones that the audience really cares about.
Jon Krohn: 00:12:36
Nice. So you’re kind of touching on it there. What is your particular approach or process with your consulting with V Squared?
Vin Vashishta: 00:12:44
Well, the first thing I come in and do is assess the business model, which is basically the statement of monetization. How a company takes resources and transforms them into something that some customer has a perception of value for that they’re willing to pay you more than it costs you to make that transformation, and where your business model defines the advantages you have when it comes to providing that particular product to customers based on something unique that you’ve come up with as a business. So that’s the business model. And a lot of times, as you’re transitioning as a company to digital and realizing that you’re also transforming your company to AI first. So strategy starts by understanding the business model, talking about how AI can improve the business model or even how AI based capabilities can suggest new business models, that the company has some advantage around based on their big existing operations.
Vin Vashishta: 00:13:47
And then you start talking about an operating model. And that defines the value stream. How does anything that any one person does contribute to building value for the business? Building value for customers by extension. Why do they buy the products? And what is it that you do that leads to someone paying cash? And when you talk about data science and machine learning, the majority of projects right now are just internal rate of return, where you’re doing something that’s improving your operating model, which makes things more efficient. Which allows the company to execute on strategy with sometimes less staff, but more likely staff doing higher end roles. So you have that concept of freeing up people to spend more time thinking and doing the things that people are really good at rather than the things that people hate.
Vin Vashishta: 00:14:37
But moving companies away from just internal rate of return and looking at product strategy. Building products. And it always starts with here’s some supporting features. Here’s a cool thing that we couldn’t do before that we can do now. And then it moves to really providing products to customers where they’re paying for inference in one way, shape, or form. Not paying for a product and it’s nice to get inference. So that’s taking the business model through the value stream. And a lot of times, that results in the research process of the workflow. Because to build a competitive advantage, you need to have something that other people can’t copy. And that’s going to be a unique model, unique data set, something that they can’t do very quickly that gets you away from import from very, very rapidly, and gets you into the experimental and applied research frameworks. And that’s what I do for customers is walk them through soup to nuts from, “Here’s your business model,” to, “Here’s something in production you’re getting paid for.” And I help them understand how to do that.
Jon Krohn: 00:15:35
So it sounds like depending on where the client themself is, you could end up having an impact in a different part of that spectrum. So it could be from getting started on a digital transformation, to realizing that as a part of this digital transformation, they can take advantage of AI to make that transformation even more profitable. It could be realizing that AI can improve an internal rate of return. So improve efficiency say internally, or maybe for a customer. And then it sounds like the most enlightened state is realizing that models can be sold themselves, that the inference itself can be useful, and that there can be a strong moat around that model if the client maybe has a particular data set or position in the market that makes them unique.
Vin Vashishta: 00:16:29
Yeah, definitely. And truly that is nirvana. When you get paid, when your data science team starts being a revenue generator, it’s really a different look at the team.
Jon Krohn: 00:16:43
Yeah. I dream of being able to have just my data science team itself be profitable. And we have an API that exposes our most popular model. But as of right now, it does not in and of itself pay for my data science team. So we’re reliant on the front end developers, the product team. But yeah, that truly would be nirvana for me as well.
Vin Vashishta: 00:17:11
We can talk afterwards.
Jon Krohn: 00:17:17
So in addition to V Squared, which sounds like it could be quite a handful, you also have managed to make time, it sounds like especially recently to create content for people. So for example, you’ve already launched an AI strategy class. You have more classes that will be released very soon it sounds like this year. So things on Explainable AI, Edge Machine Learning. And you’re making this available on several different platforms. So you have one platform available today called The ML Rebellion. And it’s your own platform, I guess AI strategy is available there today. But then you’re also launching shortly other platforms like The High ROI platform. So tell us a bit about these courses, how you’ve been drawn to creating them, and why several different platforms are useful for disseminating those courses.
Vin Vashishta: 00:18:17
So I’ll start backwards and say the reason why I’m doing three different websites is because I have to keep them straight. And I confuse people if I try to have everyone show up on one website and then decide which way they’re supposed to go. It didn’t work. I tried that, and it’s a really horrible idea. Just anyone else trying to do it, don’t do it that way. Split up.
Jon Krohn: 00:18:37
So that would have been like, you get to a website and it says, “Are you a senior leader or are you a data scientist?”
Vin Vashishta: 00:18:44
Yeah, exactly. And most people say, “Why are you asking me these questions? How do you not know? Why am I taking a survey before I can get to where I want to go?” So yeah, I could have done that with cool landing pages or I could have just been smart to begin with and split them out. And it is kind of a handful. But what I’ve been finding is that they all run together. Understanding from the strategy standpoint what the opportunities in machine learning are allows me to see that in a lot of cases, students aren’t being taught very well. The people that I’m trying to hire, those gap skills that are out there, it’s really difficult to find people who have forward-looking capabilities. So I am building out these classes to handle a lot of the pieces that I see are missing. So strategy is one of them. We don’t train our leaders very well in data science. And data science leadership is very different than leading many other types of teams. I mean, you know this. You’re leading a team of really smart people. Many of whom are smarter than the leader, many of whom are more technical.
Jon Krohn: 00:19:51
I’m definitely in that position.
Vin Vashishta: 00:19:54
Many of them are more technical than we are. So all of these traditional sources of authority change. We have to have a different way of leading. So I’m building out a leadership class. I don’t think we talk about Edge ML in a comprehensive way. There’s not a one-stop shop for that. So I’m trying to build that out. Explainable AI is another really hot topic that I think we need to spend time teaching, even at the lowest level people coming into the field to understand how important that is, because it’s a huge piece of model reliability. You can’t monetize a model you can’t trust. And regulators are going to start beating companies up, so you can’t monetize a model you can’t explain. There are a lot of danger areas that if we don’t do reliability correctly, it can come back to bite us when it comes to revenue. So this concept of strategy, leadership, and even these topics are all just closely related.
Vin Vashishta: 00:20:44
So themlrebellion.com is my content site. And for right now, the few classes that I’ve offered are up there. Highroidatascience.com is going to release probably near the end of the summertime. And that’s going to be my main course, find classes here. And I will someday clean up v2ds.com and make it a comprehensive strategy consulting site again. I’m sorry to all my clients, most of whom have never been there because if they have, they would never book me. It looks terrible right now. I tried to put everything in one site, and let me tell you. If you want to go and look at a mistake, just go to that website right now, because you’ll see it’s an attempt to get people to go in five different directions and it doesn’t work.
Jon Krohn: 00:21:30
So despite you maybe having one website that doesn’t look amazing, I forgot to mention earlier that the way that we know each other is that you have been recommended to me. So I can vouch while having not taken any of your classes yet myself, I can vouch for your ability as a communicator and speaker. Because an amazing speaker in his own right Harpreet Sahota, who was on episode 457 sent me a text and said, “You should get Vin on the show. He’s a brilliant speaker.” And then it was either that same week or a week after I reached out to DataScienceGO, which is an affiliate of the SuperDataScience podcast. And I said, “Do you guys have any recommendations for amazing speakers?” And they wrote back and said Vin. I was like wow, this guy, I’ve got to check him out. So I’m sure your courses are brilliant as well.
Vin Vashishta: 00:22:22
Harpreet, that 20 is in the mail. Thank you.
Jon Krohn: 00:22:27
Did you send him 20 American or Canadian?
Vin Vashishta: 00:22:29
It’s the same thing these days.
Jon Krohn: 00:22:32
It’s true. They are converging. Yeah, Harpreet is interesting because he lives in Canada, but he isn’t Canadian. He’s from your neck of the woods, roughly speaking, from California. Right? Or the Southern U.S. I thought it was California.
Vin Vashishta: 00:22:53
I’ve got to ask him where he’s from because there’s a bunch of people here from Utah, Colorado, California, and actually a couple of former Nevadans who’ve all scattered and moved and I can’t remember who’s who. So I need to ask everyone who’s who, and who’s moved from California to one of those places. I can’t keep it straight anymore.
Jon Krohn: 00:23:13
Listeners might even know because we talked about it in episode 457, but it was the Southern U.S. It was either California or Texas, I’m feeling now like San Antonio.
Vin Vashishta: 00:23:23
Texas, yeah. He might, yeah. Okay. I don’t know either. I’m sorry, Harpreet. I should know this.
Jon Krohn: 00:23:33
Anyway, he’s in Winnipeg, Canada now. So he’ll accept that 20 in either currency I’m sure. If he [crosstalk 00:23:41] probably. Depends how it’s been trending in the last week.
Vin Vashishta: 00:23:50
Five minutes sometimes.
Jon Krohn: 00:23:52
Exactly. So you have a big online following I don’t mind mentioning. So you have six figures, over 100,000 followers on LinkedIn. So a week ago prior to this session today where we’re filming, I posted, I said, “I’m interviewing Vin next week on SuperDataScience. Do you have any questions for him?” And I had a massive reaction, tons of questions. We actually can’t possibly answer every single question that came up. But we had tons of great ones, and I’ve picked a subset here that I thought might be perfect for the audience, and that I think you’ll have an interesting answer to. So let’s dig into those.
Jon Krohn: 00:24:37
We’ll start with Serg Masis, whom I’ve known for years. So Serg was the MC at a tutorial that I gave years ago with the Open Data Science Conference in New York back when conferences were in-person. They will be again. They are coming back. I think as early as September, it looks like I’ll be speaking in person at the Machine Learning Conference in New York. And I can’t wait. It is such a better experience to be in a classroom with people speaking to them than offering the same content online. You’ve had that experience, yeah. So can’t wait for that. Anyway, so Serg, met him in person when he was the MC at this talk that I was giving. And he has since become a really well known data scientist in his own right. He wrote a book on interpretable machine learning that has been doing really well. And he had a great question for you, Vin. So he said, “low-code/no-code solutions are getting a lot of hype. In your opinion, what parts of the data science or machine learning workflow can be automated, and which parts can’t?”
Vin Vashishta: 00:25:46
Well, I think when you look at the data science workflow … and I talk about value streams a lot. And this ties really well into the value stream concept. In that low-code/no-code are starting to pull some of the pieces of drudgery, some of the things we hate out of data science. So when we talk about low-code/no-code, a lot of times people go, “I hate it.” No, we should like it. We should actually learn to love it because this is making us so much more efficient and we can work on cooler projects because things like the pipeline. Your data gathering piece, data quality, data wrangling, most of that can now be handled. And if you look at the product roadmap for companies like Microsoft who are doing all of this through Azure, you look at Amazon AWS. Again, their product roadmap goes through their ML services. And even other companies outside of massive tech, you’ve got a number of different companies who are tackling the MLOps problem. But they’re also leaking into automating a bunch of other work that we hate as data scientists.
Vin Vashishta: 00:26:49
So one of the most important things for us to understand is this isn’t taking our job, it’s making our jobs better. So when you look at the pipeline, that data engineering role, the majority of it is automated in one way, shape, or form to some level of complexity. And it’s getting increasingly automated, which makes a data engineer able to focus more on curation, where they can go out and find high quality data. It’s easier to attach the type of metadata that you need in order to build models that are going to get you somewhere without spending 80% of your time on the data side. So ML engineers are getting pulled into some of these types of activities because they’re managing that automated platform. And that’s something that’s happening to ML engineers. They’re being pulled all over the place in the data science and machine learning workflow.
Vin Vashishta: 00:27:41
The concept of orchestration. You look at Kubernetes handling a whole lot of the distributed side of it. And everything in the world that spins off of the Kubernetes... ecosystem is the wrong word, because that’s not a big enough word. They’re a planet in and of themselves. And you can even automate parts of your workflow through Kubernetes workflow, and a ton of other cool and interesting things, manage those. I call them statistical experiments. They’re not the science experiment side of it, but managing running models against each other, spinning them up. It can handle everything from some of the crude feature engineering, to some of the hyperparameter tuning, to so much of the stuff we all hate, because it’s just iterative and it doesn’t take any brainpower to really do a lot of those types of things, except when it gets complicated. And those are the models that you actually want to spend time feature engineering and hyperparameter tuning, not all the basic ones that you’re going to compare against the cool model or the cool tool models that you’re thinking about using. So there’s all of this, and then it’s the operational side. As soon as I’m done with a model, it’s not hard to push to production anymore.
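To make the "statistical experiments" idea concrete, here is a minimal sketch of running basic models against each other with a small hyperparameter sweep. It uses only the Python standard library; the toy data, the function names, and the choice of a nearest-neighbour model are all assumptions made for illustration, not anything described in the episode.

```python
from statistics import mean

# Toy data, invented for illustration: y is roughly 2 * x.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8), (6, 12.2)]
valid = [(1.5, 3.0), (3.5, 7.0), (5.5, 11.0)]


def mean_baseline(train_rows):
    """Simplest possible model: always predict the training average."""
    avg = mean(y for _, y in train_rows)
    return lambda x: avg


def knn_regressor(train_rows, k):
    """Predict the average target of the k training points closest to x."""
    def predict(x):
        nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def mae(model, rows):
    """Mean absolute error of a model on held-out rows."""
    return mean(abs(model(x) - y) for x, y in rows)


# Register the baseline, then sweep one hyperparameter (k) automatically.
experiments = {"mean_baseline": mean_baseline(train)}
for k in (1, 2, 3):
    experiments[f"knn_k={k}"] = knn_regressor(train, k)

# Score every experiment on the same validation set and pick the winner.
results = {name: mae(model, valid) for name, model in experiments.items()}
best_name = min(results, key=results.get)

for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: MAE={score:.3f}")
```

The loop is the part that automation platforms take over: a person still decides which model families and metrics are worth comparing.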
Vin Vashishta: 00:28:50
The ML engineering team used to need to do Herculean effort in order to get something out the door. But now, a majority of that productization side, it’s purely automated. And then the ops, management, maintenance, continuous retraining, even continuous model reselection, all of that can happen automatically with just basically human-in-the-loop where you do some basic oversight, you look at it and say, “Yeah no, that passes. Congratulations, ship it.” And there’s so much of this workflow that’s been automated and it’s awesome.
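The retrain, reselect, and sign-off loop described here could be sketched roughly as follows. This is a hedged illustration only: the `Candidate` class, the `min_gain` threshold, the model names, and the `approve` callback standing in for the human reviewer are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    validation_score: float  # assumed metric where higher is better


def select_candidate(current, candidates, min_gain=0.01):
    """Return the best candidate only if it beats the current model by min_gain."""
    best = max(candidates, key=lambda c: c.validation_score)
    if best.validation_score - current.validation_score >= min_gain:
        return best
    return None


def promote(current, candidates, approve):
    """Automated reselection, with a human sign-off before anything ships."""
    proposal = select_candidate(current, candidates)
    if proposal is None:
        return current  # nothing clearly better, keep serving the current model
    if approve(current, proposal):
        return proposal  # reviewer said "ship it"
    return current  # reviewer rejected the proposal


# Hypothetical models and scores, invented for illustration.
current = Candidate("churn_v3", 0.81)
retrained = [Candidate("churn_v3_retrained", 0.83), Candidate("churn_gbm", 0.86)]

shipped = promote(current, retrained, approve=lambda old, new: True)
print(shipped.name)
```

In practice the `approve` step would be a review screen or ticket rather than a lambda; the point is that the comparison runs automatically and the person only confirms the result.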
Jon Krohn: 00:29:21
All right. All right. You are selling it very effectively to me. So you’ve mentioned one platform specifically that could be helpful and I agree 100%, Kubernetes. What else should people be looking at? So Kubernetes for example couldn’t help with the model retraining or model reselection. What do you recommend in those kinds of circumstances?
Vin Vashishta: 00:29:42
How do you mean as far as-
Jon Krohn: 00:29:44
Is there a particular low-code or no-code solution that you would recommend for automating model retraining or model reselection?
Vin Vashishta: 00:29:54
No, I don’t think you can give one platform recommendation at this point because every platform … and this is something that people in the startup space that are building these say. Every platform is one feature away from being perfect. So the reason why MLOps is such a crowded space and low-code/no-code is such a crowded space is because every platform is one feature away from being absolutely perfect. So you can’t say this is the best without 1,000 caveats. There are improving platforms and emerging leaders. But it seems like every time somebody takes a lead, even if you look at feature stores, great example. Feature stores jumped out, they were hugely important. Then there were eight of them. And the open source version of feature stores was still awesome. But there’s Tecton. I’m blanking on all of the-
Jon Krohn: 00:30:53
What are feature stores then?
Vin Vashishta: 00:30:54
So a feature store essentially allows you to, when you do data science, you do feature engineering, you need to save those features someplace. And that’s the most basic functionality of a feature store. And then it expands out to doing the metadata tracking, version control, so on and so forth. All of those things that you need to do in order to have someone else come behind you and reproduce this work that you’ve done or even yourself three months from now, to be able to come back and reproduce what it is that you’ve done using some sort of a checkpoint, and understanding some of the constraints that you might’ve been working on. That’s what a feature store is great for. And those are morphing to handle the metadata better. And it’s one of those cases where I could recommend one. And then literally five days from now, I could probably reverse that and say, “Well no, this other one came up with this new feature.”
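As a rough illustration of the basic functionality described here (saving engineered features, tracking metadata and versions, and loading an earlier checkpoint to reproduce work), here is a toy in-memory sketch. It is not how any real feature store product works; the class names, the feature name, and the metadata keys are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeatureVersion:
    """One saved snapshot of an engineered feature plus its metadata."""
    name: str
    version: int
    values: tuple
    metadata: dict
    created_at: datetime


class InMemoryFeatureStore:
    """Hypothetical toy store that keeps every version of every feature."""

    def __init__(self):
        self._versions = {}  # feature name -> list of FeatureVersion

    def save(self, name, values, **metadata):
        """Append a new version instead of overwriting the previous one."""
        history = self._versions.setdefault(name, [])
        record = FeatureVersion(
            name=name,
            version=len(history) + 1,
            values=tuple(values),
            metadata=dict(metadata),
            created_at=datetime.now(timezone.utc),
        )
        history.append(record)
        return record

    def load(self, name, version=None):
        """Load the latest version, or a specific earlier checkpoint."""
        history = self._versions[name]
        if version is None:
            return history[-1]
        return history[version - 1]


store = InMemoryFeatureStore()
store.save("spend_per_visit", [12.5, 8.0], source="orders_june")
store.save("spend_per_visit", [12.5, 8.0, 9.75], source="orders_july")

checkpoint = store.load("spend_per_visit", version=1)  # what an earlier run used
latest = store.load("spend_per_visit")
print(checkpoint.version, checkpoint.metadata, latest.version)
```

Keeping every version rather than overwriting is what lets someone else, or you three months later, rebuild a model on exactly the features it was originally trained on.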
Jon Krohn: 00:31:45
Got it. I suspect that for a lot of these startups that are making feature stores, low-code/no-code solutions, they are probably earning a lot of their immediate income from doing consulting. So they could be developing their own product features in a way to suit some particular client’s needs and then productizing that. So there’s kind of a lot of, I don’t want to say tail wagging the dog, because there would still be some product manager coming in and saying, “I think that these features productized would be useful to more than just these one or two clients.” But I suspect that you have a couple of big clients or half a dozen big clients for a lot of these different startups that end up shaping the feature map. So each of these different startups is catering to a different subset of clients. So they end up building out slightly different features. So first of all, you could feel free to disagree with me on any of that, and I’m completely speculating. But in that kind of circumstance then, for a listener’s particular use case, there could be one or two of these low-code or no-code solutions that could be ideal for them, depending on where the startup has gone in their product roadmap.
Vin Vashishta: 00:33:14
So what I find is that the startups were founded by someone who was a data scientist, who felt the pain. Who built a solution and said, “You know what? I could sell this.” They go to market with the product that they say, “You know I could sell this,” that solved their problem, their specific problem. They find a whole bunch of customers upfront who have exactly the same problem and similar workflows. And then comes the difficult part where a lot of them say, “We’re all one feature away,” because once you get that initial target market, expanding out and gaining market share from there is difficult. Because in some cases, it undermines your original value proposition. You did it this way because this is how you do it. And it fits this workflow. Now expanding that out so it supports multiple workflows or integrations with tons and tons of other tools. You get two clients, but you’re spending a ton of money in the development side of it. And that’s where a lot of these have niched themselves.
Vin Vashishta: 00:34:11
And then you have on the other side the big players like Microsoft, and Amazon, and Google’s getting into the low-code/no-code space as well. And they’re fairly open. They don’t want to lock you into their proprietary solution because their real solution is cloud and a larger ecosystem that they want to co-exist in and with. Whereas you have other niche players who lock you into a proprietary framework for low-code/no-code solutions that don’t port. So now you have this anchor that you’re attached to, and that’s one of the big drawbacks of low-code/no-code right now is companies trying to lock customers into their proprietary architecture and gain market share that way rather than building an ecosystem where it makes sense around best-in-class features. So there’s ups and downs.
Jon Krohn: 00:35:01
Very nicely said. You captured everything that I was trying to articulate, I think much more clearly and with better examples. So thank you very much for the support Vin, and also for expanding on what I could have possibly said. Beautiful. All right. So I think that that gives Serg a good answer, and I think an interesting answer for lots of people out there. Next question is from Daniel Marostica, Marostica. I’m butchering Daniel’s last name there. He’s a data scientist at NZN. And he shouted at you, “Top five gap skills.”
Vin Vashishta: 00:35:34
I get this on the street a lot. I’ll grocery shop and somebody will just yell top five gap skills. It’s crazy. They don’t know my name anymore. So gap skills, things that I can’t find and companies can’t find, but these are skills that they desperately need. And strategy. Understanding how to monetize machine learning is one of the core concepts that we’ll find in a lot of gap skills. So communication for impact, meaning I can talk to a group of stakeholders, and I can actually get requirements out of them. Or I can work with them collaboratively to build a solution to a problem that they have. I can work with them to understand current processes and how a product needs to work. Whether that’s for internal stakeholders or external customers.
Vin Vashishta: 00:36:22
So there’s this concept of communications, but not just to have words come out of your mouth and have them understood, but to have some sort of an impact where you can drive an outcome, a result. You’d look at being able to elicit requirements from stakeholders. That in and of itself is a massive gap skill because many stakeholders do not understand what they need. They also don’t understand how to tell you what they need, even if they do, because they don’t understand what machine learning and data science can accomplish for them reliably, and how long it will take. So that’s another gap skill. Just working with stakeholders across the board either communication for impact, or be able to elicit requirements from them is another one of the gap skills. So that’s on the strategy side. You have leadership, which is a gap skill. Everyone who has been promoted in tech knows that that first job you get put into as a leader, you are not trained sufficiently in the majority of cases. You think leadership is technical leadership, where you are the smartest person on the team. Therefore, you’re in charge of the team. But that’s not necessarily what leadership is. It’s great to be a technical mentor, but there’s so much more complexity to leadership. So that’s another gap skill.
Vin Vashishta: 00:37:41
Being able to push to production. Software engineering and software development patterns and practices are so rare to get in a data scientist. It’s one of the drivers of the machine learning engineer role: a data scientist with strong software engineering and architecture capabilities. So you can take a model and put it into production in a way that it integrates with what’s already there. It is functional and reliable, and it meets service standards. And it can scale. All of these core concepts that if you were a cloud architect or a software architect, you would understand. But trying to combine that in data science, that’s another gap skill: that software side, not so much coding, but architecture and engineering. And those are just examples of the gap skills that I find consistently that if data scientists can focus their career in that direction, they’re going to have number one, excellent prospects, and number two, career longevity.
Jon Krohn: 00:38:42
Love it. I agree 100% on all counts. I can’t believe how you were able to just reel those off. But yeah, communication for impact, being able to elicit requirements, leadership skills, and that kind of computer science, software architecture background. Being able to put things into production that scale well that are reliable. Couldn’t agree more. Those are all hugely valuable skills for data scientists. So nicely said. And some of those actually, some of your courses, your upcoming courses actually cover some of these. So the leadership one for example, you have a course coming up on that, right?
Vin Vashishta: 00:39:15
Yep. Yes. That’s what I’m trying to do is cover where I see the gap areas in education. Because you look at boot camps, and they cover this core set. And when you talk about actually interesting overlap with the platforms and low-code/no-code, much of your bootcamp while it’s essential to know all of those pieces, many boot camps graduate people whose jobs are being automated. They’re graduating into roles that are low hanging fruit for low-code/no-code. So what I’m trying to teach is skills that’ll allow you to have a longer career in the areas that I’m having difficulty hiring in.
Jon Krohn: 00:39:50
Nice. Sounds great. All right. Next question is from Joe Raaen, whom I used to work with at Omnicom. So I was a data scientist at Omnicom. Joe was working in sales there. And he knows the marketing landscape extremely well, was one of the best connected people. You couldn’t bring up a name in marketing that he didn’t know. He’d always say, “Oh yeah, that guy. I had beers with him last week or last year,” or whatever. I basically didn’t know X and Y about him. So Joe says, “If I can ask a marketing question, it would be what are some of the more interesting data sets that you’ve seen recently for insights on consumers and targeting?”
Vin Vashishta: 00:40:34
And I’m going to take this in a direction most people don’t expect, maybe he does. But the credit bureaus actually have some absolutely fascinating deep data sets on customer behaviors, which you wouldn’t expect. You wouldn’t expect Experian to be selling as much data about you as they are. But they have some of the, and I mean this is a gift and a curse. They have some of the best data sets. Because think about what a credit bureau has access to. Pretty much everything. From a spending habits perspective, to some of your more nuanced behavioral perspectives. They know a lot about you. They may not track you on Facebook, but they can buy that data. And when they aggregate a lot of the data that they have access to which no one else can access, they’ve got something that’s not only interesting, but an advantage for them. It’s a novel data set.
Vin Vashishta: 00:41:32
Now as I’m saying this, everyone is thinking what in the world. Because think about what they have access to. Every transaction you’ve made, they know about. Everything you’ve bought, they understand. If you’ve switched from chocolate muffins to bran muffins, they know you’re on a diet, especially if it’s consistent. And if all of a sudden you switched back to chocolate, yeah, maybe your healthcare insurance might be going up. I mean, that’s how scary this stuff is when you think about it as an aggregate. So there’s definitely some great data sets out there from a number of different sources, which for us as data scientists are awesome. But for consumers, I mean we always have to think about, because there’s a person on the other end of that data point. You have to think about what the implications are and how comfortable would they be with us using some fairly scary data points out there. And that’s a big question for ethics in our field. Should we use this data?
Jon Krohn: 00:42:32
So I like how on the one hand, it comes with all of these warnings that these data even exist. But on the other hand, as soon as Joe hears this, he’s going to be going out and buying the data.
Vin Vashishta: 00:42:41
I’ll bet he has access to it. I have money that says he knows exactly what I just said.
Jon Krohn: 00:42:48
All right. So the next question is also actually kind of … yeah, I guess there could be disturbing answers to this question too, or even maybe even that we’re posing the question could be a little bit unnerving. Especially because I think a lot of people get involved in any kind of career, whether it’s in data science or whatever today, you get in hoping to make a positive impact. Well, first Nikhil states, “Most machine learning innovation funding is targeted at opportunities that will have some monetary value, that will get ROI.” And I mean, if there’s a main theme of this episode, it’s talking about AI strategy, monetizing machine learning. So very relevant point that Nikhil is making. And then, so he asks, “Are there applications that benefit society?” So yeah. I mean, we have talked a little bit on the show about things like green machine learning. So machine learning for climate change, episode 459. We also talked about being able to use machine learning and automation to better direct energy flows and make shift towards renewable energy. We talked about that in episode 461. So I think there’s no question that there are applications that have a big benefit.
Jon Krohn: 00:44:15
I mean, actually I could continue reeling on. We’ve had countless episodes, with Michael Segala in 447, we talked about medical technologies enabled by AI. With Anima Anandkumar we talked about a broad range of socially beneficial things that Nvidia is working on. So yeah, I think that there’s tons out there. But despite all of that, I think that that’s the kind of stuff that people actually love to talk about in podcast episodes. Right? Because you know that your audience is going to be interested. But despite that, I think it’s fair to say the vast majority of investment goes into getting a return on that investment as opposed to social impact. So your thoughts Vin. I have now been speaking for way too long.
Vin Vashishta: 00:45:04
Never. I think the old adage greed is good comes into play here because the more companies invest in data science and machine learning, the more opportunities there are for data scientists, the more interest there is in the technology. And that has definite benefits to the entire community. Because if you look at the pace of research over the last 10 years, it’s gone from sporadic when companies were not interested in the technology to just heavy paced now. Because there is a corporate backing, there’s cash behind it. So that permeates into academia, that permeates into policy, that permeates into think tanks and everyone else who’s doing work for social good. So again, greed is good. However, greed also pushes us into some terrible directions when it comes to applications of data science and machine learning. So double-edged sword.
Vin Vashishta: 00:46:01
One of my biggest concerns isn’t so much that we’re doing so much good with AI, because I think it’s awesome. I think we often go in with best intentions on projects, especially social good projects, where we think we’re going to improve bail for instance in the U.S. Where we think we’re going to be able to improve recidivism by predicting it and make sentencing more fair by putting a machine in front of it. Or giving resources to people who have the best intentions, but do not understand the ethical implications in some of the … getting back to reliability, talking about explainability too. Not looking at the sources of data that they’re using. And there’s a great example. Georgia published a data set on their basically prisoner data on reoffense. And the Department of Justice is using it in an open program to solicit models from people and offering a prize. And one of the most prominent features in this data set, you’ll never guess, is race. It’s in there. And you just know. I opened up the Excel spreadsheet and I just knew what I was looking for. It wasn’t like I had to do a whole lot of digging to find the features in this which would be massively problematic.
Vin Vashishta: 00:47:27
And someone in academia who’s very enthusiastic about this particular use case who wants to do some good, may not have the same insight as other people who have run into problems before. And we look at hiring. We want to make hiring a fairer and more inclusive process. But in many cases, features that are well correlated with hiring decisions are also well correlated with biases that are baked into those decisions and have been for decades. So we have a double-edged sword with the best of intentions. We also need the best of talent and the best of analysis on whether or not some of these use cases are … they are for social good, but will they end up doing more harm than good? And that happens in a lot of cases.
Jon Krohn: 00:48:14
Right. Well said. So both the positive and the negative are well understood for me here. So that this huge … on the positive side, this huge influx in investment and interest in data science over the last decade. I don’t have the exact numbers to hand, but I often show in lectures when I’m lecturing to early career data scientists I’ll show, “This is how much venture capital money is flowing into AI startups.” And it’s exponential increases over the last few years alone. And the same kind of thing. Government funding, like you’re saying across academia and industry. So while most people who invest in things do want a return on investment, there are lots of people who are investing with some social regard, or maybe even just entirely grants from the government that are designed to support these socially beneficial things.
Jon Krohn: 00:49:19
So absolutely yes, there are applications that benefit society. But I love that twist that you put on it there where even the best intentioned social models can have unanticipated negative impacts. Especially because I think we have … a lot of these models are created by data scientists working on a static data set. And over time, your inputs change in production. So yeah. If you’re not looking out for how the data can change, you can end up having a model that it may be unbiased today, that becomes biased in the future.
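One way to watch for the input changes Jon describes is to compare the distribution of each feature in production against the distribution the model was trained on. The following is a minimal Python sketch of that idea using the population stability index (PSI), a technique swapped in here for illustration and not one named in the conversation. The function name, the bin count, and the 1e-6 floor are all assumptions.

```python
import math


def population_stability_index(baseline, live, bins=10):
    """Compare a live feature sample against its training-time baseline.

    Returns 0.0 for identical distributions; larger values mean more drift.
    Bin edges are taken from the baseline (an assumption of this sketch).
    """
    lo, hi = min(baseline), max(baseline)
    # Fall back to a width of 1.0 if the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range land in the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share so an empty bin does not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = shares(baseline), shares(live)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


training_ages = list(range(100))               # hypothetical training feature
production_ages = [v + 50 for v in training_ages]  # same feature, shifted upward
drift = population_stability_index(training_ages, production_ages)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a shift worth investigating, but that cutoff is a convention, not a guarantee. A check like this only says that the inputs moved; whether the model has become biased as a result still needs the kind of evidence gathering discussed in this exchange.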
Vin Vashishta: 00:49:57
Well, and we discover biases. That’s one of our biggest problems is we don’t do the evidentiary support for our models in many cases. So those biases go undetected because we don’t understand, especially when you were using those massive data sets and somewhat opaque modeling techniques. We don’t spend the time to do it right. But that bias can sit in there for years before someone discovers it because they discover the fact that people have been impacted by it.
Jon Krohn: 00:50:26
Nicely said, Vin. Great answer to that question from Nikhil. Our final question is from Michael Cutler who is a CEO at a quant driven hedge fund. And Michael is interested in knowing what the most difficult data science problem or challenge you’ve ever faced is. And it doesn’t need to be a technical challenge. It could be a business challenge.
Vin Vashishta: 00:50:49
I think pricing just across the board. Pricing strategy is one of the most complex use cases for data science and machine learning. There are definitely others. I mean if you talk about healthcare, that has a different type of most challenging use case, because you live in a regulatory environment that restricts much of what you can do. You have access to a very limited amount of data. So there are other challenging environments. But when I look at pricing, the number of touch points within a business and the complexity of the models make it I think more complicated than anything else I’ve worked on in my career. Because if you think about pricing, pricing is an extension of the business model. It touches everything. You need to understand customer behaviors. You need to understand how marketing plays into those customer behaviors, changes to the products themselves.
Vin Vashishta: 00:51:44
And this ecosystem marketplace that you compete in, the substitutes that you are competing against. All of that plays into, and that’s just the customer touchpoint. Now you have to touch back to the business. And if you’re in manufacturing, this goes all the way back to supply chain. If you’re in retail, again, it could go all the way back to supply chain and every single one of those. Your planning based on your costs. I’ve recently done some projects with modeling inflation, the impacts of inflation. Where companies are trying to look at how they can use marketing to make their customers less price sensitive, because they know the costs are rising. And if your business model is based around low prices, then you need to train your customers in advance of inflation to be less price sensitive, or they’re going to go someplace else. So all of these things are this complex problem that presents itself to you with pricing. And then you have to put it in production. And it touches one of the most important mission critical things you have. If you price something at a penny by accident, that can be a huge, huge loss mistake.
Vin Vashishta: 00:52:57
So you have to worry about monitoring at such a granular level. You have to have people in the loop when you could be repricing millions of SKUs. And how do you do anomaly detection? How do you monitor for something that could have changed in a way that needs review so that your pricing strategists aren’t staring at a million SKUs every day, but they’re staring at 10 or 20 that are truly anomalous and are worth looking at before the pricing change goes into effect? So it’s another monitoring nightmare and maintenance nightmare, because if that model drifts, your prices drift. So you have to be very rigorous in how you create these models. Put safety nets into place, how you build a body of evidence to support your pricing strategy in the first place. And then to prove out its stability and reliability in production. I think it’s one of the most complex challenges there are, or at least that I’ve worked on. I mean, outside of something in the hardcore sciences domain. I think it’s really a difficult challenge.
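The review queue Vin describes, where strategists see only a handful of suspicious price changes out of a very large catalog, could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not Vin's method: the function name, the robust z-score on relative price change, the 3.5 threshold, the 50% hard floor, and the cap of 20 items are all hypothetical choices.

```python
from statistics import median


def flag_price_changes(current, proposed, threshold=3.5,
                       floor_ratio=0.5, max_review=20):
    """Return the items whose proposed price most needs human review.

    current, proposed: dicts mapping an item ID to a price.
    An item is flagged if its relative change is an outlier versus the
    rest of the batch, or if the new price falls below floor_ratio times
    the old price (a safety net against "priced at a penny" mistakes).
    """
    changes = {
        item: (proposed[item] - current[item]) / current[item]
        for item in proposed
        if item in current and current[item] > 0
    }
    if not changes:
        return []

    values = list(changes.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median([abs(v - med) for v in values]) or 1e-9

    flagged = []
    for item, change in changes.items():
        score = abs(change - med) / mad
        hard_stop = proposed[item] < floor_ratio * current[item]
        if hard_stop or score > threshold:
            flagged.append((hard_stop, score, item))

    # Hard stops first, then the most extreme outliers.
    flagged.sort(reverse=True)
    return [item for _, _, item in flagged[:max_review]]


# Hypothetical batch: 100 items at 10.00 with small increases, one bad price.
current_prices = {f"item{i}": 10.0 for i in range(100)}
proposed_prices = {f"item{i}": 10.0 * (1 + 0.01 * (i % 5)) for i in range(100)}
proposed_prices["item7"] = 0.01  # the accidental penny price
review_queue = flag_price_changes(current_prices, proposed_prices)
```

The median and median absolute deviation are used instead of the mean and standard deviation so that one wildly wrong price does not inflate the spread and hide itself. In practice the comparison would likely be made per category or per pricing segment, since a normal change for one product line can be anomalous for another.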
Jon Krohn: 00:54:00
I agree 100%. I’m not the person who’s ultimately responsible for pricing in my company, but I always have to work with our global head of sales on trying to figure out how we should be pricing things. And it is super complicated. It is almost impossible to commit to a client to say, “All right, you can lock in this price for two years and be 100% sure that you’re not going to end up underwater based on some unexpected use or some high volume activity that you weren’t anticipating on an API, or user interface.” Yeah, so really tricky. And then on the flip side of that, so you have that on one side, it’s like I want to make sure I’m not underwater on this deal. But on the flip side, you need to price it in a way so that you’re competitive. So yeah, definitely. Tricky point, and probably not where my mind would have gone to if somebody asked me that question. But now that you’ve mentioned it Vin, it could be one of the toughest problems that I face as well.
Jon Krohn: 00:55:11
Nice. So that is it for our SuperDataScience podcast audience questions. Thank you so much everyone for having those. I will definitely be doing this kind of thing in the future with guests. So we had so many great questions come up, and I’m going to try to do this more often, post on my LinkedIn feed to post questions to future guests. So please do look out for that. And do ask your questions. So I have one question for you now Vin. Do you have a book recommendation for us?
Vin Vashishta: 00:55:45
Might have had this one already. You might’ve talked about this in advance, right here. Causal Inference: The Mixtape. Sorry, you got to see the author, Scott Cunningham. One of the up and coming capabilities that as a data scientist you need is an understanding of causal inference, causal modeling. How that changes our experimental workflow. And it adds so much richness to not only what we do in our projects, but also the results that we can deliver. So talking about causal inference, experimentation, anything around those topics I think is just huge. So definitely. That’s what I’m reading, but pick your own around that area. You can even read through current research that’s come out of … Microsoft’s doing some really good work around causal inference. So that’s a good place to start, and you can kind of rabbit hole to your heart’s content from there.
Jon Krohn: 00:56:44
Nice. So Vin, you have had so many concrete and insightful things to say about machine learning and artificial intelligence from technical aspects all the way through to business strategy. You seem like an expert up and down the funnel. So I am not surprised that Harpreet and DataScienceGO were lining up to recommend you as a guest on the program. We’re going to have you on sometime again soon. But until then, how can listeners keep up with your content and your thoughts?
Vin Vashishta: 00:57:17
Easiest way, themlrebellion.com. That’s got a link to everything. YouTube, Twitter, LinkedIn, so you can find more links there. Google me, that’s always an easy way to find links if you forget how to do my website, that you can find me. Unfortunately, I have Google search results. I don’t know if that’s good or bad. Hopefully that’s good. But yeah, those are the easiest ways to connect with me.
Jon Krohn: 00:57:41
Nice. Crystal clear Vin. All right. Thank you so much for being on the program. And we’ll catch you soon.
Vin Vashishta: 00:57:46
Thank you.
Jon Krohn: 00:57:52
Told you Vin was a brilliant, clear, example-filled communicator, didn’t I? In today’s episode, Vin explained that AI strategy allows for effective management of commercial ML deployments. That low-code or no-code platforms can offer efficiency gains for data scientists, particularly around data engineering. That the biggest skills gaps he sees in data scientists are around impact communication, requirement elicitation, team leadership, and model deployment architectures. He also explained that there’s lots of funding flowing into socially beneficial data science, but much of it could paradoxically backfire against the causes it’s designed to support. And finally, we talked about pricing being the most challenging problem associated with AI model deployment.
Jon Krohn: 00:58:41
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URL for Vin’s LinkedIn profile, as well as my own social media profiles at www.superdatascience.com/489. That’s www.superdatascience.com/489. If you enjoyed this episode, I’d of course greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel, where we have a video version of this episode. To let me know your thoughts on the episode, please do feel welcome to add me on LinkedIn or on Twitter, and then tag me in a post to let me know your thoughts on the episode. Your feedback is invaluable for figuring out what topics we should cover next.
Jon Krohn: 00:59:25
Finally, if you’d like to attend a talk of mine and ask me questions directly, I’ll be speaking about deep learning, specifically the pros and cons of TensorFlow relative to PyTorch at the DataScienceGO conference on Saturday. I’d love to see you there, and sign up is free at datasciencego.com. There will of course be tons of other great speakers there, workshops, and lots of opportunities to network with other data scientists from all over the world. All right. Thanks to Ivana, Jaime, Mario, and JP on the SuperDataScience team for managing and producing another amazing episode today. Keep on rocking it out there folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon.