Welcome to episode #065 of the Super Data Science Podcast. Here we go!
Today's guest is CEO and Co-founder at SFL Scientific, Michael Segala.
If you've wondered about the diverse applications of data science in changing the world, tune in now for this episode!
Mike Segala will share his rich experience leading a data science consultancy, including some fascinating case studies in projects his company has worked on, and his background at CERN that led to starting the company.
You will hear about his challenges and his client approach, how he manages to apply the firm's expertise across so many different domains, as well as some tangible ROI wins his firm has brought to their clients.
Prepare to be inspired!
In this episode you will learn:
- Professional Communication for Academically-Trained Data Scientists (11:02)
- Case Study 1: Cleaning Unstructured Data with NLP Pipelines (13:35)
- Case Study 2a: Using Deep Learning to Detect Cancer (17:13)
- Case Study 2b: Growing Organs with Deep Learning (20:19)
- Case Study 3: Gaining an Advantage in Sports Betting Using Machine Learning (22:05)
- Challenges in Running a Data Science Consulting Firm (25:26)
- Walkthrough of a Client Engagement (30:25)
- Ways Data Science Consulting Adds Real Value (38:39)
- Learning Data Science Through Consulting (43:39)
Items mentioned in this podcast:
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, Jerome Friedman
Kirill: This is episode number 65 with CEO and Co-founder at SFL Scientific, Michael Segala.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
(background music plays)
Hello and welcome everybody to the SuperDataScience podcast. Very excited about this episode because today we've got the CEO and Co-Founder of SFL Scientific. SFL Scientific is a top data science consulting firm and they have worked with huge clients such as Staples, Goodyear, and Salesforce. So what did we talk about? Well, Mike gave a quick overview of his background. He actually worked on the Large Hadron Collider at CERN on the Higgs Boson problem. So you might have heard this, this was a big thing, and still is, the Higgs Boson, the particle associated with the mechanism that gives other particles their mass. So he was on that huge project, which is really cool.
And then we also talked about some case studies. So you'll find this very exciting. Mike shared some case studies from the consulting world, and how data science can be used to drive actual business value to businesses. So I think that was really valuable. Also Mike talked about what it is to use data science in your organisation. So if you're a CEO, Executive, a business owner, an entrepreneur, you'll find a lot of valuable insights here about data strategy and how to think about using data in your organisation. And of course Mike gave some tips for those of you who are building data science careers. So you'll find some very valuable golden nuggets here which you can apply to your career.
All in all, as you can see, everything was covered in this podcast. Very excited to bring this to you. And without further ado, please welcome Mike Segala, the CEO and Co-founder at SFL Scientific.
(background music plays)
Welcome everybody to the SuperDataScience podcast. Today I've got a very special guest on the show, Michael Segala, who is a CEO and a Co-founder of SFL Scientific, a data science consultancy firm. Michael, welcome to the show. How are you going today?
Mike: Hi Kirill. Thanks, I'm good. I'm good. I'm doing great. Thank you very much for having me on today.
Kirill: It's great to have you on. And unfortunately we didn't get a chance to meet in person at ODSC when I was there, but at the same time, I met with a couple of your colleagues who were there at that time, so Daniel and Alexander. And you guys are doing some great work, so I'm very excited to have you on the show actually.
Mike: Great, yeah, thank you very much.
Kirill: And just to start off with, I promised you I would ask this question. Where are you calling from?
Mike: I'm in Boston.
Kirill: And the weather is?
Mike: Yeah, it's beautiful today for the first time in about 6 months, so it is very nice to look outside and see some warm, sunny skies.
Kirill: Fantastic, fantastic. Like I told Michael, I've never been to London yet. But Boston feels a lot like how people describe London, and I feel that every time it's sunny in Boston, there should be a public holiday or something!
Mike: Not to knock on London, but I don't think we're quite as bad! We get some nice months. London gets a little dreary. Not to put shame on any London people listening, but Boston I think is a bit nicer.
Kirill: Yeah, ok, that's a good start I guess. Better than London is not a huge accomplishment. But ok, we'll give you that!
Kirill: Alright so ODSC, this is the Open Data Science Conference. Tell us what you guys are doing there. You guys had a whole booth and people were coming up, talking, you were giving stuff out, your booth was one of the most interesting ones to chat to personally, because I learned quite a lot. You have some very high calibre people there. Tell us what you guys were doing at ODSC, what were you trying to achieve there.
Mike: Sure. So let me just give you roughly a few seconds about the background, and then tell you why we're at ODSC. So at SFL, we're a very boutique data science consulting firm. And what we do is we work with our clients at that very individualised professional service level to build out some kind of unique solutions for them. So a lot of this centres around machine learning, artificial intelligence, business intelligence, something to get in there, understand the core of their problem, and then start tackling it and providing solutions for them. So at ODSC, a lot of people are there, and they're kind of looking at what they can do for their business to drive value. That's what it's about. It's not a theoretical exercise any more, it's a business exercise. So we have a really nice presence at ODSC because a lot of people are looking for education and ways to improve their business function.
So we're basically there speaking to people, educating them, teaching them what they could do with data science and then of course trying to get them as customers and have that nice rapport with them.
Kirill: Totally understand and I like that you mentioned the drive value part of things because I feel that a few talks in the conference were focused on that. There was this part of the conference actually, they just launched called the CXO Summit, which happens at the very start. And that is specifically focused on using data science to drive value. And I think that a lot of people don't get that part right, or even get it wrong in the sense that they forget about – there's data science and then there's business value. And you need to really connect the two to make the case for data science. And it sounds like that is exactly what you guys are doing. Tell us a bit more. Are you guys based in Boston? Do you only work with Boston-based clients, or do you have a national presence in the US?
Mike: Sure. Yeah so we're a bit scattered out. So we have 3 co-founders and we're all separated. So I'm in Boston, the headquarters are in Boston. My other co-founder is down in New York. And then we have our other co-founder on the West Coast. And then we have employees scattered about. So we're all over the US and we have clients completely internationally. So I have clients down by you in Australia, which is interesting because you've got to talk to them in the middle of the night, and you see some really interesting communications. We have people in Japan, we've had clients from London, Germany, really all over the world. So we have a really nice presence both locally to the Boston and New York area and then kind of spread out all over the world.
Kirill: Ok. Gotcha. That's very interesting. It sounds like you guys have some very high calibre people on there. Like you have a few PhDs, you have some people with backgrounds in engineering, statistics, and so on. How do you go about building up this team that's so scattered and at the same time working together to make SFL happen?
Mike: Sure. So I think it's a fun story. So the way that the 3 founders started was we were all in grad school together down at Brown University, which is an Ivy League school here in the US. So we were all studying particle physics at the time. Myself and my other co-founder, we were experimental physicists working at CERN, at the Large Hadron Collider. Do you know CERN?
Kirill: That's so cool. That's so cool.
Mike: Yeah so I and my other founder, we were working at the LHC and my thesis actually was on the discovery of the Higgs Boson, which was the –
Kirill: Oh wow! Were you part of the team?
Mike: I was. My thesis was literally on the discovery. It was just incredible timing. It took 30 years of people putting in some real blood, sweat and tears to build it, and then the day I started my PhD, the thing turned on. And the day I graduated, it turned off. I'm not exaggerating.
Kirill: That is so cool!
Mike: I mean, the timing couldn’t have been better. And then when I got out of school, you’re hot off this big Higgs Boson Nobel Prize discovery and I just have it on my CV. It’s so cheating because you present this to somebody at a job interview and they’re so impressed. Little do they know that it’s a team of thousands and thousands of people working. But anyways, we were all doing that together and our other founder was a theoretical physicist at the time. And back then we were basically data scientists in training. What we used to discover the Higgs or black holes or whatever else is fundamentally the same exact type of machine learning or data science that’s applied in industry today.
So we were basically just training to become data scientists before the real terminology came about. So we left academics because sadly there’s really not too much money or opportunities in academics anymore just because of funding issues, and we started these professional careers as data scientists. And to touch on one of the points you made about business value, I think we’re still in this phase where people are thinking about data science as a theoretical exercise where we’re sitting down and we’re just trying to write really cool and sophisticated algorithms to solve things, which is great, but we really need to think about how do we take it to the next level and what do we do with the outputs of these models or whatever we’re building.
So we saw on the market that there’s this huge gap there that businesses are now getting into this whole data collection, data sourcing thing and let’s try to capitalize on this and say, “Okay, we understand the fundamentals of data science, but we also understand the business value here.” So we came together, we started the company, and then all of our technical people, they all happened to as well be PhDs, mainly in physics and then engineering as well. So, it’s not a culture where we demand you to be a physicist, but from a background perspective we have a lot of that kind of critical thinking and problem-solving skills which are great, you know, applied to a business sense. That’s kind of how we all got together, formed a company and then pretty much got to where we are today.
Kirill: Gotcha. That’s very interesting. I love it. One thing I found about people with PhDs – not everybody, of course – sometimes there’s issues with communication, with communicating insights, and therefore they sometimes face challenges when working in consulting. Do you find the same thing or do you somehow source PhDs that don’t have that problem?
Mike: I talk to a lot of budding data scientists that come out of school, and the problem is exactly how you’re describing it. It’s a communication gap where they get so focused on the academic exercise of it that they don’t really think about what does the client care about, or what does the business care about. So it’s great when we find somebody with the technical skills and then the communication skills, they’re homeruns, they’re instant hires. But a lot of the time what we do is, we’ll work with people who have the raw technical ability and guide them and start bringing them into calls and meetings and seeing how to communicate properly, such that they build that rapport themselves. But you’re absolutely right, there is this kind of thought in the industry that academics can’t really speak to the business, which unfortunately is true, but we do have to bridge that gap when we are consulting.
Kirill: Gotcha. That’s really cool. Well, it’s very exciting that you work at a data science consulting firm, and moreover, are the owner of a data science consulting firm. To me it just means that you have 100% visibility of what you can expose in the podcast and what you can’t, and I’m going to torture you a little bit about some case studies of data science in action. I think our listeners love that the most, when you can actually give live examples. Could you start with a simple case study of how you applied data science to help a business and bring some value to that business?
Mike: Sure. So, one of the things that I’ll mention first before I start talking about case studies, because I’m sure they’ll go all over the place – this will make sense as I start going through this – so what we noticed as a company was, no matter your industry, so if you’re in pharma, healthcare, financial services, insurance, it doesn’t matter. Fundamentally you’re all solving the same problems.
When we look at it from a data perspective, most people are solving things in text analytics or vision, they have an image or there is some kind of time series pattern we’re trying to expose, or there’s a marketing type of question. So if you kind of think it from that perspective, all businesses are the same and they’re solving the same problems, which is really cool for us because then we get to work with literally all verticals.
So, kind of coming back to your question about use cases, and you’ll see as we talk through these I’ll mention all sorts of random verticals, but when you think of data science it doesn’t really matter, your vertical, as long as you can kind of get down some of the terminology that these verticals speak in.
So first case study: this is a bit of a boring one, I think, but it’s actually a huge problem in so many fields. For instance, pharma companies or finance companies, they have an absurd amount of unstructured data. This is all text data, so this could be — like, if you are looking at case studies or financial reports and you’re looking to make investments or whatever you’re trying to do from drug discovery or whatever the use case is, you have all this information sitting there that’s constantly coming in to you. And what you would do as an analyst is you would literally have to comb through all this information and start to try to extract key elements from thousands or millions of .pdf files or Excel files or what have you, and aggregate it into one kind of centralized spreadsheet or something.
So one of the things that we worked pretty heavily on is building these end-to-end natural language processing pipelines that would automatically clean, aggregate, classify and then extract these huge amounts of unstructured data. So that’s one of the really interesting problems to work on from a NLP perspective. We see a lot of this in the pharma space, the legal space, even real estate agents are trying to do things like this. That’s one of the more boring case studies.
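(Illustrative sketch, not from the conversation: the clean, classify, extract and aggregate stages Mike describes could look roughly like the toy Python pipeline below. The keyword rules, the category names and the `run_pipeline` helper are all hypothetical stand-ins; a production pipeline would use trained NLP models and real document parsers rather than regular expressions.)

```python
import re
from collections import defaultdict

# Hypothetical keyword rules standing in for a trained text classifier.
CATEGORY_KEYWORDS = {
    "adverse_event": ("reaction", "adverse", "side effect"),
    "financial": ("revenue", "profit", "quarter"),
}


def clean(text):
    """Collapse runs of whitespace and lowercase the raw text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def classify(text):
    """Return the first category whose keywords appear in the text."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def extract_amounts(text):
    """Pull dollar amounts such as '$1,200.50' out of the text as floats."""
    matches = re.findall(r"\$(\d[\d,]*(?:\.\d+)?)", text)
    return [float(m.replace(",", "")) for m in matches]


def run_pipeline(documents):
    """Clean, classify and extract each document, then aggregate by category."""
    table = defaultdict(list)
    for doc in documents:
        text = clean(doc)
        table[classify(text)].append(
            {"text": text, "amounts": extract_amounts(text)}
        )
    return dict(table)


# Made-up documents, only to show the shape of the output.
docs = [
    "Patient reported an  adverse\nreaction after dose.",
    "Q3 Revenue of $1,200.50 this quarter.",
    "Hello world",
]
result = run_pipeline(docs)
```

Here `result` is a single dictionary keyed by category, which plays the role of the "centralized spreadsheet" an analyst would otherwise build by hand.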
Kirill: Quick question on that. Where does that text come from in pharmaceutical companies?
Mike: These could be things like field reports. So, pharmacovigilance – I don’t know if you’ve ever heard of this terminology – is basically the study of how people react to taking drugs. It turns out that lots of people actually die every year because they take some kind of combination of drugs. In the U.S., we love prescribing and taking medications that just interact really poorly with each other.
So there’s all this information out there about drug interactions and, you know, “this did this” and “this did that,” and you can kind of compile from just all historical records or academic literature huge amounts of case studies that you start doing this. If you’re a finance portfolio company, you literally have hundreds or thousands of investments and each one of these investments sends that data to you. So there’s always this kind of data transfer happening out there and it’s pretty organic through most of the organizations that they’re collecting and seeing these data sources.
Kirill: Okay, gotcha. So, you take that text data and then you turn it into quantified data, into numbers and so on, and then they do something with it or do you actually deliver insights?
Mike: Right. We deliver insights. That’s the whole idea. Our whole core mission is to not only apply the machine learning algorithms, but allow them to get some consumption, make it consumable and actionable. If they can’t do anything with it, we’ve all just wasted a lot of time and money. For instance, if you’re taking it from the fintech perspective, the financial services, you’ve now allowed this portfolio manager to see his investments across a thousand different companies and start to draw insights like, “Hey, this company is really worth investing in,” or, “I need to drop them because I’m losing money.” There’s tons of actual business ROI that can happen by just aggregating and extracting tons of different unstructured text information.
Kirill: Gotcha. Okay, cool. So that’s case study number one. Let’s move on to the next one.
Mike: All right. Number two – let’s talk about medical stuff. So, data science is having a huge burst, or boom, right now in the medical field. We work a lot with these deep learning technologies in medicine specifically. For instance, there is this field of study called digital histopathology, which basically means you go to a doctor, the doctor takes some kind of biopsy, they look at it under a microscope, and a human sits there and says, “Do I think this is cancer or not cancer?” It’s a very important process, you know, diagnosing cancer is not something to take lightly, but it’s very expensive and it’s very human-intensive.
So what we started to do is build out large scale algorithms using these kind of really interesting state-of-the-art neural network architectures to automatically kind of take the cellular activity and start detecting and diagnosing things such as cancer or other types of diagnostic purposes, like identifying viable cells for in vitro fertilization, or whatever you’re trying to do, you can start applying deep learning to that and doing it from a very precise data science way. So that’s a really cool case study that we’re working on right now.
Kirill: That’s really interesting and I’m really glad you mentioned deep learning, because I find a lot of the more traditional, bigger, larger in size consulting firms (I’m not going to point any fingers here, but big ones with thousands of employees) are very slow to adopt these bleeding edge technologies such as deep learning. Deep learning sounds like a very sophisticated thing, something that is very hard to put in an engagement letter for them to bring to a client because it’s risky, you know, it’s something completely new and they don’t know how to even wield that sword.
Whereas you guys are like, “Oh, yeah, we’re just going to throw in some deep learning into this mix.” And why not, right? It’s like the most powerful technology and it can get the best accuracy out of most of the machine learning algorithms. Why not apply that? So I’m really happy that you mentioned that.
Mike: Yeah, you’re absolutely right. And I mean, we get asked a lot about competition from these larger traditional management consulting firms that now have moved to data science. And the problems that they’re solving are those kind of low-hanging marketing analytics problems. They really haven’t started to dive into these more R&D-based problems. Which is great for us, because there’s a huge market evolving there that we can really be in the forefront of and capitalize on, which is fun, these are really challenging and interesting problems and there’s really not much competition, so it’s a great space to play in.
Kirill: Yeah. It almost sounds like what you guys are doing is kind of, like, research, similar to what you were doing at CERN, but in business. You’re solving problems that haven’t been solved before. How cool is that?
Mike: You’re absolutely right. It is very cool. And I’ll give you one more case study like this. This is something that we’re going to begin working on very soon and I’ll be a little cryptic about it, but basically we’re going to be using deep learning to reconstruct organs, such as your lung, that will be turned into 3D images and then stem cells will be grown on top of them to have fully viable organs.
Kirill: Wow! That’s insane.
Mike: Yeah. So, not too much more to say about that, but you can just imagine the implications of taking a deep learning exercise and turning that into an actual organ. This is the state of the science. This is bleeding edge R&D research that we have the privilege and the opportunities to be working on. So it’s just incredible.
Kirill: Do you guys take people part-time? I want to come work for you. (Laughs)
Mike: Sure. You got it.
Kirill: This is the best. This is so cool. I love it. And on a side note, you guys are really working in health, right? You’re changing people’s lives. I always appreciate that, when people apply data science to impact people’s lives, because you can do it at scale. It’s not like you just change one person’s life, which is already an immense thing, but you’re changing thousands, potentially millions, of people’s lives in the future with all of these things that you’re creating and the consulting work that you do with these companies. So hats off to you. I think that’s a very noble thing to be doing.
Mike: Thank you.
Kirill: Okay. So, case studies: we’ve done one about NLP pipelines; case studies 2a and 2b were about deep learning. Do you have another one for us? Case study number 3?
Mike: I have one more completely different one and this is just because you’re in Australia.
Kirill: Okay, gotcha.
Mike: I mentioned we had an Australian client, so completely different type of project. We had somebody who you guys call a professional punter, who essentially, for all our U.S. listeners, is a professional horse race bettor.
Kirill: Oh, that’s so cool. I didn’t even know that term.
Mike: Yeah, he’s a punter. So, I guess Australia is the world’s largest betting country and they bet like 85% of the entire world market. And one of the biggest things you guys bet on is horse racing. So he was a professional horse bettor and he came to us and he wanted us to build a really large set of algorithms around predicting winners of horse races. And it’s so different than NLP or medical things. Now we’re talking about sports betting.
Then we get to do projects like that and you go in and that’s just an exercise in data cleaning and really understanding what is horse racing, what does it mean, how do you build these features, because there’s so many nuances, there’s jockeys and trainers and venues. It just gets crazy. So that was a really fun project and it actually was very successful. He was able to use this automated machine learning pipeline that we built and I think he has something like 10-20% ROI on his betting, which is ridiculous because it’s an automated system. We don’t get many of those, but when we do, they’re fun to tackle and everybody jumps for them because they’re fun to think about.
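(Illustrative sketch, not from the conversation: one common way to express a betting ROI like the 10-20% Mike mentions is net profit divided by total amount staked. The Python below assumes decimal odds, where a winning bet pays back stake times odds including the stake itself. The `betting_roi` helper and the example bets are made up for illustration and say nothing about the client's actual system or results.)

```python
def betting_roi(bets):
    """Return net profit divided by total amount staked.

    Each bet is a (stake, decimal_odds, won) tuple. Under the decimal-odds
    assumption, a winning bet returns stake * decimal_odds and a losing
    bet returns nothing.
    """
    total_staked = sum(stake for stake, _, _ in bets)
    total_returned = sum(stake * odds for stake, odds, won in bets if won)
    return (total_returned - total_staked) / total_staked


# Made-up bets: (stake, decimal odds, whether the horse won).
example_bets = [
    (100, 3.0, True),
    (100, 5.0, False),
    (100, 2.0, False),
    (100, 1.6, True),
]

# 400 staked, 300 + 160 = 460 returned, so ROI is 60 / 400.
print(f"ROI: {betting_roi(example_bets):.0%}")  # prints "ROI: 15%"
```

Note that only two of the four example bets win, yet the ROI is positive: what matters is whether the odds on the winners more than cover the stakes lost elsewhere.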
Kirill: That’s really cool and an exciting example, completely left field. Okay, thanks a lot for sharing those. Those are some exciting case studies, but it sounds like you need a very diverse set of skills for your organization to be able to deliver on those projects. How do you find or train up your amazing staff in order to be able to take on these challenges?
Mike: Yeah, that’s a challenge. The only specialists I look for are specialists in the deep learning field that could do things like those cellular type of classification problems. Because that takes an extreme amount of just knowledge and knowhow, and you literally need to be up-to-date on — you know, if a paper is published that morning, you need to be an expert on it by that night, type of thing. So we do look for specialists there, but otherwise, we are looking for people who just can really understand how to do data science across all these domains. And it’s a challenge to find these people, but they’re definitely out there. And what we do like to do is take on some interns and work with them and really teach them how to apply data science across all these domains and then we kind of grow them into consultants one day. So, you either hire really, really good people, or like anybody else, you take people who have a lot of great raw skills and you grow them in a couple of months or a year into really great candidates.
Kirill: Gotcha. And let’s talk about you for a second. We’ve talked about your background. At the same time, what I wanted to ask is, among our listeners, about 10% are either executives or business owners or self-employed. It’s a very interesting example. You’re running a data science consulting firm. What are some of the challenges you face on a daily basis?
Mike: My challenges as a consultant or running a consulting company, is finding and landing clients. That’s the biggest challenge.
Kirill: Wouldn’t it be the case that more and more companies are picking up data science, realizing that value, and they’re just running to you? Wouldn’t you have a huge backlog of clients? That’s the way I imagine the world right now.
Mike: Yes. However, we’re not a household name like McKinsey. So there’s still the outreach, there’s that outbound marketing that has to happen, to go out and talk to people and get people to speak with you. But that’s the challenge of any consulting company. Then you have five other people bidding on a project. How do you prove that you’re better? Those are the types of challenges.
The other challenge, the flip of that is, as you’ve seen, we get all very diverse sets of projects. Being able to solve all of these is sometimes a challenge in itself, because one day you’re solving cancer, the next day you’re predicting horses and the next day you’re doing things from a marketing perspective and understanding complete user population statistics or understanding customer bases. So, it’s staying up on the technology that is coming through so quickly such that I can have the conversations with the other business owners and convince them that we are the firm to go with. Because if we are not up-to-date, there is going to be somebody else out there that is going to say better words or understand the newer technology and they’re going to scoop us. So staying up-to-date and basically convincing people to work with us.
Kirill: Gotcha. And in terms of managing the team, do you face any challenges there? Obviously you have some very talented people there, but at the same time, as I understand it, it’s a decentralized company. What are some of the problems that you’re facing there?
Mike: You know, we actually do a really good job of not having too many internal issues. It’s very rare that we have any issues truly internally. Sometimes a client will be misinformed or they won’t really understand something and then we’ll have to stick our heads out there and talk them through that. But it’s very rare, even though we are very disaggregated across the country, that we do have any internal problems. Most of us have known each other for a long time, we have a really great relationship, and everybody is just really happy, they’re working on really fun and hard problems, so there’s really not too many issues, to be honest with you.
Kirill: That’s really cool. It’s good to hear because for somebody working with you guys, if you have everything down pat inside the organization, then your outputs are going to be superb. There’s not going to be delays, there’s not going to be any issues, the team is working nicely together. I think that’s a great state to be in to achieve for a company, that everybody is on the same page inside the team itself, and then you worry about the external issues. That comes next.
Mike: Yeah, absolutely.
Kirill: Okay, cool. Thanks a lot for that. Let’s talk a bit more about the tools that you use, the tools and methodologies. We’ve already mentioned a couple of methodologies like deep learning, natural language processing, things that you use in your company, but how about the tools? What are some of the most popular tools that you guys rely on? Do you use open source software? Do you use commercial software and things like that?
Mike: Yeah. So, everything we use is open source unless our client has already bought something and they want us to use that. But that’s rare. That doesn’t happen too much. So often everything is open source. I don’t do much coding anymore, but the team’s go-to for solving more basic problems — XGBoost is a wonderful algorithm open source tool that lets you solve a lot of easy to difficult types of problems. We’ve migrated a lot to using TensorFlow. I think they really capitalized and made deep learning the sexiness that it is today. TensorFlow did a really nice job about marketing that. So we have a lot of clients that ask for TensorFlow, which is fine, we’re experts in that as well. So TensorFlow, XGBoost, we write a lot of our code in Python, if it needs to be scaled we use Spark or some Scala or PySpark type of derivative, you know, the very kind of basic toolsets. If there’s large engineering tasks, we’ll use Kafka or some kind of streaming software, you know, the basic big data architectures. So, we’re pretty standard and up-to-date with our software tools.
Kirill: That’s really cool. And I’m glad you mentioned Spark and Scala. That means you guys are on top of Hadoop as well, right?
Mike: Yeah, absolutely.
Kirill: So, walk me through this. When a client comes into your organization, do they know their problem already or does it take time to help identify the problem?
Mike: Every client is phenomenally different. Let’s take a client that needs the full spectrum of services. If you do it this way, there’s four very distinct steps that we go through. And each client is different, some need us for only one or the other. But let’s pretend that you needed all of them. The first thing we hit on is an overarching data strategy, so really getting in and understanding what are your business questions, what are your objectives, how are you leveraging data, where are you today, and where do you want to be in a year or two from now. So, that really traditional strategy – get on the whiteboard and write stuff down. That’s really important, because at this point you have to convince the stakeholders, you have to talk to management, you have to talk to data governance and set all those things up.
Once you’ve done that, you’ve defined basically what you’re going to do, what is the data science going to be. It all comes from your data strategy. So the next logical step is actually not data science, it’s data engineering, setting up these architectures, the Hadoop stacks, Spark, whatever it happens to be. Getting in there and actually doing the data engineering, integrating with the DevOps guys, the software engineering people, whoever it happens to be. From there, that’s when we finally get to the data science, so actually taking the business questions and solving them mathematically. That’s where we do all our algorithms, our machine learning, all that kind of fun stuff.
The fourth step, which absolutely is the most important step, is how do we make any of this consumable. And I kind of mentioned this before. If you did all of those three prior steps and they still can’t take that output of that model, the probability score and the classification label, whatever you’ve outputted, if that can’t drive a business somehow, if your salesperson can’t use it, if your end user can’t do anything with it, it’s a complete waste of time. So we really strive to make sure that we make our final product consumable and actionable such that they can drive your business, increase your speed to market, sales and profit, lower your R&D time, increase market shares, something. We have to always drive at that.
So clients come to us in various stages of where they are along that life cycle and we plug in accordingly. That’s a very longwinded answer to say to you: Some clients have absolutely no idea what they want, some clients have a very firm idea and are correct, but a lot of the times they have a very firm idea and it’s not necessarily correct, but it’s more of a re-education of what they properly could do or want to do.
Kirill: Gotcha. That’s really cool. Just to reiterate those steps, we’ve got overarching data strategy, data engineering, data science, and ‘make it consumable.’ I do like ‘make it consumable’ a lot because I think that’s where you can really differentiate yourselves – and you probably do – from the bigger historical management consulting firms with thousands of employees, because for them, a lot of the time they have to be in very tight cost brackets and they have to go in and they have to do this work and they have to make sure, because they charge exorbitant fees, like, partners charge $400, $700 an hour, sometimes more.
And at the same time, they have to really create this work. And for them, once they’ve done whatever is in their scope, they’re done and they’re out of there. They don’t stick around to bring the value to the business. I’ve seen it quite a few times where the work is done, but then the client is just sitting there with this report or with this algorithm that’s been ‘tailored’ to them and they don’t know what to do with it. So I think it’s a very important step that you mention at the end there – to make it consumable for the client and educate them on how to use it going forward.
Mike: Yeah, you’re absolutely right. Our whole business is literally built around that principle. And it’s not just the management consulting companies that – not to knock anybody – maybe lack there, but also our other source of competition is black box type of products. We don’t need to necessarily name them, but there are some data science products out there where you take your data, your problem, you kind of shove it (to be crude) into their black box and you get out something. But the issue is the same.
So you’ve gone through steps 2 and 3, you’ve designed some engineering, you did some data science, you tried to solve it, but so what? You still need that next step of “Okay, where do I go from here?” So that’s, in my opinion, where all of these products are lacking, it’s the “So what? What do I do now?” which I think is our huge differentiator, it’s “Fine, no problem. We can build all these algorithms, but now let’s take it to the next step.”
Kirill: Yeah, exactly. And we spoke about this a bit before the podcast. You guys are doing a lot of different things. That approach with the black box or with “Here’s your report, go deal with it” is good enough for low hanging fruit a lot of the time. It can solve problems. But what you guys are doing is like top of the industry cutting edge technology, problems that have never been solved before, and obviously you need people’s input, obviously you need a lot of critical thinking to get this off the ground, let alone bring value and actually impact the bottom line of businesses or the mission of businesses and ultimately impact the value that those businesses bring to the world.
Mike: Yeah, you’re absolutely right. Yes, not to say we don’t solve some of the lower hanging fruit, because I don’t want to put that out there. I mean, we do all of that as well, but you’re absolutely correct in the case that a lot of the problems that we solve, maybe about half of them, are that kind of cutting edge spectrum where without the critical thinking, without the real kind of knowhow how this applies in a large scale across other verticals and other domains, there’s just no way that a generic solution is going to work there. So, yeah, you’re absolutely correct in your point.
Kirill: Okay, cool. So, in order to accomplish these four steps that you’ve outlined, how many people normally do you put onto a project? Is it like one person? Is it two or three people working on a project at the same time?
Mike: It depends. I mean you could logically go in order and have one person doing it. A lot of the times we will have a main person working on it and then somebody backing them up, especially when you get into the machine learning side of things where sometimes we’re trying different approaches because machine learning, at the end of the day, nobody has a strict strategy that’s going to work. Sometimes we’ll parallelize the work efforts. Somebody will try a deep learning solution, somebody will try a more traditional feature engineering approach. We see which one works and then we can converge, ensemble and get something a little bit quicker out. So mainly one person, sometimes two, usually is able to fit the bill.
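The "converge, ensemble" step Mike mentions can be pictured with a minimal sketch. This is not SFL Scientific's actual method, only the simplest common form of ensembling: averaging the probability scores of two models. The two score lists stand in for a deep learning model and a feature-engineering model; the function name, the scores, and the 0.5 cut-off are all assumptions made up for illustration.

```python
# Illustrative only: the scores below are invented, not from a real project.

def average_ensemble(score_lists, weights=None):
    """Combine per-model probability scores with a weighted average."""
    n_models = len(score_lists)
    if weights is None:
        # Default: every model gets an equal say.
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    combined = []
    # zip(*score_lists) yields, for each example, one score per model.
    for row in zip(*score_lists):
        combined.append(sum(w * s for w, s in zip(weights, row)) / total)
    return combined


# Hypothetical scores for the same four examples from two approaches.
deep_learning_scores = [0.9, 0.2, 0.8, 0.3]
feature_model_scores = [0.7, 0.4, 0.4, 0.5]

ensemble_scores = average_ensemble([deep_learning_scores, feature_model_scores])

# Assumed decision rule: label 1 when the averaged score reaches 0.5.
labels = [1 if s >= 0.5 else 0 for s in ensemble_scores]
print(labels)
```

Passing `weights=[0.7, 0.3]` would favour the first model, which is one plausible way to lean on whichever approach validated better.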
Kirill: That’s really cool. And I just wanted to use this opportunity of you being on the podcast and being the CEO of the company to demonstrate to the world, or to people listening, what actual value companies like yours can bring to an organization, what actual value data science can bring. Obviously, clients have an expense related to your services, they have to pay for your services, but could you maybe show off with an example of how what you’ve implemented, how that has translated into revenue or addition to the bottom line to the client, and what that was compared to the expense they had in terms of ROI, how much more did you bring than they had to pay in order to get your services?
Mike: Sure. So, without saying specific numbers, I can think off the top of my head, maybe by the time I finish speaking I’ll think of others, but off the top of my head, one simple one was the largest retail Japanese client. They’re the largest online retail company, and they wanted to do something relatively simple where they said, “Okay, I want to send out e-mail campaigns to my customers such that I can tell them about deals going on in the site, you know, this computer or this shoe is on sale. How do I target them a little bit better?”
We wrote some pretty standard segmentation models which they were then able to go in and say, “Here are my high value customers, here are the people I should go after, here are the people who aren’t going to do anything for me,” and they just literally sent out different e-mails to those different groups of people. And just by doing that, the last time I spoke to them they said that the return on value for that was — I think they said upwards of $2 million. And that literally took us maybe two or three weeks’ worth of effort to do it and actually profile the clusters and something. That’s phenomenal ROI, and that’s probably still on-going.
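To make "segmentation models" a little more concrete, here is a minimal sketch. The interview does not say which algorithm was used; k-means is assumed here only because it is a common choice. It clusters customers on a single made-up feature (annual spend), and the customer IDs, spend figures, and segment names are all hypothetical.

```python
# Illustrative only: customer IDs and spend figures are invented.

def kmeans_1d(values, centroids, iterations=20):
    """Plain k-means on one feature, starting from the given centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Step 1: assign each value to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Step 2: move each centroid to the mean of its assigned values.
        new_centroids = []
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                new_centroids.append(sum(members) / len(members))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return assignments, centroids


annual_spend = {
    "cust_a": 40, "cust_b": 55, "cust_c": 60,
    "cust_d": 480, "cust_e": 520,
    "cust_f": 1900, "cust_g": 2100,
}

values = list(annual_spend.values())
# Assumed deterministic start: lowest, average, and highest spend.
start = [min(values), sum(values) / len(values), max(values)]
assignments, centroids = kmeans_1d(values, start)

# Name the clusters by centroid size so the labels do not depend on cluster order.
order = sorted(range(len(centroids)), key=lambda c: centroids[c])
names = dict(zip(order, ["low value", "mid value", "high value"]))
segments = {cust: names[a] for cust, a in zip(annual_spend, assignments)}

for cust, segment in segments.items():
    print(cust, segment)
```

Each segment could then receive its own e-mail campaign, which is the step Mike describes the client taking.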
Other studies — for instance, these NLP pipelines that I was talking about before, the cost saving on that is phenomenal because what happens is, if you build a pipeline — let’s say that it even takes you three, six, twelve months to do, which has a decent upfront cost, that doesn’t necessarily make these people fired, but it actually takes the job away of multiple people who have to do this manually. So manual annotators, manual people doing data entry. And if you think of the salary of four or five people that were doing this and the associated cost when they did it incorrectly because every time you incorrectly enter a data point, that actually costs a company roughly $100. So it’s very expensive from all sorts of scenarios. If you look at these five people’s salary over the course of a few years, you’re again in the millions of dollars. So you can have really huge impacts.
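The cost argument above is simple arithmetic, sketched here under stated assumptions. Only two figures come from the interview: the team size ("four or five people") and the rough $100 cost per incorrect entry. The salary, the error volume, and the three-year horizon are hypothetical values chosen purely to show how the total reaches the millions.

```python
# Figures taken from the interview.
annotators = 5          # "four or five people"
cost_per_error = 100    # roughly $100 per incorrectly entered data point

# Assumptions for illustration only; not from the interview.
salary_per_year = 60_000   # hypothetical salary per annotator, in dollars
errors_per_year = 2_000    # hypothetical bad entries per year, whole team
years = 3                  # one reading of "a few years"

salary_cost = annotators * salary_per_year * years
error_cost = errors_per_year * cost_per_error * years
manual_cost = salary_cost + error_cost

print(f"Salaries over {years} years: ${salary_cost:,}")
print(f"Cost of errors over {years} years: ${error_cost:,}")
print(f"Total cost of the manual process: ${manual_cost:,}")
```

Any real comparison would also subtract the upfront cost of building the pipeline, which the interview does not quantify.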
Another one that I can think of is from a deep learning perspective. We built — this company is doing real-time human and object tracking. So from a security system they put in front of their house, they can say, “Oh, there’s a car. There’s a person. There’s a dog,” which is interesting, but what it can really do is say, “That’s not just a person, that’s dad,” or, “That’s the mailman,” or, “That’s a stranger.” So, it starts to learn about who you are, what you are, what the nuance is of people coming home or not coming home, and starts to say, “Oh, this person is in the house. Send a message to 911.”
So this technology that we developed actually launched their entire product suite. Because without us, they’re just a fancy camera that doesn’t really do anything. That now I think has a market share in the $10-15 million range or something. I don’t know the exact numbers. But you can take technology out there that on its own wouldn’t do anything (all it’s doing is recording, it’s no different than a $20 camera) and add some kind of really interesting logic on it and all of a sudden you have a very viable product in the market worth a lot of money. So, yeah, there’s a lot of real ROI out there and sometimes it’s very easy to get to it and sometimes it’s difficult.
Kirill: Yeah, that’s really cool. And those examples are great. Thank you so much. It sounds like you guys are sometimes adding to the essence of a company, that extra little bit to really make it more efficient, more tailored, more targeted, which is good for the customers as well because they get the products that they want that they maybe didn’t even know they wanted.
On the other hand, like with this camera example, you are building the essence of a company. Without you, as you said, it’s just like a fancy camera, just like a wrapper to a company, and then you put the company inside it. You probably in some cases feel like, “I wish I had that idea, I could’ve built that company myself.”
Mike: I know. We’ve had that discussion as well. It’s terrible to say it. If any of our clients are listening, I love you all dearly. But sometimes you do have that thought, you know, “I think that we’re doing the bulk of this work. Why don’t we just do this?” But yeah, that is sometimes the thoughts that do cross your mind.
Kirill: Yeah. But I think you’re doing something way more interesting in the long run, right? You get to work on all these different types of projects and you never know what will come in the future. So, personally, if I were you, I wouldn’t give that up in order to go and pursue some one specific idea.
Mike: Yeah, you’re absolutely right.
Kirill: All right, that was really cool. Let’s talk a bit about how do people become consultants in data science. Not necessarily start a consulting firm, like you did, but for those of our listeners who are looking for jobs or who are looking to enhance their careers or maybe change their careers, if they want to break into the space of consulting in data science or get hired by a company like yours, what do they need to do?
Mike: I think that what has to happen — a lot of the times, I see data scientists out there that work for a company, and these could be very large companies or very small companies. They get pigeonholed into solving one very small problem. They’re really good at solving this exact problem for this exact dataset, which is fine for that company, but it doesn’t give them the skills to go out and really understand the entire market.
So to get hired, in my opinion, at least for us as a consultant, having that very general understanding of data science, how it applies to different data types, like I described before – texts, images, time series, marketing stuff and then all the other stuff that falls in that large catchall, and how do you apply methodologies to all of these, that then is the type of candidate that I think will thrive in the consulting environment.
And why I really like what we do — this is kind of a pitch out there to people who are interested in consulting — is every day you do get to work on different types of challenges. You know, it will probably take you a couple of weeks to finish a project, but you’re always doing something new, it’s always a different challenge, you’re growing so much as a data scientist by always seeing different data types and different problems and different algorithms that your toolbox now is exploding and you then could be the rock star because now you’re seeing it applied to all these different domains and verticals and the challenges and you can just knock them all out of the park. So I think it’s a great thing to want to do and to be. And to do it I think you just have to really understand the field and come in with a great attitude to want to learn and really just put in the hard work.
Kirill: Gotcha. And I can totally attest to that about the learning. When I was back at Deloitte, that was such a steep and fast and quick learning curve. I was grabbing so much knowledge just because of the variety of projects you’re in – you’re doing this, you’re doing that. And I always recommend to students who ask me “Where do I start? How do I get into data science?” I always say consulting. In my opinion, consulting is the best place to start. I don’t know about your company, it sounds like you have a pretty good culture, but in other consulting firms you’ll be worked to the ground, you’ll be working until 2:00 A.M. and so on, you won’t have a life. But if you go through that training period you will learn so much. It’s like going to university again and getting paid for it. It’s a really cool thing.
Mike: Yeah, I actually think that analogy is perfect. It’s like going to university and getting paid for it. That’s true. You have to work hard, but the benefits are very, very good.
Kirill: Yeah. And one more thing on that is, there’s two reasons to leave a consulting job. One is, if you stop growing. Like you pointed out, Mike, if you stop growing — there’s a saying, “Unless you’re learning, you’re dying.” So, yeah, once you stop learning, once you get pigeonholed, or the projects stop becoming interesting or you’re doing this one project for, like, seven months or a year and it’s very iterative, it’s not anything exciting, then you leave.
Or the other one is when the culture changes, when the culture in an organization was great and then all of a sudden the management changes or they change up your partners or something like that, or the team that you’re working with changes, then you stop enjoying your work.
But apart from that, if those two criteria are satisfied, then you can stay in consulting for several years or maybe up to a decade and actually learn so much and really become one of the world’s top experts in certain areas of data science or become one of the most multifaceted data scientists that there is on this planet.
Mike: Yeah, absolutely. I couldn’t agree more with that.
Kirill: Okay. Yeah, that’s a really cool excursion and I hope a lot of people will pick up from that and get inspired if you are looking for opportunities joining a consulting firm or maybe considering that for your future. Okay, so we’ve discussed all of the box stuff. I’ve got some rapid-fire questions for you, Mike. Are you ready for this?
Mike: Sure. I’m ready. Go ahead.
Kirill: Cool. We might have covered a few of them, so if that’s the case just say and you can repeat your answer, but I’m sure you’ve got lots of great examples. What is the biggest challenge you ever had as a data scientist?
Mike: Oh, geez.
Kirill: Except for the Higgs Boson. (Laughs) You can’t use that one again.
Mike: The biggest challenge I ever had—oh, boy, that’s not rapid-fire because I have to think now.
Kirill: Oh, no, that’s totally cool. On my side it’s rapid-fire, for you it’s—
Mike: So, I actually think the biggest challenge is not in solving the problem, but it’s educating the person you’re solving it for. So, having that conversation with somebody who owns the data or who is going to use that algorithm. And this doesn’t even have to be in a consulting environment. If your boss needs to understand the output or another team needs to consume whatever you’re doing, having that conversation where you guys are speaking different languages and lingo and they don’t understand what you’re saying, you don’t understand what they’re saying, I think bridging that gap is the real challenge that I’ve seen and that we face. Sometimes it’s very easy to come by but sometimes it’s really challenging if somebody really doesn’t understand what data science is doing and why it’s important. That’s what I think is the biggest challenge.
Kirill: Gotcha. Thank you. That’s a very, very cool answer. I think a lot of people could be doing that better as well. Okay, you’ve shared a few case studies with us, but what is a recent win or the best one or the biggest one which you can share with us that you’ve had with your organization which you are very proud of? It doesn’t have to be a case study, it can be anything that you had in your organization.
Mike: I think our first Fortune 500 client – that was really exciting.
Kirill: That’s awesome.
Mike: Yeah. We started off with a lot of start-ups or Series A type of clients who were great, they wanted to give us a chance and they saw the value in working with us. But I was really determined to get that first logo that if somebody came to the site they would know, like, “I know this company!” So that was very, very exciting. That was our big one that made me very happy and proud.
Kirill: That’s so cool. Congratulations on that.
Kirill: And for everybody listening, the website is sflscientific.com. Go check it out, find out who that logo is. I’m curious now. (Laughs) I’ve got to check it out. Okay. What is your one most favourite thing about being a data scientist?
Mike: The diversity in projects. Like I said, every day is a different challenge, every day is a different dataset, a different thought process. It’s just so dynamic and it really keeps you on your toes. If you’re bored as a data scientist, then you’re doing something wrong. You’ve got to think harder because there’s a lot of interesting things out there you can be thinking about.
Kirill: I was about to say, you cannot be bored as a data scientist. You’re definitely doing something wrong.
Kirill: All right, cool. Thanks a lot. And the question I’m very excited about, very interested to get your opinion on because of your diverse background both in physics and what you’re doing now and your vision for your business, from where you’re sitting, from what you know about the world of data science and what you see, where do you think the field of data science is going and what should our listeners prepare for to be ready for the future?
Mike: Yeah, that’s an interesting question. So, I see there’s always going to be these kind of low hanging challenges out there where companies will evolve and they’ll want to understand their customer profiles or look for log threats in security or things like that. That’s fine. I see the field of data science really driving business. Most decisions out there from every business unit, from HR to operations to supply chain management to your software engineering team, sooner than later will all be driven by data science. Think about HR. We’re now doing projects where you have an algorithm predicting who is your proper candidate, so you don’t have somebody calling and screening people or giving them exams and grilling them. You literally can predict it through an algorithm. When you think of it that way, it’s crazy. You take these very traditional jobs and you’re starting to completely not automate them, but augment them with some type of data science. So I see it really just changing the culture and the dynamics that happen inside businesses.
Kirill: Gotcha. That’s a great example on that HR example. We had Ben Taylor from HireVue on a couple of months ago, and the things they do there are crazy. Like, they can take a video of an interview and apply deep learning to understand people’s facial movements to understand when they’re confident, when they’re lying, when they’re telling the truth, the speech that they’re saying, analyse all of those things and then predict a candidate. So I totally agree with you on that, it’s a crazy, exciting world that we’re moving into.
Mike: Yeah. And I think if you’re not prepared — and it’s interesting because we just did a project with an HR team at another Fortune 500 company where they were very traditional HR and now the people that we were working with, they started learning how to write code in R, they started learning how to do some basic regression. Because they saw it, they saw that that’s going to be the next steps in the future of their world and it’s really important. So, I think if you’re not thinking about it from “Oh, it won’t affect my job,” I think it will. If you told me your job, I could probably tell you at least one part of it that we can tackle with some kind of data science.
Kirill: Yeah, that’s totally right. And it’s very exciting to see when you can influence people to start picking up these tools, start getting excited about them, interested, and you’re kind of driving not only these algorithms into an organization, but also this cultural shift into an organization when people are getting more excited by data science themselves.
Mike: Yeah, absolutely.
Kirill: Okay, so we’re wrapping up the podcast here. It’s been very exciting. Is there anything else that you’d like to share with our listeners before I move on to the final questions?
Mike: Come check us out. If you want to talk about a job, come talk to us about a job. If you want somebody to help you solve some really easy, medium or difficult problems, come check us out, or if you’d like to just chat. Yeah, I think that’s a selfish pitch, sorry. I don’t know if that’s what you’re looking for.
Kirill: No, that’s totally cool. I think that’s warranted. You’ve shared so much value here. I think people would be interested to definitely check out what you have to offer. And I wanted to add to that. Even if you don’t have problems, even if you don’t have specific issues where you think you need data science, companies like SFL Scientific, as Mike mentioned, can help you with the overarching data strategy. And I think that is a very, very important part going into the future where you understand “Okay, what is my data strategy for the next 2, 5, 10 years?” 10 is a long shot, but still, it’s not going to change your organization, it’s not going to redirect your organization, but, like one of the CEOs I was working for back in the day would say, it would tilt your organization.
And if you look at the way this industry is going and the rapid expansion, how quickly everything is going, even if you have a 2, 3, 5 degree tilt in your strategy, that can take you to a completely different place and that can mean the difference between going out of business in 5 years and staying in business and prospering and smashing your competition in 5 years. So, even if you think you don’t have challenges, talking about data strategy, people like Mike who have been there, who have done everything, I think it can bring huge value to an organization.
Mike: Yeah, you’re absolutely right. Thank you for saying that. You’re phenomenally on point. Even when we do talk to people, especially at conferences, who weren’t there looking for somebody to help them, we just ask them very simple questions. It becomes evident to them very quickly, even though they haven’t thought about it or they don’t think they have any of these challenges or potential to do things, that very often there is something sitting right there that’s a quick win and they can solve very quickly from a strategy perspective, from an actual algorithm perspective, so there’s all sorts of opportunities out there.
Kirill: Yeah, totally. Thanks for sharing that. On that note, how can our listeners contact you or get in touch with you or contact you about a job that they need help with or maybe once you guys are hiring they could get in touch or maybe they could just follow your career? What are the best ways?
Mike: You can obviously just come to the site. You’ve said it a few times, but we’re sflscientific.com. If you go on there, it says you can send an e-mail to [email protected] or [email protected]. Those come to me. I love seeing people write to the company so I just have them come to me. I can’t not look at them. If somebody wants to talk and I see them, I instantly get in touch with them. So, send us an e-mail. Otherwise you can contact us on LinkedIn or Twitter. We have a Facebook page as well, I don’t think that one is as maintained as the other ones, but all the very kind of standard social media channels and outlets.
Kirill: Gotcha. And we’ll definitely add those to the show notes for anybody who’s going to be looking for those. Thank you very much for that. I just have one more question for you: What is your one favourite book that you can recommend to our listeners so that they can become better data scientists?
Mike: Sure. I think there’s two. The first one is “The Elements of Statistical Learning.” It’s technical, but you’ll learn about the algorithms and what’s happening, which isn’t always as important when you’re just applying some scikit-learn model through Python. But it really is nice to give you the foundations of what’s there across supervised, unsupervised, regression, classification and cross-validation. It really goes into that in depth. When I have conversations with people that are interviewing or that are kind of first coming up, I probably ask questions that are very much found in that book. So, I think having a nice grasp of it is important. Obviously you don’t have to know every single word, but that’s really nice from a technical standpoint.
The other thing that I would suggest is some kind of book that bridges the gap between data science and business. Because if you want to really succeed and you want to take it to the next level, just being the person who sits at their computer and cranks out one or two models every week or two and doesn’t really understand what happens next with these models, that’s not really how you’re going to push forward in your career either as a consultant or somebody who is doing this in a company.
Having some kind of understanding of the entire landscape, so data science and business, what’s the implications, how am I driving processes – that’s what’s going to make you a superstar in consulting or in any business that you’re in. That would be my other suggestion.
Kirill: Okay, gotcha. Thanks a lot. So, guys, if you’re ever going for an interview with Mike, remember to read “The Elements of Statistical Learning,” the answers to these questions are there. Yeah, go find yourself a book on how to apply data science in business. We’ve had lots of recommendations on the podcast already and there’s lots of books that can help you out with that.
On that note, thank you so much, Mike, for coming on the show. I really appreciate your time. I think a lot of people are going to learn from this and benefit from your case studies. And I’m sure you’ve inspired a lot of people to at least consider working in consulting or engaging data science consultants. Thank you so much.
Mike: Perfect! Thank you very much. Take care, and I hope to hear from everybody at some point.
Kirill: So there you have it. That was Mike Segala from SFL Scientific. I hope you enjoyed this podcast. Definitely lots and lots of value. I’m so glad Mike came onto the show. As you can see, there’s value for those who are running a business, who are thinking about introducing data science, leveraging data science in their organizations. There’s value for those of you who are building careers in data science and want to understand better what options out there exist and how you can explore them and how you can enhance your careers.
Also, we had some great case studies that Mike shared. So I’m very happy with this session, and personally my favourite part among all of our discussions was the notion of learning data science through consulting. This is something that’s very dear to me because I personally went down this path and you’ve heard me talk about this a few times. Of course, there are different views on this and there are different opinions, but I believe that going into consulting to do data science is a very powerful thing, because like Mike and I mentioned in this session, it’s like getting a degree, an extra degree, and also getting paid for it at the same time because you’re learning so much, you’re working really hard, you’re putting in the effort. There’s so many diverse problems and tools that you’re using. It really helps you get up to speed very quickly and break into the space.
So if you do have an opportunity or if you are considering getting into consulting, then firms like SFL Scientific or other data science consulting firms out there are definitely a great option. Or even large-scale management consulting firms are great as well as long as you make sure you don’t get pigeonholed just because they have all these processes quite ironed out. And of course, if you are a business owner and you’re looking for assistance, some help, some advice with your data science, maybe challenges or just data science strategy, then make sure to reach out to Mike, connect with him and pick his brain about how he can help you or what you can do about those challenges that you’re facing. Mike is a very, very knowledgeable guy, as you can tell.
On that note, thank you very much for being here today. Make sure to connect with Mike on LinkedIn and follow them on Twitter. As you can see, he’s up to lots of great stuff. And I’m sure they occasionally hire at SFL Scientific and you want to be at the top of the list, one of the first people to find out when that does happen. On that note, you can find all of the materials mentioned in this podcast at www.superdatascience.com/65. I hope you enjoyed today’s session and I look forward to seeing you next time. Until then, happy analyzing.