SDS 037: Develop your dream Data Science Career with Experfy

Podcast Guest: Experfy

March 23, 2017

Welcome to episode #037 of the SDS Podcast. Here we go!

Today’s guest is Founder and Co-CEO of Experfy, Harpreet Singh.
Data Science platform Experfy holds all sorts of opportunities for data scientists, from courses and training right through to the chance to get paid to do interesting work on exciting projects. Harpreet Singh joins us today to tell us all about it.
You will hear us discuss at length various case studies of applications of data science across diverse industries that have emerged from Harpreet’s experience through the Experfy platform. And with 30,000 data scientists on the platform, Harpreet will share his view of the “secret to success” in data science.
Let’s get started!
In this episode you will learn:
  • Learning and Assessment on Experfy (11:30) 
  • Data Science Application Case Study: Marketing in Medicine (26:32) 
  • Data Science Application Case Study: The Internet of Things (31:50) 
  • Prognostic Analytics and Predictive Analytics (36:50) 
  • Data Science Application Case Study: Preventing Health Insurance Fraud (39:01) 
  • Data Science Application Case Study: Rewarding Customer Loyalty (45:01) 
  • The Secret to Success in Data Science (49:47) 

Podcast Transcript

Kirill: This is episode number 37 with Founder and Co-CEO of Experfy Harpreet Singh.

(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
(background music plays)
Welcome to the SuperDataScience podcast. Super excited to have you on board, and today we’ve got a very interesting guest. Today we’ve got the Founder and Co-CEO of Experfy Harpreet Singh. So what you need to know about Experfy is this is a huge online marketplace for data science. So basically, companies come along to Experfy to post their problems, their challenges that they’re facing that can be solved, or that they think can be solved, with data science. And then data scientists actually bid for those projects to participate or to solve those projects. And at Experfy they have a total of a staggering 30,000 data scientists. And so how do they have so many data scientists? Well, because it is a marketplace where anybody can come and apply to be part of this marketplace. So basically, you could go to Experfy, submit an application, and become a data scientist that has the opportunity to bid for these projects, to participate in these amazing projects that are changing the world.
So in this podcast, you’ll get to know more about Experfy and how they operate, and also you’ll get a good overview of what other services they offer which are some interesting ones, such as education, and Harpreet will actually make a first time public announcement about a new project that they launched. Plus, in this podcast, I could not resist the temptation to use this opportunity to actually ask Harpreet about all these applications of data science, machine learning, analytics, deep learning, to real world projects. So in this podcast, we’re actually going to go over four real world case studies of how data science has been applied to different industries.
We’ll talk about industries such as marketing in medicine, predicting insurance fraud, prognostic analytics, and the Internet of Things. So this is a podcast you definitely don’t want to miss. Buckle up for a fun ride. We’re going to talk about so many different applications of data science and you’re definitely going to have a lot of takeaways from today. And without further ado, I bring to you my good friend, Founder and Co-CEO of Experfy, Harpreet Singh.
(background music plays)
Hello everybody and welcome to the SuperDataScience podcast. Today I’ve got a very special guest, a good friend of mine, Harpreet Singh, calling in from Boston. How are you, Harpreet, today?
Harpreet: I’m very well, Kirill. How are you doing?
Kirill: I’m doing great as well, especially having you on this show. Harpreet is the Founder and Co-CEO of Experfy, a huge online learning platform, and not just learning, it’s a huge data science platform launched through the Harvard Innovation Lab. So this is going to be a very exciting podcast, especially for those of you looking to break into the space of data science or get some education or get some experience in data science. Super excited about this. Harpreet, how are you feeling about the podcast?
Harpreet: I’m very excited to be speaking with you.
Kirill: Awesome. Thank you so much. Alright, to get us started, could you give us a bit of an overview of Experfy? What is Experfy? What do you guys do?
Harpreet: Yeah, so Experfy is a platform where we have curated a very large number of data scientists for on-demand consulting and training. We have 30,000 data scientists, perhaps the largest platform in the world, where companies can come to us and seek experts for various use cases that they’re working on. Also companies can leverage the same practitioners to upskill their own workers, their own professionals within their firms. So there’s a very interesting dynamic going on, but if you look at the macro trend, there is a growing scarcity of data science talent. And it’s only going to get worse, and companies are realising that and they want to equip themselves with their own in-house staff so that they don’t have to rely on outside consultants. So training is also a very important area for us, that we are fulfilling a need in a very different way than the traditional companies out there.
Kirill: Gotcha. That’s very interesting. In all of that, I have so many questions. Probably the first one is—30,000 data scientists. I’m assuming they don’t all work in the same building. How did you build up that capability? Where are these people located? How are they connected and how did this all come to be?
Harpreet: You know, marketplaces are extremely hard to start because you have a chicken and egg problem. Unless you have the demand, you don’t get the supply and unless you have the supply, you don’t get the demand.
So getting that started was quite hard. We were lucky, however, that we started three years ago. We were first to market. We got some very good media coverage in the beginning with TechCrunch, Forbes, Mashable, Wall Street Journal and the like. That kind of propelled us into the limelight. And because we were the only consulting platform, many data scientists decided to join us. And once the projects started flowing in (you know, marketplaces are like a machine, they kind of work themselves), we’ve been growing since. The supply is growing very nicely, and the demand is also growing because there is a real need out there.
Kirill: So, to understand it better, it’s basically a marketplace where a company can come in and post their data science problem and then data scientists come in and bid on who is going to be solving it and then they build a relationship and that’s how it goes from there. Is that about right?
Harpreet: Yes. However, there is a high-touch aspect to the service we provide because unlike other disciplines or other marketplaces, data science is quite complex as a field and the problems can also be very complex. And every problem is so unique because the data that a company possesses, the format that data may be in, and the other systems that that data interacts with or comes out of are also quite unique.
So we provide an account management team that specializes in data science in various verticals. So, if you are coming from oil and gas or retail, we have an account manager for you that understands that industry and then works with you to articulate that use case and translate that into a project description.
Once that project description has been articulated, then we put it on the platform and we have an algorithm that looks at who are the best matches for this project, and then those people are invited to come in to provide a proposal. Even though these are all bids, it’s never the cheapest or the most cost-effective resource that wins. It’s always the person that’s most qualified. So, you’ll see rates ranging from $100 all the way to $300-$400 on our platform.
Kirill: Per hour?
Harpreet: Yeah, per hour. U.S. Dollars, yes. But that’s still quite a bargain because if you’re going to go to a Big Four professional services firm, or if you go to a larger consulting firm, I guess the cost is much greater there and could be running to six or seven figures. Whereas on Experfy, a proof of concept on average costs $10,000-$20,000.
Kirill: Yeah, I can totally agree with that. I attest to that, having worked at a Big Four consulting firm. I worked at Deloitte and the fees, of course, are much greater. On the other hand, what Experfy charges, or the fees that are available on Experfy, are very good both for clients and for data scientists. So somebody working in that space of data science, being an individual data scientist, having an opportunity to make $100-$400 an hour, that’s a very, very good price, especially for a freelance type of work when you’re not really committed to any consulting firm or company. With that in mind, can data scientists listening to this podcast somehow get onto Experfy and become part of this talent pool of 30,000 that you have currently?
Harpreet: Absolutely. We are always looking to expand our pool of experts. It’s very simple: you go to experfy.com and you sign up. There’s an application process you have to go through. You fill out the application, we pull in your LinkedIn profile as well so that you don’t have to do a lot of hard work, and basically then we review the application and see if you are a good fit for the platform.
Kirill: That’s very interesting. And what determines a good fit so that people listening to this podcast can be prepared or maybe start thinking in the right direction? What is deemed a good fit? Maybe number of years of experience, or a different variety of toolset? What are the things that you look out for the most?
Harpreet: Data science is something that you can’t just learn part-time. It requires years of education, you know, some quantitative education, not necessarily data science education. For example, you may be someone who studied theoretical physics and that kind of person deals with a lot of data and would make a terrific data scientist. So, we look for relevant education and we also look for relevant experience. You know, in the application it’s very good to talk about the kind of use cases you may have worked on. So, the tools are not as important as the actual ability to work with large amounts of data or to think analytically.
Kirill: Okay, gotcha. And speaking of education, you guys have your own educational platform and I’m proud to say that I have a course published on Experfy, so that was a very interesting start to our relationship and I’m very excited about that. I can see people who are taking this course and are excited to learn data science. So, with that, tell us a bit more about your educational platform. How many courses do you have? Who is it tailored towards and what are the volumes of students coming through right now?
Harpreet: I want to preface that, that your course is a terrific one and it’s really something that people are taking quite a bit and we see a lot of enrolments and people are really benefitting from that Tableau course on visualization.
Kirill: Thank you.
Harpreet: Maybe I can take a step back and tell you the genesis of this platform and how it began. You know, we started as a consulting marketplace, and we’ve been talking about that briefly, but while we were providing this consulting, we noticed that a lot of companies were coming to us and posting projects related to training.
For example, University of California Davis came in and posted a project that they wanted to launch a data science program and they were looking for experts. This was two years ago. And then many Fortune 500s were also struggling to find subject matter experts. For example, someone came to us and said, “I need someone who can teach supply chain optimization” or “I need someone who can teach how do you analyse a certain kind of health care data.” Those kinds of courses are not available anywhere, not even on the MOOCs. The MOOCs are a great place to learn for the sake of learning, to build that foundational knowledge. And they’re providing a very important function because much of the education is free and you can really learn the basics of something.
But as you want to progress into something that is more industry specific, something that requires understanding of a domain and the use cases within that, then you really have to learn from someone who is working in the trenches, someone who is actually doing that every day. And the reason for that is that these technologies are changing so rapidly that an academic cannot help you in understanding that kind of content.
So we find ourselves in a very good place because we have access to the best thought leaders in the industry, they’re on the platform consulting, and we are able to also look at which use cases are hot, which use cases are actually being requested in the consulting context. So, we can combine the thought leadership of our experts and also the project-based work we’re doing and say, “Okay, these are the projects.” For example, in the context of media and advertising or retail, there are use cases like recommender systems that every retailer wants to have. So every retailer is trying to build a recommender system that may look like Netflix’s recommendations or what Amazon is doing.
We’ve executed dozens of such projects so when we think about creating a course, we are seeing where the trends are in the retail industry and we are building a retail track for retail companies so we know which courses are important even though the retail managers themselves may not know. Or the Chief Learning Officer at a large retailer is a generalist, so that Chief Learning Officer isn’t really aware of what kind of courses they should be offering to their employees. They are thinking in a broad sense of, “I want to facilitate digital transformation of my company so I should look at data science, big data,” but they don’t really know what to offer.
So we can then go into our library of projects we are performing and make recommendations. And often we see ourselves co-creating these courses with our industry partners. That’s what makes us very unique. You know, we are more focused on the B2B model than B2C, so we are partnering with companies like Duracell and we’ve done some text analytics training recently for the Federal Reserve Bank of San Francisco. We’ve even had some of our experts fly into India to present a training program for the executives at Tata Teleservices, which is one of the largest telecom companies in India.
So if you’re looking for training in emerging technologies, like Internet of Things, certain types of industry analytics, then we’re a much better venue than others that exist out there because we have the courses.
Kirill: Gotcha. That’s interesting that you mentioned it because that was my next question: how Experfy actually differs from platforms out there like Udemy and Coursera and so on, that offer either free or near to free training. That’s a great answer. Like, those marketplaces have merits, they definitely have advantages and they teach you the broad spectrum of data science and the skills that you want to learn. But with Experfy it sounds like you guys are doing something completely different, where you’re going into what’s exactly happening in the industry right now in these specific use cases, and then from there you’re extracting the right knowledge, you’re finding the right instructors to create that content and offer it to your clients so that they can get upskilled in a very laser specific way in what they need.
With that, you mentioned you mostly deal with B2B clients. We have about 10% of our listeners who either own their business or are entrepreneurs, and they should definitely check out Experfy if they are looking to upskill themselves or their team in data science. But for the majority of our listeners, is there still an option for people to take these very interesting courses if they are just a client, if they’re not a business?
Harpreet: Yeah, absolutely. We are an online platform and all the courses are available online. It’s as simple as finding the course you like, or a learning path for that matter, and just clicking on the “enrol” button and enrol in that course. When we think about our go-to-market strategy as entrepreneurs or as a business, we are selling primarily to our business clients in a B2B fashion. But there is still a very large population of students who are enrolling in the courses who are just consumers.
We have, for example, the University of Alberta in Canada. They’re having their students enrol in our data science certification program, so we have a certification program which is five courses and the first course on probability and statistics using R is taught by a Harvard professor, Michael Parzen, and Kaitlin Hagan. Kaitlin is at Harvard Medical School and Michael Parzen is at Harvard University, Harvard College. He’s been teaching this content for 30 years, so it’s fantastic for folks to learn from them.
And then there’s a course on data wrangling using R, and that course is taught by Connie Brett. She was the founder of the Analytics Incubation Center at Cisco. And then there’s an econometrics course taught by Alan Yang, who is a professor at Columbia University. And then there are others from the industry, from Target and other major corporations who are teaching in that track.
So we are trying to develop these certification tracks or learning tracks so that you can say, “Okay, I want to become a fraud and risk analyst, a data scientist who specializes in fraud and risk or a data scientist who specializes in retail analytics,” and then we will provide a pathway to take five or six courses, or perhaps even more, that leads you to that qualification. So there’s a lot of interest in upskilling employees among companies. So we are taking this very specific approach of how do you get someone going from the basics all the way to a practitioner in a specific use case.
Kirill: Gotcha. That’s very interesting. I just wanted to comment that it’s very cool how a university outsources their main function of teaching students. They outsource it to you guys. Instead of teaching them at the University of Alberta, they send them to you to upskill them on certain topics. I imagine that’s just their way of recognizing that certain skills are so cutting edge that the university curriculum just can’t keep up.
And in terms of your comment on the certification tracks, I think that’s just fantastic. That’s not something you see often in many places. For instance, Coursera has certification tracks, but they’re like just data science. They’re very general certification tracks, not tracks for a specific skillset within data science or a certain industry, whether it’s fraud analytics, predictive analytics, or a certain retail or industry sector. I think that’s very valuable. And upon completion of these certification tracks, a question that a lot of MOOCs get, do you guys provide a certificate of completion that people can show off or show to their employers and so on?
Harpreet: Yes, absolutely. We do exactly what Coursera and others may do. You’ll get a certificate of completion that’s generated by our systems and you can attach it to your LinkedIn profile, the same way you would attach other certificates. And we haven’t announced this yet, this is the first time I’m actually talking about this publicly, that we are launching an assessment platform as well. This assessment platform will focus on different types of skillset, so anyone who hasn’t even taken a course on Experfy could go and take an assessment and we will then validate this person has certain skills.
Again, our target here is more of a B2B market where companies, or the HR departments, are struggling to understand whether someone is a qualified data scientist so we are giving them a lot of tools to say, “Okay, you are hiring someone who understands R and Python in a role where they’re going to be doing insurance analytics, for example. So how do you validate that this person knows R and Python in the context of insurance analytics and also has some of the other skills that you may desire, like understanding of Hadoop and Spark and Scala?” So we are focused on building these test banks that will be incredibly useful to not only the industry, but also to individuals who can come on to Experfy and then take these assessments.
Kirill: Fantastic. I just want to preface my answer with, everybody listening to this, did you hear that? It’s the first time this information is available publicly! I am so proud that it’s been announced on this podcast. That’s the first time this has ever happened, that this podcast is being used as a source to get information out there into the world, so thank you for that, Harpreet.
Yeah, assessment platform—I can totally see where you’re coming from. It is such a needed thing. I get questions all the time, like, “Hey, I have these skills. I’ve taken these courses. I’ve done this type of work, but how do I prove to employers that I have this knowledge, that I’m ready?” And you get this from passionate people who want to make a difference in the world, but their main barrier is the fact that their skills, even though they’re very strong when you actually speak to them and they know they’re very strong, other people, employers can’t see that. And I think this assessment platform—congratulations on that—I think that’s one of the first, if not the first in the world. So I’m very excited for you guys. I’ll definitely check it out when it’s ready. It sounds like a very, very big and exciting thing.
Harpreet: Yeah, thank you.
Kirill: I have so many questions. I could keep going and talking on about Experfy for much, much longer, just drilling into what’s going on there and how you guys are doing things, but I would like to actually also talk about something else, Harpreet, about some of the very interesting case studies that you are sharing, about the successes that Experfy is having. For example, you’ve posted close to a dozen articles on LinkedIn about different successes of Experfy. I’ve had a look through them and found them very interesting and fascinating, the way you apply data science to different projects and different industries. Are you happy to talk us through a few of those?
Harpreet: Absolutely. It would be my pleasure.
Kirill: Okay, awesome. How about we start with your most recent one, the most recent one just published like a week ago, or two weeks ago? “Artificial intelligence for marketing mix models in the pharmaceutical sector reducing cost and boosting sales.” I’m just going to read out a couple of figures from here. Over $30 billion is spent annually on pharmaceutical marketing alone. This is from your article. Basically it’s all about the fact that this is a huge global industry, and therefore it provides access to lots of markets for pharmaceutical companies, but at the same time it’s highly, highly competitive and you need to have effective marketing there. Otherwise you’ll end up spending so much money on marketing instead of the actual product. And this isn’t a high margin product like with online products. This is a physical product that is tangible, that needs to be shipped, that needs to go places and that people actually need. So you can’t afford to spend too much on marketing. And therefore a lot of responsibility is on data science to optimize that. What were the challenges, opportunities, and what solutions did you guys come up with at Experfy?
Harpreet: Yeah, this is a very interesting use case. As you mentioned, $30 billion are spent on the marketing of these drugs alone. There’s additional expense like R&D and others, but we’re just talking once you’ve got a drug that’s been approved, how do you get it out the door? So, you have to influence the physicians, and you have to influence others out there to prescribe your drug—you know, the patients want to see them, you see these infomercials on television, so it’s tricky business.
So the way we’ve thought about this problem is that it’s all about having access to good data. You know, what we are after is, what are these pharma companies spending? So, once a drug is launched, a pharma company may spend over a billion dollars to market that drug, so if they can be more judicious, they can save lots of money, hundreds of millions of dollars, if they are judicious in how they’re spending, and if they are able to track the ROI, what is being effective and what is not. So it is possible today to track the sales of these drugs on a zip code level. You know, there are these providers who are capturing that data and then extrapolating it to say, “Okay, this is how much this drug sold in this week.” And then there are other ways.
You know, some drugs are renewed, so you’re looking at renewals as well, and then you’re looking at fresh prescriptions as well, and they’re all tracked as individual line items for each zip code. So if one can isolate the marketing for each of these regions and say, “Okay, I had a conference in this region,” or “I actually ran television ads and radio ads,” or even “I had Google ads or ads on WebMD,” all of that can be captured, and one can then create a marketing mix model against the sales. So you can have a control group where in one adjacent zip code or a different region altogether, you don’t do certain activities.
For example, in a zip code you may have a sales rep going to a doctor and doing these lunch conferences where they’re trying to educate the doctors by doing lunch and learn sort of activities, and then in a different region altogether, you don’t do those things, and then you try to compare what exactly is the difference in terms of sales, in terms of adoption.
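To make the control-group comparison concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Experfy's actual method: the zip codes, the weekly prescription counts, and the helper names `average_growth` and `estimated_lift` are all invented for this example. It compares sales growth in regions that received lunch-and-learn visits against growth in regions that did not.

```python
from statistics import mean

# Hypothetical weekly prescription counts per zip code (made-up numbers).
# "treated" regions had lunch-and-learn visits; "control" regions did not.
treated = {"zip_A": [120, 135, 150], "zip_B": [98, 110, 121]}
control = {"zip_C": [118, 120, 119], "zip_D": [101, 99, 103]}


def average_growth(sales_by_zip):
    """Mean change from the first to the last week across zip codes."""
    return mean(weeks[-1] - weeks[0] for weeks in sales_by_zip.values())


def estimated_lift(treated_sales, control_sales):
    """Growth in treated regions minus growth in control regions."""
    return average_growth(treated_sales) - average_growth(control_sales)


lift = estimated_lift(treated, control)
print(f"Estimated lift in weekly prescriptions: {lift:.1f}")
```

Subtracting the control group's growth removes changes that would have happened anyway, such as seasonality, so the remainder is a rough estimate of what the activity itself contributed. A real study would also need to check that the treated and control regions are comparable.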
By creating these kinds of control groups and by looking at the data of the sales and the spend, one can then begin to model the spending. What we’ve done is we’ve been able to create machine learning models where you can say, “I’m going to spend this much money on radio, this much on television, this much on Facebook ads, and then predict how much sales that’s going to generate, that kind of a mix.” And surprisingly, these models become more and more accurate as you feed more data into them. So there’s a lot of benefit to the pharma companies as a result.
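A marketing mix model of the kind described here can be sketched as a regression of sales on spend per channel. The example below is a simplified illustration, not the model built on the Experfy platform: it assumes a plain linear relationship, uses ordinary least squares written in pure Python, and the spend and sales figures are synthetic. The function names `fit_mix_model` and `predict_sales` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mix_model(spend_rows, sales):
    """Least-squares fit of sales = intercept + sum(coef * channel spend)."""
    x = [[1.0] + list(row) for row in spend_rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, sales)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_sales(coefs, spend):
    """Predicted sales for a planned (radio, tv, online) spend."""
    return coefs[0] + sum(c * s for c, s in zip(coefs[1:], spend))


# Synthetic history: (radio, tv, online) spend and resulting sales per region.
spend_rows = [(10, 20, 5), (15, 10, 8), (5, 30, 2),
              (20, 25, 10), (8, 12, 6), (12, 18, 4)]
sales = [137.5, 122.0, 153.0, 180.0, 111.0, 134.0]

coefs = fit_mix_model(spend_rows, sales)
planned = (10, 10, 10)
print(f"Predicted sales for planned spend {planned}: "
      f"{predict_sales(coefs, planned):.1f}")
```

Each fitted coefficient estimates the extra sales per unit of spend on one channel, which is what lets a marketer compare the return on radio, television, and online budgets. Production models typically add effects this sketch ignores, such as diminishing returns and carry-over of advertising from one week to the next.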
Kirill: Fantastic! That’s a very good description, and I like the term “marketing mix model.” So, guys, it sounds like that term is going to be picking up in the future, so that was a good overview of that as well. Okay, thank you for that. And now I’d like to move on to a case study that is very close to my heart. It was so cool reading this. I actually shared it around on LinkedIn last week and a lot of my students actually responded the same way. It’s called “The Internet of Things and Prognostic Analytics for Predictive Maintenance in Control Systems.”
So what this talks about is that you have huge companies—well, let’s start with the basics. We have sensors everywhere, right? For instance, an iPhone, you might think it has four or five, but it actually has close to 30 sensors. And that’s like sensors about geolocation, about the gyroscope, it’s got some sensors for audio coming in or light sensors, and so on, so close to 30 sensors. And that’s just an iPhone.
Everything around us is slowly getting covered with sensors, and when you connect sensors to other devices all around the Internet, that becomes the Internet of Things, and by 2020 we’re predicted to have—and this is from another one of your articles—we’re predicted to have about 50 billion things connected to the Internet of Things. That’s more than the number of people that we’re going to have on the planet at the time.
So this specific case study which you wrote about talks about using this inter-hyperconnectedness of things to run prognostic analytics, and that specifically means maintenance and improving efficiency of control systems in, for instance, large power plants or airlines or large machinery. And you quote some interesting numbers.
For instance, just a 1% increase in efficiency of control in airlines, and therefore prognostic analytics, can lead to a cost saving of $2 to $3 billion; in utilities, $4 to $5 billion; in oil and gas companies, $5 to $7 billion; $4 to $5 billion in health care, and $1 to $2 billion in the transport sector. And I’m assuming this is, for instance, if you have an airplane and you’re running all these analytics, you don’t have to wait for something, even for your data to show that there’s a problem. Running prognostic analytics, you can see that this performance is dropping. It’s still above average, it’s still good performance, but it’s dropping. You can see the trend in which it’s going, and therefore you can predict basically that something is going to happen and it’s going to need maintenance, and you can account for that maintenance early on. Can you walk us a bit more through this case study, please?
Harpreet: Yeah, absolutely. As you mentioned, a lot of the heavy industry machinery uses control systems. These control systems generate tons and tons of data. This has been happening for 10, 20, 30, 40 years. This is not something recent. The control systems, by definition, they are storing that data and that data then goes into some black hole and it’s never used. So there is a huge opportunity here for heavy manufacturers. For example, Siemens happens to be one of the manufacturers of control systems. This is a very highly fragmented market. Siemens probably has 10%-12% of the market share, so there are many others like that.
So if somehow we can take the data from these control systems, the data that’s being generated as the machine works, if we can take that and build some streaming pipelines into the Cloud, whether they go to AWS or somewhere else, maybe even a private cloud if people are not happy with a public cloud, then we can look at this data for anomalies. We can start analysing this data for preventive maintenance and for other things.
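One simple way to look at streamed control-system data for anomalies is a rolling z-score: compare each new reading with the mean and spread of the most recent readings. The sketch below is an assumption-laden illustration rather than anything described in the episode; the temperature readings, the window size, and the threshold are all invented, and `detect_anomalies` is a hypothetical helper.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
                continue  # keep the outlier out of the baseline window
        recent.append(value)


# Made-up temperature stream from a single sensor, with one spike.
temperatures = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 85.0, 70.2, 69.8, 70.0]
anomalies = list(detect_anomalies(temperatures))
print("Anomalous readings (index, value):", anomalies)
```

Because the function is a generator that keeps only a small window in memory, the same logic could sit at the end of a streaming pipeline and process readings as they arrive.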
As you pointed out in these numbers, how much can you save if you just improved efficiency by just 1%, right? I mean, these numbers are staggering. And the way to think about this is, if you are in a power plant and your machine fails, someone from Siemens has to get on a plane from a different city, bring that part to your plant, and replace that part. So that is all cost, someone had to rush over there to do this job.
But if we start doing prognostic analytics—and I want to differentiate prognostic analytics from predictive analytics in a sense that predictive analytics tells us that something is going to fail, you know, that “I’m going to predict that this part is going to fail some time in the near future,” whereas prognostic analytics tells us that something is going to fail in the next two weeks or in the next ten days. So there is almost a time dimension to prognostic analytics that isn’t so accentuated in predictive analytics.
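The time dimension that separates prognostic from predictive analytics can be illustrated with a trend extrapolation: fit a line to a degrading measurement and estimate when it will cross a failure threshold. This is a minimal sketch under strong assumptions (linear degradation, made-up efficiency readings, an arbitrary threshold of 90), and `days_until_threshold` is a hypothetical name, not a method from the episode.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, efficiency, threshold):
    """Days after the last reading until the fitted trend hits the threshold.

    Returns None when the trend is flat or improving (no failure forecast).
    """
    slope, intercept = fit_line(days, efficiency)
    if slope >= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Made-up daily efficiency readings (percent) for one component.
days = [0, 1, 2, 3, 4, 5]
efficiency = [98.0, 97.5, 97.0, 96.5, 96.0, 95.5]

remaining = days_until_threshold(days, efficiency, threshold=90.0)
print(f"Estimated days until maintenance is needed: {remaining:.1f}")
```

A purely predictive flag would stop at "the trend is downward, so this part will fail"; the prognostic step is the extra calculation that turns the trend into a number of days, which is what lets maintenance be scheduled in advance.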
And how many times has it happened where we’re trying to take a flight and something goes wrong with the aircraft and then we’re sitting there until someone comes and changes that part or fixes that issue? So, all of that, again, can be avoided if we are making use of the data that the aircraft has been collecting, but no one is actually making use of that today.
So somehow, if we can start building these streaming pipelines, and if we can start taking the data and start building preventive maintenance use cases, it can be a huge saving to everyone. Obviously, as passengers in the airline context, airlines may pass that onto us and lower airfares. So I think there is a value chain here that gets impacted as we start to do more of this sort of analytics.
Kirill: Thank you for that. That’s a great overview. I was actually after that definition or distinguishing terminology from you about prognostic versus predictive, and that’s a very good description, that prognostic actually has a time dimension to it. Alright, that was awesome. I hope people are picking up some value from these.
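As an editorial illustration of the distinction just described, here is a minimal sketch, not a description of anything Experfy or Siemens actually runs. It assumes a hypothetical wear reading (say, vibration) that drifts roughly linearly towards a hypothetical failure threshold. The function names, the threshold and the sample data are all assumptions. The predictive-style function only answers whether the part is heading towards failure. The prognostic-style function adds the time dimension by estimating how many days remain.

```python
from typing import Optional, Sequence, Tuple


def fit_line(days: Sequence[float], readings: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of readings against days; returns (slope, intercept)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, readings))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def will_fail(days: Sequence[float], readings: Sequence[float]) -> bool:
    """Predictive-style answer: is the reading trending towards failure at all?"""
    slope, _ = fit_line(days, readings)
    return slope > 0


def days_until_failure(
    days: Sequence[float], readings: Sequence[float], threshold: float
) -> Optional[float]:
    """Prognostic-style answer: estimated days until the fitted trend crosses the threshold.

    Returns None when the reading is not rising, because no crossing is projected.
    """
    slope, intercept = fit_line(days, readings)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily vibration readings and a hypothetical failure threshold.
sample_days = [0, 1, 2, 3, 4]
sample_readings = [1.0, 1.5, 2.0, 2.5, 3.0]
FAILURE_THRESHOLD = 8.0

print(will_fail(sample_days, sample_readings))
print(days_until_failure(sample_days, sample_readings, FAILURE_THRESHOLD))
```

A straight-line extrapolation is the simplest possible stand-in here. Real degradation is rarely linear, so a production model would likely need a more careful estimate with uncertainty bounds.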
And we’re moving on to case study number three: using big data to prevent health insurance fraud. Very interesting space. And as we learned from one of our earlier podcasts, I think it was podcast #5 with Dmitry Korneev, fraud is actually a huge industry. You don’t hear about data science and analytics in fraud that much, it’s not a huge focus, but especially in the U.S., where the legal system is such that a lot of companies are unfortunately in a lot of lawsuits with other companies, the space of fraud analytics is huge, specifically here—we’re talking about health care.
One number that you’ve mentioned is that the National Health Care Anti-Fraud Association estimates that fraud costs the country $68 billion annually. That’s 3% of the whole health care spending, which is about $2.26 trillion. Some people will be interested to know this, and I was actually very surprised to learn that the health care industry is so large. $2.26 trillion! That’s 18% of the GDP of the U.S.A. It’s a huge number. So, please, tell us a bit more about fraud analytics in the health insurance space.
Harpreet: Again, this is a very valuable use case, fraud analytics, when it comes to health insurance fraud. The challenge that most insurance companies are facing is that the laws of the U.S. are such that if someone were to submit a medical billing claim to a health insurer, they have no choice but to pay it within a certain time duration. You know, it’s like two days or three days, and if the claim is not paid, then the insurer is liable and they can be fined.
For that reason, the claims are paid like clockwork. As they come in, they’re paid. So one has to get to a point where you can start predicting fraud in real time for this to be valuable. So, you know, there are a number of ways in which this can be done, the data that is being gathered. Unfortunately, today the way a lot of these claims are paid is through paperwork. It’s a paper intensive activity. So, the first challenge is how do you—
Kirill: —convert that to digital.
Harpreet: Exactly, so the digitization. A lot of progress has been made in recent years, and I’m sure we will eventually get there. And then the second question becomes—once you’ve got that, then how do you start modelling for fraud and what are the characteristics of fraud that you’re looking at? And as you start developing—here, one thing that we’ve learned through our consulting practice is that the better training data you have for a specific use case, the better algorithm you are going to build.
So, because there is such a high volume of fraud, and because this is such a big market, it is certainly possible to create these training datasets that are very helpful. And then you can do feature engineering and you can then start looking at which features are the most useful. You know, the features may differ if I’m trying to prevent fraud for dental insurance versus health insurance. We’re currently working on a very exciting project to detect fraud in the life insurance sector, and that’s even more challenging.
But it’s certainly doable because you don’t have to predict everything 100%. You can say that if I can predict with 70% confidence that this is fraud, then at least someone can take a look and say, “Let me take these additional three steps to find out what happened, or request more information on this particular claim.” That’s the opportunity here, that we don’t have to build models that are 100% accurate. We can still build models that are useful and then there is some human intervention to get more information before a claim is paid out.
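The idea of routing claims to a human rather than demanding a perfect model can be sketched as follows. This is an editorial illustration under stated assumptions, not the approach used on any Experfy project. The `Claim` class, the `fraud_score` field (a hypothetical model output between 0 and 1) and the `triage` function are invented for the example. Only the 70% figure comes from the conversation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Claim:
    claim_id: str
    fraud_score: float  # hypothetical model output in [0, 1]; higher means more suspicious


REVIEW_THRESHOLD = 0.70  # the "70% confidence" mentioned in the conversation


def triage(
    claims: Iterable[Claim], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into (pay_now, needs_review) using a fraud-score threshold.

    Claims at or above the threshold are not rejected. They are held for a
    person to request more information before the claim is paid out.
    """
    pay_now: List[Claim] = []
    needs_review: List[Claim] = []
    for claim in claims:
        if claim.fraud_score >= threshold:
            needs_review.append(claim)
        else:
            pay_now.append(claim)
    return pay_now, needs_review


# Hypothetical scored claims.
incoming = [Claim("A", 0.12), Claim("B", 0.70), Claim("C", 0.91)]
pay_now, needs_review = triage(incoming)
print([c.claim_id for c in pay_now])
print([c.claim_id for c in needs_review])
```

Lowering the threshold catches more fraud at the cost of more manual reviews, so in practice the cut-off would be tuned against reviewer capacity and the payment deadline.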
Kirill: Okay. That is definitely going to be useful. Again, it’s such a huge industry. It’s just mind-blowing that $68 billion—whoever solves that problem, that’s a multibillion dollar analytics company waiting to be created right there. So thank you again for that overview. And I’m just looking at the number of different case studies that you have so kindly shared with everybody. To be honest, I’m getting torn apart. We’ve done three, and we definitely have time for at least one more. What I would like to suggest is, if you could, could you choose the best one? What would you like to talk about? What is, in your view, one of the most successful breakthroughs that you guys have had at Experfy, and if you can share that with us?
Harpreet: Yes, I mean, there are a lot of very exciting things we are doing in the IoT space and that doesn’t get talked about enough. We had a very interesting project that we embarked on with Gulf Oil, which has their gas stations. This was Gulf Oil out of Mexico, their franchise there, and they had a wonderful idea of how do you differentiate yourself from other similar businesses. One way is that, if you are a full-service gas station, then you have to add more value. How do you do that? We started with that question.
The way we work on it at Experfy is that generally, when there’s a big question, we start with a road map of some kind of a visioning exercise. So someone who’s done this sort of thing before will sit down with the client and see what does the road map look like, and what does the ROI look like once we are done with that road map.
We thought it was a huge customer analytics opportunity if you could somehow, using IoT, identify who the customer is as they drive into the gas station. There are a number of ways of doing that. You can use computer vision or image analysis to look at the license plate of the car. Or you can install beacons in these gas stations, and in your mobile app, or the Gulf app, you would have the identity of the person who’s just driven into the gas station. And now you can say, “Oh, by the way, the gas price is $3 a gallon, but because you’re such a loyal customer, because you’ve been here twice already this week, we’re going to lower the price for you to $2.75 a gallon.”
And then you could say, “By the way, this person also buys coffee from the convenience store every time so they can be given that while they’re in their car,” because you already have the pattern of spending. Similarly, in economies like Mexico where this experiment is going on, there is this need for prepaid cards and things like—if you want to send a package through courier, often the gas stations end up being the location where the courier services are also installed. So a lot of these value added services like prepaid cards and other things can be added. You know, folks don’t have printers in their homes, so you could even have a way to print things and the gas station attendant on their app can provide these value added services and bill the customer seamlessly without accepting any cash and it all happens electronically.
Those are the kinds of things that we’re doing on Experfy, and they have the potential to really reimagine how work gets done in these industries that are so boring and they haven’t changed in a hundred years. And thanks to IoT and analytics, we are going to start seeing a shift where new models of doing business emerge. We are very excited to be an enabler in this space.
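The loyalty pricing example from this case study can be written down as a simple rule. The sketch below is an editorial illustration only. The $3.00 and $2.75 prices and the "twice already this week" condition come from the conversation, while the function names, the seven-day window and the visit dates are assumptions. It also takes the customer identification step (license plate recognition or beacons) as already done.

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import Iterable

BASE_PRICE = Decimal("3.00")   # price per gallon quoted in the conversation
LOYAL_PRICE = Decimal("2.75")  # discounted price quoted in the conversation
LOYAL_VISITS = 2               # "you've been here twice already this week"


def visits_in_past_week(visit_dates: Iterable[date], today: date) -> int:
    """Count earlier visits in the seven days before today (today itself excluded)."""
    week_ago = today - timedelta(days=7)
    return sum(1 for d in visit_dates if week_ago <= d < today)


def price_for(visit_dates: Iterable[date], today: date) -> Decimal:
    """Return the per-gallon price to offer an identified customer."""
    if visits_in_past_week(visit_dates, today) >= LOYAL_VISITS:
        return LOYAL_PRICE
    return BASE_PRICE


# Hypothetical visit histories for two identified customers.
today = date(2017, 3, 23)
regular = [date(2017, 3, 20), date(2017, 3, 22)]
occasional = [date(2017, 3, 1), date(2017, 3, 22)]
print(price_for(regular, today))
print(price_for(occasional, today))
```

`Decimal` is used rather than floating point so that prices compare and print exactly.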
Kirill: Wow! That’s fantastic! That’s such an interesting case study of personalizing services through data science and not just data science, but machine learning, deep learning, you mentioned computer vision, image recognition, facial or number plate recognition. That is the full suite of analytics at play. So, thank you so much for that. These case studies are so useful because they broaden people’s horizons on what can be done with analytics, on how much power analytics has, and data science has, and machine learning has, and how it’s becoming more and more embedded into all of these different industries.
Thank you so much for sharing that. I’ve got a couple of questions leading towards the end of this podcast. First one I’d like to ask you is what would you say is the secret sauce for being a data scientist? I don’t usually ask this question, but you have seen so many data scientists come in to Experfy, so many people looking for data science skills, and you’ve educated so many data scientists. You’ve influenced so many data scientists. What would you say is the secret to becoming successful in data science?
Harpreet: I guess the secret is to be someone who is able to ask a lot of questions, form a lot of hypotheses, not start with one particular solution or approach. The way I look at it, data science is really about asking many hypotheses and then validating or invalidating those hypotheses. And then you come to some kernel of truth that can then be helpful in that business. I guess the best data scientists that I know are the ones that are not married to one approach, that are always looking for answers to a broad range of questions that apply to a particular problem.
And the second thing I would say is that domain expertise is really important. If you’re a data scientist, it’s not a good idea to be a jack of all trades. It’s much better to embrace one industry and develop a fair amount of domain expertise in that industry so that you can have a greater impact in that industry. I think those are the two things that come to mind.
Kirill: Fantastic. Thank you so much. That’s very good advice. So, make sure you’re asking the right questions and you’re open-minded to all of the things that are coming your way, and pick an industry and start to specialize to build that influence so people know you as the best data scientist in that specific industry or space.
And the other interesting question I had as well, which I’d really be curious to get your opinion on, is from where you sit, from all the things that you see going on in the space of data science, where do you think this field is going? What should our listeners prepare for to be ready for the data science of 2020? Or the data science of 2025? What would you recommend for them?
Harpreet: This field is changing so rapidly that it would be a fool’s errand to make many predictions. But one thing is for sure. You know, there is a lot of automation going on, we have a lot of tools that are being developed, and this is going to be a very exciting space and it’s going to impact every industry. And the industries that are going to see the most change are the ones that have the best data or the richness of data, so those we will see evolving much faster than the others.
And if you are in such an industry, then I think it’s a very good idea to embrace analytics. Even if you’re not a data scientist, even if you’re a manager, understanding how one can become data driven and how processes can benefit from different types of analyses is really important. You know, making sure that the company has some kind of a data strategy to capture the right data is another important consideration because companies that are not going to do that are frankly not going to be very competitive. They probably won’t even exist in the next 5-10 years. It’s a bold sort of assumption, but if we look at how many Fortune 500 companies exist from the last century, let’s say 1950s, I would say at least 30 or 40 have disappeared. I think companies that do take data science seriously are the ones that are going to stick around.
Kirill: Yeah, I totally agree with you. That’s some very interesting advice and overview of what to expect. And you’re totally right, it’s evolving so quickly. It’s hard to make very definitive predictions, but it’s very interesting, what you said about automation and that managers should also look into data science. And I totally agree with you that there are even some predictions that out of the Fortune 500 companies, over half of them will disappear in the next decade just because of what’s happening in the space of data science, so it’s a huge disruptor as well as an enabler for companies.
Thank you so much, Harpreet, for coming on the show and sharing all your insights. How can our listeners follow you or contact you or get more access to all of these—I don’t have a better word for it—bombs of knowledge that you’re sharing? You know, you just write an article and you open up a whole new world of how data science is being applied. What’s the best way for our listeners to follow you?
Harpreet: Well, there are over 200 projects that are listed on Experfy and you can look at them in quite a bit of detail in terms of the description of these projects. So you can go to experfy.com and you can find me on Twitter @hsingh and we can connect there as well.
Kirill: Okay, beautiful. Thank you so much. Guys, definitely check those out, check out Experfy and connect with Harpreet on Twitter. And one final question I have for you today: What is your one favourite book that you can recommend for our data scientists to become better at what they do?
Harpreet: This is a tough question. I’m a voracious reader and I read a lot. One book does come to mind, thinking about the audience. There is a book by Eric Siegel called “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.” It’s a funny title, but what Siegel is doing, he is bringing to life the power of predictive analytics in the context of marketing. It’s a fascinating read, even if you’re not a specialist.
Kirill: Okay, beautiful. Thank you. We’ve already had somebody recommend that book on the podcast previously, so Eric Siegel, “The Power to Predict Who Will Click, Buy, Lie, or Die.” Once again, thank you so much, Harpreet. It has been a pleasure having you on the show to learn all of this amazing knowledge that you have to share. Thank you again.
Harpreet: Thank you, Kirill, for having me. Take care.
Kirill: So there you have it. The amount of knowledge and practical examples of data science application Harpreet shared with us today is immense. I mean, in just that one hour that we had today, we’ve covered so many different applications from marketing and pharmaceuticals to insurance fraud to Internet of Things to prognostic analytics, which I like so much. I think it’s a huge space and there’s a lot of disruption that can happen in prognostic analytics. Sensors are really dominating the world, but not that many companies are leveraging them to their full potential, so that is always going to be a space where you can add value.
And my favourite part of the podcast is perhaps what Harpreet mentioned about their upcoming assessment platform linked to Experfy. It’s definitely something that is needed in the space of data science and it’s very cool to see that they are pioneering this feature. They’re pioneering this new addition where you will be able to go to Experfy and just tell them about your skills, submit your application, perhaps pass some sort of assessment tests and get your skills verified by Experfy so then you can take it to employers, you can take it to different companies to show that you do have these data science skills. Because a lot of the time we are learning data science, we are educating ourselves, and that’s what it’s all about. It’s not about that piece of paper that you get at university. Sometimes you want to go to university and get the knowledge and go through the experience. But sometimes you just want to learn online. And having a way to verify your knowledge is going to be very, very valuable and I hope that more and more companies are going to start doing that and following Experfy’s example.
So, there we go. That was Harpreet Singh from Experfy. Definitely go check out Experfy, and if you have some free time, you want to do some freelancing work, or you just want to try yourself out in the marketplace of data science and you think you have the skills and you have what it takes, then submit an application to Experfy and become one of their data scientists in their marketplace.
Also check out the courses on Experfy, some very valuable courses. You can also find my Tableau course there and maybe other ones as well. And also make sure to follow Harpreet on Twitter so that you can get updates about his articles as well as updates about what’s going on at Experfy.
And as usual, all of the links, resources and show notes are available at www.superdatascience.com/37. And one more thing for today. If you are enjoying these sessions, if you like this podcast, then we would really appreciate it if you could log onto iTunes and leave us a rating or review. That would really help us propel the podcast forward and bring it to more people. And on that note, thank you so much for being here, for sharing this time, for taking an hour out of your day to listen, to talk about data science with Harpreet. I can’t wait to see you next time. Until then, happy analyzing!