SDS 043: Solving an Optimization Problem with a Custom Built Algorithm

Welcome to episode #043 of the Super Data Science Podcast. Here we go!

Today's guest is AI Researcher Deblina Bhattacharjee

From an early childhood encouraged in math and computing, Deblina Bhattacharjee grew up and moved abroad to work in AI Research in South Korea, where she built her own algorithm.

You will hear her share insights on AI in healthcare and smart homes, as well as the unusual inspiration for her algorithm and the library she built around it.

From her view at the forefront of academic research, you will also hear about what she sees for the future of data science and the software, skills and techniques that will be vital for the future.

Let’s get started!

In this episode you will learn:

  • Discovering Great Research Opportunities Around the World (10:20)
  • Building an Automated Healthcare Solution (12:50)
  • Applying the Fibonacci Sequence to Solve Optimization Problems (17:10)
  • What is an Optimization Problem? (24:00)
  • Vital Skills and Techniques for a Future in Data Science (36:38)
  • How SVMs Work (40:25)
  • Is Hadoop Still Relevant? (44:23)
  • Security in AI and Data Usage (50:23)

Full Podcast Transcript

Kirill: This is episode number 43 with AI Researcher Deblina Bhattacharjee.

(background music plays)

Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.

(background music plays)

Welcome, welcome, welcome to the SuperDataScience podcast. Hope you're having a great week, and today we've got a very interesting guest, Deblina Bhattacharjee. She is calling in from Seoul, which is South Korea, and she is an AI researcher working at one of the universities there, or doing her degree at one of the universities there. And a very, very interesting conversation that we had. It's all about AI, all about artificial intelligence, the different types of algorithms, different types of tools, different types of problems. So in this podcast, you will learn what an optimization problem is and the different approaches to the optimization problem. You will also learn about the important tools for a data scientist to learn now to prepare for the future of the field of data science and artificial intelligence. You'll also learn about the important techniques which are going to be valued in the near future.

And of course, Deblina will tell us about the research project that she is doing. Very interesting, it's about artificial intelligence, but it's different to neural networks. It's a different approach. It's not inspired by the human brain, it's inspired by something else. And what exactly, you'll find out inside this podcast.

And of course, we'll talk about many, many other things. So we'll talk about Hadoop, we'll talk about Strata, Scala, Spark, all the different tools, all the different applications, and you will even see how Deblina's algorithm can be and is used in health to actually help people have better and healthier lives and even sometimes save people's lives.

So there we go. That's what this podcast is all about. And can't wait for you to check out all the interesting and insightful and even cool concepts that we're going to be discussing. And without further ado, I bring to you Deblina Bhattacharjee.

(background music plays)

Welcome everybody to the SuperDataScience podcast. Today I've got a very interesting and exciting guest with us, Deblina Bhattacharjee. Deblina, how are you going, and where are you calling from today?

Deblina: Hello Kirill. Thanks for inviting me to be a part of this podcast today. I'm doing great and I'm calling from Seoul right now, in South Korea.

Kirill: Yeah, in South Korea. Wow, we've never had anybody from South Korea on this podcast. What brings you to South Korea?

Deblina: What happened was I was working on this automated intelligence project for healthcare during my Bachelors. So I just sent out the project proposal to a couple of universities for pursuing my higher education. So at that time, one of the universities -- I got offers, but then there was this particular university which was exactly working on what I wanted to work on in the future. And also, the kind of opportunities that Seoul was giving me was really enticing. So I chose Seoul and ended up as an AI Researcher over here.

Kirill: Ok, that's very cool. And we'll get to that part in a second. But just out of curiosity, do you speak Korean? How do you get by in Seoul?

Deblina: Yeah, you require Korean.

Kirill: Yeah?

Deblina: Yeah, you do require. But then my Korean is really bad. I am just getting my grip. It has been a year since I started learning Korean, so yeah.

Kirill: Do you know like kamsahamneeda?

Deblina: Oh yeah, kamsahamneeda, that's like thank you.

Kirill: Kamsahamneeda! I also know how to count, I think. [Counts in Korean.]

Deblina: Oh yeah, exactly! Wow!

Kirill: That's pretty much all. How long did it take you to learn Korean?

Deblina: So I told you I started like a year back. I know how to read and write, but then my vocabulary is like really bad. So I need to like pick up words whenever I come across people, and there is this blank look that they give me and I give them when we really can't get across what we want to say to each other. So yeah, things happen. But then I pick up a bit from here, and then I listen to people talking. I manage.

Kirill: Ok, alright, that's pretty cool. That's awesome to hear, and it's a big jump to learn a new language, to move to a country for your dream work. That's awesome. But give us a bit of background. Where did you start? You obviously didn't start in Korea. Where did you start and how did your life take you here? What events happened in your life, what did you study in high school, and just walk us through how you ended up in Korea.

Deblina: Ok, so what happened was when I was like 8, my granddad, he left me a treasure of close to 40 books on mathematics puzzles and those things on pattern analysis that we used to solve as kids. So that really drew my interest towards the field of math and science. And those books were by the famous Indian mathematician Shakuntala Devi, I don't know whether you have heard of her or not, but those books were really something and it drew me towards that field, and I used to relentlessly solve patterns and used to look for patterns around, and basically do anything that's related to numbers, which is all data. And lots of finding patterns out. So yeah, machine learning happened.

Also, the second thing that happened was at around 2003, my dad gifted me a computer. So I was taken aback by the amazing stuff and awesome, cool stuff that I can build using the computer. So I started doing my pet projects at around 14, I guess. After that, I used to take part in the National Olympiads. So one of the national cyber-olympiad in my country --

Kirill: Sorry, this is in India, right?

Deblina: In India, yeah. So I topped it.

Kirill: You topped it. Congratulations. That's awesome. 14.

Deblina: Yeah. Thank you so much. Yeah, it's been a great journey since then, and at 14, after that, I believed that maybe I could code and take this up as a career. And the Bachelors happened, and thereafter my Masters.

Kirill: Nice.

Deblina: In Machine Learning and AI. Yeah.

Kirill: That's awesome. And so what languages did you start to code in when you were 14?

Deblina: Ok so the first language I started to code in was C, which is --

Kirill: Yeah, me too! That was my favourite!

Deblina: Yeah, exactly! After C, Java, and I used to do C and Java with the advent of the standard template library. And C++, I started coding with C++.

Kirill: Okay, beautiful. But your Bachelors, did you also do that in C, C++ or did you move on to other languages?

Deblina: During my Bachelors, it was really diverse, because depending on what I’m building, what class I’m taking, I used to switch between languages because again it was like a course requirement. So it ranged across everything: sometimes I was doing C, C++, C#, sometimes just using the Visual Studio platform and exploring everything up to F#. And then I got into Python and R, I think in my junior year. That’s the third year in my Bachelors. After that, all these database-related languages, too, SQL-related languages and Hadoop. Yeah, I used to do all of them.

Kirill: Okay. And in your Bachelors, you said you studied machine learning. Is that correct?

Deblina: Okay, so for Bachelors I didn’t have a specialty because in India you need to study approximately 42 courses. You have to do all of them, but at the end, in my senior year, you have these electives. So, during that time, I went through whatever can be the possible choices which is related to data crunching and applying algorithms or models to solve them. Machine learning was the best thing which was coming close to it. And not only machine learning, I was always interested in building intelligence systems. So I wanted to do something really cool in artificial intelligence, so I took that up and thought of doing my Masters.

Kirill: Okay. And just before the podcast, you were telling me about how you scored this opportunity in Seoul and I think that can be very useful to some of our students, or some of our listeners who are still learning and maybe want to pursue a Masters. Tell us a bit more. How did you go about finding this great opportunity for yourself in Seoul? Did it just fall on you?

Deblina: No, what happened was I used to always look up — I was always a part of the communities online which are related to machine learning and data science. I’m strictly speaking about communities like “Data Science Central” and different kind of opportunities on academic fronts, all those postings. So I came across every possible lab, because for a Masters, what you need to do is you need to not only file an application to the university, but also send a separate application to your supervisor, under whom you will be working, and also to a different lab with respect to your specialty.

Because of that, I screened across 200-300 opportunities and finally, I struck gold at the 250th one, I don’t really remember. (Laughs) I just saw this lab, where the work ranges from designing intelligent traffic systems to modelling smart cities and building intelligent health care solutions, everything which is related to wearable sensors, machine-to-machine communication, and the Internet of Things basically.

This was what I wanted to do because this is a very high level overview of the names that I just said. But internally what we do is real-time big data analytics and along with that, building algorithms or even tuning our models to build these systems up. This was all I wanted to do and this lab was a perfect fit and there are so many opportunities in Seoul and the best part is that this country has amazing technological advancement. Coming from India, I didn’t really know, and I don’t even think most of the people around the world know, what this country has to offer. Everything is really well-organized. Only language is a bit of a barrier, but everything else is super fine.

Kirill: Okay. Wow, that’s really cool. Why were you always so interested in artificial intelligence?

Deblina: As I said, I loved pattern analysis. And after that what happened was, when I was building my project which I started off at around the time I was doing my Bachelors, it was something like an automated health care solution. And I used to see this everywhere. Something like fever, or any kind of first-time diagnosis that a person wants, and he is not having a doctor around or doesn’t have the luxury of visiting a doctor. So for such people I wanted to create a really affordable or, if possible, free automated health care solution which can be like your on-call doctor and you can use that platform and type in whatever is your problem and get the first-hand diagnosis.

There have been such projects by Microsoft and the likes out in the world. But I also wanted to form a recommendation engine for nearby doctors, just in case of emergency. So I made that and then I thought, “Okay, now that I can do that, why not AI and contribute further to this field?” So that’s how AI happened.

Kirill: Wow, that’s really cool. Tell us then about what are you doing now in your research. You mentioned that you’re about to graduate very soon, right?

Deblina: Yeah, just two months away.

Kirill: Oh, wow! Congratulations. It must have been a very long journey.

Deblina: Yeah.

Kirill: All right. So, what are you doing in your research?

Deblina: As I said, in my lab we work from modelling all these intelligence systems and basically designing this smart city concept which is right now happening around the world. My work specifically, if you ask us to, is to evaluate the different models and techniques in machine learning and apply them to solve these problems. These problems range from sometimes just automating a traffic system or the health care, but the thing is we need to integrate this all together and make it an end-to-end system, the vision is a city of future and totally smart.

So my work is to make sense out of the data that we receive in real time, which is huge, and also design solutions, sometimes mathematical models for solving these problems. So first, what I do is I evaluate whatever existing techniques are and sometimes I come up with my own models. Recently I developed an entire algorithm from scratch and its inspiration is quite interesting. If you might ask, I would tell more about it.

Kirill: Yeah. Tell us more about it. So you basically built a whole library, is that right?

Deblina: Yeah, I built a whole library. First of all, the entire design because it wasn’t there. So I designed the model because before building a library, I need to make sense of it mathematically so that they completely understand.

Kirill: Yeah. What language was this library in?

Deblina: This was basically in C because the kernel of any language is always C, the lowest kernel that it’s built on. So in order to understand properly, I always start with C. So the thing that happened was — if you look around and you look at the way how trees branch themselves out, you would see that there’s a pattern in how they branch themselves out.

Kirill: Yeah. I guess it depends on the type of tree, but yeah.

Deblina: Actually, not even the type of tree. You can even look at a cactus, any species of tree in that entire kingdom, the plant kingdom. You can see there’s a pattern of branching. That pattern is basically the Fibonacci series, which is 1-1-2-3-5-8-13. What happens is, the ratio of the two numbers, if you would divide it, it becomes a golden ratio which is prevalent everywhere, like even how our galaxy spirals out. So from there, I thought “Wait a minute. These trees do not have a brain, so to speak, they do not look intelligent. So how do they know exactly in which direction, angle to grow in in such dynamic environments, something that they don’t know?” The environment can be really non-stationary. I’m using really technical terms—

Kirill: Yeah. Basically, how do the trees know what the Fibonacci ratio is?

Deblina: Yeah, exactly. And somehow they just gain that overall stability. Even if they’re slanted above the ground, they still have the stability. So I decided to dig in about their mechanism and what works, and then something blew my mind. I didn’t know this, but they can communicate, see, hear, and even have a memory of 40 days. It’s all there, the biologists have researched it. The best thing is that they can learn and they have 13 different discovered sensors and we humans just have 5. So, the kind of sophistication that these trees have is just mind-blowing. I just decided to model their intelligence and design an algorithm based on this. So it’s just strictly nature-based, like any other nature-based algorithms of soft computing. So I built this algorithm and I made it to solve optimization problems in the applications that we work in our lab.

Kirill: Okay. So how well is it doing? Is it beating other algorithms?

Deblina: Yes, yes, perfectly. With respect to accuracy, it’s definitely beating other algorithms but not so much with respect to time. We already have better algorithms because the speed of any such nature-inspired algorithms is hindered a bit because it has an enormous number of parameters on which such algorithms are based. So, the parameter tuning is required, and that takes a bit of time.

Kirill: Okay. All right, just for everybody out there, I just wanted to say, in terms of Fibonacci numbers, I have already heard about them, but if you haven’t, what Deblina is describing is very interesting, they are all over the world indeed. The sequence is basically 1, 1, and then you just keep adding. So 1+1 is 2, 2+1 is 3, 3+2 is 5, 5+3 is 8, 8+5 is 13 and so on. And if you divide one number by the one before it and you take the limit of that, it will be 1.618 something. Basically, that number 1.618 is called the golden ratio. You can see it all over the world.
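The convergence Kirill describes is easy to check numerically. The following is a minimal Python sketch (the function names are illustrative, not from the episode) that builds the sequence 1, 1, 2, 3, 5, ... and divides each number by the one before it; the ratios settle near (1 + √5) / 2, which is about 1.618.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, 3, 5, ..."""
    sequence = []
    a, b = 1, 1
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b
    return sequence


def consecutive_ratios(sequence):
    """Divide each number by the one before it."""
    return [later / earlier for earlier, later in zip(sequence, sequence[1:])]


# Closed form of the golden ratio, used here only as a reference value.
GOLDEN_RATIO = (1 + 5 ** 0.5) / 2

if __name__ == "__main__":
    numbers = fibonacci(15)
    for value, ratio in zip(numbers[1:], consecutive_ratios(numbers)):
        print(f"{value:4d}  ratio to previous = {ratio:.6f}")
    print(f"golden ratio       = {GOLDEN_RATIO:.6f}")
```

The early ratios jump around (1, 2, 1.5, 1.667, ...), and each further step lands closer to the limit.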

Basically, right now, if you pause this podcast and you take a ruler and you measure the distance between the tip of your middle finger to your wrist, so your hand, and then you measure the distance between your wrist and your elbow, I think it is, you will find that the ratio between them is about 1.618. How crazy is that? Even us humans, we are designed by that ratio. And the fact that trees grow based on that ratio is no coincidence. It’s just anything that is natural, like starfish grow in 1.618, galaxies spiral in 1.618.

There’s lots of debate about which galaxies spiral like that and which don’t but nevertheless, you can see it all over the world. It’s a real mystery, but I’m not surprised when Deblina says that an algorithm that is based on the golden ratio can outperform others just because it takes into account something that is so fundamental and all around us. Yeah, that’s pretty interesting. But when you developed this algorithm, and you’re saying you’ve come up with some applications, can you talk us a bit more through the applications or possible applications of this library that you’ve written?

Deblina: Okay, so what I’ve done, it’s basically an optimization algorithm so you can solve optimization problems using this algorithm. Whenever I have presented my papers based on this algorithm, there have been a lot of curious eyes around and equally questionable minds. Some of them couldn’t really get a grip on it. I totally understand that. They just said it might be for Law School. So after the results that I presented, and there were successful demonstrations where I just selected an application like medical imaging and I processed numerous CT scan images to find the location and area of growth of tumours.

That was one application that I did and the results were phenomenal because I just got it presented at one of the top AI conferences in the United States this year and it was well received. I’ve also applied it to other applications because we do a lot of sensor data processing, so to find the optimal features from that sensor data I have used this algorithm of mine.

Kirill: Okay, that’s very interesting. And I want to slowly start getting into the more applied area of artificial intelligence and data science. To start off, can you please describe for our listeners what is an optimization problem?

Deblina: So an optimization problem is — there are two types of optimization. One is local optimization and the other is global optimization. So when you’re looking for something and you know what will be the result, the final result becomes your global optimal solution. But when you move towards that trajectory to get that final result, you get across some local best results—

Kirill: Local maximums, yeah?

Deblina: Yeah, local maximums, exactly. I’m trying to just break it down in a non-technical manner, which is a bit difficult for me.

Kirill: Thank you for that. So, you have a global maximum which you’re trying to find, but you have local maximums that are possibly going to look like the global maximum and you might think that that is the best option.
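The difference between a local and a global maximum can be shown with a toy example. The sketch below is an assumption-laden illustration, not anything from Deblina's library: a made-up list of heights has a small peak and a taller one, and a greedy climber that only ever steps uphill stops at whichever peak is nearest to where it starts. Restarting from several positions is one simple way to improve the odds of finding the global peak.

```python
# Hypothetical 1-D landscape: a local peak at index 2 and the global peak at index 7.
HEIGHTS = [1, 3, 5, 4, 2, 3, 6, 9, 7, 4]


def hill_climb(heights, start):
    """Greedy ascent: step to the higher neighbour until no neighbour is higher."""
    position = start
    while True:
        neighbours = [i for i in (position - 1, position + 1) if 0 <= i < len(heights)]
        best = max(neighbours, key=lambda i: heights[i])
        if heights[best] <= heights[position]:
            return position  # stuck on a peak, which may only be a local one
        position = best


def hill_climb_with_restarts(heights, starts):
    """Run the greedy climber from several starts and keep the tallest peak found."""
    peaks = [hill_climb(heights, start) for start in starts]
    return max(peaks, key=lambda i: heights[i])


if __name__ == "__main__":
    print("from the left edge:", hill_climb(HEIGHTS, 0))
    print("from the right edge:", hill_climb(HEIGHTS, 9))
    print("best of several starts:", hill_climb_with_restarts(HEIGHTS, [0, 5, 9]))
```

Starting at the left edge, the climber stops on the smaller peak and never sees the taller one, which is exactly the trap Kirill describes.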

Deblina: Yeah. That’s what basically any optimization algorithm does. It finds those solutions. Again, there are different kinds; single optimization where you just have one objective like, “Okay, I need to go to the grocery and get some stuff and I need to get this product.” So that’s like one objective. Multi-objective optimization becomes like, how many objectives you are addressing. So it’s like, “I will go to the grocery store, get that product, but then it has to be of the minimum possible price.” I have two kinds of things to look onto, so that becomes multi-objective. These algorithms have sometimes conflicting objectives, like something increases and something decreases, sometimes both are increasing so you have a maximization problem, many different objectives and then based on that, the algorithms are built.

Kirill: Okay. All right. That makes sense. And then the more objectives you have — for instance, when you have one objective like “Get to the store,” then you have lots of different ways to get there. You have lots of different paths that you can take. That’s an optimization problem.

Deblina: Exactly, yeah.

Kirill: But then when you have multiple objectives, for instance, “Get to a store and buy the cheapest butter,” you have so many more. You have so many different types of butter to choose from, so many different paths to take, and then you can also go to different stores. That’s even three optimization problems, but basically, is it correct that you have to multiply all of the options? It’s not just a simple addition, it’s a multiplication of all of the options.

Deblina: It’s a multiplication of the options and then you have to form a single function of all those options together.
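One common way to "form a single function" out of several objectives is a weighted sum. The sketch below assumes that approach purely for illustration (the episode does not say which method Deblina uses), and the stores, prices, travel times and weights are all invented. It scores each grocery option on price and travel time, then picks the option with the lowest combined cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    store: str
    price: float    # price of the product, arbitrary currency units
    minutes: float  # travel time to the store


# Invented data: cheaper stores are further away, so the objectives conflict.
OPTIONS = [
    Option("corner shop", price=4.0, minutes=5),
    Option("supermarket", price=3.0, minutes=20),
    Option("wholesale club", price=2.5, minutes=45),
]


def combined_cost(option, price_weight=1.0, time_weight=0.1):
    """Collapse two objectives (price, travel time) into one number to minimize."""
    return price_weight * option.price + time_weight * option.minutes


def best_option(options, price_weight=1.0, time_weight=0.1):
    """Pick the option with the lowest weighted-sum cost."""
    return min(options, key=lambda o: combined_cost(o, price_weight, time_weight))


if __name__ == "__main__":
    for weight in (0.1, 0.05, 0.01):
        choice = best_option(OPTIONS, time_weight=weight)
        print(f"time weight {weight}: {choice.store}")
```

The weights encode how much a minute of travel is worth relative to price, so changing them changes which option counts as "best". That sensitivity is part of what makes multi-objective problems harder than single-objective ones.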

Kirill: Hence something called “the curse of dimensionality,” right? Like, if it takes you 0.01 seconds to solve one optimization problem, then when you have a thousand of them, it doesn’t mean it’s going to take you like 10 seconds to solve those problems. You have to multiply that. It’s going to take you like a million years to solve them altogether. That’s why it’s such a big deal in artificial intelligence that you cannot just brute force through these problems. Even given the computational power that we have now, you just cannot simply brute force through optimization problems. You have to come up with smart ways of solving them.

Deblina: Yeah, because brute force for such kind of a problem would definitely take two million years for sure. You have so many parameters to take care of.
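The "multiply, don't add" point can be made concrete with a few lines of Python. In this sketch the stores, routes and butter types are made-up examples: every choice can be combined with every other, so the number of complete plans is the product of the option counts, and it grows exponentially as more decisions are added.

```python
from itertools import product
from math import prod

# Invented choices for the "buy the cheapest butter" example.
STORES = ["store A", "store B", "store C"]
ROUTES = ["main road", "back streets", "highway", "bike path"]
BUTTERS = ["salted", "unsalted", "cultured", "whipped", "organic"]


def search_space_size(option_counts):
    """Number of complete plans when every choice can pair with every other."""
    return prod(option_counts)


def all_plans(*choices):
    """Enumerate every combination; only feasible for tiny problems."""
    return list(product(*choices))


if __name__ == "__main__":
    plans = all_plans(STORES, ROUTES, BUTTERS)
    print(len(plans), "plans for three small choices")
    for decisions in (5, 10, 20, 40):
        size = search_space_size([10] * decisions)
        print(f"{decisions} decisions with 10 options each: {size} plans")
```

Three small choices already give 3 × 4 × 5 combinations, and with ten options per decision each extra decision multiplies the total by ten, which is why exhaustive search stops being practical so quickly.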

Kirill: Yeah, exactly. And the funny thing is that, our whole lives, all of our lives, whatever we are doing in life, is an optimization problem. You have to get to school, what time do you wake up in the morning, do you pick up your kids and then you go to work or do you go to work and then pick up your kids. In what order and how do you do certain things, what routes and paths you take, that’s all an optimization problem.

And funnily enough, if we want to build artificial intelligence that can rival us in terms of intellect, it has to be able to solve optimization problems as good as we do. As humans, somehow we can solve these optimization problems. Natural selection has given us this ability and this amazing tool called the brain which allows us to solve these optimization problems. I think that’s why we’re trying to build these neural networks, these deep learning techniques, because they can mimic the human brain in the hopes that then robots will be able to do the same. Is your algorithm based on neural nets or are you taking a different approach?

Deblina: My algorithm is not based on neural nets because clearly trees do not have a brain and a central nervous system.

Kirill: That makes sense.

Deblina: Yeah. But then, I have totally worked on something that you just mentioned – natural selection. So, guided by natural selection, there’s a continuous reinforcement loop of penalty and reward and the system builds on that. You know how natural selection works, right? I do something, it’s a good thing for me, and I will continue doing that. If it’s a bad thing, I won’t do that. That’s the thing. My algorithm, the library that’s built is just that. There’s a loop, an underlying reinforcement algorithm, which guides this for natural selection. That’s how it functions. But I do understand how neural nets and all possible deep learning techniques work on because again, they are inspired from a human brain. Their inspirations are different but the basic natural selection theory is the same for both of them.
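The reward-and-penalty loop Deblina describes can be illustrated with a very simple search procedure. To be clear, this is not her algorithm, whose details are not given in the episode; it is a generic stochastic hill climber with step-size adaptation, and every name and constant in it is an assumption chosen for the example. A random change that improves the score is kept and the search gets bolder (reward); a change that makes things worse is discarded and the search gets more cautious (penalty).

```python
import random


def reward_penalty_search(objective, start, steps=2000, step_size=1.0, seed=0):
    """Maximize `objective` by keeping changes that help and dropping ones that hurt."""
    rng = random.Random(seed)
    current = list(start)
    current_score = objective(current)
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, step_size) for x in current]
        candidate_score = objective(candidate)
        if candidate_score > current_score:
            # Reward: the change was good, so keep it and explore more boldly.
            current, current_score = candidate, candidate_score
            step_size *= 1.1
        else:
            # Penalty: the change was bad, so discard it and search more cautiously.
            step_size *= 0.98
    return current, current_score


def negative_sphere(point):
    """Toy objective with a single maximum of 0 at the origin."""
    return -sum(x * x for x in point)


if __name__ == "__main__":
    best, score = reward_penalty_search(negative_sphere, start=[3.0, -4.0])
    print("best point:", [round(x, 4) for x in best])
    print("best score:", round(score, 6))
```

On this toy objective the loop walks from the starting point toward the origin using nothing but the "was that better or worse?" signal.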

Kirill: Okay. Very, very interesting. And have you ever compared your algorithm solving an optimization problem against a neural network solving the same optimization problem?

Deblina: Yes, definitely. I have to do that because I get questioned at conferences. I have compared it to other existing algorithms. I would name a few, if you may.

Kirill: Yeah, sure.

Deblina: Okay. The particle swarm optimization, the artificial bee colony optimization, and the ant colony optimization are some of the really great optimization algorithms which are out there from soft computing field. And in the deep learning field, I have compared it to — because obviously there are training and testing involved, that’s a different approach for the deep learning field. Again, I need to subdivide and show why I’m applying it to deep learning. That time, I compared it with recurrent neural networks, and I think the last time I compared it in one of my applications it was with restricted Boltzmann machines.
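For listeners who have not met the baselines Deblina names, here is a minimal sketch of the first one, particle swarm optimization, in its textbook global-best form. The parameter values (inertia 0.7, both acceleration coefficients 1.5), the swarm size and the toy objective are common illustrative choices, not settings taken from her experiments.

```python
import random


def particle_swarm_minimize(objective, dim, bounds, particles=20, iterations=200,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Global-best particle swarm optimization for a function to minimize."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(particles)]
    velocities = [[0.0] * dim for _ in range(particles)]
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]
    best_index = min(range(particles), key=lambda i: personal_best_score[i])
    global_best = personal_best[best_index][:]
    global_best_score = personal_best_score[best_index]

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Blend momentum, the particle's own memory, and the swarm's best find.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move, keeping the particle inside the search bounds.
                positions[i][d] = min(high, max(low, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score < global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score
    return global_best, global_best_score


def sphere(point):
    """Toy objective with a single minimum of 0 at the origin."""
    return sum(x * x for x in point)


if __name__ == "__main__":
    best, score = particle_swarm_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print("best point:", best)
    print("best score:", score)
```

Even this small version has several parameters to tune (inertia, two acceleration coefficients, swarm size, iteration count), which echoes Deblina's earlier remark that parameter tuning slows nature-inspired algorithms down.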

One of the things that I saw, and it was quite controversial in one of my presentations, it was like recurrent neural networks and it was also having a fuzzy logic base with that neural network scheme. With increasing number of generations of run, I mean, when it was running for more than 400 generations – I’m talking about image processing – it did not cover the exact regions of interest on that image. But rather the contour that we wanted to select on an image, it got scattered all over. And this was deep learning doing it. So somehow there was some problem, but then when I did it with my algorithm, maybe because it was having a continuous feedback loop—I mean, I know deep learning has that, but this was more based on experiences. So this library took more time but it gave better results, I mean exactly the regions of interest on the image. So that’s how I compared it.

Kirill: Interesting. You’re saying that your contour was contiguous?

Deblina: Yeah.

Kirill: Okay. Very interesting. I thought the deep learning area for image recognition is convolutional neural nets?

Deblina: Yeah, it is, but then what I did was I used this particular fuzzy neural net system. Exactly, it was fuzzy convolutional neural nets.

Kirill: Fuzzy convolutional neural nets. Okay.

Deblina: Yeah, exactly. So that was the one which I compared it with. Maybe because I was working with R last night and I got confused—anyway, that was how I did it. So, the contour of interests were scattered for FCNN and not for the algorithm which we developed.

Kirill: Cool. So your algorithm is pretty up at the top there. Interesting. We might be studying that very soon. If you write a Python library, maybe.

Deblina: Yeah, sure. Maybe. (Laughs) I will do that.

Kirill: All right, cool. And once you finish your degree in two months, where do you think that will take you?

Deblina: Right now, I really don’t know because I’m totally focusing on this graduation stuff. I’ve been writing my thesis. And after that I’m headed to Intel for two months in their R&D section for some work on Internet of Everything. That’s IOx, a new thing. And after that I will just be open to opportunities. I don’t really know, but I would definitely love to learn and just keep doing what I’m doing.

Kirill: Okay. And do you have some kind of a dream, some problem that you want to solve in the world using artificial intelligence?

Deblina: As I told you, for me, health care is one of the things that I really want to get out there and totally automate it. I’m not saying like perform surgeries and stuff, but for the first-time diagnosis, or even helping the doctors, making their work easy. So that would be great, so that the time for diagnosis is saved considerably and the accuracy of your prognosis, when you’re doing it, that will be much better. I want to do that.

As of now, I haven’t really thought long-term what I’m going to do, but it’s going to be everything related to AI. Another problem is that right now, the field of AI has a lot of capabilities like NLP, text and speech, knowledge recovery, image processing, separately, but none of the work that we have done with respect to all the work around the world happening right now. We need to integrate it and make it as one single system. That hasn’t been done as of even 2017. The day that we build an end-to-end AI system, that would be great, with all these functionalities. That would be the ultimate aim of any AI researcher.

Kirill: Okay. That’s a big undertaking, definitely. Let’s talk a bit about where the field of AI and data science is going in general. From what you’ve seen around the world and from the research you’ve done, what do you think the future is of artificial intelligence?

Deblina: As far as I know, from the opinions of the scientists and the researchers with whom I’ve met in conferences around the world, what all of us are thinking is that the field of data science is headed towards a fusion with intelligence systems to create smart cities of the future. That’s the main vision. I also strongly believe that with the on-going research with real-time big data and Internet of Everything right now, data science is going to explode in the future with a lot of stuff happening. As of today – I just read this this morning when I got up – Strata and Scala have been replaced by Hadoop already—I’m sorry, Strata and Scala have replaced Hadoop.

Kirill: (Laughs) Just a bit of a different direction.

Deblina: Yeah. And there are these DataOps tools which are being developed to help data engineers, like DevOps tools which used to be previously for all the developers. Now there are DataOps. They have been built by companies like Nexla and DataKitchen. It’s really great, how the data field is progressing. And also the automated predictive analytics, which is the thing which is happening right now. This predictive analytics had been automated last year and DataRobot was created and people were like, “Okay, by 2025 everyone is going to be out of jobs.” But then it was a bit soon to say, because DataRobot as of now just speeds up model development for any model that you’re building, it’s like the one-stop solution to speed up whatever you’re implementing in the industry. It has a long way to go, definitely, it’s a budding field. Both AI and data science together are going to be really powerful in the future.

Kirill: Yeah, I agree. I don’t think data scientists will be out of a job. I think that’s going to be the last industry to go.

Deblina: I really don’t feel so because as of yesterday there were more than 4 million jobs out there for data scientists.

Kirill: Wow! Everybody listening, do you hear that? 4 million!

Deblina: Yeah, anyone with skills like Python, R, SQL, Hadoop, you’re good to go. You are looking at good jobs straight down the line for 25 years, a stable career.

Kirill: Stable and explosive career. What would you say are the most important tools for data scientists?

Deblina: As I said, definitely Python, R, SQL. Spark right now, because obviously it has taken over from Hadoop. Basically the entire scikit-learn/NumPy/TensorFlow stack of Python. If you can do that, that would be great. So these are some tools, and even I use them on quite a regular basis. Among the techniques, if you ask, there are clustering, regression, neural nets and decision trees. Most importantly, there are two things, support vector machines and ensemble learning, that you need to learn if you really want to get into data science, because all the companies out there work with ensemble learning. Everything is an ensemble.
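
To make the idea of ensemble learning mentioned above concrete, here is a minimal sketch in plain Python. The three rule-based "models", their thresholds and the fruit measurements are all made up for illustration. Real ensembles, such as random forests or gradient boosting, combine trained models rather than hand-written rules, but the voting step works the same way.

```python
from collections import Counter

# Three deliberately simple, hypothetical classifiers.
# Each fruit is a (weight_in_grams, orangeness_score) pair.
def by_weight(fruit):
    return "orange" if fruit[0] > 150 else "apple"

def by_color(fruit):
    return "orange" if fruit[1] > 0.5 else "apple"

def by_both(fruit):
    return "orange" if fruit[0] / 300 + fruit[1] > 1.0 else "apple"

def majority_vote(models, fruit):
    """Ask every model for a label and return the most common answer."""
    votes = Counter(model(fruit) for model in models)
    return votes.most_common(1)[0][0]

models = [by_weight, by_color, by_both]
sample = (140, 0.8)  # a light but very orange-coloured fruit
prediction = majority_vote(models, sample)
print(prediction)
```

Here the weight rule alone would call the sample an apple, but the other two rules outvote it. That is the basic appeal of an ensemble: individual mistakes tend to be corrected by the group.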

Kirill: Okay. That’s important. And why would you say SVMs are an important tool?

Deblina: Yeah, SVMs are really powerful and they work very differently from the existing clustering or regression techniques. The way they work is really beautiful, and that mechanism is what gives them such accurate results. Because of that accuracy, they have been applied to a lot of product design and modelling in most of the corporate sectors that I’ve come across. So that’s why it’s viewed as an important tool in your career.

Kirill: All right, give us a five-sentence breakdown of how SVMs work. What is their main advantage?

Deblina: Okay, say a set of data is there and you need to classify it into two classes. So, for example—Kirill, if you could give me two classes?

Kirill: Apples and oranges.

Deblina: Okay, so apples and oranges. Great! Your machine needs to know—

Kirill: I like to participate in your examples. I can see how you’re a great researcher.

Deblina: Okay. (Laughs) So, you have a bunch of apples/oranges combinations and your machine should classify which one is apple and which one is orange. That’s the objective. So thereafter, the next step is how will the machine know that. The model of SVM, what it does is it builds something called a hyperplane. To be non-technical, I would say that’s the margins between the two classes. So those margins, what happens is any other algorithm would find a similarity, like which is an orange for class ‘orange’ and which is an apple for class ‘apple’. But what SVM does is, among the apples, it will select which has the most similarity with an orange, just the opposite. And with the orange, it will select which has the most striking resemblance with an apple. It finds out the outliers or the mistakes in a very non-technical manner and puts that as your margins. And based on that, the remaining data is classified. That’s how SVM works.

Kirill: Yeah. It’s very counterintuitive if you’re thinking about it in terms of the other algorithms, where they look for the most apple-y apple or the most orange-y orange, and then they build their classes based on that.

Whereas here, you’re looking for the really cool orange which actually looks like an apple and a real rebel apple which looks like an orange, and based on that you’re like, “Oh, so those are my boundaries.” And then you’re like – bam! Hyperplane in between them and that’s it. That’s a completely different approach to classifying.

Deblina: Yeah.
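
The apples-and-oranges description above can be sketched in a few lines of Python. This is a simplified illustration, not a full SVM: it assumes a single made-up feature (an "orangeness" score between 0 and 1) and classes that do not overlap. In that special case the maximum-margin boundary is simply the midpoint between the most orange-like apple and the most apple-like orange, the two boundary points that SVM terminology calls support vectors. The function names and data are hypothetical.

```python
# Hypothetical data: one made-up "orangeness" score per fruit
# (0 = very apple-like, 1 = very orange-like).
apples = [0.10, 0.15, 0.22, 0.35]
oranges = [0.65, 0.80, 0.88, 0.95]

def fit_1d_max_margin(low_class, high_class):
    """Return (boundary, support_low, support_high) for separable 1-D data."""
    support_low = max(low_class)    # the apple that looks most like an orange
    support_high = min(high_class)  # the orange that looks most like an apple
    if support_low >= support_high:
        raise ValueError("classes overlap; this sketch needs separable data")
    # The boundary sits halfway between the two support points,
    # which maximizes the distance to the nearest point of either class.
    boundary = (support_low + support_high) / 2
    return boundary, support_low, support_high

def classify(score, boundary):
    """Label a new fruit by which side of the boundary its score falls on."""
    return "orange" if score > boundary else "apple"

boundary, sv_apple, sv_orange = fit_1d_max_margin(apples, oranges)
print("support points:", sv_apple, sv_orange)
print("new fruit with score 0.4 is an", classify(0.4, boundary))
```

Only the two borderline fruits determine the boundary; moving any of the other points (without crossing them) changes nothing. With several features or overlapping classes, a library implementation such as scikit-learn's `SVC` would be the usual choice.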

Kirill: Okay. That’s cool. Another interesting thing you mentioned—just to summarize for the guys listening, tools of the future are Python, R, SQL, Spark, scikit-learn and TensorFlow, and techniques of the future are clustering, regression analysis, neural networks, support vector machines and ensemble learning among others.

Another interesting thing you mentioned earlier in the show is that Strata and Scala are replacing Hadoop and that Spark has taken over from Hadoop. Can you go into a bit more detail on that? Hadoop is such a trendy buzzword, everybody wants to learn Hadoop. Does that mean that listeners on this show shouldn’t be learning Hadoop and they should be learning Strata, Scala and Spark instead?

Deblina: I wouldn’t say that. Again, it depends on what the listeners want to do and what problem they’re trying to solve with their model. The reason I’m saying that Strata, Scala and Spark have replaced Hadoop is what I saw researchers doing at all the conferences I travelled to. Hadoop has been around for quite some time, roughly from 2007 to today, and it has almost been replaced by these technologies. Companies are also looking towards them, obviously for the real-time analysis they offer.

Right now, I don’t think that listeners should just stop learning Hadoop, because even I use Hadoop on a regular basis. But I find Spark much easier, and I find it has more to it than Hadoop, mainly because of the real-time processing that it can do with big data. I don’t know so much about how companies or even academic organizations are using Strata and Scala because I don’t have full knowledge of that, but I can speak for Spark for sure.

Kirill: Okay. That’s very interesting. And what would you say to somebody out there who runs a business who is using Hadoop right now? Do you think they should start considering switching to Spark, or are they fine for the next couple of years?

Deblina: That depends on what business that person is running, but definitely you should be starting to make a transition to Spark. I strongly feel so. Again, it’s not a personal opinion; it’s like speaking the minds of everyone who I’ve come across in the past year.

Kirill: Let’s say they’re running an online store, so they have a lot of OLTP/OLAP type of things. What’s the main advantage of Spark over Hadoop?

Deblina: Okay, if they’re running an online store, basically it’s neater and nicer in the way it handles and processes the data, and also in the kind of intuition it has towards modelling the data. If you’re looking at classification and things like that, putting data into clusters, Spark is better. Those are certain advantages. I don’t know so much about speed and stuff because right now even I am in a fix, like “What should I be using? Hadoop or Spark?” Right now I’m trying my hand at both. So the moment I reach a proper conclusion, I will put that up on my LinkedIn profile.

Kirill: Okay, sounds good. We’ll be looking forward to that. I’ve got a few quick questions for you, rapid-fire type of questions. Are you ready?

Deblina: Okay.

Kirill: All right. What’s the biggest challenge you’ve ever had as a data scientist or machine learning expert or AI researcher?

Deblina: That would be handling unstructured data from all possible sources and giving it a proper structure. That’s very important.

Kirill: Okay. That’s a very deep challenge. I can totally appreciate that. What’s a recent win that you can share with us that you’ve had in your role, something that you’re proud of?

Deblina: It would be the completion of my recent project, the intelligent healthcare system for CT scan detection that I presented at that artificial intelligence conference in the United States.

Kirill: Do you think that can have a real world application and soon we will be using those?

Deblina: Yeah, because the project got acquired by a hospital.

Kirill: Oh, nice. Very cool.

Deblina: Yeah, it’s giving results.

Kirill: Congratulations. That’s awesome.

Deblina: Thank you.

Kirill: It reminds me of the podcast with Damian Mingle, I think it was number 13, where he came up with a machine learning algorithm to predict sepsis. It’s always very cool to see people using artificial intelligence for good.

Deblina: Yeah, I remember that. I heard that.

Kirill: Yeah, that’s awesome. Thank you. Now I have two people who are saving lives. That’s awesome. You never mentioned, what’s the name of your library that you’ve developed, if we can look it up or something later on?

Deblina: I told you it’s not on Python yet, so—

Kirill: Yeah, but even in C, did you give it a name like a codename? Cobra or something like that?

Deblina: No, because right now I haven’t put up the name. It’s still in the beta version. So once I do that, I will keep you posted.

Kirill: All right, sounds good. Next one is, what’s your one most favourite thing about being in the field of data or being a data scientist, being an AI researcher?

Deblina: I really like the power that we have to build awesome and cool stuff with data, making machines think more like us, and it’s just the beginning. We can have a huge impact on creating a smarter tomorrow. Take, for example, Alexa and Google Home – it’s just the beginning. I really like that about our field, and also how in demand it is among the various sectors around the world.

Kirill: Yeah, totally. But on that, this is kind of my question that I really wanted to get your opinion on after we warmed up with everything else. What do you think about a lot of people saying that AI is a threat, that not only are we going to develop smart homes or smart cities and help people in health care, but actually we’re going to create super intelligence or artificial general intelligence which will have a prerogative of its own which will eventually decide that humans are not meant to be on this planet? What are your thoughts about that?

Deblina: I totally understand that because even we keep thinking about it, and it’s one of the issues that I discuss whenever there are meet-ups and business events. I second that. I feel, however, that it can be solved with stricter security. You can see how it’s coming, because already in 2017 every person’s personality can be assessed from their online data. So imagine, from all the data, sensors, services and commodities that a single person uses, how easy it will be to know everything about a person. And I’m not talking about an end-to-end integrated AI bot. That’s too far into the future. I’m just talking about simple intelligent machines. Those are equally powerful and dangerous. So what we need is a data architect or scientist with a solid information security background. That person will be really indispensable. If we can build a security mechanism around it, it’s good to go, yeah.

Kirill: All right. That’s important. It’s good that you have the confidence that we’ll be safe. (Laughs)

Deblina: Yes. (Laughs) You need to stay positive. Whatever you’re doing, you need to just enjoy it and think that it’s going to work. That’s how I look at it.

Kirill: That’s true. Okay, it’s been a real pleasure having you on the show. Thank you so much for coming on.

Deblina: Thank you so much, Kirill.

Kirill: So how can our listeners follow you or maybe connect with you, maybe even ask you some questions if they’d like to learn more about your career?

Deblina: Okay, I’m on LinkedIn. I go by my name – Deblina Bhattacharjee. And you can also connect with me via Gmail. I go by [email protected], so I would be definitely open to sharing ideas, discussions, basically learn. Yeah.

Kirill: Beautiful. Thank you. And one last question I have for you today is, what is a book that you can recommend to our listeners to become better in the space of data science or artificial intelligence?

Deblina: Okay. This book would be the one with which I started off: “An Introduction to Statistical Learning” by James, Witten, Hastie and Tibshirani. In my opinion, it was a great read. The book is free and it’s just so good. Also, if you allow me, I would recommend another book, “Applied Predictive Modelling”, which has thorough examples and explanations. So, these two books, yeah.

Kirill: Okay, beautiful. So, “An Intro to Statistical Learning” and “Applied Predictive Modelling.” Actually, I also wanted to mention or reiterate that author that impacted you at the very beginning of your journey. If anybody is interested to see how Deblina started out, her name was Shakuntala Devi, right?

Deblina: Yeah.

Kirill: Can you pronounce that for us? How do you pronounce it correctly?

Deblina: Shakuntala.

Kirill: Shakuntala Devi. Do you think they’re available in English, like those pattern recognition books?

Deblina: Yeah, all of them are in English.

Kirill: Okay. I’m really curious to check that out. You know, it’s always interesting to go back to the source, where everything started out.

Deblina: Yeah, sure.

Kirill: Yeah. Once again, thank you so much for coming on the show and sharing all this wealth of knowledge with all of our listeners.

Deblina: Thank you so much for inviting me, Kirill. It has been a pleasure.

Kirill: So there you have it. I hope you enjoyed today’s presentation. It was quite an overwhelming discussion, actually. There were lots of interesting things. You can tell that Deblina is very well-versed and very knowledgeable about all of these subjects and has a lot of experience in all these different tools and techniques. And it was just a great pleasure that she was able to share these things with us.

Perhaps the biggest takeaway for me from this episode was what Deblina said about her algorithm and how it’s structured. I always thought that neural networks were the most powerful thing and the endgame for humanity in terms of artificial intelligence, but in reality it turns out that they’re not. It’s very interesting that such a forward-looking researcher as Deblina has chosen a different approach: rather than taking inspiration from human consciousness and the human mind, she chose an approach inspired by the kingdom of plants and the natural selection that has been happening there.

Based on some of the tests that she’s done, her algorithm is performing at least as well as the existing ones out there. Basically, it shows that there are lots of avenues for artificial intelligence, not just neural networks, and it also underlines how broad this field is, how many opportunities there are and how interesting it can be. So pretty much, as long as you have the passion and the drive to learn the programming skills that you need, the world is your oyster. You can come up with any type of inspiration, code that and see how it goes.

So there we go. That was our podcast on artificial intelligence. I hope you got some very valuable takeaways. If anything, now you know which tools to focus on and which techniques to study to prepare for the world of the future. You can find all of the resources mentioned on this podcast including ways to connect with Deblina at www.superdatascience.com/43. Also there, you can get the transcript for this episode. And definitely make sure to connect with Deblina on LinkedIn and follow her career.

And by the way, I just had a look at Shakuntala Devi, and she’s considered a human computer. This is a person who could multiply two 13-digit numbers within a matter of seconds. We’re going to include that in the show notes as well. I think that could be a very interesting thing to have a look at. And on that note, thank you so much for being here. I really appreciate you and I can’t wait to see you next time. Until then, happy analyzing.

Kirill Eremenko

I’m a Data Scientist and Entrepreneur. I also teach Data Science Online and host the SDS podcast where I interview some of the most inspiring Data Scientists from all around the world. I am passionate about bringing Data Science and Analytics to the world!
