SDS 439: Deep Learning for Machine Vision

Podcast Guest: Deblina Bhattacharjee

January 27, 2021

Today we discussed the software tools Deblina utilizes, critical math subjects necessary for machine learning, unsupervised learning approaches, top tips for productivity, and more!

 

About Deblina Bhattacharjee
Deblina Bhattacharjee is an explorer. She is a learner. Applying things differently to solve problems drives her every day. In the past, she has explored evolutionary computation to model the intelligence of biological plants, which was in turn used to design automated solutions for medical imaging. Upon completing her Master’s in Computer Science in South Korea, she worked at the Samsung Intelligent Media Research Centre, building algorithms for early earthquake detection using computer vision. She then transitioned back to academia, where she is currently pursuing her PhD jointly in the Computer Vision Lab and the Image & Visual Representation Lab at EPFL, Switzerland. She has a passion for building algorithms, AI, large-scale computer vision models, and anything that can help improve others’ lives via technology. She is equally passionate about mentoring girls in technology and has spearheaded many such campaigns in the past. When she is not immersed in research or mentoring, you can find her playing the drums, singing at a gig, cooking, or struggling to write poetry. She continues to explore unknown things and she will forever be a student.
Overview
Four years ago, Deblina was a guest on the podcast, and we went over a wealth of updates. In 2017, she was still a student in her master’s program studying the inherent intelligence of plants and, separately, detecting white blood cells in images. She was able to combine the math from the plant project with her work on white blood cells. Since then, she has worked in computer vision research, first at Samsung and now at EPFL.
Deblina’s research focus has always spanned multiple fields. At Samsung, she worked on a project to make the early detection of earthquakes more efficient. Her work used computer vision to magnify waves and fluctuations in a building that are not visible to the human eye but are detectable by a computer. After this work, she was hired directly into a lab at EPFL, where her doctoral dissertation uses computer vision to turn European art, including landscape paintings, comics, and historical periodicals, into immersive experiences.
60% of Deblina’s day is research and 40% is experimentation and implementation. In a non-COVID world, her day starts with going to the lab, where she reads through papers to stay on top of trends. Then she sketches out a proposed method for implementation, after which comes coding in Python, PyTorch, and MATLAB. She keeps her linear algebra, calculus, statistics, and probability sharp, and relies on good visualization to communicate her results. She also spends time inspecting her failures, since each one reveals one more way not to tackle a problem.
When it comes to skills, Deblina splits it up by role. If you are a decision scientist, domain knowledge is important so that you can educate others on your team around what kind of data they need to be working on. But if you are a junior data scientist, then math and application are more important than domain knowledge. She recommends a solid handle on a coding language and skills in visualization and presentation for entry and mid-level data scientists. To prepare for the coming trends, Deblina recommends all of the above for entry-level jobs but for higher levels, she suggests you delve into the current state of the art and be prepared for research-based questions during interviews for specific roles by specializing your research. She also suggests paying attention to unsupervised learning, which is on the rise.
In this episode you will learn:
  • Deblina’s master’s program work [4:03]
  • Deblina’s computer vision research and Ph.D. [11:46]
  • Deblina’s drumming hobby [20:18]
  • The daily work [24:40]
  • What key skills do you need as a data scientist? [33:21]
  • How can a data scientist prepare for the future? [37:03]
  • How does Deblina tackle time management? [40:24] 

Podcast Transcript

Jon: 00:00

This is episode number 439 with Deblina Bhattacharjee, artificial intelligence researcher and deep learning engineer. 
Jon: 00:12
Welcome to the SuperDataScience podcast. My name is Jon Krohn, chief data scientist and best-selling author on deep learning. Each week we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now, let’s make the complex simple. 
Jon: 00:42
Welcome to this episode of the SuperDataScience podcast. I’m your host, Dr. Jon Krohn and it is my great pleasure to be joined today by the fascinating and brilliant Deblina Bhattacharjee. Deblina is a world-class artificial intelligence researcher who specializes in using deep learning for machine vision tasks. She has a breathtaking wealth of applied machine learning experience in industry, as well as in academia, having developed models for detecting white blood cells in medical images, predicting the arrival of earthquakes by watching the subtle movements of skyscrapers, and better understanding the intelligence of plants. Yes, you heard me. The intelligence of plants. 
Jon: 01:24
If that hasn’t blown your mind yet, Deblina’s current PhD research is focused on developing models that will enable 2-dimensional images of landscapes, such as the landscape paintings of fine art, landscape photographs in magazines, or landscape illustrations in comic books to be converted into immersive 3D virtual reality experiences that you can jump into and explore. 
Jon: 01:49
In this episode, Deblina fills us in on what the software tools and typical work week are like for a professional AI researcher, the critical mathematical subjects and data science skills you need to master to be an outstanding machine learning practitioner, and the increasing significance of unsupervised learning approaches for automatically labeling training data as datasets become exponentially larger each year. She also shares her top tips for being a prolifically productive professional. This episode will be of interest to anyone who’s interested in learning what the state of the art is in computer vision and what it’s like to be working at the cutting edge in AI. 
Jon: 02:28
Here and there throughout the episode, Deblina also provides specific technical guidance that will be beneficial for any hands-on data scientists, particularly if you’re interested in machine learning, deep learning, or machine vision. Deblina is so amazing, I can’t wait for you to hear from her. Let’s go. 
Jon: 02:46
Deblina, welcome to the program. We’re so excited to have you back. It’s been four years. Deblina was a guest on the SuperDataScience podcast in 2017 when she was pursuing a Master’s in South Korea. We can’t wait to hear about all of the exciting updates that have happened since then. I know from already talking to you before we started recording that you are an extremely interesting person with so many diverse interests, the research that you’re doing, the commercial projects you’ve had in the data science sphere are all so exciting. I can’t wait for our audience to hear about it as well. Welcome to the show.
Deblina: 03:33
Thanks a lot, Jon, for having me over here. It’s equally exciting for me to come back to SuperDataScience podcast after four years. Back then, I was just a student, I’m still a student and I think I forever will be a student. But I have learned much more than what I had known whilst doing my Master’s in 2017 when for the first time Kirill interviewed me. Well, it’s really exciting to be back on this platform again. 
Jon: 04:03
Nice. So in 2017, you were doing a Master’s, what was the research focus? 
Deblina: 04:10
So, I graduated in 2017. When I started my Master’s, it was based on evolutionary computation, so I was working to simulate or model mathematically the intelligence mechanism of biological plants, which is- 
Jon: 04:29
Oh. What? 
Deblina: 04:30
… different. Yeah. Which is different. 
Jon: 04:34
The intelligence mechanism of biological plants? 
Deblina: 04:37
Yeah. 
Jon: 04:40
What does that mean? 
Deblina: 04:40
It’s quite different. Actually, so I had to sit in meetings with different biologists and even go through the biochemistry and how signals are transducted amongst the different plants, how they communicate with each other, what kind of sequencing is done when they have some chemicals being transported. And what happens, like the inner mechanisms. So I studied them and tried to model it mathematically. I don’t think I was so successful, but there were quite so many obstacles, which is always there ever so slightly in any research. 
Deblina: 05:16
But anyhow, I did model, come up with something, and finally, I presented I think my first paper at one of the student research competitions in ACM in Italy, back in 2016. And there the judges happened to ask me, what are the possible implications of the research? And where are you going to apply it? So, I did not have an answer that time, so I couldn’t really finish the competition at the top, but then later I had more discussions with my advisors and the other people on the team. And frankly enough, I really had no clue. 
Deblina: 05:57
And then there was this new project where I had to identify white blood cells from medical images. So I was like, why not turn the modeling from the evolutionary computational perspective into a search optimization solution to solve such problems? And so I did that and finally, that was used to solve some medical imaging problems. 
Jon: 06:24
So, let me interrupt you, I have one quick question, and then maybe a longer question. So the quick question is, just for our listeners’ sake, what is the ACM competition that you were at? What does ACM stand for? 
Deblina: 06:35
So, it’s Applied Computer Machinery, I think. ACM. [ACM stands for the Association for Computing Machinery.]
Jon: 06:38
Yeah, yeah. 
Deblina: 06:42
And it’s the top bodies of research in computer science. Yeah. 
Jon: 06:47
Definitely. 
Jon: 06:50
This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. If your New Year’s resolution is to skyrocket your data science career, then we have something super special prepared for you today. Until the end of January, January 31st, you can lock in year-long access to the SuperDataScience platform at a deep discount. You save 40% on the annual plan and pay only $166 USD, instead of the usual $276. Our streamlined user interface will enable you to move easily between learning materials and become a top data scientist, business intelligence analyst, or machine-learning engineer. 
Jon: 07:32
Inside the SuperDataScience platform, for a whole year, you’ll have unlimited access to all of our 50+ courses which together provide over 300 hours of content. And finally, you will be part of a community of talented, inspired, and driven data scientists who are keen to learn together and grow their careers together. So, don’t hold off. Visit www.superdatascience.com and save 40% on your annual membership. Available only until the end of January. Once again, the website is www.superdatascience.com. And now, let’s get back to this riveting episode. 
Jon: 08:11
And then, so let me try to summarize what you just said was that, so you were doing this research, you weren’t really aware of the practical implications, so you’re in a way, I mean not basic math because you weren’t studying topology, but you were coming up with mathematical models to try to simulate the intellectual behavior of plants. 
Deblina: 08:37
Exactly. 
Jon: 08:39
And you weren’t sure what usefulness that would have, but then for a separate project, you were trying to identify white blood cells in images and you were able to use some of that math to do that work. Wow. 
Deblina: 08:54
Exactly. Yeah. So, when it started, I really did not have an idea, as I said, but my advisor said that plants have this inherent intelligence, which if you see a plant growing, well it depends on the species of plant growing, but inherently they follow the Fibonacci series of growth when they branch out. Even the root has some way to find the water gradient within the soil. 
Deblina: 09:19
So, they have this kind of, which comes from evolution, because they have been surviving since eons, so to understand how to optimize the search grid and basically force the agent to locate certain things which are not, the scarce resources. So that’s what the plant does. So if you see, inherently it has kind of intelligence going on, so we sort of, why not model it? As I said, I stumbled upon how to start it, so there was a lot of calculus, there was all kinds of graph series. There was of course geometry based, there was topology too involved to understand-
 
Jon: 10:00
Oh. 
Deblina: 10:00
Yeah. Yeah. Yeah. So finally, I came up with an objective-based function and an algorithm which can work. Also, there was some reinforcement signal used, because of course, it’s like an agent searching through an unknown world to find its goal. So, it was something like that. And ultimately I used that for this particular leukocyte, or white blood cell, detection, because the thing is, it aids doctors and diagnoses much faster if it’s automated, because manually finding them is expensive and time consuming. And also subject to inter- and intra-observer variability, depending on who is examining it. 
Deblina: 10:42
Also the people doing it are very well versed and very skilled at it, but detecting white blood cells from medical images from the blood samples are very, very difficult. So I thought maybe I can use it. So I had to change it to basically a search optimization problem and apply this particular algorithm to find the solutions to them. The accuracy of it was pretty high and the publications, I think it was AAAI that I published the paper in. So, back in 2017, yeah. 17. Yeah. So, that was it. 
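To make the "search optimization" framing above concrete, here is a minimal Python sketch of an agent that, like a root following a water gradient, keeps growing into whichever neighbouring cell scores best on an objective function. It is only an illustration under stated assumptions and is not the algorithm from Deblina's paper: the grid, the `moisture` objective, the `root_search` function, and the source location are all hypothetical.

```python
def moisture(x, y, source=(7, 3)):
    """Hypothetical objective: larger (less negative) the closer a cell is to the water source."""
    sx, sy = source
    return -((x - sx) ** 2 + (y - sy) ** 2)


def root_search(start, objective, size=10, max_steps=100):
    """Greedy search: repeatedly grow into the neighbouring cell with the best objective value."""
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        # All 8-connected neighbours that stay inside the grid.
        neighbours = [
            (x + dx, y + dy)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and 0 <= x + dx < size and 0 <= y + dy < size
        ]
        best = max(neighbours, key=lambda cell: objective(*cell))
        if objective(*best) <= objective(x, y):
            break  # no neighbour improves: a (local) optimum has been reached
        x, y = best
        path.append(best)
    return path


path = root_search(start=(0, 0), objective=moisture)
print(path[-1], len(path))
```

On this smooth toy objective the greedy agent walks straight to the source. Objectives built from real images are far more rugged, so a single greedy agent tends to get stuck in local optima, which is one reason population-based and evolutionary search methods are attractive for problems like this.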
Jon: 11:20
That sounds amazing. And you’ve been doing computer vision research since, right? 
Deblina: 11:24
Yeah. Exactly. 
Jon: 11:25
So, if I remember correctly, it was a relatively brief stint, a stint for a couple of years at Samsung in South Korea which is where you did your Masters. And then you started a PhD, which you’re doing now. So maybe tell us a bit about, I mean, without going into intellectual property at Samsung or something, but let us know a little bit about the kind of computer vision research that you were doing at Samsung. Probably not detecting white blood cells, though maybe. And then, yeah, and then tell us about what you’re doing right now in your PhD. 
Deblina: 12:01
Okay. It’s pretty interesting, because I always had this diverse focus of how to apply a given set of solutions maybe differently to existing problems. Either that or to attack as many fields, if not a single field, and deep dive into it. So my area of research, or focus of research, has always been across multiple fields. So, when I finally graduated in 2017 from Kyungpook National University in South Korea, I got this job at Samsung Intelligent Media Research Center in another city in South Korea. 
Deblina: 12:44
And then, because South Korea was at that time hit by consecutive earthquakes, so there was this project by the government over there where they told their researchers who were being hired to formulate a model in architecture where you can use computation in order to solve the early earthquake detection problem. So, the moment that the secondary wave, or the S-wave, hits a particular building, the building starts shaking, and then a person understands the severity of the earthquake. But what we do not understand is when the P-wave hits, because it’s relatively… You cannot actually decipher when it hits a building or anything. 
Jon: 13:29
What’s the P-wave? 
Deblina: 13:30
The primary wave. P-wave is the primary wave. 
Jon: 13:37
The most destructive part of the earthquake, I guess? 
Deblina: 13:39
The most destructive part is actually the secondary wave, which is much larger in amplitude and the way how it travels within the earth is also different. The primary wave’s amplitude is much lesser, because it’s the first when it just starts. So, my work was, and actually MIT CSAIL was also doing the same work and my inspiration for the work, because I had to use computer vision for this, was the publications by MIT CSAIL, I think it was Abe Davis, his work on visual magnification. So you basically take the amplitude and you magnify the phase of how the waves are traveling into your algorithm. 
Deblina: 14:23
And this is all captured by simple, conventional cameras, even your mobile phone can do now. So, you just capture the fluctuations in the normal vibrations of the building. And you record that and you use that particular thing and monitor the amplitude and the phase, but you have to magnify phase because those vibrations are not commonly visible to the human eye, but it might be, of course, visible to the computer vision algorithm, which you have trained it on. 
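The idea of magnifying phase rather than amplitude can be sketched on a single row of pixels. The toy Python example below is not the Samsung system or the MIT CSAIL method (published phase-based approaches operate on local phase in multi-scale decompositions of video); it uses one global Fourier transform, and the signal shape, the 0.2-pixel shift, and the function names are all assumptions chosen for illustration. It relies on the Fourier shift theorem: a small shift of a pattern shows up as a change in the phase of each frequency component, so multiplying that phase change by a factor `alpha` exaggerates the motion.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, keeping only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def bump(center, n=64, width=3.0):
    """One row of a synthetic frame: a smooth bump at a (possibly sub-pixel) position."""
    return [math.exp(-((t - center) ** 2) / (2 * width ** 2)) for t in range(n)]


def magnify_motion(reference, frame, alpha):
    """Amplify the per-frequency phase change between two frames by a factor alpha."""
    magnified_spectrum = []
    for r, f in zip(dft(reference), dft(frame)):
        delta = cmath.phase(f * r.conjugate())  # phase change caused by the motion
        # Adding (alpha - 1) * delta makes the total phase change alpha * delta.
        magnified_spectrum.append(f * cmath.exp(1j * (alpha - 1) * delta))
    return idft_real(magnified_spectrum)


def peak(row):
    """Index of the brightest pixel in a row."""
    return max(range(len(row)), key=row.__getitem__)


reference = bump(32.0)
frame = bump(32.2)  # a 0.2-pixel shift, too small to move the brightest pixel
magnified = magnify_motion(reference, frame, alpha=10)
print(peak(reference), peak(frame), peak(magnified))
```

In this toy setup the brightest pixel stays at index 32 in both the reference and the shifted frame, while the magnified row puts it at index 34, so the 0.2-pixel motion has become a visible 2-pixel motion. The global-phase trick only works here because the whole row moves together and the shift is well under one pixel.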
Deblina: 14:57
So, that was the overall thing that we were working on. So that was- 
Jon: 15:03
Oh. Oh. This whole time I forgot that we were even talking about computer vision, I was kind of imagining like, I was thinking of like time series analysis, maybe detecting vibrations. But you’re using computer vision to detect changes in the movement of a building and that can be used to predict when an earthquake is going to hit. Wow. 
Deblina: 15:27
Yeah. 
Jon: 15:28
That is super cool. 
Deblina: 15:30
Yeah, it was very interesting for me to work on too. So, once I finished that project, there was a personal stuff for which I had to quit and come back to India where I am originally from. So I went back to India and then within a month, I received this offer letter from EPFL at Switzerland where I got directly hired into a lab. So, and then I- 
Jon: 15:56
Nice. EPFL, it’s a top Swiss university. I know it. It’s in Lausanne, right? 
Deblina: 16:01
Yeah. It’s in Lausanne. So, under the advice of my… So I’m jointly in two labs, one is the Image and Visual Representation Lab and the other one is the Computer Vision Lab at EPFL. So, I’m working on a particular landscape computer vision project right now. It’s also the crux of my doctoral dissertation. So I was very much interested because it’s European Arts. What better place to study art than in Europe? 
Deblina: 16:35
So I got this project where I had to bridge the gap between art, [inaudible 00:16:42] humanities, and computer vision. So I was- 
Jon: 16:46
So when you say landscapes, you’re talking about fine art? You’re talking about like painted landscapes? 
Deblina: 16:51
Yes. I’m talking about that. And not only landscapes, even historical periodicals from the yesteryear, starting from the end of the 19th century till to the beginning of this century. Comics also. Magazines. The artist had, from that time, there are so many comics, European comics. They have different artistic styles, so I’m working to- 
Jon: 17:13
Like Tintin? 
Deblina: 17:15
Oh yeah. For sure. 
Jon: 17:16
Really? 
Deblina: 17:16
Oh, yeah. They call Tintin over here, it’s a French-speaking cartoon. 
Jon: 17:22
Right, right, of course. 
Deblina: 17:24
Tintin. So to reconfigure all those things into digital platform in order to maybe form an augmented reality, animated, and to basically do scene understanding out of all these things. So that’s the- 
Jon: 17:38
What? 
Deblina: 17:41
… proper focus of the project we are working on. 
Jon: 17:42
Wait. Wait. Wait. The idea of the project is to use computer vision to take landscapes from any source, it could be fine arts, it could be comics, it could be from magazines, and turn that into an immersive experience? 
Deblina: 17:58
Yeah. The end goal would be to turn it into an immersive experience, but right now, because it’s still at its very inception, the project, and we’re taking baby steps so we are just working with saliency, depth estimation, and style transfer to get it going. So ultimately, using all these different methods, we’ll try to understand the scene and retarget the images from these particular comics or periodicals and magazines of the yesteryear to the digital platforms. It might be a tablet, a television, or even your mobile phone. And then- 
Jon: 18:32
And maybe VR goggles. 
Deblina: 18:34
And then, and then we will [crosstalk 00:18:37] animation. Yeah. So VR goggles for sure, later. 
Jon: 18:46
Wow. 
Deblina: 18:46
But that depends on [crosstalk 00:18:47]. So… 
Jon: 18:49
Wow. So, in a few years maybe, or maybe a little longer, we’ll see what happens, but theoretically we could have kids who pick up a Tintin, Tintin, comic book and then they could also put on VR goggles and go into the scene that the hero of the comic book is in? 
Deblina: 19:09
Oh, for sure. It’s a very ambitious project, but I guess it would be something really interesting, because I think part of entertainment, a new avenue of entertainment which was not so exposed. We are trying to explore that, but let’s see how it takes [inaudible 00:19:27]. 
Jon: 19:27
That’s amazing. I mean, so I started off this podcast by saying, there’s so many interesting things about you and I can’t wait to get to them, but we haven’t even gotten to any of those things yet. Everything that you’ve said is completely new to me. Oh my goodness. All right. So that’s one of the labs you work in on your PhD. What’s the second lab you work in? 
Deblina: 19:47
So Image and Visual Representation Lab and Computer Vision Lab. These two labs have hired me jointly to work in this project. 
Jon: 19:56
Oh, I see. I see. 
Deblina: 19:56
Yeah, yeah. So I’m just a joint member of these two labs. Yeah. 
Jon: 20:01
I understand. Okay. Well, that was an amazing intro to what you do professionally. But I also know you have a very interesting hobby that I would love everyone to know about, and I’m hopeful that we can somehow find a link so that people can enjoy this from home. But I discovered while we were chatting before recording that you are a drummer in progressive rock bands. So, tell us about that and tell us what kinds of artists you love to cover. They’re some of my favorite bands, it’s my favorite genre of music and so I am so excited that this is something that you do. 
Deblina: 20:38
So, prog-rock, I got into it when I was in high school. No, rather I would say middle school, because there was this drummer who had to change schools and so there was no replacement. So that was a Thursday, so the music teacher was like, “Who is ready to drum?” And I think I was the only girl in that particular team so I was like, “Why not? Let’s go to try.” So it didn’t work out for the first two hours as I was trying to pick up the beat. But I just took a break and I was really trying hard and somehow, I don’t know how, I could sing the beat and I got selected during the audition. 
Deblina: 21:15
But [inaudible 00:21:15] to what my music teacher gave me the extra time which was required to practice otherwise I wouldn’t have been a drummer. And I’m nowhere near a proper drummer, a professional drummer, I just do it out of love for drumming. So- 
Jon: 21:28
I like that you say that, but prog-rock has some of the most complex drum rhythms possible. So when you say something like that, I have this feeling that you’re an amazing drummer, but it’s just that you’re like, “Well, I’m not actually a professional prog-rock drummer.” 
Deblina: 21:45
I mean, prog-rock has such difficult beats and the synchronization and the way it changes, it’s really difficult to actually follow it through. So I try to improv sometimes, but it really isn’t improv, it’s like literally escaping the tough parts of it and replacing it with something that you can do. 
Jon: 22:05
Right. 
Deblina: 22:05
So, yeah. So from high school actually, my friends of the band that I was in, we used to play a lot of classics and then it was more of pop classics, alternatives, and like Beatles, ABBA, and then finally we were like, wait, prog-rock is so fun lyrics wise also, and it was really an escape route to me. So I delved into it throughout my undergrad school also. I really like listening to it. My major or favorite bands, I always liked Dream Theater and I really like the way how it progresses. Systematic Chaos is one of my favorite album. And there are so many other albums that it would just take a lot of time. And that’s another podcast I think I could do. 
Jon: 22:57
Sure. Yeah. We don’t need to go in there too long, but is Dream Theater, that’s the one with Steve Vai is the guitarist? Is that right? 
Deblina: 23:05
Ah, Petrucci. 
Jon: 23:06
Petrucci? Oh, man. Oh, just embarrassed myself. 
Deblina: 23:10
No, no, no. And Pink Floyd too, so we cover a lot of those things, beautiful, beautiful songs because all based on the guitarist that we have, we have a fantastic guitarist. So yeah, and sometimes we go to Genesis. Love that. 
Jon: 23:27
Love Genesis. 
Deblina: 23:29
Riverside is another, there’s a friend who got me introduced to this band, Riverside, that’s equally fun. I love their album and yeah, there’s so many. I really can go on and on and on about this particular thing. 
Jon: 23:44
Yeah. Speaking of complexity and originality in prog-rock drumming, Phil Collins was a drummer in Genesis for a long time, right? Before going out on his own. And he has really incredible drumming, originality. And in my head, I’m flooding, I’m imagining it in my head, but I know that if I try to make it come out of my mouth, it’s going to just be terrible. And everyone’s going to switch off the podcast. 
Deblina: 24:14
Oh no. 
Jon: 24:16
But I have in my head, the really famous Phil Collins song, ah, I’m not going to start singing it. But amazing. So, I think what you do is absolutely so fascinating to be involved in so many different kinds of mathematical modeling, today in computer vision. So, what’s a day like in your role doing your computer vision PhD in Lausanne? 
Deblina: 24:48
So, it’s heavily focused on research. So I would say 60% is research and 40% experimentation and implementation of the ideas that I come up with. So the day usually starts with going to the lab and now because of COVID, it’s work from home in my office. Yeah, so yeah, so basically you get up, you start with a particular set of papers that you might have, like state of the art that you read through and always try to keep up yourself up to the latest trends in that particular field that you’re working in because that’s very important. 
Deblina: 25:26
Things which were prevalent two years back or even last year have already become obsolete and have been beaten by the latest state of the art. So that’s important. And then sketching out a new method or your proposed method that you want to actually implement. So that goes into the research time. So it takes 60% of my work week. 
Deblina: 25:49
And then the remaining 40% is coding, which I do use in Python, PyTorch, and sometimes MATLAB is required. Yeah. When you have some MAT files, some H files, it depends how you’re saving, so file format. So, you just need to port it from one language to the other. And of course, mathematics is very important. I start with… I cannot stress how important it is. Like linear algebra, so at least when you’re working with machine learnings, linear algebra, calculus, you have to know statistics well, a bit of probability, theory of probability is important. And what else do I work with? Visualization. You should know a good way to visualize your results, because you have to communicate it with your team at the end of the day. 
Deblina: 26:42
So, if you do not know how to… You might have fantastic set of results, but if you do not know how to communicate with the rest of your team, then I don’t think… It gets lost in the translation I feel. So, that is how I go through. And also, inspecting my failure. Most often it happens like, I come up with an idea and I just lose track, why it was supposed to work. Theoretically I might have proved it, but then I also try to disprove it, but when I work on the system, it just doesn’t make sense anymore based on the domain that we’re working on. 
Deblina: 27:22
So, I just try to inspect those failure cases because when that happens, you know of one more way of how not to tackle the problem. So, I spent a lot of time into those inspections as well. Yeah. 
Jon: 27:38
I am so happy that you… So, this is going to be a bit of a shameless plug, but you’re talking about the importance of linear algebra, calculus, probability, statistics, and then also data visualization. And so, and I know that SuperDataScience does have great data visualization courses that you can get through www.superdatascience.com or Udemy, but there’s other topics, linear algebra, calculus, probably even statistics, I’ve been working through 2020 I was developing my Machine Learning Foundations content, which covers exactly those topics because I also appreciate how critical those underlying subjects are to being able to do data science at a really high level. And so, did you know about this? Or it’s a complete coincidence that you were bringing this up, I imagine? 
Deblina: 28:39
I didn’t know what [crosstalk 00:28:39] are. 
Jon: 28:39
Okay, great. So, I have this course called the Machine Learning Foundations course and so there’s like, in GitHub, Machine Learning Foundations, go get my Machine Learning Foundations course so all the code, I’ve already created it for all of these subjects. 
Deblina: 28:53
That’s fantastic. 
Jon: 28:54
And we have live in Udemy today. We have almost all of the linear algebra content live in Udemy. In a course where SuperDataScience is my partner on getting this stuff published in Udemy and so being able to leverage the expertise that SuperDataScience has in creating the most popular machine learning courses of all time in Udemy. 
Jon: 29:18
But anyway, so it was just basically a shameless plug, but we’re speaking the same language here. 
Deblina: 29:25
I think it’s going to benefit all our listeners, so for sure. Yeah. 
Jon: 29:30
Nice. All right. So, yeah, so you end up working in Python, you end up working in MATLAB, PyTorch library you mentioned. Are there any other key libraries that you work with a lot in Python? So, maybe for data visualization, seaborn, maybe something like that? 
Deblina: 29:51
For the project that I’m right now working with, it’s basically Matplotlib and a lot of NumPy, scikit-learn. So that’s basically what I work with. 
Jon: 29:59
Classics. 
Deblina: 30:00
Yeah, classics. Sometimes Visdom to visualize what’s happening [crosstalk 00:30:05]. Yeah. Yeah. Yeah. 
Jon: 30:05
I haven’t heard of that. Wisdom, just like W-I-S-D-O-M? 
Deblina: 30:08
No. It’s I think V-I-S-D-O-M, if I’m not mistaken. 
Jon: 30:14
Ah. 
Deblina: 30:14
Yeah. 
Jon: 30:15
Got it. 
Deblina: 30:15
But I might be wrong. I rarely check the spelling when I’m coding because the [inaudible 00:30:21] takes care of that. 
Jon: 30:23
Right, right, right. Well, we’ll find it and we’ll make sure that it’s in the show notes so that listeners can click through and find that. Awesome. And then, so we know that a big chunk of your day is spent reading machine learning papers, staying up to date on the latest state of the art. Have you come across, so we’re recording right in the beginning of January 2021, and this really exciting… I’ve been completely blown away by this model that was released this week which is called DALL-E by OpenAI. Have you come across this? 
Deblina: 30:55
I saw that. It was great, but I did not really read through the paper as of now, because I couldn’t find time yet. But I saw that in one of the news articles that I follow, that was great. Yeah. 
Jon: 31:09
Yeah. So it relies on GPT-3, which isn’t available to the public, so this model isn’t available to the public either, but this model takes text, free-form text that you write, and converts it into an image and it is mind-blowing. If you go to the DALL-E blog post, so it’s pronounced like Salvador Dali, but it’s spelt like the Pixar movie, the Disney Pixar movie, WALL-E about the AI robot. So it’s D-A-L-L-hyphen-E in all caps. And so viewers, if they Google OpenAI DALL-E, it’ll take you right to the blog post. 
Jon: 31:50
And in the blog post, you can, I don’t know if you’ve played around with this Deblina, you can change inputs into the model. So you can’t type anything you want, but you give a- 
Deblina: 31:59
I see. Yeah. 
Jon: 32:00
… broad range of possible inputs like, I for example, I created illustrations of an avocado with a mustache skating on ice. It automatically creates for you 25 of them and you see 25 completely different avocados with mustaches skating on ice. 
Deblina: 32:22
That’s so cool. 
Jon: 32:24
A bok choy with headphones sipping a latte. A cucumber in a leather jacket looking in the mirror. 
Deblina: 32:33
Really cool. 
Jon: 32:33
And one of my favorites was a sleepy pepperoni pizza slice. 
Deblina: 32:39
Oh. 
Jon: 32:41
Which it did an amazing job. 
Deblina: 32:43
That’s classic. 
Jon: 32:46
It creates a pizza slice, but the pizza slice looks sleepy, so… 
Deblina: 32:50
Crazy. 
Jon: 32:51
Yeah. It’s amazing. It has a face on it. I don’t know. Yeah. So it’s mind blowing to me how quickly things are changing and went off on a bit of a tangent, so I could talk about DALL-E, but… 
Deblina: 33:04
That’s really interesting. After this podcast I’m going to check that out. 
Jon: 33:09
Awesome. So, I think we’ve already touched on this by talking about Python, MATLAB, linear algebra, probability, calculus, statistics, data visualization. Are there any other key skills that you think our listeners should have to be a great data scientist, either as an academic like yourself, or maybe in industry? 
Deblina: 33:35
I would say, so it depends really. If you’re talking about data science or if you’re talking about a decision scientist. Because if you’re a decision scientist and if you’re at, say, the mid-management level of a project in an industry, then the domain knowledge is very much important. But if you’re just a data scientist at the junior level, the domain knowledge is not that important. Rather the implementation of it, the mathematical side of it is more important. 
Deblina: 34:04
For the data scientist, I would like to say, all the above things that I said before, which was linear algebra, calculus, stats, probability, particular visualization tools, communication, how to present maybe a PowerPoint presentation or a keynote, whatever you like it. And of course, coding. A good grip on a coding language and a deep learning framework. It’s kind of important. 
Deblina: 34:30
Be it for any job across industry, or even for academia. But when it comes to domain knowledge, I wouldn’t say that the stress is so much on the data scientist, because the decision scientist knows a bit more about the domain and tries to educate the data scientist about the type of data that they have to work on. 
Deblina: 34:56
So, I think an intermediate level of domain knowledge should be important for decision scientists, and a beginner level of domain knowledge is okay for a data scientist. That’s the only difference. Also I would say, knowing Git, like the bash commands, it really helps. And it has so many shortcuts, even during programming, implementation really helps. 
Deblina: 35:19
And it depends from industry to industry, because I remember the first job that I had, they wanted me to know C with Python and sometimes Cython. So there are a lot of different, it depends on which company you’re working in and on what project you’re working on. So, I would say, it really is a broad spectrum question, but ultimately, you should try to target these areas. Yeah. 
Jon: 35:51
That was such a perfect answer. You articulated that beautifully. 
Deblina: 35:55
Thank you. 
Jon: 35:56
The only thing that I want to circle back on and make sure I’m getting right is, so what were these key differences between a decision scientist and a data scientist? So the decision scientist was a bit more senior than a data scientist? Or you would consider a more entry-level role? 
Deblina: 36:14
Normally I would say yes, at least in the industry that I was in and even in the projects that I have worked on. I have seen the decision scientists to be at a senior level who know the project much more, depending upon the domain knowledge that they have, and they know the data much more than the researcher or the data scientist who’s going to work. So they are the people who explain the data and even the project, that break down basically the complex end goal of the project, the end problem, into sub-problems, easier problems, so that the data scientist can tackle it. And whilst doing that, they also explain the data that they will be working with. So I think that’s the difference. 
Jon: 36:54
Amazing. That’s so helpful, Deblina. I can only imagine that that’s really helpful guidance for our listeners. Another question related to this is, what do you think is coming in the coming years? So, what should a data scientist be doing to prepare for the future? Is it the same kinds of things that you mentioned? So the Machine Learning Foundational subjects that we’ve talked about, the deep learning library frameworks, or is there anything in particular that if somebody’s thinking that they’re preparing to either be a more advanced data scientist, or they just like to be looking to the future, is there anything that you think they should be focusing on? 
Deblina: 37:34
For entry-level jobs, I would say whatever I’ve mentioned should cover it. If you are just seeking a junior position, but if you are trying to go into senior-level positions for data science, or even as say, a senior research scientist, or even a research scientist [inaudible 00:37:52], then I would say, you need to delve a bit more into the current state of the art and know exactly. 
Deblina: 38:00
If you are applying for a position in computer vision, you should know a lot about computer vision because the round two of the interview will be strictly based on computer vision, and they are going to give you a research-based problem that the company’s working on and they see what kind of solutions do you have to offer? It might not be the perfect solution, but they see that you can try and how well do you think about breaking down the problem into nice sub-problems which are easy to handle and how are you able to attack that problem? And what solutions do you have to offer? So that is what a research scientist position entails in the second round of the interview that I have been subjected to in the past. 
Deblina: 38:42
And I would also say, if there’s something specific that I see this trend of everyone going towards, it would be unsupervised learning, because most of these companies, yeah, self-supervised for that matter. They still do not have a very perfect demarcation between the two terms, self-supervised and unsupervised. But ultimately, it’s like learning from a vast amount of real-world data that we have, be it images, be it just non-annotated data, because the datasets which are annotated, they are very few and annotating is very expensive and very time consuming. 
Deblina: 39:24
So, the big companies might have the resources, the finances to hire people or even outsource it and get those annotations or even from millions of customers just signing onto their platforms, but generally startups or mid-level companies, or even academia, the schools, the institutions, they do not have that kind of funding as big companies do. So we try to find out about ways of finding the inherent pattern in the data that we are supposed to work with, rather than train the model using some ground truth labels. So unsupervised learning is the future. I strongly believe in that. Yeah. 
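To make the idea of finding the inherent pattern in unlabeled data a little more concrete, here is a minimal sketch of k-means clustering in plain Python. It is an illustration only, not the method Deblina uses in her research: the toy points, the function names `squared_distance` and `kmeans`, and the naive choice of the first `k` points as starting centroids are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Group unlabeled points into k clusters without using any labels."""
    # Assumption for this sketch: start from the first k points.
    # Practical implementations use smarter initialization.
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Six unannotated 2-D points that happen to form two groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

No ground truth labels are supplied at any point; the grouping comes only from the distances between the points. In practice a library implementation such as the one in scikit-learn, which Deblina mentions using, would normally be preferred over a hand-written loop.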
Jon: 40:05
I love it. That’s such a great answer and I agree with you 100%. All right. So obviously you’ve done a huge amount already in, I think, it’s fair to say, despite all of your accomplishments, to say you’re still in the early part of your career and you have exciting hobbies on the side. Do you have any particular productivity or prioritization or time-management secrets that you can share? 
Deblina: 40:31
Oh yeah, for sure. So all my calendars, my work calendars are synced across all my devices. It’s very important because the night before the next day, I always try to see what I’m supposed to accomplish by the end of the next day, and accordingly, I make a list so that always helps me to prioritize amongst all the different things that I do. 
Deblina: 40:54
I did not really have this habit, but the moment I started doing research and I started juggling multiple things, I had to maintain lists. So I just pull out my phone, or even a piece of paper or my whiteboard, I just list it down, whatever I have to do the next day, the goals I have to achieve. And if there is something really major to be taken care of in the coming week, I do that on the Sunday so that I know that… It’s like a reinforcement signal to my brain that, okay, this has to be done. And this much accuracy should be there for this particular task. So that’s how I do it. 
Deblina: 41:28
Also, I would say for productivity, there’s something I cannot stress enough on. So, for researchers or even for people who are in data science and [inaudible 00:41:40] who are studying, right? Or even trying to venture into this particular field, I would say, sign up for some study sessions or reading sessions online, because that’s how I increase my productivity. If I have some ideas that might not work, looks to me fine, perfectly fine, might work, but I always get second, third, fourth opinions from a diverse set of individuals, just to have that extra, I would say, supervision on my particular idea, but why it shouldn’t work. 
Deblina: 42:12
And also like the latest trends, we have reading groups to keep up to date with the state of the art, it always helps, one person presents and everyone gets to learn from that particular presentation. And then you have the brainstorming session after the presentation. This is very important, I feel, and it increases my productivity a lot, kind of smart work, I would say. 
Jon: 42:34
That sounds brilliant. Those are all such great tips. I do some of those things and then I’ve taken some notes on some of the others that I could fill in. So thank you so much, Deblina. So, I know that you’re reading a really exciting book right now and I want to hear more about it. Can you tell me about it, please? 
Deblina: 42:53
So, I started reading Other Minds by Peter Godfrey-Smith and it basically delves into the consciousness of the octopus, the sea animals, and how the consciousness has basically developed. What is consciousness? It hits all those hard-hitting questions. And the flow of the… The way how the author has articulated this entire thing, it doesn’t set into boredom, it keeps the grip, it keeps the interest. And I don’t want to give away a lot, because it will be like spoilers. It’s really- 
Jon: 43:30
Who’s the murderer? You don’t want to tell us, [inaudible 00:43:33]. Yeah. 
Deblina: 43:34
No, never. 
Jon: 43:36
You don’t want to spoil anything. 
Deblina: 43:39
Yeah. So, I do not want to spoil because it’s really, it’s like an experience, if you just take that book, it’s an amazing book. I couldn’t stress enough how amazing the book is. Please do read it. I’m really liking reading it and perhaps with your background Jon, you would love it too. Yeah. 
Jon: 43:56
Yeah. It sounds really amazing. I actually got a full PhD scholarship to study consciousness at University College London, so I had to write the proposal on everything and I was really deep into it. That ended up being not the route that I went down for my PhD, but yeah, I absolutely love consciousness and yeah, this book sounds incredible. So we will definitely make a link to that in the show notes and yeah, I am going to have to make time to read it for sure. 
Deblina: 44:23
Yeah. 
Jon: 44:23
So thank you so much, Deblina. You’ve had such amazing applications of data science to tell us about. You have had such great practical tips for data scientists, whether they’re research scientists or not. And a great book recommendation as well. So, how can our listeners contact you or follow you or find you to get some more of your insights in the future? 
Deblina: 44:50
I’m available both on LinkedIn and on Twitter. So my Twitter handle is @deblinaforAI, so we can just brainstorm ideas if someone wants to follow me, I would be happy to engage in a meaningful conversation. And on LinkedIn too, I think you would post a link, so that would be great if we could chat. Yeah. 
Jon: 45:12
Definitely. Yeah. We’ll definitely have your LinkedIn and Twitter handles in the show notes. And yeah, amazing. I absolutely loved this episode with you, Deblina. Thank you so much for making the time. And- 
Deblina: 45:26
It was an honor. Thank you. 
Jon: 45:26
Yeah. Looking forward to catching up with you again soon. 
Deblina: 45:31
Sure. Thank you so much. 
Jon: 45:37
As you could tell, at many points in that episode, I’m so wildly impressed by Deblina’s breadth and depth of machine learning know-how, productivity, and clear communication of complex concepts. I had so much fun recording this episode, was inspired and learned a lot. I hope you did too. We covered so many amazing, cutting-edge machine vision applications, the lifestyle and toolkit of a world-class AI researcher, and the importance in machine learning of a strong foundational understanding of linear algebra, calculus, probability theory, statistics, and data visualization, a cause that’s very close to my heart. 
Jon: 46:15
If you’re interested in being involved with some of Deblina’s work on yet another diverse project, this one on outer space, Deblina’s the head of the Sponsoring Vertical for the Spacecraft team in Lausanne, Switzerland that aims to send two nano-satellites to space by the end of 2023. More than 60 engineers, researchers, and scientists across Switzerland are involved in this project. The data collected from those nano-satellites will be applied to study the upper atmosphere of the earth to facilitate climate research and to study space debris, as well as exoplanets beyond our solar system. 
Jon: 46:56
Her team has received public funding of two million Euros already, with the project’s early phases, but for the next phase of platform development, deployment, and testing, she’s seeking private funding of up to two million Euros. If you’re in a position to help out, this would not only provide you with access to innovative technologies, but it would support an extraordinary environmental sustainability project as well. 
Jon: 47:21
If you want to join forces with her, you can reach out to her directly via Twitter or LinkedIn. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show and URLs to Deblina Bhattacharjee’s LinkedIn and Twitter profiles, as well as my own LinkedIn and Twitter profiles at www.superdatascience.com/439. That’s www.superdatascience.com/439. 
Jon: 47:51
If you enjoyed this episode, kindly leave a review on your favorite podcasting app or on YouTube where you can enjoy a high fidelity video version of today’s program. It sure is nice to put smiling faces to all the laughs we had today. I also encourage you to tag me in a post on LinkedIn or Twitter to let me know your thoughts on this episode, I’d love to respond to your thoughts in public. 
Jon: 48:13
All right. It’s been so, so great. Thank you for listening. Looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 