Kirill: This is episode number 9, with neuroscience PhD turned data analyst Muhsin Karim.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week, we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
(background music plays)
Welcome to this episode of the SuperDataScience podcast. Today I’ve got a special guest, a student of mine, Muhsin Karim. Muhsin actually reached out to me on Udemy just to say thank you for the courses, and I asked him for his LinkedIn. When I looked at it, I was fascinated. Muhsin has a PhD in neuroscience and quite a lot of experience in academia. And then he went down the path we have witnessed so many times on this podcast, where people from academia become data analysts and data scientists.
So in this podcast you’ll find out what drove Muhsin to move away from academia and into data analytics, and which skills from his neuroscience background, and from the data processing he did there, he leverages in his current day-to-day role. We also had a really interesting conversation about where the world is going in terms of data, what machines are becoming capable of, and how the human brain might be modelled by machines in the coming years. We talked about Moore’s Law in quite some depth, so if you’re not familiar with it, this is a good episode to pick up on Moore’s Law, which currently describes how computers are developing, and on the exponential curve in technology that it dictates.
Also in this episode, we go into quite a lot of depth about how to get into the space of machine learning. Muhsin has a lot of experience with data from his PhD: he uses R programming pretty much every day and has expertise in several statistical methodologies, and his next venture in data science and analytics is machine learning. He’s actually taking one of our SuperDataScience courses on machine learning, and he shares tips and advice on how you can tackle this huge, broad, challenging field: how you can dip your toes in the water and get into it slowly. He’ll tell you how he has gone about it himself, slowly making his way into the field and expanding his skills. We also discuss why machine learning is such an important field to get into right now. This is the time to get into machine learning.
And we also talked about quite a lot of books in this podcast. In total, we mention 7 books: I brought up 3 and Muhsin recommended 4. So if you’re interested in filling up your library with data science and related books, this episode is definitely for you.
Can’t wait to get started, can’t wait for you to hear all the insights. And without any further ado, I bring to you Muhsin Karim.
(background music plays)
Welcome everybody to the SuperDataScience podcast. Today I have an exciting guest: Muhsin, one of my students on Udemy! Welcome to the show, Muhsin. Thank you for taking the time.
Muhsin: Thank you for having me.
Kirill: I had a look through your LinkedIn. Very impressive, and we’ll get to that in a second. But first of all, could you share with me and with the listeners, how did you find out about my courses on machine learning and data science?
Muhsin: It was actually through a colleague of mine. So I work at a company called Harvey Norman, and I was chatting to this colleague. We both share an interest in data science, and he happened to mention your course, “Machine Learning A-Z”. And I signed up, and in a weekend, I went a third of the way through, and I was very impressed with the course.
Kirill: Fantastic. So you’re enjoying it so far?
Muhsin: I am indeed, yeah, absolutely. I like that, as an R programmer, I can code in R, but I also get a direct comparison with the Python code, which I’m learning at the moment. So it’s good to have the code I’m familiar with and the code I’m learning side by side.
Kirill: Oh that’s really cool. And are you enjoying the Python one as well? Do you find it much different to the R code?
Muhsin: Not terribly different. I mean, the programming concepts are similar. There are a few things I need to get my head around, but otherwise I can see the similarities. So it’s good to have them both there.
Kirill: Ok, fantastic. And I remember that we had a quick chat on Udemy. Do you remember why you reached out? Why did we get in touch?
Muhsin: Oh, I just expressed my appreciation that you were a very good communicator in demystifying very complex —
Kirill: Oh that’s right, that’s right. Thank you.
Muhsin: I always appreciate when people can do that, when they can take something that’s quite difficult to convey and describe it in a palatable way. So I just said thank you.
Kirill: Oh, right, I remember that. Thank you so much, and it’s interesting how that led to you coming onto the show! Sometimes, when people are extraordinary, I sense that I need to take a few more minutes to find out where they’re coming from and what their background is. So I asked for your LinkedIn, and I wasn’t wrong! When I saw your LinkedIn, you have a PhD in neuroscience! As soon as I saw that, I thought, I have to bring Muhsin onto the show! So yeah, thank you for reaching out and thank you for coming onto the show. Are you ready to talk a bit about your background?
Muhsin: Yeah, absolutely!
Kirill: Alright. So PhD in neuroscience. I was just going to quickly recap on this. So you have a publication which is called “Behavioural and neural correlates of vibrotactile discrimination and uncertainty”. And you have some other publications as well. So to start us off, how did you get into neuroscience, and why did you choose this field?
Muhsin: Yeah, so I kind of stumbled into it, but I guess it begins with my degree. I actually have a degree in Molecular Biology and Genetics, so that’s the first science I picked up. That degree involved a lot of wet lab work: centrifuging test tubes, adding chemicals together and wearing white lab coats. It was very much like being in a CSI crime lab!
But despite that excitement, I really didn’t like the degree! I hated it all the way through. But during the degree I attended a talk about neuroscience and biochemistry, and I realised I was more fascinated by brains and behaviour than by biochemistry and genetics.
From there, I pivoted into a PhD research project examining neural cells from an animal model that had been fed different concentrations of a nutrient important for early brain development. My task was to lay neural cells in a petri dish and watch their synaptic growth. When you put two healthy neural cells together in a dish, they start communicating with each other and form connections, these synapses. Unfortunately, the cells kept dying. It was a terribly demoralising experiment, and I can’t emphasise enough how much I hated what I was doing! I was terribly frustrated, nothing was working, and I actually quit that PhD. It was the best decision I’ve made in my life, and I wish I had done it sooner.
But then I eventually found a neuroscience lab at the University of New South Wales that had received funding to take a more multidisciplinary approach to studying human cognition. There, my project was to examine how the human brain makes optimal decisions when faced with uncertainty. So, going into the particular tasks I did for my PhD...
Kirill: Actually, I read a little bit of your PhD overview, and I’d really like to learn more. Without going into too much technical detail, what were you examining, how did the experiments differ, and what did you conclude? I found it quite interesting, and I’d like you to explain what exactly the outcome was.
Muhsin: Sure. So, generally, we wanted to see how the human brain makes optimal decisions when faced with uncertainty. Our brains are amazing devices, statistical machines that, in the presence of uncertainty, tend to make optimal choices, and we wanted to explore that further. The way scientists do that is by presenting people with very simple tasks and then using neuroimaging techniques to pick apart the areas of the brain implicated in decision-making.
So the way I did that was with a mind-numbingly dull task. Participants would come into the laboratory and place their index finger on a vibrating probe. The task is called a vibrotactile discrimination task, and they had to discriminate between pairs of vibrations: you get a vibration, a pause, then a second vibration, and you have to answer the question, “Was the second vibration faster?” with a yes/no response. And they had to do that 400 times.
Kirill: Oh wow!
Muhsin: Yeah, yeah. The participants were paid. It was a very boring task, but scientists have to introduce carefully controlled conditions like this so we can study bigger things further up the neural pathways. So I took that task into an MRI scanner. The technique is called functional MRI, or fMRI, and it allows us to infer which brain regions are involved in the task under study. The way it works is that when you are engaged in a particular task, the regions of your brain involved in that task receive increased local blood flow, and fMRI can image where that local blood flow is. That blood flow is an indicator of neural activity. So we were able to find out which brain regions were involved in decision-making when people were uncertain. You can increase the uncertainty in our task by making the vibrations very similar to each other: if a pair of vibrations was separated by only a few hertz, participants really struggled to make the choice. And we were able to highlight the brain regions that were active under that uncertainty.
Kirill: Ok, interesting. So you found the brain regions that are responsible for uncertain decisions.
Muhsin: We found the ones that the literature commonly cites. They are typically prefrontal cortical regions, particularly one called the left dorsolateral prefrontal cortex, and it’s been years since I’ve actually said that out loud! It’s a region involved in executive functioning and working memory. But the neuroimaging, although it’s very exciting, actually wasn’t the most interesting part of the PhD. It was more about the behavioural effects I was able to discern from the task.
Kirill: So what were they?
Muhsin: It’s a little challenging to describe without drawing it, but I’ll do my best. The vibrations our poor subjects had to sit through tended, on average, to be about 34 Hz. What I was able to show was that participants weren’t really comparing two vibrations when they made a decision. They compared the second vibration with a memory of the first one, and our memories degrade over time, even over a short period. So in my experiment, let’s say your first vibration was 40 Hz. Your brain is trying to remember 40 Hz, but soon that memory starts to degrade and drifts down towards the average, so it feels more like 34 Hz. The brain is very good at representing the environment as averages, these kinds of stereotypical pictures. In the end, it didn’t really matter what the first vibration was; people were making comparisons to the average that the brain was representing.
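To make the drift idea concrete, here is a toy sketch in Python. It assumes, purely for illustration, that the remembered frequency decays exponentially towards the 34 Hz average; the decay rate, the 2-second delay and the function names are hypothetical choices, not values or code from Muhsin’s study.

```python
import math

AVERAGE_HZ = 34.0  # average stimulus frequency mentioned in the interview


def remembered_frequency(first_hz, delay_s, decay_rate=0.5):
    """Memory of the first vibration after delay_s seconds.

    Assumption (illustrative only): the memory trace decays exponentially
    from the true value towards the average frequency.
    """
    weight = math.exp(-decay_rate * delay_s)
    return AVERAGE_HZ + (first_hz - AVERAGE_HZ) * weight


def judged_faster(first_hz, second_hz, delay_s=2.0):
    """Answer to 'was the second vibration faster?' using the drifted memory."""
    return second_hz > remembered_frequency(first_hz, delay_s)


# The same pair of frequencies, presented in both orders.
for first, second in [(40.0, 38.0), (38.0, 40.0)]:
    truth = second > first
    answer = judged_faster(first, second)
    print(f"{first:.0f} Hz then {second:.0f} Hz: correct={truth}, judged={answer}")
```

In this toy model the 40 Hz then 38 Hz pair is misjudged, because the memory of 40 Hz has drifted to roughly 36 Hz, while the reversed order is answered correctly. That is the kind of order dependence Muhsin goes on to call the time order effect.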
Kirill: That’s very interesting. That’s a whole discovery on its own, right?
Muhsin: Yeah, well, this was already a known effect. It’s been known for a while, and it’s called the time order effect. But when researchers set up this vibrotactile discrimination task, they weren’t accounting for it. They would set up a model with their predictors of interest, say, changing the frequency or introducing noise, but they didn’t include the time order effect as a predictor.
And the time order effect was a significant influence. It’s something they should be accounting for in these tasks.
Kirill: Ok, yeah. Could you explain time order?
Muhsin: It’s called the time order effect because people bias their decisions based on the order in which the vibrations are presented. So people respond differently if they receive 40 Hz first and then, let’s say, 20 Hz, compared with 20 Hz first and then 40 Hz, because of that drift.
Kirill: Ok, got it.
Muhsin: So when you swap the order, you actually get a different pattern of decisions. Your decisions are biased.
Kirill: That’s really interesting. We might be getting a bit carried away here, but the reason I’m so fascinated by this is that I recently read two interesting books. One is “The Future of the Mind” by Michio Kaku, which explains all these things, like how MRI works and how the left brain and the right brain operate. The one I’m currently reading is “Emotional Intelligence” by Daniel Goleman, which talks about how IQ is not the only factor that decides whether a person will be successful in life; there’s also emotional intelligence.
But moving on to more of the data side. All of this data you were collecting through these experiments, with, I’m assuming, 400 trials per participant: how were you storing it, and what tools were you using to process it?
Muhsin: I was using a program called Matlab, which is very popular with engineers and is used a lot for numerical and statistical work. It’s quite similar to R, except that it’s a commercial program, and universities use it quite a lot. We used Matlab to collect the data and also to store it. Then, to process it, we used a package called SPM, which stands for Statistical Parametric Mapping and was created by external contributors. It takes the neuroimaging data and performs the statistical analysis that highlights the significant neural regions implicated in the task at hand.
Kirill: Ok. I’ve used Matlab before as well, also in research, when I was doing my physics degree. So what differences do you find between Matlab and R? What would you say are the advantages of each tool?
Muhsin: The reason I’m using R is, for one, that it’s free. I also found the transition quite... I wouldn’t say seamless, but quite intuitive, because they’re very similar languages, with vectorised operations, for example. So going from Matlab to R was, for the most part, pretty intuitive. Now that I’ve been using R for so long, I certainly have a preference, also because the R community is so huge and the variety of packages is impressive. If you need a package that will beautifully plot your data, that’s available. There are packages that will nicely wrangle and manipulate your data. It’s been years since I’ve used Matlab, and now I’m definitely in the R camp.
Kirill: Very, very good. I agree. R has so much support around the world; it’s definitely a great tool to have in your arsenal. Now, our listeners might be a bit confused, thinking you’re still doing research in the neuroscience space. But you moved on from that to data-driven roles in different companies. Can you tell us a bit more about why you decided to make that move, and where exactly you went?
Muhsin: I actually kind of moved on while I was doing my PhD. When I was writing up, I was actively looking for data analyst roles in industry across different sectors. The reason is that, if you wish to stay in academia, it helps to be very passionate about a research topic, and I didn’t really find my niche in science. So I knew the academic career path wasn’t for me, and I looked elsewhere, across government, non-profit and industry. Ultimately, for me, it’s always a case of: I’m interested in human behaviour, I’d like to be exposed to different data sets that capture that behaviour, and I look for organisations that have that data and need someone with my background.
Kirill: So that’s great. And then you moved on to more retail-focused roles, so now you’re a customer experience analyst at Harvey Norman, and if I’m not mistaken, Harvey Norman is a massive chain. Is it like warehouses, or what does Harvey Norman do again?
Muhsin: Harvey Norman is a large Australian retailer. So we’re a retailer of furniture, bedding, computers, communications, and electrical products.
Kirill: That’s right. I was confusing Harvey Norman with Bunnings. Bunnings is a different store.
Muhsin: They’re one of our competitors.
Kirill: Yeah, there we go. So at Harvey Norman, how do you apply the data skills you developed through your PhD and academic research, and how do they benefit you in your current role?
Muhsin: Being a large retailer, we have data. We have lots of data. What wasn’t happening before my role was that a lot of that data was being left unanalysed. It was used to build one-off reports, but my manager wanted to do something more with it. So, of the skills I acquired from the PhD, there are the obvious ones: the coding skills that let me wrangle and tidy data and then produce something useful. That’s something I use every day. But the more generic scientific skills I picked up are actually quite useful too. My scientific background made me a bit of a careful planner. When you’re setting up an experiment and an MRI session costs $600 an hour, you learn to be quite methodical and prepared. It’s almost like project management, really: for a given project at Harvey Norman, I’m very good at mapping out what’s required, what resources I need, what the best approach is, and whether I need to speak to anyone to get the job done and get past any hurdles. So a science degree teaches you a lot of skills that, I think, many people in science don’t realise have applications outside of academia.
Kirill: That’s very interesting, and it resonates very well with what Wilson Pok mentioned in episode 3. Wilson has a PhD in nanophysics, and he said the same thing: that academic background helps you identify the problem or challenge and set things up for solving it. So, like you say, those project management, or pre-project management, skills are very powerful. For our listeners who are in academia and contemplating a move into data science and industry or retail, would you say this is one of the top skills they should advertise: that they can identify problems and set up projects to minimise expenses, get things processed quickly, and so on? Is it one of the key skills that academically minded people develop?
Muhsin: Yeah, absolutely. I think that doing a PhD is really project management. Not in the traditional sense, but you really are given a problem to solve. Obviously, with a PhD you have a lot more time to focus on it, but it teaches a particular way of thinking, and coming from science, you should pitch yourself as someone who can find solutions in very ambiguous scenarios. In business, what I’ve found is that you’re often faced with a lot of ambiguity, not just in the project itself but also around what resources are available. Another advantage of going from academia to industry is being very familiar with open-source tools. When I started, I had the option to request various commercial tools for data analysis. But because the work was exploratory, I was keener to just try things out with R and other open-source platforms. We found that they completely suited our purposes, and now I’m actually introducing them to more people across the organisation. So you can bring a lot of what you learned in academia and be very surprised by what the business finds useful.
Kirill: Yeah, wonderful. And thank you for that encouragement; I’m sure a lot of our listeners will find it very helpful. So if you are in a situation like Muhsin was, where you’re in academia and not enjoying what you’re doing as much, or you’re not as passionate about it as you thought you’d be, don’t be afraid of moving into data science. We’ve had great examples: chemical engineers, nanophysics, and now Muhsin from neuroscience. People from all different corners of academia are moving into data science, and you will always find skills and ways of thinking you’ve developed that will greatly benefit you in a data science role. That’s not to say that people who aren’t from academia shouldn’t move into data science; it’s just encouragement for those who might feel a bit stuck in academia that there is a whole other world to explore.
So we’ve discussed how you moved from academia to being a data analyst at Harvey Norman. Before we started recording, you mentioned that, as you pick up new skills through these courses and your own self-education, your role is slowly transitioning from data analyst to data scientist, or at least that the company is happy for you to take that pathway. What does this transition entail in your view, and what new responsibilities, methodologies and ways of working come with it?
Muhsin: So I’m fortunate to be part of a team where we’re encouraged to research, explore and come up with new techniques for finding insights in data. Traditional data analysis involves reporting and dashboarding, where you showcase the information at hand. What I’m looking forward to with the data science skills I’m picking up is whether we can gain insights from the data that we hadn’t considered before and, quite frankly, whether I can make my job easier. For example, we collect very simple customer experience data, typically after a purchase, where we ask customers to rate their shopping experience, from very good down to poor. We also ask them for feedback, where customers can leave open-ended responses about how they viewed our services. One of the first tasks I’ll tackle with data science techniques is text clustering: I’d like to take all that text, of which there is a large volume, apply some machine learning to cluster it into categories, and really highlight to the organisation what our customers are saying en masse.
Kirill: Very interesting. And what tools do you think you’ll be using for text clustering?
Muhsin: To begin, I’m actually looking into Python. Even though I use R predominantly, I’m now picking up Python, and I’ve been told to look into something called topic modelling specifically, which I’m told handles comments of widely varying length well. Our customers might say very little, or they may say a lot, so my hope is that this modelling approach will be able to factor that in.
Kirill: Okay. So walk us through your thinking on the text clustering. How are you going to process these comments? Are you going to turn that unstructured data into structured data and then analyse it, and what insights do you hope to extract? Or do you have a different approach?
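For readers curious what clustering free-text comments can look like, here is a deliberately simple Python sketch. It is not topic modelling (methods such as LDA, usually run through a library like scikit-learn or gensim, are what Muhsin was pointed towards); it is a plain bag-of-words, cosine-similarity clustering using only the standard library. The stop-word list, the 0.3 similarity threshold, the function names and the example comments are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; real projects use a much larger one.
STOP_WORDS = {"the", "a", "and", "was", "to", "my", "very", "is", "it", "i"}


def tokenize(text):
    """Lowercase, keep alphabetic words only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_comments(comments, threshold=0.3):
    """Greedy single pass: join the most similar existing cluster if it
    clears the threshold, otherwise start a new cluster."""
    clusters = []  # list of (word-count centroid, [comment indices])
    for i, text in enumerate(comments):
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].update(vec)  # fold this comment's words into the centroid
            best[1].append(i)
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


comments = [
    "Delivery was late and the driver was rude",
    "Late delivery, waited all day",
    "Great staff, very helpful salesperson",
    "Helpful staff and friendly service",
]
print(cluster_comments(comments))
```

A greedy single pass like this is crude and depends on the order of the comments; its only purpose is to show the basic step of turning unstructured text into vectors and grouping similar ones, which is the unstructured-to-structured question Kirill raises next.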
Muhsin: Actually, I’m not entirely sure yet. This is all very new. In fact, it’s so new that my first real attempt will be this weekend.
Kirill: Oh, wow! Okay. That’s really exciting and best of luck with that. Sounds like an interesting undertaking.
Muhsin: Yeah, I’m looking forward to it. Ultimately, what we want to do is have these techniques to organise the data in a more intuitive way for our end users to digest, really.
Kirill: Yeah, it’s always good to put data in the right format so that end users can take action or make decisions on it. And what you mentioned about finding new data that hasn’t been used before resonates with the notion, brought up in one of our previous podcasts, of a data landscape: identifying or mapping out the data you have. I would assume an organisation such as Harvey Norman already has its transactional and customer data mapped out pretty well. But what you’re actually doing is finding new data the company didn’t even think about and adding it to the data pool, so that you can combine it with other data sources to inform decisions or come up with new insights. Is combining this data with the existing data sets in the company something you have in mind?
Muhsin: It’s something we certainly do: we have our customer experience data and, of course, transactional data, and we can combine the two. But given the breadth of data we have, that doesn’t mean you should combine everything. There are various reasons why you couldn’t or shouldn’t, such as privacy issues. Where it is possible, you can gain a lot of insight by talking to somebody in a different department, learning about the practices they follow, and coming up with different ways of shedding light on our customers.
Kirill: Yeah. Wonderful! I’d like to move on to a topic I’m interested to hear your opinion on. You studied neuroscience, and now you’re getting into machine learning. We know that the premise of some machine learning algorithms and branches is to model the human brain, or to take certain aspects of human logic and model them in a machine so that decisions can be made faster. What is your take on the similarities or differences between neuroscience and machine learning?
Muhsin: Big question! I’ll be honest with you, I’m not entirely sure.
Kirill: Through learning, through undertaking these studies in machine learning, do you see any resemblances with what you used to pick up in your PhD and about how the human brain is structured?
Muhsin: Yeah, so I guess one example, and it’s a modelling approach I haven’t had an opportunity to explore yet, is Bayesian inference. With Bayesian statistics, when you need to update a probability, you look back at past instances, your prior, to inform your current decision. We actually used that analogy quite a lot for my PhD findings. When I was talking before about the time order effect, what the brain is essentially doing when it is given a stimulus and needs to make a decision is comparing that stimulus with past instances, like the average frequency. We’ve always viewed that as being very Bayesian.
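The Bayesian analogy can be sketched with the textbook case of combining a Gaussian prior with a single Gaussian measurement. This is an illustrative assumption, not the model from Muhsin’s PhD: the prior stands for the remembered 34 Hz average, the measurement for a noisy memory of a 40 Hz vibration, and all the variances are made-up numbers.

```python
def posterior_mean_var(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian prior with one Gaussian observation.

    The posterior mean is a precision-weighted average of the prior mean
    and the observation, so noisier observations (larger obs_var) are
    pulled harder towards the prior.
    """
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs)
    return post_mean, post_var


# Prior: stimuli average about 34 Hz (variance chosen for illustration only).
# Observation: a memory of a 40 Hz vibration that gets noisier over time.
for memory_var in (4.0, 16.0, 48.0):
    mean, var = posterior_mean_var(34.0, 16.0, 40.0, memory_var)
    print(f"memory variance {memory_var:>4}: estimate {mean:.2f} Hz")
```

With these numbers the estimate moves from 38.80 Hz to 37.00 Hz to 35.50 Hz as the memory gets noisier, which mirrors the drift towards the average that Muhsin describes.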
So yeah, at least in an analogous way, a lot of machine learning approaches model what the human brain does. I heard that some of my PhD colleagues at the time were getting heavily involved in neural networks, though I’m not entirely sure how much those reflect the human brain. Anecdotally, I believe IBM was involved in an initiative called the Blue Brain Project, and I may be paraphrasing incorrectly here, whose plan was essentially to build a brain from scratch, but with a machine. It’s almost like building a brain cell by cell, but using code. Certainly, I think using machine learning techniques inspired by the brain is a very good way of seeing what evolution has constructed and trying to replicate it in a way we can reproduce.
Kirill: It’s very interesting you mention that, because last year I finished reading a book called “Bold” by Peter Diamandis. I’ve mentioned it a couple of times already, and in it he discusses Moore’s Law in detail. Gordon Moore, who went on to co-found Intel in 1968, made the observation back in 1965 that later became known as Moore’s Law. The way it’s usually summarised is that the processing capacity of the average computer you can buy in a store doubles roughly every 18 months, while its price keeps dropping. So the main concept is that processing power doubles every 18 months or so, and that trend has held remarkably well. Based on Moore’s Law, we already have computers that think as fast as a rat’s brain, or the brain of a mouse.
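The doubling arithmetic behind these extrapolations is simple enough to sketch in Python. This uses the 18-month doubling period quoted in the conversation (Moore’s own estimates were different: roughly a doubling every year in 1965, revised to every two years in 1975), and the hypothetical helper below only computes growth factors; it says nothing about whether brain-level computing will actually arrive on schedule.

```python
def growth_factor(years, doubling_period_years=1.5):
    """How many times capacity multiplies if it doubles every doubling period."""
    return 2 ** (years / doubling_period_years)


# Doubling every 18 months means 2 doublings in 3 years and 10 in 15 years.
for span in (3, 15, 30, 60):
    print(f"{span:>2} years -> x{growth_factor(span):,.0f}")
```

Sixty years at that rate, roughly the span from Moore’s 1965 observation to 2025, is 40 doublings, a factor of about 1.1 trillion. That compounding is why small changes in the assumed doubling period move these predictions by years.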
If you extrapolate Moore’s Law, which has held since the 1960s, which is crazy, we’ll have computers that think as fast as a human brain by 2025. That’s why 2025 is a big number in machine learning and why everybody is looking forward to it: according to Moore’s Law, that’s when we’ll get those computers. And by 2050, if Moore’s Law still holds, an average computer that you can buy for about $1,000 at Harvey Norman will think as quickly as the entire human race. That is insane. Right now, we don’t even have the infrastructure or the computers to reconstruct the human brain, but by 2025 we’ll be able to do it with an average computer that people just have at home. It’s very interesting to think about where the world is going and what implications that will have. What are your thoughts on the implications? What do you think will happen if all of a sudden we can reconstruct human brains cell by cell?
Muhsin: That’s fascinating! I’m a little worried now. I didn’t realise that we had advanced to that point where by 2025, we would have a machine that could think at that capacity because at the time when I was doing my PhD, I knew that the advantage that the human brain has over the machines is the capacity for parallel processing in that each cell can have many multiple connections to various cells and from cells located great distances away from each other. I mean, I guess it’s slightly concerning in that I’m quite convinced that—we already know that machines will be taking over a lot of our jobs. Automation is a reality, but that’s fine. I think it will actually open up new endeavours. But it’s kind of hard to think about what the future will look like because it wasn’t that long ago that there was no Google, there was no Facebook or Uber or Airbnb. I don’t know what’s around the corner. Even now I’m still impressed when you get an alert from Google asking you to review something, even though you’ve never really explicitly told anyone that you’d been to a restaurant and yet it’s sending you a notification saying, “You should review this restaurant.” It knows that you’ve visited there with friends before. It’s interesting to see how that large volume of data and our daily interactions, and how that will actually start to shift out our physical world and our presence in our day-to-day. And then combining it with machines that can think very quickly. Who knows what the future will hold?
Kirill: Yeah, it’s getting freaky. Just a couple of days ago, I had this weird experience. I don’t remember exactly what it was about; let’s say it was about a gym membership. I clearly remember I wasn’t Googling that topic or searching for it at all. I was just speaking to somebody about it, and then the next day I got a Google pop-up. I get the impression that maybe our phones are actually taking in everything we say, and Google or some other online company is analysing all of that textual information, and that’s how we get the ads. That’s happened to me once or twice just over the past month. Have you ever had an experience like that?
Muhsin: Yeah, on occasion. Basically everyone has experienced targeted advertising, but it’s getting to the point now where people almost expect that level of service. I’ll admit I’ve actually clicked on targeted ads because they’re spot on. So it’s quite a shift: before, it felt intrusive, but I think increasingly we’ll almost demand, and certainly expect, this level of, I’ll use the word, “service”. I’m quite paranoid. I think it’s a given that a lot of these devices are recording us or watching us. I think Mark Zuckerberg has his little piece of tape over his phone. You know, I think that says quite a lot. It’s the world we live in now, and I think a healthy dose of paranoia is not a bad thing. But yeah, I do find it interesting that we have shifted into a world where we’re becoming increasingly comfortable with it, which is fine. In some respects, maybe it will improve something like the shopping experience. And it could have other beneficial effects. Let’s say you’re accused of a crime, and this is just a silly example, but based on your digital footprint, you’re able to clearly prove that you were not involved. We leave so much data behind day to day that we’ll be able to defend ourselves. So it’s not entirely a negative thing, but who knows how data could be utilised.
Kirill: Totally. It can go both ways. And the thing is that you can’t really stop it, right? It’s gonna go where it’s gonna go. It’s already in motion. The machine is working, there’s no stopping it and I get surprised at these movies about Terminator and Skynet, and then people are like, “We should be worried about Skynet. Maybe that can happen in real life. We should not create Skynet.” But if you think about it, Skynet is already there. It’s like this big massive thing called the internet which you cannot switch off and it’s always gonna be working and collecting knowledge and information. It’s just waiting for artificial intelligence to get onto this internet and quickly learn all of the things that we’ve been learning and then we’re kind of at the mercy of that artificial intelligence. Don’t you think?
Muhsin: Absolutely. I’m concerned about how all this data will be used, to the point that I actually find myself disengaging from some platforms because I don’t want to feed the beast, but sometimes you can’t avoid it; you have to engage. I remember early on I used to be quite hesitant about making purchases online, but I certainly do it now. There’s no way to avoid it. And I think you can’t be an engaged citizen unless you’re using the internet. You can’t go completely off the grid. Some people try, but it’s not really a pathway for many people.
Kirill: Yeah. So, on that note: most of these recommender systems we spoke about, this AI and mimicking the human brain in 2025 and so on, all of that, or most of it, is governed by machine learning algorithms. And now you’re getting into this field of machine learning, which I really admire, and I’m sure a lot of our listeners want to get into it too. It’s the new data science. If you want to be a data scientist in the next 10 years, you need to know about machine learning. So my question to you is this: machine learning is so broad. Off the top of my head, I can name at least seven different branches of machine learning, including clustering, classification, association rule learning, deep learning and so on. How do you go about taking your first steps, or mastering the first algorithms in machine learning? How do you decide where to start? How do you actually go about conquering this field? What is your plan, and what can you recommend to those who are just looking at machine learning and want to get into it as well?
Muhsin: Yeah, that’s a good question. I guess there are various approaches, but the one I took was this: being a data analyst, I knew I needed more power, more techniques and more abilities. At some point you reach a bit of a limit with what you’re currently doing. And around the time I was transitioning into industry, data science was becoming a thing. I was fortunate to be in an era when online platforms, the massive open online courses, were starting to become available. So one big introduction for me was learning R programming through the Johns Hopkins Data Science course on Coursera. That’s a series of 8 or 9 courses where they take you through cleaning and presenting data, and they teach you R. That was my avenue into this world. And from there, you increasingly realise that there’s a wealth of information available from online courses and various providers.
I’m on Stack Overflow every day; you just Google for answers. When you’re in a position where you need answers, the internet is a great resource for finding them. So for people just getting into it: the more time you spend doing data analysis and the more complex your questions become, the more you start to reach a limit, as I found. That’s why I feel machine learning is a way to broaden the horizons in terms of what’s possible. It’s also why I said before that I’m really interested in what machine learning can do for me in terms of highlighting things I hadn’t realised before. And I think that’s a very powerful concept, because traditional reporting or traditional dashboarding presents data you already know something about. People have an intuition about the data and you’re showcasing that data. But with machine learning, because of the sheer power that machines have, they can potentially highlight something a person couldn’t have considered.
Kirill: Yeah, definitely. Especially with huge volumes of data, or complex data, they help you break through those barriers and get insights that would otherwise have taken you ages, or that you wouldn’t even have thought of. That’s some really great advice. You’re totally right that in this new world we have all these online courses, massive open online courses and every other kind of online education. Anybody can get into this field and slowly start exploring, step by step, what machine learning can do for them. I was actually at a conference last week. It was related to data, though it was on digital marketing, so it wasn’t specifically about machine learning, but one of the speakers said that in the world we live in, you can learn anything. You just have to know how to Google really, really well. That resonated with me when you mentioned Google. It’s exactly right. If you want to learn pretty much anything on this planet, you just open up Google, and it’s about how well you can Google that topic and how quickly you can find your way into that information.
Muhsin: Absolutely. You said it best there. It’s about how well you can actually structure the question. So when I’m teaching people programming, one of the first skills I say they should develop as quickly as possible is finding the right words to use to find those answers. Provided you search correctly, the more you read, the better the language becomes in terms of how you hunt down those answers.
Kirill: Yeah, totally, and when you say teaching programming, does that mean in your workplace? You’re like helping out your co-workers or is that something else?
Muhsin: Yeah, yeah, helping out co-workers. So at the time when I’m learning I’m also—I find the best way to learn is to try to teach someone else.
Kirill: Exactly! I feel the same way.
Muhsin: Yeah. I think that if you can convey something in a way that you yourself understand, and that someone else appreciates, it’s beneficial for both parties. That’s why I really like it when people can break down very complex, often uninviting information into something demystified and palatable. In past jobs I’ve often sat down with co-workers who were kind of stuck in their Excel world. I love Excel, but if you only know Excel, there’s an upper bound on the type of analysis you can do. They have approached me to learn a bit more programming, and I’m always pleased when they actually progress. It’s nice feedback to know that what you’re teaching is getting through. And it’s not just co-workers. Recently I’ve had an intern who worked with us for the past three months, and it was great to see her flourish and pick up a few new skills in data wrangling and some data science techniques.
Kirill: That’s really cool. I admire that a lot, because I do the same thing through my online courses. For somebody who is as passionate as you are about data science and about bringing a data science culture into their organisation, what would your best tip be when they want to spread this knowledge, maybe teach somebody programming, or just increase awareness of data science and get somebody excited about it?
Muhsin: Keep it simple, but be mindful of that person’s end goals. Often someone will come to you with a problem they want solved, and that could be anyone. It could be a senior manager who says, “I need these numbers,” or a colleague who wants to learn more programming and says, “I’d like to process the data in this Excel sheet more efficiently.” Once you identify the goal, you need to break down the complexity into nice bite-sized chunks and step through that process, but keep coming back to the end goal, so you know you’re working towards something in order to achieve what you want. Particularly for large projects, it can be overwhelming, and you don’t quite see what’s over the horizon. But if you keep the purpose in mind, it can be motivating. Particularly when you’re working with someone, it’s always nice to bounce ideas off each other. It’s good to get feedback and keep people in the loop. Yeah, I’m always a big advocate of keeping things simple and being mindful of the end goal at all times.
Kirill: Yeah, I totally love it. I have the same approach. I break down my courses into sections, and in every section there’s a challenge that needs to be solved. I always describe the challenge, and sometimes, when it’s a super complex challenge, I’ll even show what the end result is: what we want to achieve, what the end visualisation or end insight looks like. And I say, “Okay, that’s where we’re gonna get to, and we’re gonna break down the process into steps,” like you said, into simple steps, and it really helps. Keeping that end goal in mind keeps people focused on why they’re doing this. Rather than just learning the mechanical steps, they’re learning the reason behind what they’re learning.
Muhsin: Yeah, absolutely. And that’s a great technique, to actually show people, “Look, this is where you’re heading.” I remember that when the intern I mentioned started with us, I told her it wasn’t that long ago that I was where she is now, and that with a number of years learning the processes of data wrangling and picking up some data science techniques, she could get to where I am now. I think she appreciated knowing that even though something may take some time, it’s an achievable goal. It’s good to find people who have reached that point and can actually tell you, “Look, it is possible. Here’s what you can do about it, and here are some tips to get there.”
Kirill: Fantastic! Thank you for sharing that. And I’ve got a couple of interesting questions for you. First one will be: What has been the biggest challenge for you as a data analyst?
Muhsin: So the biggest challenge as a data analyst is often kind of at the two ends of the data analysis pipeline, and that’s when it’s getting the data and then distributing the data. So often in an organisation it can be challenging to acquire different data sets for various reasons. There are multiple business constraints, whether it’s potentially political or ethical, there are privacy concerns, it may be practical reasons, so you might need the resources of someone from IT, but they’re just too busy. And because of those reasons, often getting the data and then sharing it, such as pushing your data out into a dashboard could become a bit of a hurdle. So that tends to be a bit of a challenge, at least what I’ve seen in big organisations. In those circumstances it means that you can kind of wait until a solution occurs, or you find a workaround, and then focus on another part of the pipeline.
Another challenge in data analysis and data science, for me, is not having other analysts close by. I’m the only analyst on my team, and it’s great to bounce ideas off people, come up with different ways of analysing data, and talk through a problem. When you don’t have those people immediately with you, you can approach others. I’m often talking to people from IT and our solution architects, and they’re great at getting you past blockages. You can also reach out to analysts in other departments, even though they’re not within your immediate team. So you can build a bit of a community, depending on how large your organisation is, or even go outside your organisation, join various meet-ups, share your experiences and learn from people as much as possible.
Kirill: Yeah, I find that that’s very valuable advice. There are always these organisations and always these altruistic groups and meet-ups where people want to share their knowledge. You know, it’s like learning a language. You want to converse with the people that already know the language, or are also already learning it so it just makes it more fun and interesting. And there are so many different groups that you can join. Pretty much in any medium to large-sized city there’s gonna be at least one group of aspiring data scientists or analysts that you can join and share experiences. That’s some great advice. And what would you say is your recent win that you had using data science or data analytics that you can share with us?
Muhsin: What I’ve been spending most of my time on is building an analytics dashboard platform. I’m grabbing all our various customer experience data and other data sets, and at the moment I’ve got a pipeline of R scripts and some Bash scripts that automate a lot of the processing: the data wrangling, then the tidying of the data, then piping it into dashboards. We’re going down the path of an automated solution, because I want this system to be as independent from me as possible. I want it to be something that lasts long after I’ve left the organisation. It would be quite flattering if this system were still chugging along when I’m not there. Recently we’ve been showcasing these dashboards and the potential of this analytics platform, a lot of people across the company are asking about it, and we’re about to launch it. So yeah, I’m looking forward to using these dashboards and this platform to show that you can make decisions through data analysis and data-driven approaches. These dashboards will also serve as a means to present the insights gained from data science in a simple, palatable way, so that the end user, whether a staff member in a store or a senior manager, can take that information, act on it, and do something meaningful with it.
Kirill: That’s a very admirable thing to do. You mentioned that it would be flattering if those dashboards were still there after you move on from that role and stop maintaining them. This brings me to my next question, which is kind of in line with that. What is your favourite thing about being a data analyst or being in data science? What inspires you the most to keep doing what you’re doing?
Muhsin: Definitely the creative aspect of the work. I think there’s a lot of creativity in data analysis and data science. That’s why I’m not entirely sure the machines will completely take us over unless machines can master creativity. If they do then we’re in a little trouble.
Kirill: In a lot of trouble.
Muhsin: Yeah, yeah. But yeah, I really enjoy being able to combine and transform data, and then with my neuroscience background think about different ways of highlighting behaviour, whether it’s customer behaviour or staff behaviour or even just organisational behaviour. And when I’ve thought about a different way to do that, then taking a step back and thinking, “Okay, given the data that we have, given the resources that we have, is it possible to actually code this up and then present it?” So I really enjoy that creative aspect of work.
Kirill: I totally agree with you. It’s interesting, you know. It’s such a technical field: numbers, mathematics, algorithms, statistics, programming. It’s so technical, and yet there’s so much room for creativity. You don’t expect that. You think the creative subjects are art, English and literature, but here in data science, amidst all that technicality, there is something so inspiring about the creative aspect of the job.
Muhsin: Absolutely. Yeah. The thing I love is that in this field, we get to build something that didn’t exist before. We’re building useful products. So I think it’s a very creative pursuit.
Kirill: Yeah. Thank you very much for sharing all that knowledge and coming on the show and sharing these insights. I just have a couple of quick questions to finish off this episode. What is your main career aspiration that pushes you forward to better yourself and become a better data scientist every single day?
Muhsin: For me it’s overcoming a challenge. A lot of the time, when you have your end goal in mind and you’ve broken your task up into something small and achievable at each stage, it’s about really focusing on those little tasks and investigating the ways you can overcome and conquer each challenge, using a variety of tools and resources and speaking to people who can help you achieve that. I really like that aspect of the field, where you can keep pushing yourself to the next stage. It’s almost an inevitability: even though at the time you’re just getting over a hurdle, once the goal is achieved you can look back and realise that along the way you’ve picked up all these skills, had great conversations with people, and learned quite a lot. At the time, the motivation is simply, “I just really want to solve this task at hand.”
Kirill: Fantastic. The way I think about it is that, on one hand, you’re learning something at the very edge of what technology is capable of, of the algorithms that humans have come up with so far. You’re not creating something brand new in terms of machine learning; these are things people have already created and used before. But at the same time, when you apply it to your specific challenge, it becomes unique. Even though you’re learning something that already exists, applying it to your circumstances and to the business problem at hand makes it unique. And that gives you the fulfilment of having actually done something new, of having created something brand new for the world. I completely agree with you. It’s such a great feeling that even that alone is worth waking up for in the morning and doing it all again.
Muhsin: Absolutely. You said it very well. I could go online, find someone who has outlined how to perform a particular method of clustering text, and then apply it to my own data set. And because I’m so familiar with that data set, it’s motivating to see the insights start to take form. And you’re right, the end result is that you create something new, and for your organisation that’s ideally something useful.
Kirill: Yeah, I totally agree. Thank you so much again, Muhsin, for coming on the show. If any of our listeners would like to get in touch with you, maybe follow your career, how can they best find you?
Muhsin: I’m on LinkedIn. They can just search for my name and find me there. I also have a blog that I don’t update frequently enough but now that I’m mentioning it perhaps I’ll get back to writing more because I do enjoy writing. So on that blog, it’s called probablyabetterway at blogspot.com.au and I can send you a link.
Kirill: Interesting. All right, definitely. We will include that in the show notes. So if you’re listening to this podcast, definitely check out Muhsin’s blog, probablyabetterway at blogspot.com.au. And one final question for you today: What is your one favourite book that can help our listeners become better data scientists and analysts?
Muhsin: I’ll mention the book that had a huge influence on me. It’s a very popular one that I’m sure many people have read. It’s called “Freakonomics” by Steven Levitt and Stephen Dubner, and for me it was the first of its kind to showcase how our assumptions can be challenged by study design and data analysis. And it really did invite me into the methodologies of how real world data can uncover really surprising behavioural insights of people. Since then there have been a lot of similar books. I know you’ve asked for one but I’m gonna mention a couple more.
Kirill: Sounds good. Go for it. Go for gold.
Muhsin: Another popular science book I recommend, particularly for people with a science background like mine, is “The Invisible Gorilla” by Christopher Chabris and Daniel Simons. That’s probably the best popular science book I’ve read on how our intuitions can deceive us. A number of your guests have mentioned Nate Silver’s “The Signal and the Noise,” which I highly recommend. And a fun and probably also depressing one is “Dataclysm” by Christian Rudder. He worked for the online dating site OkCupid, and he’s crunched the numbers on online dating to reveal some pretty sobering insights into how people try to find The One online.
Kirill: Very interesting. Thank you very much. There you go, guys. We’ve got a whole library of books: “Freakonomics”, “Invisible Gorilla”, “Signal and the Noise”, and “Dataclysm”. Once again, thank you so much, Muhsin, for coming on the show and sharing your insights. I’m sure so many people will find so much value in this conversation that we had just now.
Muhsin: Thank you for having me.
Kirill: All right. Take care and good luck with your machine learning aspirations.
Muhsin: Thank you. All right, all the best!
Kirill: So there you have it. I hope you enjoyed this podcast. We had quite a lot of interesting conversations. I really enjoyed the conversation about machine learning: how somebody who already has experience in data science, but is new to machine learning and wants to tackle that challenging field, went about it, and what tactics and approach Muhsin has been using to get into it. I think that can help everybody get started in that field.
Also, I always enjoy talking about Moore’s Law, so hopefully you picked up some valuable stuff there. Once again, check out the book “Bold” by Peter Diamandis if you want to learn more about Moore’s Law. It’s described in great depth there, and it is the governing law of how computers are developing along that exponential curve. So all in all, I hope you enjoyed this episode. Of course, we’ll include links to all the resources mentioned, and you’ll be able to follow Muhsin on LinkedIn and check out his blog. So don’t forget to go to www.superdatascience.com/9, and while you’re there, join the SuperDataScience community and hang out with more people like Muhsin and other students of SuperDataScience. I can’t wait to see you next time. Until then, happy analysing.
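This minimal Python sketch shows how quickly the Moore’s Law extrapolation discussed in this episode compounds. It assumes the commonly quoted form of the law, in which capacity doubles roughly every two years. The function name `growth_factor` and the start year 2016 are illustrative assumptions and do not come from the conversation. The sketch only shows the arithmetic of repeated doubling. It does not estimate when computers will match a human brain.

```python
def growth_factor(start_year: int, end_year: int, doubling_period_years: float = 2.0) -> float:
    """Multiplicative growth if capacity doubles every `doubling_period_years`.

    Assumption: a smooth exponential with a fixed doubling period, an
    idealisation of Moore's Law rather than a measured trend.
    """
    doublings = (end_year - start_year) / doubling_period_years
    return 2.0 ** doublings


if __name__ == "__main__":
    # Hypothetical start year; compare the two horizons mentioned in the episode.
    for target in (2025, 2050):
        print(target, round(growth_factor(2016, target), 1))
```

Under these assumptions, the 34 years from 2016 to 2050 are 17 doublings, which is a factor of 2^17 = 131,072. Fixed-period doubling produces these large factors quickly, and the result is very sensitive to the assumed doubling period.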