Jon Krohn:
This is episode number 639 with Mariya Sha, the host of Python Simplified. Today’s episode is brought to you by Kolena, the testing platform for machine learning.
Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple.
Welcome back to the SuperDataScience Podcast. My fun, fabulous, and very freaking clever guest on today’s episode is Mariya Sha. Mariya is the mind behind the Python Simplified YouTube channel, which has over 125,000 subscribers and makes both beginner and advanced concepts simple to understand. Her videos cover Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and object-oriented programming. She’s renowned for taking complex concepts, such as gradient descent or unsupervised learning, and explaining them in a straightforward manner that leverages hands-on, real-life examples. In addition to her brilliant YouTube channel, Mariya is also pursuing a bachelor’s in computer science with a specialization in AI and machine learning from the University of London.
Today’s episode is a fascinating and wide-ranging conversation that should appeal to anyone who’s interested in or involved with data science, machine learning, or AI. In this episode, Mariya details how the incredible potential of machine learning in our lifetimes inspired her to shift her focus from web development languages like JavaScript to Python. She talks about why automation and web scraping are critical skills for data scientists to master, how to make any apparently complex data science concept straightforward to comprehend, her favorite Python libraries and software tools, the one rarely mentioned topic that every data scientist would benefit from knowing, and the pros and cons of pursuing a 100% remote degree in computer science. All right, you ready for this super interesting episode? Let’s go.
Mariya, it’s awesome having you on the SuperDataScience Podcast. I can’t believe it. I’ve been watching your videos for years. I think that they’re amazing. And now I get to interact with you on air. What a treat. Welcome to the show. Where are you calling in from, Mariya?
Mariya Sha:
Awesome. Thank you so much for inviting me. I’m here in beautiful British Columbia, Canada, specifically on the outskirts of Vancouver. So you can see this beautiful view behind me.
Jon Krohn:
Yeah, I think-
Mariya Sha:
Living the life.
Jon Krohn:
For our audio-only listeners out there, which is most of you, I do suggest you check out the YouTube version of today’s podcast if you want to see the best background of any guest we’ve ever had on the show. It’s beautiful. When Mariya came on, I was like, “Is that your garden back there?” And she was like, “No, this is the mountainside. When it’s not so foggy, you can see the ocean.” And we’ve got this Buddha statue in the background.
Mariya Sha:
Yeah, for sure.
Jon Krohn:
And some digital candles adding atmosphere. You definitely win the… Yeah, I’m going to have an award show at the end of the year for best X, best Y of 2022. And for best background, you’re for sure the winner, Mariya.
Mariya Sha:
Yay. Awesome. I tried very hard this morning. This is not how it usually looks, so it’s especially for you guys.
Jon Krohn:
Nice. Well, we appreciate it. I’m sure that even for the audio-only listeners, the ambiance will be reflected in the way that I interview you.
Mariya Sha:
Hopefully. Yeah, I’m counting on it.
Jon Krohn:
So you have a super popular YouTube channel. It’s called Python Simplified. And at the time of recording, it has more than 125,000 subscribers, which is wild. The channel’s only a few years old. And so to have amassed such a large audience in such a short time, it’s a testament to the very high quality of the videos that you create. And in these videos, you cover a broad range of Python topics. You’ve got web development in there, you’ve got game development, which is super fun. And for our audience in particular, you’ve got a lot on machine learning. So Mariya, let’s start with a simple one here. What got you interested in Python in the first place?
Mariya Sha:
It’s a funny story, because when I was first introduced to Python, it happened absolutely by accident. I was basically interested in machine learning and artificial intelligence, and Python was just a way of implementing it. So instead of doing what people usually do, which is to start from a very basic point, learn the basics, and move up, I went for the crème de la crème immediately, which is the opposite of what everybody should do. I don’t know if it was the best decision, but it sure was very memorable. It is very hard to forget this experience. And I remember that what first really drew me into Python was the aesthetics of the language.
So basically, when I saw the clever way Python uses indentation, that was an incredible moment. In addition, the entire syntax is based on such simple words. We are basically using words like “is,” “not,” “and,” and “or.”
So instead of looking at a piece of code that is so sophisticated and hard to understand, you’re basically looking at a set of instructions in almost plain English. And that’s what immediately drew me into Python. But two years back, when I first started this channel, when I first thought of Python Simplified, it was more of an intuition: the aesthetics and the intuition of the language. Now, after two years of exploring it from all kinds of angles and topics, I can see that it has more benefits than anyone could imagine.
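A small illustration of the readability Mariya describes: the hypothetical function below (its name and categories are made up for this example) uses indentation rather than braces to mark blocks, and the keywords is, not, and, and or to express its conditions.

```python
def describe(value):
    # Each indented block belongs to the line above it; no braces needed.
    if value is None:
        return "nothing"
    if not isinstance(value, int) or value < 0:
        return "invalid"
    if value > 100 and value % 2 == 0:
        return "big and even"
    return "small or odd"


for item in [None, "ten", -5, 200, 7]:
    print(item, "->", describe(item))
```

Read aloud, a condition such as `value > 100 and value % 2 == 0` is close to the English sentence it encodes, which is the quality she points to.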
Jon Krohn:
Well, it’s super interesting that you got into Python because of AI/ML, which is something that’s going to be near and dear to the hearts of most of our listeners on this data science podcast. So that’s awesome. What drew you to that? Were you just like, “This is a super cool thing. In our lifetime, it’s going to make an enormous impact. It’s already making an enormous impact right now. I’ve got to get involved”? Is it something like that?
Mariya Sha:
Well, the funny part is that I considered AI science fiction for a very long time. I had no idea that this is something we could implement in our lifetime, especially since you don’t really need to be a crazy genius in order to understand it. You can sit on your couch in your pajamas and basically make a neural network out of nowhere. Now, this is something that… And I think that up until this point, not a lot of people are aware of it. I remember I was reading an article about Tesla, and they mentioned that their autopilot was using artificial intelligence technologies. And they didn’t describe it in the future tense. They used the present tense, so basically: we’re doing this right now. That’s when my mind completely filled up with great ideas. And I realized that if there’s anything I want to learn, if there’s anything I want to do for the rest of my life, it’s probably this. You’re basically building something out of nothing, and you give it some logic, you nourish it. And I was very excited about this idea.
Jon Krohn:
Yeah, you’re preaching to the choir over here. I think it is the most fascinating thing that will happen in our lifetimes: this transition from humans being by far the most intelligent thing on the planet to us, in some narrow areas, already having been overtaken by machines in the last few decades, with that trend looking set to continue for the coming decades. This is a pivotal moment in the history of our species and the planet. And you, Mariya, and you, listener, can play a leading part in that. It’s really cool.
Mariya Sha:
Absolutely. It gives you an opportunity to discover new things, because even though people were probably working on these ideas for many years before computers were even invented, the whole idea is basically based on math. Right now we are just able to implement it in a very efficient way, because we have the processing power and a lot of data that we have been collecting for many years. But this practice is just at its very beginning. I always say it’s still in diapers. We can take it in all kinds of directions. So for now, it’s pure potential, and it’s up to us to take it into whatever realm we would like.
Jon Krohn:
Yeah, it’s early days for sure, because, as you say, it’s only in the last decade or so, since about 2012, that we’ve had cheap enough compute and some intelligent compute ideas, like having model training happen on GPUs instead of just on CPUs. Data storage has also become very cheap, as you mentioned. Those hardware capabilities are what have allowed artificial neural networks, which drive AI innovation today, to be so dominant and so widely applicable. But most of the underlying ideas of artificial neural networks have been around since the 1950s. The base algorithm that makes up each of the artificial neurons, the perceptron, has been around since the 1950s. And there’s a really fun… So the first machine to implement this algorithm was called the Mark I Perceptron.
And you can actually go see the Mark I Perceptron. It’s a big computer in the Smithsonian in Washington, DC. And at the time that machine was created, the New York Times ran articles saying that it replicated the way human brain cells work, and that in the next few years it might overtake human capability on tasks.
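For readers curious what the 1950s algorithm actually does, here is a minimal sketch of Rosenblatt’s perceptron learning rule in plain Python: a weighted sum of the inputs, a step threshold, and a weight update only when a prediction is wrong. It is an illustration, not a model of the Mark I hardware; the function names, the learning rate, and the logical-AND toy data are assumptions chosen for the example.

```python
def predict(weights, bias, x):
    # Weighted sum of the inputs plus a bias, passed through a step function.
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                # Nudge the weights toward the correct answer only on mistakes.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly: stop early
            break
    return weights, bias


# Toy data: logical AND of two binary inputs.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # → [0, 0, 0, 1]
```

Because logical AND is linearly separable, the perceptron rule is guaranteed to converge on it; a single perceptron cannot learn a non-separable function such as XOR, which is part of why multi-layer networks matter.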
Mariya Sha:
It’s a very ambitious prediction, I would say.
Jon Krohn:
For sure. And we continue to be ambitious about this prediction. I think the advent of artificial general intelligence, so a machine that is capable of learning as broadly as a human, is similar to nuclear fusion: it’s always something like 20 years away. And that’s where we still are today. It’s like, yeah, maybe in the next 10, 20 years we’ll have it. But it turns out people have been saying that for many decades. And it’s interesting also that you touched on how the ideas behind AI could be even older than that. So we have people like Charles Babbage coming up with theoretical ideas for how we could do computation with mechanical machinery before we even had the technology to build it.
Mariya Sha:
Yeah, I find that in a lot of fields people first imagine something, and then, many generations later, their imagination is brought to life. So my rule of thumb is that if we collectively imagine something, I believe there’s no getting away from it. We’re going to have to reach it. We’re going to have to get involved with it, because if the idea is really good, it can advance us. But sometimes the idea is also very dangerous. And AI is kind of in the gray area between those two. It all depends on us.
Jon Krohn:
It’s a double-edged sword for sure. So what are the kinds of things that we should be imagining on the good side?
Mariya Sha:
So what I first imagined when I read about Tesla’s AI was an entire system of automated vehicles in which people going to work just sit down in their car in front of the wheel, don’t even touch it, and are able to read a book. I imagined something like using a GPS: you’re using a phone application like Google Maps, you say, “This is where I want to be,” and your car just takes you there. That’s what I had in mind when I first thought of this idea. And that’s only one field in which we can improve our lives. Now, if you ask people like my spouse, who really love driving, even with a manual transmission, this is a nightmare to them. So it’s all in the eye of the beholder. Some of us are excited to sit back and avoid traffic jams and things of that sort. And other people really like the experience of pushing the gas and enjoying the combustion technology. So it always depends on who you ask.
Jon Krohn:
Yeah, enjoying the combustion. That’s true, people do.
Are you unit-testing your machine learning models? You certainly should be. If you’re not, you should check out Kolena. Kolena is an ML testing platform for your computer vision models. It’s the only tool that allows you to run unit and regression tests at the subclass level on your model after every single model update, allowing you to understand the failure modes of your model much faster. And that’s not all. Kolena also automates and standardizes your model’s testing workflows, saving over 40% of your team’s valuable time. Head over to Kolena’s website now to learn more. It’s www.kolena.io. That’s K-O-L-E-N-A.io.
Jon Krohn:
Well, we could probably think bigger than self-driving cars, because we have more or less autonomous vehicles today. So what kinds of things? Maybe things like helping with nuclear fusion: AI helping us to simulate energy equations or how fusion reactors could work, and helping us transition to cleaner energy.
Mariya Sha:
Yeah, for sure. It can do all kinds of stuff. Honestly, I don’t know much about energy. I’m not really… I know that fusion energy is the future, and we are all dreaming of it, and it would really make our lives much better once we achieve it, of course. But if I have to think of where things go, not in the near future but in the far future, it’s something that mostly comes from pop culture: human enhancement, in a way. Right now we live in a reality where Neuralink is being developed. Neuralink is basically aimed at allowing folks with disabilities to have a fuller quality of life. So if somebody doesn’t have hands but would like to use a keyboard, they would still be able to do so, because a chip installed in their head would allow them to interact with the machinery.
Now, when I think of the future, I don’t really think of it from that perspective. What I imagine is folks without arms who will get robotic arms instead, and those arms will be connected to their bodies. So instead of just interacting with machinery, they’ll be able to interact with the human world. When I think about this type of robotics technology, it’s very hard not to have machine learning involved in it. It has to be based on some sort of AI. So if you ask me where this technology goes, that’s where my imagination takes me.
Jon Krohn:
So Neuralink is starting with people with disabilities, but the intention behind the Neuralink company that Elon Musk started is that the technology could eventually be applicable to you and me, to everyone.
Mariya Sha:
For sure, to enhance our ability to think. Now, I’m a bit scared when it comes to these types of topics. I know the value of the technology outweighs its dangers when you have a certain disability. But when you are a healthy human being, it’s very dangerous to install a piece of software in your mind that somebody else is controlling. That’s where I’m having a bit of a problem digesting it.
Jon Krohn:
Yeah. Another one of Elon’s companies, Tesla, has had something like 18 or 19 major recalls in 2022. The most recent recall was something like 300,000 vehicles due to a taillight issue. And so you don’t want to be in a situation where the Neuralink system that’s been implanted is an embedded mesh across your cortex and then it gets recalled.
Mariya Sha:
Absolutely. Or hacked; there are so many ways of messing with technology. And if you’ve been in programming for quite some time, you know all the ways it can go wrong, and that’s part of your consideration. So there’s always a good side and always a bad side. And hopefully we’ll be able to have much more confidence in that type of technology in the future.
Jon Krohn:
Yeah. One of the things that concerns me about this: I’ve read about all the different ways they think they could have the interface interact with your brain directly, and it involves things like injections or having a mesh over your cortex. Something that’s obvious to me is that, okay, you do that, you’re like, “Oh, sweet, Neuralink is here. All of my friends are doing it. It allows us to be surfing the internet, getting all the answers to questions on tests. I’ve got to have that or I’m going to be behind. I’m going to be an idiot because I don’t have a search engine in my brain.” So you get it implanted, and then three years later Elon Musk is like, “Oh, we’ve got Neuralink 2 and it’s 100 times better. But unfortunately, if you already have Neuralink 1, we can’t just fit more Neuralinks into your brain. So you’re stuck with Neuralink 1 forever.”
Mariya Sha:
For sure, that’s another disadvantage. Technology keeps updating, and if we embed it in our minds, holy smokes, it means another brain operation a month later. People who love gadgets will be in a very weird position.
Jon Krohn:
In and out of the hospital. Yeah, so that’s going to be interesting. We’ll see how that plays out. All right, so coming back to the present a little bit, you said that you were really excited about Python because it is the lingua franca of machine learning and AI. And then you went on to describe it in really beautiful ways and explained how that drew you into it. That implies to me that you were already programming. If you came across Python and you were like, “Oh, I love this syntax, I love the indentation, I like the simple language,” it doesn’t sound like that was your first programming language.
Mariya Sha:
No, I actually started programming as soon as I got a broadband internet connection, when I was 12. So it’s been a while. I was very fascinated with the internet. I liked the fact that you can learn so many things and that people share so much knowledge, especially in the realm of programming. So I started building websites. I started with HTML and CSS, and I was building the simplest websites in the world. As time went by, I ended up learning JavaScript and advancing my skills in the process. So by the time I was introduced to Python, I already had a good set of tools that I was proficient with: mostly HTML, CSS, JavaScript, jQuery, Bootstrap, all those technologies that we used to build websites. I had that in mind before I looked at Python, so I always compared Python to JavaScript. And in that comparison, Python had the upper hand.
Now, back then I didn’t have much experience with lower-level languages such as C++, or with C#. And now that I do, I understand that Python is even more beneficial. So it just keeps getting better and better with time.
Jon Krohn:
Yeah, it seems like your interest in machine learning and AI led you in the right direction, because over the last few years Python has become… I don’t know if it’s yet overtaken Java as the most popular language, period, but it’s coming close if it hasn’t. And it is starting to become the kind of language that, even beyond machine learning and AI, software developers are expected to know. So three years ago, when you started your Python Simplified channel, that ended up being really good timing.
Mariya Sha:
Yeah, it was purely intuition. I’m surprised that I had such a good intuition. I didn’t know that you could build desktop applications and mobile applications. And I didn’t know you could use Python as the backend of your website, completely disposing of PHP. Who’s working with PHP nowadays? Everybody is using Django, or even Node.js. Nobody’s really looking into those traditional practices that I was familiar with when I was a teenager. So Python has such versatility, and the beauty of it is that we discover new things all the time. On my channel, when I’m very confident about something and I post a video like, “Hey, there you go. This is the easiest way you can do it,” I always get comments suggesting an easier way, a better way, which is amazing. And I usually pin those comments to the top so people can see them. It’s a language that keeps progressing, and every couple of months we have a new release of it. There are always new developments, and there’s a lot of community support. So I think there’s a reason why it has become such a popular language over time.
Jon Krohn:
You just mentioned how engaged your community is with you. One of the things I noticed when I was researching for this episode was that you hold live sessions where the community can interact with you directly and ask you anything. I think that’s a really cool idea. I don’t even really have a question; I’m just bringing this up so that our listeners are aware that if, after this show, you want to not only watch Mariya’s Python Simplified videos but actually interact with her, she does YouTube live sessions that are really slick. What are you using? What kind of software do you use for that? It’s like watching a TV show. You’ve got a banner along the bottom. You’ve got popups of questions that people are asking. How do you do that so smoothly in real time?
Mariya Sha:
There’s a broadcasting software called StreamYard. There are a few software [inaudible]: there’s Restream, there’s StreamYard, and they basically give you those pop-outs, and they’re connected to your YouTube account and to other social media accounts. So sometimes I’m streaming across LinkedIn and Twitter and many other platforms besides YouTube. And basically it allows you to interact with your audience in a much more efficient way. What happens when you start a channel, or when you start some kind of media outlet, is that you get a lot of emails and a lot of private messages from people. And it’s very hard to communicate that way, because somebody else might have the same question, and now you spend a lot of time giving an answer that only one person will be able to read. That’s why I enjoy the interaction via comments on YouTube videos.
But sometimes people would like to ask you things that are not necessarily relevant to a video. So it’s not necessarily a question in the realm of machine learning or data science; maybe, just like you asked me, somebody wants to know which software I’m using. For this type of question, I try to make myself as available as possible in streaming sessions. I also find that it gives a lot of motivation to people, because when you’re new to the world of programming, it’s very easy to get frustrated. There are so many things to explore. There are so many things to do. There’s so much advice that everybody is eager to give you online, and you don’t know which of it to take and which not to take. So sometimes you just need a good word, a good motivational speech from someone: “Don’t worry. I know it’s overwhelming. It happened to me too. Here’s how you can get past those struggles.” So I try to be there for my community. There are always new people joining, but very often I’m talking to the same individuals for a very long time. I have a very… My audience sticks around.
Jon Krohn:
Yeah, I watched a recent live stream of yours. I watched a recording of it in the last few days in the run-up to doing this interview with you. And I noticed that a lot of the names you were like, “Oh yeah, welcome back. And you’re always here.”
Mariya Sha:
Yeah, for sure. There are the regulars, of course. And at the end of the day, you end up meeting a lot of people, especially if you communicate not only through YouTube but also through Discord. You basically have this community of friends at the end of the day. Many of us are developers by trade, and many of us are doing this because we want to help others. And yeah, it’s fun. It’s fun. It gives you something to do, and you are also networking.
Jon Krohn:
And so this StreamYard, did I get this right, it allows you to broadcast in real time across all of these platforms simultaneously, so you can be broadcasting onto YouTube, Twitter, or LinkedIn? Can you broadcast on Discord?
Mariya Sha:
I haven’t tried that yet, but you should be able to. You should be able to. You can broadcast even on platforms like Rumble. Now, Rumble doesn’t have a traditional way of broadcasting; you need to do it through OBS Studio, but you can always do some engineering in the background. And I know that if you reach out to StreamYard and tell them, “Hey, can you make this available for this and this platform?” they might be able to adjust it, because they have good customer service. I enjoy it.
Jon Krohn:
And then how does the Discord channel tie in? So Discord, why would somebody be engaging with you or your community on Discord relative to in the YouTube comments?
Mariya Sha:
So basically, there are two limitations on YouTube that I find a bit frustrating. First of all, you cannot share photos, you cannot share files, which for programmers is a bit of a deal breaker, because sometimes you want to show a screenshot, sometimes you want to share a big piece of code.
Another problem is that you cannot use tags, you cannot use… Oh, it’s those HTML type of tags. I forgot what you call those characters. You’re not allowed to use them within your comments, and a lot of our code involves… Especially in the realm of web development, a lot of it involves tags or even just the “greater than” symbol. Hah, that’s the name of it. The “greater than” or “less than” symbols, so you can’t use them. And they’re basic comparison operators in Python, so that makes it a bit challenging. So if you have a project that you’re struggling with and you need some additional help, not necessarily from me but from developers who are willing to help, Discord is definitely a very good resource. And in addition, of course, we can communicate via audio or video in the Discord environment. So it gives us a few more tools for a proper conversation.
Jon Krohn:
Nice.
Mathematics forms the core of data science and machine learning. And now, with my Mathematical Foundations of Machine Learning course, you can get a firm grasp of that math, particularly the essential linear algebra and calculus. You can get all the lectures for free on my YouTube channel. But if you don’t mind paying a typically small amount for the Udemy version, you get everything from YouTube, plus fully worked solutions to exercises and an official course completion certificate. As countless guests on the show have emphasized, to be the best data scientist you can be, you’ve got to know the underlying math. So check out the links to my Mathematical Foundations of Machine Learning course in the show notes or at jonkrohn.com/udemy. That’s jonkrohn.com/U-D-E-M-Y.
Jon Krohn:
All right. That’s good to know. I’ve only dipped a few toes into the Discord world, I should probably spend a bit more time there.
Mariya Sha:
It’s addicting. I warn you.
Jon Krohn:
That’s why I’ve only dipped a few toes. I’m like, “I better leave that closed.” Because there’s lots of interesting things going on there that are going to get in the way of me creating content. So speaking of being able to be productive, you have lots of videos about productivity hacking, so web scraping, automation and bot making. Actually, first, before we even talk about that, do you have a favorite topic to create content about? Is it machine learning or…
Mariya Sha:
It’s hard to put my finger on one topic, but what I can tell you for sure is that my favorite videos are about concepts so complex that I cannot even understand them myself at first. Concepts where I don’t even know where to start end up being my favorite videos. When I explore a topic, it’s always challenging, but then you always have this eureka moment. This moment of like, “Wow, all the pieces of the puzzle are now fitting in place and now I can truly understand something.” I’m not just memorizing it and explaining things in somebody else’s words; I have a pure understanding of the topic.
And that’s not where things end; from that moment on is where you only start planning your tutorial. At this eureka point, you start asking yourself, “Hey, how am I going to explain it to such a wide audience, some of whom don’t have any experience with programming? How do I translate it in such a way that even a six-year-old can understand?”
Now, by the time you’re done with that, even if you didn’t have much confidence about this topic to begin with, you can be woken up in the middle of the night and you’ll be able to articulate very well what the concept is all about. So you end up teaching yourself really, really well.
And another beauty is that after you post the video, you can look at the feedback people leave, and this is what really, really makes you happy, especially if you explain something like gradient descent or cross-entropy loss, whose names alone sound intimidating, not to mention all the formulas and the implementation. So it’s a nice tap on the shoulder. So I do enjoy covering machine learning; it’s definitely one of my favorite topics. But there are also object-oriented programming topics that I enjoy, or basic parallel computing things such as CUDA and controlling your GPUs in a very efficient way. So the more complex, the more I like it.
Jon Krohn:
Nice. I love that answer. I imagine that some of the game development stuff is also relatively complex?
Mariya Sha:
Actually, I find it a bit more simple. When it comes to… Because I’m used to [inaudible]
Jon Krohn:
I haven’t done it so from a distance I’m like, “That must be tricky.”
Mariya Sha:
Yeah. Actually, that’s one of the things I’m most comfortable with. Python, in the realm of GUIs, in the realm of desktop applications and game development, actually has a very easy way of… You are using very simple commands and they’re all very logical, so it brings you back to how Python is just plain English. And as long as you know the basic concepts of game development, which is something I learned in my early days when I was first exploring the internet, as long as you are aware of how a game environment should work and of the states that a game has, implementing it with Python is the easiest part.
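To make the idea of game states a bit more concrete, here is a library-agnostic sketch of a tiny state machine in plain Python. The states, event names, and transition table are hypothetical; a game built with a real library would likely call something like `next_state` from inside its main loop whenever an input event arrives.

```python
from enum import Enum, auto


class GameState(Enum):
    MENU = auto()
    PLAYING = auto()
    PAUSED = auto()
    GAME_OVER = auto()


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (GameState.MENU, "start"): GameState.PLAYING,
    (GameState.PLAYING, "pause"): GameState.PAUSED,
    (GameState.PAUSED, "resume"): GameState.PLAYING,
    (GameState.PLAYING, "die"): GameState.GAME_OVER,
    (GameState.GAME_OVER, "restart"): GameState.MENU,
}


def next_state(state, event):
    # Events that don't apply to the current state leave it unchanged.
    return TRANSITIONS.get((state, event), state)


state = GameState.MENU
for event in ["start", "pause", "die", "resume", "die"]:
    state = next_state(state, event)
print(state)
```

Keeping the transitions in a table rather than in nested if statements makes it easy to see, and test, exactly which events each state responds to; here the "die" event while paused is simply ignored.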
Jon Krohn:
Nice. So at the same time that you were doing web development, you were also doing game development from the very beginning?
Mariya Sha:
Yeah, you can call it that. You can call it game development; it’s all web games, things that are relatively simple.
Jon Krohn:
Right. I see. Yeah. Yeah. Yeah. So browser-based, point-and-click games of some kind you’ve been working on for a long time.
Mariya Sha:
Yeah. Yeah. Even Flash. A bit of Flash. I learned it a very long time ago, when it was still relevant.
Jon Krohn:
Right. Yeah, that’s something you don’t hear about anymore, but for a long time Flash was key for playing games. Okay. So a few minutes ago I started asking a question, but then I went off on a tangent. The question was: you have a lot of videos about productivity hacking, things like web scraping, automation, and bot making. How important do you think these kinds of productivity hacking skills… How important do you think these are for software developers and data scientists to master?
Mariya Sha:
It becomes more and more important as time goes by; that’s the way I see it. You can look at a few areas of programming. For example, even when you build GUI applications or games, you need to find a way of testing them. Manually testing those applications, manually clicking on every button your program has, manually trying every type of input that exists on the planet, is going to take you a very long time. It’s not a very efficient way of approaching this. With automation, we are able to write a piece of code that does those tests for us. We set it up in advance, and then we can do whatever we want. We can go drink coffee, we can go enjoy ourselves, and we can let our code do the hard work for us.
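As a rough illustration of writing code that does those tests for us, here is a minimal sketch using Python’s built-in `unittest` module. The `parse_age` function is a hypothetical stand-in for the input handler behind a GUI text box; fully testing a GUI usually also needs a tool that drives the interface itself, which this sketch does not attempt.

```python
import unittest


def parse_age(text):
    """Hypothetical handler for an 'age' text box in a GUI form."""
    text = text.strip()
    if not text.isdigit():
        raise ValueError(f"not a whole number: {text!r}")
    age = int(text)
    if age > 150:
        raise ValueError(f"out of range: {age}")
    return age


class TestParseAge(unittest.TestCase):
    def test_valid_inputs(self):
        cases = [("0", 0), ("42", 42), (" 7 ", 7), ("150", 150)]
        for text, expected in cases:
            with self.subTest(text=text):
                self.assertEqual(parse_age(text), expected)

    def test_invalid_inputs(self):
        # Inputs a human tester might forget to try by hand.
        for text in ["", "abc", "-1", "4.5", "151"]:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_age(text)
```

Saved as, say, `test_forms.py` (a hypothetical file name), the whole suite runs with `python -m unittest test_forms`, and adding another tricky input is a one-line change to a list.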
Now, when we take it into a different realm, for example, in the realm of machine learning, basically web scraping is a great way of getting a database. Yeah, we can of course take our camera and we can go ahead and we can film a million cats in different poses, in different settings. We can organize this database on our computer and it’s going to take us a century. We might retire by the time we’ve finished. But if we go in into Instagram and if we set up a bot that scrapes all the images with the hashtag of cat, that’s going to take us a fraction of the time. So yeah, our database may not be as accurate, some of those hashtags will not be relevant to cats. Some of them will show you illustrations of cats or maybe women who are dressed like cats. But in the end of the day, it saves you so much time that not knowing it puts you in a very big disadvantage.
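Scraping Instagram itself involves logins and its terms of service, so the sketch below only shows the core idea on a stand-in page: collect image URLs whose alt text mentions a keyword, using Python’s built-in `html.parser`. The HTML snippet and the keyword filter are assumptions for illustration, not how any real platform structures its pages.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect (src, alt) pairs for every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if attrs.get("src"):
                self.images.append((attrs["src"], attrs.get("alt") or ""))


def candidate_images(html: str, keyword: str):
    """Return image URLs whose alt text mentions the keyword (case-insensitive)."""
    parser = ImageCollector()
    parser.feed(html)
    return [src for src, alt in parser.images if keyword.lower() in alt.lower()]


# Stand-in for a downloaded page; a real scraper would fetch this over HTTP.
SAMPLE_PAGE = """
<html><body>
  <img src="/img/1.jpg" alt="My cat sleeping #cat">
  <img src="/img/2.jpg" alt="Sunset at the beach">
  <img src="/img/3.jpg" alt="Cat costume for Halloween #cat">
  <img alt="no source here #cat">
</body></html>
"""

if __name__ == "__main__":
    print(candidate_images(SAMPLE_PAGE, "cat"))
```

Note that the third image matches the keyword but is a costume, which is exactly the kind of noisy label Mariya mentions: scraped data usually needs a cleaning pass before training.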
Jon Krohn:
Cool. Yeah, that was a really great explanation. Web scraping is a very important tool in machine learning for exactly the reason you just gave: it allows us to get huge amounts of data. At my machine learning company, everything, both the data for training our models and the information that we provide to our users, is scraped from the web. It’s critical. So Mariya, what are your favorite tools for scraping the web?
Mariya Sha:
I am a huge fan of Selenium, specifically Selenium with Python. What makes it so powerful is that you have browser software that interacts with the servers of whatever website you are scraping. So instead of having your code interact directly with a server, you have a middleman: a browser program, the same kind of program regular humans use, rather than just an automated piece of code. Especially in the world of bot making, this is a very, very important library to master.
Another big benefit is that it’s not limited to static XML and HTML pages. You can also use Selenium with dynamic websites. That’s why it’s a very good tool when we’re dealing with platforms like Instagram and Twitter that keep loading content over time. You rarely see the end of the page when you first load it, and Selenium lets you work with that in a very easy way. You can also break CAPTCHAs with Selenium, which is not as ethical, but it also gives you the means of being a bit more malicious. That’s why I like it.
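For pages that keep loading as you scroll, a common Selenium pattern is to scroll to the bottom, wait, and compare the page height until it stops growing. The sketch below assumes Selenium 4’s Python bindings, where `execute_script` is part of the WebDriver API; the pause length and round limit are arbitrary assumptions. The function accepts any object with an `execute_script` method, so the logic can be exercised without a browser.

```python
import time


def scroll_until_loaded(driver, pause: float = 1.0, max_rounds: int = 50) -> int:
    """Scroll a dynamically loading page until its height stops growing.

    `driver` is expected to be a Selenium WebDriver (anything with an
    `execute_script` method works). Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the bottom so the site's JavaScript loads the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give new content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new loaded: we reached the end
        last_height = new_height
    return max_rounds


# Usage with a real browser (requires `pip install selenium` and Chrome):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com/some-feed")
#   scroll_until_loaded(driver)
#   html = driver.page_source
#   driver.quit()
```

The `max_rounds` cap is a design choice: truly endless feeds never stop growing, so a limit keeps the scraper from running forever.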
Jon Krohn:
Yeah, it can just be annoying not being able to get the data that you want. And it might not even be malicious or illegal; it’s just that the platform is trying to stop you from getting all the data that you’d like. But you need all those cat photos for your cat detector?
Mariya Sha:
Absolutely.
Jon Krohn:
So Selenium, from my relatively beginner experience with web scraping, definitely seems to be the way to go. Do you have any ethical concerns about web scraping?
Mariya Sha:
Jon, I have plenty of concerns in the realms of both automation and web scraping. When it comes to web scraping, I find that the biggest and most dominant concern is a distributed denial-of-service attack. Say someone makes a single bot that browses a website and tries to fill in a form, for example. They go to your contact page and they really want to be heard, they really want you to answer their message, so they automate a bot that spams you with the exact same message 100 times. When it’s a single bot, it may not be a very big threat. But imagine that instead of a single bot, you’re talking about 20 bots or 100 bots making simultaneous requests to the server of your website. While they make those requests, regular users who are actually human, not automated pieces of code, have access neither to the form nor to your site.
As a result, this can bring down your entire server, or at least slow it down. That’s a classic distributed denial-of-service attack.
Now, in the realm of automation, we are dealing with a completely different set of problems, and it mostly has to do with phishing. Phishing is when somebody pretends to be someone they’re not in order to maliciously target innocent people who are not aware that somebody’s trying to trick them. Take a Twitter bot, for example. We have this awesome bot that we’ve built, and it retweets all our tweets. It likes all our cat photos. It really, really enjoys interacting with us. And it’s nice to have somebody who keeps affirming you: “Yes, you’re right. You’re correct, it’s great.” But what happens when you have a whole series of these bots?
Now you’re an anonymous person trying to post the most basic tweet, and suddenly 100 other people like your post. People get the impression that you are a very popular individual, but all those hundred likes came from an automated piece of code that you created. So it’s a form of pretending to be something that you’re not.
So I find that in the realm of automation, that’s the biggest thing to watch out for. We see this on Instagram a lot. I don’t know if you’ve noticed, but very often when you post a photo with a hashtag like #style or #fashion, and I don’t know if that’s the type of hashtag you use, Jon, but [inaudible] do that a lot. With #style, for example, you will immediately get a direct message from some brand that really wants you to model for them: “I really want you to promote us because I love the way you post your Instagram photos.”
Now, the fact that you have 14 followers is not suspicious to a lot of those girls, including myself: “Oh, they want us to promote their brand. Wow, I’m so blessed.” What they usually do next is tell you, “Hey, you can order anything on our website for 35% off.”
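On the scraper’s side, one way to avoid contributing to the kind of overload Mariya describes is to honour a site’s `robots.txt` and space out requests. This is a minimal sketch using the standard library’s `urllib.robotparser`; the robots rules, the user-agent name `my-research-bot`, and the one-second default delay are assumptions. The rules are parsed from a string so the example runs offline, and the clock and sleep functions are injectable so the throttle can be tested.

```python
import time
import urllib.robotparser

# Stand-in robots.txt; in practice you would call set_url(".../robots.txt") and read().
ROBOTS_TXT = """
User-agent: *
Disallow: /contact
Crawl-delay: 2
"""


def make_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteFetcher:
    """Checks robots.txt and enforces a minimum gap between requests."""

    def __init__(self, parser, user_agent="my-research-bot", default_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.parser = parser
        self.user_agent = user_agent
        # Prefer the site's own Crawl-delay if it declares one.
        self.delay = parser.crawl_delay(user_agent) or default_delay
        self.clock = clock
        self.sleep = sleep
        self.last_request = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        """Sleep just long enough to respect the delay since the previous request."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.delay - (now - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
        self.last_request = self.clock()


if __name__ == "__main__":
    fetcher = PoliteFetcher(make_parser(ROBOTS_TXT))
    for url in ["https://example.com/blog/cats", "https://example.com/contact"]:
        if fetcher.allowed(url):
            fetcher.wait_turn()
            print("would fetch", url)  # a real scraper would send the request here
        else:
            print("skipping", url)
```

A single polite scraper like this sends at most one request every couple of seconds, which is far from the flood of simultaneous requests that takes a site down.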
Jon Krohn:
Mm. I see.
Mariya Sha:
So it’s like, “Oh my God. Now I’m a representative of a brand and I get 35% off.” But guess what? Their only customers are the girls who fall for it. Girls and guys, sorry, not only gals. It’s a classic example of phishing.
Jon Krohn:
Got it. That’s interesting, because I usually think about phishing as people trying to steal your details, phishing for details, like getting an email asking you to enter your social security number.
Mariya Sha:
Yeah. Yeah.
Jon Krohn:
So those are really great points to have brought up about web scraping and automation, things we need to be concerned about, and you articulated them very well. A while ago we stopped talking about machine learning and started talking about collecting data for machine learning, but I’d like to go back to machine learning because I know you still have much more to say there. As we’ve already discussed, your channel has tons of videos that explain complex machine learning concepts step by step in simple terms, like gradient descent, which you mentioned earlier. You have videos on deep learning and neural networks. So why do you think it is that machine learning, terms like gradient descent, and even programming in general so often seem intimidating to people who haven’t learned them already?
Mariya Sha:
I think there are a few reasons, but for folks who are not familiar with this field, I think it’s mostly a pop culture problem. I’ll give you a good example: when we imagine a piece of AI technology, we think of Jarvis from Iron Man, which is a very sophisticated program. When we think about this program, we don’t think of it as being built from a few layers. We imagine it as one AI and start wondering how on earth we would even begin to build it. This is where we struggle. The idea of programming is based on divide-and-conquer strategies. We don’t think of something as a whole; we think of the components that make up this entity. If we take a look at Jarvis once again, it’s an automated assistant that listens to what you tell it and carries out all the commands you give it.
The first layer of such a program would be handling sound: recording it, analyzing it and translating it into text. This is component one, and it’s a neural network on its own. That’s an AI on its own. Then, as you continue working on this program, you take care of the other components.
Once you have translated the sound into text, you can do some sentiment analysis on this text. What does this text mean? I have a collection of words here, but I don’t know what they mean; I’m just a piece of software. So you need to train it to understand those words. After it understands those words, you start working on another component, which executes the commands you’ve asked it to carry out. It’s like an onion. There are plenty of layers involved. And when people imagine programming, they think of the program as a whole.
They imagine an entire social network, and that was never built in a day. What Facebook is today is not what Facebook was 12 years ago when it first started. You couldn’t rent a house through Facebook, and you couldn’t have video conversations. It started as a very basic piece of software. I think pop culture presents us with a much more glamorous reality than what it actually is, something far more sophisticated than what we are exposed to when we start learning programming.
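The divide-and-conquer idea can be sketched as a pipeline of small functions, one per component. Everything below is a hypothetical placeholder: in a real assistant, `speech_to_text` and `understand` would each be a separately trained model, whereas here they are trivial stand-ins (decoding bytes and keyword matching) so the structure is easy to see.

```python
from datetime import datetime
from typing import Callable, Dict


def speech_to_text(audio: bytes) -> str:
    """Component 1 (placeholder): a real system would run a speech recognition model."""
    return audio.decode("utf-8")


def understand(text: str) -> str:
    """Component 2 (placeholder): map text to an intent.

    A real system would use a trained language model; here we just look for keywords.
    """
    text = text.lower()
    if "light" in text:
        return "lights_on" if " on" in text else "lights_off"
    if "time" in text:
        return "tell_time"
    return "unknown"


# Component 3: one small, testable function per command.
ACTIONS: Dict[str, Callable[[], str]] = {
    "lights_on": lambda: "Turning the lights on.",
    "lights_off": lambda: "Turning the lights off.",
    "tell_time": lambda: datetime.now().strftime("It is %H:%M."),
    "unknown": lambda: "Sorry, I didn't catch that.",
}


def execute(intent: str) -> str:
    """Carry out the command for a recognized intent."""
    return ACTIONS[intent]()


def assistant(audio: bytes) -> str:
    """The whole assistant is just the components called in order."""
    return execute(understand(speech_to_text(audio)))


if __name__ == "__main__":
    print(assistant(b"Please turn on the lights"))
```

Because each layer has a narrow input and output, any one of them can later be swapped for a real model without touching the others.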
Jon Krohn:
I think you’re absolutely right. You took that answer in a very different direction than I anticipated, and I think you’re spot on: the complete system can be intimidating. But what I was imagining as intimidating is the idea of writing code itself. Those of us who are already familiar with code, especially Python, which as you described has so much syntax in plain English, can forget how intimidating just looking at any code seems to be for the vast majority of people. Now, that plays to our advantage as people who have decided to learn how to deal with it. You, me, and probably most of our listeners have at some point gotten over that intimidating hump of “I’m going to learn how to do this.”
And you did it when you were 12, so maybe you don’t even really remember what it was like or maybe you were never intimidated at all. Maybe you’re just that kind of person.
But I don’t know, that’s what I imagine: even before people think, “Oh, how am I going to program this big system?”, it seems like they get caught at the first hurdle. I think it’s so great that you had this inspiration from machine learning and AI and thought, “Oh, that is coded in Python, so I’m going to start learning Python,” and you just did it. But I think there are probably a lot of people out there, including maybe some listeners, who think, “Wow, data science, machine learning, AI, these are the coolest topics ever. They’re going to make such a big impact in our lifetimes, and I can have a lucrative career if I go in this direction,” but then have trouble getting started because they can’t get over that first hurdle of programming. Anyway, I’ve been waffling here for a bit.
Mariya Sha:
No, I can tell you that I’m a perfect example of what you just described. When I took my first machine learning and deep learning course, it was absolutely horrific. I thought I wasn’t smart enough to understand any of it. I was struggling a lot, and for a very long time I couldn’t understand what those people wanted from me. I find that the way they teach you assumes prior knowledge, so you need to come in already knowing a certain set of terms, and based on those terms you can then learn the rest. You need some kind of background in analytics or programming or math, for example. What is most intimidating about it is the way it’s taught: the sophisticated language people usually use when they describe artificial intelligence, and all those formulas that you are presented with.
So instead of showing you an example with the most basic piece of input and passing it through the most basic neural network, they have you look at all those complex formulas dealing with sets and limits and a lot of math concepts that are frankly not for beginners, not at all.
For a very long time I thought, “Well, maybe it’s just not for me. Maybe I don’t understand it well. Maybe something is wrong with me.” Then, I remember, it was about a year after I took this course, and I was still cursing myself for taking it: “Oh, that’s awful. Why did I do this to myself?” I ended up taking a pen and a piece of paper, and instead of implementing everything with PyTorch or some other library, I implemented it manually.
I went for the math approach. I selected the most basic piece of input, a list with three floating-point numbers, and I passed it through all the layers of a neural network. I mean that in terms of the fully connected layers, cross entropy and gradient descent: I manually implemented all those things that I had been taught in theory. Only then did I actually understand what I was doing.
And when I understood it, I also realized that it’s so easy. For a very long time I couldn’t understand why I was struggling with it, and then I realized that it’s just not pitched at a level that is suitable for beginners.
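Here is a minimal sketch of that “by hand” exercise, written with NumPy but without any autograd library. It assumes a single fully connected layer mapping three inputs to two classes, with softmax and cross-entropy; the input values, initial weights, target class and learning rate are all made-up assumptions. It computes the gradients from their hand-derived formulas and takes one step of gradient descent.

```python
import numpy as np

# One input example: a list of three floating-point numbers (values are arbitrary).
x = np.array([0.5, -1.0, 2.0])
target = 1  # index of the correct class, out of 2 classes (assumed)

# A single fully connected layer: 3 inputs -> 2 outputs.
W = np.array([[0.1, 0.2, -0.1],
              [0.0, -0.3, 0.2]])
b = np.zeros(2)


def forward(W, b, x):
    """Fully connected layer followed by softmax."""
    z = W @ x + b                    # logits
    exp = np.exp(z - z.max())        # subtract the max for numerical stability
    return z, exp / exp.sum()        # logits and probabilities


def cross_entropy(probs, target):
    return -np.log(probs[target])


def gradients(probs, target, x):
    """Hand-derived gradients of softmax + cross-entropy w.r.t. W and b."""
    dz = probs.copy()
    dz[target] -= 1.0                # dL/dz = p - one_hot(target)
    dW = np.outer(dz, x)             # dL/dW = dz x^T
    db = dz                          # dL/db = dz
    return dW, db


lr = 0.1
_, probs = forward(W, b, x)
loss_before = cross_entropy(probs, target)

# One step of gradient descent: move the weights against the gradient.
dW, db = gradients(probs, target, x)
W = W - lr * dW
b = b - lr * db

_, probs_after = forward(W, b, x)
loss_after = cross_entropy(probs_after, target)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Every line here is arithmetic you could do on paper with three numbers, which is the point: libraries like PyTorch automate the gradient step, but they compute the same quantities.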
And that’s what I’m trying to solve with all those videos that I’m filming. I’m trying to explain it in such a way that even a kid can understand. That’s my goal. If I can explain it to a child, that means that I understand it myself. If I can’t explain it to a child, it means that I’m using somebody else’s terms. I’m using somebody else’s fancy language. Yeah, it’s solvable.
Jon Krohn:
Like Python Simplified.
Mariya Sha:
Yeah, absolutely. Absolutely.
Jon Krohn:
Right. And this is perfect, you did take the bait there. That’s how I was hoping you would answer the question and you did.
Mariya Sha:
Yeah. You have to do it manually. Forget about computers. Just do it the old-fashioned way. If it worked for the smartest people on earth, it’s going to work for you too.
Jon Krohn:
Interesting. Neural networks, or deep learning in particular, seem intimidating to people. Even people who are already familiar with some machine learning techniques and with programming really seem to struggle with neural networks and deep learning. That’s part of why I created a lot of deep learning content. I think I have a similar pedagogical style to yours. When I’m watching your videos, it resonates with me perfectly. You teach in the same way that I aspire to teach, and that’s part of why I was so excited to have you on the show.
Mariya Sha:
Awesome.
Jon Krohn:
And yeah, it is really interesting once you break it down, especially the forward pass of a neural network. Once the network is trained and has its model weights, the math is actually really simple if you work through the numbers, so you can definitely break it down for people. So, listener, wherever you are in your data science and machine learning journey, just break it down. Watch some of Mariya’s videos.
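To show how simple the forward pass of a trained network is, here is the arithmetic for a single neuron with three inputs. The input values, weights, bias and the ReLU activation are illustrative assumptions; a full layer repeats this calculation for every neuron, which amounts to one matrix-vector product.

```latex
\begin{aligned}
z &= w_1 x_1 + w_2 x_2 + w_3 x_3 + b \\
  &= (0.4)(0.5) + (0.2)(-1.0) + (0.3)(2.0) + 0.1 \\
  &= 0.2 - 0.2 + 0.6 + 0.1 = 0.7 \\
a &= \mathrm{ReLU}(z) = \max(0,\, 0.7) = 0.7
\end{aligned}
```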
Mariya Sha:
Yeah. Take a pen and paper and forget about computers for a little bit. I think it brings us back to the principle of divide and conquer, because when we look at this very bombastic algorithm of a neural network, at the end of the day we can break it down into some very simple math formulas. The whole idea is just to understand the order in which we call those math operations and why we need those operations to begin with. As long as beginners can master that, it gives them a very solid background to move on to more sophisticated content that is not as simplified as mine, but is very, very accurate.
Jon Krohn:
Yeah. You mentioned PyTorch in one of your answers a few minutes ago. Is that your go-to Python library for machine learning or deep learning specifically?
Mariya Sha:
Yeah, absolutely. 100%. I really like PyTorch. I don’t do as much machine learning as I do deep learning, because my favorite AI implementation is probably neural networks. And once you find a way of doing this, you kind of just repeat yourself. And I find that PyTorch has a very intuitive syntax. I know that TensorFlow is also very popular, but I’m not as experienced with it.
Jon Krohn:
Yeah, TensorFlow has become more PyTorch-like and more intuitive, but for a while TensorFlow was a real pain to use. There was a three-step process where you had to design your computational graph and then, as a second step... well, I’m actually forgetting what the three steps are now, and I taught it for so many years. Set up your computational graph, then something I can’t remember for step two, and then step three was actually flowing data through it. It didn’t work right away: you couldn’t just flow data through functions, you had to set up your computational graph first. Oh yeah, step one was setting up the computational graph. Step two was allocating that graph to whatever computational resources you had, so TensorFlow would split up the compute of that graph across however many CPUs and/or GPUs you had.
And then finally in step three you could flow information through it. And the PyTorch people came along and they were like, let’s just make it run right away. Forget about that optimization, let’s just make it easier to use like most Python programming is, especially if you’re doing it in something like a Jupyter Notebook.
And so the TensorFlow people, with their TensorFlow 2 release, caught up to this idea of immediate, imperative programming. But for a while TensorFlow was a bigger pain. PyTorch is definitely more Pythonic, and it’s also much easier to follow stack traces. When you get an error in TensorFlow, you get these super long stack traces that don’t make any sense, even when you get all the way down to the bottom of the errors. So when I get a TensorFlow error, I copy and paste it into Google, find the Stack Overflow answer, and it says, oh yeah, you need to add a comma or whatever. And I’m like, oh. Whereas with PyTorch, I can typically just read the error, understand what my mistake was and fix it.
Mariya Sha:
And generally there’s lots of support for it too, especially on Stack Overflow.
Jon Krohn:
And in case listeners are wondering whether you lose something by having it run that way: I talked about how TensorFlow 1’s three-step process had that second step, where you optimally allocated the compute in your graph to whatever devices you had. The trick now, with the latest TensorFlow libraries as well as with PyTorch all along, is that you handle that afterwards. You can execute code imperatively in your Jupyter Notebooks, just like you’re used to running most other kinds of functions. Then, once you have designed your model and you’re ready to train it on a huge number of devices or run it in production, you can run a separate script that figures out how to allocate it optimally.
Mariya Sha:
Yeah, absolutely. It’s some sort of a switch these days. It’s very, very easy: do you want to run this on the GPU or the CPU? Choose. I definitely love this approach. What you just described about TensorFlow 1 is truly intimidating. I’m glad I didn’t have a chance to play around with it.
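In PyTorch that “switch” is usually a device object chosen once and passed around. The sketch below assumes PyTorch is installed; the layer size and input values are arbitrary. It also shows the eager execution Jon describes: each line runs immediately, and a shape mistake raises an error at the exact line that caused it.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Eager execution: each line runs immediately and returns a real tensor.
x = torch.tensor([[0.5, -1.0, 2.0]], device=device)
model = nn.Linear(3, 2).to(device)   # move the layer's weights to the same device
logits = model(x)
print(logits.shape, logits.device)   # inspect the result right away

# A shape mistake raises a readable error at the line that caused it.
try:
    model(torch.ones(1, 4, device=device))
except RuntimeError as err:
    print("Shape error:", err)
```

Keeping the device in one variable means the same notebook runs unchanged on a laptop CPU or a GPU machine.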
Jon Krohn:
And there’s no point in learning how to do that now. For me, teaching it gave me so much more to talk about, and it meant there was so much more value in taking a TensorFlow course with me, because there really was so much that you had to know. But yeah, this is the way things go. Things become easier, and that’s great. Okay, so PyTorch is a library that you like. What other Python libraries do you love and use all the time?
Mariya Sha:
I would say that my favorite libraries are, and it’s not unique, NumPy and Pandas, which are very, very useful. It’s very hard to avoid them, no matter what you do. Pandas gives you a great way of creating complex data structures, and a great way of organizing and traversing tables. As for NumPy, what I really like about it is this: Python is a very high-level language. You are allowed to be as ambiguous as you want. You don’t need to declare variables, you don’t need to allocate a certain amount of space in memory, and you don’t need to decide how many bits each piece of data takes.
But with NumPy, you can be so specific that you go down almost to the C++ level. I find that these types of libraries, which are implemented in C rather than Python, let you give the commands in Python while the implementation happens at a much lower level. And I think those types of libraries are one of the reasons Python became so popular. You can give your commands in a very ambiguous way. You don’t have to be specific: you take care of the big picture and you let Python worry about the small details. So NumPy is a huge, huge advantage. That’s what I find.
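A short sketch of the contrast Mariya draws: plain Python needs no declarations and its integers grow as needed, while NumPy lets you fix exactly how many bits each element uses and runs whole-array operations in compiled code. The array values and the small pandas table (column names and view counts) are made-up assumptions.

```python
import numpy as np
import pandas as pd

# Plain Python: no declarations, and integers grow as large as needed.
big = 2 ** 100
print(big.bit_length())                        # bits needed to store it

# NumPy: you choose exactly how many bits each element uses.
small = np.array([1, 2, 3], dtype=np.int8)     # 8 bits per element
wide = np.array([1, 2, 3], dtype=np.float64)   # 64 bits per element
print(small.itemsize, wide.itemsize)           # bytes per element
print(small.nbytes, wide.nbytes)               # total bytes

# The trade-off of fixed-size types: int8 can only hold a narrow range.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)

# Vectorized operations run in compiled code instead of a Python loop.
squares = np.arange(1_000_000, dtype=np.int64) ** 2

# pandas builds labelled tables on top of NumPy arrays.
df = pd.DataFrame({"video": ["gradient descent", "web scraping", "selenium"],
                   "views": [1200, 800, 950]})
print(df.sort_values("views", ascending=False).head(2))
```

Choosing a small dtype saves memory on large arrays, at the cost of having to make sure the values fit.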
Jon Krohn:
Cool. Yeah, that was a great answer. Those are the staple libraries, NumPy and Pandas, for working with numbers and data frames respectively, so data scientists very often need to know them. PyTorch isn’t necessarily one that data scientists need to know unless they’re working on deep learning, but if they are, PyTorch is definitely the one I recommend getting started with. Okay. So beyond specific libraries that you use all the time, what excites you about machine learning today? For example, I know that you have this history in game development. Has reinforcement learning tickled your fancy in the machine learning space?
Mariya Sha:
It scares me so much, Jon. Reinforcement learning is the only approach that I find very disturbing. When it comes to supervised and unsupervised learning, you have a lot of control over the information that you present to your neural network. You’re some sort of a parent, teaching your kid everything you know about life. Reinforcement learning is a completely different approach. The whole purpose is not to teach it anything; you let it learn on its own. You set up the idea of a reward and the idea of a penalty, and you let your neural network figure out the world for you. This is what I have a problem with, because you cannot control any of the things that your model learns. Yes, you can restrict it: you can disconnect it from the internet, you can prevent it from using the keyboard.
But guess what? People sitting on their couch in front of their television, in their pajamas, who don’t know much about the risks of AI and things of that sort, are able to use these technologies independently. That doesn’t scare me as much as corporations, where we’re not exactly sure what they’re doing in the background. They have very large servers with lots of data and lots of connections to the internet. I don’t know what those companies are doing with these technologies, and that’s what scares me even more.
Jon Krohn:
Oh, I thought you were going down a completely different road there. I thought you were saying that you find reinforcement learning scary because you have less control over your own algorithms as they learn. Reinforcement learning algorithms interact with an environment, which could be simulated or real-world, and they explore that environment. As you say, they learn what kinds of actions increase reward or result in penalties, and that’s their objective. With supervised or unsupervised learning approaches, we’re typically trying to minimize cost: our algorithms descend the gradient of cost to find the model weights that lead to the lowest possible cost. With reinforcement learning, we typically use the same algorithm as gradient descent, but we call it gradient ascent, because we’re figuring out what model weights lead to the highest possible reward.
So we’re climbing this reward curve instead of descending a cost curve.
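The difference between descending a cost and ascending a reward is just the sign of the update. Here is a minimal one-parameter sketch; the cost function (w - 3)^2 and the reward function -(w + 1)^2 + 5 are toy assumptions chosen so the answers are easy to check. In real reinforcement learning, the gradient of the expected reward has to be estimated from sampled experience rather than written down directly.

```python
def descend(grad, w, lr=0.1, steps=100):
    """Gradient descent: step against the gradient to minimize a cost."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def ascend(grad, w, lr=0.1, steps=100):
    """Gradient ascent: step along the gradient to maximize a reward."""
    for _ in range(steps):
        w = w + lr * grad(w)
    return w


# Cost C(w) = (w - 3)^2 has its minimum at w = 3; dC/dw = 2(w - 3).
w_min = descend(lambda w: 2 * (w - 3), w=0.0)

# Reward R(w) = -(w + 1)^2 + 5 has its maximum at w = -1; dR/dw = -2(w + 1).
w_max = ascend(lambda w: -2 * (w + 1), w=0.0)

print(round(w_min, 4), round(w_max, 4))
```

Ascending a reward R is the same as descending the cost -R, which is why the same optimizers serve both paradigms.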
So we have less control over it in the sense that when we train a supervised learning algorithm on a specific set of cat photos, those photos don’t change. You can go and look at all of them manually if you want, and take out the ones that are girls dressed up as cats or whatever. But with reinforcement learning, because the algorithm explores the environment, it generates new data with every action it takes, and it will end up generating data that’s never been seen before. You don’t necessarily know what it’s going to run into. So that’s what I thought you were going to say, and then you blew my mind by saying it really scares you that any company can be out there deploying reinforcement learning algorithms on who knows what kind of task.
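To illustrate how a reinforcement learning agent “generates new data with every action,” here is a minimal epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning settings. The true reward means, noise level, epsilon and step count are assumptions. Unlike a fixed photo dataset, the `history` list below only exists because of the actions the agent chose.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn action values purely from the rewards the agent's own actions produce."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions
    counts = [0] * n_actions
    history = []  # the dataset is created by the agent as it acts

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try something at random
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[action], 1.0)  # the environment responds
        history.append((action, reward))
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts, history


if __name__ == "__main__":
    estimates, counts, history = epsilon_greedy([0.2, 0.5, 0.9])
    print("estimates:", [round(e, 2) for e in estimates], "pulls:", counts)
```

The `epsilon` parameter controls how often the agent deliberately tries something new, which is exactly the part of its behaviour, and its data, that the designer does not choose in advance.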
Mariya Sha:
Irresponsibly. I wouldn’t even say maliciously, necessarily. You might just have an irresponsible employee who forgot to disconnect the computer that runs this machine from the internet. Or maybe an employee who’s not happy with his job has done something malicious in the code. Things that you cannot really control. That’s what I’m most scared of: because of this lack of control, and this lack of understanding of what data this type of model based its development on, my imagination starts thinking of Terminator and The Matrix, things that I grew up on and that hopefully will never happen. That’s what I find scary.
Jon Krohn:
Yeah, I gotcha. So although we’re not aware today of machine learning systems that are running without a human in the loop, say in military applications, we could have reinforcement learning systems in the future that are just operating on some reward function that is trying to maintain peace. And so you’re like, oh, I have this completely positive and harmless reward function, but then it figures out that the way to maintain peace is to get rid of the humans or whatever.
Mariya Sha:
Yeah, it’s definitely one of the scariest examples we can think of. Now, as long as this piece of machinery is connected to a computer, we can always pull the plug. So at this point of time, it’s not as scary as it might be in the future. Once it has arms and legs and a way of moving in the space, that’s where things will get a bit more tricky. But again, that’s just me and my paranoia. I don’t know if it’s exactly based upon Hollywood films or just literature, but I have a feeling that we will prevent this from happening. We are smart enough, we’ve seen enough movies to be vigilant.
Jon Krohn:
Well, Mariya, you’re not alone in your concerns. In fact, we did a fascinating episode, episode number 565 with Jeremie Harris, on exactly the kinds of concerns that you’re raising. So if listeners want a long, in-depth episode about how AI could be an existential risk to humans, you’ve got that episode. But Mariya, it’s great that we’ve gone down this road and talked about this. When I asked the question, though, I was thinking that given your interest in game development, and given how useful reinforcement learning agents can be as opponents in games, maybe you’d be interested in reinforcement learning from that perspective.
Mariya Sha:
I try not to step into it. I don’t have much experience with it. I’m mostly in the supervised and unsupervised learning space; that speaks to me a bit better. In terms of gaming, I’m sure it’s very useful for making opponents. Other things draw me, really, not AI in the realm of gaming.
Jon Krohn:
All right. Well then what does drive you? What excites you about machine learning today? So you talked about supervised and unsupervised, and just for our listeners who aren’t aware, a supervised learning problem is one where you have labels on the data. So you have a whole bunch of images that are labeled as cats and a whole bunch of images that are labeled as dogs. And then you can train an algorithm to be able to distinguish photos of cats from dogs.
With unsupervised learning, you don’t have those labels, so you just have a whole bunch of photos and you don’t necessarily know what they’re of. But there are still patterns you can recognize, and clustering you can do, with that unlabeled data in an unsupervised approach. Reinforcement learning is a third paradigm in machine learning. So you’ve got supervised, unsupervised, and reinforcement learning: three different paradigms for how machines can learn. And yes, Mariya, you’re mostly deep in the weeds on supervised and unsupervised. What excites you about those, or about machine learning in general?
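The supervised/unsupervised distinction can be seen on the same data. In this NumPy sketch, two blobs of 2-D points stand in for image features of cats and dogs (the data is randomly generated, an assumption for illustration). The supervised part uses the labels to learn one centroid per class; the unsupervised part ignores the labels and lets k-means discover the two groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two blobs of 2-D points standing in for "cat" and "dog" features.
cats = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
dogs = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(50, 2))
X = np.vstack([cats, dogs])
y = np.array([0] * 50 + [1] * 50)  # labels: 0 = cat, 1 = dog

# Supervised: labels tell us which points belong together, so we can learn
# one centroid per class and classify new points by the nearest centroid.
centroids = np.array([X[y == label].mean(axis=0) for label in (0, 1)])


def classify(point):
    return int(np.argmin(np.linalg.norm(centroids - point, axis=1)))


# Unsupervised: same points, no labels. k-means groups them by similarity.
def kmeans(X, k=2, iters=20, seed=1):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its closest center...
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # ...then move each center to the mean of its assigned points.
        centers = np.array([X[assignment == j].mean(axis=0) if np.any(assignment == j)
                            else centers[j] for j in range(k)])
    return assignment, centers


clusters, _ = kmeans(X)
print("new point classified as:", ["cat", "dog"][classify(np.array([3.5, 4.2]))])
```

Note that k-means can find the two groups but cannot name them: its cluster numbers are arbitrary, and only labels tell you which group is “cat.”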
Mariya Sha:
So at the moment, what excites me most is probably recommendation systems. Being a YouTuber, I deal with recommendation systems on a daily basis, and that’s the reason my channel is being exposed to such a wide audience. A funny thing about YouTube’s recommendation system is that it works fantastically even when you’re not signed in, and I find this very, very interesting. On my television I’m not logged in, and somehow YouTube always knows which videos I would like to see. It’s probably based on the cache rather than machine learning. But it’s interesting how you can take such a complex algorithm and narrow it down to work with way less data than it usually uses.
Jon Krohn:
What do you mean it works when you’re not logged in? I don’t understand what you’re talking about.
Mariya Sha:
Try using YouTube without signing in. The videos that you recently watched are saved in the cache of your browser or of the application that you’re using. YouTube is able to use this little amount of data and somehow recommend the most amazing and most suitable videos you can imagine. I’m truly surprised by it, and I think that’s definitely something to explore, because usually in machine learning we are dealing with enormous amounts of data. Whether we use a neural network or some other algorithm, k-nearest neighbors or whatever, we would like to have as much reference data as we can. And when you don’t log into YouTube, there’s barely any information on you. So it’s very interesting how they’re able to use this very little information and make it so suitable to your needs.
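YouTube’s actual system is not described here, and Mariya’s cache explanation is her own guess, so the sketch below is only a generic illustration of how very little data can still drive recommendations. It assumes a hypothetical catalog where each video has topic tags, and scores unseen videos by their Jaccard tag overlap with the few videos watched recently, a simple content-based, nearest-neighbour-style approach.

```python
from collections import Counter

# Hypothetical catalog: video title -> set of topic tags.
CATALOG = {
    "Intro to NumPy": {"python", "numpy", "beginner"},
    "Pandas in 10 minutes": {"python", "pandas", "beginner"},
    "Gradient descent explained": {"math", "deep-learning", "beginner"},
    "Backpropagation by hand": {"math", "deep-learning"},
    "Selenium web scraping": {"python", "selenium", "automation"},
    "Sourdough for beginners": {"baking", "beginner"},
}


def jaccard(a: set, b: set) -> float:
    """Overlap between two tag sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(recently_watched, k=2):
    """Score unseen videos by tag similarity to the few videos watched recently."""
    profile = [CATALOG[title] for title in recently_watched]
    scores = Counter()
    for title, tags in CATALOG.items():
        if title in recently_watched:
            continue
        scores[title] = sum(jaccard(tags, seen) for seen in profile)
    return [title for title, _ in scores.most_common(k)]


if __name__ == "__main__":
    print(recommend(["Intro to NumPy", "Gradient descent explained"]))
```

With only two watched videos as input, the scores already favour related tutorials over unrelated content, though a real system would combine many more signals.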
Jon Krohn:
I gotcha. Yeah, that is cool. All right, so we know your favorite libraries. We know that you’re excited about recommender systems. We know that our listeners should probably be looking into Selenium for scraping web data if they haven’t already. Are there any other techniques that you think our data science listeners should know that they might not already?
Mariya Sha:
Yeah, absolutely. I think a very important technique that not many educational institutions teach (at my university, for example, I haven’t been exposed to it yet) is how to work with dynamic data. Now, it’s very convenient when your data is static, when you receive it in an organized table and it doesn’t change over time. But what happens if you’d like to extract some data from the Dow Jones, data that is being updated in real time by the microsecond, and you’d like to perform some analytics on it and predict the smartest investments you can make? We have the technology to do so. But I find that you can’t find as many tutorials on working with dynamic data. There aren’t a lot of guides showing you how to set up a WebSocket and how to fetch this data.
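A minimal sketch of what “performing analytics” on dynamic data can look like: instead of loading a finished table, you update a statistic each time a new value arrives. Here a Python generator with made-up prices stands in for a live feed (an assumption; a real feed would arrive over a WebSocket or an exchange API), and a fixed-size `deque` keeps a rolling mean over the last few ticks.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_mean(ticks: Iterable[float], window: int = 3) -> Iterator[Tuple[float, float]]:
    """Yield (latest price, mean of the last `window` prices) as each tick arrives."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # the oldest price drops out automatically
    for price in ticks:
        recent.append(price)
        yield price, sum(recent) / len(recent)


def simulated_feed() -> Iterator[float]:
    """Hypothetical stand-in for a live feed: yields made-up prices one at a time."""
    for price in [100.0, 101.0, 102.0, 99.0, 98.0]:
        yield price


for price, avg in rolling_mean(simulated_feed(), window=3):
    print(f"price={price:.2f}  rolling mean={avg:.2f}")
```

Because `rolling_mean` consumes any iterator, the same function would work unchanged if `simulated_feed()` were replaced by something that yields values as they are received from a socket.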
Jon Krohn:
Are there Python Simplified videos on this?
Mariya Sha:
There’s actually one. There’s a company called Deephaven that basically helps you… Basically, they create some quick hooks; they create WebSockets for you. For now, I have a video showing how to do this with a crypto exchange, Coinbase. But it would be amazing if we could apply these types of technologies to the Dow Jones, or maybe even to tracking all the planes that are flying in the sky at any given moment. Because we can get this type of data.
Jon Krohn:
What is a web socket?
Mariya Sha:
If I was able to explain it, I’d be very, very happy. I am yet to understand it. But maybe you gave me an idea for a future tutorial.
Jon Krohn:
Yeah, there you go. That’s the name of the YouTube video. It’s going to be a killer: “What Is a WebSocket?”
Mariya Sha:
Absolutely.
Jon Krohn:
Look out for that one.
Mariya Sha:
Basically, when you look for articles and try to find tutorials on the topic, you once again encounter bombastic and very sophisticated technical terms that folks who don’t have experience with WebSockets don’t necessarily understand. Now, I’m able to use the word “WebSocket” because I memorized it, but understanding it is a completely different thing. So thank you for the idea.
Jon Krohn:
Well, but it sounds like you also are able to at least tell us about them in terms of how you can use them. So it sounds like they’re a way of interfacing with dynamic data on the web.
Mariya Sha:
Yeah, so it basically connects the source of data that you’re looking to fetch to your code, or to your environment, to your Docker container. It basically creates this corridor between you and the data and allows you to pull it. But how to set it up and how to formally define it is something you’re going to have to ask an expert. I wish I knew more about it. I wish they taught it at university.
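For readers who want the “corridor” made concrete: under the WebSocket protocol (RFC 6455), the client sends an ordinary HTTP/1.1 request asking the server to upgrade the connection. If the server agrees, it answers `101 Switching Protocols`, and the same TCP connection then stays open so either side can push messages at any time, which is why it suits live data. The sketch below uses only the standard library and does no networking; it builds the upgrade request and computes the `Sec-WebSocket-Accept` value a server must send back. The host and path are placeholders, and in practice a WebSocket library, or a service like the one Mariya describes, handles this handshake plus the message framing for you.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_client_key() -> str:
    """Random 16-byte nonce, base64-encoded, sent by the client as Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(client_key: str) -> str:
    """Value the server must return in Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, client_key: str) -> str:
    """Plain HTTP/1.1 GET asking the server to switch this connection to WebSocket."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {client_key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )


key = make_client_key()
print(build_upgrade_request("example.com", "/feed", key))  # placeholder host/path
print("Server should reply with Sec-WebSocket-Accept:", expected_accept(key))
```

The accept value lets the client confirm that the server really speaks WebSocket rather than being a plain HTTP server that happened to ignore the upgrade headers.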
Jon Krohn:
No, that’s great. And then I know that you’re fascinated by natural language processing, but we haven’t talked about it all that much. So what’s interesting for you in the NLP world right now?
Mariya Sha:
So what’s interesting is basically trying to fix my very inaccurate models. When I first started Python Simplified, I made an attempt at explaining how you can make an n-gram type of model. So basically it’s a bag of words. You feed a bunch of children’s books into this neural network, Alice in Wonderland, the Brothers Grimm, all kinds of fairy tales. Then at the very end of this process, after the network is trained, you provide it with a sentence and expect it to complete it into a completely different story.
Now, with that said, because I did it before I’d implemented machine learning on a piece of paper with a pen, my explanation was absolutely horrible. So my next mission in the realm of NLP is to build this neural network in a smarter way, to make it as accurate as it gets and to optimize it. Because for now it exists. It works, it’s able to make a prediction, but the story is absolutely horrible.
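Mariya’s version used a neural network; as a simpler point of comparison, here is a count-based bigram model (an n-gram with n = 2), which is not her implementation. It records which word followed which in the training text, then continues a prompt by sampling among those followers. Unlike a pure bag of words, it keeps local word order, though with only one word of context its “stories” drift quickly. The tiny corpus is invented for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List


def train_bigrams(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that followed it in the training text."""
    words = text.lower().split()
    model = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return dict(model)


def continue_text(model: Dict[str, List[str]], prompt: str, length: int = 8, seed: int = 0) -> str:
    """Extend the prompt by sampling each next word from those seen after the last word."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(length):
        followers = model.get(words[-1])
        if not followers:  # dead end: this word never had a successor in training
            break
        words.append(rng.choice(followers))  # repeated pairs are proportionally likelier
    return " ".join(words)


corpus = (
    "once upon a time there was a girl who fell down a rabbit hole "
    "once upon a time there was a wolf who met a girl in the woods"
)
model = train_bigrams(corpus)
print(continue_text(model, "once upon", length=10))
```

Giving the model more context (trigrams and beyond) or replacing the counts with a trained neural network is what makes generated text more coherent; the bigram version just makes the “predict the next word” loop easy to see.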
Jon Krohn:
Got it. So natural language generation is the particular area of NLP that interests you the most. There are some really cool things that can be done there. And yeah, I look forward to seeing your future video as you start to refine that.
Mariya Sha:
For sure.
Jon Krohn:
Nice. Are there any other tools that you use regularly? They could be tools related to producing your videos, or they could be software tools like an IDE. Is there some secret sauce that you use all the time that I, and our listeners, should know about?
Mariya Sha:
So in terms of video creation, I’m using the Adobe Suite, which basically covers all your graphic design needs. You have software for working with raster images, software for working with vector images, with video, with audio. So you basically have this very bombastic toolkit for video editing and for marketing, which is great. I use it all the time. But in terms of IDEs or any secret that I can share in the realm of programming, I can’t tell you that I have any, because I’m using the simplest IDEs in the world. I’m using Sublime, I’m using Jupyter Notebook, and sometimes I’m using a cloud-based program called WayScript. They have a really, really nice way of sharing code via the cloud, but I’m using-
Jon Krohn:
WayScript?
Mariya Sha:
…the most traditional tool. Yeah. WayScript.
Jon Krohn:
I hadn’t heard of that. So that is a new one.
Mariya Sha:
No, yeah it-
Jon Krohn:
And it allows you to collaborate on code?
Mariya Sha:
So basically you can share your code very easily. You can even host your websites on it. I have a tutorial where I’m hosting a Flask application, so it basically takes Visual Studio into a cloud-based realm. And I’m sure there are alternatives out there, but they always have a lot of buttons and are always very hard to understand. So they kind of simplified the IDE interface, and I do use it a lot in my videos.
Jon Krohn:
Cool. Yeah, that is completely new to me. I’m glad I asked the question, and I’ll be sure to include a link in the show notes. I was able to find it here in real time: it’s WayScript, W-A-Y Script. All right. And then, beyond your technical tips, here’s another pragmatic question for you. Even though your YouTube channel is only a few years old, with your 125,000-plus subscribers you’ve demonstrated that, beyond your tremendous programming and Python skills, you have many other talents, like graphic design, video editing, marketing, and personal branding. And now we know that you use the Adobe Suite for some of these things. But how do you manage to do all of them? Do you have any productivity tips, or tips for learning new skills, that we should know about?
Mariya Sha:
Yeah, absolutely. So to be fair, I’ve been a graphic designer for many years. That’s my trade, that’s my profession. I’ve been working in the signage industry, so for anything that has to do with visual communications, I already had some background, and I don’t think it’s a level playing field from that perspective. But video editing was completely new to me. With graphic design and marketing I do have some experience, but when you dive into the realm of video, there are so many components involved that it’s just overwhelming. First of all, there’s the lighting, then the microphone, and also the camera angles. And that’s not even mentioning editing your video, or how you work with a green screen. So many different things. But…
Jon Krohn:
Yeah, you do great green screen work.
Mariya Sha:
Yeah? Thank you.
Jon Krohn:
It’s something that I underuse for sure. But one of the great things about Mariya’s videos on her Python Simplified channel is that many of them show her doing the programming live: because she’s using a green screen behind her, she can have the code right behind her. And it’s not like a rectangle cut out of the code. There’s just Mariya’s face and hands gesturing as she types these commands in real time, which you can see appearing on the screen. Whereas when I record hands-on tutorials, I flip between live shots and the hands-on code. For me, that’s partly because behind the scenes I’m stressing over so many things, making everything work, and doing so many cuts. But it’s amazing that you can do it with the green screen and just go through it in real time, seemingly with very few takes.
Mariya Sha:
Yeah, so a good way to include a green screen is by using OBS Studio, by the way. It has a way of filtering out the green elements: basically a chroma key, I believe it’s called. It’s a filter, and it basically does all that work for you. Now, that’s not the only tool I use. I also do the editing; I edit out the green screen in certain shots. But all this equipment and all this level of sophistication is not where I started. In my first few videos, I didn’t even have a camera filming my face when I talked. It was basically a recording of my voice, and I was creating some graphics and screen capturing my computer, but there was really no person behind those videos. So I started with very, very simple videos, and I didn’t have any fancy software or a fancy computer to edit those videos with. I was using Windows Movie Maker, which has been discontinued since Windows 10 came out.
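To illustrate what a chroma key does, here is a toy version in NumPy: pixels whose green channel clearly dominates red and blue are treated as green screen and swapped for the matching pixels of a background image (for example, a screenshot of code). The threshold rule and the `margin` value are illustrative assumptions; OBS Studio’s filter is configurable and more sophisticated than this hard cutoff.

```python
import numpy as np


def chroma_key(frame: np.ndarray, background: np.ndarray, margin: int = 40) -> np.ndarray:
    """Replace 'green enough' pixels of an RGB uint8 frame with background pixels.

    A pixel counts as green screen when its green channel exceeds both red and
    blue by more than `margin` (a simple illustrative rule, not OBS's algorithm).
    """
    if frame.shape != background.shape:
        raise ValueError("frame and background must have the same shape")
    f = frame.astype(np.int16)  # avoid uint8 wrap-around when subtracting channels
    r, g, b = f[..., 0], f[..., 1], f[..., 2]
    mask = (g - r > margin) & (g - b > margin)  # True where the pixel is key-green
    out = frame.copy()  # leave the input frame untouched
    out[mask] = background[mask]
    return out


# One row, two pixels: pure green (screen) and a skin-tone pixel (presenter).
frame = np.array([[[0, 255, 0], [200, 150, 120]]], dtype=np.uint8)
code_screenshot = np.full_like(frame, 30)  # stand-in for a dark code editor
print(chroma_key(frame, code_screenshot))
```

Real keyers also soften the mask edges and remove green “spill” reflected onto the subject, which is why a dedicated filter looks far better than a single threshold.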
Jon Krohn:
Do you use a Windows machine right now?
Mariya Sha:
Right now, yeah. We’re talking [inaudible].
Jon Krohn:
You use a Windows machine even though you do all that video editing and all that programming. Wow.
Mariya Sha:
Yeah. Well, the Adobe Suite kind of keeps you off Linux. I would’ve used Linux if the Adobe Suite was available there, but I’m not a Mac person, if that’s your question. A lot of my friends are; my spouse is actually a big Mac person. But I do enjoy gaming, and other things that aren’t traditionally associated with the Mac. So yeah.
Jon Krohn:
Fair enough. Yeah.
Mariya Sha:
But yeah. So the tip for folks, basically the biggest tip that I can share, especially for-
Jon Krohn:
Be a graphic designer.
Mariya Sha:
No, not just that. It’s a nice skill to have, but I think that whenever you start a new channel, whenever you’re thinking about filming tutorials and videos, you need to set yourself a realistic goalpost. Because we see all those fancy tutorials, we see channels with millions upon millions of subscribers, and we think, okay, well, my first video should look like that. No, no. Those channels not only have very expensive equipment, but they also have a lot of employees, and all of those employees are experts in their field.
So they have a camera guy, they have a lights guy, they have an editing guy. So when you’re starting something, don’t aim too high. Aim for what your equipment can give you. Now, I consider my first few videos horrible, but back then I was so happy with them, because that’s what I was able to produce with what I had. And I think the most important part of being a media person is your connection with the audience rather than the quality or the editing of your videos. Right? Because whether you have something people enjoy, whether people like your personality, is something you want to find out before you invest in all this equipment and spend all those hours learning sophisticated programs. Yeah.
Jon Krohn:
Cool. Well, those are great tips. Don’t aim too high. Yeah. [inaudible]. Yeah. Have the right goalpost. No, it makes perfect sense. You can’t expect… And this ties in a lot to the idea of having a process for success as opposed to a goal. Setting out with the goal of having 1 million subscribers or 100,000 subscribers or 10,000 subscribers or whatever isn’t a great idea because, one, it could be really hard to attain that specific goal. And two, what happens after you’ve attained it? You get 10,000 subscribers and you stop? I guess you could set new goals, but then that’s endless as well. So I’m not a goal-oriented person, I’m a process-oriented person. With the SuperDataScience Podcast, we make two episodes a week, every single week, and that’s the process I’m committed to.
And I believe that good things will follow from doing that. I’ll become a better podcast host. I’ll become better at figuring out how to market the show, and the audience will grow. And maybe I’ll even get better at speaking on air. I don’t know, I probably still um and ah as much as I did when I took over the show two years ago. But yeah, a process-oriented approach allows you to keep refining your way toward success. And then as you achieve it, you can just keep going, keep refining your process, and stick to the process.
Mariya Sha:
Exactly. That’s how I see it. If you ask me what my goal is with this entire process, because I’m still learning: a lot of people think my video editing skills are really, really high up, but I know I can do better, and I know I’m just starting now. My goal is to make sure that the next video will be better than the last one. That’s the only goal I have, and that’s the way you progress over time. Because Rome wasn’t built in a day; it’s a process, it takes time. So yeah, your first video is not going to be the most viral thing in the world, and maybe only your family and your friends will watch it, but guess what? You’ll be able to collect a lot of feedback in the process. And feedback is what makes you better, especially critical feedback. I love negative comments, I love them. I pin them to the top and let people have a discussion right underneath. I’ve learned a lot from it.
Jon Krohn:
That’s great. Yeah, I recently got a really cutting negative comment. Comments like that used to bother me; I used to take them so personally, and they’d sit with me. But nowadays, when I got this negative comment, I replied, “Wow, that was such a cutting insult. Congrats. That was like, wow, really.” And I just laughed it off and learned from it. Because there can be a kernel of truth: not every piece of critical feedback is going to be accurate, but often there’s helpful critical feedback in there for improving. And this process of getting yourself out there early, taking risks early, and publishing stuff publicly is true not only for content creation but also for developing applications. So you might be a listener who isn’t interested in creating a YouTube channel or podcast, but you’re interested in building a machine learning application that users can use. It’s the same kind of process. Get it out early, have your friends and family kick the tires on it first and give you critical feedback, and eventually you’ll have something that’s ready for an early paying customer.
Mariya Sha:
Yeah, maybe it’s going to be a startup, maybe we’ll hear about it in the news.
Jon Krohn:
Yeah. All right. So Mariya, I’ve got just one final topic for you today, which you’ve alluded to somewhat in the episode already. You talked about how you haven’t been learning about WebSockets in the undergraduate program you’re studying. You’re currently enrolled at the University of London in the United Kingdom for a computer science bachelor’s degree while living quite far away in British Columbia, Canada. So what’s that experience been like, and why did you opt for distance learning? Especially because I know you live very close to Simon Fraser University in Canada. You could be studying there in person, but you decided to go remote. And I think this is a particularly interesting question for our audience, because there could be lots of listeners out there who are interested in getting deep into the academics of a computer science degree with a machine learning and AI specialization like you’re studying. So should they be choosing the University of London undergraduate program as well?
Mariya Sha:
See, there’s always a good side and a bad side to everything. The reason I went for this distance learning degree, rather than just enrolling at the university I live a five-minute walk away from, was the flexibility. The program is flexible from all kinds of perspectives. The first is that you can choose the number of modules you take each semester. So, for example, if you have half a year where you’re very, very free and have all the time in the world, take more modules. If you have a project you need to complete for work, for your hobby, or for your family, then take fewer modules, because you’ll have the other project in mind. The flexibility is also something you notice in the study hours, because when you do this distance learning degree, you’re not really attending a classroom that is being taught over Zoom.
You’re watching pre-recorded videos, very similar to what I have on my channel, using green screens: basically a program that was recorded well before you even enrolled at the university, which is nice. The flexibility is definitely something I found value in, because I was a graphic designer and I didn’t want to quit my job.
Now, the world turned on me at the end of the day, so I ended up being home. It happened around 2019. But my concern was: I don’t want to quit my job. I don’t want to change my life. I don’t want to move away from where I live right now, but I still want to gain this extra knowledge. And I didn’t find a lot of difference between doing this undergraduate degree and taking Coursera or Udemy courses for my own personal needs, because I’ve been self-learning basically all my life.
So to me it was basically a no-brainer. Now, there are some downsides as well that not a lot of people talk about. The very evident downside is that you don’t get to interact with an actual teacher. You don’t get to network with your peers, right? Because after you finish four years of intensive learning, you move on to your professional career, and some of your peers may become CEOs of very fancy companies and may set you up with a perfect job. So that’s an important aspect of university that you’re missing, because you’re all alone in your room, studying all by yourself.
Now, I think a lot of people do talk about these factors, so I had them in mind when I started my degree. What I didn’t keep in mind was that when you learn in a physical facility, there’s a limit on the number of students who can attend a given class.
And there’s also always some kind of teacher-to-student ratio that has to be maintained, because there’s only a finite number of spaces in the room. Now, when you study online, at a university that is international, everybody in the world can attend. This space limitation doesn’t apply. There’s no fixed teacher-to-student ratio. And this is where I find there’s a big, big disadvantage, because there are so many students attending these types of degree programs. When it’s time to mark your midterms or your final exams, it takes a very long time to get feedback.
In the majority of the modules I’ve taken, on the day of my final exam, I’m already sitting there writing it, and I still don’t have the results of my midterm.
So I don’t really know if I understood the topic. I don’t know where I need additional help. And another part, which I think a lot of universities, even physical ones, do, is that when you get your grade back, when you get this feedback, nobody really tells you where your mistake was. So yeah, you got 95 or you got 85, but you don’t know if you made a silly mistake in a definition or a very bombastic and crazy mistake in understanding an entire concept. And this is something I found out through my channel, which is very interesting, because I cover topics like unit testing. I had a recent video about unit testing, and I thought I understood it very well. I got a nice grade, I got 86.
So I thought I had all the confidence in the world to continue sharing my knowledge with the world. But if you notice, the pinned comment on this video is by somebody who’s been applying unit testing for many, many years, and he’s telling me, “Mariya, you are missing a very important point. Here’s how it actually works.” And I looked at his answer and realized, man, if I’d only had this conversation with him while I was studying, maybe this 85 would’ve been 100.
So it’s something you never know; it’s not feedback you get from the university. So if your consideration in doing this degree is the learning experience, actually understanding what you do and getting this feedback, this might be the opposite of what you’ll get. Because it’s a very independent type of environment, and you do have to study from external sources as well. What you learn in class isn’t enough. You always need to have some kind of backup.
Jon Krohn:
Nice. Great summary of the pros and cons of distance learning. And yeah, food for thought for people out there. Mariya, this has been such a fabulous episode. You’ve been so generous with your time as well, giving us this very long, rich episode with tons of practical tips. So, as our listeners are already well aware, at the end of every episode I ask for a book recommendation. Have you got one for us?
Mariya Sha:
So I highly recommend a book called The End of Mr. Y by Scarlett Thomas. It’s not an educational book. It’s not about machine learning; it’s about a cursed book, and it’s a very interesting story that actually touches on some of the realms of machine learning and even naturopathy. I don’t know if I’m pronouncing it well, but it’s-
Jon Krohn:
Naturopathy.
Mariya Sha:
Yes, that’s right. Sorry, my accent doesn’t allow me to really say the right words, but it’s very, very fascinating and it’s very entertaining. I read it in a span of three days because it was so interesting. So definitely if you guys are interested in fiction books, definitely check it out.
Jon Krohn:
Wow, I’d love to be so into a book that I read it in three days. That’s awesome, Mariya, thank you for the great recommendation. And then finally, obviously listeners know that they should be subscribing to the Python Simplified YouTube channel, not only to get your outstanding video content, much of which is on machine learning, but also lots of other Python and development related concepts. They also of course have the opportunity through that channel to interact with you in real time in your live sessions and your live streams. But beyond YouTube, are there other social media channels that our listeners should be following you on?
Mariya Sha:
Yeah, absolutely. So you can find me on LinkedIn, Mariya Sha. I post a lot of things on LinkedIn; it’s actually the only social media platform other than YouTube that I’m really using. I do have accounts on the rest of the platforms, but LinkedIn is where I mostly am when I’m not on YouTube. In addition, I have a nice Discord server, which we briefly discussed in this podcast. Please join and get introduced to the rest of the team behind Python Simplified. It’s not exactly a team so much as a group of moderators who help with replying to comments and with dealing with the dating bots that keep spamming us for some reason. But yeah, the folks there are very knowledgeable. There are a lot of software developers by trade, and they’re helping beginners and non-beginners on their journey as well. I also have a Twitter account, which is how we met, Jon. You guys can find me @MariyaSha888 on Twitter. What else, do I have anything else? Probably not. So definitely go the YouTube route.
Jon Krohn:
Yeah, YouTube. Yeah. I can’t recommend the channel enough. Mariya, I’m so delighted that you responded to my tweet and agreed to come on the show. My operations manager, Natalie, and I couldn’t believe it when you responded, and so quickly, and that you were interested. I think it was after our workday had ended, and she sent me a screenshot showing you’d replied, and we were both like, “Oh my God.”
Mariya Sha:
Thank you.
Jon Krohn:
So yeah, thank you Mariya, for coming on the show. It’s been such an awesome episode. I had so much fun and maybe in a couple years we can catch up with you again.
Mariya Sha:
Yeah, absolutely. Thank you so much for inviting me. I’m honored that you guys are excited, very honored. It’s very humbling to be a student and have people who are so proficient in the trade excited about talking to me. I didn’t know. I didn’t know I was so special. But yeah, thank you so much for having me. I had lots of fun, so definitely invite me next time.
Jon Krohn:
For sure. And yeah, for our listeners, this was Mariya’s second ever interview, so tons and tons and tons of YouTube content, but only second time ever being interviewed and as I expected, it was easy and super informative. All right, Mariya, thank you so much. We’ll catch you again soon.
Mariya Sha:
Thank you. Bye-bye.
Jon Krohn:
Wow, wow, wow. What a fascinating guest Mariya was. We filmed for several hours, and for the entire session she was brimming with energy and insightful perspectives. In today’s episode, Mariya filled us in on how she got started programming in Python when she discovered it was the lingua franca of AI, and how that led to her hugely popular Python Simplified YouTube channel. She talked about how browser automation with tools like Selenium is an increasingly critical skill for data scientists, particularly if you’re looking to scrape data from dynamic websites. She talked about how you can make apparently complex data science concepts much simpler to understand by focusing less on abstract formulae and more on implementing the concept with real numbers using pencil and paper. She talked about why PyTorch, pandas, and NumPy are her favorite Python libraries, and how the Adobe Suite, the Sublime Text editor, Jupyter Notebook, and WayScript are her favorite software tools in general.
And she talked about how useful being able to handle dynamic data with WebSockets could be for data scientists.
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Mariya’s social media profiles, as well as my own social media profiles at www.superdatascience.com/639. That’s www.superdatascience.com/639. Thanks to many audience members including Svetlana Hansen, Adrianne Rodriguez, Robert Robinson, Mark Moiyu, and Tuka Bade for inspiring many of the questions I asked Mariya during today’s episode. If you’d also like to ask questions of future guests of the show, consider following me on LinkedIn or Twitter, as that’s where I post who upcoming guests are and ask you to provide your inquiries for them.
All right, thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you. And thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the SuperDataScience team for producing another super interesting episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors whom I’ve hand selected as partners because I expect their products to be genuinely of interest to you.
Please consider supporting this free show by checking out our sponsors’ links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode, you can get the details on how by making your way to jonkrohn.com/podcast. Last but not least, thanks to you for listening all the way to the end of the show. Until next time, my friend, keep on rocking it out there, and I’m looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.