(00:06):
This is Five-Minute Friday, on The Machine Learning House.
(00:19):
On last week’s Five-Minute Friday, I discussed a point made during my conversation with the Russian data scientist Nikolay Kurbatov about how, in the data science field, the learning never stops. The field is fast-moving, with countless new approaches, tools, and techniques becoming fashionable, supposedly must-have skills every single year. This can be intimidating but, as I detailed in last week’s episode, it is also what makes choosing to be a data scientist exciting and, ultimately, we can relax into this fast-paced reality.
(00:52):
Now, all of that said, there is one big, related counterpoint that I’d like to draw your attention to. The foundational subjects that underlie data science, including those behind statistical models and machine learning approaches (namely linear algebra, calculus, probability theory, and data structures), barely change at all, decade after decade after decade. So, by studying these subjects, such as by following along with the content in my Machine Learning Foundations curriculum (whether by purchasing my Udemy course like Nikolay did or simply viewing the free versions of the lectures I post on YouTube), you are wisely investing your time in a solid, career-long bedrock. As the high-level tools and techniques change and change and change year after year, the underlying foundations are almost completely fixed.
(01:45):
The foundational math and computer science subjects aren’t necessarily the easiest place to start in data science, so if you’re early in pursuing a career in the field, perhaps consider having fun with the high-level code first, such as via scikit-learn and Keras. Get used to various real-world applications and develop an appreciation for their near-magical qualities. But, as you start to notice that you’re curious about what exactly is happening under the hood of the high-level code, or that you’re hindered by limited flexibility in what you can do, you could then take a crack at shoring up the foundational subjects that underlie the entire field. You won’t regret it and, as you immerse yourself more deeply in linear algebra, calculus, probability, and computer science, you’ll start to see opportunities everywhere to improve your models and model deployments.
(02:34):
Indeed, from my perspective, to be an outstanding data scientist or machine learning engineer, it doesn’t suffice to only know how to use models via the abstract interfaces that the most popular libraries (e.g., scikit-learn, Keras) provide. To train really innovative models or deploy them to run performantly in production, an in-depth appreciation of machine learning theory, which I like to think about as the ground floor of the “Machine Learning House”, is helpful, if not essential. And, to cultivate such an in-depth appreciation of machine learning, the “Machine Learning House” has to have strong foundations. Those foundations are the foundational subjects: linear algebra, calculus, probability, and computer science.
(03:19):
When the foundations of this “Machine Learning House” are firm, it also becomes much easier to make the jump from general machine learning principles to specialized machine learning domains such as deep learning, natural language processing, machine vision, and reinforcement learning. This is because the more specialized the application, the more likely its implementation details are available only in academic papers or graduate-level textbooks, either of which typically assumes an understanding of the foundational subjects.
(03:52):
These foundational subjects of the “Machine Learning House” are, again, linear algebra, calculus, probability, and computer science, all of which you can get a clear overview of via my ML Foundations GitHub repo. They may be particularly relevant for you at this point in your career for a few reasons. First, you may use high-level software libraries to train or deploy models and would now like to understand the fundamentals underlying the abstractions, which will enable you to expand your capabilities. Or you may be a data scientist who would like to reinforce your understanding of the subjects at the core of your discipline. Or you may be a software engineer, machine learning engineer, or data engineer who would like to develop a firm foundation for the deployment of machine learning algorithms into production systems. Or you may be an ambitious data analyst or A.I. enthusiast who would like to become a data scientist or data engineer, and you’d like to deeply understand the field you’re entering from the ground up instead of relying on high-level abstractions. And finally, you may simply be keen to understand the essentials of linear algebra, calculus, probability, algorithms, and data structures for their own sake, because they’re fascinating.
(05:09):
I hope this episode didn’t sound too sales-y. I’m primarily evangelical about this topic because I genuinely and wholeheartedly believe that it will serve you so very well. And, after all, you can get all this content for free from my YouTube channel, so I don’t have strong ulterior motives for being sales-y! All right, so with that, that’s yet another episode of the SuperDataScience show. Keep on rockin’ it out there, folks, and catch you on another episode soon.