SDS 558: Jon’s Answers to Questions on Machine Learning

Podcast Guest: Jon Krohn

March 17, 2022

Welcome back to another Five-Minute Friday episode of the SuperDataScience podcast.
This week, Jon recaps his recent discussion with the Open Data Science Conference, which touched upon his extensive machine learning and deep learning content library and his approach to building his online curriculum.

 

After speaking with the Open Data Science Conference about his extensive library of machine learning and deep learning content, Jon reviews that discussion for SDS listeners who might also be interested in his answers.
The quick Q&A provides insights into how and why Jon chose the topics he developed into online learning content, and it covers the tools and software that he (and the industry at large) increasingly use or look forward to leveraging further. Some of the questions Jon answers include:
  • Why did he focus his content on deep learning and foundational machine learning topics?
  • Does he consider deep learning an “advanced” data science skill, or is it approachable to newcomers and novice data scientists?
  • What open-source deep learning software is most dominant today?
  • What open-source software is Jon looking forward to using more?
Tune in for Jon’s insights into these questions and learn about the tools and software that he finds useful for his day job at Nebula.

DID YOU ENJOY THE PODCAST?

  • What popular data science tools and software are missing from your toolkit? Would it be beneficial to start mastering them this year?

Podcast Transcript

(00:05):
This is Five-Minute Friday on Answers to Questions about Machine Learning.

(00:19):
For Five-Minute Friday a fortnight ago, I provided a summary of the various methods of undertaking my deep learning curriculum, be it via YouTube, my book, or the associated GitHub repo. Then for Five-Minute Friday last week, I similarly detailed the various methods of undertaking my foundations of machine learning content, which covers all the subject areas that underlie a firm understanding of ML: namely, linear algebra, calculus, probability theory, and computer science.
(00:45):
Recently, the wonderful folks from the Open Data Science Conference asked me five questions about ML in general and about these curricula of mine in particular, and they published my answers on their blog. Given that ODSC thought this Q&A would be interesting for their data science audience, I thought that perhaps you might be interested in hearing my answers to their five questions as well.
(01:09):
Their first question was why the content I’ve released so far has specifically focused on deep learning and the foundational subjects underlying machine learning. What I told them is that there are lots of learning materials out there on machine learning in general. For me, there’s more value in learning content that delves into specific niches of ML that are both disproportionately impactful and disproportionately untapped.
(01:34):
My ML Foundations content is an example of one such niche. Instead of studying ML in general, the curriculum touches on ML examples only to provide hands-on applications of the four foundational subject areas: again, Linear Algebra, Calculus, Probability Theory, and Computer Science. By mastering these subject areas, one becomes an especially capable and valuable ML practitioner, so it’s worth investing time in understanding those subjects.
(02:00):
My deep learning curriculum is similar. It again doesn’t cover ML in general, but rather it focuses on the deep learning subfield of ML in particular. With the abundance of cheap compute and data storage of recent years and the even cheaper compute and storage of years to come, deep learning is uniquely positioned amongst ML subfields to make real-world breakthroughs across applications as diverse as machine vision, natural language processing, artistic creativity, and complex sequential decision-making. Again, an area of ML especially worth understanding because of its capacity to be so vastly impactful in the years to come.
(02:39):
All right, so that was the first question. Their second question was: Would you consider deep learning to be an “advanced” data science skill, or is it approachable to newcomers or novice data scientists? The good news here is that, following particular pedagogical approaches like mine — which is visual, intuition-focused, and hands-on — deep learning becomes approachable to novice data scientists. So, while deep learning is an advanced skill with state-of-the-art applicability, you don’t necessarily need to be a highly experienced data scientist to make outstanding practical use of it.
(03:15):
All right, so that was question two. Their third question was about what open-source deep learning software is dominant today. I let them know that after a few years of catching up, in 2021 PyTorch finally overtook the TensorFlow/Keras combination as the most popular library for architecting and training deep learning models. In the show notes, you’ll find a link to my recent talk on the relative strengths and weaknesses of these libraries — PyTorch, TensorFlow, and Keras — for a sense of which to use or learn about first depending on your application needs.
(03:46):
As a brief summary, TensorFlow and Keras are still great for production deployment today: they have a lot more associated libraries for deploying deep learning models into different kinds of circumstances, like onto servers, onto mobile phones, or into someone’s web browser. PyTorch, on the other hand, is a lot more fun and easy to use, so I find PyTorch to be better for actually designing models. So, the spoiler alert of that talk is that I think it is worthwhile learning both PyTorch and TensorFlow, as well as the Keras module that’s embedded within TensorFlow and makes building TensorFlow models easy. And the good news is that if you learn PyTorch, it’s very easy to learn TensorFlow, and if you learn TensorFlow, it becomes very easy to learn PyTorch. So you can’t really lose whichever you start with. I probably lean slightly towards starting with PyTorch, but again, check out that talk for more detail on the relative strengths and weaknesses of these libraries to get a sense of what’s best for you.
(04:56):
Nice, all right. And then their penultimate question was on what open-source software I’m looking forward to using more. For this, my answer is PyTorch Lightning, which is enticing as a lightweight wrapper around PyTorch code for easily scaling up models to training on lots of data or to deploying large models into performant production systems.
(05:16):
Finally, they asked me about a case study where I’ve used deep learning in practice. Well, at my day job at Nebula, we use deep learning to “understand” natural language on resumes and in job descriptions in order to automate human resources workflows, thereby enabling talented people to land the right opportunities for them more rapidly than is otherwise possible. This is because this kind of natural language understanding goes far beyond what you can do with keywords alone. If you are interested in a lot more detail about natural language understanding, and about how I use deep learning to achieve it for this particular human resources application, you can check out the detailed webinar I gave; a link to that is in the show notes as well.
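To illustrate the point about keywords, here is a minimal Python sketch of naive keyword matching. It is not Nebula's approach and not anything shown in the episode; the function names and the three example sentences are hypothetical and chosen only to show how a pure keyword score can rank an unrelated job above a related one.

```python
import re


def keywords(text):
    """Return the set of lowercase word tokens with at least 3 letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3}


def keyword_overlap(a, b):
    """Jaccard overlap between the keyword sets of two texts."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Hypothetical example texts (assumptions for illustration only).
resume = "Built neural networks for text classification"
job_related = "Seeking deep learning engineer with NLP experience"
job_unrelated = "Built wooden furniture for text book shops"

# The related job shares no literal words with the resume, so it scores zero,
# while the unrelated job shares surface words such as "built" and "text".
print(keyword_overlap(resume, job_related))
print(keyword_overlap(resume, job_unrelated))
```

A model that captures meaning would be expected to connect "neural networks" and "text classification" with "deep learning" and "NLP", which literal keyword overlap cannot do.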
(06:05):
All right, I hope you found the answers to those questions from the Open Data Science Conference to be informative. If you’d like to stay up to date on the releases of anything I’m working on (currently I am releasing more content from my Machine Learning Foundations curriculum), then you can sign up for my email newsletter on jonkrohn.com. Also, don’t be shy: if you ever have questions about machine learning or data science that you’d like me to answer, tag me in a LinkedIn or Twitter post about the question and, yeah, maybe I will answer it for everyone in a future Five-Minute Friday, or perhaps I’ll just answer right there on social media.
(06:44):
In the meantime, keep on rockin’ out there and I’m looking forward to catching you on another round of SuperDataScience very soon. 