Welcome back to the Five-Minute Friday episode of the SuperDataScience Podcast!
This week, our three-part series on strategies for extracting business value out of machine learning comes to an end. In this episode, Jon recommends starting with simple models and reminds us that model speed could be more important to your users than accuracy.
For Five-Minute Friday a fortnight ago, Jon covered his first strategy, which involved identifying a commercial problem before starting data collection or ML model development. Then, last week we dug into the data collection process that should follow.
Today’s episode is all about the steps that come after data collection. If you’re gradually collecting more and more labeled data, you’re better off starting with a simple model. A simple model also lets you confirm whether there’s any valuable signal in the data you’ve collected so far. So, to start, you might use a logistic regression model with a handful of manually curated features you’ve come up with as your model inputs.
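To make the "simple model first" idea concrete, here is a minimal sketch in plain Python. It is not from the episode: the toy data, the function names, and the learning rate and epoch count are all assumptions chosen for illustration. It fits a logistic regression on two hand-picked features and compares it against a majority-class baseline, which is one rough way to check whether the data carries any signal.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights and a bias with plain batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


def make_toy_data(n, seed):
    # Synthetic stand-in for real labeled data (an assumption for this sketch):
    # the label depends on the first feature plus noise; the second is pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x1, x2])
        y.append(1 if x1 + rng.gauss(0, 0.5) > 0 else 0)
    return X, y


X_train, y_train = make_toy_data(400, seed=0)
X_test, y_test = make_toy_data(200, seed=1)

w, b = train_logistic_regression(X_train, y_train)
model_acc = accuracy(w, b, X_test, y_test)

# Majority-class baseline: always predict the most common training label.
majority = 1 if sum(y_train) * 2 >= len(y_train) else 0
baseline_acc = sum(yi == majority for yi in y_test) / len(y_test)

print(f"logistic regression: {model_acc:.2f}, majority baseline: {baseline_acc:.2f}")
```

If the simple model cannot beat the majority-class baseline on held-out data, that may be a hint that the current features or labels carry little signal, and that collecting more or better data is worth more than reaching for a bigger model.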
As you collect larger amounts of data — the exact amount depends on the particular problem you’re solving and how much signal there is in the data relative to noise — you can start experimenting with more complex ML models, such as deep learning models, which can automatically learn the most important features in the data and can outperform simpler models with respect to accuracy. Generally speaking, you might need at least tens of thousands of data points to meaningfully make use of deep learning. For some problems, you might need millions — or, in extreme cases, even billions — of data points for a large deep learning model to demonstrate its worth and capability.
One final point that Jon makes is that when you deploy your ML model into a production system, speed is almost always more important than accuracy. Users have become accustomed to receiving the results of their query in seconds or less. The model needs to be accurate enough that users are satisfied with the results they get, but waiting ten seconds instead of one second for a result will definitely be perceptible.
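One way to keep speed in view is to measure prediction latency alongside accuracy before deploying. The sketch below is an illustration rather than anything described in the episode: `fast_model` and `slow_model` are hypothetical stand-ins for real models, and the one-second budget is an assumed threshold that you would replace with whatever your users actually tolerate.

```python
import math
import statistics
import time


def measure_latency(predict_fn, inputs, warmup=3):
    """Return per-call latencies in seconds for predict_fn over inputs."""
    for x in inputs[:warmup]:
        predict_fn(x)  # warm-up calls are not timed
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict_fn(x)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, budget_seconds):
    """Report median and 95th-percentile latency against a budget."""
    ordered = sorted(latencies)
    p95 = ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]
    return {
        "median": statistics.median(ordered),
        "p95": p95,
        "within_budget": p95 <= budget_seconds,
    }


# Hypothetical stand-ins for a cheap model and a more expensive one.
def fast_model(x):
    return x * 2


def slow_model(x):
    time.sleep(0.005)  # simulate heavier computation
    return x * 2


LATENCY_BUDGET_SECONDS = 1.0  # assumed budget, not a figure from the episode
inputs = list(range(20))

results = {}
for name, fn in [("fast_model", fast_model), ("slow_model", slow_model)]:
    results[name] = summarize(measure_latency(fn, inputs), LATENCY_BUDGET_SECONDS)
    print(name, results[name])
```

Looking at the 95th percentile rather than only the median is a deliberate choice here, since occasional slow responses tend to be the ones users notice.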
DID YOU ENJOY THE PODCAST?
- Do you know what’s most important to your users? Speed or accuracy? Have you been prioritizing model accuracy over speed when building your models?