SDS 466: Good vs. Great Data Scientists

Podcast Guest: Jon Krohn

April 29, 2021

Welcome back to the FiveMinuteFriday episode of the SuperDataScience Podcast!
Today I’m talking about what elevates a good data scientist to a great one.

 

A couple of weeks ago, my social media consultant sent me a blog post on good vs great product managers and suggested I put together a similar post for data scientists. I frequently ask guests on the podcast what skill sets they look for in potential hires, and I’ve noticed a recurring theme. They almost always say communication and willingness to learn. Sometimes these are the only answers they give.
In our case, communication means being able to convey data science topics to a broad audience, which includes not only fellow data scientists but also stakeholders and people from different departments. The willingness and ability to learn means having demonstrable experience of excelling at taking on new information. It’s not solely about innate learning capacity; there’s more than one way to learn and digest information. Sometimes this shows up as the ability to focus, writing down what you’ve learned, quizzing yourself, and knowing how to do research.
Beyond these, I was curious what other professionals think. I put the question out on Twitter and got a huge response, with over 6,000 engagements. Some people had comical responses, but a friend of mine pointed out that a good data scientist can do reinforcement learning, while a great one knows not to. The point he’s making is that a good data scientist knows the sophisticated technique, while a great one knows when to employ a simpler technique instead. Another response was that data engineering skills, or even the presence of a data engineer on the team, elevate a data scientist and a data science team. Other points were made on creativity, curiosity, humility, leadership, and organizational skills, among others. Other favorite responses include being willing to get intimate with the data and caring how your work impacts others.

Podcast Transcript

(00:06):
This is FiveMinuteFriday on Good Data Scientists versus Great Data Scientists. 

(00:19):
A couple of weeks ago, Maria Lee, who helps me out with social media marketing, forwarded me a blog post she really liked on good versus great product managers. She suggested that I write a similar post about data scientists. Immediately, two distinguishing factors between good and great data scientists came to mind for me. This is likely because, on the SuperDataScience podcast, I frequently ask data science leaders what they look for in people they hire. I’ve noticed a recurring theme in the responses. They nearly always say communication and knowing how to learn. 
(00:59):
Surprisingly often, these are the only two items mentioned by guests with respect to what they look for in new hires. In the context of data science, the first one, communication, means the ability to clearly explain complex technical content in simple terms to a broad audience, including other data scientists on the team, engineers, product people, and more commercially oriented folks like managers and your end users. 
(01:25):
The second item, knowing how to learn, means having demonstrable experience, from your background or perhaps via in-interview exercises, in excelling at taking on new information and factoring that new information into decision-making as well as, of course, being able to then communicate it clearly. This one, knowing how to learn, isn’t solely about innate learning capacity. There are lots of structured ways that people can digest and rehearse information in order to learn more thoroughly and more quickly. These are skills that you can practice. Examples include focusing attention on one task at a time (for example, with the Pomodoro Technique I covered in episode number 456), writing down what you’ve learned in your own words, using flashcards to test your recall of the most important information, and knowing how to find the information you need by searching online or in a book. 
(02:22):
All right, so beyond these two big ones that I had already thought up, communication and knowing how to learn, I was curious what other data professionals think. And so, on April 15th, I posed a simple question on Twitter. I asked, “What separates a good data scientist from a great one?” I was absolutely blown away by the response, which garnered more attention than I typically get across all my tweets in a given month. The next day, the post had been viewed 180,000 times. At the time of writing this blog post on April 19th, just four days after the tweet went out, it has over 200,000 impressions and over 6,000 engagements. 
(03:04):
Some of the responses were rather witty and had me laughing out loud. Since I asked what separates a good data scientist from a great one, a good chunk of the responses were to do with pandemic-related restrictions, pointing out, for example, that at least six feet separates good data scientists from great ones. Ha ha. Some comedians went down the frequentist statistics route with their jokes by suggesting that two standard deviations separate a good data scientist from a great one. Others, meanwhile, went down the machine learning route by conjuring up imagery from the support vector machine technique and suggesting that a decision boundary or a hyperplane separates a good data scientist from a great one. 
(03:50):
Martin Goodson, a friend of mine for more than a decade and CEO of Evolution AI in London, wrote, “A good data scientist knows how to do reinforcement learning. A great one knows not to.” While amusing, I think the broader point that Martin is making here is one shared by many respondents. Good data scientists know the most sophisticated modeling approaches, whereas great ones avoid a computationally complex approach when a simpler one will do. 
(04:21):
Rockstar data scientist Chris Albon is Director of Machine Learning at Wikimedia and former host of the brilliant Partially Derivative podcast, which inspired me to begin hosting a podcast myself, so now you know who to blame. Chris wrote that, “What separates a good data scientist from a great one is a data engineer.” While terse and entertaining, Chris’s tweet bears truth in two ways. A great data scientist can herself be a data engineer. Or alternatively, a great data scientist may have a data engineer or even a team of data engineers transforming their model from a collection of weights into real-time magic within a production application. 
(05:06):
Beyond the humorous replies, I was delighted that communication and knowing how to learn were indeed a recurring theme across many of them. Countless additional thoughtful points were made, however, including on creativity, curiosity, humility, the ability to listen, experimental design, product design, inspiring or leading a team, task prioritization, commercial awareness, organizational awareness like being able to manage up (that is, to manage those above you in the corporate hierarchy), and specific technical skills like software engineering, Bayesian statistics, and distributed computing tools such as Apache Spark. 
(05:49):
I wish I could include all of the responses here, but you can of course refer back to the original post and make your way through them. Some of my particular favorites are as follows. First up, the legendary Brandon Rohrer, principal data scientist at iRobot and guest on episode number 341 of the SuperDataScience podcast, linked through to a post he’d made on LinkedIn a week earlier, and he made a point there that I could not agree with more. Brandon said that, “One thing that makes a great data scientist is a willingness to become intimately familiar with the data, to spend time looking at the raw numbers and strings, to ask what those numbers and strings mean in the real world and how they were gathered. Someone who keeps their data at arm’s length might become a competent modeler or engineer or statistician, but there is no substitute for diving into the data if you want to be a great data scientist.” I couldn’t agree more. 
(06:47):
Another great response came from Chapman University statistics professor, Chelsea Parlett-Pelleriti, who wrote, “Empathy, carefulness, and respect for other people with different areas of expertise.” I do agree. Those are great traits for a great data scientist. 
(07:04):
Finally, statistician Isabella Ghement responded in a related vein. She said, “Caring about how your work impacts others, especially those vulnerable or disadvantaged, and being thoughtful and considerate about everything you do.” I agree with that, too. What do you think? Have we covered everything? What do you think is the difference between a good and a great data scientist? Feel free to add your thoughts to the Twitter thread. I look forward to hearing them. A link to the thread is in the show notes and my Twitter handle is @JonKrohnLearns. Otherwise, that’s it for this week’s FiveMinuteFriday. Keep on rocking it out there, and catch you on another episode soon. 