SDS 524: The Highest-Paying Data Tools

Podcast Guest: Jon Krohn

November 18, 2021

Welcome back to the FiveMinuteFriday episode of the SuperDataScience Podcast!

I continue last week’s discussion by diving into the data tools associated with the highest salaries.

 

Two weeks ago, I covered the highest-paying programming languages based on O’Reilly’s 2021 survey. After that, I clarified the definitions of data tools and data platforms. This week, we’re going to go over the highest-paying data tools.
The most widely used tool in the survey was Microsoft Excel. However, Excel and other click-and-point tools were associated with below-average salaries; Excel users earned about $8,000 a year less than the mean. After that, the Python libraries scikit-learn, TensorFlow, and PyTorch were the most widely used, and they were associated with above-average salaries. PyTorch and TensorFlow were associated with about $20,000 over the mean, while scikit-learn was associated with $11,000 over the mean. A general conclusion we can draw is that familiarity with commercial tools tends to pay below average, while familiarity with open-source tools tends to pay above average.
The highest-paying tools of all were relatively unpopular, so expertise in them is comparatively rare. H2O, an open-source machine learning tool, was associated with the highest average salary. KNIME was the next highest-paying tool. Tools within the Apache Spark framework also came in high, just after KNIME. Honorable mention goes out to spaCy, which came in sixth in the survey, ahead of the more popular Python libraries.
Employers are willing to pay a premium for those who have skills in rarer, open-source tools. 

Podcast Transcript

(00:05):
This is Five-Minute Friday on The Highest-Paying Data Tools.

(00:27):
Two weeks ago, for Five-Minute Friday, I covered the highest-paying programming languages for data scientists based on the results of O’Reilly’s 2021 Data/AI Salary Survey. Last week we used Five-Minute Friday to get our definitions of data tools and data frameworks straight so that today we could dig into the highest-paying data tools — while next week, in turn, we’ll tackle the highest-paying data platforms. If you get through today’s episode and don’t feel 100% clear about what a data tool is then you can consider popping back to Episode #522 to clarify.
(01:06):
All right, so data tools. The most widely-used data tool in the O’Reilly survey — used by nearly a third of respondents — was Microsoft’s Excel program for working with data in spreadsheets. Despite its popularity, Excel — along with other click-and-point tools in the survey — was associated with a below-average salary. Specifically, the mean across all respondents was $146k but those who indicated that they used Excel were paid on average $8k/year less at $138k.
(01:41):
The next three most popular tools after Excel were the Python programming language-based software libraries scikit-learn, TensorFlow and PyTorch. More specifically, scikit-learn is used by a little over a quarter of respondents while TensorFlow and PyTorch are both used by about a fifth. In contrast with Excel, however, all three of these popular machine learning-focused software libraries were associated with above-average salaries. PyTorch and TensorFlow in particular were associated with a juicy salary pop of about $20k above the overall mean, coming in at $166k for PyTorch and $164k for TensorFlow. The scikit-learn jump was about half as large, giving an $11k average increase in pay above the $146k mean.
(02:32):
Interestingly, expertise with almost any data tool was associated with above-average pay, the exceptions being Excel, again, Stata, and tools provided by the once-prestigious computing giant IBM. Since these tools are all commercial, a general conclusion we can draw across all of these results is that familiarity with commercial tools tends to pay below-average salaries while familiarity with open-source tools tends to pay above-average.
(03:04):
Ok, so we’ve covered the most popular tools now as well as the ones associated with below-average salaries. On the flipside, the highest-paying tools of all were relatively unpopular, which makes sense because it’s easier for smaller groups to stretch further away from the global mean across all groups.
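To make that point about small groups concrete, here is a minimal simulation sketch. Everything numeric in it apart from the $146k overall mean is an assumption for illustration: the $60k spread of individual salaries, the group sizes of 20 and 300, and the helper name `spread_of_group_means` are hypothetical and do not come from the O’Reilly survey. It draws many random groups from the same salary distribution and measures how much the group averages wander.

```python
import random
import statistics


def spread_of_group_means(group_size, trials=1000, mean_k=146, sd_k=60, seed=0):
    """Standard deviation of the average salary across many random groups.

    mean_k is the survey's overall mean in $k; sd_k is an assumed spread,
    not a figure from the survey.
    """
    rng = random.Random(seed)
    group_means = [
        statistics.fmean(rng.gauss(mean_k, sd_k) for _ in range(group_size))
        for _ in range(trials)
    ]
    return statistics.stdev(group_means)


# Hypothetical group sizes: a niche tool versus a widely used one.
small_group_spread = spread_of_group_means(group_size=20)
large_group_spread = spread_of_group_means(group_size=300)

print(f"Spread of averages, 20-person groups:  {small_group_spread:.1f}k")
print(f"Spread of averages, 300-person groups: {large_group_spread:.1f}k")
```

Every simulated person is drawn from the same distribution, so any gap between a group's average and $146k here is pure chance. The smaller groups' averages scatter more widely, which is one reason to read the salary figures for rarely used tools with some caution.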
(03:25):
The tool associated with the highest pay of all was H2O, an open-source machine learning tool, which is used by only 3% of respondents. Those respondents had an average pay of $183k, nearly a whopping $40k above the overall mean salary. It’s a similar story for second-placed KNIME, an open-source analytics tool that is used by only 2% of respondents but has an average pay of $180k.
(03:58):
The third- and fourth-ranked tools for pay are both part of Apache Spark, a framework we’ll talk about more next week. Specifically, of these tools within the Apache Spark framework, Spark NLP is used by only 5% of respondents and was just a grand behind KNIME with average compensation of $179k, while Spark MLlib is used by nearly a tenth of respondents and has average comp of $175k.
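As a quick recap of the arithmetic so far, the sketch below lines up the average pay figures quoted in this episode against the $146k overall mean. The scikit-learn value is derived from the stated $11k increase rather than quoted directly, and the variable names are just illustrative.

```python
# Overall mean salary across all survey respondents, in $k (from the episode).
OVERALL_MEAN_K = 146

# Average pay in $k for users of each tool, as quoted in the episode.
avg_pay_k = {
    "Excel": 138,
    "scikit-learn": OVERALL_MEAN_K + 11,  # derived from the stated $11k increase
    "TensorFlow": 164,
    "PyTorch": 166,
    "Spark MLlib": 175,
    "Spark NLP": 179,
    "KNIME": 180,
    "H2O": 183,
}

# Difference from the overall mean for each tool.
premium_k = {tool: pay - OVERALL_MEAN_K for tool, pay in avg_pay_k.items()}

# Print from the largest premium to the smallest.
for tool, diff in sorted(premium_k.items(), key=lambda item: item[1], reverse=True):
    print(f"{tool:<13}{diff:+d}k")
```

These are differences between group averages, so they describe an association with tool use rather than a raise anyone is guaranteed for learning a tool.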
(04:25):
Honorable mention also goes out to spaCy, a Python library for working with natural language data, that came sixth in the survey, ahead of the more popular Python libraries scikit-learn, TensorFlow, and PyTorch. However, spaCy wasn’t associated with salaries quite as high as some of the other less popular data tools just mentioned: H2O, KNIME, and the Spark tools.
(04:49):
Overall, similar to the programming languages that we looked at back in episode 520, the general conclusion to draw is that employers seem to be willing to pay a premium for expertise with relatively new open-source software tools that are generating a lot of buzz, especially if people who are already familiar with these tools are hard to come by.
(05:11):
Ok, so now we’ve covered the highest-paying programming languages and the highest-paying data tools. Next week we’ll conclude the series of Five-Minute Fridays on this compensation topic by covering the highest-paying data platforms — platforms like Spark, which we already mentioned in this episode, as well as others like Kafka, Hadoop, and Dask.
(05:34):
If you’d like to check out the full salary report from O’Reilly in the meantime, we’ve included a link in the show notes. We’ve also included links to all of the data tools mentioned in today’s episode. All right, that’s it for today. Keep on rockin’ it out there folks and catch you on another round of SuperDataScience very soon. 