(00:05):
This is Five-Minute Friday on The Highest-Paying Data Frameworks.
(00:19):
Three weeks ago, for Five-Minute Friday, I covered the highest-paying programming languages for data scientists based on the results of O’Reilly’s 2021 Data/AI Salary Survey. Two weeks ago, we used Five-Minute Friday to get our definitions of data tools and data frameworks straight so that last week we could dig into the highest-paying data tools and now, this week, we’ll wrap up this series on compensation by covering the highest-paying data frameworks. If you get through today’s episode and don’t feel 100% clear about what a data framework is, then consider popping back to Episode #522 to clarify.
(01:01):
All right, data frameworks. Right off the bat, there are two general trends with data frameworks that I’d like to highlight. The first is that, similar to what we observed with data tools last week, familiarity with any open-source data framework is associated with higher pay.
(01:16):
The second big general trend is that, similar to what we observed with data tools last week and programming languages three weeks ago, the highest salaries of all are associated with relatively new software that does not yet have a lot of users but does have a lot of buzz associated with it. It appears that employers are willing to pay a lot more when they do find one of those rare few who have expertise with these fashionable new software packages. It’s also worth mentioning that the average of a small group can more easily stray far from the overall mean across all groups, so we shouldn’t draw overly strong conclusions from the rarest data frameworks.
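To make that small-sample caveat concrete, here is a minimal sketch in Python. It is not from the survey: the salary distribution (normal, with a $146k mean and an assumed $50k standard deviation), the group sizes, and the function name are all hypothetical choices made purely for illustration.

```python
import random
import statistics


def spread_of_group_means(group_size, n_groups=500, mean=146.0, sd=50.0, seed=0):
    """Standard deviation of the average salary (in $k) across simulated groups.

    All parameters are illustrative assumptions, not survey figures
    (apart from the $146k overall mean quoted in the episode).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    group_means = [
        statistics.fmean(rng.gauss(mean, sd) for _ in range(group_size))
        for _ in range(n_groups)
    ]
    return statistics.stdev(group_means)


# A "rare framework" with 10 users versus a "popular" one with 1,000 users.
small = spread_of_group_means(group_size=10)
large = spread_of_group_means(group_size=1000)

print(f"spread of group averages, 10 users each:   {small:.1f}k")
print(f"spread of group averages, 1000 users each: {large:.1f}k")
```

Every simulated person here is drawn from the same distribution, so any difference between group averages is pure noise. The averages of the 10-person groups should still wander much further from $146k than those of the 1,000-person groups, since the spread of an average shrinks with the square root of the group size.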
(01:58):
With those general trends and the glaring small-sample-size caveat out of the way, the four data frameworks associated with the highest pay are all indeed used by fewer than 1% of survey respondents. ContentSquare, a company that has raised half a billion dollars from big-name venture capital firms like SoftBank to create an analytics platform for tracking customers’ digital experiences, came out on top: Folks who use it have an average salary of $225k, which is a whopping $80k or so above the $146k mean across all respondents.
(02:36):
Michelangelo, a platform developed by the ride-sharing company Uber to deploy and operate machine learning models in production, came second, after ContentSquare. People familiar with Michelangelo also on average enjoy an enormous bump in pay to $218k from the $146k overall mean.
(02:59):
ContentSquare and Michelangelo were head and shoulders above the next ones on the list in terms of pay. Next up are Ray, which is an open-source project for scaling computationally intensive Python code, and Amundsen, which is an open-source catalog, this time from ride-sharing giant Lyft, for storing metadata. Despite being third and fourth amongst data frameworks, Ray and Amundsen are nevertheless both associated with average pay higher than any of the data tools that we covered last week or any of the programming languages that we covered three weeks ago. So not as big a bump as ContentSquare or Michelangelo, but Ray nevertheless came in at a whopping $191k while Amundsen was $189k.
(03:51):
This seems like a good juncture to reiterate that the four frameworks covered so far (ContentSquare, Michelangelo, Ray, and Amundsen) are all used by fewer than 1% of survey respondents, so their massive salary bumps do suffer from relatively small sample-size issues.
(04:10):
In contrast, more popular data frameworks like Kafka, Spark, Google BigQuery, and Dask, which are used by between 5% and 19% of all respondents, do not suffer from small sample-size problems but are nevertheless associated with salaries considerably above the $146k mean. Kafka leads this pack with a $179k average while Spark, BigQuery, and Dask were all around $170k.
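For a side-by-side view of the figures quoted so far, here is a small Python sketch that computes each framework's premium over the $146k overall mean. The salary numbers are the ones stated in this episode; the variable names are just illustrative, and Spark, BigQuery, and Dask are left out because only an approximate figure ("around $170k") was given for them.

```python
# Average salaries in $k, as quoted in this episode.
OVERALL_MEAN_K = 146
average_salary_k = {
    "ContentSquare": 225,
    "Michelangelo": 218,
    "Ray": 191,
    "Amundsen": 189,
    "Kafka": 179,
}

# Premium = framework average minus the overall mean across respondents.
premiums_k = {name: pay - OVERALL_MEAN_K for name, pay in average_salary_k.items()}

for name, premium in premiums_k.items():
    share = premium / OVERALL_MEAN_K
    print(f"{name}: +${premium}k ({share:.0%} above the overall mean)")
```

These are raw differences in group averages, so they carry the same small-sample caveat as the underlying numbers for the four rarest frameworks.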
(04:41):
Recalling that all the data frameworks are associated with at least some increase in salary relative to the mean, the relatively low performers in the bunch included older frameworks like Hadoop as well as commercial ones like Tableau, Oracle BI, and Google Analytics.
(04:48):
So what are the takeaway messages from all of this? In my view, relatively widely adopted but nevertheless greatly in-demand open-source frameworks for handling large-scale, distributed data streaming like Kafka, Spark, and Dask are your best bet for frameworks to consider learning next. You could also take a peek at Uber’s Michelangelo and the open-source Ray and Amundsen projects to see if these still-relatively-niche frameworks are useful to any projects you’re currently tackling.
(05:29):
Cool, well that’s the end of this four-part series of Five-Minute Fridays on the highest-paying programming languages for data scientists as well as the highest-paying data tools and data frameworks. If you’d like to check out the full salary report from O’Reilly that I based these episodes on or any of the data frameworks mentioned in this episode, we’ve included links in the show notes.
(05:49):
All right-y, that’s it for today. Keep on rockin’ it out there folks and catch you on another round of SuperDataScience very soon.