SDS 522: Data Tools vs. Data Platforms

Podcast Guest: Jon Krohn

November 11, 2021

Welcome back to the Five-Minute Friday episode of the SuperDataScience Podcast!

This week I’m priming future episodes by going into some quick definitions.

 

O’Reilly put out their 2021 Data/AI Salary Survey, as I mentioned before. I plan to cover the survey’s highest-paying data tools and data platforms in future episodes, but first I wanted to make sure we’re all working with the same terminology.
Data tools are any software for working with data that is neither a standalone programming language nor a platform. For example, Python is a widely used language, while the software libraries that operate within it (such as scikit-learn, TensorFlow, and PyTorch) are tools. Data tools don’t have to involve code, either: point-and-click software such as Microsoft Excel counts too.
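The line between data tools and data platforms is less clear-cut, and a given product could often be argued into either bucket. Generally, platforms are broad software frameworks that, despite not being languages, can support the development of multiple distinct software tools within them. For example, Spark is a platform for working with massive quantities of data that supports tools such as Spark NLP and Spark MLlib. Other examples of platforms are Kafka and Hadoop.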
With these definitions in mind, upcoming episodes will go into the data tools and data platforms that are associated with the highest salaries.
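For readers who think in code, the three categories can be sketched as a small Python lookup. This is only an illustration of the definitions from the episode: the `Category`, `Product`, `CATALOG`, and `tools_within` names are hypothetical, and the classification simply follows the examples given in the episode, some of which could arguably be filed under a different bucket.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Category(Enum):
    LANGUAGE = "language"   # standalone programming language
    TOOL = "tool"           # neither a language nor a platform
    PLATFORM = "platform"   # broad framework that can host multiple tools


@dataclass(frozen=True)
class Product:
    name: str
    category: Category
    # Language or platform the product operates within, if any (assumed field).
    runs_within: Optional[str] = None


# Classification taken from the examples mentioned in the episode.
CATALOG = [
    Product("Python", Category.LANGUAGE),
    Product("scikit-learn", Category.TOOL, runs_within="Python"),
    Product("TensorFlow", Category.TOOL, runs_within="Python"),
    Product("PyTorch", Category.TOOL, runs_within="Python"),
    Product("Microsoft Excel", Category.TOOL),  # a tool that needs no code
    Product("Spark", Category.PLATFORM),
    Product("Spark NLP", Category.TOOL, runs_within="Spark"),
    Product("Spark MLlib", Category.TOOL, runs_within="Spark"),
    Product("Kafka", Category.PLATFORM),
    Product("Hadoop", Category.PLATFORM),
]


def tools_within(host: str) -> List[str]:
    """Return the names of the tools that operate within a given language or platform."""
    return [
        p.name
        for p in CATALOG
        if p.category is Category.TOOL and p.runs_within == host
    ]


def names_in(category: Category) -> List[str]:
    """Return the names of all products in one category."""
    return [p.name for p in CATALOG if p.category is category]
```

Calling `tools_within("Spark")` lists the tools the episode places inside the Spark platform, while `names_in(Category.PLATFORM)` lists the platforms themselves.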

Podcast Transcript

(00:05):
This is Five-Minute Friday on Data Tools versus Data Platforms.

(00:19):
In last week’s Five-Minute Friday, I covered the highest-paying programming languages for data scientists based on the results of O’Reilly’s 2021 Data/AI Salary Survey. Next week, I’m going to expand on those survey results by covering the highest-paying data tools, and the week after that I’ll cover the highest-paying data platforms.
(00:43):
To make the most of those forthcoming episodes, today we are investing a few minutes in getting our definitions straight: I’ll detail what exactly data tools are as well as what exactly data platforms are. In a phrase, data tools are any software products for working with data that are neither standalone programming languages nor platforms. So let’s do the first distinction first.
(01:09):
The first one, between data tools and languages, is pretty straightforward: Python, for example, is a widely used programming language in data science, while software libraries that operate within Python, such as scikit-learn, TensorFlow, and PyTorch, are examples of some of the most popular data-science tools. Data tools need not be implemented via code, however; data tools can also be point-and-click software such as Microsoft Excel.
(01:40):
Relative to the distinction between data tools and programming languages, the distinction between data tools and data platforms is sometimes less clear-cut and arguments could in many cases be made to classify a given product into either bucket.
(01:55):
Generally speaking, however, platforms are broad software frameworks that, despite not being standalone programming languages, can nevertheless support the development of multiple distinct software tools within them. For example, Spark is a platform for working with massive quantities of data that itself supports particular data tools such as Spark NLP and Spark MLlib within it. Alongside Spark, other prominent examples of data platforms are Kafka and Hadoop.
(02:28):
All right, so there are our definitions: Neither data tools nor data platforms are standalone programming languages. Nevertheless, data platforms can support the development of multiple data tools within them. All right, so there we go. As promised at the onset of this episode, for next week’s Five-Minute Friday we’ll explore the data tools, which are now well defined, that are correlated with the highest salaries and the week after we’ll do the same with data platforms and look at the data platforms that are associated with the highest salaries.
(03:03):
If you’d like to check out the full O’Reilly 2021 Data/AI Salary Survey in the meantime, we’ve included a link in the show notes. All right, that’s it for today. Keep on rockin’ it out there folks and catch you on another round of SuperDataScience very soon. 