SDS 779: The Tidyverse of Essential R Libraries and their Python Analogues, with Dr. Hadley Wickham

88 minutes
Career Tips, Data Science, R Programming

Hadley Wickham talks to Jon Krohn about Posit’s rebrand, the Tidyverse and why it belongs in every data scientist’s toolkit, and why getting your hands dirty with open-source projects can be so lucrative for your career.

About Hadley Wickham

Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science.

Overview

To Hadley Wickham, the Tidyverse is a game-changer. For him, “tidy” is all about breaking down larger ideas into their component parts. In essence, tidy reframes datasets in an intelligible way that helps people work with data. Hadley notes that this approach differs from the methods taught at the university level, and he emphasizes its accessibility to users and its potential for collaboration. Entering a “flow state,” where a program or tool frees him up to think, is important to him. Hadley believes this is where the Tidyverse is headed and that there is nothing else like it on the market. For those who would like to get involved in this open-source community, he recommends attending the Tidyverse Developer Day in Seattle on August 15, which he says is going to be a fun day. For those further afield, Hadley says that contributing in any small way, such as proofreading, can make a huge difference and give you some skin in the game.
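As a rough illustration of the “tidy” idea, the sketch below reshapes a small wide table into a layout where each row holds a single observation and each column a single variable. It uses only the Python standard library; the data, the `to_tidy` helper, and its parameter names are hypothetical and are not taken from the episode. In practice this kind of reshaping is usually done with tidyr’s `pivot_longer()` in R or `pandas.melt()` in Python.

```python
# Wide layout: one row per country, one column per year.
# The numbers are made up purely for illustration.
wide_rows = [
    {"country": "A", "2022": 10, "2023": 12},
    {"country": "B", "2022": 7, "2023": 9},
]


def to_tidy(rows, id_column, names_to, values_to):
    """Reshape wide rows into tidy rows (hypothetical helper).

    Every column other than `id_column` is treated as a measurement:
    its name goes into `names_to` and its value into `values_to`.
    """
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_column:
                continue
            tidy.append({id_column: row[id_column], names_to: key, values_to: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_column="country", names_to="year", values_to="cases")
for tidy_row in tidy_rows:
    print(tidy_row)
```

Once the data is in this long form, grouping, filtering, and plotting can all address `country`, `year`, and `cases` by name, without special handling for each year column.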

Hadley also considers how far data science has come in terms of interoperability since he last appeared on the podcast four years ago. He cites DuckDB and Arrow in particular as packages that users can try out today to make their projects more efficient. Jon and Hadley also discuss generative AI’s capacity to translate code across programming languages, which breaks down barriers and helps users quickly develop the tools they need.

Finally, Jon asks Hadley about the secret to a tech company’s longevity. Hadley says that one of the reasons for Posit PBC’s success is that it is a public benefit corporation rather than a limited liability company (LLC). Putting the community and the company’s employees “first” means that Posit takes on projects that support its users rather than only putting money in shareholders’ pockets. “We want to build tools for you,” says Hadley, “we want to improve academia.” (53:28). He also feels it is essential that Posit’s employees are driven by wanting to do good in the world.

Listen to the episode to hear about Hadley’s and Jon’s favorite libraries, the goals and visions of Posit PBC, and Hadley’s views on the future of self-driving cars.

In this episode you will learn:

All about the Tidyverse [04:46]
Hadley’s favorite R libraries [17:10]
The goal of Posit [30:29]
On bringing multiple programming languages together [36:02]
The principles for a long-lasting tech company [52:10]
How Hadley developed ggplot2 [55:24]
How to contribute to the open-source community [1:05:43]

Items mentioned in this podcast:

This episode is brought to you by HPE Ezmeral Software powered by Intel® Xeon® Scalable processors
SDS 337: Hadley Wickham Talks Integration and Future of R and Python
Posit
tidyverse
dplyr
ggplot2
dbplyr
testthat
plotnine
SDS 523: Open-Source Analytical Computing (pandas, Apache Arrow)
R for Data Science by Hadley Wickham
Arrow
DuckDB
Keras for R
Quarto
Shiny for R
Shiny for Python
S7
Tidyverse Dev Day
“Strategies” by Hadley Wickham
“Tidy Data” by Hadley Wickham
Hadley Wickham’s recipes
Hadley Wickham’s cocktails
Charlotte Wickham
The Grammar of Graphics by Leland Wilkinson
Gideon the Ninth by Tamsyn Muir
Machine Learning Level 2 (in Python)
Calculus Level 1
Calculus Level 2
SDS special code for a free 30-day trial on O’Reilly: SDSPOD23
New York R Conference
Nebula
The Super Data Science Podcast Team

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 779 with Dr. Hadley Wickham, Chief Scientist at Posit. Today’s episode is brought to you by Intel and HPE Ezmeral Software.