SDS 817: The Positron IDE, Tidy NLP and MLOps with Dr. Julia Silge

96 minutes
Data Science, Machine Learning, R Programming

Dr. Julia Silge, Engineering Manager at Posit, introduces the brand-new Positron IDE, perfect for exploratory data analysis and visualization. She also lays out her top picks for LLMs that boost coding efficiency and discusses when traditional NLP methods might be the smarter choice over LLMs. Plus, Julia highlights some must-know open-source libraries that make managing MLOps easier than ever. Tune in for insights that every data scientist, ML engineer, and developer will find useful.

About Julia Silge

Julia Silge is a data scientist and engineering manager at Posit PBC, where she leads a team of developers building fluent, cohesive open-source software for data science in Python and R. She is a tool builder, author, international keynote speaker, and real-world data science practitioner. She holds a PhD in astrophysics and serves on the technical advisory committee of the US Bureau of Labor Statistics.

Overview

Dr. Julia Silge brings her wealth of experience as an Engineering Manager at Posit to the table. She shares insights on Positron, a brand-new IDE built with data scientists in mind. This isn’t just another IDE: Positron is crafted for those deep into exploratory data analysis, offering upgraded variable panes and a heavy focus on visualization that makes data work smoother and more intuitive. If you’re looking to up your game in EDA or simply want a more flexible environment for your data projects, Positron is worth checking out.

Julia doesn’t hold back when it comes to her top picks for LLM-based tools that streamline code generation. She highlights “Continue,” “Tabnine,” and “Codeium,” breaking down what sets each apart and where each shines. If you’ve been searching for the right AI assistant to boost your coding efficiency, Julia’s practical breakdown will help you decide which tool fits your workflow best.

NLP is another hot topic in this episode. Julia digs into when traditional NLP methods still outshine LLMs. Whether it’s achieving greater statistical rigor, handling extensive datasets, or executing exploratory data analysis, sometimes sticking with tried-and-true NLP techniques is the way to go. Julia talks through practical strategies, like using TF-IDF or her own tidylo library, to sidestep common issues like typos and demographic biases in text analysis.

She also sheds light on making MLOps more accessible with open-source tools. Julia introduces Vetiver, a flexible and straightforward tool that simplifies the deployment of machine learning models. For data scientists looking to keep their MLOps pipeline smooth and scalable, Julia’s advice is packed with valuable takeaways.

In this episode you will learn:

Overview of Posit and Positron IDE [05:20]
How the needs of a data scientist differ from those of a software developer [10:54]
How to contribute to the open-source Positron [19:50]
MLOps and Vetiver: Tools for deploying and maintaining ML models [37:01]
Natural Language Processing (NLP) and the Tidyverse approach [50:34]
The role of AI and LLMs in data science education [1:24:18]

Items mentioned in this podcast:

This episode is brought to you by Gurobi
This episode is brought to you by ODSC West – for an additional 10% off, use our special code: PODCAST
SDS 813: Solving Business Problems Optimally with Data, with Jerry Yurchisin
Posit
Positron
Text Mining with R by Julia Silge and David Robinson
Tidy Modeling with R by Max Kuhn and Julia Silge
Supervised Machine Learning for Text Analysis in R by Emil Hvitfeldt and Julia Silge
SDS 779: The Tidyverse of Essential R Libraries and their Python Analogues, with Dr. Hadley Wickham
Julia
VS Code (Visual Studio Code)
“Continue” LLM for Code
broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames by David Robinson
Code OSS
Jupyter Protocol
Quarto
Tabnine
Codeium
SQLAlchemy
tidymodels
Vetiver for MLOps
janeaustenr (R package)
David Robinson
Topic modeling with Taylor Swift Lyrics
Finding high FREX words and high lift with Stranger Things dialogue
tidylo
TF-IDF
The Programmer’s Brain by Felienne Hermans
SuperDataScience
The Super Data Science Podcast Team

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 817 with Dr. Julia Silge, Engineering Manager at Posit.