SDS 765: NumPy, SciPy and the Economics of Open-Source, with Dr. Travis Oliphant

97 minutes
Artificial Intelligence, Data Science, Machine Learning



This week, Jon Krohn is joined by Dr. Travis Oliphant, the creator of the influential Python libraries NumPy and SciPy. Travis shares moments of his journey in the world of open-source software, exploring how his work has become foundational in the fields of data science, software development, and machine learning.



About Travis Oliphant

Travis Oliphant (Ph.D., Biomedical Engineering, Mayo Clinic) has worked extensively with Python for numerical and scientific programming since 1997, and was the primary developer of the NumPy package and the author of the definitive Guide to NumPy. He is also the primary founding author of the SciPy package and creator of the Numba project, and has been an organizing influence in the creation of Conda, Dask, JupyterLab, and Nebari. Travis has founded multiple companies focused on scientific Python and the successful use of open source by enterprise, including Quansight, a Python data & AI/ML consultancy.

Overview

In an engaging exploration of the digital tools that have become the backbone of data science, Dr. Travis Oliphant, the brain behind NumPy and SciPy, tells the story of how these libraries began and grew. With over 8 million and 3 million daily downloads respectively, these tools have become indispensable in the data science community. Travis shares his journey from a biomedical engineering PhD student at the Mayo Clinic to a pivotal figure in the open-source software world, explaining how NumPy and SciPy grew out of his own research needs and a deep commitment to the open-source ethos.

Travis also delves into the challenges and triumphs of building community-driven software, emphasizing the importance of collaboration and credit within the open-source ecosystem. He discusses the evolution of scientific computing, highlighting the role of compiler technologies and generative AI in shaping the future. The conversation also touches on the commercial aspects of open-source software, with Travis outlining his efforts to create sustainable business models that support open-source development through ventures like Quansight and OpenTeams.

He also shares insights into the complexities of balancing commercial success with community values, the future of Python libraries in scientific computing, and how programming languages can influence thought processes and problem-solving approaches in science and engineering. Join Jon and Travis for a deep dive into the world of open-source software and its profound impact on data science and beyond.

In this episode you will learn:

Travis’s journey to creating NumPy and SciPy [08:05]
How Anaconda got started [42:24]
How Numba, a high-performance Python compiler, was brought to market [54:48]
Python’s influence on the thought processes of scientists and engineers [1:04:21]
The commercial projects that support Travis’s vast open-source efforts and communities [1:10:22]
How to get involved in Travis’s commercial projects and communities [1:22:34]
The future of scientific computing and Python libraries [1:29:50]

Items mentioned in this podcast:

This episode is brought to you by DataConnect Conference: for 15% off, use the code superdatascience
This episode is brought to you by Data Universe: $300 off a Data Universe pass with promocode: SUPERDATASCIENCE
This episode is brought to you by CloudWolf (30% membership discount included)
NumPy
SciPy
Anaconda
PyData
NumFOCUS
OpenTeams
Quansight
Open Source Professional Network
Numba
POSSEE.org
Fortran
Nebula
Ed Donner
Guido van Rossum
NumArray
Buffer Protocol
data-apis.org
pandas
PyTorch
TensorFlow
FairOSS
Jim Hugunin
Soumith Chintala
Siu Kwan Lam
Enthought
Blas Python package
ufunc
Lex Fridman Podcast: #224 – Travis Oliphant: NumPy, SciPy, Anaconda, Python & Scientific Programming
Poolside
SDS 737: scikit-learn’s Past, Present and Future, with scikit-learn co-founder Dr. Gaël Varoquaux
SDS 523: Open-Source Analytical Computing (pandas, Apache Arrow)
SDS 754: A Code-Specialized LLM Will Realize AGI, with Jason Warner
OpenTeams OSA Community Application
Money, Bank Credit and Economic Cycles by Jesús Huerta de Soto
The Super Data Science Podcast Team

Episode Transcript:


Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 765 with Dr. Travis Oliphant, the creator of NumPy and SciPy. Today’s episode is brought to you by the DataConnect Conference, by Data Universe, the out-of-this-world data conference, and by CloudWolf, the Cloud Skills platform.

00:00:22

Welcome to the Super Data Science Podcast, the most listened-to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today, and now let’s make the complex simple.

00:00:53

Welcome back to the Super Data Science Podcast. Today’s episode is with the absolutely iconic and absolutely brilliant Travis Oliphant. Travis created the ubiquitous NumPy and SciPy packages, which are downloaded over 8 million and 3 million times per day respectively for numeric operations and scientific computing in Python. He also founded Anaconda, the company behind the also ubiquitous Python package manager. He founded the massive PyData conferences and communities, as well as their associated nonprofit foundation, NumFOCUS. He currently serves as the CEO of two firms, OpenTeams and Quansight, and he holds a PhD in biomedical engineering from the Mayo Clinic in Minnesota.

00:01:33

Today’s episode will be primarily of interest to hands-on practitioners like data scientists, software developers, and machine learning engineers. In this episode, Travis details how his journey creating open-source software began, and how NumPy and SciPy grew to become the most popular foundational Python libraries for working with data. He talks about how he identifies commercial opportunities to support his vast open-source efforts and communities, how AI, particularly generative AI, is transforming open-source development, and where open-source innovation is headed in the years to come. All right, are you ready for this jaw-dropping episode? Let’s go.

00:02:12

Travis, welcome to the Super Data Science Podcast. I am beside myself that you’re here with us today. It’s amazing. Where are you calling in from today?

Travis Oliphant: 00:02:20

Jon, thanks. It’s great to be here. I’m calling from Austin, Texas. I’ve lived here for 16 years, almost 17 now.

Jon Krohn: 00:02:27

My first few times to Austin, Texas have all been post-pandemic. I’m relatively new to Austin. I guess there’s a lot of that happening, there’s a lot of people who pandemic era … But our company has an office in Austin. And the hotel that we were staying in was right across the street from a big Anaconda logo on a skyscraper.

Travis Oliphant: 00:02:48

Yes. I don’t know if the Anaconda logo is still there, but it was really fun when that went up. I pointed it out to all my kids and people that came in. I have a lot of virtual presence; you can see the impact of the stuff you’ve worked on virtually. But there’s something about the city you’re in having a logo that’s visible to everybody.

Jon Krohn: 00:03:07

Yeah, that’s cool.

Travis Oliphant: 00:03:07

It was pretty cool. I think in the pandemic, Anaconda shut down that office space. I shut down the office spaces I was using pre-pandemic, so now I’m totally virtual. We use occasional office spaces, basically hoteling, but our office space is much reduced, which has its own … That’s a topic we could cover, but remote work versus in-person work and some of the trade-offs are very real.

Jon Krohn: 00:03:34

Yeah, yeah. Well, around 2021, that logo was still up there at least.

Travis Oliphant: 00:03:38

Awesome.

Jon Krohn: 00:03:39

Yeah. And then speaking of my work, actually, that’s kind of how we connected. Ed Donner is a co-founder of my company, Nebula, and our CTO, and I’ve worked with him for 10 years. Something that I frequently say when we do annual 360 reviews, something that I put in them, is that I hope I’ll be working with Ed until I die, because he’s just unbelievably intelligent, hardworking, amazing at personal relationships, and follows through on everything he says. He’s one of the smartest people I’ve ever met. So it’s amazing to be working with Ed, and I really appreciate him connecting us. And I guess his time at J.P. Morgan is somehow how he connected with you.

Travis Oliphant: 00:04:21

Yes, no, I’m thrilled. I was thrilled to reconnect with Ed. I’ve known him for 15 years. We first met while I was a consultant at Enthought when I first came to Austin. I was an academic before then and came to Enthought to learn business. So I came to Austin and worked at Enthought. We went to J.P. Morgan and basically were helping them build out their use of Python in risk systems, and I met Ed there because he was basically managing … He’s a young guy, or he looks young; he either is young or looks young, one of the two, maybe both.

Jon Krohn: 00:04:53

Actually, it is wild how old he is. People never expect-

Travis Oliphant: 00:04:59

I can believe that, because I met him … He was like the boss of the boss of the boss who actually pulled us in, and I was like, “Oh, this is a big meeting with the boss’s boss’s boss,” and he looked like he was 25.

Jon Krohn: 00:05:11

Yeah. And he still does, it’s wild.

Travis Oliphant: 00:05:13

He was super capable, super intelligent, definitely visionary. That’s how I got to know him first. I knew him when he left J.P. Morgan, and I had lots of connections there, so I was really thrilled to hear from him. And then I saw the LinkedIn post he did, which is phenomenal, about how to use LLMs effectively. It was actually one of the best-documented examples and experiments of how to do it. It really was great.

Jon Krohn: 00:05:42

Yes. So he’s an unbelievable teacher, he’s an unreal explainer of technical concepts, and I’ll be sure to link to Ed’s LLM posts in the show notes. He did a really fun project where … And he’s left step-by-step instructions, and everything is open-source tooling, which obviously you’ll appreciate, Travis. It provides you with everything you need to download your own text message history off of your phone and create an LLM that can mimic not only you, but Ed was able to … For anybody he had shared at least 100 messages with, the LLM was able to effectively replicate that person.

Travis Oliphant: 00:06:23

That’s wild. Yeah. Definitely that concept has people’s attention. Either to, could we create my mimic, who could I fake out with a text message with me?

Jon Krohn: 00:06:36

Yeah, there’s a funny … I probably shouldn’t be sharing this on-air, but I don’t think anyone would care. So someone that Ed and I worked with for many years was a guy named Gareth Moody. And so Ed simulated a conversation with Gareth, and Gareth kept saying, “I’m running late,” the Gareth bot, and that is spot on. That is …

Travis Oliphant: 00:07:02

Yeah, I’m probably guilty of that too. My wife would say, “That’s probably going to be your LLM.” That’s hilarious. I’m actually thinking that may be helpful. Obviously there are dark versions of all of this, but I’m more of an optimist, so any … Yes, I know there are potential negative things that can happen, but I’m a big optimist because I feel like, “Great, we’ll just respond to those and make them better.” What we need to do is create a bunch of really talented people and distribute the capability to as many people as possible, and that way, yeah, there are a few bad actors, but you overwhelm them with goodness [inaudible 00:07:37].

Jon Krohn: 00:07:36

Absolutely. And there are enough good people out there looking for ways to offset, to red team.

Travis Oliphant: 00:07:42

Yes.

Jon Krohn: 00:07:42

And to be able to offset the relatively small percentage of bad actors out there, for sure.

Travis Oliphant: 00:07:47

Exactly.

Jon Krohn: 00:07:48

So in our second topic area, we’re going to come back and talk about bridging open-source with business: the kinds of things like your collaboration with J.P. Morgan that we just alluded to, and bringing companies around to seeing the huge value in open-source. But first, let’s get into the huge open-source achievements in your past. You are best known as the creator of not only popular, but absolutely foundational Python libraries like NumPy, SciPy, and Numba, as well as the Anaconda distribution, as we already alluded to with the logo on the skyscraper.

00:08:26

And so the Anaconda distribution is probably how the plurality of beginners, if not the majority, start their Python journey. It makes it so easy to get all of the key packages that you need; probably most of our listeners have used Anaconda at some point. And whether they used Anaconda or not, surely any of our listeners who program in Python, which is probably most of them, have used NumPy and SciPy, for sure. There are so many ways that we could jump off from here, but I think it would be awesome to hear how you got started with this, how you ended up contributing these absolutely foundational libraries, from the start.

Travis Oliphant: 00:09:11

Yeah. Well, I’ll try to give a summary, a brief synopsis. I’ve actually given talks about this in several venues, and they always give just a slight window into the journey because it’s a big journey. I would say I started with a need. I wanted Python to be useful for my work. I was a scientist at the Mayo Clinic, and I was computing five-dimensional derivatives to do image processing on medical imaging data, in particular ultrasound and MRI data, and I was looking for tools to help me. I could code in C reasonably well, and I used MATLAB a lot. But I was running out of space; I needed to take more control. And I very, very much did not like the fact that if I wrote in a proprietary language, I was essentially giving somebody a burden when I said, “Here’s my code.” I love the scientific ethos of sharing progress. And if I shared progress with somebody, I didn’t like the fact that I was actually telling them they had to have a license for some … I was kind of giving them a bug, I was telling them … That was my problem.