SDS 573: Automating ML Model Deployment

Podcast Guest: Doris Xin

May 10, 2022

The co-founder and CEO of Linea, Dr. Doris Xin, joins us this week for a high-energy episode that dives into the groundbreaking tools her team is building and how they’ll change ML model deployment from here on out. She also shares what it’s like being the CEO of an exciting, early-stage tech start-up, and details her founder journey.

About Doris Xin
Doris Xin is the co-founder and CEO of Linea, a seed-stage MLOps startup on a mission to build developer tools that empower data scientists and enable organizations to rapidly generate value with data. Doris received a PhD in Computer Science from UC Berkeley. Her thesis focused on designing machine learning systems for developer productivity, research inspired by her experience as a machine learning engineer at LinkedIn. Doris’ PhD studies were supported by the National Science Foundation Graduate Research Fellowship. Her career includes engineering and research roles at Databricks, Google, LinkedIn and Microsoft.
Overview
As the co-founder and CEO of Linea, an early-stage start-up that dramatically simplifies the deployment of machine learning models into production, Dr. Doris Xin aims to help data science teams, including alpha users such as Twitter, Lyft, and Pinterest, increase their productivity.
Calling in from San Francisco, Doris kicks off the episode with Jon by discussing the main problem she’s solving at Linea. With heavyweight investors such as DJ Patil and Anthony Goldbloom behind her, it’s clear that the pain point of slow ML deployment is a widespread problem for data science teams. “The process by which data scientists clean up their notebook is very mechanical, and it’s very mentally taxing,” she says. “Data scientists incur a lot of technical debt while they work.”
Enter Linea, which addresses this issue head-on. “With just two lines of code […] Linea is able to streamline all of the productionization software engineering for the data scientist,” she says. The beauty of Linea is that the tool is extremely low-code. Across several real-life cases, Doris found that Linea saved users nearly 40% of their time and helped them monitor data pipelines and understand the intricate dependencies across an organization.
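To make the “two lines of code” idea concrete, here is a minimal sketch of what that workflow can look like, loosely modeled on Linea’s open-source LineaPy library. The function names and parameters shown (lineapy.save, lineapy.to_pipeline, the "AIRFLOW" framework option) are illustrative assumptions drawn from the open-source API, not a definitive guide to the product.

```python
import lineapy  # assumes the notebook/kernel is running with Linea's tracing enabled
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Ordinary, messy notebook-style development: load data, explore, train a model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Line 1: register the trained model as an artifact. Because the session has
# been traced, Linea can slice out just the code that produced this object.
lineapy.save(model, "iris_model")

# Line 2: emit a runnable production pipeline (for example, an Airflow DAG)
# from that artifact, with the exploratory code pruned away.
lineapy.to_pipeline(["iris_model"], framework="AIRFLOW",
                    pipeline_name="iris_pipeline", output_dir="./pipelines")
```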
While completing her PhD research at the University of Illinois, Doris focused on what has become her passion: increasing the efficiency of ML development. One solution she proposed in her thesis was HELIX: Holistic Optimization for Accelerating Iterative Machine Learning, an ML system that optimizes execution across iterations by caching and reusing, or recomputing, intermediate results. In short, HELIX accelerates the ML pipeline by optimizing the directed acyclic graph (DAG) of an ML workflow. A DAG is a way of representing an ML workflow, and when data scientists are iterating on their models, this sequence of steps doesn’t change much. HELIX takes advantage of this DAG representation by caching intermediate results within the DAG so that redundant computation can be avoided in later iterations, speeding up workflows by as much as 10x.
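A rough way to picture the caching idea is the toy sketch below (a simplification for intuition, not HELIX’s actual machinery): each DAG step is keyed on a fingerprint of its inputs, so an iteration that only changes a training hyperparameter reuses the cached preprocessing output instead of recomputing it.

```python
import hashlib
import pickle

_cache = {}  # (step_name, input_fingerprint) -> cached intermediate result

def fingerprint(*objects):
    # Cheap content hash used as a cache key; a real system would also
    # fingerprint the step's own code, not just its inputs.
    digest = hashlib.sha256()
    for obj in objects:
        digest.update(pickle.dumps(obj))
    return digest.hexdigest()

def run_step(name, func, *inputs):
    # Run one DAG node, reusing its cached output when the inputs are
    # unchanged from a previous iteration.
    key = (name, fingerprint(*inputs))
    if key not in _cache:
        _cache[key] = func(*inputs)
    return _cache[key]

def iterate(learning_rate):
    raw = run_step("load", lambda: list(range(1000)))
    features = run_step("preprocess", lambda d: [x / 1000 for x in d], raw)
    return run_step("train", lambda f, lr: ("model", lr), features, learning_rate)

model_v1 = iterate(0.01)  # every step is computed
model_v2 = iterate(0.10)  # "load" and "preprocess" come from cache; only "train" reruns
```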
But the University of Illinois isn’t the only place where Doris has had the chance to explore ML development optimization. While working at Google, Doris analyzed over 3,000 production pipelines and found substantial wasted computation: a great deal of processing power went into training models that were never actually deployed. Over 30% of end-to-end pipeline executions, which her team modeled as “graphlets,” never pushed a model into production, and all of that computation was effectively wasted. Predicting and eliminating these failed graphlets in advance would therefore significantly reduce wasted computation.
When it comes to the future of AutoML, Doris reflects on the present first, saying that there is generally a lack of explainability in AutoML tools and that complete automation of ML is neither realistic nor desirable. In her paper Whither AutoML, she performed a qualitative study with participants who use AutoML tools. She concludes that instead of full automation being the ultimate goal of AutoML, designers of these tools should focus on supporting a partnership between the user and the AutoML tool. She predicts this collaborative element will be integrated within five to 10 years and expects future developments to include the human as a collaborator, rather than taking the human completely out of the loop.
Tune in to hear Jon and Doris continue their discussion on topics such as her daily life as a CEO of an early-stage tech start-up, and the characteristics she searches for when hiring data engineers and data scientists. 

In this episode you will learn:
  • How Linea reduces ML model deployment down to a couple of lines of Python code [5:14]
  • Linea use cases [11:30]
  • How DAGs can 10x production workflow efficiency [22:12]
  • ML model graphlets and reducing wasted computation [24:14]
  • What future Doris envisions for AutoML [35:23]
  • Doris’s day-to-day life as a CEO of an early-stage start-up [42:43]
  • What Doris looks for in the engineers and data scientists that she hires [52:21]
  • The future of Data Science and how to prepare best for it [53:58] 

Podcast Transcript

Jon Krohn: 00:00:00
This is episode number 573 with Dr. Doris Xin, co-founder and CEO of Linea.
Jon Krohn: 00:00:11
Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today, and now, let’s make the complex simple. 
Jon Krohn: 00:00:42
Welcome back to the SuperDataScience Podcast. Today’s guest is the visionary entrepreneur, Dr. Doris Xin. Doris is co-founder and CEO of Linea, an early-stage startup that dramatically simplifies the deployment of machine learning models into production. Her alpha users include the likes of Twitter, Lyft, and Pinterest. Her startup’s mission was inspired by research that she conducted as a PhD student in computer science at the University of California, Berkeley. Previously, she worked in research and software engineering roles at Google, Microsoft, Databricks, and LinkedIn. 
Jon Krohn: 00:01:15
Today’s episode is more on the technical side, so will likely appeal primarily to practicing data scientists, especially those that need to or are interested in deploying machine learning models into production systems. In the episode, Doris details how Linea can reduce ML model deployment down to a couple of lines of Python code, the surprising extent of wasted computations she discovered when she analyzed over 3,000 production pipelines at Google, her experimental evidence that the total automation of machine learning model development is neither realistic nor desirable, what it’s like being the CEO of an exciting early-stage tech startup, and where she sees the field of data science going in the coming years and how you can prepare for it. All right, you ready for this captivating and informative episode? Let’s go.
Jon Krohn: 00:02:06
Doris, welcome to the SuperDataScience Podcast. It’s so exciting to have you here. I’ve been looking forward to this for a while. Welcome. And where in the world are you calling in from? 
Doris Xin: 00:02:16
Thank you so much for having me, Jon. I’m super excited to be speaking with you today. I am calling from San Francisco. 
Jon Krohn: 00:02:23
Nice. And so I know you through Kevin Hu who was in episode 541. He had a brilliant episode. Like you, he did a PhD and then has taken some of the inspiration from his PhD to found a company. And so that’s kind of a common thread between the two of you. How do you guys know each other? 
Doris Xin: 00:02:46
We were introduced by my co-founder actually. And I think my co-founder got connected with Kevin from a shared research interest. 
Jon Krohn: 00:02:55
Cool. Yeah. I’m not surprised you guys have a lot in common and you’re both doing incredible things. I can’t wait to dig into what you’re doing with Linea today for our listeners. It’s so exciting. So you’re the founder and CEO of Linea. 
Doris Xin: 00:03:09
That is correct. 
Jon Krohn: 00:03:11
And it’s an impressive early stage company. At the time of recording, it still says on your LinkedIn profile that it’s a stealth startup, but I know that this podcast episode is a big part of your launch. You have lots of exciting alpha users already. You’ve got Twitter, Stitch Fix, Lyft, Pinterest, a company that everyone else pronounces Asana but I say Asana, which I’m pretty sure is how you say that in Sanskrit, but nobody else seems to know that. So I’m the one who sounds weird. You’ve had amazing investors already: the CEO of Kaggle, Anthony Goldbloom; a co-founder of Databricks; and DJ Patil, who was the first US chief data scientist. He’s also in SuperDataScience episode number 355. Hilary Mason is an angel investor. She’s coming up in an episode soon this year. And I hear that you have large institutional investors to be announced soon. So Doris, how exciting to be where you are. Yeah. It must be an incredible experience to have transitioned from your PhD, to now have this company launching. How does it feel?
Doris Xin: 00:04:21
It’s absolutely incredible every day. You learn something new every day. You fail at something every day and that’s how you learn. And this is like nothing I’ve ever done before in my entire life. 
Jon Krohn: 00:04:33
Cool. Well, I have every confidence that it is going to be a great success. Everything that you’ve done up until now has been an extraordinary success, and yeah, even prepping for this episode, it has been a really nice experience. So for listeners to be aware, I have to do research and we have an amazing researcher, Serg Masis, who comes up with potential topics to cover with guests. But in this case, Doris went out and she brought lots of topic ideas to us. It just seems like you’re such a thorough thoughtful person. I have no doubt that Linea is going to be a big success as well. 
Doris Xin: 00:05:10
Thanks. Thanks so much, Jon. 
Jon Krohn: 00:05:13
So the key problem that you’re solving is a big pain point that I run into all the time at my job. So my team, the data scientists on my team typically use Jupyter Notebooks to play around with how a model might work, how they might pre-process the data, how they might clean things up, creating charts of results or exploratory data analysis before we put it into the model, and then the actual modeling itself largely happening in Jupyter Notebooks. We also have some people who use PyCharm, but Jupyter Notebooks are I think the most common way today for people to be developing their models. And then you run into this problem where you want to take the model from the notebook and put it into production. And very often, that means writing Python files from scratch. 
Jon Krohn: 00:06:12
You’re starting over, maybe some copying and pasting from the Jupyter Notebook, but it is certainly a big pain point. And so it can mean that modeling that might have taken a week or two then takes another week or two to get it into production. And it also means that the people or data scientists who are doing the model development need to become pretty specialized at the kind of machine learning aspects… the machine learning engineering aspects of getting things into production. So this is a big pain point and it seems like that is the problem that you’ve set out to solve. So it comes out of your PhD research and we’ll talk about your PhD research more later on. But yeah, this pain point is something that you’ve managed to solve in just a couple lines of code.
Doris Xin: 00:07:03
That’s right. 
Jon Krohn: 00:07:05
So for better or for worse, notebooks are used for experimentation during model development. And once the data scientist is finished with the experimentation in the notebook environment, models and the entire processing pipeline need to be deployed. So how does your startup, Linea, help solve that big pain point? 
Doris Xin: 00:07:25
Yeah. So Linea, the solution is based on a very key observation, which is whatever’s happening during development, at the end of it, what the data scientist is doing is simply extracting a subset of what they’ve already done. And 90% of what they do during development does not make it into production, and that is very natural, right? And that whole process of extracting requires recalling dependencies, understanding basically the necessary and sufficient subset, which is what I call the subset of code that leads to that model. And this is not something that data scientists have the mental bandwidth to track while they’re developing models. Their only mission during model development is to get to insights extremely quickly. And that allows them to train models that will perform extremely well in production, right?
Doris Xin: 00:08:20
So that means they are incurring a lot of technical debt as they work, right? That’s why there is a ton of time that goes into cleaning up their notebook and pruning out the extraneous parts that were simply for exploration or for understanding, right? So what Linea is able to do is a couple things. One is that we are capturing everything that the data scientist is doing while they’re doing development. And this is not something that other libraries support. For other solutions out there, data scientists have to be very deliberate with what they save, what they record. Whereas Linea is very eager about capturing everything because there’s a chance that any part of this could make it into production.
Jon Krohn: 00:09:06
Interesting. 
Doris Xin: 00:09:06
And then the second piece about Linea is that the process by which data scientists clean up their notebook, it’s actually very mechanical, but it’s very mentally taxing because they’ve done so much during development, right? So what Linea is able to do is we are able to analyze every single line of code during development such that we understand the dependency between all the different operations that the data scientist has done in order to prune out things that were not leading up to the final model for production, right? So that’s part one of what Linea does. And part two is also based on the observation that a lot of times, data scientists need to translate what happened in their notebook, that raw Python script that comes out of the cleaning process, the refactoring process, and they have to translate it into a different framework. 
Doris Xin: 00:10:05
For example, Apache Airflow is a very common framework for running pipelines in production. And the translation process also takes a long time because these are not frameworks that data scientists work with on a regular basis. They are frameworks that data engineers love to use because they lead to stability, they lead to reproducibility, and these are very desirable for production models, right? So the process of translating is very onerous for the data scientists. They have to reorient their thinking. They have to learn a new framework. And we think that’s actually very unnecessary because the two programs are describing exactly the same logical workflow. And if we are able to understand at a semantic level what that workflow is, it becomes very straightforward to be able to translate between the two stacks. And that’s what Linea does with just two lines of code. That’s the really magical part about Linea, is that it’s extremely low code because we do a ton of heavy lifting for the data scientists in the background as they work. So by the time they’re ready to productionize something, Linea is able to basically streamline all of the productionization software engineering for the data scientists. 
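The mechanical but mentally taxing clean-up Doris describes here is essentially program slicing: work backwards from the final model and keep only the statements it depends on. The toy sketch below illustrates the idea on simple top-level assignments using Python’s ast module; it is a simplification for intuition only, not a description of Linea’s implementation, and the notebook code and helper names in it (load_csv, drop_nulls, and so on) are made up.

```python
import ast

notebook_code = """
raw = load_csv("events.csv")
eda_plot = plot_histogram(raw)       # exploration only
clean = drop_nulls(raw)
debug_sample = peek(clean)           # exploration only
features = build_features(clean)
model = train_model(features)
"""

def slice_for(target, source):
    """Return only the top-level assignments that `target` transitively depends on."""
    tree = ast.parse(source)
    defs = {}  # variable name -> (its statement, the names that statement reads)
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            reads = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            defs[stmt.targets[0].id] = (stmt, reads)
    # Walk backwards from the target, collecting its dependency closure.
    keep, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name in defs and name not in keep:
            keep.add(name)
            stack.extend(defs[name][1])
    # Emit the surviving statements in their original order.
    return "\n".join(ast.unparse(defs[name][0]) for name in defs if name in keep)

print(slice_for("model", notebook_code))
# Keeps the raw -> clean -> features -> model chain; prunes eda_plot and debug_sample.
```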
Jon Krohn: 00:11:23
Awesome. Sounds super helpful. And my team may need to try this out ourselves. So you’ve had amazing alpha users. I’ve mentioned some of them already like Twitter, Lyft, Pinterest. Do you have one or two case studies for how Linea has enabled either a specific company or maybe just speaking more generally, whatever you can disclose? I’d love to hear about a couple use cases of how Linea has been helpful to a data science team. 
Doris Xin: 00:11:49
I love to tell the story of our first design partner, Mike. The way that we got Mike to try Linea was very interesting. We first got to talking to Mike because we really wanted to hire him. He had a very relevant background, but he decided that he was comfortable where he was. But a couple months later, he actually hit me up on LinkedIn himself and he said, “Hey, are you guys still working on the thing that you told me about? I’ve been trying to build something just like Linea at my current job and it’s not quite working out. I’d love to talk to you guys.” And when he saw the product that we developed, he was so excited that he ended up joining us part-time. It was like, if you can’t beat them, join them. And Mike has been using Linea for his day-to-day work and it’s been absolutely transformational for him.
Doris Xin: 00:12:42
Before Linea, he was given a whole bunch of very messy notebooks by the data scientists at his organization. And he’s receiving tons of notebooks every day. And he has to go through every single one of them, try to understand what’s going on in there so he could translate what’s happening in there into Airflow. And that is very, very hard for Mike given that he just hasn’t had visibility into how the science was done. This process is extremely time consuming, it’s very mechanical, and he has to do this over and over again. So when he got introduced to Linea, he felt like he was saving about 40% of his day-to-day time onto- 
Jon Krohn: 00:13:30
That sounds about right. Yeah. That doesn’t surprise me. 
Doris Xin: 00:13:35
Yeah. So that was our first design partner, Mike. So the second use case is for a data… sorry, production engineer at a big tech company. I won’t say which one. What he was tasked with was being able to monitor the health of thousands of data pipelines at his organization. And if any single task within a pipeline ever goes down or if it was stuck, he needed to understand the downstream implications for everything. And because everything is intermingled, it’s extremely difficult for him to understand the interdependencies. And this speaks to kind of the second benefit for Linea, which is being able to understand not only how a single model came to be, but also how all the different artifacts, artifacts meaning models or data sets, charts, everything that a data scientist produces are related to each other. And this allowed our design partner to be able to say if this task is taking forever, all of these other pipelines that are feeding into dashboards or feeding into production models, they need to be alerted because there’s potentially a failure for them or there’s potentially other issues that could arise. So he was super excited to get his hands on Linea because now he has answers to a lot of the questions that he wasn’t able to answer before. 
Jon Krohn: 00:15:09
Cool. Yeah. I love those use cases and it underlines the same kind of issue that I was describing, that getting models from our playground environment into our production environment can be such a chore. And so that 40% figure that you said, saving 40% of your time, that sounds about right to me. I can imagine huge time savings through using a tool like Linea. And I can’t wait to try it out myself. So clearly, you’ve been able to convince a lot of people that this is a revolutionary idea. You have amazing investors already and even more lined up to be announced soon. So very exciting times. What was the initial point that led you to founding this company? I know that some of the work you’re doing in your PhD inspired this a little bit, but yeah, when did you actually decide I’m going to do this? I’m going to found a company and solve this really thorny problem? 
Doris Xin: 00:16:13
So the mission of Linea actually came from way back, from my first job out of college. I was what we now call an ML engineer, back then that wasn’t even a term yet, at a big tech company. My team had over 30 ML PhDs on it. I was extremely excited to be part of that, and what really turned out to be a little disappointing was that my entire job was data plumbing. All the data scientists got to do the super exciting modeling work. And after they’re done with that, they bring their model to me and they say, “Now make a pipeline out of it, put it into production.” That process was very important because until that process happens, the models are not generating any business value. At the same time, it was extremely mechanical. It was not the most satisfying work. And that’s what inspired me to go to grad school. I was basically thinking that if I wasn’t able to do machine learning in industry, maybe I should go to grad school, like all the PhDs on my team. And I spent the first couple years in grad school exploring all the different ML algorithms, exploring ML research. And it turned out that where my passion really lay was in building systems to support other data scientists to make use of machine learning. And that’s what really inspired me to go on the mission for what we’re doing at Linea today and in terms of wanting to do a startup.
Jon Krohn: 00:17:52
So just to kind of recap there, so you’re at a big tech company. You’re working on what we would today call machine learning engineering types of problems, productionizing algorithms, the key problem that Linea solves. And you saw that there were all these machine learning PhDs around you that were getting to do what seemed like the cool stuff of actually training the models. You started a PhD, you’re training models yourself, you get familiar with it and you’re like, “You know what? Actually, that problem from my internship was actually a really exciting problem to be solving and I’m even more passionate about that.” Yeah. And then you ended up spending most of your PhD focused on that. We’re going to dig into some of the specific papers later, but yeah, is that a pretty good recap of that? 
Doris Xin: 00:18:33
Absolutely. Yeah, yeah. That’s perfect. 
Jon Krohn: 00:18:37
Nice. So then when was it? Was it during your PhD, right at the end of your PhD that you had this spark, this inclination to actually found your own company? 
Doris Xin: 00:18:48
So the inspiration for doing a startup actually came to me before grad school. 
Jon Krohn: 00:18:54
Oh, wow. 
Doris Xin: 00:18:55
I decided to blow off some steam between my first full-time job and grad school by doing an internship at Databricks. And there, I got to write a lot of Spark code, which was super fun, but what was really exciting was seeing the very early stage of a startup. That journey was incredible to me and that’s what really got me very interested in thinking about, “Wow, wouldn’t it be nice if I got to do this for myself one day?” And that’s where I started thinking about what that could look like. And so during my PhD, there’s been many moments where I thought about how I can connect what I’m doing, connect my passion to the idea of wanting to found a startup. So from other people’s perspective, it felt like I took the plunge into the deep end of the pool by founding a startup after my PhD. I feel like what really happened was I’d been pacing on the platform for many, many years before I took the plunge.
Jon Krohn: 00:19:57
Right. That’s so cool. I really applaud that. I mean, it really is, I think, one of the most exciting and impactful things we can do in the world, is to go off and start our own thing. And it is certainly a scary path or a risky path for a lot of people, but this kind of methodical way that you’ve approached it where you already years ago had this idea, “I want to do something like Databricks. I want to have my own startup.” And then going through an entire PhD program where you’re thinking about that and you’re kind of laying the foundations psychologically as well as practically, that is brilliant. We had a guest on the show recently who didn’t finish his PhD and has had an amazing career since, and he made the argument on air that almost nobody should do a PhD. He was like unless… He listed some very narrow circumstances where he thought that you should actually do a PhD, but you just outlined an outstanding use of one. I would love to have that kind of opportunity now to be able to take several years to dig deep into some problem that I could then spin something out of. That is such a brilliant use of that time. And yeah, congratulations on getting going with this.
Doris Xin: 00:21:33
Thanks, Jon. 
Jon Krohn: 00:21:34
So when you were doing your PhD research, one of the main focuses was on increasing the efficiency of machine learning development. I mean, clearly. And so for example, there is your Helix paper that you’re a first author on, as well as your PhD dissertation itself. And so I’ll be sure to include those in the show notes for people to review. And one solution that you propose in that Helix paper or in your dissertation is accelerating the pipeline, the machine learning engineering pipeline, by optimizing the Directed Acyclic Graph, the DAG of a machine learning workflow. Can you fill us in on what DAGs are and how they can speed up development as much as tenfold? 
Doris Xin: 00:22:19
Absolutely. So a DAG is a way to represent a machine learning workflow. So when we think about a machine learning workflow, there are operations to read the data, to pre-process the data, to train the model, to validate the model, and to push models into production, right? So there’s a sequence of steps and sometimes it’s not a straight line. There might be multiple pre-processing operators that are feeding into downstream pipelines, and that’s what makes it a DAG, right? So it’s not just a straight line, it’s a bunch of operations that are interconnected. So that’s what a DAG is. And so Helix was able to take advantage of this DAG representation by observing that across iterations, the data scientist really isn’t changing their workflow much. And that’s because they are good data scientists. You can’t be changing 20 different things and be able to understand the implication of your changes. And what that allowed us to do is be able to very intelligently cache intermediate results within that DAG so we can avoid redundant computation in future iterations. For example, if the data scientist only changed a model hyper parameter from one iteration to the next, they really shouldn’t be recomputing the pre-processing part of it. And Helix was able to recognize this fact, and that’s how we were able to really speed up the development process. 
Jon Krohn: 00:23:49
Awesome. That is a very good explanation of DAGs and how they can speed things up. In another paper of yours from your research, so in this one, it’s another first author paper of yours, and this one was in SIGMOD in 2021. And so in that one, by analyzing over 3,000 production pipelines at Google, you realized that there was wasted computation. So how does your approach reduce that wasted computation and what implications would something like that have for organizational efficiency and beyond maybe even things like carbon footprint of big organizations and big models that they train?
Doris Xin: 00:24:35
That’s a great question. And that paper was really interesting because the data set that I was working with was very unique. How often do you get to see over 3,000 production pipelines at a big tech company? So we started by just analyzing all the data pipelines to see what’s even out there. And the analysis was extremely difficult because we were trying to look at the execution trace of all of these pipelines. And it’s this giant messy graph because model training is dependent on a sliding window of data. So that means across two consecutive model trainings, you have interlinked data dependencies and that’s what made the graph extremely messy. So the first thing we had to do was just to figure out a mechanism to even do the analysis. And the way we did that was by proposing this concept of a graphlet.
Doris Xin: 00:25:33
What we did there was to segment this giant trace into smaller graphlets that each encapsulate an end-to-end execution of the pipeline that contains a model training, right? And what we were able to see there is that A, the graphlets take a long time to compute when you have a lot of data, when your model’s very complex. Each of these graphlets consumed a lot of computation. And the other thing that we realized was even though the end goal of a graphlet is to push a model to production, over 30% of them actually didn’t do that. And that was a very interesting insight and we dug into why this is happening. This is not desirable because until you push the model into production, this graphlet was just wasted computation. So we really wanted to understand, is there a way for us to prevent this in the first place?
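One way to picture the graphlet construction (a simplification of what the paper describes, with made-up node names): treat the execution trace as a directed graph and, for each model-training node, take its set of ancestors as one end-to-end graphlet, even when consecutive trainings share sliding-window data.

```python
from collections import defaultdict

# A toy execution trace: edges point from an upstream node to the node that
# consumes its output. Two trainings share sliding-window ingestion nodes.
edges = [
    ("ingest_day1", "window_1"), ("ingest_day2", "window_1"),
    ("ingest_day2", "window_2"), ("ingest_day3", "window_2"),
    ("window_1", "train_model_A"), ("window_2", "train_model_B"),
]

parents = defaultdict(set)
for src, dst in edges:
    parents[dst].add(src)

def graphlet(training_node):
    """All ancestors of one model training: one end-to-end pipeline execution."""
    seen, stack = set(), [training_node]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

print(graphlet("train_model_A"))  # train_model_A plus window_1, ingest_day1, ingest_day2
print(graphlet("train_model_B"))  # train_model_B plus window_2, ingest_day2, ingest_day3
```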
Doris Xin: 00:26:33
And some of the hypotheses that we came up with were maybe the graphlet didn’t push a model because the data drift was too severe or maybe the workflow structure changed. Maybe the model type, if it’s a logistic regression or KALM net, they might correlate with model failures more so. And the other one is when data scientists change their code across model trainings. So these are some of the reasons that we hypothesized. The way that we were analyzing how these different features correlate with wasted computation was actually by using a machine learning model itself. We trained a random forest model to be able to predict whether a graphlet was going to push a model to production using all of these features. And the outcome was that this model itself was able to identify correctly over 50% of the graphlets that ended up not pushing a model.
Doris Xin: 00:27:38
So it was able to reduce 50% of the wasted computation. And by analyzing the correlation between the model and the features, we saw a couple very interesting things. The first thing was that the model type was a very weak indicator of whether graphlets were going to fail. So that means logistic regression, KALM nets, they’re equally likely to lead to wasted computation. And the other thing that was very surprising was that the code change really didn’t matter either. So when data scientists tweaked their workflows, that wasn’t a contributing factor to failed graphlets. And the things that were the most relevant were data change, as we had hypothesized: when data drifts, that could lead to the model behaving poorly. And the other thing was workflow structure change. So sometimes the number of operators within the DAG could change because data scientists have decided to transform their data differently or maybe they wanted to put a couple models together and then do an ensemble model. And these factors contributed heavily to graphlets failing and therefore to wasted computation. And Jon, back to your original question about the implication of wasted computation, it is a huge amount of energy that we are consuming.
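To make the prediction step concrete, here is a minimal sketch with entirely synthetic, made-up features standing in for the ones described above (data drift, workflow-structure change, code change, model type). The real study used features extracted from Google’s production traces, so this only illustrates the shape of the approach, not its results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Synthetic graphlet features (stand-ins for the paper's real ones).
data_drift = rng.random(n)                 # severity of input data drift
structure_changed = rng.integers(0, 2, n)  # did the DAG structure change?
code_changed = rng.integers(0, 2, n)       # did the user edit their code?
model_type = rng.integers(0, 3, n)         # e.g. 0=logistic, 1=tree, 2=neural net

# Simulate the reported finding: drift and structure changes drive failures,
# while code edits and model type barely matter.
p_fail = np.clip(0.05 + 0.5 * data_drift + 0.25 * structure_changed, 0, 1)
pushed = (rng.random(n) > p_fail).astype(int)  # 1 = a model was deployed

X = np.column_stack([data_drift, structure_changed, code_changed, model_type])
X_tr, X_te, y_tr, y_te = train_test_split(X, pushed, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
wasted = (y_te == 0)                         # graphlets that never pushed a model
caught = (clf.predict(X_te) == 0) & wasted   # wasted graphlets flagged in advance
print(f"wasted graphlets caught: {caught.sum() / wasted.sum():.0%}")
```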
Jon Krohn: 00:29:07
Right. Yeah. So you’re talking about just in terms of rough numbers here that I’m just kind of jotting down. If 30% of these graphlets aren’t leading to a production model and your model is able to predict half of those and eliminate them, that’s 15% of all the computation that’s happening, which is 15% of an absolutely monstrous number. When you’re thinking about the big tech companies like Google, the amount of energy that that’s saving is huge. That’s way more than any individual in their lifetime, all those times you’re like, “Oh, I better make sure I recycle this or turn off the lights or switch to an electric vehicle,” all of the changes that an individual could make in terms of their behaviors would be negligible. I mean, many, many, many times over with respect to this kind of monstrous reduction in wasted compute. So I realized you were about to make that point and I’m kind of stealing your thunder, but I’m just so excited about how big that number is. 
Doris Xin: 00:30:16
And I just want to highlight that the data scientists weren’t aware of this until we did this study. They just didn’t even realize this was a problem and they didn’t have the mechanism to prevent it. 
Jon Krohn: 00:30:29
This is a really good example of how… so there’s an organization called 80,000 Hours that is focused on trying to help people pick careers where they can have a really big impact. And so we’ve talked about this a few times on the show most notably because we had one of the founders of 80,000 Hours on our show last year, Ben Todd, in episode number 497. But then also we more recently talked about this with Jeremy Harris in episode number 565. And so 80,000 Hours, one of their main… so the reason why we were talking about it in the episode with Jeremy Harris is because their number one recommendation if you want to make a big impact on the future of humanity is AI safety research. So trying to avoid having AGIs that enslave humans or just kill us all. But another one of… maybe it’s like number two or number three in their list of careers of how you can make an impact is as a founder of a company solving engineering problems.
Jon Krohn: 00:31:42
And so this is a perfect example of how your founding a company that solves problems like this that people weren’t even aware were a problem and now all of a sudden you’re saving 15% of the computation of this massive, massive energy consuming budget, that’s… Yeah. Kind of going back to my point of it doesn’t even… in the scheme of that, it doesn’t matter how many flights you don’t take or how often you turn off the lights. Yeah. The individual impact that you are making with this kind of innovation, creating the engineering innovation and then creating a startup to broaden that impact, it goes to show how that is one of the most impactful kind of career choices you can make. And so sorry, I’m going off on a bit of a tangent here, but I’m really excited about what you’re doing. And so yeah. 
Doris Xin: 00:32:40
Thank you so much, Jon. Thank you for highlighting that. I never actually thought about it that way. 
Jon Krohn: 00:32:44
Yeah. So if you think about that in contrast, so often people… I’m now going off on tangent and I promise, listener, we’ll get back to Doris and the Linea story in a second, but there is a really interesting idea that a lot of people have when they want to make an impact in the world. They think, “Oh, I want to be a doctor. I want to save people.” Well, actually, if you become a doctor, your net positive impact on average is zero. Because if you hadn’t become a doctor, someone else would’ve gotten into med school instead of you and they would’ve been just about as good as you. They would’ve had to pass the same admission tests. And so if you want to have a big impact, being a doctor isn’t the way to do it. 
Jon Krohn: 00:33:34
But things like solving engineering problems and creating startups to broaden the impact of those engineering problems. So it’s so interesting to hear that you say like, “Oh, I hadn’t even thought about it that way,” whereas people who are probably doctors, they’re thinking to themselves, “Oh, I’m making such a big difference every day.” And yeah. Anyway, so it’s just interesting how different careers kind of have that different social perception and when you actually dig into the numbers and who would be replacing people and that kind of thing, it is the kind of work you’re doing that is super, super impactful. 
Doris Xin: 00:34:10
I do appreciate all the doctors saving lives out there. So they collectively are making a difference. I never thought of- 
Jon Krohn: 00:34:17
They are collectively making a difference, but any individual doctor’s decision to become a doctor on average has no impact on the world. 
Doris Xin: 00:34:29
That’s an interesting point to think about. 
Jon Krohn: 00:34:34
If everyone realized that and nobody became a doctor, we’d have a really big problem. So it falls apart. Yeah, it only works on an individual basis, not on a collective basis. Obviously, in aggregate, everything that all doctors are doing is brilliant. And I do thank you very much, doctors, but yeah. Anyway, so I’ve completely derailed the conversation. So yeah. So I don’t know if you want to say more about that SIGMOD paper or if you’d like me to move on to another paper, if there are any points that you didn’t get to say that I’ve cut off with my rude diatribe that cut into all the physicians out there.
Doris Xin: 00:35:14
Oh, no, not at all. Yeah. I think I said everything I needed to about that paper. 
Jon Krohn: 00:35:20
Nice. Okay. All right. So then here’s another exciting one of yours. So it’s yet another first author paper from your PhD called Whither AutoML, which I had to look up whither in a dictionary, but you can explain that one to us in a second. So in this paper, you interviewed machine learning practitioners to understand how they leveraged AutoML. So algorithms that automate kind of hyper parameter selection choices and models or maybe even which model you use. And you concluded that currently, the complete automation of machine learning is neither realistic nor desirable. So what led you to that conclusion and what future do you envision for AutoML tools? 
Doris Xin: 00:36:05
So Whither AutoML, this title actually came from my advisor. The word itself means: what is the future of AutoML? And this is something I had to look up when he proposed this title as well. So we’re in the same boat there, Jon. So what made us realize… So for this paper, we interviewed over 15 practitioners who are currently using AutoML solutions. We got to understand very deeply their use cases, the tools they’re using, their organization, as well as their day-to-day work practices. What we found was that there is generally a lack of explainability in all the AutoML solutions out there. This was very problematic for them because at the end of the day, a human being is responsible for the model’s behavior. And if they weren’t able to explain what really happened, how we got to this model, that level of accountability simply wasn’t there for them.
Doris Xin: 00:37:11
Therefore, they really wanted to have a lot more control over the modeling process. I think it was great for some of them to be able to try out a bunch of different hyperparameters early on. But once you got to a stage where you’re really thinking about productionizing a model, they wanted a lot more control. So that was part of the reason why it wasn’t desirable to fully automate everything. And the realistic side was mostly about the ability for the systems out there to capture human intuition and human domain knowledge. Imagine if you are looking at a data set for a drug prediction or something of that nature, right? For an AutoML solution, these are just columns of numbers, but for a physician, for example, some of these columns embed a lot more information than just these numbers out there, right? And they are able to bring their intuition into the process to help the model really understand the intricate relationships between the different features in there. And I think we’re quite a bit… we’re still pretty far away from being able to somehow encapsulate the human intuition and human domain knowledge.
Jon Krohn: 00:38:33
Right. So maybe this is the kind of challenge that Linea maybe years from now could figure out an answer to. 
Doris Xin: 00:38:39
Absolutely. This is something that Linea’s actually very excited about tackling in 5 or 10 years’ time. I strongly believe by that time, we would have a couple things figured out. One is being able to represent human knowledge in a way that’s a lot more consumable for the computer in the first place. And the second piece is, and this is what I think is absolutely crucial for AutoML to gain more traction, is to figure out how to make the human a collaborator in the process instead of completely trying to automate the human out of the loop. Yeah. And there’s a lot of excitement about human-in-the-loop computing today. And I believe in the next five years, we’re going to see some amazing progress.
Jon Krohn: 00:39:29
Yeah. I agree with you there and that’s where I see the interaction between humans and machines going. And by the way, this paper was in CHI, which is the most prestigious conference for these kinds of computer-human interactions, which is where the CHI name comes from. Some people worry about machines taking lots of jobs and they do, they can. They increasingly take automatable… that’s a silly thing to say. Automation takes the most automatable jobs, takes the most repetitive jobs, but it does open up other opportunities. And this is that kind of example where tools may come up that automate aspects of a data scientist’s workflow, but those aren’t going to eliminate the data scientist. In fact, it creates the opportunity for there to be more data scientists having a bigger impact across even more models than ever before because, yes, having a human in the loop with these kinds of interactions, it’s the same kind of idea as prior to the late 1990s, prior to Deep Blue playing against Garry Kasparov and beating Garry Kasparov, there was this idea of a human against a machine.
Jon Krohn: 00:40:52
And for a while, it was like computers can’t be as smart as a human at this complex thing like chess. And then it’s like, “Oh, crap, they can be.” But what the great chess masters did then wasn’t to discount computers. It was to work with them and to see how they could use guidance from machines to like, “Oh, what would I do here? And what does the machine recommend doing here? Okay. Well, I appreciate that guidance in this case, or maybe that gives you an idea, but instead of doing what you’re suggesting, computer, I’m going to go with my idea.” And so working together with the machine is a more powerful pairing than the machine on its own. 
Doris Xin: 00:41:33
Absolutely. Absolutely. You hit the nail on the head, Jon, by saying what we’re doing with automation isn’t to outcompete the human. It’s rather to augment the human, to give the human more opportunity to focus on what they do best. 
Jon Krohn: 00:41:47
Yep, exactly. So super cool that that’s something that could be in Linea’s future as well. So thank you so much for taking us on that tour of some of your big research breakthroughs. So your Helix paper with the DAGs that can increase the productivity of production workflows by up to 10X by avoiding unnecessary processing steps, reducing wasted compute by predicting which graphlets are not going to lead to any production results. And then also how AutoML isn’t something that can completely replace people today. So thank you for that. Very cool to kind of dig into the weeds on some of your research. Switching gears a bit and talking kind of about what you’re doing today and what it’s like in your role, what is it like day-to-day being the CEO of an early stage tech startup like Linea?
Doris Xin: 00:42:52
Yeah. So that’s a wonderful question because it triggers a lot of reflection on the past year. So Linea has been around for a year at this point, and I think the CEO role is extremely poorly defined. You’re the C everything officer. If you’re out of snacks in the office- 
Jon Krohn: 00:43:14
That’s what CEO, I guess, stands for, the chief everything officer. 
Doris Xin: 00:43:18
Early stage of a startup, that’s absolutely what it is. So I think the biggest learning curve for me is to understand that being a CEO isn’t about doing things. It’s about putting in the infrastructure to support your team, to unite your team behind a common mission to really excel at doing something absolutely transformative. That requires talking to customers, going out and getting customers, hiring the best talent to build a solution, logistical things like getting an office so our engineers can be in the same place and talk to each other and whiteboard and riff, and yeah, just everything it takes to have a functional organization and thinking about culture at the same time. One thing I realized is that culture… there’s a saying out there, culture eats strategy for breakfast. And for an early stage startup, a lot of people don’t realize that culture shouldn’t be accidental. It needs to be very deliberate. And how to be deliberate with culture is something that I think a lot about, that I read a lot of books on, and is still not 100% clear to me, but I think what’s helpful is to codify the values that your organization embodies and then think about very specific actions that you can carry out to align yourself with those values. 
Jon Krohn: 00:44:49
Super cool. Yeah. So chief everything officer, but in these early stages, the culture piece is a big part of it and that makes a lot of sense. You’re laying the foundations for what the culture of the firm will be like in the future. And I interrupted you as you were saying earlier something about ordering snacks. And so I’m sorry for speaking over you. I didn’t let you have the chance to get back to that. But I think probably that was one of maybe several examples you had of the kinds of things that you need to do in an early stage startup because there’s no one else to do it. 
Doris Xin: 00:45:22
Yeah. And we have a hybrid team, some of our members are remote. So we also have to organize socials. We’ve done a bunch of virtual escape rooms. And that was me having to go online and search for all sorts of different remote opportunities or remote team building activities. 
Jon Krohn: 00:45:40
That’s cool. Yeah. I don’t do a good enough job of… well, actually, in one on ones with my remote team members, I’ll say things like, “Do you want to have more of these kinds of things?” And on my current team. And so in the future, this might change. But right now, they’ve said like, “No. I just have that time.” I’m like, “All right.” But yeah, that’s cool that you found things like that. I haven’t heard of that, a virtual escape room. That’s fun. 
Doris Xin: 00:46:08
Yeah. We’ve done a bunch of these and we keep doing them because everybody loves them. 
Jon Krohn: 00:46:13
Nice. That’s super cool. So what is your, if you had to pick one thing, one particular aspect of what you do professionally, what would be your one favorite thing? 
Doris Xin: 00:46:25
My favorite thing about my job is when, at the end of a demo, the user says to me, “Is this magic?” That makes everything worthwhile.
Jon Krohn: 00:46:37
Wow. Yeah, I bet. Yeah. That’s so cool. And with a tool like yours, that’s taking something that could take 40% of the data scientist’s time and abstracting it under a couple lines of code, I can see that you would get a lot of those wows. And I bet that that was very helpful for the early stage investment that you have already received from such illustrious folks. Super cool. So as the chief everything officer of your firm, what are the kinds of tools that you use on a daily basis? Do you still get to ever write any code or has that ship long sailed?
Doris Xin: 00:47:19
Sometimes. It’s a lot more rare nowadays that I get to open up my PyCharm and start hacking on… submit a PR for somebody to review. I do still use PyCharm, my favorite tool for some reason, to write product requirements, user stories, and things of that nature. And other tools that I use on a day-to-day basis, Jupyter Notebooks to understand the user experience, to be our in-house alpha user for the rest of the team. So Python, everything in the data science ecosystem, I still use pretty frequently because of being the alpha user for my team. On chief everything officer- 
Jon Krohn: 00:48:10
Another thing on the everything bucket. Yeah. 
Doris Xin: 00:48:13
Yeah, yeah, exactly. On the chief everything officer side, lots of productivity tools like Notion. I also use Lever for hiring. And I probably spent way too much time on email clients. 
Jon Krohn: 00:48:30
What did you decide on? That’s a big thing for me. I actually went on a big rant in a recent episode. I’ll try to avoid doing that again, but what email client have you gone with? 
Doris Xin: 00:48:40
Just straight up Gmail in a web browser is… 
Jon Krohn: 00:48:44
Right, yeah. That’s what I’ve got with you. I thought you might have said that you spent a lot of time trying out different email clients and trying to optimize. Yeah. 
Doris Xin: 00:48:52
Yeah. I tried Superhuman because that’s- 
Jon Krohn: 00:48:55
Yeah, yeah, that’s how we got talking about it. That’s when I went on the rant that I’m going to really try not to go on, but in episode 565 with Jeremy Harris, the same episode where we talked about AI safety, we talked about Superhuman, and yeah, he’s a big fan of it, but it didn’t really click for you in the same way. 
Doris Xin: 00:49:16
I think my expectations might have been too high based on what other people seemed to be saying about the client. And it turned out that the benefits were a small delta over the web browser.
Jon Krohn: 00:49:30
Right, right. It wasn’t a magic bullet. So the big rant that I went on in that episode was about Google Inbox. Did you ever use Google Inbox before they canned it?
Doris Xin: 00:49:40
No, I don’t think so. 
Jon Krohn: 00:49:42
Okay. Well, then I won’t go on the rant, but it was a free Google tool that sat on top of Gmail. So you used your Gmail account, but instead of logging into mail.google.com, you logged into inbox.google.com and it was this amazingly Zen efficient experience. But yeah, I won’t go into the rant if listeners want to… Yeah. 
Doris Xin: 00:50:06
I think I might have just buried it in my memory now that you talked about it. I think I checked it out when it first came out. There’s like a mobile app and it was- 
Jon Krohn: 00:50:14
It was a mobile app as well. 
Doris Xin: 00:50:17
It was more stress inducing than Zen because you’re constantly worried about, is this skipping an important email from me? 
Jon Krohn: 00:50:26
Right. Yeah. I guess I found that it never… That is the risk. So you’re relying on a machine learning algorithm to filter for you what is important and not, but I guess in the years that I was using it, I can’t recall an instance where an email that I needed that day was held away from me until the next morning. Yeah. Well, those days are gone and nobody’s stepped up to bring in something quite as good. Anyway. So other than your guidance on the tools that you kind of use as the chief everything officer, and then also… So tell us a little bit more about Notion and Lever because… So Lever is for human resources, right? 
Doris Xin: 00:51:16
Is for managing your pipeline for hiring candidates. 
Jon Krohn: 00:51:21
Right, right, right, right, right, right. Yeah. Your inbound candidate flow. And then Notion, I forget what that is, but we had a guest on recently that was really excited about Notion. 
Doris Xin: 00:51:32
It’s kind of like a replacement for Confluence. It’s like a Wiki. Yeah. 
Jon Krohn: 00:51:39
Nice. Yeah, yeah, yeah. Cool. It’s kind of for notes.
Doris Xin: 00:51:45
That’s absolutely right, for notes. And they provide you with a lot of plugins to be able to embed things into your notes like a database or a small snippet of a Google Doc and all sorts of bells and whistles that make it a really nice experience.
Jon Krohn: 00:52:01
That’s cool. That sounds great. We should try that one out. All right. So we just mentioned Lever. Obviously, you do hiring. I believe that right now you have software engineering openings. So if any listeners are out there and they want to be getting involved extremely early in what is sure to be a very successful startup, then that is something that you could do. So what do you look for, Doris, in the engineers or the data scientists that you hire? 
Doris Xin: 00:52:28
Yeah, that’s a great question. So I think one of the biggest things that we look for in our hires is, do they resonate with our mission? Every single person that we’ve hired, I recall at the end of the demo, they were absolutely blown away and they were already sold on the mission. So that’s a huge part of it. And the reason that they were so excited about this is they have some data science or data engineering background in their past to really identify with the mission and be able to understand the value that Linea brings. So it’s pretty important for us to have engineers that have that data science and data engineering empathy. They might not have done a ton of it themselves, but that empathy really helps us build product with the user in mind all the time.
Jon Krohn: 00:53:24
Nice. That’s a really nice kind of key attribute to be looking for. And that makes a lot of sense to me. So if people are looking to get ahead as a data scientist or as a software engineer, where do you think these industries are going? So we’ve touched on this a little bit in the episodes. We talked about how AutoML, for example, isn’t likely to replace data scientists though it could augment data scientists significantly in the years to come. So where is data science going and how can listeners prepare best for the future of data science? 
Doris Xin: 00:54:06
I really believe that we are moving towards democratization of data science, meaning that we no longer need to hire ML PhDs to do the sort of work that’s very specialized. Now we have a lot more tools to, like we mentioned, automate a lot of the mechanical side, but also the mathematical side as well. So that means what’s left for the human is their intuition, their analytical skills, right? So that means for the future of the workforce, a lot of it comes down to data literacy, being able to understand how you navigate the data, how you extract insight out of it, using different algorithms, using different tools. And the second aspect is that people often forget the importance of productionization, right? If you don’t go through that whole process that we talked about earlier on, you are not able to generate value from your data science yet. And we’re starting to see a trend of data scientists owning the end-to-end chain all the way from development to production. And the reason is because, A, it’s really hard to hire data engineers. There’s some statistic that says for every single opening for a data engineer, there are two applicants on average. A lot of data scientists are forced to do the data engineering themselves.
Jon Krohn: 00:55:44
I wouldn’t have been surprised if you said it was a fraction. That it was like for every one opening, there’s one third of an applicant. 
Doris Xin: 00:55:53
Is your team actively hiring for data engineers, Jon? 
Jon Krohn: 00:56:01
Yeah. We’re always looking for great engineers, like everyone else. This is a question that comes up a lot on the show, is I’ll say things like I did before the program. I asked if you had any particular openings so that when we talked about it on air, I was able to say that right now, Linea is doing software engineering hiring. And while some of our guests are doing data science hiring, they are all hiring machine learning engineers, software engineers, data engineers. That is where the biggest bottleneck is. And so again, listener, if you’re looking to get hired in this field being a pure data scientist, you’re still going to find work. But if you want to be super in demand, focus on some computer science skills, some software engineering skills, for sure.
Doris Xin: 00:56:50
That’s exactly right and that’ll really help you elevate the value of your work. 
Jon Krohn: 00:56:55
Definitely. Then that’s also something that I’m sure I’ve talked about on air before, is that any data scientist that we do hire, I don’t require them to have engineering skills before they start, but on the job, it is inevitable. I mean, our company has something like 30 or 40 technologists across product, engineering, and science. We’re not big enough to have data scientists solely be creating models and then passing that off. If you want your model to get into production, you’re going to have to be involved in that because that is… I don’t know if you have thoughts on this ratio. I can’t remember where I read this initially, but it is in line with my experience as a chief data scientist, is that for every one person creating a model, you need four people to put it into production. Although I guess Linea is putting… maybe it’ll all of a sudden be one to two if you’re reducing 40% of the time required to get things into production. But that is kind of the split that we see. 
Jon Krohn: 00:58:01
Having the model weights is great and you need them, but having the model weights be accessible performantly in production is a huge undertaking and it’s so specific to your particular problem. You’re lucky if you have a problem that can be handled by something like Google Cloud’s Lambda, or AWS Lambda it is, since Google Cloud doesn’t have something equivalent; it’s AWS that has those Lambda functions. You’re lucky if you have that, but for a lot of problems to have them run performantly, there’s caching things that you need to be considering on your side and just passing it off to some cloud function isn’t going to work because the size of the data is too much. You’d have to wait for all of the data to be loaded by that cloud function. So there’s all kinds of memory and compute things that you need to be thinking about to get your data science model into an actual production system and it’s that part where, yeah, we see the most… as I said at the beginning, it’s one of our biggest pain points. It’s one of the pain points that Linea is solving. And yeah, it also means that if you’re a data scientist on my team, while you’re on the job, you are going to learn how to engineer machine learning models. Otherwise, we’re going to have this backlog of data science models that aren’t getting into production and aren’t making an impact.
Jon Krohn: 00:59:32
Anyway, Doris, monologue over. Without divulging anything proprietary, do you have any other insight for us on where the biggest future opportunities lie? Maybe not just for data scientists, but in technology in general: where are the big opportunities? 
Doris Xin: 00:59:51
That’s a big question, Jon. 
Jon Krohn: 00:59:56
I know it’s non-fungible tokens. 
Doris Xin: 01:00:01
Web3 is the future; it’s already here. I mean, most of my thinking is in data land, so a lot of it is skewed in that direction. In the data world specifically, we already talked about automating data engineering; accelerating that process is going to be the topic of the next few years. Further down the road, I do believe we’ll get to a point where data science becomes like the internet, that accessible a technology. When we think about the internet, we don’t think about TCP/IP. We don’t think about all the underlying tech. We just open a browser and use it. Whereas for data science, we still have to think about Airflow, Spark, all these different libraries. I do believe that eventually, one day, we’ll have something that’s really well packaged, to the point that data scientists just have to declaratively say: this is my objective, this is my data, let’s figure out how to use data science to do something interesting. 
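Purely as an illustration of that vision, and not any real product or library, a declarative interface of the kind Doris describes might look something like the sketch below; every name in it is invented for illustration.

```python
# Purely hypothetical illustration of a declarative data-science interface.
# Nothing here is a real library; it just mirrors the idea of "state the
# objective and the data, and let the tooling handle the rest."
from dataclasses import dataclass

@dataclass
class DeclarativeTask:
    objective: str   # what we want, not how to compute it
    data: str        # where the data lives
    metric: str      # how success is measured
    deploy_to: str   # desired production target

    def run(self) -> str:
        # In the imagined future, a platform would choose features, models,
        # and compute behind this call. Here it just echoes the declaration.
        return (f"Would train a model to '{self.objective}' on {self.data}, "
                f"optimize {self.metric}, and deploy to {self.deploy_to}.")

task = DeclarativeTask(
    objective="predict 30-day customer churn",
    data="s3://warehouse/customers/*.parquet",
    metric="auc",
    deploy_to="batch-scoring",
)
print(task.run())
```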
Jon Krohn: 01:01:19
Cool. I love that vision and it sounds like you are building that future this very day. Super cool, Doris. All right. Thank you for your brilliant insights into your research, into your company, into what it’s like to be a CEO of an early stage tech startup. Starting to wind down the episode here, do you have a book recommendation for us? 
Doris Xin: 01:01:42
I have read a lot of nonfiction since I founded Linea. And the one that stands out in my mind right now is Start With Why by Simon Sinek. 
Jon Krohn: 01:01:55
Nice. Yeah, yeah, yeah, yeah. 
Doris Xin: 01:01:58
That book really resonated with me because it challenged me to think about why we do what we do at Linea. Why do our customers want to use it, at a very deep, fundamental level? Yes, they want to productionize, but why do they want to productionize? Right? So we can ask the five whys, and that really gets into the deep, deep motivations of what data scientists do on a daily basis and how they’re hoping to make an impact. 
Jon Krohn: 01:02:28
Cool. I love that recommendation, and I think it’s been recommended on the show a number of times. Listeners can check: Ivana, our podcast manager, aggregates all of the book recommendations that come up on the show into www.superdatascience.com/books. There’s a Google Sheet there with a record of all the books that have been recommended. And I bet if we look that one up, Start With Why has been recommended quite a few times. 
Doris Xin: 01:02:58
It’s definitely a classic. 
Jon Krohn: 01:02:58
Yeah, it must be hugely impactful. All right. So clearly, Doris, you are a brilliant speaker, leader, and engineer. How can people stay up to date on the latest from you as well as from Linea? 
Doris Xin: 01:03:11
You’re too kind, Jon. Please follow us on Twitter. We’ll have the handle at the bottom. 
Jon Krohn: 01:03:18
For sure. We’ll have it in the show notes. Yeah. 
Doris Xin: 01:03:20
Yeah, in the show notes. And you can also follow me personally on Twitter @me_dorx. We also have a Linea community Slack where we periodically send out product updates, but it’s also a community for folks who are interested in Linea, or who want help productionizing their data science workflows, to have a discourse. So if you’re interested in any of that, please join our Slack and follow us on Twitter. And also please go check out our open-source library, LineaPy. Search for L-I-N-E-A P-Y on GitHub and you should be able to find the repo there. 
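For anyone heading to the repo, here is a rough sketch of what the notebook workflow looks like, based on LineaPy's documented save-and-build-pipeline pattern. The exact function signatures and arguments below are assumptions and may differ between versions, so treat this as an outline and check the repo's docs for the current API.

```python
# Rough sketch of the LineaPy workflow, run inside a notebook session where
# lineapy is installed. Argument names are assumptions; see the repo docs.
import pandas as pd
from sklearn.linear_model import LogisticRegression
import lineapy

# Ordinary, messy notebook work: load data and fit a model.
df = pd.read_csv("train.csv")                                   # hypothetical dataset
model = LogisticRegression().fit(df[["x1", "x2"]], df["y"])

# The "two lines": import lineapy, then declare the value you care about.
artifact = lineapy.save(model, "churn_model")

# LineaPy slices the notebook down to just the code needed for the artifact...
print(artifact.get_code())

# ...and can emit a runnable pipeline (e.g. an Airflow DAG) from it.
lineapy.to_pipeline(
    artifacts=["churn_model"],
    framework="AIRFLOW",
    pipeline_name="churn_model_pipeline",
    output_dir="./airflow_dags",
)
```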
Jon Krohn: 01:04:02
Sweet. So with LineaPy, people can get started on trying out this amazing magic that we’ve been outlining for them all episode long. Super cool. Nice. Well, thank you so much for being on the program, Doris. It’s been such a great episode. And yeah, maybe we can check in again in a few years and hear from you on how the Linea journey is coming along. 
Doris Xin: 01:04:21
Thank you so much for having me, Jon. I look forward to reconnecting as well. 
Jon Krohn: 01:04:31
Dr. Xin is such an inspiring, thoughtful, and visionary entrepreneur. I loved getting to know her during today’s episode, and I have no doubt that a tremendous future lies ahead for her and Linea. In today’s episode, Doris filled us in on how, in just one or two lines of code, Linea cleans up Jupyter notebooks and deploys ML models into production. How DAGs can 10X production workflow efficiency by avoiding unnecessary processing steps. How 30% of graphlets amongst the ML pipelines at Google don’t impact production systems, and how half of these can be predicted, thereby significantly reducing wasted computation. How the intuitions behind devising ML models are not fully representable today but could be in 5 to 10 years. And how, thanks to the democratization of data science, PhDs are no longer essential to developing ML models effectively, but data literacy is more important than ever. 
Jon Krohn: 01:05:26
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Doris’s social media profiles, as well as my own social media profiles at www.superdatascience.com/573. That’s www.superdatascience.com/573. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of this show. Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks, of course, to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng, and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another incredible episode for us today. Keep on rocking it out there, folks, and I’m looking forward to enjoying another round of The SuperDataScience Podcast with you very soon.  