SDS 527: Automating Data Analytics

Podcast Guest: Peter Bailis

November 30, 2021

In this episode Peter details the revolutionary work of Sisu Data, how to succeed at growing your own tech startup, what Peter looks for when hiring, his most important daily tools, and more!

About Peter Bailis
Peter Bailis is the founder and CEO of Sisu (backed by a16z, Green Bay Ventures, and NEA), the industry’s first decision intelligence engine, which accelerates data exploration for data-forward organizations. Before Sisu, Peter was an assistant professor of Computer Science at Stanford University, where he started the DAWN project and maintains an adjunct appointment. He received his Ph.D. from UC Berkeley in 2015, for which he was awarded the ACM SIGMOD Jim Gray Doctoral Dissertation Award, and holds an A.B. from Harvard College in 2011, both in Computer Science.
Overview
Peter’s company name, Sisu, is a name Jon really likes. It doesn’t have a perfect translation into English but essentially means grittiness or perseverance. Peter decided on the word sisu both for its meaning and for the linguistic shape of the word itself. Sisu, as a company, was founded out of a need to help answer complex questions by automating the most painful and tedious data analysis: taking in the entire scope of a company’s data, work that might normally take a team days, and getting it done in just a few hours. The company aims to operationalize the world’s data. It started as a research project while Peter was at Stanford, where interested companies sponsored the work because they recognized the immense benefit of Sisu’s value proposition.

Sisu fits into the ever-evolving data science stack at the business intelligence level, coming in as the last-mile interface that keeps up with the speed at which data arrives. And it’s not just tech companies that have this immense amount of data. So the problem to solve is, at its core, one of relevance. Just as ranking algorithms made public internet search work, there is a way to boost the relevance and ranking of internal data to help companies find what they’re looking for fast. Peter believes that if you can tell companies even five actionable things about their data a week, it’s incredibly beneficial and business-altering.
Peter’s background is in academia, and he had some advice for other folks who come from technical, academic backgrounds and are moving into a commercial space. From his own experience, he notes that even in academia you develop your own stack of tools that can be brought along to solve commercial problems. He suggests being diligent and thoughtful about who cares about the problems you’re solving, and keeping in mind two north-star metrics: revenue and engagement.
When it comes to hiring, Sisu is hiring a lot and looking for candidates with multiple skills, including core statistics skills. On the engineering side, they look for data-parallel processing experience. It’s difficult to find a single candidate who has all the skills and experience they look for, so they look for folks with a growth mindset, humility, and awareness of their own limitations. Jon found the humility aspect particularly interesting because he doesn’t hear it enough when people describe what they look for in employees. Among tools, Peter’s favorite is a whiteboard. He also works with FigJam, with notebooks including Jupyter, and with the simple notepad and text-edit features on his machine.
As for his academic research, Peter has long been interested in data, synchronization, and making systems more efficient and faster. The broad theme of his work was finding the bottleneck in a world where massive amounts of data are constantly available, asking questions like: what causes poor engagement with an app across different devices? He spent a few years writing papers on these topics.

In this episode you will learn:
  • Meaning of the name Sisu [3:08]
  • What Sisu does [4:45]
  • Sisu and the data science stack [17:00]
  • Going from academia to startups [22:37]
  • What Sisu looks for when hiring [28:57]
  • Peter’s favorite tools [32:40]
  • Peter’s academic research [45:02]
 

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 527 with Peter Bailis, founder and CEO of Sisu Data. 
Jon Krohn: 00:00:13
Welcome to the SuperDataScience podcast. My name is Jon Krohn, chief data scientist and bestselling author on deep learning. Each week we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple. 
Jon Krohn: 00:00:42
Welcome back to the SuperDataScience podcast. Today’s guest is the brilliant, warm, and remarkably down-to-earth, Peter Bailis. Peter is CEO of Sisu Data, an automated data analytics firm he founded in San Francisco three years ago that has already raised over $128 million in venture capital from some of the most prestigious VC firms out there. His firm, Sisu, was inspired by work he carried out as an Assistant Professor of computer science at Stanford University, where he’s still an adjunct faculty member today. Prior to working as a professor at Stanford, Peter completed an undergrad in computer science at Harvard and a Ph.D. also in CS at the University of California, Berkeley. 
Jon Krohn: 00:01:31
In today’s episode, Peter details the revolutionary work being carried out by Sisu Data, which generates automated, actionable reports in minutes that might otherwise take a team of data analysts days. He shares his guidance for people looking to succeed at growing a tech startup, particularly if they come from an academic or technical background. He talks about what he looks for in the data scientists and software engineers that he hires, his most important daily tools for developing software productively, and the academic research he carried out at Stanford that’s behind Sisu’s innovative capabilities. This episode does certainly get deep into the technical data science and computer science weeds here and there, but most of it is a fun conversation with an incredibly engaging entrepreneur who has practical tips for anyone who’d like to succeed with commercial applications of technology. All right, you ready for this? Let’s do it. 
Jon Krohn: 00:02:38
Peter, welcome to the program. I’m so excited to have you on the SuperDataScience show. Welcome, and where are you calling in from today? 
Peter Bailis: 00:02:46
Thanks for having me, super excited to be here. I am calling in from San Francisco, California. 
Jon Krohn: 00:02:51
Nice. Is the weather very San Francisco-y right now?
Peter Bailis: 00:02:55
It’s been, honestly, pretty nice. A little bit of rain- 
Jon Krohn: 00:02:58
So, that’s a no? 
Peter Bailis: 00:03:00
Yeah, not… No, no, it’s been good. It’s been good. The sun is shining. It’s a beautiful day. 
Jon Krohn: 00:03:06
Wonderful. So you are the founder and CEO of Sisu. I love the name of this company. It’s a term that I’ve been exposed to for a while now. My understanding is that there isn’t really a great translation of the Finnish word Sisu into English, but the idea is perseverance, grittiness, and I love that in a company name. Why did you choose that for your company? 
Peter Bailis: 00:03:35
Yeah, it’s a great question. So I’m half Finnish, and when we were spinning out the company, we were really struggling on what the name should be to capture the ethos and spirit and also be something that’s memorable. And I was really struggling to find a name and went home to visit my parents. And I came back after talking with them about different names, all of which were pretty terrible. My mom was like, “Hey, there’s this [inaudible 00:03:59] Sisu, that you might be interested in.” And there’s actually a bunch of stuff that linguists have done, looking at the shape of different words. And Sisu’s an interesting word because you have the Si, which is a sharp inflection, and then Su is deeper in the throat, so it’s a bigger [inaudible 00:04:15] the spike and then it opens up, which I thought was a cool part compared to a bunch of other alternatives that we considered, none of which were particularly good. 
Jon Krohn: 00:04:24
I love that. That is a really good explanation. I don’t know, I’m really into etymology, so I ask a lot of guests, like, “Why did you name something this or that?” I’ve never had a guest explain the sounds, that “these sounds are what we were looking for.” I love that. All right. So we should talk about what Sisu does and not just the etymology of the company. So you’ve been hugely successful in a short period of time. You raised $62 million in a Series C recently, just a couple of months ago. And that brings total funding raised to $128 million in just three years, which is crazy. So congratulations on the rapid growth, and big-name venture capital firms are involved: Andreessen Horowitz, NEA, Green Bay Ventures. So what your company does, it’s billed as a decision intelligence engine. Peter, what the heck does that mean? 
Peter Bailis: 00:05:22
Great question. The idea behind Sisu is really simple. If you think about all of the data organizations have today, and a lot of the changes that have happened in terms of the data stack, Snowflake, cloud warehouses, cheaper ETL, there’s this massive amalgamation and consolidation of data inside of a typical organization. But very few of the end-users who have access to data have any ability to use it. So you’ll have teams staring at dashboards and reports to track their metrics, figuring out what’s going on, but as soon as something happens in a metric, or as soon as they need to answer a detailed question, they go ask someone, if they’re lucky enough to have someone to ask: “Why is this changing? What should we do about it?” 
Peter Bailis: 00:06:10
And for the analyst and data science teams who are tasked with answering this question of why things are changing and what to go do, it’s a huge amount of work. And the reality is there aren’t enough people to close what we call this decision gap between the data that’s collected and the decisions that can be made. Hence the term decision intelligence. And really what this is, is taking some of the most painful and repetitive parts of analysis and accelerating them with ML, things like automated feature engineering, feature selection, and recommendations, to help people go beyond just what’s going on with their data and use all of their data to understand why things are changing and what to do about it. 
Jon Krohn: 00:06:47
Gotcha. So, all right. So I’m going to try to repeat back to you what you’ve told me with a hypothetical scenario. All right. So I’m the CEO of a company or I’m the head of analytics of a company and we get daily reports on how our platform’s performing. And we see that today there’s a drop in revenue for some particular digital product. It’s an unusual drop in revenue, or maybe it’s an unusual jump in revenue and it might be… So then I could be interested in what the cause is of that change. And today, for the most part, I would write a Slack message to somebody on my team and say, “Hey, what’s going on today? There’s this really big change in this number.” And then that sets off a cascade of emails and Slacks to other people who are then trying to dig into it. We get the specialist in this particular area, and then they’re getting in touch with other specialists. And so it could end up being a few days, maybe longer, depending on the urgency of the issue, before somebody can come back and explain, somewhat definitively what’s happened, if at all. And so what you’re suggesting is that with your platform, it could automatically somehow come up with some guess as to what the causal factors are? 
Peter Bailis: 00:08:14
That’s a great summary. I think if you double-click on the process that’s going on when you’re asking a question of why, there’s some mental mapping between the metrics the business cares about and the bunch of data that’s likely in the cloud somewhere. If you’re a customer like Samsung, looking at device upgrades, you’ve got tons and tons of features about who’s operating their phones: old phone, new phone, old carrier, new carrier, and so on. If you’re Gusto, which is another public case study, looking at customer satisfaction, you have all the information about who onboarded and what features they’re using. And it’s not uncommon to have tables or data frames with hundreds, sometimes thousands, of different behavioral attributes. And so in some sense, what those people who respond to Slack messages are doing is effectively a lot of feature engineering. They’re trying to identify factors that are important, they’re trying to assess significance. And then they’re going to turn it into a slide deck to send back to someone. And that process is very repetitive. The metrics are defined by the business once a year, but the data’s always changing. And the insight is that that type of process, which is really only possible now with a lot of the consolidation of data, because all the data’s in one place and you can have really wide feature vectors, can be automated for more people. 
Peter Bailis: 00:09:35
And this can make sure the CEO who asked the question gets their answer back in 20 minutes as opposed to two days. But it can also enable people who don’t have analysts or don’t have data science resources to go and answer some of those questions on their own. And the insight’s pretty simple. There’s a lot of machinery in terms of how you actually make it work well. But if you think about metrics as a concept inside of organizations, defined rarely, updated all the time. And the question is based on all the context you have around those metrics, what do you want to show to which people at the right time? 
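To make Peter’s description concrete, here is a minimal sketch of this kind of automated diagnosis: rank every attribute-value subgroup in a wide table by how much it moved a metric between two periods. The table layout, the column names (“revenue”, “period”), and the impact heuristic are illustrative assumptions, not Sisu’s actual method.

```python
# Illustrative sketch only, not Sisu's method: rank attribute=value
# subgroups by their contribution to a metric change between two periods.
import pandas as pd

def rank_factors(df: pd.DataFrame, metric: str, period_col: str,
                 before: str, after: str) -> pd.DataFrame:
    attrs = [c for c in df.columns if c not in (metric, period_col)]
    rows = []
    for attr in attrs:                              # every categorical attribute...
        for value in df[attr].dropna().unique():    # ...and every value it takes
            mask = df[attr] == value
            m_before = df[mask & (df[period_col] == before)][metric].mean()
            m_after = df[mask & (df[period_col] == after)][metric].mean()
            coverage = mask.mean()                  # fraction of rows in the subgroup
            rows.append({"factor": f"{attr}={value}",
                         "delta": m_after - m_before,
                         "impact": (m_after - m_before) * coverage})
    # Biggest absolute impact first: the subgroups most worth a human's attention.
    return pd.DataFrame(rows).sort_values("impact", key=abs, ascending=False)

# Hypothetical usage on an events table with a "revenue" metric:
# print(rank_factors(events, "revenue", "period", "last_week", "this_week").head())
```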
Jon Krohn: 00:10:10
Cool. 
Jon Krohn: 00:10:18
Eliminating unnecessary distractions is one of the central principles of my lifestyle. As such, I only subscribe to a handful of email newsletters, those that provide a massive signal-to-noise ratio. One of the very few that meet my strict criterion is the Data Science Insider. If you weren’t aware of it already, the Data Science Insider is a 100% free newsletter that the SuperDataScience team creates and sends out every Friday. We pore over all of the news and identify the most important breakthroughs in the fields of data science, machine learning, and artificial intelligence. The top five news items, simply five, are hand-picked: the items that we’re confident will be most relevant to your personal and professional growth. Each of the five articles is summarized into a standardized, easy-to-read format and then packed gently into a single email. This means that you don’t have to go and read the whole article. You can read our summary and be up to speed on the latest and greatest data innovations in no time at all. That said, if any items do particularly tickle your fancy, then you can click through and read the full article. This is what I do. I skim the Data Science Insider newsletter every week. For those items that are relevant to me, I read the summary in full. And if that signals to me that I should be digging into the full original piece, for example, to pore over figures, equations, code, or experimental methodology, I click through and dig deep. So if you’d like to get the best signal-to-noise ratio out there in data science, machine learning, and AI news, subscribe to the Data Science Insider, which is completely free, no strings attached, at www.superdatascience.com/dsi. That’s www.superdatascience.com/dsi. And now let’s return to our amazing episode. 
Jon Krohn: 00:12:14
Yeah, that makes a lot of sense. And that’s another case study that I didn’t think about there. Because I’m thinking about a big organization like Samsung that you mentioned, but in addition, there’s all kinds of companies out there, a much larger quantity, a much bigger market, as I’m sure you’ve noticed, of smaller companies where you’re collecting data, but you have a small analytics team that’s busy with other things, or a data science team that wants to be worried about production problems instead of day-to-day analytics for the executive team. Or there might not be any data analytics team at all. And with a tool like yours, they can be getting insights automatically without necessarily being able to write Python code. 
Peter Bailis: 00:12:54
Right. And I think even… So we started as a research project back at Stanford where I was on the faculty, and we had a bunch of big tech companies sponsoring the work when we originally started. And it was surprising to me where even, say, one of the largest online advertising companies in the world, which has amazing people optimizing click-through rate on ads, they don’t have the headcount to put one analyst per account executive. So the person who’s managing the Nike account or the Adidas account, they’re all going to have the same metrics, all different cuts of the data. They basically have a BI tool and a CRM and it’s entirely self-serve. So from a data science perspective, I like to think of it as: you have all of this data, and you basically have zero recall in terms of useful information unless you actually dig in on your own. And the ergonomics of those interfaces, entirely manually driven exploration and diagnosis, or, if you’re lucky enough to have a data scientist, bespoke regression models in R or scikit-learn to PowerPoint, which is a manual process, no one’s written that compiler yet, it just doesn’t scale. And it’s arguably not what people are good at either, right? The repetitive stuff is automatable, not perfectly, but again, if you’re starting from zero, even incremental progress can radically improve someone’s state of the art. 
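As an aside on the “scikit-learn to PowerPoint compiler” quip: a naive version of the last step is possible with off-the-shelf libraries. A minimal sketch using matplotlib and python-pptx, with illustrative chart data and file names:

```python
# A crude "analysis to PowerPoint" step: render a chart with matplotlib,
# then paste it into a slide with python-pptx. Data and names are made up.
import matplotlib.pyplot as plt
from pptx import Presentation
from pptx.util import Inches

# Render an analysis result to an image.
fig, ax = plt.subplots()
ax.bar(["last week", "this week"], [120, 95])
ax.set_title("Weekly revenue, digital product X")
fig.savefig("chart.png", dpi=150)

# "Compile" the image into a one-slide deck.
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[6])  # layout 6 is blank
slide.shapes.add_picture("chart.png", Inches(1), Inches(1), width=Inches(8))
prs.save("weekly_report.pptx")
```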
Jon Krohn: 00:14:14
You’ve mentioned a couple of times the idea of a slide deck. And it’s interesting, when you first said that about compiling automatically to PowerPoint: a lot of the reports on my team, we have them compiled automatically to Excel so that I can send them to business people in the company. But you’re right, I can’t think of anything for PowerPoint specifically. But that is beside the point. What I was going to ask you is, because you’ve mentioned slide decks a couple of times, I’m guessing that there’s also a visual aspect to the way that Sisu presents results? 
Peter Bailis: 00:14:43
Yeah, exactly. And I think that’s been one of the most interesting parts of our journey at Sisu, realizing that analyst productivity and data science productivity are super, super important. But when you’re in the cloud and you have a cloud warehouse, you don’t have to worry about the intern taking down the data warehouse anymore, which was literally a concern at a big internet company in 2012. Not me, fortunately, but another intern clogged up the Hadoop cluster and the CEO didn’t get their page view metrics at 6 AM for their report. So, you have everyone with access to this data, and we’ve basically rewritten the data stack: you have better methods for pipelines, better ETL tools, better warehousing. I mean, data integration’s still a pain and data prep’s still a pain, but it’s so much easier than ever before. And think about the number of people who have access to that data: if you can provide interfaces which are basically low-code or no-code so that they can actually make better use of that data, it’s huge. 
Peter Bailis: 00:15:35
And that’s partly why we raised so much money: it is really hard to build out end-user, human-in-the-loop data analytics pipelines, but the ROI for that is massive if you can go and reach those people. Because they’re the experts in the business, right, they are making the decisions. And if you can get the right data to the right people at the right time, by continuously processing all of this and telling them things that they would otherwise have only caught retroactively, it’s a really interesting value prop and one that people find really compelling, because you’re sitting on top of all this data and you’re stuck with a completely reactive, manual analytics process. 
Jon Krohn: 00:16:12
Totally. We’re at a point in the evolution of the data-driven enterprise where a lot, probably not most organizations in the West, but a large number of large-cap companies, have a lot of data that they collect. But there is still a huge gap between that and the problem that you’re solving, where decision-makers are, in real time, able to themselves query these huge pools of data and get meaningful information that they can take action on right then and there. That’s super cool. I love it. I feel like… Well, how can I send you money? I want to invest. So that’s brilliant. I love it. All right, Peter, so a topic that you and I briefly touched on before we started recording was how quickly the data science and data analytics stack is evolving today. So how does Sisu fit into that picture? 
Peter Bailis: 00:17:17
Right. I think if you look at a bunch of amazing products that have come out in the last several years, Snowflake, the largest software IPO of all time, Databricks chasing them, amazing productivity tools for data scientists, we’ve almost rewritten the entire stack in data up until the business intelligence and analytics layer. And there are a lot of great tools for doing the stuff we’ve done for the last 20 years. But look, Tableau was originally called Polaris, and the paper came out in 2002, right? It’s 20-year-old technology, a 20-year-old interface. And so our thesis is that that last-mile interface that most people have to access all this data needs to be rewritten for a world in which data is always arriving, you have way more context and way more features about the data, and everyone in the company has access to it. And that’s a huge problem. A lot of people have talked about data mining and insights and all that stuff forever. But if you look at a typical organization, you mentioned these big, large enterprises, it’s not just tech companies that have this type of information now, it’s literally everyone. And the data is really well structured relative to what we would’ve seen five years ago, a bunch of files in a Hadoop cluster or on S3. It’s pretty easy to go and get connected. And there’s a lot of context and metadata and people who want to do more of this. And so the problem to solve is really, at the end of the day, a relevance problem, right? 
Peter Bailis: 00:18:41
Most people forget that in the early days of the internet, there weren’t good measures of relevance. There was TF-IDF and the Yahoo web index, which was manually curated. And the type of data that’s available inside of private enterprises is basically data frames or tables, which tend to be very sparse, high dimensional, and pretty bespoke to each company. But just like today, when no one thinks, “Gosh, is my internet search query going to work?”, because the internet got big enough and the ranking algorithms got good enough, there’s a similar opportunity to actually improve ranking relevance for all this private structured data in the cloud. And that’s a super hard problem because you have way less supervision than you would in a public internet setting or in a consumer internet setting. But again, you’re starting with a situation where you have zero recall unless you manually click a button to slice and dice through a dashboard, and if you can calibrate your classifiers and your relevance routines, you can provide epsilon recall. And it’s a game-changer for most people. And a lot of times there’s a lot of statistical signal in fast-paced metrics like user engagement or conversion or margin that people just completely miss. And so the goal, in some sense, for Sisu is to solve this relevance problem for structured data, which has really only been possible for the typical company in the last five years because of everything going on underneath the stack. And it’s this analytics layer that’s ready to get peeled off and radically improved. 
Jon Krohn: 00:20:12
Super cool. There were two terms that you used in there that I’d love to dig into more because I don’t know what they mean very well. So, I’d love to learn this and I’m sure there’s lots of audience members out there too. What’s zero recall versus epsilon recall? 
Peter Bailis: 00:20:25
Oh, great. So look, when you have a classifier, you can have false positives and false negatives, right? And when you have zero recall, you’re not getting any results, right? And in some sense, I say all of the analytics tools today have zero recall because it’s entirely incumbent on someone doing something, someone clicking a button, to get anything useful about their data. And so if you can provide, like I say, epsilon, it’s an infinitesimally small quantity, if you provide any non-zero recall- 
Jon Krohn: 00:20:54
Got it. 
Peter Bailis: 00:20:55
It’s super, super useful, as long as you have reasonably high precision or you’re telling people things they need to know. And I think it’s just this interesting scenario where we tend to think about ML as needing to be perfect before you ship it. This is a case where, provided you can calibrate your classifiers and ensure that you are finding some of those true positives, and you’re not showing people too much garbage, it’s a really, really tractable modeling process. And that’s been surprisingly so, given how you’d think, “Okay, data in a consumer internet context versus data in a B2B, SaaS engagement context versus fraud… ” And there’s just a lot of signal in this data people don’t go and look at because again, their interface requires them to do stuff in order to get results out of that. 
Peter Bailis: 00:21:41
So that’s what I mean by providing non-zero recall, because you can automate some of these things. And look, if you tell people five things a week about their data and even three out of five, or even two out of five, of those are actionable and different and make them change what they’re going to do on a day-to-day basis, it’s unbelievable. And from a data science perspective, the metrics, you just define them once, the data’s constantly changing, there’s a set of procedures you want to run and rank and learn from users, and you can do a bunch of active learning. And it’s much more tractable than you might imagine otherwise. And again, all the stuff underneath the stack has really enabled this. 
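A tiny illustration of the zero-versus-epsilon-recall framing, with made-up insight sets rather than anything from Sisu’s pipeline: a dashboard that surfaces nothing on its own has zero recall over the true insights hiding in the data, while an automated ranker that surfaces even a few true positives at reasonable precision provides the “epsilon” Peter describes.

```python
# Illustrative numbers only: recall over the set of "true insights" in the data.

def precision_recall(surfaced: set, true_insights: set):
    true_pos = len(surfaced & true_insights)
    precision = true_pos / len(surfaced) if surfaced else 0.0
    recall = true_pos / len(true_insights)
    return precision, recall

true_insights = {f"insight_{i}" for i in range(100)}  # what's really in the data

dashboard = set()                                     # surfaces nothing until someone clicks
ranker = {"insight_1", "insight_7", "insight_42",     # three true positives...
          "noise_a", "noise_b"}                       # ...and some garbage

print(precision_recall(dashboard, true_insights))     # (0.0, 0.0)  -> zero recall
print(precision_recall(ranker, true_insights))        # (0.6, 0.03) -> epsilon recall
```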
Jon Krohn: 00:22:16
An amazing answer and crystal clear, Peter. You took a complicated concept and broke it down in a way that was very easy for me to understand and no doubt our listeners as well. In fact, you did it so well that it makes me think that you might have been an Assistant Professor at Stanford for four years. Oh, wait. Oh yeah, you did do exactly that. So a question that I have for you is for people who come from the academic, highly technical background that you have. Do you have guidance for people looking to succeed in the commercial space? So now you’re the founder and CEO of this high-growth company. It isn’t so uncommon for me to see people who come from technical undergrads, or maybe even technical graduate backgrounds where they drop out of their Ph.D. or they finish their Ph.D. and they go and found a company. A lot of those people would become CTOs, but you went really deep with your academic background, four years as a faculty member at one of the top institutions in the world for technical applications of data science and computer science. And then you became founder and CEO of this high-growth company. So yeah, I put a lot of context there. There’s a lot of different places you could probably go from there, but generally, what guidance do you have for listeners to transition from an academic or technical background and have the kind of commercial success that you have? 
Peter Bailis: 00:23:48
It’s a great question, and I think I’m still learning literally every day in this job. It’s been a real education. I think there are two frameworks I found helpful in thinking about the transition from academia and the research community to a software startup, doing it for dollars, you name it. I think the first thing that’s really underrated in academia is you actually do go and build a really useful toolkit, no matter what space you’re in, right? If you spend enough time grinding on problems, wandering through the wilderness that is choosing research problems, or doing advanced coursework where you have to work on projects that are open-ended, and so on, you’re going to pick up a set of tools, whether they’re statistical or algorithmic or in terms of systems. And I like to think about CS, in particular, as a discipline providing this toolkit that you’re going to bring with you to go and solve problems. And knowing what’s in your toolkit, and then also having a growth mindset to figure out what else you can pick up for your toolkit, is super useful. A lot of it comes down to pattern recognition, right? I think for me, a lot of my research was less about coming up with entirely new methods and more about taking the methods as they’re supposed to work and making them work on messy data, defining the right types of problems, and making stuff really fast. 
Peter Bailis: 00:25:10
So, speed, which helps a lot at the company because we process cloud-scale data and there are a lot of algorithmic challenges to make things run for billions of features and billions of rows, and so on. Just knowing your toolkit and building that toolkit is super, super useful. The second part is, in academia you get to work on whatever you want, right? There’s a space of interesting problems and you can just go carve out whatever space you want and work on it. We worked on a bunch of really fun stuff. We worked with people doing earthquake monitoring. We looked at fast algorithms for processing their time series. We worked with the big tech companies. We worked on stuff that no one cared about, in terms of… I’m not saying no one cared about this cool work, someone will probably care eventually, but it’s like saying, “Someone’s going to care about this problem eventually, it’s a cool problem to go solve.” And so I think the thing that you have to do when you go into the real world, quote-unquote, is just be really diligent and thoughtful around who cares. And one of the things I find that’s very challenging, but also super gratifying, is there are certain metrics in a business that just tell you how well you’re doing. 
Peter Bailis: 00:26:15
And the two easiest ones, which are North Star metrics, are revenue, right? Are people going to pay you hard-earned cash to solve the problems that you’re solving? Or internally, are they going to give you resources to go do this? Then engagement, right? Are they actually engaging with the product? Are they getting value, and so on? And so I think that in some sense, what really pulled me into this role was this idea that I get to work on super hard problems, over time less and less of them incredibly technical, but you get very clear validation about whether the problems are meaningful and useful. And so, putting those together: if you can take your toolkit of things you’re really good at and things you’ve really developed your craft in, and find a problem where people really have a lot of pain and you think you can make a dent just iterating super fast with those folks, it’s unlike writing a paper or doing a Ph.D. where you’re wandering through the wilderness defining problems. Here, you’re wandering through the wilderness trying to figure out, “Where’s the pain? What can I go solve?” and so on. But when those two collide, it’s super interesting. 
Peter Bailis: 00:27:08
And in the company context you’re also doing this as part of a team, right? I don’t think academia is nearly as much of a team sport as people might expect. It’s always, “Who’s the lead author? Who’s got credit on this?” And so on. In doing the company, you have 10 to 1,000 people just grinding on one thing with a bunch of different skill sets. So you can take your toolkit and find people to amplify that toolkit and fill out other parts of the toolkit. And that’s just super gratifying. I think I underrated how valuable that would be, heading out of academia. 
Jon Krohn: 00:27:38
Yeah. On the last point that you made there, a brilliant friend of mine who is a listener of this podcast and who shall remain nameless, but who lives in Sydney, and I know you’re listening, was an academic for a really long time and left because he was sick of being in meetings where people were fighting over their author order on the paper. And he was like, “This is not what I want to be doing with my life.” And so he found an amazing company to work at, one that lots of people would know but I’m not going to mention, that allows him to still have this academic feel, but way more teamwork, way more about, “How can we be solving problems?” Like you’re saying, “How can we be finding pain points and executing on those?” And so I love the frameworks that you provided. If I got this right, the first framework you provided for people making this transition from an academic background to tech startups was awareness of your toolkit and then having a growth mindset beyond the tools you have. And the second one was figuring out who cares about what your business is doing, with revenue, engagement, and pain points as great places to look to see who might care about the problems you could be solving. All right. So given all of the growth that you’ve had, I’m sure you’ve done a lot of hiring. Peter, what do you look for in the data scientists or the software engineers that you hire? 
Peter Bailis: 00:29:08
We do a lot of hiring, and it’s funny because in our business, right, we’re building something that combines a lot of different skill sets. So we have some pretty hardcore statistics and machine learning, beyond just a commercial off-the-shelf [inaudible 00:29:26] package. We write basically all of our hypothesis testing and false discovery rate controls; all of the core ML we basically write on our own, because nothing scales to the size of data that we are processing for our customers. So it’s all bespoke and usually has some tweaks to make it either run fast or just work in a robust way, in ways that whatever stats paper it came from probably didn’t think about. So, we hire a bunch of people who have this core statistics focus, but we have to run this in the cloud at a really big scale and at interactive speeds. 
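For a flavor of what false discovery rate control means here, below is the textbook Benjamini-Hochberg procedure, not Sisu’s bespoke implementation: when you run thousands of hypothesis tests at once, it bounds the expected fraction of “discoveries” that are false.

```python
# Textbook Benjamini-Hochberg FDR control, not Sisu's implementation: given
# p-values from many simultaneous tests, reject the largest prefix of sorted
# p-values satisfying p_(k) <= (k/m) * q, bounding the false discovery rate.
import numpy as np

def benjamini_hochberg(p_values: np.ndarray, q: float = 0.05) -> np.ndarray:
    m = len(p_values)
    order = np.argsort(p_values)                 # ascending p-values
    ranked = p_values[order]
    thresholds = (np.arange(1, m + 1) / m) * q
    below = np.nonzero(ranked <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True    # reject hypotheses 1..k
    return reject

# Example: 1,000 tests, five real effects hiding among uniform-noise p-values.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(size=995), rng.uniform(0, 1e-4, size=5)])
print(benjamini_hochberg(p).sum(), "discoveries at q = 0.05")
```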
Peter Bailis: 00:30:02
We also hire people on the engineering side who do basically data-parallel processing. So they’ve worked on databases, or they’re really good at distributed systems, or they’re just really good low-level systems hackers, in networking or otherwise. And then there’s a whole bunch of work to make this useful to users, right? It’s human-in-the-loop. And it’s on these data sets where, unfortunately, there are no great public corpuses of tables you can go do a bunch of research on, unlike the internet where you can work with unstructured data. It’s really hard to get structured data, because it’s really, really valuable to customers. So you have to combine this ML and data systems and almost HCI bent, and it’s really hard to get one person who can do all three of these. You’re lucky if you get two out of three. 
Jon Krohn: 00:30:47
Right. Totally. 
Peter Bailis: 00:30:50
And that’s the fun part of the company. Actually, one reason we started it is I didn’t think it would happen unless we did it and brought people together around this common goal. But as a result, when we hire, at least on the technical side, it’s so important that we have people with that type of growth mindset, where they are, let’s say, hungry enough to dig in, roll up their sleeves, super excited about our big, big, audacious projects; but also humble enough that they know they don’t know all the answers. And we spend a lot of time working between product and design and ML and our core engine and our full-stack engineering teams to really iterate quickly with customers. And that’s something we test for really, really intentionally in our interview process. It’s not just, “Is this person really, really good at one or possibly two of these different dimensions?” but, “Are they also super hungry about going and building something special and something different and something new?” And that’s also, honestly, one of the reasons why people come here instead of going to Google Brain or your favorite big tech company: they’re not just building a better, faster, cheaper X, they’re building something net new by coming together. 
Jon Krohn: 00:32:06
Nicely said, I love those answers. And some of them, like the humility aspect, I feel I don’t hear that enough. And for me personally, that also is such a huge thing on the teams that I hire. So I love this all together: the growth mindset, humility, hunger, and having at least a couple of the core competencies that you’re looking for, knowing that it is very difficult to find people who have everything in one, but that they can grow into it. Nice. So I expect that as your company’s grown, you don’t get to spend as much time as you might like actually rolling your sleeves up and writing code, but it was a big part of your past, and I’m sure you find ways to squeeze it in. So are there particular software libraries that you love to still use today? And then, the second part of this question: now that you’re getting more and more into product development and management, maybe there are productivity tools or management tools that you’d also recommend to us? 
Peter Bailis: 00:33:13
Totally. I still think my favorite tool to use is pen and paper or a whiteboard. My wife has not let me get one, but one of the biggest things I miss is- 
Jon Krohn: 00:33:23
Your wife hasn’t let you get a pencil? 
Peter Bailis: 00:33:27
She’s good with the pencil. But I like the huge whiteboards. I mean, there’s nothing like a giant whiteboard to stand at and diagram something or work through some math. I really like just sitting down to be able to do that. I think that some of the new collaboration tools have gotten a lot better. So we just started doing some more of our remote syncs through FigJam, from Figma, which doesn’t support LaTeX but is still pretty good. You read between the dollar signs and extrapolate what the equation should say. I think we do a lot of prototyping with our customers. So tons of work with notebooks, and then also cheap ways to prototype those notebooks and put them into UIs and mocks. 
Peter Bailis: 00:34:14
So it’s funny to take Matplotlib output and then paste it into a Figma mock that you then put in front of a user, but it’s pretty compelling, you can literally see their data. So I guess I’m a big Figma fan, given how much product development we do with Figma and FigJam. And then just good old Jupyter Notebooks and pen and paper. On the management side, I think keeping a clean inbox is super hard. The biggest thing I use is mostly just notepads. I use notepads obsessively, just TextEdit on my machine, and I used to freak out because I said, “I’m going to drop all this different stuff. I’m not going to get it all done.” But I realized if I can get three or four things done a day that I really care about, outside of the meetings and whatever’s popping up, interviews, that’s pretty good, right? So, I just have these notepads I put on my desktop and then migrate over to a folder, and they have the most urgent stuff. And then if I drop stuff and it doesn’t come back, then maybe I didn’t need to do it in the first place. 
Jon Krohn: 00:35:19
That’s great. I love that. Yeah, Peter, that is cool. I mean, for many years, I too was a big fan of the to-do lists in TextEdit on my Mac, jointly with a note on my phone. And now, I don’t know, I’ve got all kinds of new-fangled ways of keeping track of my to-do list, but does that really matter? It’s still just a few words. So, that makes a lot of sense to me. And I love the other recommendations around FigJam. That is actually one I’m not familiar with. We use Figma a bit at my company, but I’m going to look into FigJam. And yeah, pen and paper, hugely valuable, getting away from the screen and writing things out. I find you get a lot more clarity that way. And then whiteboards, I’ve got, just off-camera, tons of big whiteboards; I am allowed to have them in my house. So I’m sorry. But yeah- 
Peter Bailis: 00:36:16
I should probably ask. I’ve been afraid, I got in trouble at Stanford because we had these 14-foot ones we put in the office, we just measured how big the room was and we want a whiteboard that big. So, that’s my platonic ideal. It’s not the painted-on one, you get a giant one, but you have to have multiple people carry it up and stuff. So that’s the ask I’m saving up for, I should see it, if my wife… [Sueme 00:36:38] if you’re listening to this, then [crosstalk 00:36:40]
Jon Krohn: 00:36:41
We’ll be sure to send her a special CD version and then she can try to find a CD player in your home and play it. Yeah. I think the whiteboard is huge for me for collaborating. Prior to the pandemic, that was the biggest thing where I was like, “I don’t know how I’m going to replicate this in the pandemic environment.” For me, using whiteboards with my team, both for tracking what we have to deliver on today and in the coming weeks, and then just ideating on different things. So, what I do, I’ve never had a 14-foot whiteboard, but I’ll have a stack of whiteboards on one side of the room and you shuffle through them for today’s to-dos: “Okay, here’s one that has some old stuff on it, we can erase that and ideate on this problem that we’re trying to solve together.” And I was unable to replicate that through the pandemic. There are just some kinds of R&D that we didn’t do at all. And I don’t know. So I’m stoked that we’re spending a bit of time now in the office. It sounds like you are today in the office. So yeah. I mean… 
Peter Bailis: 00:37:49
We’re inching back into it. It’s funny that you mention not being able to replicate the whiteboard. I still don’t think it’s replicable. We had a meeting earlier today, talking about some prototypes we’re doing, and we were talking about how we should split up our folds for doing some stuff around k-fold cross-validation, and there’s some calibration we want to do with the models to assess: can we let them abstain from making certain predictions? And it was just so easy to get up there, you draw your folds and the data set, and you’re just like, “Okay, we have a separate holdout set, or we could take this and split it from all these and then resample.” And it’s just so much easier to visualize. Even saying it now it feels so abstract, versus you draw six boxes on the whiteboard and you’re like, “Okay, great. This makes tons of sense.” And I think that tangibility is tough. I’m kind of bullish, I hope the metaverse stuff works out, maybe not from the Meta company, but from someone, because [crosstalk 00:38:42] 
Jon Krohn: 00:38:44
From anybody else. 
Peter Bailis: 00:38:47
But I’m not holding my breath. But there’s something, I think… Also, I think, just from a thought-process perspective, right? There’s always this… I think Leslie Lamport, the computer scientist, had this amazing thing. He said, “If you think you have proven something, write it down.” Or, “If you think you have an idea, write it down. If you think the idea is right, try to prove it. If you think you’ve proven it, write it up in TLA and have a machine check it.” Or whatever. There’s some structure in a lot of these mental processes that actually helps with the thinking. I think something about being kinetically active, actually going through the physical process of working through something with your body, is a huge shift compared to just sitting down. In the same way that writing up a paper or writing down a proposal or doing a one-page PRM, there’s something about the act of doing that that sharpens your mind and really focuses it. And the more abstract the problems, the easier I think that connection is to… I sound crazy, but it’s really, really something I’ve missed. 
Jon Krohn: 00:39:44
No, no. 
Peter Bailis: 00:39:44
Is that ability to be kinetic when you’re thinking. 
Jon Krohn: 00:39:47
I agree a hundred percent, and it’s something that I’ve thought about a lot before, though I don’t know if I’ve talked about it out loud. I don’t know if it’s something to do with the generation that we’re in, and you look like you’re probably roughly my age, the internet, and typing, started to become a thing around the same time in our lives. But I grew up writing everything with paper and a pencil. And I don’t know if it’s because of that experience, or, I have a hypothesis that it’s something to do with primarily using motor cortex on only one side of my brain, but I am way more creative with my thinking when I’m writing as opposed to typing, which requires both hands. And so, you described that process there: when I’m trying to be creative, it’s a lot easier for me to be writing on my notepad to come up with ideas. So when we came up with the outline for this episode before we started recording, that isn’t something I’m going to type up. I’m thinking, and it helps me have… Maybe more of my mind is available for open-field thinking. And so, that’s one thing. And the other thing that I wanted to touch on related to this whiteboard idea is that in some ways we could say, “Okay, everything that you guys have described, Peter and Jon on the show, I could very easily replicate this digitally.” 
Jon Krohn: 00:41:23
What a whiteboard does is very simple. You’ve got a whiteboard and then you choose different colored markers and you draw on it. You say, “Okay, let’s use a tablet or something so that we recreate that writing on the screen.” And then you can share that with your team. We can do it over the internet very easily. I’m sure there are hundreds of software tools that people have devised that do exactly that and tons more. But the key thing that that doesn’t do is get me away from my computer. A lot of the most innovative and helpful sessions that my data science team has had, I would call them a local science conference, where we’re like, “Okay, we’ve got this big problem that somebody on the team is trying to tackle.” We book a separate meeting room away from our usual room, one that doesn’t have any screens, that doesn’t have a whiteboard. And we can’t bring our computers with us. And you just start from the beginning and somebody draws, “Okay, here’s the problem that I’m tackling. Here’s where I’m getting stuck.” And it might take them an hour to do that, but everybody’s sitting there listening in a way that they couldn’t if they did that on a computer, because even if everybody closed every other application, you use your computer for so many different kinds of things that your mind can’t help but think, “Oh, I wonder if I’m getting an email?” Anyway… 
Peter Bailis: 00:42:41
No, I totally agree. I think that one of the amazing things about just turning off is also being able to turn back on. So especially in research, when you have the rough problem you want to go solve, you think it’s mostly solvable, but you don’t quite have all the pieces. I remember we were working on some stuff related to Sisu, but not stuff we’re using today: how do you learn models over streams, right? There are all these streaming algorithms for doing counts and sums. How do you do it when you want to learn a linear model, or run gradient descent over a stream? And it was pretty close. We had some promising initial results in notebooks, like I said, and then it’s like, “What happens next?” And I think when you find a good problem, there’s this part of my brain that turns on, and it’s almost more on when I’m getting ready for bed or making dinner, or in the shower. So many of the things we researched ended up being literally shower thoughts. And then you go back into the notebook or back into the prototype or back into the lab meetings and it’s like, “Hey, what about this?” And I think that shift in modality is really fun. 
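A minimal sketch of the “learn a linear model over a stream” problem Peter mentions, assuming plain stochastic gradient descent with one update per arriving example and constant memory; the research prototypes were surely more sophisticated than this.

```python
# Assumed, simplified setup: one SGD update per arriving example, constant
# memory, never storing the stream itself.
import numpy as np

class StreamingLinearModel:
    def __init__(self, dim: int, lr: float = 0.01):
        self.w = np.zeros(dim)
        self.lr = lr

    def update(self, x: np.ndarray, y: float) -> None:
        error = self.w @ x - y        # gradient of squared loss w.r.t. prediction
        self.w -= self.lr * error * x

rng = np.random.default_rng(3)
true_w = np.array([2.0, -1.0, 0.5])
model = StreamingLinearModel(dim=3)
for _ in range(10_000):               # simulate the arriving stream
    x = rng.normal(size=3)
    y = true_w @ x + rng.normal(0, 0.1)
    model.update(x, y)
print(model.w.round(2))               # converges close to [2.0, -1.0, 0.5]
```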
Peter Bailis: 00:43:47
I think it [inaudible 00:43:48] as a grad student at Berkeley, and Berkeley’s amazing because they have so many different libraries on campus and you can easily walk between them. So I had this rotation where I would go to the Architecture Library. Then I would go to this café, Caffè Strada, I think it was, outdoors, then go to dinner on the Southside and go to the Law Library and then go to the Stacks. And that was my route. And then I would go for a run at 11:00 PM. It was the best to have that rotation of places to think, just super, super fun. So I think that’s one of the things I missed the most with lockdown, just being able to have that mobility. And I think one of the nice things about actually being in a company versus being an individual researcher, or a smaller group of people, we have a 10-person lab, is you can replicate some of that change of scenery by talking to different people. 
Jon Krohn: 00:44:32
Yeah. [crosstalk 00:44:32] Absolutely. Yeah, yeah, yeah. Some people I talk to are a verdant forest and others are a dark storm, so yeah. All right. So we managed to go quite a ways off-piste here, I really enjoyed this conversation about whiteboards and ideating, and hopefully some audience members out there did as well. I’m sure there were some people out there going, “Yeah, that’s totally what it’s like in my head.” So we’ve touched on this a couple of times in the episode and I’d love to dig into it more: your academic research as an Assistant Professor at Stanford. And I’m sure that followed on from what you were doing in your Ph.D. at UC Berkeley, if I remember correctly? 
Peter Bailis: 00:45:15
Yep. Yep. 
Jon Krohn: 00:45:17
And so, yeah, so you have this long history of tackling complex computer science problems, it sounds like some of those rolled into what you’re doing at Sisu, but yeah. Do you want to dig into it a little bit more? I’d love to hear about it. 
Peter Bailis: 00:45:30
Totally, totally. So I’ve always been in data and making stuff fast. When I took this job at Stanford back in 2015, I was going on the tenure track, with about seven years to go and make it happen. And I’d previously been working on some cool problems in data around making systems faster. So my thesis was specifically on what guarantees you can put on data if you don’t allow databases to communicate. You can be on opposite sides of the planet and, instead of being limited by the speed of light (until we solve quantum entanglement), how can I just run operations independently, what types of guarantees can I provide, and when do I need to synchronize? So, a really simple example: if I want to make sure there’s only one person named Peter in the database, then if we’re on opposite sides of the planet, we have to communicate. Whereas if we make sure that no one in the database is named Peter, suddenly we can guarantee that without synchronization. So what makes those properties different? Yada, yada, yada. And it was a really fun thing; I wrote a bunch of fun papers. But when I started at Stanford, I said, “Well, look, the databases that we had built, and a lot of other smarter people have built, are so fast that you can basically run one transaction per person on the planet every minute with half a million dollars of hardware.” 
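The “one person named Peter” example is the crux of this thesis work, so here is a toy sketch of the distinction, with made-up replica contents: an invariant like “no one is named Peter,” which each replica can enforce locally, survives merging, while “at most one Peter” can hold on each replica separately and still break when they’re combined.

```python
# Toy illustration with made-up data: two replicas accept inserts
# independently, then merge. One invariant survives the merge; one doesn't.

def violates_no_peter(db: list[str]) -> bool:
    return "Peter" in db

def violates_at_most_one_peter(db: list[str]) -> bool:
    return db.count("Peter") > 1

replica_a = ["Alice", "Peter"]                    # locally fine: one Peter
replica_b = ["Bob", "Peter"]                      # locally fine: one Peter
print(violates_at_most_one_peter(replica_a))      # False
print(violates_at_most_one_peter(replica_b))      # False
print(violates_at_most_one_peter(replica_a + replica_b))  # True: needs coordination

replica_c, replica_d = ["Alice"], ["Bob"]         # each rejects any "Peter" insert
print(violates_no_peter(replica_c + replica_d))   # False: safe with no synchronization
```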
Peter Bailis: 00:46:51
And so, unless Amazon gets really, really, really popular, it’s not a problem anymore. And I took a step back and said, “What have I spent seven years on?” And the broad theme, and I like working on these really hairy, poorly specified problems and carving out little bits to make incremental progress on, was: in a world where data’s effectively free to store, and I have as much compute as I want to scale out to answer any question and make any query run faster, where does the bottleneck lie? And even before starting at Stanford, I spent some time with some friends who were at startups, who were super smart technically but were struggling with questions like, “I’ve got a mobile application deployed. Some of my users have really poor engagement. Why?” One of them was a smartphone application to tell you if you’re driving well or not. Sometimes people were marked as really good drivers, just consistently. Why? Well, the models they were shipping with every release, some of them didn’t work well with some of the different Android devices. So you were either considered a great driver because you bought a crappy phone, or a bad driver because your phone’s accelerometer was not behaving well. So we started to see this problem crop up, and I gave myself two years to see if I could write papers in the space. And the general flavor of things we started working on had two key themes. 
Peter Bailis: 00:48:14
One was that, in terms of prioritizing people’s attention in these large-scale data sets, there are really good statistical methods for going and solving a lot of problems, if you map someone’s problem down to these methods. So for example, I want to find unusual points: great, density estimation. What’s the best nonparametric density estimator? Kernel density estimation. Great, so that’s a primitive we can go use. We started treating stats methods as primitives. But then the question was, how do you compose these primitives? And then how do you make them run very fast? And as soon as you start looking at end-to-end pipelines of these operators, there’s a huge amount of optimization opportunity that you don’t see when you look at them in isolation. And in turn, you can turn them into software systems that self-optimize in some sense. So, concrete example: one of the first papers we wrote was on accelerating kernel density estimation, which is O(n²) and scales really poorly over even modestly sized data sets. And we realized that if you just want to find these unusual data points, you just have to refine an estimate until you’re very clearly in a normal region or in an abnormal region. And you can actually stop your estimate of the kernel density estimate really early, depending on the region that you’re in. 
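A rough sketch of that early-stopping idea, illustrative rather than the paper’s actual pruning rules: since we only need one bit, dense or sparse relative to a threshold, we can maintain running lower and upper bounds on the kernel density estimate and stop the O(n) sum as soon as the answer can no longer change.

```python
# Illustrative early stopping for KDE-based outlier detection, not the
# paper's actual algorithm: stream kernel contributions, keep bounds on the
# final estimate, and stop once the point is provably dense or sparse.
import numpy as np

def is_dense(x: float, data: np.ndarray, t: float, h: float = 0.5) -> bool:
    n = len(data)
    k_max = 1.0 / (np.sqrt(2 * np.pi) * h)       # a Gaussian kernel's maximum value
    partial = 0.0
    for i, xi in enumerate(data, start=1):
        partial += k_max * np.exp(-((x - xi) ** 2) / (2 * h**2))
        lower = partial / n                      # remaining points contribute >= 0
        upper = (partial + (n - i) * k_max) / n  # remaining points contribute <= k_max
        if lower > t:
            return True                          # provably dense: skip the rest
        if upper < t:
            return False                         # provably sparse: skip the rest
    return partial / n > t

rng = np.random.default_rng(1)
data = rng.normal(0, 1, 10_000)
print(is_dense(0.1, data, t=0.2))   # in a dense region: True, stops before a full scan
print(is_dense(8.0, data, t=0.2))   # far outlier: False, stops before a full scan
```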
Jon Krohn: 00:49:28
So, even though there’s this polynomial time complexity, this n² time complexity for the computations, you’re able to, in a lot of circumstances, stop early. 
Peter Bailis: 00:49:42
Exactly. Because you only want to know a binary bit: are you in a dense region or a sparse region? And once you know for sure, because your error bars are above the cut or below the cut, you’re done. And you can prove that it’s n to the seven-eighths, or n to the… something like a small asymptotic improvement, you can prove that. But it also, in practical terms, made these things that were not tractable over the data sets we were looking at way, way, way faster. And we did a lot of work in that flavor, and also worked on some really crazy problems as well. So for example, and we use this at Sisu, but now Datadog uses it, and TimescaleDB and so on, this is where I got started doing this human-in-the-loop stuff. One of the problems we worked on was that a lot of people, when they’re looking at time series data and want to, let’s say, find some spike that looks weird or find some time series signature that’s problematic, just plot the raw data in the time series. And if you zoom out and you have [inaudible 00:50:36] in your data, it’s super noisy, right? It’s really hard. It just looks like a bunch of spikes. And so we worked on this really hairy problem of saying, “What does it mean to choose an optimal window to smooth your data?” And we actually came up with a policy around preserving the skewness of the data set and then minimizing the variance across the plots. 
Peter Bailis: 00:50:55
And then it turned out to be really slow, again, another n² problem to compute the optimal whatever. So then we made that really fast, and then we made a JavaScript library, and a bunch of people use it now to smooth their visualizations. And it was funny, this was one of the first inklings I had that maybe doing this for real would be more fun than doing it in a lab, because we had a massive user study of 750 people on Mechanical Turk trying to pick out, like, “Here’s a plot of taxi volume in New York, in what month did the volume drop?” And you’d have this blind study where you’d show people the un-smoothed thing, you’d show them a bunch of alternative smoothings, and you’d show what we did with this algorithm called ASAP. And it was like, “Why can’t we just show this to real users?” And we were lucky because we had some good intuition, and that’s why folks like Datadog have put this in their products. But it is just fun to basically say, “What’s the right thing to do? What are the right statistical measures? How do you make them fast enough to be usable?” And we ended up putting all this together inside of software we basically gave away for free. People started picking it up, and at a certain point we realized we could make this work for a bunch of big tech companies. We had public information, we had 20,000 queries a week on this backend we used to have at Microsoft, and we’ve written papers with Google and Facebook about some of the work we did there. 
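Below is a brute-force toy version of the window-selection objective behind ASAP; the published algorithm prunes this search and runs far faster. This sketch uses kurtosis as the shape statistic to preserve and the variance of step-to-step changes as the roughness to minimize, with illustrative data.

```python
# Toy brute-force version of an ASAP-like objective, not the real algorithm:
# pick the moving-average window that minimizes roughness while preserving
# the series' kurtosis, so smoothing doesn't wash out real spikes.
import numpy as np
from scipy.stats import kurtosis

def smooth(series: np.ndarray, window: int) -> np.ndarray:
    return np.convolve(series, np.ones(window) / window, mode="valid")

def roughness(series: np.ndarray) -> float:
    return np.std(np.diff(series))    # jagged plots have noisy step-to-step changes

def asap_like_window(series: np.ndarray, max_window: int = 100) -> int:
    original_kurt = kurtosis(series)
    best_w, best_r = 1, roughness(series)
    for w in range(2, max_window + 1):
        s = smooth(series, w)
        # only consider windows that keep the series at least as spiky as the original
        if kurtosis(s) >= original_kurt and roughness(s) < best_r:
            best_w, best_r = w, roughness(s)
    return best_w

rng = np.random.default_rng(2)
series = rng.normal(0, 1, 5_000)
series[2_000:2_010] += 8.0            # a genuine spike the smoothing must keep visible
print(asap_like_window(series))       # a large window that smooths noise, not the spike
```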
Peter Bailis: 00:52:07
But, like you said at the start of the episode, everyone has this type of data now. In 2015, Snowflake was not a really huge thing, right? It was just getting bigger. But by the time it was 2018, it was clear that more people would have this type of data, and we could do these types of optimizations, end-to-end usable machine learning, for people who weren’t just in tech. And again, standing on the sidelines, some of this stuff seems non-obvious, but it’s like, “It should be built, so why isn’t it being built?” And I think it comes back, again, to how we hire. It’s knowing the right statistics, having the right taste. It’s knowing how to make them run really, really, really fast. And it’s making them useful to people who would never write these pipelines in the first place but would hugely benefit from them. 
Jon Krohn: 00:52:57
Nice. So if I can try to summarize everything you just eloquently said in a phrase, it might be that a big focus of yours, something I picked up on a couple of times there, is taking processes that, if you ran them comprehensively in the naive implementation, would have this polynomial time complexity, this n² time complexity, that in practice is impractical for getting real-time information out of some process. And so you come up with some tricks, some ways of going from that really intractable time complexity to something that’s fast, something that’s near-instantaneous, and then you present that to your users. And that was a problem you’d been tackling for a long time, and you realized there’s this opportunity to go from paying people a few dollars an hour on Mechanical Turk to look at these kinds of results, to having it the other way around: an even bigger number of people paying you to see their data in real time. And you made that a commercial application?
Peter Bailis: 00:54:13
Absolutely. And I think that the key thing I’d say is, it’s not just… I mean, these are in some sense tricks, right? It’s algorithmic techniques and it’s different approaches- 
Jon Krohn: 00:54:23
It’s not magic. No, no magic. 
Peter Bailis: 00:54:25
Well, the key thing is that if you just look at these kernels in isolation, right, they’re already super well optimized. But if you look at the end result, the people who are going to be consuming this information, and you look at it as a full pipeline, there are a huge number of algorithmic and systems things in my toolkit where I know I can make this work. And then by working with great product people, you can actually make it intuitive. Sometimes it’s dirt simple once you look at it. For example, for this ASAP algorithm I mentioned for the smoothing: if you have, let’s say, 800 pixels to show a graph and you have a billion data points, most window sizes yield the same visualization when you’re looking at 800 pixels. 
Peter Bailis: 00:55:09
So you can downsample your data appropriately before you start running some of the expensive search procedures. And again, there’s this set of tricks that we learned to apply, even automatically, and then we built up interfaces around them. So it’s not that the tricks and speedups are independent of the interfaces; they’re enabled by them. It’s about looking at the end-to-end pipeline. And a lot of work in ML and data science, I think, is often framed in the context of just an ML or data science workflow. We talked about compiling the slide: if you actually think about what needs to go in that slide, what people need to see, suddenly that gives you a lot more degrees of freedom in terms of the choices of models you use. But in a lot of ways it also helps you restrict the optimization space quite a bit, because you know you don’t have to go and compute over a lot of the data. If you just said, “Hey, give me this estimator, give me this model, give me this thing” in a vacuum, it just wouldn’t make sense. And that, again, comes back to working on problems that are useful to people and interesting: problems that are useful to people give you constraints, and the constraints allow you to optimize. 
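As a rough illustration of that pixel-aware shortcut, the sketch below pre-aggregates a long series down to a few points per pixel before any expensive window search runs. The bucket-averaging scheme and the samples-per-pixel constant are assumptions for illustration, not ASAP’s actual downsampling rule.

```python
import numpy as np

def downsample_for_pixels(x, pixels=800, per_pixel=4):
    """Reduce a long series to roughly `pixels * per_pixel` points by
    averaging fixed-size buckets, so later searches run on thousands of
    points instead of millions (or a billion)."""
    target = pixels * per_pixel
    if len(x) <= target:
        return x
    bucket = len(x) // target
    trimmed = x[: bucket * target]          # drop the ragged tail
    return trimmed.reshape(target, bucket).mean(axis=1)

# e.g. a billion raw points collapse to 3,200 before any window search runs,
# and every window the search considers still renders the same 800 pixels.
```

The design point is the one Peter makes: the display constraint is not a nuisance but a license to throw away work the user could never see.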
Jon Krohn: 00:56:20
Awesome. Well, that’s super interesting, and I would love to spend all day with you asking more and more questions, but you’ve got a big, fast-growing company to run. So I’m going to leave it there. We’re going to cut to the question that I ask all of our guests at the end of the show, which is: do you have a book recommendation for us? 
Peter Bailis: 00:56:41
Somewhat unrelated to the stuff we’ve talked about so far: I think, especially with a lot of the debate over the metaverse and NFTs and authenticity, I’ve been reading a pretty interesting book by one of the best wine importers in the Bay Area. I’m not a wine snob, but a friend told me about this book by this guy Kermit Lynch, who was one of the first people to really dig into importing great wine. So it’s from a different era, which is amazing in an era when we’re buying Bored Apes on OpenSea, right? It feels completely divorced from that; talk about getting away from the computer. But it’s also interesting to think about how they approached things like authenticity and certification of physical assets in a, not quite pre-digital world, but the pre-metaverse world, let’s say. It’s been fun to read that and then compare it to what’s going on in the media, where future work may be heading, all those sorts of things, and to wonder what happens to our physical assets in the future and how we think about them. So maybe [crosstalk 00:57:46] provoking. It’s from 1988; I’d recommend it. 
Jon Krohn: 00:57:49
All right. Yeah. I’m not a wine snob, I’m doing this to learn more about technology. 
Peter Bailis: 00:57:55
Exactly. 
Jon Krohn: 00:57:58
Great. I love that book recommendation. So, Peter, you’re clearly brilliant. I’ve learned tons from you in this episode, and I’m sure lots of our audience members have as well. How should people follow you? What’s your social medium of choice? 
Peter Bailis: 00:58:14
I’m mostly on Twitter, @pbailis and increasingly as my role as CEO, I’m on LinkedIn as well, and sometimes I write blogs on our Sisu Data blog. I have an agreement with our marketing team that anything I sit down and spend more than two hours writing, they’ll put it up on the blog. So there’s sometimes interesting content there. 
Jon Krohn: 00:58:34
Nice. All right. Well, I’ll make sure we have all three of those in the show notes: your LinkedIn, your Twitter… actually, not in that order. Let’s do the other order, since you [inaudible 00:58:43] Twitter first, then LinkedIn, and then the Sisu blog. Brilliant. Peter, thank you so much for taking the time to be on the show, and hopefully we can have you on again sometime. 
Peter Bailis: 00:58:52
Thank you so much. It’s been a real pleasure. 
Jon Krohn: 00:59:00
Holy crap. What a remarkable person Peter is. I love how he’s so clearly, wildly intelligent, but nevertheless so grounded and easy to relate to. In today’s episode, Peter filled us in on how a decision intelligence engine like Sisu Data’s enables decision-makers in a company to have immediate access to the causal factors behind their critical business metrics. He talked about his two frameworks for succeeding at growing a tech startup: first, knowing your toolkit and having a growth mindset to expand it; and second, having a clear idea of who cares about your business from a revenue, engagement, or pain-point perspective. He talked about what he looks for most in the folks that he hires, including humility, hunger, and expertise in statistics and computer science. And he talked about his favorite productivity tools, including classics like a good old physical whiteboard or pen and paper, as well as digital tools like TextEdit, Jupyter notebooks, and the FigJam digital whiteboarding tool by Figma. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Peter’s Twitter, LinkedIn, and company blog, as well as my own social media profiles, at www.superdatascience.com/527. 
Jon Krohn: 01:00:28
If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show. All right, thanks to Ivana, Mario, Jaime, JP, and Kirill on the SuperDataScience team for managing and producing another fun and informative episode for us today. Keep on rocking it out there, folks. And I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 