69 minutes
Data Science, R Programming

SDS 511: Data Science for Private Investing — LIVE with Drew Conway

In our first-ever live-recorded episode, we cover private investing, how data science can help private investment decisions, what Drew looks for in the people he hires, his infamous Venn diagram for explaining data science, and more!

About Drew Conway
Drew Conway is a prominent data scientist, entrepreneur, author, and speaker. Drew has built companies, and he has also advised and consulted for companies across many industries, ranging from fledgling start-ups to Fortune 100 companies, as well as academic institutions and government agencies at all levels. He also sits on several advisory boards. Drew is currently a Senior Vice President at Two Sigma, where he is leading the development of several new initiatives at the firm. He is best known for the development of the Data Science Venn Diagram, which outlines how data scientists fit into the data analytics ecosystem. Drew is also the author of “Machine Learning for Hackers,” a popular introductory text on machine learning techniques.

Overview

We recorded this episode at the seventh R Conference in New York. We previously discussed the history of the R Conference in episode 501. Drew got involved with the group behind the R Conference while in graduate school. He taught himself R as an alternative to Stata, the tool used in his statistics classes, which he did not want to invest time in learning. He found a meetup for R programmers and became hooked after watching a presentation on how to download personal finance data and analyze it in R. The group grew as they brought in speakers until it became what it is today: the world’s largest R community.

We discussed Drew’s book, Machine Learning for Hackers, which I personally found hugely useful when I was transitioning from academia into practitioner approaches. The origin of the book is tied to Drew’s work with the New York R community. During this time Drew found a lot of software engineers who were getting asked to build models for forecasting or inference, which is challenging to implement without deep technical knowledge. So, could they write a case-study-based textbook that introduces a different algorithm in each chapter, giving these engineers a jump start without sending them back to technical, mathematical textbooks? Despite the title, there’s nothing in it about hacking at all.

Drew now works at Two Sigma, one of the world’s largest hedge funds. He works in their private markets business, something the company is less known for. Drew says the company at large is a great place for working with data. There’s a lot of it. And the same data that informs public market strategies can be informative in the private market, where micro-movements in the economy are used to make macro inferences. The speed is also quite different. Hedge funds execute multiple trades per second, whereas in private market work a deal a month is an excellent pace. Drew structures his data science team to interact with human traders by being rigid about not separating the data science team and the investment team. They utilize a “buddy system” where there is a one-to-one map between the two teams to collaborate on problems together. It can take a long time to master each other’s jargon, so having constant communication and heads-up conversations is key. When hiring, Drew is more interested in how a candidate looks at a data problem than in starting with their technical skills.

From there we dove into the live audience Q&A where Drew talked about his data science Venn diagram, what Drew is most excited about learning at the conference, building effective models for post-COVID normalization, and what Drew believes the future of data science and the data science community holds.

In this episode you will learn:

The R Conference and NYHackR [6:33]
Machine Learning for Hackers [20:17]
Two Sigma and Drew’s work [28:27]
Drew’s team structure at Two Sigma [35:12]
Audience Q&A [46:27]

Items mentioned in this podcast:

New York R Conference
NYHackR
New York Open Statistical Programming Meetup
Two Sigma
Data Science Insider
SDS 501: Statistical Programming with Friends
Machine Learning for Hackers by Drew Conway and John Myles White
SDS 437: Data Science at a World-Leading Hedge Fund
SDS 479: Knowledge Graphs
Gödel, Escher, Bach: An Eternal Golden Braid by Douglas R Hofstadter
The Three-Body Problem by Cixin Liu

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 511 with Drew Conway, Senior Vice President of Data Science at Two Sigma.

Jon Krohn: 00:00:12

Welcome to the SuperDataScience Podcast. My name is Jon Krohn, chief data scientist and bestselling author on Deep Learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple.

Jon Krohn: 00:00:42

Welcome back to the SuperDataScience podcast. Today’s episode is a special event, a first of its kind, the first ever live recorded episode of the show. We shot it at the New York R Conference which was run virtually this year due to the pandemic, but this still enabled us to thrive off the energy of knowing there’s a live audience watching as we film and the audience also got to ask us questions, which all ended up being outstanding. My courageous guest for this live episode experiment is the incredibly accomplished and incredibly articulate Dr. Drew Conway. Drew is a Senior Vice President at Two Sigma, one of the world’s largest hedge funds, where he is responsible for applying data science strategically to private investment decisions such as those in real estate and private equity. He also co-authored the classic hands-on O’Reilly book called Machine Learning for Hackers.

Jon Krohn: 00:01:45

He was co-founder and CEO of the New York-based data science startup Alluvium, which was acquired in 2019. He’s formally advised countless successful data-focused startups from the New York area, such as Yhat, Reonomy and Insight. And prior to all of that, he obtained his PhD in politics from New York University. In today’s special live episode without a single retake, Drew manages to flawlessly cover what private investing is, how data science can lead to better private investment decisions, the differences between creating and executing models for public markets, such as stock exchanges, relative to private markets, what he looks for in the data scientists he hires, including how he interviews them, and his infamous Venn diagram for explaining what data science is. While slightly technical for brief moments, today’s episode is relatively high level so should appeal to anyone who’s interested in how data science can be applied to investing or to anyone interested in working at a world-leading hedge fund. Alright. You ready to jump head first into this live episode? Let’s do it.

Jon Krohn: 00:02:59

Drew, this is awesome. This is so exciting. We’ve never done anything like this before. This is going to be the 511th episode of the SuperDataScience Podcast. It’s going to air on October 5th, and there has never been, in the 510 preceding episodes, a live filming of it. So yeah, very cool. Thank you for being open to this.

Drew Conway: 00:03:23

My pleasure. I’m excited.

Jon Krohn: 00:03:27

Maybe next year in June, presumably things will open up and we can go to the Alliance Francaise or the Alliance Francaise as Jared likes to call it, and we can actually do something like this in person. I’d love to do SuperDataScience live with people but being able to do it virtually has its benefits as well. I can get scorched by the sun here in this spot. And you don’t have to leave home. You didn’t have to come to the city. You live upstate a bit, right?

Drew Conway: 00:03:57

I do. I live just outside of New York in a town called Pelham. So I’m about 30 minutes from Grand Central. And I would have loved to have done it in person too but this is just the next best thing, so I’m excited to get to have this little chat.

Jon Krohn: 00:04:10

I am not super familiar with Pelham but I hear it a lot because I think it’s like the six train or the four train or something as it’s going [crosstalk 00:04:15]. It’s like the Pelham express train.

Drew Conway: 00:04:17

Well, made famous by the old and then new again, The Taking of Pelham One Two Three. So you can take the one or the two or the three all the way up to Pelham Bay, which is right on the Bronx side of where I live.

Jon Krohn: 00:04:31

Nice. All right. Jared is someone that you’re very familiar with, as will be familiar to anybody who’s here at the R Conference because he gave that very warm introduction. We’re going to get into that in a moment. Jared was himself recently on the SuperDataScience podcast. He was in episode number 501. I’m sure there are many guests from the show that you know but I don’t know who they all are yet. I do know that you definitely know Claudia Perlich who was on episode number 437. It was the second episode that I ever hosted of the SuperDataScience show. She’s been a friend of mine for many years now and you work with her directly at Two Sigma.

Drew Conway: 00:05:12

I do.

Jon Krohn: 00:05:13

It sounds like you share-

Drew Conway: 00:05:14

So Claudia and I, we work directly together. In fact, if we were back in the office, we sit right behind one another. We were physically right next to each other in the office and she and I, we co-lead a team at Two Sigma called Strategic Data Science. And our focuses are somewhat separate while we do manage the same overall group. I focus the efforts at Two Sigma on our private investing business, which I think we’ll talk about maybe later. And Claudia focuses primarily her research on our core hedge fund business. And the strategic part of Strategic Data Science is we have this fun deal where we get to try all the new stuff. We get to be much more experimental, we get to take a longer range view of the work that we’re doing. And of course, getting to work closely with Claudia Perlich is a joy. So she was actually a big part of why I joined Two Sigma, knowing that I would get [crosstalk 00:06:07]-

Jon Krohn: 00:06:06

Wow.

Drew Conway: 00:06:08

… Claudia every day.

Jon Krohn: 00:06:09

Cool. She is a special mind for sure, and a lot of fun. We are going to talk about that later. We’re going to dig into in detail, your strategic data science role and the private investing that you do there at Two Sigma. But we’re going to make our way there, first by talking about the R Conference that we’re recording this episode live at and its history. This is the seventh iteration of the R Conference in New York and it comes out of the New York R Community, which for a long time was called NYHackR and today is called the New York Open Statistical-

Drew Conway: 00:06:51

There we go.

Jon Krohn: 00:06:51

… Programming Meetup.

Drew Conway: 00:06:55

You can thank me for that jumbled mouthful. I think I made that switch at some point.

Jon Krohn: 00:07:00

It is what it says. I like it. I was imagining, actually, you can probably tell us about this. Because I’ve imagined in my mind many times when you realized at some point you wanted to have a name for the community that wasn’t going to have as much of a time limit on it. Because programming languages come and go.

Drew Conway: 00:07:21

Exactly.

Jon Krohn: 00:07:23

In the beginning, calling it NYHackR, very clever name. And so Jared talks about that. Jared was recently on episode 501 of SuperDataScience, and in that he talks about the beginnings of what is now the Open Statistical Programming Meetup or was NYHackR. He talks about the name of it a bit. And in talking about the genesis of the group, he specifically cited you, Drew, as one of the critical people in growing this group. Tell us about those early days of NYHackR.

Drew Conway: 00:07:55

Sure. The way that this story starts for me is actually when I was in graduate school. As a graduate student at NYU, I was taking my early statistics classes and in those classes, everything was done in Stata. After literally one session of the TA walking us through Stata and then Mata, if you’re familiar with the DSL that exists within-

Jon Krohn: 00:08:19

I have never heard of that one.

Drew Conway: 00:08:21

Stata.

Jon Krohn: 00:08:22

Stata I have heard of. But not Mata.

Drew Conway: 00:08:25

I basically threw my hands up and said, I am not going to invest time in learning this tool. I’d known about R. I’d seen code floating around the internet; timeframe-wise this is like late 2008. I essentially used that class to teach myself R. What I sought at that point was to try to find people that could help me learn it, and I happened to find out through folks that I knew in the department and around New York, that there was this group of people who were meeting to talk about R and it was a meetup. Even in 2008, the notion of going to a meetup was still relatively new. But of course I knew what it was, it was a startup that was in New York City so I thought okay, I’ll check out this Meetup-

Jon Krohn: 00:09:11

Meetup.com.

Drew Conway: 00:09:11

Yeah. Exactly.

Jon Krohn: 00:09:13

Meetups with a capital m.

Drew Conway: 00:09:14

That was the real deal back in the early-to-mid 2000s. The original founder of the group was a guy named Josh Reich. Josh went on to be quite successful as the founder of Simple, if you’re familiar with Simple Bank, he was one of the founders of Simple. I’ll never forget, the first presentation that I saw on R was given by Josh and it was how to download your own personal finance data from your online bank and do a bunch of analysis on it inside of R. Once I saw that I was basically hooked. One of the things that I thought I could be useful at, because at the end of the initial meetup, what Josh gave was a call to action to all of us. He said, “Hey, we have this space.” And we were actually sitting in a conference room at Union Square Ventures because one of the partners at Union Square Ventures had graciously allowed us to use this space for this one night. He said, “Hey, if anybody knows where we can get space to have a more regular meeting, let me know.”

Drew Conway: 00:10:17

And I said, “Well listen, I’m a grad student. There’s hundreds of classrooms at NYU that are not used in the evening. Let me see if I can figure something out.” And so Josh and I partnered on that and for that first year or so, I really just tried to help. Be helpful in getting space, be helpful in finding speakers. I spoke at one or more of those conferences or meetups rather. And eventually Josh said, “I think you’re better positioned to run this than I am. I’m starting this company. I’m a startup co-founder, you’re a grad student so the resource that you have a lot of is time, so maybe you could take this over for me.” And that’s what I did. And it was after that period of initial startup time where it was mostly just about finding space and speakers, we started using the tools of the day, early social media to start to build some momentum around the group and there was a real demand for it. Then it became really easy to get speakers. And that’s when I met Jared and folks like Claudia and actually many of the speakers who’ve been at the conference today, I actually met through being part of that early R or data science community in New York City.

Jon Krohn: 00:11:30

Very cool. And now it is the world’s largest R community. So you set it off in the right direction for sure. And it is amazing when this group, the Open Statistical Programming Meetup as it’s called today, this R Conference, it really does host the biggest names in data science. For me, that was a draw in the beginning and that’s how I met Jared was through going to the Open Statistical Programming Meetups. And I was drawn by these amazing names that you see, like Hadley Wickham or Claudia or yourself, where I’m like, wow, I’ve read their textbook and now I can go see them in person. And it still feels intimate. Even when someone big like Hadley Wickham comes, there’s still only a couple hundred people there. He makes eye contact with every single person who comes in.

Drew Conway: 00:12:25

I think that’s one of the things that, and all the credit to Jared in continuing the community aspect of it. The really great thing about the Open Statistical Programming Meetup and the data science and R and stats community in New York City, oftentimes particularly in the early days when we were organizing it, people would say, New York City obviously is a huge city, the largest city so you would expect it to be big. But even in 2010, 2011, New York City was not the largest technology community. Obviously you have San Francisco but also places like Boston and other large university towns where you might expect there to actually be a greater density. And so why is New York City doing so well? And I would always think about this theory. One part is, what are the anchor industries in New York and what is driving people to want to learn from one another? Obviously finance being a key one but also media and advertising and retail, all of these industries, particularly at that time were really starting to ramp up their use of data. And R became the lingua franca of research and development, both in academia obviously but also in industry.

Drew Conway: 00:13:41

You had this really nice confluence of academics, folk like myself in graduate school who were using this time to really skill up and learn what was out there. Obviously build relationships with peers who were in academia. But then also being able to see how it was actually being used in industry from people who were real life practitioners, I think that created an amazing amount of cohesion. Then there’s this other simple fact about New York City which is for better or worse we’re all crammed on that little island. And so it’s very easy to get around and go to a meetup. I remember the first time I went to the R meetup or stats meetups in the Bay Area, you’d have to get in your car and drive half an hour, an hour, 90 minutes from downtown San Francisco to Menlo Park, or you’re going, there’s something over in Mountain View, whatever, that’s a long haul. Whereas in New York City, get on the subway, go 20 minutes in one direction and you’re there and you can be home in time to have a normal dinner. It’s simple things like that I think really helped in those early days and created the flywheel, and like you said now, the largest community in the world, something that’s very impressive.

Jon Krohn: 00:14:58

Eliminating unnecessary distractions is one of the central principles of my lifestyle. As such, I only subscribe to a handful of email newsletters. Those that provide a massive signal to noise ratio. One of the very few that meet my strict criteria is the Data Science Insider. If you weren’t aware of it already, the Data Science Insider is a 100% free newsletter that the SuperDataScience team creates and sends out every Friday. We pore over all of the news and identify the most important breakthroughs in the fields of data science, machine learning and artificial intelligence. The top five, simply five news items. The top five items are handpicked. The items that we’re confident will be most relevant to your personal and professional growth. Each of the five articles is summarized into a standardized, easy to read format and then packed gently into a single email. This means that you don’t have to go and read the whole article. You can read our summary and be up to speed on the latest and greatest data innovations in no time at all. That said, if any items do particularly tickle your fancy, then you can click through and read the full article.

Jon Krohn: 00:16:10

This is what I do. I skim the Data Science Insider newsletter every week, those items that are relevant to me, I read the summary in full. And if that signals to me that I should be digging into the full original piece, for example to pore over figures, equations, code, or experimental methodology, I click through and dig deep. If you’d like to get the best signal to noise ratio out there in data science, machine learning and AI news, subscribe to the Data Science Insider, which is completely free, no strings attached at www.superdatascience.com/dsi. That’s www.superdatascience.com/dsi. And now let’s return to our amazing episode.

Jon Krohn: 00:16:54

And so, obviously for people in spitting distance of New York City or in New York City itself, the takeaway here is if you’re not already familiar with the Open Statistical Programming Meetup or the R Conference, these are amazing communities to get involved with, to learn from the best in the business. But there’s also another narrative here that I think is important for people anywhere in the world. Which is that there is probably something like these meetups, and they might be meetups with a capital m, like formally on Meetup.com, or they might be somehow organized through a local university or some other industry group. But there are very likely real life communities like these in your area that are absolutely amazing to get involved with. There is something intangibly better about meeting people in person, as opposed to from a Stack Overflow post or something-

Drew Conway: 00:17:49

Absolutely.

Jon Krohn: 00:17:52

It feels personal when you can put a face to a big paper or a book. It really makes that content come to life and you feel like you’re a part of it. In the same way that you described getting involved with NYHackR all those years ago to support your self-learning of R, a lot of my career as a speaker in data science came from me standing up at the Open Statistical Programming Meetup and saying, “Hey, I would like to know deep learning really well. I’d like to go through textbooks but I don’t want to do it on my own. So would anybody like to join me?” And that gave me my first people, first list of people to seed this group for my deep learning study group, and that then led to me having a deep learning book and that in a way led to me having this podcast. And so I think there’s amazing things that can happen in community and whether you’re in New York or not, you can do that anywhere.

Drew Conway: 00:18:47

Yeah. Well, I completely agree with that. I can cite directly, real life and career changing moments for me that really started relationships that I made either at the R meetup or hanging out within that community. Whether it was my co-author John Myles White for Machine Learning for Hackers who I met through the R meetup. Or my co-founder in my startup, Chris Bethel, who saw me speak at an R meetup and built a relationship with me through there and we went off and started a company together years later. Your approach, Jon, I agree with completely. I often tell [inaudible 00:19:27] folks earlier in their career, they say, “Well, how can I start to build these relationships, and maybe I’m more introverted and I’m not so good in crowds and trying to meet people.” The biggest hack I can give you is if you give a talk, people come to you. Then you don’t have to worry about putting in the energy to go meet people. If you stand in front of a group of people and give a talk on something you know well or are passionate about, they’ll come right up to you and then you’ve short-circuited that issue. Also, it’s an old cliche but to really know something, you have to teach it. And so to actually put in the work to put-

Jon Krohn: 00:20:04

100%.

Drew Conway: 00:20:04

… even a simple meetup talk together, it really refines your view of that. And like you said, that can lead to actually actively educating people by writing a book or whatever it may be.

Jon Krohn: 00:20:17

Exactly. You mentioned your book there, Machine Learning for Hackers. I’d love to talk about that. That book came out in 2012. I mentioned it briefly, I think at the outset of this program because that is how I first became familiar with you. I’d known about you for many years. I’ve actually still never met you in person. So that’ll happen post-pandemic but this is getting to know each other here on air. That book, Machine Learning for Hackers, was hugely useful for me back 10 years ago because it allowed me to transition from an academic data science career, so we finished our PhDs around the same time, 2013. And when you do a PhD, most people, some people are smarter about this today and they do deliberately learn a wide set of data science skills. But a lot of people, myself included, you develop quite a niche specialization. Your book was perfect for me to get an overview of what I could be doing with a whole bunch of different machine learning approaches. It takes a hands-on approach in R and it covers from data distributions to get you started so you can understand a bit about probability, to machine learning algorithms like classification, regression approaches, principal component analysis, k-nearest neighbors, graph analysis. Each chapter uses a different hands-on use case to introduce, I think in each chapter’s case, a completely new topic which could be useful in whatever data science fields you go into.

Drew Conway: 00:22:05

The origin of that book is actually inextricably tied to the R community and the data science community in New York City. As I mentioned, John and I met as both being participants in this community and John gave many talks at the community and he and I grew to be very close. He was a grad student at Princeton and I was at NYU, so we saw a lot of each other. He was in the psychology department, I was in the politics department so we had a social science orientation towards our research and there was a lot of overlap there. Anyway, as John and I, both together and separately, were interacting with this community, one of the things that we noticed as a trend is as data science as this emergent, unicorn-like skillset was starting to come out over that time period. We found a lot of people that we knew who were software engineers by training, who were working at companies in New York City and otherwise, were starting to get asked to build classification models or do forecasting models or use graph structures from a social media company to make some inference.

Drew Conway: 00:23:16

We knew that it would be incredibly challenging for someone in their day-to-day job to simply implement that without having to go back and read a deep technical textbook on how all the math works. What we thought about is would it be possible to write a case study based textbook where every chapter was a different algorithm. And we demystified how that algorithm worked to, to your point, really give people a pretty wide breadth of the tools that are out there. If you were either that engineer or sitting at a startup, there’s three of you there and suddenly the CEO is saying, I need a classification algorithm. Here’s the data, go. You can at least get started and start to contextualize that for yourself. Or if you were a student, undergraduate, graduate student who wanted to be able to apply this to their own research. But again, weren’t coming from a computer science or machine learning background, you were coming from a different department, you could look at that book and start to get yourself started. While the book is, like you say, over 10 years old today, so most of what’s in there is very dated and you wouldn’t write those algorithms today the way that they’re written there, I think the concept of what we tried to do there continues to have value today. Because there’s just as many people today who are in those positions as there were 10 years ago.

Jon Krohn: 00:24:42

You don’t need to even be sitting at a computer going through the exercises to make the most of them. I actually read the book on a flight to Greece and back in a summer. You can go through the code examples. If you’re familiar with object-oriented programming languages, particularly R, it’s quite easy to follow along with what’s happening and you can say, “Okay, yeah that makes sense.” And so, that hands-on approach, like you’re saying, it gives you I think a really clear way of understanding how classification, say, works. I just noticed, I can’t believe I didn’t notice this before, is that the Open Statistical Programming Meetup back then was called NYHackR and this book was called or is called Machine Learning for Hackers. You don’t mean like cybersecurity hackers, you just mean, I guess you’re self-defining as somebody who’s hacking away at code or what’s up with that?

Drew Conway: 00:25:37

Right. It’s a good thing you bring this up because it actually ended up being an issue with O’Reilly and the editors at the point of printing and what the title would be. Because of course, most people when they see hacker, they think black hat hacker, coding away, breaking into systems. What John and I wanted to do was to try to reclaim some of the, what we thought of as more of the original meaning of hacker when it came to software and technology. That is really peeling the top layer off of a tool and getting into something and understanding how it works. Hacking around with data, hacking around with a tool like R, not to say that you would be an expert in it and nothing that we were doing in that book or what we were trying to say was about bringing someone from intro to expert level, but to make someone knowledgeable enough with the tool that you could start to hack around on stuff. And hopefully from there, learn on your own and grow or use other deeper, more sophisticated references to understand something at a deeper level. We tried to reclaim that hacker. Although I remember I haven’t checked them in a long time, thankfully I’m sure that they get worse over time, but some of the early Amazon comments or feedback that we got was, “I read this book thinking it was going to be about hacking systems and there’s nothing about hacking in it at all.”

Jon Krohn: 00:26:57

That’s funny. What can you do? I ran into issues with reviews on Amazon, a few people with… So my book is called Deep Learning Illustrated. And so we use a lot of visualizations, there are over 200 illustrations that we had drawn by an artist. There are tons of visual graphics. And there’s also this idea that just like in your book, you use hands-on code demos to illustrate how things work. But I have several Amazon reviews that basically are complaining that this isn’t like a graphic novel, that it isn’t a comic book. They’re like this isn’t illustrated at all. So I can’t possibly give it a bigger, higher than a three-star review, it’s not an illustrated book. And I’m like, “What are you talking about? There’s hundreds of illustrations.” So I understand.

Drew Conway: 00:27:46

I feel like that’s a bit of a badge of honor. You can’t put effort into something and write a book without having someone throw a review like that over the wall. So if anything, well done.

Jon Krohn: 00:27:57

Alright, thanks. Great. Great. Take that, reviewer. All right. All right, so we’ve talked about your relationship to this R Conference that we’re doing this live SuperDataScience episode at. We have talked about your book and you have done tons of amazing things over the course of your career from your PhD to today. And we might get to that depending on how many audience questions we have from the live audience here. But what I really want to dig into is what you’re doing today. You are at Two Sigma, which is one of the world’s largest hedge funds. As I mentioned earlier on, you can check out episode number 437 with Claudia Perlich, who also is at Two Sigma. And she talks about her role in the same Strategic Data Science position that you’re in, Drew. But she’s involved more with the… Actually, you can probably describe it better than me how to distinguish. I know that you’re doing private investing and hers is not private. Hers is kind of a lab, I guess, for coming up with trading strategies for Two Sigma.

Drew Conway: 00:29:05

Two Sigma, as you mentioned, has an enormous platform for doing data science, lots and lots and lots of data. It’s a great place to come if you love working with data. And the vast majority of that data is really focused on understanding how small micro behaviors in the economy, consumer behaviors in the economy, may be reflected in the public markets. And the theory that we operate on, on the private market side, is that the same data can help inform private market strategies. Whether that’s investing in businesses or buying buildings, you can learn a lot about where the overall economy is going, both at a macro scale, so generally where the U.S. economy is going, or globally. And then zeroing in on more micro analysis, whether that’s understanding the activity in a specific region in the U.S., so how can we use this data to understand the dynamics in one city in the U.S. versus another, but then even getting more micro in starting to provide analysis on a specific business or on a specific property.

Drew Conway: 00:30:15

What becomes really challenging, and I’d be happy to talk about this more, is how that actually works in practice. Because ultimately the biggest difference, and this is not explicitly true but since we’re talking about Claudia, the biggest difference between what her work ultimately is delivered to and what my work is delivered to is that my team works with discretionary traders, like real live human beings who are going to think about an investment that has to be made in a specific asset, again, a business or a building, or what have you.

Drew Conway: 00:30:48

On the hedge fund side, what they want to build is a model, a package, a piece of software that contains within it, the underlying strategy. And then that model can go into an execution system and that’s traded automatically. That world doesn’t exist for us. And so the challenge for us is we think about our product as really being fully formed software products. Things that human beings will interact with. And that doesn’t mean that it has to be like a fully realized web application with interactivity and things like that, but understanding what the right entry points are and how to actually deliver that information to an investor who already has a free-standing process. They know, and have been doing it for, in some cases, 20, 30 years of successful private market investing. We need to fit our work into that in such a way that it minimizes the friction of that but hopefully also enhances the decision-making process. And that is a really exciting challenge. I think one that not a lot of other financial institutions are contemplating and it’s what gets me excited about the work that we get to do.

Jon Krohn: 00:31:56

Let’s definitely dig into that. Another thing that occurred to me that might be different between having models that are making trades in public markets and stock exchanges versus in private markets, is I bet the speed can be quite different.

Drew Conway: 00:32:11

Yeah, of course. In the core hedge fund business, you’re talking about trades being executed many times per second in some cases, but certainly multiple times per day. In private markets, if you get a deal done a month, that’s a pretty good pace. The other side of it, which is even more important, is that when you’re thinking about the public markets business, you may be in and out of a particular position over the course of a few days or a month. In private markets, you make an investment and you’re in that investment for a minimum of five years, but it could be five to 10 or 15 years, depending on what the strategy is or what the nature of the asset is. And that’s particularly interesting in an asset class like real estate, which is notoriously illiquid, so it doesn’t have a lot of… An asset will not trade very often. And it is also famously opaque. And it operates in a world where much of the information in the industry is contained in what I might describe as private networks.

Drew Conway: 00:33:14

You and I having a conversation, you say, “Hey, Drew, I know that there’s this building coming up for sale in the next few months. Wanted to get this information to you first.” The other interesting thing about real estate is there are really weird incentives about information. Particularly about what information you can believe in and what information you can’t. Because all the actors in that space may have different incentives about what information they do reveal and how they reveal it. By adding a more systematic strategy to that, the challenge is, are you actually, by adding more information and data, increasing fidelity, or are you actually just increasing noise because now you have this broader set of data, and does that actually create more value for your strategy? Then last, I think clearly from the work that we’ve done, there’s a ton of value that can be added there, but that same challenge remains, which is, at the end of the day, there are a lot of things that may be interesting.

Drew Conway: 00:34:09

But I think about, and I know you’ve experienced this, as a graduate student, you have this huge wide universe of stuff that you could research, because there are many things that could be interesting. But ultimately is it in the academic sense, going to move our scientific community forward, create new knowledge for a particular discipline. And sometimes we have a simpler problem in real estate but the stakes are certainly higher, which is, is this actually going to make the investment team better? Is this actually going to lead to a better investment? And you don’t know the answer to that question for a long time.

Jon Krohn: 00:34:44

Right. 15 years.

Drew Conway: 00:34:46

Yeah, could be.

Jon Krohn: 00:34:47

Wow. Well, that’s cool. So tell us a bit about the team structure. We know that there are these hackers, these data scientists who are working with data sets, maybe new kinds of data sets, experimenting with models. And then you also have these humans that are doing the private investing. How do you structure your data science team? How does that interact with these human traders?

Drew Conway: 00:35:22

It’s a great question and quite frankly, something that we have been evolving over time. But one thing that we were very explicit about early on, and if anything, have become more rigid about as the business has grown and the teams have grown, is there really should not be a separation between the data science team and the investment team. That’s true regardless of seniority. Me running the team and the CIO running the business, or a data scientist working for me and a principal or an associate on the investment team working together. There needs to be very, very tight alignment and understanding of each other’s work. One of the things that we found to be really, really useful in this case is to actually create a buddy system. To actually have a direct one-to-one mapping between someone on my team, a data scientist or engineer, and someone on the investment team so that they can work on problems together.

Drew Conway: 00:36:25

On my team, we’ve implemented, I would call it a semi-agile type of process. We bound our workflow to two weeks. We talk about task prioritization and what we want to do. And we can put all those tickets in Git and we can track it as we would as software engineers. But the members of our investment team are right there with us and folks on the team are meeting regularly. Some of this is really just valuable from an “are we speaking the same language” perspective. Because there can often be just that crosstalk of, I call it one thing, you call it another. It took our investment team a while just to get their heads wrapped around terms like, what is a feature? What do we mean by a response? Stuff that you, as a data scientist or someone who’s statistically trained, you’re throwing these terms around and you’re not thinking twice about it. Just that almost osmosis process has been really valuable.

Drew Conway: 00:37:20

But the really important stuff comes when we have these heads up conversations about, how is this really going to get used in your decision making process? Again, it comes back to that interesting versus useful question. We could create a model that’s going to forecast some value for you and we may do really well in our backtesting, our error rates are really low, blah, blah, blah. And you look at that as an investor and you say, interesting. But that’s not a value that goes into my underwriting. So why do I care about that? How do I build… Or in the other case, this is something that goes into my underwriting, but how do I build confidence that it’s actually useful if they aren’t participating in the process of feature engineering and model testing and really understanding that? Really a lot of credit to the folks on our investment team who’ve been really dedicated to spending the time to do it.

Drew Conway: 00:38:16

I think for many of them, it has been quite eye opening to see just what it takes to do a project like this. I think as I’ve seen many times in my career, when folks from various industries start to interact or think about using data science, machine learning, whatever you like, in their business, they view it as a button that you can press and results fall out of. But obviously we all, folks in the audience today and you and I, certainly know that is almost laughably untrue. Some of my favorite meetings at work are when we can go through with an investor, not the results of an analysis but where we are in our panelization process, or what are we going to use as our normalization technique for this particular metric and what are the consequences of choosing one versus another? And having folks really understand the consequences of not making those choices and the benefit of taking the time to deeply understand the data generating process of a data set you’re looking at and how that can influence your interpretation on the other end.
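
To make the normalization point concrete, here is a minimal Python sketch. It is not Two Sigma's code: the rent figures and the helper names `min_max` and `z_score` are invented for illustration. It only shows how a single extreme value changes what each scaling does to the typical observations.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical rent-per-square-foot figures with one extreme outlier.
rents = [30, 32, 31, 29, 33, 120]

scaled = min_max(rents)
standardized = z_score(rents)

for rent, s, z in zip(rents, scaled, standardized):
    print(f"{rent:>4}  min-max={s:.3f}  z={z:+.2f}")
```

With min-max scaling the outlier pins the top of the range, so the five ordinary values are squeezed into a narrow band near zero. Z-scores keep them centered relative to each other but still let the outlier inflate the standard deviation. Which behavior is acceptable depends on how the metric was generated, which is the kind of choice the conversation describes.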

Jon Krohn: 00:39:28

I hear you Drew. It is amazing working with people who were on the front lines of business in various kinds of applications over the years. It is basically a law that they will prior to getting to know a data scientist well and having to learn what we go through every day, you do have this sense that, well, I know that these data exist. These data must exist. We must know about that behavior on the internet or whatever. Let’s use that to have this model that pumps out money and you’re like, what are you talking about? There’s not even any place to start with what you’re describing. Cool. I understand that the buddy system, one-on-one buddy system sounds like a great way around training people, having them understand the features are the inputs to models and all these other kinds of terms that we use all the time as data scientists.

Drew Conway: 00:40:25

And by the way, it goes both ways. It’s deeply valuable for my team-

Jon Krohn: 00:40:27

Yeah, for sure.

Drew Conway: 00:40:28

… as well. Right?

Jon Krohn: 00:40:29

No doubt.

Drew Conway: 00:40:30

None of us come to the table with years of private equity or real estate experience. What makes a good deal or what makes a market worth pursuing for a particular property type, great real estate investors and people will have experienced this, whether it’s renting an apartment in New York City or buying a home or whatever. There are so many intangibles or immeasurables when it comes to real estate. A lot of great real estate investors will sometimes know it when they see it. Starting to unpack that and actually create some structure around that and say, well, of course, there may be many, many things that we could never measure and never attempt to structure with data or in software, but there are many things that we can, and it’s just seeing a process unfold in real life. There’s no better way to get in and start to understand where the pain points are and what you can do to try to either improve it or automate parts of that.

Jon Krohn: 00:41:33

It definitely goes both ways. If in your hires that are working on your team, you’re not necessarily looking for people that have say, real estate experience, what do you look for in the data scientist that you hire?

Drew Conway: 00:41:49

Well, I am much more focused on how an individual thinks about a data problem and how they start their process, versus going deep on a technical screen or wanting to quiz someone on what distribution functional forms they can write on the whiteboard and things like that. There’s certainly a place, and I’ll be the first to admit, it is part of our interview process, for seeing someone work with data, actually sitting them down at a command line or at an IDE, and actually seeing them work with data and making sure, okay, you got the basics. This is a terrible way to see what you know but it’s the only way I can really do it, so let’s do that. But I really get much more value in sitting down with a candidate and asking them to think through a problem with me. Because the reality of our work, particularly in the private investing space, is we’re given pretty open-ended or ambiguous problems to think about. Something as simple as, what are the drivers of rent growth in a market? But we may have lots of intuition about that.

Drew Conway: 00:43:02

But then when you get to the actual question of, how are you going to measure that? What do you think are the pathways to understanding why this data’s getting collected? How do you want to structure it? When I have an interview with someone particularly junior or mid-career folks, I will often say, “Hey, I want to ask you to think about how you would measure this thing.” Sometimes I’ll pick a pretty arbitrary thing like, and I’ll have to change it now because I’m going to put it on the podcast but oftentimes I’ll ask folks like, let’s imagine you wanted to measure the supply of coffee in New York City. How would you do that? How would you think about going about that?

Drew Conway: 00:43:47

Let’s assume you work at a place like Two Sigma and it’s very easy to get access to any kind of data. How would you think about that? Let’s talk through that problem. And candidates will have all different kinds of ideas. Obviously there may be some overlap in those ideas. And then once we get through that problem, we can start to think about, okay, I like where you’re going. Now let’s actually model this. Now let’s move from measurement to modeling. Rather than just counting the supply of coffee, how might you try to forecast it, and what would you need to do, and what would be the implications of what you’re doing? You get through an hour of a conversation like that and you can really get to know what kind of perspective people bring to a problem like that.

Jon Krohn: 00:44:30

Totally. I interview exactly the same way. Actually, I don’t have a problem with saying that that’s the question I ask, because it doesn’t matter if they know that that’s the question in advance. Because as soon as the person starts answering the question, you’re like, well, that’s interesting. I hadn’t thought of it that way. I’m the person saying this. I’m saying too, I can learn things from these interview processes. Where I’m like, wow, that’s an interesting way to think about modeling this problem. I don’t know how to build that kind of model. Tell me about it. It ends up, so every single interview that I’ve ever had is completely different, even though we often start from the same [crosstalk 00:45:10].

Drew Conway: 00:45:10

No, that’s exactly right. What’s interesting too is, and I’ve spoken to peers, hiring managers, and I often see this anti-pattern where hiring managers will get excited and have a more positive view of candidates if they’re going down the path that that interviewer would’ve gone down themselves and said, this person thinks like I do and so that’s good. And we’re thinking about it the same way. I love it when it’s completely orthogonal to how I would’ve thought about the problem. They say something like, whoa, I would’ve never thought about that. Could we classify the size of the box trucks on the Verrazzano Bridge coming in and identify coffee cups? Or just stuff that might actually be really difficult to implement in practice, but it’s like, I really appreciate the creativity of that. Now let’s dig into that and actually think about the practical implications of it. But those are always the most fun.

Jon Krohn: 00:46:03

Nice. I’ll be looking out for your cameras on the Verrazzano Bridge.

Drew Conway: 00:46:08

They’re there. Trust me.

Jon Krohn: 00:46:14

That’s great. I think in terms of having the core of this episode of me having asked you questions, I have more questions here in case we don’t have audience questions. But Jared, let us know, do we have questions from the live audience here and can you let us know what they are? Oh man, you must have had that one copied and pasted and ready to go, because that just popped up really quickly. Wow. Okay. That is really specific. Drew, do you know a lot about Venn diagrams? We have this-

Drew Conway: 00:46:52

I know about the data science Venn diagram, which I suspect is the… I mean you know-

Jon Krohn: 00:46:57

Okay, okay, that must be what it is because it says, can you tell us about the genesis of the Venn diagram?

Drew Conway: 00:47:01

And I do appreciate the question because it presumes that the data science Venn diagram has transcended the need to be specified as being about data science and can just be the Venn diagram. So the genesis of the Venn diagram, it’s a pretty easy one really, and how it came to be may somewhat underwhelm folks given the longevity of the figure itself. But in a similar time period as to what we were talking about in my origins in the R community, I was a grad student, around the same time as John and I were writing Machine Learning for Hackers, and everybody on the internet was talking about data science. It was this crazy new field, Harvard Business Review had it as the sexiest job of the 21st century. And there seemed to be a lot of hotly debated Reddit forums and Twitter. I guess Twitter threads didn’t exist at that point, but Twitter stuff, about what it was, what was data science?

Drew Conway: 00:48:08

Literally the story was, and you’ll appreciate this as a graduate student, I was sitting in a lecture for an intro to comparative politics, which I was a TA for. So I had to sit in the back of the class while the freshmen and sophomores at NYU were learning from the professor. I just started jotting down some ideas about what that would look like. The point that I was really trying to hone in on as I was building it is that data science as a discipline, not data scientist as a job title, is fundamentally an interdisciplinary practice. And even 10 years ago, thinking about it as being a new… A lot of people were saying, well, it’s just statistics with computers or it’s just a different way of talking about machine learning. I didn’t agree with that. It just didn’t satisfy what I was trying to think about.

Drew Conway: 00:49:04

And it partly was my, I think, bias as a social scientist thinking, well, my training is to think about how are human beings making choices? Why are collections of people deciding to do one thing versus another? And within that context, why do I think data science is a full practice for exploring those questions? What became clear to me is naturally connecting these three disparate practices that I find myself doing a lot, which was obviously a lot of coding, and even in the Venn diagram, I say hacking skills. And again, going back to that Machine Learning for Hackers, it didn’t mean that you were a professional software engineer. Because that’s not what I was. But someone who could sit down with data and tools inside of software and start to manipulate that data and understand it and start to learn about it and go, whether it’s at the command line or with R, do that work.

Drew Conway: 00:49:58

But then of course there’s also the fundamental training that you need from math and statistics to know what tools to apply to that data. What are the things that you actually want to do with that data and how do you want to learn from it. And you do really need to know how those models are constructed, what the limitations are, what the biases are and those things. And so you do need some training. You can’t just be hacking at the command line and expect to be able to create work. Then the final piece, which again I think came for me, at least thinking about my training as a social scientist, is, why are you even asking that question? What is important about the thing that you’re trying to solve and where do those questions come from? As I was interacting with professors or experts in their various fields, it was clear to me that the best way to do that was to have deep understanding of that subject matter, and really knowing why a particular problem area is important.

Drew Conway: 00:50:55

Those three bubbles came together, and when you have a Venn diagram, of course you have the three-way intersection and for me that was data science, but then you have effectively the off intersection. When you think about combining something like substantive expertise with hacking skills, that became I think one of the more popular areas because that’s what I called the danger zone. That’s where you may know a lot about a subject and you may know enough with the tools to produce some result, but you just don’t know why the answer was there.

Drew Conway: 00:51:27

That was in some ways a big motivation for me in doing it because that was something that made me nervous at the time. That you could see the idea of data science or the role of data scientist, really starting to take off. But not a lot of people who I saw really had much understanding of the underlying statistical properties of what they were using. And that felt to me pretty dangerous. Then on the other side, if you just have statistical training or math training and subject matter expertise, to me that was like all of my peers or colleagues who were graduate students learning their trade, but not necessarily interested in using computational tools to do it. That’s more traditional research in my mind. You have a hypothesis that’s driven by the subject that you understand, you know how to apply linear regression to that subject and you go off and do that, but you’re not doing data science.

Jon Krohn: 00:52:22

You know how to use a P-value table on the [crosstalk 00:52:26] stats textbook.

Drew Conway: 00:52:26

You can look up your T-test and see where you are on the grid. Trust me, there’s nothing wrong with that at all. It just wasn’t data science. Put these ideas down and what’s made me happy about it although, I think now there’s probably more pixels and ink have been spilled about why the data science Venn diagram is wrong than why it’s actually useful. It does seem to, at least for many people who are new to data science, provide an initial clarity as to what we’re trying to do.

Jon Krohn: 00:52:57

Cool. It sounds like a badge of honor there as well.

Drew Conway: 00:53:00

Yeah. For sure there’ll be a data science Venn diagram right on the top of my tombstone, I’m sure.

Jon Krohn: 00:53:07

Just to summarize, the three circles that are going into this Venn diagram are hacking, math and substantive expertise?

Drew Conway: 00:53:16

Exactly. Yeah.

Jon Krohn: 00:53:18

Nice. All right. Jared, have you got another question for us? It’s right there. He’s so ready. You just have to press enter, I think. This question, Drew, is what are you learning and excited about today?

Drew Conway: 00:53:33

Well, that’s a good question. The truth is most of what I’m learning today, apropos of our previous conversation, is like, how do you create analysis and technologies that can effectively change behavior for businesses or business people that may not be necessarily inclined to do that and certainly don’t have experience with the tools. But that’s, I think, not really where the questioner was thinking. On the technology and method side, the two big areas that I focus on a lot today are more sophisticated ways of structuring time series. Whether that’s using more contemporary deep learning tools, things like LSTMs, to try to forecast future results and using those tools in such a way that the results are interpretable to people like my colleagues who are investors who want to understand why things are working, and so unpacking some of that interpretability within these more complex models is something that I think about a lot.

Drew Conway: 00:54:42

But also using time series data to do things that are perhaps not natural or what you might think about purely in a forecasting context. And so to even give an example, more recently we’re thinking about dynamic time warping models, where you can use that to actually classify different data generating processes. Why that becomes really useful for us is while the underlying investment strategy may be constant, how you can observe that in different markets or different asset classes through looking at a time series can be really informative. And the differences that can get picked up by an algorithm like that are really, really interesting. The other piece that I think a lot about, and it is a persistent problem in all data everywhere, is essentially named entity resolution. How to do this more effectively, how to think about this in a more efficient manner, how do we resolve all these different observations that may happen across different units? You can imagine, for example in my case, there may be lots and lots of different kinds of data that resolve to a particular property.
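
As a rough illustration of the time-series classification idea mentioned here, the following Python sketch computes the textbook dynamic time warping (DTW) distance and uses it for nearest-neighbour classification. It is a toy under stated assumptions, not the team's method: the series, the labels, and the function names `dtw_distance` and `classify` are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances while b stays on the same point
                cost[i][j - 1],      # b advances while a stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def classify(series, labeled_examples):
    """Return the label of the example closest to `series` under DTW."""
    return min(labeled_examples, key=lambda ex: dtw_distance(series, ex[1]))[0]


# Hypothetical index shapes, invented for illustration.
examples = [
    ("steady_growth", [1, 2, 3, 4, 5, 6]),
    ("boom_then_bust", [1, 3, 6, 6, 3, 1]),
]
# Same shape as steady growth, but unfolding more slowly.
query = [1, 1, 2, 3, 3, 4, 5, 6]

print(classify(query, examples))
```

Because DTW lets one series stretch or compress in time relative to the other, two markets that follow the same pattern at different speeds can still end up close together, which a point-by-point comparison would miss.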

Drew Conway: 00:55:56

Because it could be the owner of that building, it could be the tenants in that building, it could be the rent history of that building. It could also be retail transactions that are happening in that building. It could be mobility data that’s moving through that building. And all of those happen on an x, y plus data dimension. But then we also have to add time to it and understanding how to resolve observations that are logically true to a place. Through time, a business may change its name. An owner will change over time, but it’s the same building, and how do you have a consistent point-in-time analysis of that? Those are areas that we spend a lot of time thinking about and are complex, dirty, ETL type of problems. And they have a machine learning data science component to them. But that’s where the real work gets done. And so that’s what I’m excited about for my team and some of the problems that we’re working on.
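
One simple way to picture point-in-time resolution is a per-building timeline of names that can be queried as of a date. The Python sketch below is hypothetical throughout: the building ids, names, dates, the `normalize` rules, and the `PropertyTimeline` class are made up for illustration, and real entity resolution would need fuzzy matching and far messier data.

```python
from bisect import bisect_right
from datetime import date


def normalize(name):
    """Crude normalization: lowercase, drop punctuation and common suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    tokens = [t for t in cleaned.split() if t not in {"llc", "inc", "corp", "the"}]
    return " ".join(tokens)


class PropertyTimeline:
    """Tracks which entity name was attached to one building over time."""

    def __init__(self):
        self._starts = []  # effective dates, kept sorted
        self._names = []   # normalized name in effect from the matching date

    def record(self, effective, name):
        idx = bisect_right(self._starts, effective)
        self._starts.insert(idx, effective)
        self._names.insert(idx, normalize(name))

    def name_as_of(self, when):
        """Name in effect on `when`, or None if nothing was recorded yet."""
        idx = bisect_right(self._starts, when)
        return self._names[idx - 1] if idx else None


def resolve(observed_name, when, timelines):
    """Ids of buildings whose name on that date matches the observation."""
    target = normalize(observed_name)
    return [pid for pid, tl in timelines.items() if tl.name_as_of(when) == target]


# Hypothetical history: the tenant in bldg-001 changes its name in 2019.
timelines = {"bldg-001": PropertyTimeline(), "bldg-002": PropertyTimeline()}
timelines["bldg-001"].record(date(2015, 1, 1), "Corner Cafe, LLC")
timelines["bldg-001"].record(date(2019, 6, 1), "Harbor Coffee Inc.")
timelines["bldg-002"].record(date(2016, 3, 1), "Main Street Books")

print(resolve("CORNER CAFE", date(2017, 5, 1), timelines))
print(resolve("Harbor Coffee", date(2020, 1, 1), timelines))
```

Both queries land on the same building id even though the observed names differ, because each lookup is answered with the name that was in effect on the observation date rather than the latest one.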

Jon Krohn: 00:56:55

That problem that you just described about named entity recognition and being able to track ownership of properties and that kind of thing, it sounds very similar to the problems that Maureen Teyssier was describing at Reonomy, a real estate data company where they use graph databases to track all those kinds of relationships between entities. And that’s episode number 479, if listeners want to check out that and hear lots more about those kinds of approaches. I also wanted to, just before you got talking about named entity recognition, you were talking about time warping. Which, no wonder Two Sigma has been so successful. That’s just unfair if you can see into the future [crosstalk 00:57:41]-

Drew Conway: 00:57:40

I love the algorithm because of the name, because it just has a very Rocky Horror Picture Show vibe to it, so you feel like you can just throw toilet paper at the laptop when it blows up in your face.

Jon Krohn: 00:57:55

All right, very good. All right Jared, do we have any other questions there? Oh, we do. All right, so Max Kuhn, who was a speaker at the conference, he talked about COVID ruining his data. And I presume not because they got infected but because things changed so much through the pandemic. What impact, Drew, has COVID had on your models? Or does your long term [crosstalk 00:58:25] focus protect you from that a bit?

Drew Conway: 00:58:27

I think, listen, it’s had a huge impact. As I’m sure was the case for Max as well. Basically, any holdout set that exists between 2020 and today is completely garbage because it’s a totally different data generating process and that is certainly true of almost every observation that we would have in all of the work that we do. As a simple example, if I’m training a forecasting model that’s attempting to predict the value of some asset that I care about into the future and I train that on data from the last 10 years and then use 2019 to 2021 as my test set, my model’s going to look terrible, because those results will be very, very bad. We’ve had to think a lot about, how do you use whatever information may be contained in the observations of data that happened over the last two years? But of course, how do you also build effective models that you think are going to be relevant as things eventually start to normalize? And so yes, it has absolutely had a tremendous impact on obviously how we build and interpret our models, but also how we think about what data we can actually use and what the underlying biases may be in that.
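
The holdout problem described here can be seen on a toy series. This Python sketch assumes a synthetic index that grows steadily and then drops by a fixed amount in 2020; the data, the split years, and the helper names `fit_line`, `mean_abs_error`, and `window` are all invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the fitted line and the observations."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic yearly index: +2 per year, then a level drop of 15 from 2020 on.
years = list(range(2008, 2022))
values = [100.0 + 2.0 * (y - 2008) - (15.0 if y >= 2020 else 0.0) for y in years]
offsets = [y - 2008 for y in years]  # years since the start of the series


def window(first_year, last_year):
    """Return (offsets, values) for years in [first_year, last_year]."""
    picked = [(x, v) for y, x, v in zip(years, offsets, values)
              if first_year <= y <= last_year]
    return [x for x, _ in picked], [v for _, v in picked]


model = fit_line(*window(2008, 2017))
error_before_shift = mean_abs_error(model, *window(2018, 2019))
error_after_shift = mean_abs_error(model, *window(2020, 2021))
print(error_before_shift, error_after_shift)
```

On this synthetic series the same fitted line is essentially perfect on the 2018 to 2019 holdout and off by about 15 on 2020 to 2021, the size of the injected drop. The model did not get worse; the process generating the test data changed, which is why a holdout score spanning such a break says little about the model.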

Drew Conway: 00:59:52

And the second part of the question, does the long term focus protect us from that? I think the answer to that is yes and no. Because if you look historically, if you treat COVID as an acute version of a recession in the economy, then you might think, yes, my long term focus will protect me from that because ultimately things will normalize and we would expect positive economic growth generally. But I don’t really have any strong reason to believe that that’s true. Because it’s not that. It’s something completely different and it’s not something that we’ve had much, we certainly haven’t had any experience in the last 100 years, and our data sets don’t go that far back. So we don’t really have the ability to test on any previous examples.

Jon Krohn: 01:00:39

How were trucks over the Verrazzano Bridge affected by the Spanish flu? And what happened there?

Drew Conway: 01:00:48

We didn’t have the license plate readers on the horse and buggy.

Jon Krohn: 01:00:52

Oh man. Really great answer. And we do have time for one more. Jared is asking-

Drew Conway: 01:00:59

Yes.

Jon Krohn: 01:01:00

… if we can do that and here it is. All right, so it is, given the current popularity of data science, what do you surmise is the future of data science and the data science community?

Drew Conway: 01:01:14

Good question. I tend to think of data science as following a similar pattern to a lot of technical disciplines that have evolved over the course of the last 20 years. My favorite example of this is like, how many people do you meet today that have the title webmaster? A single person who’s responsible for the production and maintenance of a website. That just doesn’t exist anymore. We have a notion of a full stack engineer, but even that is a somewhat specialized role within an engineering team. I think data science as a discipline roughly today splits between folks who are working either in full-on or semi-academic settings where they’re actually doing more research and development, people who are actually building algorithms that are going to improve our ability to classify or capture or forecast something. Then you mostly have a bunch of other data scientists. By mostly I mean truly more people doing this than the former, who are thinking about how to apply those tools to a business or to a problem that they’re working on. I think within that set of folks, there will continue to be a splintering and fanning out of specialization within the kinds of roles that ultimately are important for businesses to get right.

Drew Conway: 01:02:41

And a lot of people, I think recently talked about data engineering or ML Ops and all these different disciplines that I think you can break out from data science more generally. And that makes a ton of sense to me, because these systems are much more complicated. The volume of data is much, much higher and the consequences of getting things wrong for a business are much, much higher and so allowing people to focus on one or a smaller set of problems within that tool chain is ultimately I think a way for the practice to become much more efficient. But I do think the discipline itself, and I’ll go back to my Venn diagram, is still mostly made up of being able to ask the right questions, identify the right data and apply the right method. If you generally have a sense of how to do that and ultimately how to deploy that effectively within a business, then I think you’re doing good data science and the roles and specializations will fall out of that based on the needs of the business.

Jon Krohn: 01:03:48

Makes perfect sense to me, Drew. Great answer to that question. We ended up having lots of really brilliant live questions. Thank you to the audience for having these for us. We didn’t know whether there would be any. I had a bunch more prepared just in case but you ended up having questions much better than mine. Really appreciate that we got to have Drew answer those. Live at the R Conference for this first ever live episode of the SuperDataScience show. I think we’re pretty much ready to wrap up. Something Drew, that we always do at the end of these episodes is ask for a book recommendation. And I’m supposed to always, I usually tell the guest that before we start recording so that you have time to think about an answer. But just tossing it right on you here. Surprise.

Drew Conway: 01:04:38

Well, that’s easy. My favorite book recommendation for data scientists, or really anyone who thinks about how to observe the world through math, is a classic: Gödel, Escher, Bach. Most people will have probably read it.

Jon Krohn: 01:04:53

Oh yeah.

Drew Conway: 01:04:55

But it’s one of my favorite books. It’s a book that you certainly don’t have to read linearly. You can just flip to a chapter and open it up and I love that book. For fiction, I’m a little bit late to the game, but I’ve really, really been enjoying the Three-Body Problem series. It’s such wonderful sci-fi. If people like big, complex sci-fi, I highly, highly recommend it.

Jon Krohn: 01:05:19

Nice. All right, that’s a great recommendation. Then Drew, you’ve had such wonderful insights for us today and we managed to do this entire episode without any cuts or retakes. It’s incredible. You’re an outstanding speaker, you have lots of great ideas. How can listeners track your work or follow your work? What’s the best way to follow you online?

Drew Conway: 01:05:39

Sure. I’m on the social media, just @drewconway on Twitter. I’m not quite as active as I used to be but I occasionally get a couple punches in there. I post on LinkedIn occasionally as well and anytime I get an opportunity to do something like this, I often post it on those formats. So that’s probably the best way to see what I’m up to.

Jon Krohn: 01:06:03

Nice. All right. Thank you very much, Drew. And thank you, Jared and Nicole, and everyone at the R Conference for being open to trying this new experimental idea of recording a podcast live at a conference. And we look forward to experimenting again in the future once these R Conferences are live again. It’ll be even more complex because we’ll have that hybrid format. I know everyone backstage is excited about that. Thank you so much, Drew.

Drew Conway: 01:06:31

Thank you.

Jon Krohn: 01:06:31

I’m blown away that that live-filmed episode managed to be executed without a hitch. I’m so grateful that Drew was open to the idea of doing this. His clear-spoken eloquence seemed to make it a piece of cake for him. I’m also thankful to Jared Lander, who let us try this out for the first time at his New York R Conference and who even meticulously screened audience questions for us in real time so that I could focus 100% on interviewing. In today’s episode, Drew filled us in on how the massive open statistical programming community in New York grew up around his desire to accelerate his ability to teach himself R, the inspiration and topics covered in his fabulous hands-on R book, Machine Learning for Hackers, the one-to-one data-scientist-to-investor buddy system Two Sigma leverages to create success, and how data models in public markets are executed on short, often sub-second timeframes by machines, while models in private markets like real estate and private equity are executed over years and by humans. And he talked about how he looks to see how data scientists solve problems before considering hiring them.

Jon Krohn: 01:07:52

You can get all the show notes, including the transcript for this episode, the fun live video recording, any materials mentioned on the show, the URLs for Drew’s LinkedIn and Twitter profiles, as well as my own social media profiles at www.superdatascience.com/511. That’s www.superdatascience.com/511. If you enjoyed this episode, I’d of course greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I always encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it. Super cool. We did it. With this experiment a success, look out for more live-filmed episodes of SuperDataScience in the future, including post-pandemic episodes filmed in person in front of a live studio audience for an extra dimension of energy and interactivity.

Jon Krohn: 01:08:44

Thanks to Ivana, Jaime, Mario and JP on the SuperDataScience team for managing and producing this special episode today. Keep on rocking it out there, folks, and I’m looking forward to another round of the SuperDataScience podcast with you very soon.
