74 minutes
Business, Data Science

SDS 437: Data Science at a World-Leading Hedge Fund

Coveted international speaker, Dr. Perlich is a senior data scientist at one of the leading hedge funds. We looked at her daily work, software features, her tips for data science competitions, and her one simple trick to succeeding in any field.

About Claudia Perlich

Claudia Perlich comes to Two Sigma from Dstillery, where she served as chief scientist (2010 to 2017). In this role, she was responsible for the reliable estimation of targeting models (predictive modeling/ranking models using NB/logistic regression and others), the performance evaluation of systems/models and campaigns both in vitro and in vivo, as well as the supervision of a real-time scoring engine that applies the models to identify the target segments of browsers. Since 2011, Claudia has also worked as an adjunct professor teaching Data Mining in the M.B.A. program at the New York University Stern School of Business. As a research staff member in the Data Analytics Research Group at the IBM Watson Research Center (2004 to 2010), she led teams that competed successfully in KDD Data Mining Competitions, designed and executed wallet/opportunity estimation models for IBM Sales using quantile regression, and worked on blog and Twitter analysis tools for marketing. Claudia holds a Ph.D. in Information Systems from the New York University Stern School of Business (2004), an M.S. in Computer Science from Technical University Darmstadt, Germany (1998), an M.S. in Computer Science from the University of Colorado (1996), and a B.S. in Computer Science from Technical University Darmstadt, Germany (1995).

Overview

Claudia boasts that she’s managed to get at least 3 hours back in her day thanks to the lack of commute and has been exploring nature more than before. We agreed, however, that the R&D has taken a hit, which can be difficult for new hires who are trying to gel with a team remotely. Claudia works day-to-day at Two Sigma, one of the most successful hedge funds in the world. Ironically, she tried to avoid finance for as long as possible but made the shift from her role at an advertising company after speaking with members of the company who sold the culture to her.

More than half of the people working at Two Sigma work in data science, whatever their titles. Strategic Data Science is a broadly mandated team that functions as an R&D incubator within the company’s primary focus. It explores higher-hanging fruit rather than the more direct pathways. Claudia’s work involves forming hypotheses, grounded in fundamental economics, about causes and their likely effects. These are called alpha ideas, and they come before any modeling. Based on them, the team comes up with ways to characterize entities, which become the initial signals. In the optimization stage, the signals are combined into a goal portfolio, before the last piece: execution. At the scale of Two Sigma, all eyes are on the firm’s behavior and its choices have market impact, so execution, meaning the many ways shares can be bought and sold, is incredibly important. The range of applications for data science across these processes is huge. As far as day-to-day goes, Claudia holds a standup meeting twice a week and works with team members on their projects and roadblocks.

Claudia differentiates the needs of her team from the needs of Two Sigma at large when it comes to hiring. She focuses on bringing in people who can challenge the thought process and looks for diversity of thought. Candidates obviously need coding, computing, and data analysis skills, and Claudia likes to see unusual professional backgrounds. Ultimately, it’s about finding skills that complement the team that already exists. As far as where the market is going, Claudia has noticed that many of her upcoming students look at the technology on a conceptual level. Real data intuition is something she has recently found lacking: the classic scientist behavior of thinking about the problem before the answer.

We shifted into Claudia’s competitive experience in data science. In 2007, Claudia got involved through IBM in competitions where she won 3 years in a row: the Netflix prize, a medical space question on breast cancer detection, and a telecommunication issue. She did not win by having the fanciest algorithm. Each time, she found a problem in the data set that she could utilize to get at her answer. After winning for three years, Claudia decided to get involved in running competitions and hackathons. How do you win competitions? You need a ton of time to get quality participation. You also need to be good at cleaning up data sets.

We ended by talking about Claudia’s decision to work in finance. The secret to her success is finding good bosses to work for. She realized she liked solving other people’s problems rather than focusing on her own. So having someone to work for whom she respects and relies on for judgment calls was incredibly important. Picking who you work for, rather than what you work on, is incredibly important. I can relate, as my Ph.D. topic didn’t interest me that much but my advisor did.

In this episode you will learn:

Life and work during the pandemic [2:23]
Claudia’s history with horses and riding [8:28]
Claudia’s work at Two Sigma [12:00]
Claudia’s role on a daily basis [20:51]
Tools of the trade [30:27]
What Claudia looks for when hiring [36:37]
What skills do future hires need? [40:32]
Claudia’s history with data science competitions [48:22]
Why work in finance and at Two Sigma? [1:00:19]

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 437, with Dr. Claudia Perlich, Senior Data Scientist at Two Sigma.

Jon Krohn: 00:00:12

Welcome to the SuperDataScience podcast. My name is Jon Krohn, the chief data scientist and best-selling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple.

Jon Krohn: 00:00:42

Welcome to this episode of the SuperDataScience podcast. I’m your host, Dr. Jon Krohn, and it is my great pleasure to be joined today by the incredible Claudia Perlich, a coveted international speaker and many time data science competition world champion. Dr. Perlich is a senior data scientist at Two Sigma, one of the planet’s largest hedge funds, a feat achieved in no small part thanks to the fund’s focus on quantitative machine learning-driven trading strategies.

Jon Krohn: 00:01:12

In this episode, Claudia fills us in on how investment managers, including hedge funds, leverage statistical models and machine learning, what the work-day of a data scientist is like at one of the world’s most successful hedge funds, as well as what software tools they use. The one major feature she’s looking for in the data scientists she hires, her tips for winning data science competitions like Kaggle, and her one simple trick to have an extraordinary career in any field.

Jon Krohn: 00:01:44

This episode is particularly well suited to data professionals who are or might like to apply their skills in the financial industry. That said, Claudia has thoughtful tips for data scientists at any stage of their career, regardless of industry. There’s also a fair bit of content that would be of interest to managers or business people who aren’t hands on with code but are keen to learn about how valuable data science is in financial applications. This is such a great episode. Let’s get right to it.

Jon Krohn: 00:02:21

All right, Claudia, welcome to the program. How are you doing? How has COVID been for you? How’s the lockdown been in 2020?

Claudia Perlich: 00:02:29

Oh, first of all, thank you so much for having me, it’s a great opportunity to talk to you. On the COVID, I think that’s actually the thing, you don’t accidentally run into friends that you know in Bryant Park and have a drink with.

Jon Krohn: 00:02:45

That sounds like a familiar story.

Claudia Perlich: 00:02:49

I have found that my kind of loose social life got a little bit more distant in the process. From a work perspective, I’m saving currently three hours a day that I’m no longer sitting in trains commuting. I have felt that, if anything, it might have improved my productivity and even given me a chance to get into more technical work that I never really had the time for in between meetings and the usual bustle of the office. So at the end, the fact that I’m living out in Westchester, very near to where I initially worked, here near Watson IBM, it’s a beautiful area, you can go out there, you can go for hikes. It’s all just in front of the door. And so I have not found the lockdown to be personally limiting with regard to all of the things I love to do in my spare time.

Claudia Perlich: 00:03:45

On the contrary, as I said, I have more time to do these things. I have to be grateful for the fact that everybody stayed healthy in my group of friends and family. And I’m very well aware that I’ve been very lucky about this. I think it will have some lasting changes on my life, even if we go back to business as usual, as it may have been. I have found more new interests and hobbies, so I’ve been well and I’ve been very grateful for that.

Jon Krohn: 00:04:17

Nice, that’s great. And I totally understand everything you’re saying. I’ve also had the experience of being able to avoid that commute time and be able to dig more deeply into some books and that kind of thing. Creating educational materials in my case. Definitely I would say in terms of execution, my productivity is higher and I think my teams as well, and studies seem to show that that’s happened under lockdown, for people that can work remotely.

Jon Krohn: 00:04:49

However, I found that the R&D has suffered a bit because we used to gather around a whiteboard with notebooks, no computers, so this is something I’m looking forward to having a bit of in 2021 now, being able to get back and get around a whiteboard with people.

Claudia Perlich: 00:05:08

I think it’s also really challenging for people who are new and joining these teams virtually. I think we can rely on our existing contacts. I know, oh, that’s a great thing that I should be asking Hannah, and she’s happy to just jump on whatever Slack, Zoom, you call it. But if you haven’t quite yet gotten the lay of the land and you don’t know who’s the right person to ask what, I think these things are much harder to establish from scratch, if you’re just joining. I feel for new hires that are struggling with finding their place in organizations with the current remote work scenario.

Jon Krohn: 00:05:46

Nice. We have alluded to where you live and work and what you do. You mentioned Westchester, so for people who aren’t aware, people who are listening from all over the world, Claudia lives near New York and works in New York City. I’ve known her for many years. Getting six, seven years now, actually, we’ve known each other. And mentioning, for example, running into each other in Bryant Park, which is a popular park in the middle of Manhattan, that has some nice outdoor bars around it that are beautiful in the summer. And yeah.

Jon Krohn: 00:06:27

So we met at a conference, a data science panel. No, it was a data in advertising panel that you and I were both on many years ago. And yeah. You mentioned the IBM TJ Watson Center that you used to work at. I went up and I spoke at that conference center. Thank you for inviting me up there. And it is a beautiful campus. But the best part of going up and speaking at the IBM TJ Watson center in upstate New York by Claudia’s place was that I got to see her with her gigantic horse, which, if I remember correctly, is named Monkey.

Claudia Perlich: 00:07:07

Well, to be formal, his name is officially Moon Country. And he is, I think, the grandson of Secretariat. That notwithstanding, he’s terrible at running fast. [inaudible 00:07:20]. Yeah, I’ve always been absolutely fascinated by horses. It’s a childhood obsession that goes back to when I was 12.

Claudia Perlich: 00:07:31

And, of course, when you then study and you kind of get your life and your career going, I didn’t have much time. But once I had a job and a child, I said, “You know what, there has to be more to life. I need to go back to my roots.” So I picked up that hobby. And I found Monkey locally here. I retired him actually this spring. He kind of got to the time where it felt it would be just fair to him to walk around with green grass, hang out with his buddies and don’t worry about me trying to make him run somewhere.

Claudia Perlich: 00:08:02

But yeah. I’m still using hobbies like that as a way to just free my mind, because when you’re sitting on a half ton horse, you don’t really get to think. I mean, there’s just a presence there that makes you live in the moment. And I find that a great balance to the most cerebral activities that I spend the rest of my day on.

Jon Krohn: 00:08:25

Nice. So we’ll talk about your work and the amazing data science that you do momentarily. But first, I think it’s really important for everyone to know about the kind of riding that you’ve done since you were 12 years old. I wasn’t even aware that this existed. Tell us about your hobby.

Claudia Perlich: 00:08:42

Okay. You said I should keep it light on the background, but since you’re asking.

Jon Krohn: 00:08:46

It’s my fault. It’s my fault.

Claudia Perlich: 00:08:49

I grew up in East Germany. The thing about East Germany is that you couldn’t just take horseback riding lessons for money. That didn’t exist. The only way you could get anywhere near a horse is if you joined a sport that was considered an appropriate sport.

Claudia Perlich: 00:09:07

Now, for a 12-year-old, the only option regarding sport with horses was something that Germans call [German 00:09:14]. In America it’s known as vaulting. And it basically means the horse is running in a circle, and it doesn’t wear a saddle. It just has a girth with two handles. And then you do basically gymnastics on the running horse.

Claudia Perlich: 00:09:30

You have to actually come on the horse and then you’re standing on the horse, or you have a second person join you and then you’re sitting on the horse and the person is sitting on you. And that was the only way you could get anywhere near a horse. So I said, “Okay, that’s what it takes. Sure. I’ll go and stand on running horses.” So that was my beginning. I competed a little bit in that and it was a fun time. The one thing I have to say is great about it, you get really good at falling off.

Claudia Perlich: 00:09:57

You learn how to make an exit without hurting yourself too much, and that has come in handy quite often since I’ve joined the more proper riding side of the world.

Jon Krohn: 00:10:10

Nice. Do you do any kinds of tricks anymore? I guess maybe now that Monkey’s retired you don’t want to put him through that.

Claudia Perlich: 00:10:17

Well, the truth is you need a very well educated horse to do that. Most horses would not tolerate you running up to them and trying to jump on. I mean, that’s kind of [inaudible 00:10:26] right there. So I don’t recommend just trying to do that with any horse, so no, that [inaudible 00:10:33].

Claudia Perlich: 00:10:34

I have a friend who still runs a team in Germany. Sometimes I go there and I kind of brush up on some of this fun stuff. But yeah, nothing active at this point.

Jon Krohn: 00:10:43

Well, very cool.

Jon Krohn: 00:10:46

You may already have heard of DataScienceGO, which is the conference run in California by SuperDataScience. And you may also have heard of DataScienceGO Virtual, the online conference we run several times per year in order to help the SuperDataScience community stay connected throughout the year from wherever you happen to be on this wacky giant rock called planet Earth. We’ve now started running these virtual events every single month. You can find them at datasciencego.com/connect. They’re absolutely free. You can sign up at any time. And then once a month, we run an event where you will get to hear from a speaker, engage in a panel discussion, or join an industry expert Q&A session.

Jon Krohn: 00:11:29

And critically, there are also speed networking sessions where you can meet like minded data scientists from around the globe. This is a great way to stay up-to-date with industry trends, hear the latest from amazing speakers, meet peers, exchange details, and stay in touch with the community. Once again, these events run monthly. You can sign up at datasciencego.com/connect. I’d love to connect with you there.

Jon Krohn: 00:11:58

So now I’ll finally let you tell us, so what do you do? Tell us about where do you work and what you do there.

Claudia Perlich: 00:12:11

I have joined a financial technology company called Two Sigma.

Jon Krohn: 00:12:18

Yeah, that’s one of the biggest and most successful hedge funds in the world.

Claudia Perlich: 00:12:26

I’m very grateful and appreciative of not just the success, the truth is I tried to avoid finance for as long as possible, which actually isn’t easy if you’re doing data science in New York City. But I started meeting folks there as we were talking about data when I was still in my previous role as the chief scientist in an advertising company. And I was really impressed by the sense of a… They call themselves The Nice Geeks, and I think that’s pretty appropriate.

Claudia Perlich: 00:13:02

I didn’t join them for the success as much as for the fact that they’re really incredible people, and it’s a very nice working environment where you feel that you’re trying very hard to find a balance between intellectual property and the competitiveness. After all, we are building financial products, and that requires some amount of secrecy and protection of ideas. And on the other hand, as you say, there is the absolutely invaluable brainstorming session on the whiteboard, where really good ideas are only born as you’re working with people who think differently.

Claudia Perlich: 00:13:40

And so I joined Two Sigma and over the first year we decided to build a team that’s called Strategic Data Science.

Jon Krohn: 00:13:52

Cool. I didn’t know this.

Claudia Perlich: 00:13:53

Now, the truth is, I would say more than half of all the folks working at Two Sigma are data scientists. That’s what they do. They may not call themselves that, but kind of the naming issues around data science aside. So this team has a pretty broad mandate and it was maybe somewhat motivated by the breadth of experience and interests that I and Drew Conway have.

Claudia Perlich: 00:14:21

Drew joined about a year later, and many of you may have heard about him as well. And that team is still fairly small. Think of it as a little bit of an R&D incubator within the kind of primary focus of the company, where we have some freedom to explore some more higher hanging fruits on the tree that might take more time than more of the kind of direct pathways into kind of the financial, industry and data types.

Claudia Perlich: 00:15:00

We work with more out there data sets. We are also collaborating with a number of affiliates. So Two Sigma isn’t only working in trading and the financial markets, but they also have a venture group and they have a team that is interested in private investing. There is a constant exploration of what are some adjacent areas of growth where the primary competence around data and modeling could be brought to bear. So we are working with some of those teams.

Claudia Perlich: 00:15:41

What’s really fun about it is they themselves then have venture companies that may need the occasional data science advice. So you get to work on a number of different products there. Part of the mandate is also to support these affiliates and help them understand what the best way forward is, and how some of the resources that we have can be brought to bear.

Claudia Perlich: 00:16:07

That is the second piece of the mandate. And the third is things like talking to you. Doing some public-facing activities, going to conferences. And the understanding is that on some level, it’s giving back to the community. It also obviously helps reach talent that may otherwise not have considered exploring the financial industry as such, and it’s just good citizenship. A very interesting area on that is, I’m not sure if you know about Data Clinic.

Jon Krohn: 00:16:39

No.

Claudia Perlich: 00:16:40

It is a small team within Two Sigma whose model is similar to what DataKind does, so they are partnering with nonprofit organizations. And then they explore opportunities where some of the employees at Two Sigma are volunteering their time, whether this is engineering time or data science time, to build solutions and systems based on data for nonprofits.

Claudia Perlich: 00:17:10

The team is also doing some great work with open data. They are participating in the New York Open Data conferences on a regular basis. That has been another area that I’ve been really excited about and participated as much as I can.

Jon Krohn: 00:17:24

It sounds like some pretty nice geeks. Yeah.

Claudia Perlich: 00:17:28

Very nice.

Jon Krohn: 00:17:30

You mentioned Drew Conway, and that also reminded me of another kind of celebrity data scientist alongside yourself and Drew that I believe was, until recently, at Two Sigma, Wes McKinney. And so, for people who aren’t aware of him, you’ve almost certainly used his software, because he is… And you can correct me if I’m wrong on this, but if I’m remembering correctly, the founder and primary contributor behind the pandas library.

Claudia Perlich: 00:17:57

Absolutely. He actually interviewed me at Two Sigma when I was being interviewed to join Two Sigma. He left maybe a year ago. I think he wanted to focus even more so on the open source development and extension of his software. But he has been closely partnering, so we are still working with him. And we are using a lot of his developments and even have direct collaboration to the extent that we’re looking for extensions that seem to be useful to us.

Jon Krohn: 00:18:36

I think this is public information. I’m pretty sure I read online that Two Sigma is an investor in his company that he started since leaving.

Claudia Perlich: 00:18:44

Yeah.

Jon Krohn: 00:18:44

Yeah, so definitely-

Claudia Perlich: 00:18:47

That goes back to one of these kind of affiliates like the venture business that are really… Given the understanding we have about where technology stands and data science and what some of the kind of missing pieces might be. There’s a very deliberate effort to identify those companies that could help build out these pieces. I think it very well aligns with our strategy, despite the fact that we like Wes, that we are very happy to support him in his future endeavors.

Jon Krohn: 00:19:14

Yeah. And I can speak to the venture side of Two Sigma, being unusual in a really good way. Two Sigma has been a client of my company for many years now. And at one point, we came in and talked to your venture arm. We were invited to come in based on what we were doing and had some further conversations. And it was amazing to me the level of interest in the models we were building. We were talking about whiteboards, literally getting up at a whiteboard in a boardroom, talking through our modeling approaches, and getting feedback and guidance from venture capitalists, which I’ve never had that experience before. It was like going into a classroom and not a venture capital office.

Claudia Perlich: 00:20:10

I’m very happy that you experienced that and that you’re sharing this. And again, from the other side, it’s something that I’ve always enjoyed doing a lot. So I’ve always worked on these projects, whether they are mentorship with companies, or even students. And I think that opportunity really kind of reflects just excitement about good technology. So thanks for sharing.

Jon Krohn: 00:20:38

Yeah, an honor.

Jon Krohn: 00:20:43

Of course, working in finance, a huge amount of what you do can’t be made public on a podcast like this, but I’d love to know, maybe digging into a little bit more for the audience, what your role is like on a daily basis. So you’ve talked about these three different streams of work, but I don’t know if you were to pick. So we’re recording this podcast in the morning, so in the afternoon today what’s your day going to be like, or what is a typical day like?

Claudia Perlich: 00:21:17

Also, today happens to be a day off, because I can only take 15 vacation days over to the next year. But if it were a regular day… I think it might make sense, for those folks who don’t have as much experience with the financial investment process, to put a little bit of structure around that. And the way I like to think about it is, there’s typically the very front end, where you have ideas and hypotheses, from a really scientific perspective, about what makes things work well, like what makes things grow, what could be having positive impact and success. And so you think about, actually, really fundamental economics, what are the drivers? Is a company showing that it’s hiring, that it’s successful at hiring talent?

Claudia Perlich: 00:22:08

So you’re looking at any form of either causes or symptoms that you feel, from an economic perspective, should have a positive or negative impact on the company. That’s kind of the very front end, where people talk about alpha ideas in the hypothesis space related to success. So we’re not talking about modeling returns yet, we’re just talking about understanding, scientific understanding, of how the economic world works.

Claudia Perlich: 00:22:36

And then starting with that, you come up with a number of ways of characterizing entities and companies and all of that. And these are the initial level of signals. There are many different teams at Two Sigma that are thinking about this, and they focus on very different components, they have different specialties. What happens then is it goes into the next stage. So when all of these signals come together, and they have been validated and tested and all of that, there is an optimization stage where, as a company, we need to figure out, well, how much exposure do we want to the Asian market? It might be that they all say it’s great, but how much exposure do we want to take? Right? So there are now these trade-offs, depending on the mandate of a given fund.

Claudia Perlich: 00:23:22

And so in that optimization piece, you’re now restructuring the signals into a goal portfolio. At that point you decide, where do you want to be? And this happens on very different timescales. Some of them are much shorter term, others are much longer term. But there is this optimization stage that then says, “Here’s where I want to be.”
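To make the step from signals to a goal portfolio concrete, here is a toy sketch. It is not Two Sigma's method: the function name, the long-only rule, and the proportional scale-down are all assumptions chosen for illustration. It weights companies by positive signal score and then shrinks any region that exceeds its exposure cap, leaving the remainder uninvested.

```python
def goal_portfolio(signals, region, max_region_exposure):
    """Toy illustration: turn per-company signal scores into target weights.

    signals: dict of company -> score (higher means more attractive)
    region: dict of company -> region name
    max_region_exposure: dict of region -> maximum fraction of the portfolio
    All names and rules here are hypothetical, not taken from the interview.
    """
    # Long-only simplification: only companies with a positive score get weight.
    positive = {c: s for c, s in signals.items() if s > 0}
    total = sum(positive.values())
    if total == 0:
        return {}
    weights = {c: s / total for c, s in positive.items()}

    # Measure how much of the portfolio each region would receive.
    exposure = {}
    for company, weight in weights.items():
        exposure[region[company]] = exposure.get(region[company], 0.0) + weight

    # Scale down every position in a region that exceeds its cap.
    # The freed-up weight is simply left as cash in this sketch.
    for company in weights:
        r = region[company]
        cap = max_region_exposure.get(r, 1.0)
        if exposure[r] > cap:
            weights[company] *= cap / exposure[r]
    return weights


example = goal_portfolio(
    signals={"A": 2.0, "B": 1.0, "C": 1.0, "D": -1.0},
    region={"A": "Asia", "B": "Asia", "C": "US", "D": "US"},
    max_region_exposure={"Asia": 0.5},
)
```

Even when every signal says Asia looks great, the cap keeps the combined Asian weight at half of the portfolio, which is the kind of trade-off the optimization stage has to resolve. A real optimizer would also account for risk, costs, and the fund's mandate.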

Claudia Perlich: 00:23:42

And then the last piece is the execution, which shouldn’t be overlooked, because there are many different ways how you can buy a certain amount of shares in a certain company. And if you’re operating at the scale of Two Sigma, you know that A, people are watching what you’re doing, and B, that you have market impact. So now the strategy of how do I get to that place from where I am.
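As a rough illustration of why execution is its own problem, the sketch below splits one large order into smaller child orders so that no single time bucket takes more than a fixed share of the expected market volume. The function name, the 10% participation limit, and the volume figures are hypothetical, and real execution strategies are far more involved.

```python
def slice_order(total_shares, expected_volume, max_participation_pct=10):
    """Toy illustration: split a parent order into capped child orders.

    total_shares: number of shares to buy in total
    expected_volume: list of expected market volume per time bucket
    max_participation_pct: largest percentage of a bucket's volume we may trade
    Returns (child_orders, unfilled_shares). All parameters are hypothetical.
    """
    remaining = total_shares
    child_orders = []
    for volume in expected_volume:
        # Integer arithmetic avoids floating-point rounding surprises.
        cap = volume * max_participation_pct // 100
        quantity = min(remaining, cap)
        child_orders.append(quantity)
        remaining -= quantity
    return child_orders, remaining


children, unfilled = slice_order(1000, [3000, 5000, 4000])
```

With these numbers the order is worked over three buckets instead of hitting the market at once. If the order is too large for the available volume, the leftover shares are reported as unfilled rather than forced through.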

Claudia Perlich: 00:24:08

And in all of these components, there are different flavors of data science with different types of algorithms being used, some are more in the machine learning region, some are more in the optimization space, some may be more in a kind of reinforcement space. So there’s really applicability for data science across that whole board. And my team has primarily worked on the very early beginning of understanding what kind of signals could be giving us some insight into how the economy works and what makes companies successful. And that piece is also the piece that then is very much applicable to private companies and to startups. So that’s really where the symbiotic piece of why my team is doing all these supposedly different things works, because the fundamental economic drivers really should be the same. And you can often learn something about public companies by understanding private ones and vice versa.

Jon Krohn: 00:25:07

Nice. That was such a beautiful explanation. I was completely entranced that whole time. I love it. I worked for a couple of years after my PhD in trading, and I don’t think I could have possibly explained what it’s like anywhere near the level that you just did, so that was absolutely wonderful. Thank you.

Claudia Perlich: 00:25:32

I do realize that I did not at all answer your question about what I would be doing on a regular day. I guess I’ll answer that [crosstalk 00:25:39].

Jon Krohn: 00:25:39

Yeah, please.

Claudia Perlich: 00:25:46

In the role that I currently have, it’s a very small team. I’ve been trying to resist management responsibilities for a while, but I absolutely love the team that I have right now. Given the circle and size of three folks, I feel that the increase in productivity that I get from working with a team absolutely outweighs the kind of managerial components of it. It feels like a real step forward in my career. Part of my day is I’m connecting to my folks. We have a little stand-up. We used to do it daily, but we’ve come to learn that we know so much about what everybody’s doing anyway. And we have now put it at twice a week.

Claudia Perlich: 00:26:30

I think when we’re in the office, it’s kind of more natural to just grab each other over a coffee and then talk about what we’re doing. It felt a bit artificial, so we’ve scaled it to twice a week. And then typically, I’m picking up whatever I’m most excited about working on. It could be a little component of a piece of work that one of my team members is really focused on. And it could be something like, “Well, our entity matching of company names is still only at 95%, so why not throw it into some kind of embedding or distance learning thing and see if that works?” So I’m picking up almost the most R&D things that often don’t work, but occasionally they do. So if I get to devote two, three hours of my day, I might pick up some embedding and do something that is really useful to one of my team members.

Claudia Perlich: 00:27:28

I might have a meeting where we’re going over a model and trying to understand, well, shouldn’t it truly predict a lot better than it does? So there are kind of the technical sides to it. And other than that, we have a number of kind of ad hoc insight things. So we’ve been writing this newsletter on COVID observations, and it’s really fascinating how you can look at both public data services.

Claudia Perlich: 00:27:56

Part of my day is scouring for relevant other new data sources, whether this is public data or vendors that I hear about that could be of interest. And then in thinking about, well, how can we share this? Actually not for direct financial model, but understanding maybe just the risk landscape of how COVID is changing the economy, and maybe there are some markets where we don’t know where it’s going, but maybe we just don’t want to be in that risk space at the time.

Claudia Perlich: 00:28:24

So I’ve been tracking, for instance, how globally, the movement of goods has changed under COVID, and how different economies kind of there’s a staggered effect that it has on different economies. That also means that there’s a bit of a shift in the market share, because a few are there at the right time when somebody has a lot of demand and some previous very big supplier is out of commission. So they’re interesting dynamic observations. Even if we can’t really predict where this will be in a month or two, it might still be interesting to observe these in terms of the scientific understanding of the economy.

Jon Krohn: 00:29:01

When you say newsletter, that’s a public newsletter?

Claudia Perlich: 00:29:05

That is only public to the Two Sigma environment. So no, it’s not going [crosstalk 00:29:13].

Jon Krohn: 00:29:13

I see. I see. I see.

Claudia Perlich: 00:29:15

But [inaudible 00:29:15] actual company to send an email with insights to everybody in the company, that is quite spectacular in terms of [inaudible 00:29:23] that’s going on there. So yes, it’s not quite a newsletter as such.

Jon Krohn: 00:29:28

I was like, that’s super interesting, I wonder if people would like to have access to it. I’m sure lots of people would like to have access to it, but I’m not going to try to give them a URL to do it.

Claudia Perlich: 00:29:38

A part of the issue is that I think generally we’re very encouraged to be publicly involved. There have been a number of people who have been working on data science COVID insights with the city and other entities externally. There is, of course, the concern. A lot of the data analysis reveals in much detail what data sources it is that this was derived from. And I think that’s where typically, then we are feeling a little less comfortable sharing all that.

Jon Krohn: 00:30:08

I understand. That makes perfect sense. All right. Let’s try to move a little bit away from divulging proprietary information. I’ll ask this question about the kinds of skills and tools that you use. You’ve mentioned now, you’ve discussed what your kind of work is. You describe it as, what you call the front end R&D. So it isn’t like front end development, it isn’t like building a user interface, but it’s where, in your world, the back end is executing the trades. I suppose the front end is coming up with ideas for models that trades could execute on. What kinds of tools do you use to do that?

Claudia Perlich: 00:30:55

There’s a little bit of front ending going on. So we do have a number of dashboards, especially for things like this COVID insights, also a lot of focus on election. We have done a little bit, but that has not been my core focus. I mean, you should not at all be surprised to find the kind of typical stack of data scientists’ bread and butter. Obviously, a lot of work happens in [inaudible 00:31:23] books.

Claudia Perlich: 00:31:24

At this point, I think Python has become a very prevalent modeling environment. I find it fascinating. Some of the tooling is kind of straddling the place between specifically built for the needs of the company, vis a vis then having the kind of open source tools, and kind of the flux in between. And depending on where you’re ending up, as you get to the more Two Sigma specific things, then the support you need and what kind of help on that side is kind of very specific.

Claudia Perlich: 00:31:59

But in broad terms, in terms of tooling, we are utilizing cloud for a number of data storage and even processing. As I said, Python is the primary use case or primary language. You have a lot of stuff going on for some of the more intensive scaling challenges that people deal with.

Claudia Perlich: 00:32:21

I think there’s an open source tool called Ibis, which is used to basically interface with any kind of cloud storage system. So it automatically generates SQL queries from more of a programming language, because many of the coders feel more comfortable expressing their data manipulation in that format, and then have it auto-generate this SQL-

Jon Krohn: 00:32:45

Cool. I didn’t know about that.

Claudia Perlich: 00:32:46

… that you can [inaudible 00:32:48]. So yeah, I think that’s, broadly speaking, where most of, I would say, 95% of what I call front end. Now, truth is we have folks that write code in R. If you’re really good at R, if you want to do your research in R, go for it. It’s not very prescriptive on the ideation side. I think once you get more into productionizing it, and making sure this runs then on the schedule, and so on, I think there’s a lot more consistency and requirement. But on the R&D side, and I’ve written the occasional Perl script, not that that’s something that I would recommend anybody to do. But, again, never mind.
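
The idea of writing data manipulation in a programming language and having SQL generated from it can be illustrated with a toy. The sketch below is deliberately not the Ibis API. The `Query` class and the table and column names are hypothetical, and it only shows the general pattern of building an expression first and compiling it to SQL afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A toy, immutable query builder; NOT the Ibis API, only the same idea."""
    table: str
    columns: tuple[str, ...] = ("*",)
    conditions: tuple[str, ...] = ()

    def select(self, *columns: str) -> "Query":
        """Return a new query restricted to the given columns."""
        return Query(self.table, columns, self.conditions)

    def filter(self, condition: str) -> "Query":
        """Return a new query with one more WHERE condition."""
        return Query(self.table, self.columns, self.conditions + (condition,))

    def to_sql(self) -> str:
        """Compile the accumulated expression into a SQL string."""
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql


# Hypothetical table and column names, chained the way one would chain
# DataFrame-style operations before any SQL reaches the storage backend.
query = Query("companies").select("name", "revenue").filter("revenue > 1000")
print(query.to_sql())
```

A real tool would build typed expressions instead of passing raw condition strings, which is what lets it target several backends and avoid malformed or unsafe SQL.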

Jon Krohn: 00:33:28

One thing, if I recall correctly, I mean, I don’t know if people would find this interesting or not, but most organizations, they’re a Windows organization, or if you’re a tech startup, you’ll often be a Mac focused kind of organization. But if I recall correctly, everybody’s working on Ubuntu. Is that right? Am I misremembering that? I thought for some reason.

Claudia Perlich: 00:33:59

Let me tell you what I work on, and then you can draw conclusions from there. When I joined Two Sigma, we were just in the process of allowing people to work on laptops. And that was, from a security perspective, significant concerns, because now you’re moving with this. You could be sitting in a coffee shop, so the concerns of security around the potential of having proprietary or confidential information. And by the way, this is not just because we want to be secretive. I mean, there’s a lot of regulation around what we should and shouldn’t be doing. Right? It serves more than just a, let’s kind of contain our ideas.

Claudia Perlich: 00:34:45

And so at that point, we now started being able to work on Windows-based machines, laptops that we could [crosstalk 00:34:55].

Jon Krohn: 00:34:55

I see.

Claudia Perlich: 00:34:55

The venture team is on MacBooks. But most of Two Sigma embraced laptops. And the timing was really excellent, because by the time remote work became a necessity, pretty much everybody had a laptop. Otherwise, people at home maybe had a workstation that mirrored the workstation they had. They were also Windows-based, by the way. You would still log into a Windows system.

Claudia Perlich: 00:35:25

The actual compute, a lot of that happens on cloud machines that you log into remotely. And whether you now just go in kind of behind and open up a notebook or whether you actually go in there with a full X terminal, kind of a full view, I think that’s a matter of taste. So yes, most of the compute is clearly on Linux. And I have no idea whether it’s Ubuntu or not.

Jon Krohn: 00:35:52

It’s obviously a mis-memory of everybody working on Ubuntu.

Claudia Perlich: 00:35:59

As far as I’m concerned, the machine that I’m carrying around physically in my hand, that’s [inaudible 00:36:04].

Jon Krohn: 00:36:04

Well, yeah. So it ended up being an interesting story. I hope you won’t mind me saying that using Windows is a bit less interesting of a choice for me to be asking a question about. That’s great.

Jon Krohn: 00:36:23

Now, we’ve talked about what you do on a day to day basis at Two Sigma, in a broad way, the way that you approach financial markets. We’ve talked about the tools that you use. When you’re looking to hire, Two Sigma’s hiring the most talented people on the planet. What do you look for in terms of hard skills or soft skills?

Claudia Perlich: 00:36:50

I want to give you kind of a broader sense. I think there is Two Sigma hiring at large, and then there is the more specific need that I feel I have for my team. There’s a little bit of a difference. There’s the constant balance between good financial understanding, and being able to have hypotheses about economic factors and drivers. And on the other side is diversity of thought, so that when you do get to a whiteboard, you really have somebody who can challenge you and can bring very different kind of orthogonal thoughts.

Claudia Perlich: 00:37:32

If we’re all thinking the same thing, then that signal really doesn’t give you any way of managing risk or anything. So there is very strong emphasis in getting diversity of thought and hypotheses. And we’re measuring how kind of different your ideas are once we get to a point where we’re having signals.

Claudia Perlich: 00:37:52

In that search, there’s this balance between how much background and what kind of background do you require in these different areas? One thing for sure, we are a technology company. I think once you are on the modeling side, you have to be able to code on some level and be able to do data analysis. In the more executive curve, if you’re looking for advice on some separate roles, that’s not like once you’re reaching a certain level, but let’s talk about normal hiring in terms of skill set. Compute is definitely necessary.

Claudia Perlich: 00:38:32

In terms of background, I personally have really tried to find the non usual suspects. So people with background in library science, or your pure econ. I mean, Drew himself came from social sciences. I personally have found kind of that mixture to be really interesting and make for great conversations and great innovation. At the current moment, I feel that I’ve gone so far out of my way with that strategy that I probably should hire somebody who knows more about finance than I do.

Claudia Perlich: 00:39:14

In terms of really then connecting your hypothesis to how it would affect markets, that piece, it’s really you’re looking for these missing building blocks. And so when you are talking to the HR team, I’m trying to describe to them as clearly as I can what I’m looking for. And then often we’re discussing, “Well, where do you think these kind of people are working right now, and what kind of background should we be looking for?” It’s actually a very closely involved relationship that we have with our HR teams, as we’re describing who we want. And then even, “Yeah, I really want these type of people on my lineup to talk to that person.” Because, actually, I can get help. If the Python programming isn’t really up to par yet fully, I can get somebody to mentor that. That’s something we can absorb right now. But we’re looking for that skill to complete our team’s skill set, ultimately.

Claudia Perlich: 00:40:14

It’s not like cookie cutter, you go out there and then they’re coming in. There is a good amount of involvement and thoughtfulness in terms of what skills would complement the team that we have.

Jon Krohn: 00:40:24

Nice. That’s a great answer, as all your answers have been. And so kind of following on from that, either at Two Sigma or even just in the broader data science or machine learning market, where do you think the field is going? What kinds of skills should people be getting now or in the years to come to be as hireable as possible? It could be at a financial services firm, at a hedge fund, or even just more generally.

Claudia Perlich: 00:40:54

I love that you asked the question. And I’ve been struggling with a good answer to that question as well, because I, as you know, I am currently teaching as an adjunct at NYU, in the part-time MBA program, and you’re always one of the absolute favorites when it comes to [crosstalk 00:41:15]. Oh yeah, they love you.

Jon Krohn: 00:41:14

Oh yeah, I had the honor of doing a guest lecture in Claudia’s course at NYU Stern. Yeah, it’s so much fun to give. I miss being able to do it in person, and I can’t wait to be able to do that again. From the last time that I came in and did it in person at NYU, there was half a dozen people who I still see popping up regularly on LinkedIn. And it’s so easy to remember, “Oh, yeah, that was a person I met that night.” Well, thank you very much for saying that. But yeah.

Claudia Perlich: 00:41:46

Absolutely. So the moment that’s back in person, you’ll be back in person.

Jon Krohn: 00:41:49

Great.

Claudia Perlich: 00:41:53

And so what’s interesting about that gang is they are not what you would understand to be a regular data scientist. They’re in an MBA program. And they are really looking to understand more on the conceptual level. But what are the opportunities? Where’s this technology really valuable? What are the fundamental limitations? It’s slightly different than the question you asked.

Claudia Perlich: 00:42:16

One of the things I have found to be difficult is to assess a certain level of real data intuition. Let me try to quantify this somewhat. I think there was recently a blog, I’m happy to share that, that I found kind of pretty spot on. There’s the ability to look at an analysis or a data set and say, “You know what, it just doesn’t quite smell right. The performance is too good in some ways.” Or there are certain ways that algorithms behave, like the relationship between the performance of the logistic regression and the tree just doesn’t make sense. They shouldn’t be that different. Or they shouldn’t be that similar.

Claudia Perlich: 00:43:01

So there is, on the machine learning side, but also just on the sheer data analysis, a level of skepticism that you look at data and say, “Well, this might be data, but it’s probably very far from the truth.” And that curiosity to understand what might be going on. And I’ve been talking to one of my team members, and I tried to express that. It comes from a friend of mine. Before you do any analysis, I want you to have an expectation as to what to find.

Jon Krohn: 00:43:35

Classic scientist behavior that scientists I think seldom actually have.

Claudia Perlich: 00:43:41

First, think about your problem, think about for any answer that you’re looking for first, what do you expect it to be? And then if it’s not that, you have to figure out why. And there are two possibilities. Either you learn something about the world that you didn’t know, and you are better thereof, or you figure out that something is wrong with your analysis or the data. It becomes both an opportunity to grow and a kind of guardrail for analysis. I have struggled even with my own teaching efforts, kind of, how do I convey that? How do I get people into the habit of doing that?

Claudia Perlich: 00:44:26

This is something where I think on the educational side, as much as I love the ability to take online courses and to learn a lot about the methods and the math and all of that kind of exciting stuff. When it comes to thinking deeply about what your data should be, that it’s a data set, that it’s not a benchmark data set where the only goal is to beat the current or get within 5% of the current performance. And then being able to say, “Look, this is really not likely to be right.” And the data set, you often end up having to significantly refactor it, throw half of it away. So that process, in terms of understanding data, as well as on the early piece, it’s very easy, “Oh, I just need to build a model to predict default.” Well, maybe, but do we really need to look at all of them? Or are some of these default pieces really not appropriate?

Claudia Perlich: 00:45:26

I recently did a project in the medical space on kind of a side effort where we were looking at hospital readmissions. I just don’t understand enough about medicine. And it’s not just about building a model that predicts whether the person shows up again in hospital, you need to actually understand how the process works and why people go to rehab. And if they go to rehab, it’s coded as a hospital, but it doesn’t mean that they got readmitted. So it’s kind of simple stuff of that sort, that way of communication skills and the ability to reconnect the fun modeling piece back to, what are we trying to achieve here?

Claudia Perlich: 00:46:08

I have found that part to be the hardest to evaluate in candidates, and also the hardest for candidates to do well on. We find a lot of candidates that are constrained and that get 100% on the coding test. But they seem to be entirely uncomfortable with questioning real life data sets that are not kind of prepared.

Claudia Perlich: 00:46:35

And of course, just telling somebody, “Oh, you need 10 years of experience,” that is not a fair answer. I think the real question is, how do we help educating those kind of skills in the process? And how do we build a better sense of evaluating them?

Jon Krohn: 00:46:48

That’s a beautiful answer, and it brings me to several other questions and comments about you and your career. First of all, I think that that is an amazing answer to the question, what do data scientists need to be doing in looking forward in the future? Because this is absolutely something that’s not going to go away. It’s not like there’s going to be some software library, some Python library that comes out that lets you know that these data are reliable and everything’s fine. It’s always going to be a problem.

Jon Krohn: 00:47:18

I think that that is a focus of your course at NYU. I think that’s something that you talk about a lot, is where things can go wrong. And so first, I want to make the comment that if you ever have the opportunity to see Claudia give a lecture, they are always exceptional. And a lot of them hinge around this idea of data not being as you expected.

Jon Krohn: 00:47:40

I remember a talk from a couple of years ago that you gave called All Of The Data But Still Not Enough. And that was a great one. Claudia gives lectures kind of like the way Malcolm Gladwell writes books, where you think you know what’s going on and you’re understanding the problem and you’re so smart, and then the next slide comes up and you realize that everything is the opposite of what it appeared. And you’ve been lulled into a false sense of security around your own intellect. There’s a recommendation for you. But then second, this also leads me to a question. So you have a very storied career as a data science competition winner. I mean, you could describe it to me.

Claudia Perlich: 00:48:39

I’ll do.

Jon Krohn: 00:48:41

Please go ahead. Tell us about your history with data science competitions, and then maybe you had a tip or two, I suspect they’re going to be related to data quality, around how to win competitions.

Claudia Perlich: 00:48:55

A story about my competitive experience in data science. I think the first time I stumbled over a competition was long before this was kind of broadly a publicized effort. You may know of a community called KDD.

Jon Krohn: 00:49:15

I do.

Claudia Perlich: 00:49:15

It’s an annual conference. Knowledge Discovery and Data Mining was the original term, I think it’s 25 years ago, was the first workshop that then became its own ACM conference. I think it was in maybe ’98. At some point, for the first time, somebody posed a competition in that context, where pretty much the rules of the game, as we know it today, were set up.

Claudia Perlich: 00:49:47

There’s a training set that everybody gets. There’s a test set where the target variable is missing. Everybody goes off, does their thing. You can write it down on a piece of paper with a pencil if you want and come to the solution, or you can actually use deep learning. I don’t care. Tooling is whatever you’re most comfortable with.

Claudia Perlich: 00:50:04

At the end of the day, you submit your answer. It’s very clear how you’re going to be measured. And then a winner is pronounced. I think this idea actually goes back. There was a Santa Fe time series prediction competition, where one of the tasks was to finish an unfinished piece of music using computers. So if you’re interested, that was run by [inaudible 00:50:27].

Claudia Perlich: 00:50:29

But so in this way, there have been these competitions long before Kaggle was, well, it even existed. And so KDD has been running this ever since the late 90s. And so at IBM, I got involved in the years 2007 through nine, being a participant. And our team happened to win those three years in a row.

Jon Krohn: 00:50:57

I like that, happened to.

Claudia Perlich: 00:50:58

There’s some [crosstalk 00:51:01] involved. So at the time, I think one was about Netflix data. You probably know the Netflix prize. It was the same data set but on a different task. There was one in the medical space on breast cancer detection. And there was a third one, I think it was also in the… That was in this telecommunication space [inaudible 00:51:25].

Claudia Perlich: 00:51:26

The piece that you are alluding to, and that’s kind of the story I’d like to tell about this. I won pretty much every single one of them not by having the best algorithm, or the fanciest algorithm. But pretty much every single time I found something that was wrong in the data, that ultimately you could exploit in order to get a better predictive performance. And while that is a little bit mischievous here, I think it’s an exceptionally good learning experience.

Claudia Perlich: 00:52:00

Obviously, in the real world, you’re not interested in coming up with fake performance on a model that’s supposed to predict breast cancer. But the ability to spot that this data set has a certain underlying structure where in the end the patient ID is predictive. And that goes back to the expectations, it should not be. If you come to me and you tell me that, something that shouldn’t be predictive is, I think that’s the starting point for a great journey into the guts of a data set. And you might find that ultimately, my assessment in that data set would be, it is not viable to build a model that should ever be used, period.
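
One simple way to act on that expectation is to check directly whether an identifier that should carry no signal does. This is an illustrative sketch on synthetic data, not the analysis from the competition. The function name `positive_rate_by_bucket` and the ID layout are assumptions.

```python
from collections import defaultdict


def positive_rate_by_bucket(ids, labels, bucket_size):
    """Group records by ID range and return the share of positive labels in each."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record_id, label in zip(ids, labels):
        bucket = record_id // bucket_size
        totals[bucket] += 1
        positives[bucket] += label
    return {bucket: positives[bucket] / totals[bucket] for bucket in sorted(totals)}


# Synthetic, hypothetical data: IDs 0-99 are 50% positive, IDs 100-199 are
# 10% positive, as could happen if two sources were concatenated.
ids = list(range(200))
labels = [1 if (i < 100 and i % 2 == 0) or (i >= 100 and i % 10 == 0) else 0 for i in ids]

rates = positive_rate_by_bucket(ids, labels, bucket_size=100)
print(rates)  # {0: 0.5, 1: 0.1}
```

If the rates differ this much across ID ranges, the ID is encoding something about how the data set was assembled, and that is the thread to pull on before trusting any model.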

Claudia Perlich: 00:52:43

And you can listen to that story and why that happened the way it did. The same was true for the other examples where we were able to take advantage of statistical properties of the data set being disclosed by the way the test was sampled. Now, in practice, you shouldn’t be doing this, but you learn a lot about thinking about the data generating process.

Claudia Perlich: 00:53:10

In the last case, it became quite obvious that somebody had deliberately replaced missing values with zeros, which is one thing that I tell everybody do not ever do. And it turns out that the missing-ness was highly predictive, so you had to basically backwards engineer what had been done to the data set in the pre processing to reestablish missing-ness, and then you got a very well performing model in the process.
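
A minimal sketch of the missingness point, on synthetic data: it assumes that zero is the fill value and that a true zero never occurs in this feature. In the real case, working out which zeros were fills was the hard part. The names `add_missingness_flag` and `positive_rate` are hypothetical.

```python
def add_missingness_flag(values, sentinel=0.0):
    """Return (value, was_missing) pairs, treating the sentinel as 'likely missing'."""
    return [(v, 1 if v == sentinel else 0) for v in values]


def positive_rate(labels):
    """Share of positive labels, or NaN for an empty group."""
    return sum(labels) / len(labels) if labels else float("nan")


# Synthetic, hypothetical example: a feature where zeros stand in for missing
# values, and where being missing happens to go along with the positive class.
feature = [0.0, 3.2, 0.0, 1.5, 0.0, 2.7, 4.1, 0.0]
target = [1, 0, 1, 0, 1, 0, 0, 0]

flagged = add_missingness_flag(feature)
rate_missing = positive_rate([t for (_, m), t in zip(flagged, target) if m == 1])
rate_present = positive_rate([t for (_, m), t in zip(flagged, target) if m == 0])
print(rate_missing, rate_present)  # 0.75 0.0
```

Keeping the flag as its own feature preserves the signal that the zero fill would otherwise blur into the numeric values.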

Claudia Perlich: 00:53:38

When you win this thing three times in a row, there’s really nowhere to go. So I retired and instead started to run data competitions myself. I knew the early founders of Kaggle, and I became involved in running the KDD Cup and running a couple of other competitions at INFORMS, and still a couple of data hackathons afterwards.

Claudia Perlich: 00:54:05

So to your question, how do you win competitions? You better have a lot of time on your hands. I’m very glad that I’m no longer competing, because winning a Kaggle competition in today’s day and age, with that level of kind of quality participants is hard. I don’t think I would have a shot to get anywhere near.

Claudia Perlich: 00:54:28

Secondly, I think people have learned a lot about how to better clean up datasets. I think we have kind of a real process now in terms of doing much more diligence in the data sets that are being published there. But in the early Kaggle competitions, you can go back and look at the discussion forums. And I talked to some of the master participants there, you still had exactly the same problem, that almost every data set had some form of what we call Leakage, meaning there is something that is sorted that shouldn’t be. And because it’s sorted, you can find out where the holes are, these must be the pieces of the data that were pulled out for the test set.
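
The "holes in a sorted file" form of leakage can be sketched in a few lines. This assumes integer record IDs that were consecutive before the test rows were removed. The IDs and the helper name `find_gaps` are hypothetical.

```python
def find_gaps(sorted_ids):
    """Return the IDs missing between the smallest and largest ID seen."""
    present = set(sorted_ids)
    return [i for i in range(sorted_ids[0], sorted_ids[-1] + 1) if i not in present]


# Hypothetical training IDs from a file that was sorted before rows were
# held out for the test set.
train_ids = [100, 101, 103, 104, 105, 108, 109]
print(find_gaps(train_ids))  # [102, 106, 107]
```

If the original sort order was related to the target, the training rows on either side of each gap say something about the held-out rows, which is the kind of signal a model should never have access to in practice.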

Claudia Perlich: 00:55:11

So things of that sort were extremely pervasive. And we, again, might talk about, “Well, is this really data science? Is that what you should be spending your time on?” Well, I hate to break it to you, that’s every single data set I’ve ever touched in industry. And if you can’t find these things, then your models are only as bad as whatever went wrong to your data set in the first place.

Claudia Perlich: 00:55:34

And so developing that skill set maybe in an environment where you can read up what other people have found and understand better how to even think about these processes. So going back through old Kaggle competitions and just scouring for, what are these insights about data that help you become a better data scientist and develop a process and an intuition for what to do with new data sets? I think that’s really, really valuable.

Claudia Perlich: 00:56:00

And as I said, I’m not sure I can compete today. Part of it is because I’m not nearly as well versed in some of the really exciting new technologies around deep learning and kind of old style model. So I do not want to reduce developments of the algorithms that have come an incredibly long way since I left that playing field. You have to be a good machine learning practitioner. That’s just the data piece that you can still probably gain a lot of skills.

Jon Krohn: 00:56:36

Nice. That was a beautiful answer. So informative. Thank you. Did you know that I once won a data science competition that you created the questions for?

Claudia Perlich: 00:56:47

Which one was that?

Jon Krohn: 00:56:48

It was in San Sebastian, the I-COM Global Summit.

Claudia Perlich: 00:56:52

Oh, good.

Jon Krohn: 00:56:57

That’s the opposite of what you’re describing, where on Kaggle you’re competing against tens of thousands of people. I was on a team, first of all, and I was competing against three other teams. So it wasn’t a huge field. We won by an extremely narrow margin. On my team of four people, I was not the reason why we won first there. In fact, specifically, I remember we were in this beautiful hotel in San Sebastian, there was the professional… Oh, what’s the name of the professional football soccer team there?

Claudia Perlich: 00:57:35

Now you’re definitely getting into a no interest [crosstalk 00:57:37].

Jon Krohn: 00:57:39

Their head coach was living in the hotel. I would wake up very late and miss the beginning of the conference. And so it’d just be me and him having breakfast in the morning. He used to coach Manchester United. Anyway, I’m getting-

Claudia Perlich: 00:57:59

So San Sebastian was the I-COM that I didn’t go to. You were there, Brian D’Alessandro and I were running the competition, but I was not actually in San Sebastian.

Jon Krohn: 00:58:10

Exactly, you weren’t there, but your competition was. You’d come up with the dataset and the question. Anyway, I went to bed the night before the results were due. I think we were being tested in the morning. So we were there for several days as this hackathon team, and so we’d go off and we’d work on this problem for… It was maybe 48 hours or 24 hour hackathon.

Jon Krohn: 00:58:37

But the night before results were due, I went to bed before anyone else on the team. And when I woke up in the morning, they had a much better model score. They’d kind of stayed up all night working on it while I slept. And then because they were also exhausted, after we won, I had the privilege of being the person that presented on our findings [crosstalk 00:58:56].

Claudia Perlich: 00:59:00

You make for a very good speaker. I think that competition required also presentation skills. I think that was an optimal choice.

Jon Krohn: 00:59:07

And this is why I’m in data science management these days. I get to talk about other people’s hard work.

Jon Krohn: 00:59:16

So yeah, and you talked about KDD. I think also something that might be of interest to listeners is, there’s a very popular newsletter website, the KDnuggets, I think it is?

Claudia Perlich: 00:59:27

KDnuggets, yes, by Piatetsky-Shapiro.

Jon Krohn: 00:59:32

Yeah.

Claudia Perlich: 00:59:34

It has a little bit of that blend of the old style kind of academic focus. It’s a little less industrial, and has a lot of structure around academic events. I found it to be a really interesting source of some of the less flashy updates on technology. Yeah, I encourage everybody to check it out.

Jon Krohn: 00:59:58

Nice. So another great practical tip for our data scientists out there. I’ve now let you talk about everything about today and the future, but I haven’t let you talk about your past at all. So maybe now it’d be a good time to kind of let the audience know how you’ve ended up where you are today. With the level of success that you’ve had in your career at anything that you’ve done, it’s interesting to me that now you have made the choice to work in finance, which is just in the last two years that that’s come about. I guess give us a little bit on your background, but specifically kind of answering the question, given that you could probably work anywhere you wanted, or you’d certainly have a very good shot at working anywhere you wanted, why work in finance, why work at Two Sigma?

Claudia Perlich: 01:00:58

It’s an excellent question. And I’ve pondered. Every time a student asked me for career advice, I try to reflect a little bit of how I got to where I got to. I have concluded that the secret to my success is to find good bosses to work for.

Claudia Perlich: 01:01:24

There is a certain theme that I personally never had a very strong sense, oh, yes… Well, okay, at some point I wanted to be a veterinarian, but that’s long time ago. But I never had very clear goals for my career. I felt like I really enjoy what I’m doing and I do what I enjoy. The one thing I realized about myself, I really like solving other people’s problems. I don’t like to “find my own problems to solve necessarily”, and so I always felt strongly that having a person to work for who I deeply respect and who I rely on for “making judgment calls”, at least initially in my career, about this being a worthwhile problem, and then me getting to think about it and learn something new.

Claudia Perlich: 01:02:20

And so my career, starting out with a great advisor, Foster Provost, who wrote the book Data Science for Business that I’m still using, who left me a lot of freedom to do what I want, to go to IBM Research Watson to work for Rick Lawrence, who was a very similar personality type, where you stand to gain so much in terms of wisdom and mentorship, both of the technical but also in understanding how organizations work and what kind of projects are successful and what ultimately matters.

Claudia Perlich: 01:02:54

To then work at a startup, not because I really wanted to get the startup taste, but because I felt a very similar sense of mentorship into now more of the venture world and how that piece really came together under Tom Phillips. So it’s really weird. But I feel like I’ve picked the people I wanted to work for much more so than I pick what to work on specifically. And something very similar happened as I started to talk to Two Sigma. I met Alfred Spector again, who I knew from the IBM days, where he was briefly head of research there. And after being at Google, he went to Two Sigma as their CTO.

Claudia Perlich: 01:03:42

I always had a very similar impression of the ability to learn a lot from somebody who has kind of made the journey that they made. And the other person I met there was Ali-Milan Nekmouche, who I’m now reporting to, who has 15 years of experience and a real depth of thought and excitement about why we’re doing what we’re doing. That’s really what I have felt.

Claudia Perlich: 01:04:08

If you get to work with people like those, you learn a lot. You get the opportunity to make a career for yourself. And that’s the piece of mentorship. When people ask about PhDs, don’t pick a topic, pick an advisor. Pick an advisor who cares about you, about your growth, and not necessarily just kind of the next step in the kind of publication phase. I think the same is true.

Claudia Perlich: 01:04:34

I’ve interviewed at a number of companies, I’ve been offered great positions as head of AI here, there or other ways. And the one factor that always drove my decision, “Would I like to work for that person, or with that person?” If the answer is no, then that was the answer.

Claudia Perlich: 01:04:53

It might be a very kind of personal attitude. I had the benefit of being able to choose what to work for. At some point, you can make maybe those choices, so it’s a bonus that may not work for everybody. But that has been my experience. And that’s how I found my way into finance. They’re great problems to work on. There’s a lot of interesting stuff to do. And they’re great people to work for-

Jon Krohn: 01:05:21

And lots of data.

Claudia Perlich: 01:05:23

And lots of data.

Jon Krohn: 01:05:25

Wonderful. I couldn’t agree more with that advice. I don’t know if we’ve talked about this before, but I imagine we haven’t.

Claudia Perlich: 01:05:35

We may not have.

Jon Krohn: 01:05:36

It is the same thing I’ve done in my career. Same thing with my PhD topic: I wasn’t particularly interested in that PhD topic before, but I met the supervisor and I was like, “Wow, I would love to work with that guy. What’s he doing?” And then the same kind of thing has happened since my PhD. I meet people that I’m like, “Wow, you are interesting. I love the problems you’re solving. I really admire your character. I don’t really care what you’re working on, but I’ll work on it alongside you.” Amazing.

Jon Krohn: 01:06:15

This has been such valuable advice. Thank you so much. I just have one last question to ask you, which is, what are you reading right now? Or what do you recommend for listeners to read themselves?

Claudia Perlich: 01:06:32

I have been making a little bit of a detour. I’ve come back to the classics. So let me tell you where the story starts. A good friend of mine, Sinan Aral, who works at MIT and has published extensively, but specifically a book, The Hype Machine, that is focused on how the algorithms that we’re using in machine learning interface with this scale of social media. And there’s a lot of depth we can go into there. I absolutely recommend it, if people are interested in that. But what it brought me back to is the original thoughts around memes.

Claudia Perlich: 01:07:10

I went to kind of find the source. I had read Richard Dawkins’ The Selfish Gene maybe 30 years ago. And I found it absolutely… It totally changed the way I thought about a lot of these processes. And in the very last chapter of a book that was published in the last century, he poses that memes and ideas have very similar evolutionary patterns to individuals and living beings. And that the evolution and how they gain ground in the minds of folks is really well understood as you think about them as living organisms.

Claudia Perlich: 01:07:54

And that really came back to life in the current discussion we had about the hype machine. I would absolutely recommend everybody kind of go back and pick up that classic from Richard Dawkins, and the interconnectivity of how things work that is similar to what happens in biology to what we are now desperately trying to understand in terms of these pieces of information that are recommended by algorithms, and then get shared by our participation in that system, and how it evolves and how it shapes our understanding of the world.

Jon Krohn: 01:08:30

Wonderful. Yes, and such a classic, as I mentioned before the show, it’s a book that I haven’t read, but I’m aware of a lot of the philosophy in it, or some of the philosophy in it. I end up thinking about memes and meme transmission and evolution as a part of my daily life, I think so. Definitely a book I should read, and I’m sure our listeners would gain a lot from tackling it as well.

Jon Krohn: 01:08:58

All right, that’s it. Thank you so much, Claudia. How should our listeners contact you or follow you or find you? Do you use social media?

Claudia Perlich: 01:09:11

Oh, I’m not good with social media. You can follow me, just don’t expect much to happen. How can you reach me? Absolutely connect with me on LinkedIn. Write a little something, because I get too many requests from people I don’t know. That gives me a sense. Just tell me where you came across me, and then I’d be absolutely happy to respond.

Claudia Perlich: 01:09:39

The truth is I’m very easy to find. I think with about 20 seconds of Google, you have both my personal email and my phone number. Stick to the email, but feel free to use that. So yeah, I’m welcoming people to reach out to me via email as well. My primary kind of social environment is LinkedIn. I’m not using it very liberally, but that has been always the best way to catch up with me.

Jon Krohn: 01:10:08

Nice. Yeah, that has been the trend for me as well as anybody that I’ve had as a guest on a podcast, LinkedIn for data scientists seems to be the platform of choice.

Jon Krohn: 01:10:21

Wonderful. Well, thank you so much, Claudia. And I do highly recommend, again, if you ever have the chance to see Claudia speak, go do it, you will not regret it. She is one of the best on the planet. Thank you so much, and I hopefully will have you again on the program soon.

Claudia Perlich: 01:10:40

Well, we should switch roles next time and I will ask you to go into depth on what you do. You should put this on your podcast, seriously. Has Kirill ever actually had you as a guest on this podcast?

Jon Krohn: 01:10:51

Yeah, episode 365. You can check that out.

Claudia Perlich: 01:10:54

When was that?

Jon Krohn: 01:10:56

It was in 2020. Maybe spring of 2020. But yeah, so listeners can check out Episode 365 if they want to have the experience of what it would be like to have me be interviewed by Kirill, the long-standing host of the SuperDataScience podcast.

Claudia Perlich: 01:11:15

Yeah. I feel like we could have just as much fun if I got to interview you. And maybe it’s something I should be doing at some point.

Jon Krohn: 01:11:23

Well, that sounds great. I would always say yes to any invite from you, Claudia. Thank you.

Claudia Perlich: 01:11:28

All right, so thank you again. Great to have me here. I appreciate that. Very much so. Great conversation. I will catch up on some of the podcasts that I definitely should be listening to. Absolutely fun to see you. You ping up on my LinkedIn and Twitters and what have you, with all the great content that you bring out there, so congratulations to all the stuff you do. It’s been great to kind of catch up with you and kind of follow you from the distance. And next time, we can meet in Bryant Park, let’s do that and have a drink.

Jon Krohn: 01:12:03

Nice. That sounds great. All right. Thank you so much, Claudia and catch you again soon.

Jon Krohn: 01:12:14

Wow. Claudia is brilliant, isn’t she? She has such thoughtful and well spoken responses to every question. You’d think I provided her with the questions in advance, but she was responding off the cuff and we had no retakes whatsoever.

Jon Krohn: 01:12:29

Well, I learned a lot about the leading role data, stats and machine learning play in trading algorithms. I also greatly appreciated the emphasis Claudia has on studying data for irregular or unexpected patterns, traits she looks for in the scientists she hires, as well as traits that enabled her to be a repeat champion in global data science competitions.

Jon Krohn: 01:12:53

As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, and URLs to Claudia Perlich’s LinkedIn, as well as my own LinkedIn and Twitter profiles at www.SuperDataScience.com/437. That’s SuperDataScience.com/437.

Jon Krohn: 01:13:18

If you enjoyed this episode, kindly leave a review on your favorite podcasting app or on YouTube, where you can enjoy a high fidelity video version of today’s program. It sure is nice to put smiling faces to the laughs. I also encourage you to tag me in a post on LinkedIn or Twitter to let me know your thoughts on this episode. I’d love to respond to your thoughts in public.

Jon Krohn: 01:13:39

All right, it’s been so much fun. Thank you for listening. Looking forward to enjoying another round of the SuperDataScience podcast with you very soon.
