SDS 607: Inferring Causality

Podcast Guest: Jennifer Hill

September 6, 2022

Dr. Jennifer Hill, Professor of Applied Statistics at New York University, joins Jon Krohn this week for an episode that covers the role of causality in data science applications, her favorite Bayesian and ML tools, and how to design research to infer causality from the results confidently.

Thanks to our sponsors, Pachyderm and Zencastr.
About Jennifer Hill
Jennifer Hill develops and evaluates methods to help answer causal questions vital to policy research and scientific development, particularly in situations where it is difficult or impossible to perform randomized experiments. Her most influential methodological work focuses on Bayesian machine learning methods that allow for flexible estimation of causal models and strategies for exploring the impact of violations of assumptions. She is currently developing software that makes these sophisticated tools more broadly accessible and teaches the user about the foundations of causal inference during the process of model fitting. Hill has published in a wide variety of leading journals and won several teaching and research awards. She is Professor of Applied Statistics and currently serves as the Director of the PRIISM Center and Co-Director of the MS degree in Applied Stats (A3SR) at New York University.
Overview
Jennifer first explores the concept of causal questions through the lens of daily life. Whether you’re deciding on your wardrobe or choosing a restaurant, all humans are faced with causal questions every day. But when one must decide on behalf of many, or within a scientific context, the stakes are raised. Essentially, causal questions assess whether the world would change if individuals were exposed to one state of the world versus another.
The vast majority of data science tools are correlation-based and have no internal capacity to infer causality. This leaves causality in the hands of practitioners and forces them to address this critical step for themselves. Given that causal thinking can prevent data scientists from drawing problematic conclusions, how do we encourage them to start thinking causally? As Jennifer explains, there is no easy answer. But one way to be extremely confident in results is to master experimental design and ensure that a randomized controlled trial is implemented.
Both Jon and Jennifer warn that when using plug-and-play causal tools, it’s essential to understand the underlying data and all the assumptions that go into it. From another perspective, however, Jennifer also emphasizes the importance of “deeply engaging with the subject matter, and researchers in the field, policymakers, and understanding the assumptions and communicating them better.”
Since 2008, Jennifer has been championing a causal tool called BART (Bayesian Additive Regression Trees), which is a model fitter that differs from others because it’s embedded in a Bayesian framework. Benefits of this include its ability to get coherent uncertainty estimates and avoid overfitting.
If you’re interested in learning more about causal inference, Jennifer recommends a roadmap for eager students. First, she suggests enrolling in a research methods class that deals with human subjects. Secondly, a measurement course provides a better understanding of the meaning and provenance behind your data. Lastly, studying experimental design will help you round out a solid foundation of skills in causality.
Tune in to learn more about BART and Jennifer’s new zero-code graphical user interface for making causal inferences.
In this episode you will learn:  
  • How causality is central to all applications of data science [4:32]
  • How correlation does not imply causation [11:12]
  • What a counterfactual is and how to design research to confidently infer causality from the results [21:18]
  • Jennifer’s favorite Bayesian and ML tools for making causal inferences within code [29:14]
  • Jennifer’s new graphical user interface for making causal inferences without the need to write code [38:41]
  • Tips on learning more about causal inference [43:27]
  • Why multilevel models are useful [49:21]
 

Podcast Transcript

Jon Krohn:

This is episode number 607 with Dr. Jennifer Hill, Professor of Applied Statistics at New York University. Today’s episode is brought to you by Pachyderm, the leader in data versioning in MLOps pipelines and by Zencastr, the easiest way to make high-quality podcasts. 
 
Welcome to The SuperDataScience Podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today, and now, let’s make the complex simple.
Welcome back to The SuperDataScience Podcast. I am over the moon to be joined today by one of my personal data science idols, Professor Jennifer Hill. Jennifer is Professor of Applied Statistics at New York University, where she researches causality and practical applications of causal research, such as those that are vital to scientific development and government policies. She co-directs the NYU master’s in applied statistics and she directs PRIISM, a center focused on impactful social applications of data science. With the renowned statistician Andrew Gelman, she wrote the book, Data Analysis Using Regression and Multilevel/Hierarchical Models, an iconic textbook that has been cited over 15,000 times. She holds a PhD in statistics from Harvard University.
Today’s episode largely contains content that will be of interest to anyone who’s keen to better understand the critical concept of causality, but it also contains parts that will appeal primarily to practicing data scientists who will be implementing causal models in practice. In this episode, Jennifer details how causality is central to all applications of data science, how correlation does not imply causation, how to design research in order to confidently infer causality from the results, her favorite Bayesian and machine learning tools for making causal inferences within code, and her new graphical user interface for making causal inferences without the need to write any code. All right. You ready for this major episode? Let’s go.
Professor Hill, welcome to The SuperDataScience Podcast. It is such a personal treat for me to have you on the show. So, I have been an enormous fan of your book, Data Analysis Using Regression and Multilevel/Hierarchical Models, since it first came out. So, it came out in 2006, I started my PhD in 2007, and it was the first stats book that I fell in love with. I think I worked through the entire thing. I learned so much, not only about stats, not only about regression and multilevel/hierarchical models, but at that time, it was also largely an introduction to R for me. So, I’d mostly been working in C++ and MATLAB before then. So, I’ve been revering you for years and it blows my mind that I get to spend time asking you any questions I want on air. So, thank you so much for taking the time to do this. Where in the world are you calling in from? 
Jennifer Hill:
Well, first of all, that was a very kind introduction. So, I really appreciate that. Writing books is really not fun and- 
Jon Krohn:
Agreed. 
Jennifer Hill:
… it’s great. It’s totally worth it when you actually get positive feedback. So, it’s really lovely to hear. I am in Northwestern Connecticut. I’m kind of close both to the New York and Massachusetts borders. So, pretty rural country. 
Jon Krohn:
Nice. Then I guess, do you take the train in to the city somehow [inaudible]? 
Jennifer Hill:
I go back. So, I have family here who I spend time with and then I spend a couple days a week during the school year in New York as well. This was a change that happened during the pandemic for a lot of reasons. 
Jon Krohn:
And for a lot of people. Not an uncommon thing to be getting some space outside of the city in the pandemic. I am jealous of the people who did. I don’t know why I’m still seven days a week in Manhattan when I could be enjoying the outdoors at least some days of the week. Well, thank you so much for joining us. We’re going to dig right into the technical questions that we have for you because we have some really great questions leaning on your expertise in causality. So, first of all, Jennifer, what are causal questions and why are they vital to both policy research and scientific development? 
Jennifer Hill:
Sure. So, human beings like to make meaning by thinking causally. So, if you think back even on what you’ve done already today, you’ve probably made 20 or 30 decisions that were based on implicit answers to causal questions, from what you ate for breakfast to, if you had commuted, what train you took. Everything that we do, we’re thinking, huh, would it be better if I wore this or if I wore that? Well, if I wear this, I think I’ll feel this way, right? It’s because I’ve decided I know the causal answer to that question. Or if I wear this, I’ll be warmer. Some of these decisions, we have a lot of data for; a lot of them, we don’t. When we expand out to thinking about making decisions for other people, which is what happens in science and happens in policy, the stakes are often a lot higher. So, it’s really bad if we get the answers wrong. 
So, kind of a broad definition, causal questions are questions that try to assess whether the world would change if observations, let’s say individuals, were exposed to one state of the world vs. another. So, that can range from a navel-gazey question about will I have more energy during the day if I have eggs vs. oatmeal to something like whether mask mandates reduce deaths due to COVID, right? Really broad reaching, hard to answer questions. The key idea is that there’s a causal variable, which sometimes we refer to as a treatment, even when it’s not what we think of as a medical treatment.
That variable has to be something manipulable, something that we can actually change in the world. We want to consider what individuals’ lives would look like if they were exposed to one setting.
So, attending a school with a mask mandate vs. attending a school without a mask mandate, and what would the implications be for your health, psychological and physical, etc.? So, that ties into why they’re vital to policy and research and scientific development, which involves a lot of decision making. So, there are lots of examples you can pull from on the science side of things that should feel fairly familiar to listeners. So, consider if you’re trying to figure out whether a drug or a vaccine is effective. It’s not enough to say, “Hey, I took it and I got better,” because you don’t know what would have happened if you hadn’t. 
Jon Krohn:
In terms of personal conversations, as somebody with a statistical background, formal education in it, I have a really hard time and I try to be tolerant, but it’s a regularly occurring thing with family members, with a lot of even random people that you meet where they’ll say things like, “Oh, yeah. I got the vaccine, I got the COVID vaccine, and I still got really sick. It doesn’t work.” You’re like, “How can you draw that conclusion?” I’m like, “Your personal experience, your N equals one, is basically irrelevant. We have no idea. Did you go to the hospital? No, you didn’t go to the hospital. Maybe you would’ve gone to the hospital or you would’ve died if you hadn’t had the vaccine.”
But yet, we see people drawing causal conclusions, global conclusions that they think apply to a large number of people based on their own personal experience. I’m sure I do it all the time, but I try not to. I also try not to give people a hard time in conversations when they do do it because it’s so interesting. It’s like, it’s so ingrained in people and people have … I guess because it’s so personal, your personal experience is, by definition, very personal. So, people kind of take it as an affront that their personal experience isn’t meaningful and that they should really just be relying on peer reviewed controlled studies for their decisions. 
Jennifer Hill:
Yeah. Oh, there’s so many kernels in there that I could draw out. So, one of these is the N equals one thing, right? So, N equals one. You’re a statistician. You’d say, “Why would you draw any conclusions from N equals one?” In causal inference, it’s worse than that because you could have a huge sample size, like the whole world during COVID, but if you don’t have the counterfactual world where COVID didn’t happen, you’re still sunk because we’ve got all this missing data. The data for everyone in that alternate universe where it didn’t happen is still missing for everyone. So, it’s worse. Then add to that the fact that psychologically, there’s a ton of evidence that we all really think we’re good at causal reasoning. We make meaning causally.
So, there’s been studies in psychology and linguistics and a range of fields, philosophy, that make it really clear that this is how people think. Right? This is how we make meaning. So, we all really implicitly think we’re good at it. Yeah. So, this happened to me, therefore, obviously, you’re not only saying it happened to me, so maybe it would happen for other people, but it happened to me and I implicitly know what would have happened if I’d gone down that other path, which is craziness. Sometimes, you kind of know, but you never really know. So, yeah. So, for thinking about policy, it’s like that, but for a lot of other people, often people who are very much not like you, and you have to say whether their lives would be better under policy A or policy B, well, that’s a tough call. 
Jon Krohn:
Right. All right. So, what can we do? What’s the key to teaching causal inference effectively? You have to do it regularly to a university audience. So, maybe how can we teach causal inference effectively to a university audience? But what can I be doing to try to convey these issues just in informal conversations with family members and that kind of thing? In particular, so far, we’ve just been talking about it from the personal experience level, but there’s also this issue of people conflating correlation with causation. I think that the correlation-causation conflation happens even to very well-educated people, even to people who are formally trained and should know better. 
Jennifer Hill:
Yeah. Yeah. It does. It’s a very natural impulse, again, I think based on human psychology. It doesn’t help that when the popular press, newspaper articles, and all kinds of other media pick up scientific literature that might have been well written to begin with, they’re going to jump. The chance of that headline having a causal implication in it is much higher because it’s sexier, it’s more interesting. Right? That’s how people think. So, that’s part of the problem. 
Jon Krohn:
Tomatoes cause cancer, tomatoes don’t cause cancer. 
Jennifer Hill:
Yeah. That’s right. That’s right. Wine’s good for you, then it’s bad for you. Then it’s good for you again, right? Again, nutritional epi is the worst. It’s really hard to do that kind of work. 
Jon Krohn:
So, I’ve had that exact experience with family members where they’re like, “I don’t even listen to these anymore because I keep hearing on the news that blueberries are good for me, blueberries are bad for me.” Red wine. That’s a big one. It seems to come up all the time. Red wine is good for you. Red wine is bad for you. It has caused some people to believe that somehow there is no real underlying truth, that because they’re hearing a change on the news all the time, they’re like, “There can’t be any real truth here,” and it’s really unfortunate. Anyway, I’ve kind of gone off on a tangent here. 
Jennifer Hill:
No, no, it’s true. I have family members who really don’t like having these conversations with me because I’m not fun. On the other hand, my whole family now says, “But what is the counterfactual?” So, there’s that, that happens too. What’s good when you’re feeling judgy about other people is to start noticing your own behavior. So, when you read the article that says, I don’t know, think of a food you don’t like, and that it’s bad for you, you’re like, “I always knew it,” when it conforms with your priors. 
Jon Krohn:
Right. Yeah. Confirmation bias. 
Jennifer Hill:
But when you read the thing that says … Like salt. I love salt. So, I’ve dug into this literature on salt deep because I really want to believe it’s not that bad for you. It’s really not that bad for most people. Most people. 
Jon Krohn:
Right, right, right. I’m really glad to hear that. I’m going to add it into my confirmation bias database because I also love salt, and I’m actually frequently telling people that, I haven’t done as much research as you probably, but I also am pretty sure that, yes, for most people, for healthy people, it is okay to have salt, especially if you have low blood pressure. I can be eating salt. Anyway. 
Jennifer Hill:
Yeah. Yeah. Yes, yes, yes, yes. It’s true. So, then I challenge you to think of the next time that you have the confirmation bias side of things. Are you going to look up that article and make sure that the study was really well done and rigorous? Or do you just take it? Right? We also all have limited time. So, how do you teach it? Yeah. I’ve been teaching it for a long, long time and I think I’ve slowly gotten better. Really, it’s much more about a mindset than about any particular tool.
So, students come in thinking I’m going to teach them the magic that’s going to allow them to suddenly do causal inference, and what I really teach them is that it’s really hard and they have to think hard about assumptions and they have to know their subject matter area well, they have to talk to the experts, they have to talk to the people on the ground, and they have to communicate super clearly. So, I’m a big fan of not saying, “Oh, I’m not even trying to do something causal. I’m just doing something descriptive and I’m going to use the word association to make that clear, and then in my discussion section, make big recommendations about what people should do.” Right? 
Jon Krohn:
Right. 
Jennifer Hill:
So, I’d rather you say, “I’m trying to do, I really want to do something causal. This is the precise causal question, but these are the assumptions that need to be satisfied and you, the reader, can decide, once I’ve been very transparent and clear about it, whether or not you believe those assumptions.” Then you leave it up to the reader to understand.
 
Jon Krohn:
This episode of SuperDataScience is brought to you by Pachyderm. Pachyderm enables data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. Their unique approach provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking. Pachyderm delivers the ultimate CI/CD engine for data. Learn more at pachyderm.com. That’s P-A-C-H-Y-D-E-R-M.com, like the elephant. All right? Now, back to our show.
Yeah, I think it’s fair to say that for almost everyone, in both academic studies as well as just the general inferences you’re making in life, you almost never care that two variables are just correlated. You really want to know the causal direction, and your brain, for whatever reason, probably because it was useful evolutionarily to be trying [inaudible] causality, sees a correlation and you’re like, “Well, even if I am really good and I write in the paper all the caveats like you’re saying, still in my mind, I’m like, ‘I think I know the causal direction here. I think I know what’s going on. I feel like I understand this process.'” 
Jennifer Hill:
Of course. It’s very natural. 
Jon Krohn:
Sounds good. Feel like you’re my therapist there. You’re like [inaudible]. 
Jennifer Hill:
It’s okay. It’s okay. 
Jon Krohn:
That’s normal. 
Jennifer Hill:
It’s how you communicate it to others. We just actually, over the past year, have done a bunch of studies with college students where we present to them research findings and we experimentally manipulate how we phrase things. Do we use the word association, but then a word like increase tied with it? Or do we say very clearly, no, we’re just comparing two groups, or do we use the word causal to see how much that wording actually makes a difference in whether or not students infer causality about the relationship between the variables? So, two things. One is, yes, it makes a difference. But what makes a bigger difference is how much they believed it to begin with. Right? 
Jon Krohn:
Right. 
Jennifer Hill:
So, if someone already thinks they know a lot about the relationship between vaping and anxiety in high school students, they’re very quick to go to causal. It doesn’t matter how we’ve changed the wording. Right? 
Jon Krohn:
Right. 
Jennifer Hill:
So, anyway, very sobering. 
Jon Krohn:
Vaping is bad. Phones are bad. Video games are bad. Yeah. Yeah. I don’t care how you write it. Yeah. So, the language that we use can make a difference in situations where people don’t already have a strongly preformed opinion on how they think that that piece of the universe works. 
Jennifer Hill:
Yeah. 
Jon Krohn:
That’s interesting to know. I guess that makes sense to me. Okay. So, speaking of correlation, in data science, when we’re answering inference or predictive questions, this typically involves correlation-based tools. So, most of the tools that we use as data scientists, we’re going to talk about some others. We are going to talk about some causal tools in this episode. But the vast majority of tools that data scientists use are correlation-based. So, whether it’s literally the Pearson correlation coefficient or something a step more sophisticated like an analysis of variance in an ANOVA model, or even all the way through to some of the most advanced machine learning models like deep learning models, these are correlation assessment tools. They have no internal capacity. There’s nothing special about them that allows us to be actually inferring causality. So, what needs to happen to get practitioners to start not only thinking causally about the world, but adopting causal tools? 
Jennifer Hill:
Yeah. So, the dirty little secret is the causal tools are just prediction tools, or they’re post-hoc design tools like matching and weighting. So, there’s a little bit of a bait and switch here. Like we say, you need to do causal stuff, and then we hand you tools that are just random forests, or BART is a prediction tool in some sense, right? It’s a modeling tool. So, where does the magic come in? The magic mostly comes in as a leap of faith. No, no, it’s true. There’s no special tool that makes it all fine. If someone tries to tell you that they’ve got an assumption-free tool that’s going to identify causal effects, no. All the assumptions are not testable. So, that’s the depressing bit. So, what is happening that’s special about these tools? Well, mostly it’s that you’re making it clear that you’re making predictions in that counterfactual world. So, what’s different about that? 
Jon Krohn:
Yeah. 
Jennifer Hill:
Go on. 
Jon Krohn:
So, that’s probably something … We should talk about counterfactuals. So, you’ve mentioned that everyone in your family now talks about counterfactuals. So, what does that mean? What is a counterfactual? 
Jennifer Hill:
Oh. Yeah. Yeah. All right. So, say you want to understand the effect of a masking policy. Right? Some schools had, the kids had to wear masks, some schools, they didn’t. What you would want to do ideally is to follow a school in time, or a set of schools, where they had a masking policy and then go back in time and do it all over again where they had the different policy. Right? What you want, you want to see both of those worlds playing out at exactly the same point in time with the exact same kids who came in with the same health issues and the same things happening at home in terms of what their families are doing, in terms of supporting them academically, in terms of whether they’re going out into the community or not wearing masks, all the things, whether they’re vaccinated. We want everything to be the same, except for that one policy. That’s the counterfactual world. 
Jon Krohn:
It’s literally, to break down the word, so there is some factual reality that happened, and then the counterfactual is this fake world where that fact that factually happened didn’t happen. That’s the counterfactual. Okay. Got you. 
Jennifer Hill:
Yeah. Right. So, what do we do? So, in a randomized experiment, you get to do that because we’re saying that the two groups who got randomized to … Say we could randomize kids to go to one school vs. another. Those two groups of kids will be basically identical, on average. 
Jon Krohn:
Right. Because the sample size is big enough and then maybe some studies would be careful about, okay, we’re going to have the same number of female kids and male kids and the same kind of weight or the same kind of obesity index. Or you could invest effectively an infinite amount of resources into trying to have the two groups be as identical as possible. But for the most part, I think we just usually rely on the law of large numbers, then? 
Jennifer Hill:
It’s a numbers game. Right. Right. But the nice thing about the randomization is that you can randomize not only all those things, you can create balance across the two groups, not only in terms of all those things you just mentioned that you can measure, but all the things you can’t measure. Right? That’s the beautiful part. Right? So, what you just talked about, I would call a fancy randomized block design, that I’m going to create groups of people who look a lot alike and then randomize within that. So, the causal inference, the typical causal inference assumptions in the absence of a randomized experiment basically say let’s pretend that happened. If we measure all those things that we can about people, let’s pretend that if we have two kids or two classrooms or two groups of kids who look alike on all those things, let’s pretend it was a coin flip. If you measure enough things, maybe that’s plausible, and that’s what we’re trying to do.
Then there’s a whole range of methods in between the randomized experiment and the purely observational design where you might be able to randomize pieces of things. So, maybe you want to understand whether smoking leads to cancer. Can’t randomize smoking because it’s unethical, among other things, other reasons why, but you could take a group of smokers and randomize maybe access to a smoking cessation program or drug or something like that. Right?
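To make the randomization point above concrete, here is a minimal simulation sketch (added for these show notes, not from the episode; the data and variable names are illustrative assumptions). It shows that coin-flip assignment balances even an unmeasured trait across the two groups, so a simple difference in means recovers the true effect, whereas self-selected treatment does not.

```python
# Minimal sketch: why randomization makes the two groups comparable.
# Simulated data; variable names are illustrative assumptions, not from the episode.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# An unmeasured trait (say, baseline health) that also affects the outcome.
baseline_health = rng.normal(size=n)
true_effect = 2.0

# 1) Randomized assignment: a coin flip, independent of baseline health.
t_rand = rng.binomial(1, 0.5, size=n)
y_rand = baseline_health + true_effect * t_rand + rng.normal(size=n)
print("Randomized difference in means:",
      y_rand[t_rand == 1].mean() - y_rand[t_rand == 0].mean())   # close to 2.0

# 2) Self-selection: healthier people opt into treatment more often.
t_self = rng.binomial(1, 1 / (1 + np.exp(-2 * baseline_health)))
y_self = baseline_health + true_effect * t_self + rng.normal(size=n)
print("Self-selected difference in means:",
      y_self[t_self == 1].mean() - y_self[t_self == 0].mean())   # biased upward
```

Under randomization, the unmeasured trait is balanced across groups on average, which is exactly the “all the things you can’t measure” point above; under self-selection, the naive comparison mixes the treatment effect with the difference in baseline health.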
Jon Krohn:
Yeah. 
Jennifer Hill:
So, there are clever things that people do to inject some randomization into it or find ways to create fair comparisons. So, there’s a whole range. So, actually, most causal inference classes focus on those types of designs because if you can get the design right, there’s less of an issue at the end of the day with the analysis. All the fancy analysis stuff is basically saying, “Oh, oops. We didn’t get to design it. So, what can we do now to play cleanup?” 
Jon Krohn:
So, what you’re saying bottom line is that in order to infer causality confidently, we need to have a randomized controlled trial, that prior to collecting data, you need to randomly break into two groups and have one of those groups be in your control condition and the other one in your actual experimental condition, and that that is really the only way to confidently infer causality. But yet, so a lot of these tools, like BART, which we’re going to talk about later, other causal tools, the intention with these tools is that in some circumstances, we might be able to infer causality, despite not having done a randomized controlled trial. 
Jennifer Hill:
[inaudible]. Right. Because most of the time, we end up with data that we didn’t even collect. We had no control over the design. You can try to clean some of that up using matching or weighting to make two groups who look alike. I actually, the work I’ve done on machine learning and causal inference, I’ve presented as an alternative to matching and weighting because it tends to be more accurate because you can use more of the data and you just have the power of flexible model fitting. So, in the end, I feel a little conflicted about it.
Heuristically, I like the idea of matching and weighting more because it’s much easier to understand. Yeah. I want two groups that look alike and then I’ll compare them. But in practice, the machine learning stuff tends to work better. But then you have this responsibility of you don’t want to make it too easy. So, I know there are people in the causal inference community who are not in love with the machine learning and causal inference side of things in part because they’re worried, understandably so, that it’s going to feel like magic. I just push a button- 
Jon Krohn:
[inaudible]. 
Jennifer Hill:
… and there it is, and there’s a whole lot that goes into that, right? 
Jon Krohn:
It’s just a method in a library. You just use the default parameters, you put in any data set. 
Jennifer Hill:
Yeah. 
Jon Krohn:
Right, right, right, right, right. 
Jennifer Hill:
My plug there is just that I think I don’t want that part to be hard. There’s so many parts of this that are hard, that I’d rather researchers not spend a lot of time using something like propensity score matching, spending hours and hours and hours and hours trying to get good balance and making the groups look alike and investing and learning new software when that’s not the important part, in some sense. The important part is deeply engaging with the subject matter and researchers in the field and policy makers and understanding the assumptions and communicating it better. Now, it’s not at all clear that if you’re freeing up time from your analysis that you’re investing in those ways. But in the tool that I’m building that we’ll talk about later, we’re trying to help balance that out. 
Jon Krohn:
Machine learning practitioners, Jennifer, don’t want to be doing that. They just want to press a button. 
Jennifer Hill:
I know. I know. 
Jon Krohn:
They don’t want to be going out into things and learning all the assumptions. But you’re absolutely right. So, I guess that’s probably going to be the number one takeaway message from this episode. My very next question is going to relate to tools that we can be using for inferring causality. But I think the number one take home message for listeners should probably be that if there is a causal tool that is plug and play, you should be suspicious and you should really be understanding the underlying data and all the assumptions that go into it. 
Jennifer Hill:
Yep. Absolutely. 
Jon Krohn:
On that note, since 2008, you have been championing a causal tool called BART. So, Bayesian nonparametric modeling for causal inference. So, this is one of those tools that could be abused to find causal relationships. What is special about this? You’ve been championing it for so long. What’s special about it? How is it different from tools that we use just for inferring correlations and how does it compare with other causal inference tools out there? 
Jennifer Hill:
Yeah. So, BART stands for Bayesian additive regression trees, actually. It’s like a Bayesian form of gradient boosting, actually. 
Jon Krohn:
Yeah. I guess [inaudible]- 
Jennifer Hill:
It’s fine. That’s the name of probably the article it was in or something. Yeah. 
Jon Krohn:
The paper about it. It’s like, Bayesian nonparametric modeling for causal inference is kind of what it’s doing, but that definitely does not make a nice acronym like BART. 
Jennifer Hill:
Yeah. Yeah. Yeah. 
Jon Krohn:
Yeah. Trying to create studio-quality podcast episodes remotely used to be a big challenge for us, with lots of separate applications involved. So, when I took over as host of SuperDataScience, I immediately switched us to recording with Zencastr. Zencastr not only dramatically simplified the recording process — we now use just one simple web app — it also dramatically increased the quality of our recordings. Zencastr records lossless audio and up to 4K video and then asynchronously uploads these flawless media files to the cloud. This means that Internet hiccups have zero impact on the finished product that you enjoy. To have recordings as high-quality as SuperDataScience yourself, go to zencastr.com/pricing and use the code “sds” to get 30 percent off your first three months of Zencastr professional. It’s time for you to share your story! 
Jennifer Hill:
So, okay. So, why is BART great? So, BART is just a model fitter, right? It’s an algorithm, like you might think of random forest. It’s also based on trees. Random forest is averages of trees. BART is additive trees. So, it’s like the difference between random forest and boosting. But BART has lots of nice properties that, in essence, you could use any kind of flexible fitter. There are very popular causal inference, random forest based algorithms out there. There are lots of different options. So, actually, the first time I talked about this at a conference was in 2005, so it’s been a long time, at the Atlantic Causal Inference Conference. It took a long time to get that paper published. Two different journals. The world has changed.
But anyway, the reason I like BART is because, okay, it’s an extremely flexible model fitter. Lots of things have that property. So, that’s no longer such a big deal. Why BART is better than those is really what it comes down to. The nice thing for me is that it’s embedded in a Bayesian framework. So, you get a bunch of bonuses for that. First of all, you naturally get coherent uncertainty estimates. So, if you want to understand what your posterior distribution is for a range of different estimands, whether it’s an average treatment effect for everyone in your sample or the whole population or just part of your sample or any number of different kinds of estimands, you can get coherent uncertainty intervals, and you don’t have to do a separate bootstrapping thing, which is going to come with its own sets of assumptions, etc.
It allows … Sorry. It avoids overfitting through an extremely clever prior specification, as opposed to having tuning parameters that you then have to use cross-validation to choose.
Then in theory, you should be trying to represent that in your uncertainty estimates, right? It’s a lot less ad hoc. I should note, and I should have said this in the beginning, that BART was created by Chipman, George, and McCulloch. So, I don’t want to claim credit. It’s just a great tool that I happen to like a lot, but they’re the ones who came up with all this clever stuff. Great prior specification that helps to avoid overfitting and you don’t have to do anything like splitting your sample. Right?
In theory, you can use cross-validation to choose the hyperparameters in your priors and you might get a slightly better fit. I do that once in a while, depending on how high stakes the things are or how fragile they seem, if I’m not getting convergence, etc. But in general, you can use it with just the default prior. Because, again, it’s in this Bayesian framework, it’s easily combined with other modeling strategies.
So, actually, we’re just sending in a paper today on a new algorithm that our team has created. My collaborator, Vincent Dorie, who wrote one of the main BART packages, was the lead on this, which is an amalgam between BART and the Stan algorithm to expand BART to allow for multilevel models.
So, most machine learning algorithms assume IID observations, even if they claim that they don’t. So, it’s great to be able to account for correlation across observations within groups. It’s also been combined with strategies for identifying situations where your two groups are just too different to be compared. Right?
So, if you’re comparing a school in, I don’t know, New York City with a school in the rural South, maybe that’s not how you want to do your mask mandate policy comparison because there’s too many other differences between the groups. So, it’ll help you identify situations where that’s happening empirically. It’s got a great track record in practice. It’s won a bunch of data analysis challenges in the causal inference world and just seems to perform well on average. There are some other methods that have theorems proved about them, but the theorems rely on asymptotics that you never really believe [inaudible] ever and things like that. So, I like the fact that it’s got this long track now. 
Jon Krohn:
Yeah, that it can win these competitions where the data set is set up in a way that we can actually tell whether an algorithm is doing a better job of causal inference or not. 
Jennifer Hill:
Exactly. 
Jon Krohn:
That’s pretty cool that BART is often winning those. So, you mentioned there Stan. So, we had an episode, it was the most popular episode of 2021 with Rob Trangucci talking about Stan. So, Bayesian statistics is very popular with our listeners. So, if you haven’t listened to that episode yet, listener, that’s back in episode number 507 with Rob Trangucci so you can learn all about Stan. Also, it’s a great introduction to Bayesian stats. So, cool to hear that you’re integrating with that package to allow for multilevel hierarchical modeling. Very powerful package.
That episode would also be a great one for you to check out, listener, if you’re interested just in learning more about Bayesian stats. So, for example, we are glossing over some key Bayesian terminology here today. So, things like priors. So, priors in Bayesian statistics are numbers. Very often in Bayesian stats, they represent characteristics of a distribution that goes into a model. So, we have these prior values that can then be moved around by the data, and after the data have adjusted these priors, we end up with a posterior, and that posterior is what we might then use to actually draw our inferences. Did I say all that correctly? 
Jennifer Hill:
Yeah. I think the way I would frame it is that in standard frequentist inference, we think that the parameters in our models are unknown but fixed. In Bayesian statistics, we think they’re unknown and uncertain. So, we put a prior on them to express our uncertainty about them. 20 different Bayesians would frame this 20 different ways. I’m a pragmatic Bayesian. I’m a Bayesian when it works for me and not otherwise. So, I think it’s just a way of being more honest about our uncertainty. It also creates a much more flexible way of doing inference. You can transform parameters and then transform them back at the end of the day without everything going haywire. We sometimes don’t have to make such strong distributional assumptions because we can use much more flexible distributions. So, it can be very helpful. Not always necessary. It can be very helpful. 
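As a small illustration of the prior-to-posterior idea just described (a sketch added for these show notes, not from the episode), here is a conjugate Beta-Binomial update: the prior expresses uncertainty about a success probability, the data shift it, and the resulting posterior is what you draw inferences from.

```python
# Tiny prior -> data -> posterior illustration with a Beta-Binomial model.
# The unknown parameter (a success probability) is treated as uncertain, not fixed.
from scipy import stats

prior_a, prior_b = 2, 2        # Beta(2, 2) prior: mild uncertainty centered on 0.5
successes, trials = 7, 10      # observed data (illustrative numbers)

# Conjugate update: posterior is Beta(prior_a + successes, prior_b + failures).
posterior = stats.beta(prior_a + successes, prior_b + (trials - successes))

print("Posterior mean:", posterior.mean())                 # roughly 0.64
print("95% credible interval:", posterior.interval(0.95))
```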
Jon Krohn:
That was unsurprisingly a very eloquent explanation of Bayesian stats in a couple minutes and how it’s useful, and certainly better than my explanation, my effort at it. So, if people want to be using this specific Bayesian stats technique, BART, for making causal inferences, I know that in R, they can use the bartCause library to do that. Then our researcher, Serg Masis, also tipped me off to it being available in the PyMC library in Python. So, if you’re a Python user, not an R user, listener, then you can get access to BART within PyMC, and if you want to learn a ton about PyMC, we have a whole episode dedicated to it recently.
That’s with Thomas Wiecki, who is a core developer on the PyMC team, and that’s episode number 585. So, another great Bayesian episode for you to check out. So, those are two ways that somebody could programmatically get access to BART and use it.
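For readers who want to see the “predictions in the counterfactual world” idea from earlier in code, here is a minimal, hedged sketch. It does not use the bartCause or PyMC APIs; scikit-learn’s gradient boosting simply stands in for a flexible fitter like BART, the data are simulated, and the estimate is only meaningful under the untestable assumption that the measured covariates capture all confounding.

```python
# Sketch of the counterfactual-prediction idea behind BART-style causal inference.
# Gradient boosting stands in for BART here (the real options mentioned above are
# R's bartCause and BART within PyMC). Data and names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 3))                                   # observed covariates
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))               # treatment depends on covariates
y = X[:, 0] + 0.5 * X[:, 1] + 1.5 * t + rng.normal(size=n)    # true effect = 1.5

# Fit one flexible model of the outcome given covariates and treatment.
model = GradientBoostingRegressor().fit(np.column_stack([X, t]), y)

# Predict every unit's outcome in both worlds: treated and untreated.
y_if_treated = model.predict(np.column_stack([X, np.ones(n)]))
y_if_control = model.predict(np.column_stack([X, np.zeros(n)]))

# Average treatment effect: mean difference between the counterfactual predictions.
# Only valid if X captures all confounding, which cannot be tested from the data alone.
print("Estimated ATE:", (y_if_treated - y_if_control).mean())  # close to 1.5 here
```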
But Jennifer, you’ve been working on something special recently, which is your Think Causal tool. So, to help folks that have causal questions, but might not necessarily be people who are comfortable writing a lot of code, you’ve developed this point-and-click user interface that seems very easy to use called Think Causal. Can you elaborate on what it does and how it’s going to be helpful for people? 
Jennifer Hill:
Sure. Yeah. I’ve had this dream for a while that, it’s going to seem like a very boring dream, I think, I wanted for a very long time, my entire time getting funded, this idea that when people use software that’s for a new method, for instance, that they haven’t used before, that’s the perfect time to teach them about it. People don’t want to, in general, then have to go and read a whole article about the new software, etc., and they’re not going to. Right? They’re going to probably use it and use it badly or they’re not going to use it, or they’ll use it and they’ll relay results to the public afterwards that aren’t actually reflecting what happened under the hood. So, the idea here was to have this software that makes it really easy to use so you’re not burdened by the programming aspect of it. You’re just really, pull down menus, point and click.
You can even change names of variables and do data exploration and make pretty plots that you can download. It scaffolds all that really nicely.
But at any point during the course of when you’re trying to prepare your data and then run it through the algorithm, if you have questions, the idea would be you would always be able to have a place to get help. So, for instance, if it says what is your causal estimand? Well, I don’t know what those words mean. What are you talking about, estimand? Are you saying that right? Yes. The estimand is the specific average treatment effect that you want. So, do you want to make inferences about everyone in your sample? It could be that your treatment group is very specific and only looks like part of your control group. So, you really want to focus on the treatment effect for them, not other people, etc. If you want to learn what that is, at that point, you have the option of saying, “Tell me more,” and it’ll bounce you out to a learning module-
Jon Krohn:
Cool. 
Jennifer Hill:
… that has a pretty scrolly, like your favorite New York Times visualizations or whatever place that you get where you’re seeing information here, plot here, interactive plot here. That’s changing and telling a story. It’s all embedded in some story about, for instance, marathon runners and times or monsters and weight loss or whatever it is. We’ve got a variety of different kinds of stories and then you learn about that idea, counterfactuals, and potential outcomes. You learn about all those ideas embedded in a story with pretty graphics and at the end, it’ll give you a little quiz to make sure that you are understanding. 
Jon Krohn:
Cool. 
Jennifer Hill:
Then you can bounce right back to the software and make your choice and move on. 
Jon Krohn:
Nice. So, you get presented with a series of prompts about assumptions effectively that you’re going to be making based on your particular situation or information that you need to be providing to the model, based on your particular situation. Then if you encounter any question marks yourself about I don’t know what option to be picking here, then you can learn more within the tool, do a quiz if you’d even like to, and then come back and confidently answer, okay, I know that I should be saying A or B at this binary choice in my path. 
Jennifer Hill:
That is the idea. 
Jon Krohn:
Cool. 
Jennifer Hill:
That is the idea. It is still very much under development. We only have maybe four learning modules up and running. But we’re going to keep plugging away and we’re excited about it. 
Jon Krohn:
Yeah. So, we will have a link in the show notes for you, listener, if you’d like to check out Think Causal right now. So, if you want to just understand anything related to causality that we’ve been talking about today, as long as the module’s been built, then you’ll be able to get that in the future. You’ll also be able to get the other modules and of course, you’ll be able to run your own causal models yourself on your own data. So, you can upload your own data into the platform and you can be running causal models, even if you don’t know how to write code in Python or R. 
Jennifer Hill:
Correct. 
Jon Krohn:
So, that is super cool. All right. So, more broadly, other than just using your Think Causal tool to learn about aspects of causality, can you provide a roadmap for us if we’re interested in learning about causal inference? What books should we read or what courses should we take? Do you have any particular software recommendations beyond Think Causal? 
Jennifer Hill:
Right. So, I would say if you want to do anything with regard to causal inference, you should take some kind of research methods class that deals with human subjects. So, how do you interact with human subjects, the ethics around interacting with human subjects? So, I would go to a psychology department or a sociology department, people who research people and understand all the complexity that happens. I would take a measurement course so that you understand that when you just get data off the internet or that someone’s handed to you or you found on Kaggle, there are probably big gaps between what the variable name is and what it actually represents. Right?
So, even if you’re not going to do measurement yourself, having a healthy lack of confidence in what those measures represent, I think is really … A lot of this is just about humility and honesty and transparency. Right? So, understanding the limits of what we can understand is maybe way … There are lots of tools out there, easy to learn a tool. Right? How can you be humble about it? How can you be honest about it? How can you be transparent about it when you’re writing up what you did? So, this is maybe not an answer that anyone is going to like. Take a course in qualitative research and community engaged design, things like that. It’s just understand where your data came from. I think that’s almost the most important thing.
There are more and more causal inference courses out there, way more than there used to be. Your first course of that ilk should be one that focuses on design.
So, I would say a good course probably has some piece that’s on randomized designs and then a big chunk on quasi-experimental designs, natural experiments, etc. You might find that in a stats department these days. Maybe there are more and more causal folks out there. The economics folks have a long tradition of doing that well, but again, of course, focused more on design than on here’s the fancy estimator. Then after that, if you can find a course on the fancy estimators, at this point, I’d say more of those are probably online or you learn them in books, since there probably aren’t too many courses out there. 
Jon Krohn:
Nice. Any particular book recommendations or anything? 
Jennifer Hill:
So, I think there’s an introduction to these issues in Regression and Other Stories, my book with Andrew Gelman and Aki Vehtari. We’ve got a couple chapters that could be a good starting point. Then if you want more, Imbens and Rubin have a nice book, Robins and Hernán have a very nice book. There’s an older one by Winship and Morgan that’s quite nice. Yeah. There are more and more books out there. 
Jon Krohn:
Nice. All right. Well, you segued me perfectly into my very next question because I was just about to ask you about your books. So, Regression and Other Stories is a new book that you have collaborated on with Andrew Gelman. Who’s the third author again? 
Jennifer Hill:
Aki Vehtari. 
Jon Krohn:
I knew I didn’t want to trust myself with remembering or pronouncing it correctly. So, that book is derived, to some extent, to a large extent, from the Data Analysis Using Regression and Multilevel/Hierarchical Models book that I talked about right at the onset of this episode that I was in love with back at the beginning of my PhD. So, that book, published in 2006, was also with Andrew Gelman. So, it sounds like from our conversation just before recording that this new book, Regression and Other Stories, that you recommend for people who are interested in learning more about causal modeling, is derived somewhat from that earlier book. Yeah?
Jennifer Hill:
Yeah. The idea was to take the original book and split it into two pieces because the original book was so long and not everyone wanted the content in both. So, we thought, oh, we’ll take the first part and make a regression book and take the second part and make a more explicitly multilevel model book. Of course, the problem is the Andrew Gelman effect, which is that we took the 300 pages that were in the original book on regression and it turned into another 700 page book because there’s just a lot to say. So, everything expanded a bit. But there’s newer material. It’s more up to date. It now uses the Stan integration, rstanarm. So, the commands are slightly different, etc. But more examples. Yeah. Then the second piece would be another book coming out in, who knows, one or two years, on the multilevel side. 
Jon Krohn:
Depending on how much agony you want to go through in the short term. 
Jennifer Hill:
Yeah. Yeah. 
Jon Krohn:
So, I think it would be nice to touch quickly on that other piece. So, we’ve talked now a lot about causality, and that is covered in this first volume, if you will, Regression and Other Stories. So, the second book that’s coming out, which was originally the second half of your 2006 book, that one on multilevel and hierarchical models, could you explain to the audience a bit what those multilevel models are and why they’re useful? 
Jennifer Hill:
Oh, sure. So, multilevel models are useful for several reasons. One way of framing it is that almost any standard method you use, whether it’s machine learning based or statistics based, will assume that your observations are independently and identically distributed. In reality, that doesn’t often hold. That’s after conditioning on everything you’re conditioning on in your model. But if you’ve got kids who grew up in the same family together or go to the same school together, or patients in a hospital, or incarcerated individuals in a prison, there are going to be things that are similar about them because they’re in that same environment that we’re not capturing just in the covariates or features that we’ve measured on them. So, our error structure is likely to be wrong. So, if we then try to get confidence intervals on things, we’ll be overconfident because we think that everything’s independent when actually, all these pieces of information are highly correlated. So, instead of 30 unique contributions, 30 individual data points, it’s maybe more like 15. Right? 
Jon Krohn:
Right. 
Jennifer Hill:
You’re not accounting for that in your uncertainty estimates. So, Andrew always says we can’t sell the book by saying they’re going to have bigger standard errors, but that is kind of the thing, that you’re going to be more honest about your inference. Another reason that multilevel models can be super helpful is that they help us to understand phenomena that happen at group levels vs. individual levels. 
Jon Krohn:
Right. Exactly. 
Jennifer Hill:
Right? So, you can explicitly partial out the different parts of the relationship that are happening at each of the levels of aggregation, which can be extremely helpful in understanding a whole complex phenomenon. 
Jon Krohn:
Remembering back to some of the data and models in your 2006 book, back in my PhD days, I think a lot of the examples related to things like school districts or individual classrooms. So, what you’re describing allows you to then, so instead of just assuming that all the school districts or all the classrooms are the same, you break them into groups. So, you could have a model where you have one tier of subgroups where it’s school districts and then another tier of subgroups within that, so this three level hierarchy where that deepest level in the hierarchy is the classrooms. Just to be repeating back to you what you just said, this then allows you to make inferences at the classroom, or at least the school district level that otherwise you wouldn’t be able to do. With a more traditional regression model, you might only be able to make inferences at the whole state level or country level, as opposed to those more granular inferences. 
Jennifer Hill:
Yeah. There are kludgy ways of doing it. Right? So, you could throw in fixed effects for those things. Right? There are other ways of reflecting that. Then you could throw in interactions between school district and something else. Those are strategies that come with their own assumptions and aren’t necessarily going to capture exactly what you want. You might have to do a lot of manipulation to pull out the parameters that you actually care about. 
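For readers who want to see what a multilevel (varying-intercept) model looks like in code, here is a minimal sketch in PyMC with simulated data; the classroom setup and all names are illustrative assumptions, not an example from the book.

```python
# Minimal varying-intercept multilevel model sketch in PyMC (simulated data).
# Students are nested in classrooms; classroom intercepts share a common prior,
# which partially pools information across groups rather than using fixed effects.
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
n_classrooms, n_students = 20, 30
classroom = np.repeat(np.arange(n_classrooms), n_students)     # group index per student
x = rng.normal(size=classroom.size)                            # student-level predictor
true_intercepts = rng.normal(0.0, 1.0, size=n_classrooms)
y = true_intercepts[classroom] + 0.8 * x + rng.normal(size=classroom.size)

with pm.Model() as multilevel_model:
    # Group-level ("hyper") parameters describe how classrooms vary around a common mean.
    mu_a = pm.Normal("mu_a", 0.0, 5.0)
    sigma_a = pm.HalfNormal("sigma_a", 2.0)

    # One intercept per classroom, drawn from that shared distribution.
    a = pm.Normal("a", mu_a, sigma_a, shape=n_classrooms)
    beta = pm.Normal("beta", 0.0, 5.0)
    sigma = pm.HalfNormal("sigma", 2.0)

    pm.Normal("y_obs", mu=a[classroom] + beta * x, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2)
```

The shared prior on the classroom intercepts is what lets the model acknowledge that observations within a classroom are correlated, which is the honest-uncertainty point made above.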
Jon Krohn:
So, I think one of the nice things about these multilevel or hierarchical models, and I don’t think I’ve made explicit in case the listener’s been wondering, those are the same thing. 
Jennifer Hill:
They’re the same thing. Same thing. 
Jon Krohn:
So, I keep saying multilevel/hierarchical. They’re synonyms. It’s fun to have a synonym on the cover of the textbook. I think you don’t see that very often. 
Jennifer Hill:
Well, it helps when people are doing Google searches. 
Jon Krohn:
Right. Exactly. It was good for your SEO. 
Jennifer Hill:
Mm-hmm. 
Jon Krohn:
So, yeah, it’s a more elegant solution to these group level effects than a lot of other approaches out there. One other thing that I was just noodling on. So, I just wanted to bring up a topic that we haven’t talked about related to causality that I dug into a lot in my PhD. So, one of my PhD chapters was on making causal inferences and we were able to do it in this situation because one of the variables that I was working with in my data was genetic information. So, I had genetic information from, it could be any organism. It could be from a human or a mouse or whatever. If you have genetic information, as well as some other kind of information about them, maybe in my case with my PhD, I was working with mice and we had several thousand mice.
We had their genetic information, as well as about a hundred attributes of those mice, what we call phenotypes.
So, measurable characteristics, how long were they, how heavy were they, various tests of their anxiety levels, these kinds of things. Biochemical tests, what are the sodium levels or the chloride levels in their blood? So, a hundred of these kinds of things, as well as gene expression information. So, we had information from several different organs, the liver, the lung, and the brain, on how much particular genes were being expressed.
So, in a sentence for the listener who isn’t aware of genetics, in order for any of your genes to have an effect on your physiology, basically these photocopies get made, called RNA, as opposed to DNA, and these RNA are like temporary photocopies that allow a gene to be expressed in your body.
So, because genes cannot, by any mechanism that we’re aware of, be causally changed within a biological organism (there are random mutations to DNA that happen, but for a mouse being obese, or a person being obese or having high chloride levels, there’s no conceivable mechanism by which that could cause a systematic change in the DNA for a particular DNA letter to be switched), we end up with this interesting situation where we can use the genetic data as the starting point for a causal pathway. So, we can do conditional probabilities. 
Jennifer Hill:
Because there’s a randomness to what you ended up with genetically. 
Jon Krohn:
Exactly. So, it’s kind of like, so we make the assumption that we can treat this genetic variant, a letter being one letter or another at a particular genetic location, we treat that as a random condition. So, I don’t know. I was just thinking about that as we’ve been speaking and I just wanted to bring that up to the audience as an interesting situation where we can go beyond just correlations because we can feel comfortable making this causal assumption. So, I guess with genetics, it feels like a relatively safe assumption, and I guess it’s that same kind of thinking, though, that allows us to make a lot of causal inferences in the real world where we might be layering in more assumptions about how that random assignment happens. Yeah? So [inaudible] causal. 
Jennifer Hill:
Sure. Yeah. Yeah. Yeah. Sometimes, there are all kinds. That’s in the realm of what I’d call a natural experiment where there’s something that’s random. Often, it’s not the exact thing that we want to manipulate, but if it’s related enough to the thing that we do want to manipulate, then we can make some progress. So, sometimes that thing that’s random was created by God. Sometimes, it was created by Congress. Right? So, there have been a whole bunch of papers written about the Vietnam draft lottery.
So, there was random assignment in who came up to be drafted, and that was related to who served. But if you want to understand the relationship between serving in Vietnam and future health or earnings outcomes, you’ve at least started with this random assignment. That’s where the fancy methods can come in, to try to tease apart: all right, what bit of that was service versus other things, and what assumptions do I need to make to tease those apart? So, yeah, that’s a really fruitful area: being able to figure out, well, there’s something random here. Is that related enough to the thing I care about?
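Here is a hedged sketch of that draft-lottery logic in Python. The numbers are simulated for illustration, not drawn from any of the actual Vietnam-lottery papers: the random lottery is the instrument, service is the treatment we actually care about, and the Wald/instrumental-variables estimator rescales the lottery’s effect on earnings by its effect on the probability of serving.

```python
# Simulated illustration of a natural experiment used as an instrument.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

draft_eligible = rng.binomial(1, 0.3, n)   # random lottery outcome (instrument)
grit = rng.normal(0, 1, n)                 # unobserved trait driving self-selection

# Service depends partly on the lottery, partly on self-selection.
served = (0.8 * draft_eligible + 0.5 * grit + rng.normal(0, 1, n) > 0.7).astype(int)

# Earnings depend on service (assumed true effect -2.0) and on the unobserved trait.
earnings = 50.0 - 2.0 * served + 3.0 * grit + rng.normal(0, 2, n)

# Naive served-vs-not comparison is confounded by who selects into service.
naive = earnings[served == 1].mean() - earnings[served == 0].mean()

# Wald / IV estimate: effect of the lottery on earnings, divided by the
# effect of the lottery on the probability of serving.
itt = earnings[draft_eligible == 1].mean() - earnings[draft_eligible == 0].mean()
first_stage = served[draft_eligible == 1].mean() - served[draft_eligible == 0].mean()
iv_estimate = itt / first_stage

print(f"naive comparison: {naive:.2f}")
print(f"IV (Wald) estimate: {iv_estimate:.2f}  (true effect -2.0)")
```

The assumptions Jennifer alludes to live in that ratio: the lottery has to genuinely move service (a strong first stage), and it has to affect earnings only through service (the exclusion restriction).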
Jon Krohn:
Yeah. So, the ideal state is to have a randomized control trial, where we design the experiment up front and we’re able to randomly assign people to the control group, the placebo group, or the experimental group. But then these kinds of situations, these natural experiments as you call them, offer us another opportunity to potentially make causal inferences, though with more assumptions.
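For contrast, a minimal sketch of that ideal state, again with simulated numbers and an assumed true effect of 5.0: once assignment is a coin flip, a plain difference in means, plus a standard error, estimates the average treatment effect without the extra assumptions a natural experiment requires.

```python
# Simulated randomized control trial: assignment is a literal coin flip.
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

treated = rng.binomial(1, 0.5, n)                # random assignment
baseline = rng.normal(100, 15, n)                # outcome each unit would have untreated
outcome = baseline + 5.0 * treated + rng.normal(0, 5, n)   # assumed true effect = 5.0

# Difference in means and its standard error.
diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()
se = np.sqrt(outcome[treated == 1].var(ddof=1) / (treated == 1).sum()
             + outcome[treated == 0].var(ddof=1) / (treated == 0).sum())

print(f"estimated effect: {diff:.2f} ± {1.96 * se:.2f} (95% CI), true effect 5.0")
```

Even here, as Jennifer notes next, the estimate is only as good as the fidelity with which the treatment was actually delivered.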
Jennifer Hill:
Yeah. Yeah. I should say, even with a pristine randomized experiment, in the sense that we know the randomization happened, it could be that there isn’t fidelity of treatment implementation. So, whatever they were supposed to receive, maybe the curriculum didn’t get taught the way it was supposed to. Or, I participated in an experiment once about mindfulness where teachers had the Headspace app and they had to listen to it for 10 minutes a day. We found out afterwards that some of the teachers were just turning it on and setting it aside; they weren’t actually listening to it. But the data showed that it got used 10 minutes a day. Right?
Jon Krohn:
Right. 
Jennifer Hill:
So, there’s all kinds of complications that can come in an actual field experiment. 
Jon Krohn:
Cool. All right. So, it seems like you have a pretty amazing job, Professor Hill. You get to teach brilliant people at NYU. You co-direct the NYU master’s in applied statistics, and you also direct PRIISM, the Center for Practice and Research at the Intersection of Information, Society, and Methodology. Through these different responsibilities at NYU, you get to teach and you get to have a real-world impact through PRIISM. Evidently, you’re a pretty good teacher, because you were awarded the 2021 NYU Distinguished Teaching Award.
So, sounds like an amazing job.
Then, in addition to the teaching, you get to think deeply about these thorny research questions, which have a real-world impact. You’re not doing pure math and studying the shape of donuts, or whatever it is that pure mathematicians do. You’re tackling problems that can genuinely save lives or improve educational outcomes for young people. Some of the most impactful policy decisions are informed by the kind of research that you do. So, I was wondering if you could give us, and I’m sure a lot of days are very different, just a sense of what your role is like day to day.
Jennifer Hill:
Yeah. That’s a hard one. Day to day, I would say I answer way too many emails. For anyone to whom I owe an email right now, I’m sorry that I didn’t answer yours because I also ignore too many emails and sometimes it takes two or three times. 
Jon Krohn:
What can you do? You’re only one person. 
Jennifer Hill:
Yeah. So, I would say a big chunk of my work is administrative, because of directing the program and co-directing the program and directing the center, and that’s a lot of supporting other people. Most people hate administrative stuff, and I get it, a lot of it is tedious. But I would say that, at times, it’s also the most rewarding part, because I get to think really hard about what classes our students should be taking and help design what a pathway through a master’s program looks like. I know people care a lot about PhDs, and great, PhDs are great, but there are a lot more master’s students in the world than PhD students, and the master’s students tend to get ignored. So, I actually think that thinking really hard about how to train great master’s students is important.
I do a lot of talking to employers about what they want and have come to understand that a lot of the skills they need are not at all what we teach.
So, I’ve heard from too many employers that students come in from data science or stats programs thinking they’re going to solve the problems of the world with their algorithm, and with a lot of hubris. What they really need to do is learn how to communicate, document their code, and understand how to meet the needs of different stakeholders. It’s all that kind of relationship building and soft skills, and understanding the world around you in a more humble and respectful way. So, we actually try to incorporate that a lot into our master’s programs.
We’ve got courses on ethics, and we’ve got a course called Data Science for Social Impact, which is all about how you create partnerships with agencies on the ground to do research and what that means in terms of designing … everything from designing a research question, to measurement, to design of experiments and quasi-experimental designs, to disseminating results at the end, etc.
So, I really enjoy that piece. I also love teaching. I don’t like making slides. I don’t like grading. But I like teaching and I love interacting with students and just understanding how they see the world. I think a lot about how do I make it easy for this person to learn that thing? It’s always hard. It’s always challenging. It’s always new. So, there’s [inaudible]. 
Jon Krohn:
As we’ve experienced on the show, you’re an excellent communicator. So, I’m not surprised you won the teaching award. I’m not surprised that you enjoy the communication part of it. So, thank you very much for that. My next question was going to be about what tools you think are most important for data scientists or people entering the field. But I think you might have just reeled a number of them off, things like communication, humility, ethics, having an understanding of the world. Am I right? 
Jennifer Hill:
Yep. Absolutely. Absolutely. 
Jon Krohn:
Nice.
Jennifer Hill:
I’ll just add one thing- 
Jon Krohn:
Yep. Please. 
Jennifer Hill:
… that communication does not mean communicating to technical people. Communicating means can you explain this to your grandmother? If you can’t communicate what you’re doing in a way that non-technical people can understand, you don’t actually understand what you’re doing. So, it’s a hugely important skill. 
Jon Krohn:
Nicely said. Actually, the most recent guest on the show, Kian Katanforoosh, said the same thing when I asked him the number one thing he looks for in the people he hires: it was this ability to communicate with non-technical people. Awesome. All right. So, we’re getting near the end of the episode. Jennifer, do you have a book recommendation for us?
Jennifer Hill:
So, I read a ton, both non-fiction and fiction, though mostly fiction, so it was very hard for me to narrow it down. Instead of trying to pick a favorite book, I thought maybe a better thing to do, something at the intersection of relaxing, fun, enjoyable, nourishing, and work: if you want to learn more about causal inference but don’t want to read a textbook, there are about a million books out there, and that’s a conservative estimate, that have either alternate universes or people going back in time and changing things. So, two that come to mind: one is The Anomaly by Hervé Le Tellier, which is a recent book, and the other is The Midnight Library. I’m not going to even tell you what they’re about; you can Google them. Both play with this idea of what if things had worked out a different way? There’s also the movie Sliding Doors from a million years ago, and other movies that do the same thing. But read yourself a good alternate-universe book and start thinking about the world that might have been or-
Jon Krohn:
Counterfactuals. 
Jennifer Hill:
… start thinking about counterfactuals. Exactly. 
Jon Krohn:
Yeah. 
Jennifer Hill:
Exactly. 
Jon Krohn:
Nice. So, Anomaly or Midnight Library as your book recommendations. 
Jennifer Hill:
Yep. 
Jon Krohn:
And then Sliding Doors for the film. 
Jennifer Hill:
Yep. 
Jon Krohn:
Love it. All right. So, I know that you aren’t personally the biggest social media user or poster. Got too many great books to read, too many students to teach, too many books to write. So, how could people follow you? Maybe not completely directly like on a LinkedIn page, but how can people keep up with your work anyway? 
Jennifer Hill:
Yeah. Well, the most important thing that I would want people to follow right now is the development of Think Causal. We’re also interested in thought partners. Anyone who wants to do research with us on it, help with development or testing, run a randomized experiment within your class, or just use it in your class or whatever, we’d love to hear from you. So, if you go to my NYU webpage, there are three links to it in different places, but if you just click on Webpage, it’ll bring you there. It’s also mentioned in my bio, so you can click there.
Jon Krohn:
[inaudible]. 
Jennifer Hill:
The other thing is I would say … Okay. Both the master’s program I run and the applied stats research center are on various social media, including Twitter. So, you could follow those there. 
Jon Krohn:
Nice. All right. 
Jennifer Hill:
But then- 
Jon Krohn:
That sounds great, Professor Hill. Yeah? 
Jennifer Hill:
No, sorry. Sorry. I was going to say, and then get off the social media and go make some music or art or go out in nature. [inaudible]. 
Jon Krohn:
Yeah. I couldn’t agree more. As a podcast host, I de facto have to spend some part of my time doing these social media posts, like letting people know last week that you were going to be an upcoming guest. But it is not the richest part of my day. Actually, it’s amazing. I love that I’m able to interact with the audience, and I love when you guys comment on posts; it provides me with so much insight into what I could be doing with the show or questions that I could be asking guests. I absolutely love that, but I couldn’t agree with you more that, when I’m on my deathbed, it probably won’t be the first thing I look back on as the richest part of my life.
Jennifer Hill:
Makes sense. 
Jon Krohn:
All right. So, thank you so much, Professor Hill, for being on the show. It has been, like I said at the very outset, a tremendous honor for me personally to be able to meet somebody I have idolized for 15 years. It’s been amazing to have this conversation with you. Thank you so much for the deeply insightful answers and for generally making it a good time, and yeah, I hope to catch up with you again soon.
Jennifer Hill:
Thank you so much. It’s been an honor and I really enjoyed it. You made it fun. So, I really appreciate you asking me and having such a great conversation.
Jon Krohn:
Wow. What a trip for me to be able to have such a fun, informative conversation with someone I’ve revered for the past 15 years. I hope you enjoyed and learned a lot from Professor Hill as well. In the episode, Jennifer filled us in on how we must be clear about any assumptions we are making when we think we’ve observed a causal effect, and how there are no assumption-free causal tools for analyzing existing data. The only way to be fully confident in a causal effect is with good experimental design upfront, such as a randomized control trial. However, tools like BART, short for Bayesian additive regression trees, can work to analyze data retrospectively anyway, and BART is Professor Hill’s preferred causal inference tool. Then she talked about how her new Think Causal application enables you to learn about causality interactively yourself, as well as to make causal inferences on your data without needing to write a line of code.
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for social media profiles associated with Professor Hill’s work, as well as my own social media profiles at www.superdatascience.com/607. That’s www.superdatascience.com/607. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show.
Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you, and thanks, of course, to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng, and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another exquisite episode for us today. Keep on rocking it out there, folks, and I’m looking forward to enjoying another round of The SuperDataScience Podcast with you very soon. 