SDS 613: Causal Machine Learning

Podcast Guest: Emre Kiciman

September 27, 2022

We hope you’re ready for a jam-packed episode! This week, it’s all about causal machine learning as we welcome Dr. Emre Kiciman, Senior Principal Researcher at Microsoft Research to the show. Gear up for an in-depth episode that dives into exciting real-world applications of causal machine learning, the four key steps of causal inference, how these impact ML and so much more. Listen in and share your feedback with us!

Thanks to our Sponsors: 
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
About Emre Kiciman
Emre Kiciman is a Senior Principal Researcher at Microsoft Research, where his research interests span causal inference, machine learning, and AI’s implications for people and society. Emre is a co-founder of the DoWhy library for causal machine learning. He received his PhD in Computer Science from Stanford University.
Overview
After speaking with Jennifer Hill last month on the topic of causality, we’re revisiting this field again; this time, however, the episode tackles machine learning too. This week, Jon Krohn welcomes Dr. Emre Kiciman, a world leader in applied causal research, to the podcast to discuss causal machine learning.
As we do in nearly all of our episodes, the conversation starts by covering the basics: what is causal machine learning? Emre explains that causal ML is the intersection of causal methods with machine learning. Causal methods can be applied to machine learning, but machine learning can also be applied on top of causal methods to process much larger datasets.
But how does causal machine learning differ from correlational machine learning that data scientists are already largely familiar with? Causal machine learning goes beyond the data and considers a larger representation of domain knowledge, or causal assumptions, and leverages this information to guide machine learning.
When it comes to the four key steps of causal inference, Emre succinctly summarizes them as follows:
  1. Modeling your assumptions: create a graph that maps the potential causal relationships among your variables.
  2. Identification: given a question that you want to answer, establish a strategy to calculate the effect of A on B.
  3. Estimation: statistical estimation based on the approaches that you identified in the previous step.
  4. Validation and refutation: review all of your assumptions in order to build more confidence in the conclusions you’ve drawn. (A short code sketch just after this list shows how these steps map onto the DoWhy library.)
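For readers who want to see what these four steps look like in code, here is a minimal sketch using the DoWhy library that Emre discusses later in the episode. The toy dataset, variable names, causal graph, and the particular estimator and refuter chosen here are illustrative assumptions on our part, and exact API details can differ across DoWhy versions:

import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data: a confounder Z drives both the treatment A and the outcome B.
rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)
a = (z + rng.normal(size=n) > 0).astype(int)        # treatment depends on Z
b = 2.0 * a + 1.5 * z + rng.normal(size=n)          # true effect of A on B is 2.0
df = pd.DataFrame({"Z": z, "A": a, "B": b})

# Step 1 (model assumptions): encode the causal graph, here as a GML string.
causal_graph = ('graph [ directed 1 '
                'node [ id "Z" label "Z" ] node [ id "A" label "A" ] node [ id "B" label "B" ] '
                'edge [ source "Z" target "A" ] edge [ source "Z" target "B" ] edge [ source "A" target "B" ] ]')
model = CausalModel(data=df, treatment="A", outcome="B", graph=causal_graph)

# Step 2 (identification): derive a strategy; here, backdoor adjustment on Z.
estimand = model.identify_effect()

# Step 3 (estimation): plug a statistical estimator into that strategy.
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print("Estimated effect of A on B:", estimate.value)   # should land near 2.0

# Step 4 (validation and refutation): stress-test the assumptions.
refutation = model.refute_estimate(estimand, estimate, method_name="random_common_cause")
print(refutation)

The point of this scaffolding is that the graph, the identified estimand, and the refutation results all travel together, so the assumptions behind the final number stay visible.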
By moving over to causal ML models, we’ll be able to apply ML to decision-making applications much more robustly. There are, however, downsides to causal methods, and one limitation Emre highlights in particular is that they cannot yet handle unstructured data such as text and video well.
Tune in for far more insights from Emre, including the real-world applications of causal methods, the key skills he looks for when hiring at Microsoft, and the tools he uses most.

In this episode you will learn:  

  • What is causal machine learning? [5:52]
  • Causal machine learning vs correlational machine learning [10:10]
  • Emre’s DoWhy open-source library [16:17]
  • The four key steps of causal inference [21:24]
  • How and why Emre’s key steps of causal inference will impact ML [26:36]
  • Emre’s thoughts on the future of causal inference and AGI [34:09]
  • How Emre leverages social media data to solve social problems [38:36]
  • What’s next for Emre’s research [46:02]
  • The software tools Emre highly recommends [55:16]
  • What he looks for in the data science researchers he hires [58:45]

Podcast Transcript

Jon Krohn:

This is episode number 613 with Dr. Emre Kiciman, senior principal researcher at Microsoft Research. Today’s episode is brought to you by Datalore, the collaborative data science platform, and by Zencastr, the easiest way to make high-quality podcasts. 
Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. Now, let’s make the complex simple.
Welcome back to the SuperDataScience Podcast. We’ve got a special episode for you today on causal machine learning with a world-leading applied causal research expert, Dr. Emre Kiciman. Emre has worked within the prestigious Microsoft Research organization for over 17 years, currently holding the position of senior principal researcher. In that role, he leads Microsoft’s research on causal machine learning, including leading development of the DoWhy open source causal modeling library for Python, and pioneering the use of social media data to answer causal questions in the social sciences, such as with respect to physical and mental health.
He’s published over 100 papers, and his research has been cited over 8,000 times. He holds a PhD in computer science from Stanford University. Today’s episode is relatively technical, so it will probably appeal primarily to folks with technical backgrounds, like data scientists, machine learning engineers, and software developers.
In this episode, Emre details what causal machine learning is, and how it’s different from the correlational machine learning that most data scientists are already familiar with.
He talks about the four key steps of causal inference and how they impact machine learning, the types of data that are most amenable to causal methods, and those that aren’t yet, but maybe soon. He talks about the exciting real-world applications of causal ML, the software tools he most highly recommends, and what he looks for in the data science researchers he hires. All right, are you ready for this sumptuous episode? Let’s go.
Emre, welcome to the SuperDataScience Podcast. I’m excited to have you on the show. Where in the world are you calling in from?
Emre Kiciman:
I am calling in from Seattle. 
Jon Krohn:
All right. I guess that should be unsurprising given your 17 plus years at Microsoft, that is the Microsoft hub. Have you always been there in Seattle while you’re working at Microsoft? 
Emre Kiciman:
Yes. I came here from Stanford where I did my PhD, and then came up here to work at Microsoft Research. 
Jon Krohn:
Nice. It must have been… I mean, we could end up talking about this for a long time, but in that 17 years, I imagine Microsoft has changed a ton. 
Emre Kiciman:
No. No. It’s really been the same company. It’s changed unbelievably. Microsoft Research has always been a really super fun place to go, but the broader cultural change of Microsoft is it’s really been astounding to see. 
Jon Krohn:
Wow. We know each other through Sarah Catanzaro. Sarah was our guest on the podcast in episode 601. She did a brilliant episode on venture capital funding for data science companies, both from the perspective of a data scientist who is thinking about getting funding for some idea you have. It talks about that. It also gives some insight into how you could become a venture capitalist, or what kinds of decisions might go into making high-quality investments in data science companies. Really great episode, amazing energy that Sarah has. I do hope that we have her on the show again soon.
I don’t know how you know her, but I asked her at the end of the show, like, “Are there people that you think would be incredible guests on the show?” She recommended you, and I was 100% on board right away. People are really excited about you being on the show. As with many guests, a week before filming, I posted on LinkedIn and Twitter that you would be coming on the show, and we had a huge amount of engagement. I think you might have a record number of comments of people saying, “I can’t wait for this episode. Emre’s the best,” so really excited to get this episode out. Anyway, how do you know Sarah? 
Emre Kiciman:
I’m excited to be here. That’s super kind of those commenters. How do I know Sarah? She reached out… I think we’ve crossed paths a couple of times. She’s reached out over the years to ask about topics where I’m doing research. I’ve been interested in always getting her perspective on what Silicon Valley and what VCs think about where technology is going, where challenges are.
Jon Krohn:
Nice. That sounds like a great partnership. Awesome. So speaking of your expertise, a key expertise of yours is causal machine learning. Now, we actually already recently had an episode on causal inference with a luminary in the causal inference field, Professor Jennifer Hill at New York University. That’s episode 607. Now, you’re here to talk about a specific… You can correct me if I’m wrong on this area, but I think I could describe it as a specific sub category, a subfield of causal inference, and it’s causal machine learning, which is something that we barely touched on in Jennifer Hill’s episode. At Microsoft, you lead initiatives related to causal machine learning, notably the PyWhy open source library. Maybe you could tell us about what causal machine learning is at a high level. 
Emre Kiciman:
Causal machine learning is the intersection of causal methods with machine learning. Conventional machine learning methods, they look for patterns in data to make predictions, to classify things, and they always find patterns in data, even if they are not real, if they’re spurious patterns. Now, the problem then is that the machine learning is usually assuming that those patterns are the same in its training data and in its deployed environment. What this means is that if those patterns change, for whatever reason, the machine learning models fail.
This can happen for a number of reasons, distribution shift, et cetera. The way that causal machine learning helps is that it uses domain knowledge, causal assumptions about the underlying mechanisms of a system to guide the machine learning models to pay attention to the cause and effect relationships, only the right patterns. 
Those cause and effect relationships, because they’re more fundamental to the mechanisms that govern the system or the data generating process, they’re more stable.
Even when exogenous factors, other things surrounding your system, change, your model of the internal mechanisms, the endogenous mechanisms, still remains. Even if one of these mechanisms changes, the others are more likely to stay the same as well. So, causal machine learning is the area where we are using causality to help machine learning, and then also where we’re using machine learning methods to help causal methods, for example, to help them apply to high-dimensional data, unstructured text, images, things like that. 
Jon Krohn:
Super interesting. It sounds like this is a blend of different approaches. Causal inference is typically under the purview of statistics and econometrics, whereas machine learning comes from a different lineage. Machine learning grew out of computer science really, and this idea of working with very large data sets, whereas the fields that causal inference techniques came out of, statistics and econometrics, typically deal with smaller data sets and worry about things like statistical significance.
This does sound like a really interesting blend, and it’s nice to hear from you that it works both ways, that both causal methods can be applied to machine learning to allow us to draw conclusions that otherwise wouldn’t be possible, while it also goes the other way where we can use machine learning on top of causal methods to be dealing with much larger data sets than maybe traditional causal approaches would work with. 
Emre Kiciman:
Causal methods in general have developed across a huge number of communities, almost independently. We have the computer science approaches, like Judea Pearl’s approach to modeling causal graphs and doing causal reasoning, coming from one part of computer science. We have statisticians and econometricians doing potential outcomes. We have people in the health field. We have people in genetics, who are able to make much more structured and stronger assumptions than we can in many other domains, developing specialized methods.
Now, what we’re seeing is we’re seeing all these areas start to come together a lot more. We’re seeing a lot more conversations across communities, and a lot more thought going into how we can start to use methods together. 
Jon Krohn:
Awesome. How are causal machine learning methods fundamentally different from correlational machine learning methods that most data scientists are already familiar with? So in data science, a very common method would be linear regression. Today, we have a lot of interest in deep learning algorithms. The way that those algorithms work out of the box, all that they can do is identify correlations between variable X and variable Y. These can be non-linear relationships, but a deep learning algorithm has no sense of causal direction.
Does X cause the variation in Y, or does Y cause the variation in X? I don’t know if you’re able to explain in a podcast without visuals or without going into mathematical equations, but what are we able to do with causal machine learning approaches to take some information like that to say, “Our linear regression model or our deep learning model suggests this strong correlation between X and Y. What can then we do on top of that to infer causal direction, and say X is causing Y?” 
Emre Kiciman:
I’ll try and go at a very high level and then go down at least one more level after that. So at a very high level, the difference between causal machine learning and conventional machine learning is that causal machine learning does not just look at the data. It takes a representation of domain knowledge or causal assumptions, and uses that to guide what the machine learning should be doing. So in your episode with Jennifer Hill, she mentioned that if anyone comes to you, and says that they have a method to get at causality just from data, an assumption-free method, I think she said, don’t believe them because you can’t.
We’re in that same boat. We’re playing under the same restrictions. In order to get at cause and effect relationships with machine learning, we need to bring in assumptions, and we encode those assumptions so that we can reason over them. 
We use, for example, Pearlian approaches. That then tells us… That then gives us that key difference. Going down one level, for example, if we want to know how much some treatment A influences some outcome B, and we know that there’s a confounder, we know that the confounder influences both of these other variables, both the treatment and the outcome.
Then we can condition on that variable, or we can use any number of methods to essentially match and create something that’s the equivalent of an RCT on the difference in A, and then measure from that. 
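To make the conditioning step Emre describes concrete, here is the standard backdoor adjustment for a single observed confounder Z of treatment A and outcome B, a textbook formula from the causal inference literature rather than one stated verbatim in the episode:

P\left(B \mid \mathrm{do}(A = a)\right) \;=\; \sum_{z} P\left(B \mid A = a,\, Z = z\right)\, P(Z = z)

Averaging the outcome within strata of Z and then weighting by how common each stratum is mimics what randomizing A would have achieved, provided Z really does capture all of the confounding.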
Jon Krohn:
The randomized control trial. Something that came up with Professor Hill in her episode was this idea that the only way to be 100% sure that variable X is causing variable Y is in a randomized control trial, where you are controlling variable X. What you’re saying is that there are situations where we can use conditioning on variables to make the assumption that X is causing Y. 
Emre Kiciman:
That’s right. It is an assumption that we have to assume that we’re able to condition on all the necessary variables. An RCT’s benefit is that because you’re flipping a coin, you know that nothing else is influencing the treatment likelihood. 
Jon Krohn:
Right. Right. Right. When we’re making the assumption, I guess the kind of thing… There could be situations where there is some unmeasured extra variable that unexpectedly is causing the impact in both X and Y. That’s why things like domain knowledge are critical to being able to say, “Based on our understanding of how this phenomenon works, it is unlikely that there is some third variable that we haven’t accounted for.” Is that how it works? 
Emre Kiciman:
Yes. You can do sensitivity analysis and other things to get a better understanding of whether there might be other confounders, and if so, how strong that confounder has to be to mess up your conclusions. But that’s the essence of it is that we’re making these assumptions. That’s driving how we condition or what types of constraints we put on our loss functions, for example. 
Jon Krohn:
Today’s show is brought to you by Datalore, the collaborative data science platform by JetBrains. Datalore brings together three big pieces of functionality. First, it offers data science with a first-class Jupyter-notebook coding experience in all the key data science languages, Python, SQL, R, and Scala. Second, Datalore provides modern business intelligence with interactive data apps and easy ways to share your BI insights with stakeholders. And third, Datalore facilitates team productivity with live collaboration on notebooks and powerful no-code automations. To boot, with Datalore, you can do all this online in your private cloud or even on-prem. Register at datalore.online/SDS, and use the code SUPERDS for a free month of Datalore Pro, and the code SUPERDS5 for a 5% discount on the Datalore Enterprise plan.
Nice. Tell us about this PyWhy open source library, this initiative that you’ve been leading for causal machine learning. What is the gap that this open source library fills that other existing tools didn’t? 
Emre Kiciman:
When we… The DoWhy library started as a project together with Amit Sharma. He’s been actually… If I had to pick one of us as the major driver, he’d be the major driver of the library. 
Jon Krohn:
This is a name that is going to come up many times in this episode. I know from our conversations beforehand already. Amit Sharma, he’s big in this space. Just also really quickly to clarify here, there’s this PyWhy and also DoWhy. I’ve seen in different places, they’ve both been referred to as the open source library, but maybe you can explain that a little bit better. It’s something like the PyWhy is the project name in GitHub. Then DoWhy is the name of the specific library or something like that. 
Emre Kiciman:
That’s right. DoWhy is the name of the repository and the library. PyWhy is then the GitHub org that we’ve put around it, and the idea is that over time, we expect there to be additional repositories, different kinds of libraries, coming in to support the broader effort of creating an ecosystem of causal tooling for the community. 
Jon Krohn:
Very cool. Anyway, before I started interrupting you with those clarifications, you were telling us about DoWhy and how Amit Sharma is maybe the primary driving force behind that library, but you certainly also play a leadership role. 
Emre Kiciman:
Yeah. We started that library because we wanted to teach people how to use causal inference. When we looked around at other code that was available to help with causal effect inference, we saw that almost all the code was narrowly focused on the statistical methods. They implemented a statistical method, and that was critical. I mean, it’s complicated code. It’s good to have that implementation already. But when we spoke to people who were trying to use these libraries, or trying to get into causal inference, they weren’t falling over or having trouble because of the statistical estimation method.
They were having trouble trying to map their problem to a causal framing. They were having trouble tracking their assumptions, and really understanding what they were assuming when they used a particular conditioning set, a particular identification algorithm, a particular estimation method. Really, because causal effect inference is trying to estimate a question whose answer can’t be directly observed, we don’t have any ground truth that gives us confidence that we’re doing the right thing. We’re trying to see the difference… 
When we talk about effect inference, we’re trying to see the difference between what happens if we do an action versus don’t? In the real world, we only ever get to see one of those. We can’t see both, which means we can’t actually measure the difference, which means that when our effect inference algorithm says the difference is two or three or something, we don’t know if it’s right. We have to trust it. To trust it, we need to rely on our assumptions, and validate them as much as we can.
So the purpose of DoWhy, long story short, was to create that scaffolding across the end-to-end process. We had some very simple algorithms for estimation. We call out to other libraries for the more complicated, more sophisticated methods, but we really provide that end-to-end scaffolding for thinking about your assumptions, reasoning about them. Then after you’ve done your statistical estimation, validating them, refuting them, running sensitivity analyses. 
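To spell out the “we only ever get to see one of those” point, here is the standard potential-outcomes notation; again, this is textbook notation rather than something written out in the episode:

\tau_i \;=\; Y_i(1) - Y_i(0), \qquad \text{but for each unit } i \text{ only one of } Y_i(1) \text{ and } Y_i(0) \text{ is ever observed}

\mathrm{ATE} \;=\; \mathbb{E}\left[\, Y(1) - Y(0) \,\right]

Because the individual-level difference can never be measured directly, identification strategies re-express averages like the ATE in terms of observable quantities, and those re-expressions only hold under the stated assumptions, which is why the refutation step matters.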
Jon Krohn:
Perfect. It actually ends up being a good library for people to start with if they don’t already have familiarity with causal inference, it sounds like.
Emre Kiciman:
Yes. We really try and step people through the process. 
Jon Krohn:
Excellent. Then I’m sure there are lots of listeners scribbling that down. Hopefully not while they’re driving to work. We’ll be sure to include links to the DoWhy open source library, part of this broader PyWhy GitHub organization in the show notes, for sure. I have a feeling that this library is going to come up again in this episode. In previous talks that you’ve given, so there is a particular talk that you gave with Amit Sharma, whom you mentioned earlier.
The two of you gave a talk in 2021 last year on the foundations of causal inference and its impact on machine learning. This talk is a bit over an hour long, and it’s tremendously popular. So at the time of recording, it has over 13,000 views, which is amazing for such a technical topic. In that lecture, you provide four key steps of causal inference. Could you elaborate on those four key steps for our listeners on the podcast? 
Emre Kiciman:
Of course. The four key steps, so it’s modeling your assumptions, identification, estimation, and then refutation. The first one, assumptions, is about capturing your knowledge about the system. That’s going to guide the rest of the analysis. Right now, the most practical way to do this is to draw out a causal graph, where each of your features is a node in a graph structure, and an arrow says that this feature might influence this other feature.
Each of these arrows is a potential cause-and-effect relationship. Once you’ve captured that, the second step is identification. This is now, given a question that we want to answer, what is the effect of A on B? Identification is about finding the strategy to calculate that effect of A on B from your observed data. So, given the graph, the identification algorithm will read off the potential confounders. Depending on whether you want the direct estimate or the total… sorry, the direct effect or the total effect, it’ll take care of mediators and will give you the causal estimate that you care about. 
Jon Krohn:
Cool. 
Emre Kiciman:
The estimation step is about statistical estimation. There are a large number of these methods, each one of them appropriate for different scales of data and different kinds of data, like categorical, binary, continuous, et cetera. Then validation and refutation is where we go back to all the assumptions we’ve made through this process. What did we declare when we were modeling the assumptions? What did our identification algorithm assume? What did our statistical estimator require?
It’s just keeping track of these and making sure that it’s easy for you to test the ones that are testable, to refute the ones that maybe you can’t validate, but you could prove them wrong if you see the right signals in your data, or to run a sensitivity analysis where you say, “Hypothetically, if there is, say, an unobserved confounder, how big does that have to be to mess up my results, to reverse my outcome, for example?” All of this now is intended to give you better confidence in the output of your whole analysis process. 
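As a rough illustration of the kind of sensitivity analysis Emre describes, here is a hand-rolled sketch rather than DoWhy’s built-in refuters; the simulation setup and numbers are assumptions made purely for illustration. It asks how badly a naive adjusted estimate drifts as an unobserved confounder of increasing strength is left out:

import numpy as np

rng = np.random.default_rng(1)
n = 20_000

def naive_effect(strength_u):
    """Estimate the effect of A on B by linear regression, adjusting for the
    observed confounder Z only, while an unobserved confounder U of the given
    strength also drives both A and B. The true effect of A on B is 2.0."""
    z = rng.normal(size=n)
    u = rng.normal(size=n)                                   # unobserved confounder
    a = (z + strength_u * u + rng.normal(size=n) > 0).astype(float)
    b = 2.0 * a + 1.5 * z + strength_u * u + rng.normal(size=n)
    X = np.column_stack([np.ones(n), a, z])                  # note: U is left out
    coef, *_ = np.linalg.lstsq(X, b, rcond=None)
    return coef[1]                                           # coefficient on A

for s in [0.0, 0.5, 1.0, 2.0]:
    print(f"unobserved-confounder strength {s}: naive estimate {naive_effect(s):.2f} (truth 2.0)")

Watching how quickly the estimate moves away from the truth as the hypothetical confounder strengthens gives a feel for how fragile, or robust, a conclusion is to the no-unobserved-confounders assumption.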
Jon Krohn:
Nice. Let me try to restate those back to you in my own words, to make sure that I, and therefore hopefully many of our audience members, get it. The four steps of causal inference are, first, modeling our assumptions. We create a graph that maps the potential causal relationships, any confounding variables, and how they could relate to each other. The second step is identification: actually seeing, in that map, whether there is a causal impact of A on B. I’m probably not describing that very well.
Emre Kiciman:
It’s basically what’s going to be your strategy. Do I use instrumental variables? Do I condition on some variables?
Jon Krohn:
Right. 
Emre Kiciman:
There’s multiple ways of getting at how do I calculate the effect of A on B? Sometimes, the answer is going to be you can’t. You’re not observing the variables you need to observe in order to really get at the result, the impact of A on B. 
Jon Krohn:
Excellent. Now, I understand that step better. Then in step three, that’s where we can actually do some statistical estimation based on the approaches that we identified in step two. It sounds like there are a lot of different potential methods that we could use there, depending on the situation, depending on the size of the data set. Then finally in step four, we have this refutation and validation step where we double-check assumptions that we’ve made, and results that we have, so that we have more confidence in the conclusions that we draw. 
Emre Kiciman:
That’s right. 
Jon Krohn:
Cool. That sounds crystal clear to me now. Brilliant. How and why will these four steps of causal inference impact machine learning? It seems like… Maybe this is part of why you’re so interested in this field in general. It seems to me like there’s a lot of discussion now about causal machine learning, but we’re still in the infancy of it being applied. Again, how and why do you think these four key steps of causal inference will start to impact machine learning more and more? 
Emre Kiciman:
Well, it’s… Stepping back, when we talk about conventional machine learning, we talked about how it looks for patterns, and sometimes those patterns fail or change, and then the machine learning model fails. The place where that’s most pernicious, I think, is when you use machine learning models to help drive decision-making applications. The reason is that oftentimes, the machine learning model drives a change in a decision-making policy.
That decision-making policy changes the environment, changes the data distribution of the data patterns that the machine learning model then sees the next time around, which means that now your machine learning model is probably going to fail because of that data distribution shift caused by your new decisions and actions. So from an application point of view, where can we use machine learning? My hope is that by moving over to causal ML models that are focused on the cause-and-effect relationships, we’ll now be able to apply ML to these decision-making situations much more robustly.
Jon Krohn:
Cool. That sounds exciting. Are there any downsides to moving to causal methods, or is it all… Is it going to be useful information every single time that we start applying causal methods?
Emre Kiciman:
I think that there are some situations where you would want to use a conventional machine learning approach. Obviously, causal ML, like you said, is still quite new, so there are a lot of basic capabilities, like the general ability to handle unstructured text and video, that are just much more advanced in conventional ML, right? Those are not yet brought back and integrated with causality. If that’s the functionality you want, then causal ML isn’t quite ready for you yet. Maybe soon.
The other downside is if you know that you are in a scenario where your data distribution is not going to change, then causal machine learning methods are probably going to leave some information on the table, which means that you’d expect… Because the pattern, even if it’s spurious, it’s there for some reason. If nothing is changing, that pattern is going to still give you some predictive value. If that’s what you care about, then causal ML is not going to exploit everything that the data has to offer. 
Jon Krohn:
Gotcha.
Trying to create studio-quality podcast episodes remotely used to be a big challenge for us with lots of separate applications involved. So when I took over as host of SuperDataScience, I immediately switched us to recording with Zencastr. Zencastr not only dramatically simplified the recording process, we now just use one simple web app. It also dramatically increased the quality of our recordings. Zencastr records lossless audio in up to 4K video, and then asynchronously uploads these flawless media files to the cloud. This means that internet hiccups have zero impact on the finished product that you enjoy. To have recordings as high quality as SuperDataScience yourself, go to zencastr.com/pricing, and use the code SDS to get 30% off your first three months of Zencastr Professional. It’s time for you to share your story.
But there’s nothing… I guess, the kinds of risks associated with applying causal machine learning methods to the kinds of data, so let’s say we have structured… We have tabular data, which is probably the most well-trodden road for causal methods. We have the structured table of information, probably the only… Well, I mean, correct me if I’m wrong, but it seems to me like the only major risk to using causal inference tools in that kind of situation comes from human error, from misinterpretation.
It doesn’t seem like there’s any downside to trying out causal methods with any given data set. It’s just that you could end up in situations where these tools get misused. People are making causal assumptions in situations where they ought not to be, and they’re not aware of it. 
Emre Kiciman:
I mean, if you give the system the wrong causal assumptions, if you mess up your arrows, you say, “Trust me, there’s no confounders,” when there are, you can get the wrong results. So, we do have a heavy dependence on some level of domain knowledge. We’re trying to… That’s one of the challenges right now of using these tools, and that’s one of the frontiers on our research agenda is to search out new sources of domain knowledge, and support people who are trying to put that into the system. 
Jon Krohn:
Nice. Now, this isn’t a question that we talked about before we started recording, but it just occurred to me as potentially a very interesting question to ask you. There is a lot of hype around the idea that in our lifetimes, we could have artificial general intelligence, an algorithm that would have all the learning capabilities of an adult human. One of the biggest… There are countless barriers that we understand to attaining AGI, and then there are surely countless more that we haven’t even thought of, because the stepping stones aren’t in place to even understand what those limitations are yet.
But one of the key stepping stones that we are aware of is this causal inference issue. So as we’ve already discussed on the show, the majority of machine learning or AI approaches today are investigating correlations only. 
So when you mention deep learning, the idea of deep learning to a lay person, or somebody who’s maybe an expert in some specific field, maybe they’re an expert medical researcher, and so they’re a smart person. They’ve heard things about deep learning.
I think when people hear about that, one of the conclusions that their minds automatically jump to is that there’s some deeper reasoning happening here, that if this is deep thinking that it’s doing some kind of… that it’s going to somehow be able to come up with some kind of understanding of the world, some causal direction between variables, but there isn’t that at all. Deep learning is one of the most advanced AI approaches we have today, and it has absolutely no clue whatsoever on causal direction. It is only a tool for identifying correlations between variables.
The question that I’m working my way towards here is it seems like tools… 
We’ve been talking about in this episode so far today that all of the tools for causal machine learning today require humans to be making causal assumptions, and setting up the graph architecture for how variables could be related to each other. So from your deep understanding of causal machine learning, do you see over the coming decades some pathway towards complete automation of these kinds of causal assumptions being made, or is that such a hard problem that we might not see it in our lifetimes? 
Emre Kiciman:
I see a few avenues for supporting people in creating this domain knowledge. One of them is we could encode the domain knowledge for areas of interest, and share it, right? We don’t necessarily have to have everyone encode it for each individual problem. There might be ways of finding abstractions that let us share this across people working in health or in cancer or in climate change, et cetera. That’s one thing that might help reduce that burden on the human.
The second thing is, I think, that there are approaches to do causal discovery through experimentation. You can have reinforcement learning and actively trying to probe a system to get that causality. This is bringing in knowledge through interaction with the world. 
Jon Krohn:
Right. 
Emre Kiciman:
That’s feasible in some areas, but not others. You talk about how the deep neural network, despite all of the incredibly complex tasks that it’s achieving, is just looking for statistical patterns. At least right now, when we talk about adding causality and integrating this domain knowledge into those algorithms, we’re still talking about just enforcing additional constraints on the patterns that we look for. So, we enforce statistical independence constraints that are implied by the causal graph and our domain knowledge.
We’re not necessarily adding any real magic to the underlying mechanisms there. I’m not sure that’s going to [inaudible]. 
Jon Krohn:
I like that phrase. That’s real magic. 
Emre Kiciman:
A real magic. 
Jon Krohn:
I mean, it goes to show… Going back to that word hype, it is interesting to me how often people talk about AGI, and how people who are, like I said, this medical researcher, or just other interested people, technologists who aren’t deep in the weeds like you are on causal machine learning problems or on big-picture AI problems, reasoning problems, read so much about the advances we’re making. It’s probably easy for them to think, “We’re just a few steps away from having these systems that could infer causality on their own, that could reason on their own, that could draw conclusions about the world, cause-and-effect relationships about the world.”
It is amazing having a conversation with somebody like you, and facing the reality of how far we are away from that. It’s exciting to the extent that it means that there’s a lot of great work for you to be doing still in your career. 
There’s an effectively probably infinite amount of work to explore in this causal machine learning space, so that must be exciting.
All right. So beyond just pure causal machine learning research, you have been involved in other research in your career. I suspect some of what I’m going to ask about now is related to causal inference anyway, but you’ve done a lot of papers and patents on understanding social issues through social media data. There are papers that we’ll include in the show notes related to the effects of early college alcohol use.
You pull out patterns in social media data related to this alcohol use, the influence of social pressures on daily activity patterns. You pull this out of rhythms in Twitter data, and you’ve also done research on the flu and separately mental health, also using social media data to come up with some of the conclusions that you draw on that research.
I’m curious about this body of research that you seem fascinated with, this use of social media data. What motivated you to get involved in using those data, and is it related to causal inference in some way?
Emre Kiciman:
It is in many ways. I got into… I’ve changed research interests a few times in my career. I used to be a distributed systems researcher. I’ll skip over how, but I got into social computing and then computational social science, really just being fascinated by how much information about people and important problems that people and societies faced was embedded in social media data and other digital traces. So, I got into the computational social science community, and started seeing all these fascinating questions that people were asking, all motivated by wanting to solve problems, to get real insights that would help us with online mental health issues, with health issues.
The flu example is a study where social data had only a relatively small part to play, but we were looking at how seasonal influenza spreads across the U.S. every year. Why does it start where it starts? Then why does it travel the way it does? Is it airplanes? Is it local travel? Is it weather? What plays the part? We were able to…
Jon Krohn:
That word why.
Emre Kiciman:
The word why. Exactly. 
Jon Krohn:
It seems that’s a recurring theme in these causal inference episodes, as well as in your PyWhy and DoWhy names. 
Emre Kiciman:
Yes. Then when I go to a lot of these conferences, people would be asking these great questions. They’d have this great story about what they were seeing. Why are people making friends? Why are… What’s driving people’s behaviors? Why? 
Jon Krohn:
I like that. I like the idea of that at a computer science conference, why are people making friends? 
Emre Kiciman:
Well, I mean, they’re tying this back to theories about triangle closure and stuff like that from I don’t know how many decades ago in the social sciences. But anyway, then they end their presentation on this really deflated note. Of course, all of our analyses are correlational. Who knows what’s really driving these patterns? We don’t know. It’s like, “Well, that sucks. You’ve got this great, huge data set, great insights into what’s driving. You’ve talked to the domain experts. You’ve got all of this knowledge coming together.”
Then because correlation is not causation, you just have to have this huge caveat that says, “But who knows?” That was really disappointing. I’d heard about… I mean, it was presentation after presentation. I remember this one day when just about every presentation had to have this caveat. I’d heard about this area called causal inference, that there were ways of making causal inferences from observational data. That’s the day, I think, I decided that I needed to go learn more about that, and bring it to the communities that I was a part of then. 
Jon Krohn:
Cool. 
Emre Kiciman:
That triggered a line of work where we were demonstrating the use of these methods to analyze signals that we were pulling out of social media data. The first one was on the… I’m going to mess up the title exactly, but basically, the events that seemed to lead to suicidal ideation in social media forums. We were able to make causal connections between issues that people talked about occurring in their lives, and then later on see them talking about suicidal ideation.
We not only did the causal analysis of the data. We also worked with domain experts to tie this back to theories of suicidal ideation offline, and basically show that the same issues and signals that were believed to exist offline are also showing up online. That was an important and interesting topic to study, partially because of the limits of studying suicidal ideation offline. You can’t talk to everybody unfortunately about what triggered their issues. 
That was the beginning, and there was a thread of work with a close collaborator, Munmun De Choudhury at Georgia Tech, a faculty member there, where we continued looking at mental health issues online with causal methods. That’s her whole research area, so she’s the person to talk to if your listeners are more interested in that. Then I went and wanted to broaden the use of these methods, and so worked with other domain experts on other topics, et cetera. Then eventually, the causal methods themselves, the toolkit, the algorithms, became my primary research focus, more than the computational social science questions.
I still care about them. I still care about the societal implications of AI, but for now, more of my time is going to the causal machine learning itself. 
Jon Krohn:
Nice. So from studying these kinds of social issues, going to these social conferences, and being frustrated by people constantly drawing conclusions, where they had to couch those conclusions and say, “We found this strong correlation, but we can’t be 100% sure about causal direction,” that led you to start examining causal inference techniques a fair bit. Then now, that has really taken your interest, and you’re focused primarily on causal inference as opposed to necessarily that specific application of the social space, though you still have an interest in it. 
Emre Kiciman:
Yeah. 
Jon Krohn:
That’s cool. You’ve had this exciting journey. Now, you talked about it a bit there. So while on the one hand you’ve been in one job for over 17 years, I think your entire post PhD career, you’ve been at Microsoft Research. On the one hand, it sounds like there hasn’t been that much excitement and change, but on the other hand, there has been enormous excitement and change not only in Microsoft, the company, but also in your research interests at Microsoft Research.
You mentioned being in distributed systems, and then there’s fault detection in large scale systems, and these social questions that you’re tackling, and now this focus on causal inference. That is an exciting journey. What’s next for you? Do you have insight into that, maybe some particular research direction within causal machine learning? 
Emre Kiciman:
I think, I’m going to be in causal machine learning for a while yet. I don’t see that going away soon- 
Jon Krohn:
Until AGI. 
Emre Kiciman:
Until AGI. I think what I’m excited about is seeing these methods get broader use. I really think that they are going to have a strong impact on the value of our decision making. If I had to pick application areas where I think I’m most excited about partnerships we have, the things that I’ve been excited about are, I think, we have… We’re seeing broader pickup in industrial usage like industries using this. One that’s a lot of fun these days is actually agriculture.
We have a partnership with the Global Soil Health Program, whose mission is to improve soil management methods by, I think, over 60% of the world’s farmers to make the soil healthier through better sequestration of carbon, and also improve the carbon sequestration for mitigating climate change. The challenge here is that the carbon process in soil is not super well understood. It’s very complex. There are models of it that work in particular regions, but there isn’t a global understanding of the carbon process that works across the world.
Of course, if you’re trying to reach 60% of the world’s farmers, they need that global model. We’re working with them to try to apply causal models, not only to learn from observational data, but also to help direct the gathering of data and experimentation. 
It’s really a broad approach to helping make sure that we can better understand how carbon stays in the soil. I’m playing a supporting role in that as the machine learning person. I’m not the soil person, but that’s certainly been a lot of fun.
Then others, I think other fun areas, impactful areas are health. I think there’s a lot that can be done in improving health, and using causal methods to augment the current randomized control trial based development process for treatments. One that’s been great actually as an accelerator of research is a partnership with our online services. In particular, I do a lot of work with this great team at Microsoft advertising. They have very crisp problem statements, infrastructure data, and ability to run experiments to validate that our methods are doing the right thing, but then also a strong desire to avoid experiments that are expensive, frankly.
That’s actually been a way for us to develop new algorithms that we think are going to be broadly applicable, but then also make sure that they are correct. 
Jon Krohn:
Sounds really cool. The agricultural applications sound particularly fascinating to me. We’ve had other episodes on the show that deal with agriculture. For example, Serg Masis, who is actually the researcher for this program now, so in recent months, he’s been doing an amazing job of researching guests like you before you come on air, and comes up with amazing questions for me to ask, and provides me with a lot of context that he digs up from your papers, patents, talks you’ve given.
Serg is an invaluable contributor to the SuperDataScience Podcast. But before he was doing any of that, he was a guest on the show in episode number 539. He was specifically talking about agricultural data science and climate health, that kind of thing. I can’t wait for him to listen to this episode, and hear about how causal methods could be impacting the work that he’s doing, and then, of course, climate change, so related to that, this idea of carbon sequestration and taking advantage of how we’re managing soil to be fixing carbon from the atmosphere, and having a positive impact on climate change. 
So if listeners are interested in an episode specific to climate change, if that’s something that interests you, I highly recommend episode number 459 with Vince Petaccio, where he reviews the literature and a broad range of applications related to using machine learning to tackle climate change. It sounds really fascinating. Great to hear all these applications of causal machine learning from you. No doubt there will be many more to emerge in the future, especially as we figure out how to apply causal methods to unstructured audio and video, where there’s a huge amount of data, exponentially more data than is available in structured data sets.
Something we haven’t talked about on the show, at least explicitly, maybe you’re aware of applications… As you were discussing some of these applications, maybe implicitly you’re aware of natural language processing data, natural language data. Is that something that we see? We’ve talked about how structured tabular data, that’s the sweet spot for causal methods today, including causal machine learning methods, how unstructured audio and video don’t work that well, but what about unstructured text? 
Emre Kiciman:
Text and audio and video are in the same bucket from the causal perspective. 
Jon Krohn:
Got it. 
Emre Kiciman:
But the way that we’re approaching them, I think the academic community started, a couple of years ago, applying invariance discovery, which has a causal interpretation. It’s basically, we’re not going to rely on patterns that we see shifting across a couple of sample data sets. So even if a pattern looks really strong and great and explains 90%, 95% of what we care about, the fact that it’s varying between 90 and 95, and we don’t know why, means that we don’t really know that maybe one day it wouldn’t drop down to zero. We prefer to find the 70% correlations that are just consistent and not changing.
Now, that direction has built out quite a bit, and now, we’re at a place where we can look at a causal graph, read off the statistical independencies that are implied by that graph, and impose them within your deep learning model to ensure that you’re looking at patterns that are consistent with the causal graph for a particular target that you’re trying to predict, et cetera. I think that this line of work is very promising, is going to continue. I think that’s going to continue and give us ways of applying these causal methods to unstructured data. 
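One simple way to picture what imposing such a constraint within a deep learning model can look like is an invariance-style penalty across data-collection environments, loosely in the spirit of invariant risk minimization. This is an assumption-laden sketch, not the specific method Emre’s team uses; the architecture, penalty form, and hyperparameters are made up for the example:

import torch
import torch.nn as nn

def irm_penalty(logits, y):
    """IRMv1-style penalty: the gradient of the risk with respect to a fixed
    dummy scale on the logits. It is small when the shared predictor already
    looks optimal for this environment on its own."""
    scale = torch.tensor(1.0, requires_grad=True)
    loss = nn.functional.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 10.0  # weight on the invariance penalty (illustrative)

def training_step(environments):
    """`environments` is a list of (x, y) batches, one per environment, with
    y a float tensor of 0./1. labels. We ask for low error in each environment
    AND for the same predictor to look optimal in each, instead of letting the
    model exploit environment-specific (possibly spurious) patterns."""
    total_risk, total_penalty = 0.0, 0.0
    for x, y in environments:
        logits = model(x).squeeze(-1)
        total_risk = total_risk + nn.functional.binary_cross_entropy_with_logits(logits, y)
        total_penalty = total_penalty + irm_penalty(logits, y)
    loss = total_risk + lam * total_penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In the work Emre describes, the constraints would be read off a causal graph rather than hand-coded like this; the sketch just shows the general pattern of adding an extra term to the loss so that only patterns stable across environments get rewarded.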
Jon Krohn:
Very exciting. Then there will be this huge treasure trove of previously untapped data that we can tackle with causal methods. Very cool. All right. Are there particular tools that you recommend? We’ve obviously already talked about some causal tools like the DoWhy library. But as a researcher with so much experience, are there particular tools that you’d recommend to listeners that they should be getting their hands on? 
Emre Kiciman:
We were joking around before the podcast that my answer was going to be email. I spend all my time in email. 
Jon Krohn:
We’ll go with that. If you haven’t heard of email, listener… 
Emre Kiciman:
If you haven’t heard of that. No, so what are the tools that I use regularly? Python, I love Jupyter Notebooks. For programming, I found Copilot to be super useful. 
Jon Krohn:
Nice. 
Emre Kiciman:
It helps me get into new packages that I’m otherwise unfamiliar with. Basically, I type a comment about what I want to do and what package I want to use to do it, and then Copilot has a nice suggestion that helps me quickly get to what I want to do. 
Jon Krohn:
So, that leverages the OpenAI Codex library, right? 
Emre Kiciman:
That’s right. 
Jon Krohn:
I talk about that. I have a five-minute Friday episode, episode number 584, where I introduced the OpenAI Codex algorithm, very cool algorithm that can take natural language input, and then output code in a number of different languages, but it’s particularly adept with Python. It can do some really impressive… It can make some really impressive leaps. There’s a demo video on the OpenAI Codex website that shows somebody building this shoot them up video game, where you have a rocket that’s attacking asteroids in JavaScript, I assume or if I remember correctly, and all of the code for creating that game was generated with natural language prompts.
Very cool library. Then one of the most well known applications is Copilot. I haven’t used it personally myself, but it sounds like maybe I should be, because you’re getting a lot of value out of it. 
Emre Kiciman:
It’s super cool. 
Jon Krohn:
Nice. 
Emre Kiciman:
I mean, and just really surprising to see that a computer can do what it does. 
Jon Krohn:
Wow. I think that’s an example of the situation where somebody who isn’t adept at causal machine learning like you are, when you see a machine that you can dictate natural language to, and it generates code. Because that seems like a cognitively challenging task, we think, “Oh, we’re on the cusp of AGI. There’s a reasoning happening here,” but it’s really just a very impressive… It is truly impressive. I don’t mean to discount the impressiveness of this approach, but it is relying on correlations only. It’s pattern recognition.
It’s a network with a very large number of weights that is trained on a very large data set, and has learned how to correlate that natural language input that you provide with a particular output that it should provide, but it isn’t able to reason in any sense at all as to why it’s producing that output. It just does. Cool. That’s exciting. I’ll have to check that out. Having been at Microsoft for 17 years, you’ve, I’m sure, been involved in a lot of hires.
I know that because you’re in Microsoft Research, these are specific kinds of hires, but I’d love to hear what you look for when you’re screening recent PhD graduates, and considering having them join the Microsoft Research team. What are the key things that you look for? 
Emre Kiciman:
I can tell you what I personally look for. You’re right. Microsoft Research is a little bit different. We don’t hire the same kinds of engineers necessarily that the rest of the company does. We’re looking for people with a PhD or equivalent research experience for the researcher roles. What I think we’re looking for, we know that these are roles where we need people working at the cutting edge of technology. That cutting edge is going to change in ways that we can’t anticipate, so we need people who are able to be out there in academia, and also be very active within the company trying to find that cutting edge of what needs to be coming down the… What needs to be developed? What’s coming… What are the big trends that we’re going to need to be taking advantage of to stay ahead of the game?
So, it’s someone who has a vision, a demonstrated ability to do research, and someone who looks at… I mean, if I think about the distinction between folks who we think are great, who are a better fit maybe for universities and academia versus for Microsoft Research, it’s the people who see the fact that we have this giant company attached to our research lab as a leverage point for having impact in the world, so someone who wants their research to go out and have an impact on the world, and sees the rest of the company as a way to do it.
I think that creates the right connection between an individual’s research agenda and then what’s going to end up making a great career for them at MSR. 
Jon Krohn:
Cool. I haven’t looked at this stat specifically in a few years, but if I remember correctly, a few years ago, so it’s probably still the case today of all of the big tech companies, Microsoft has the highest number of publications, which is… That must also be something that’s probably changed over the 17 years that you’ve been there. It probably hasn’t always been the case in that 17 years that Microsoft was this global juggernaut of academic research output. 
Emre Kiciman:
We’ve always been very open with our academic research. I’m not sure if we are dominating conferences as much as we did when I first joined. There were whole operating systems conferences and stuff where we did quite well. Graphics, I think, we were doing well. I’m not sure where we are with all of those venues anymore. First of all, there’s a ton more venues, so you just can’t do it all. There’s a ton more areas of computer science, so, again, you can’t quite do it all, but we continue to be very open.
We see ourselves as being part of the academic community. Then we also see this as a great way of communicating to the broader community about the important problems that we see affecting the industry, where we think there needs to be more research. Otherwise, computing is not going to be able to do everything that it’s promising to do if we don’t solve these problems. There’s a lot of… I guess we’re really happy to be part of academia. We learn a lot from them. We hope we bring something unique to the table as well. 
Jon Krohn:
It sounds like an amazing opportunity. For listeners out there who are interested in being at the cutting edge of developing techniques, and then being able to apply those at scale, being able to leverage a gigantic global technology corporation like Microsoft to make an impact without research that they do, Microsoft sounds like an amazing place to be doing it. Hopefully somebody’s inspired to either follow that path, or pull the trigger, and apply for one of these kinds of roles if you already have that PhD or equivalent level of experience. Now, Emre, in our episodes, I always ask at the end of the episode for a book recommendation from our guests. But before I do that, I also have an understanding that you have a book coming out soon yourself that you might want to tell us about. 
Emre Kiciman:
We’re writing a book about the foundations of causal machine learning. 
Jon Krohn:
The “we” as you and Amit Sharma. 
Emre Kiciman:
We, me and Amit Sharma. Really, it’s a book in two parts. One is about effect inference and the four key steps that we talked about earlier. That’s part one of the book. That introduces the concepts of causality, how you talk about assumptions, what you have to worry about to really be getting a causal inference. Then part two is focused on taking those pieces that we introduced in part one, and applying them to more machine learning challenges.
So, how do we improve generalizability and robustness? How do we learn from logged offline data? How do we learn… What are the connections with reinforcement learning, with fairness in AI? These are all things that we think can use the components of causality to further their purposes. Part two is focused on that. We have most of part one up on our website. It’s drafty. We are very interested in feedback.
Jon Krohn:
Nice. 
Emre Kiciman:
It would be great if any of your listeners are interested in helping us- 
Jon Krohn:
[inaudible].
Emre Kiciman:
… poke holes in the narrative, and improve the text where it’s incomprehensible. 
Jon Krohn:
Nice. We’ll be sure to include that in the show notes as well. Based on the huge level of engagement that I had when I posted that you were going to be on the show, I have a feeling that that book is going to do very well. It’s really amazing reading some of the comments from people. Richard Zaragoza, who’s a UX designer, says, “Emre is the best,” and lots of people are saying, “I’m looking forward to watching this.” 
Emre Kiciman:
Thanks, Richard. 
Jon Krohn:
“Can’t wait. This is going to be a great opportunity to hear from one of the top voices in causal machine learning.” Then in particular, somebody named Alvaro Restoy Ramos calls out the presentation that we already talked about with Amit Sharma. As promised, we’ve talked about Amit many times in this episode, from the very beginning with the DoWhy library to his being your book coauthor, and this gentleman, Alvaro, called out the talk we already mentioned, the one with Amit Sharma on the foundations of causal inference.
He cites that talk as a great way to get an idea of the power of causal inference applied to observational data in order to understand the impact of a discrete variable. Lots of excited people out there. I’m sure your book is going to do spectacularly, Emre. Then beyond your book, do you have a book recommendation for us? 
Emre Kiciman:
There’s one book I’ve been working my way through and really enjoying, I guess, if that’s the right word. I’ve been enjoying The Ministry for the Future by Kim Stanley Robinson. 
Jon Krohn:
Nice. 
Emre Kiciman:
It’s a book about climate change, and starts off on a very big downer. Folks who’ve read the book will know that’s an understatement. 
Jon Krohn:
Wow. Hence why you couched that verb, “enjoying.” 
Emre Kiciman:
Yes. I think I’m at the point in the book now where change is starting to happen. 
Jon Krohn:
Great. Nice. You are actually not the first guest that we’ve had that recommended this book, so it must be an exceptional one. We also had in episode 593 Professor Philip Bourne, who is the head of the data science faculty at the University of Virginia. He also highly recommended this same book, Ministry for the Future by Kim Stanley Robinson. Now, you’ve got two votes for why you should check out that, I guess, depressing or alarming and then hopefully gets better novel. That’ll be an adventure for the listeners that decide to pick that up.
All right. Thank you so much for that recommendation, Emre, and thank you for this amazing conversation. I thoroughly enjoyed it. No doubt our listeners have as well. For our listeners who would like to stay in touch with your thoughts going forward after the episode, what do you recommend? What are the best ways of following your work? 
Emre Kiciman:
Folks are welcome to follow me on Twitter. That’s where I’m most active. I also… For paper publications and more technical stuff, all of that’s out on my Google Scholar page or my homepage. Happy to… If folks are interested in chatting, reaching out would be totally great. Jon, thanks for having me on the podcast. It’s been a great conversation. I really appreciate it. 
Jon Krohn:
My absolute pleasure. You are a brilliant leader in the field as so many have already testified on our social media posts, and truly been an honor to have you on the show, Emre. Hopefully we can catch up again at some point, and hear how your journey is coming along. 
Emre Kiciman:
Thanks so much. I hope so. 
Jon Krohn:
What a delight to spend time in conversation with a brilliant mind like Dr. Kiciman. I learned a ton during today’s episode, and I hope you did too. In the episode, Emre filled us in on how causal machine learning requires domain knowledge or causal assumptions to infer cause and effect relationships when two variables are correlated, and how while reinforcement learning may be used to automate some causal inference assumptions, causal modeling is nowhere near full automation.
He talked about how his PyWhy GitHub organization is on a mission to build an open source ecosystem for causal machine learning, including through the development of the DoWhy Python library, which supports explicit causal modeling and testing of causal assumptions.
He filled us in on how the four key steps of causal inference are modeling, identification, estimation, and refutation. He talked about how causal ML is ideally suited to structured tables of data today, and may be more applicable to unstructured data like text, images, audio, and video in the coming years. He filled us in on how causal ML is making a real-world impact by answering important causal questions in the fields of agriculture, healthcare, and the social sciences.
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Emre’s social media profiles, as well as my own social media profiles at www.superdatascience.com/613. That’s www.superdatascience.com/613.
Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. Thanks, of course, to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the SuperDataScience team for producing another magnificent episode for us today.
For enabling this super team to create this free podcast for you, we are deeply grateful to our sponsors. Please consider supporting the show by checking out our sponsors’ links. If you yourself are interested in sponsoring an episode, you can find our contact details in the show notes by making your way to Jonkrohn.com/podcast. Last but not least, thanks to you for listening all the way to the end of the show. Until next time, my friend, keep on rocking it out there, and I’m looking forward to enjoying another round of the SuperDataScience Podcast with you very soon. 