SDS 355: DJ Patil on Harnessing the Power of Data Science Community

Podcast Guest: DJ Patil

April 8, 2020

Before diving into the meat of this podcast, we’d like to share that we are hiring!
Listen for our announcement in the first half of the podcast about the several positions that are now open and looking for new, excited talent! You can check out and apply for the open positions here: http://www.superdatascience.com/careers
__________________

In today’s episode, we go over data privacy and ethics, DJ’s work at the White House, the future of data science, the importance of communities in data science and many, many more exciting and inspiring topics.

About DJ Patil
DJ Patil has held a variety of roles in Academia, Industry, and Government.
He is Head of Technology for Devoted Health, a Senior Fellow at the Belfer Center at the Harvard Kennedy School, and an Advisor to Venrock Partners. 
Dr. Patil was appointed by President Obama to be the first U.S. Chief Data Scientist, where his efforts led to the establishment of nearly 40 Chief Data Officer roles across the Federal government, new health care programs, and new criminal justice reforms. He has also been active in national security, and for his efforts Secretary Carter awarded him the Department of Defense Medal for Distinguished Public Service, which is the highest honor the department bestows on a civilian.
In industry, he led the product teams at RelateIQ which was acquired by Salesforce, was a founding board member for Crisis Text Line which works to use new technologies to provide on-demand mental and crisis support, and was a member of the venture firm Greylock Partners. He was also Chief Scientist, Chief Security Officer and Head of Analytics and Data Product Teams at the LinkedIn Corporation where he co-coined the term Data Scientist. He has also held a number of roles at Skype, PayPal, and eBay. 
As a member of the faculty at the University of Maryland, his research focused on nonlinear dynamics and chaos theory and he helped start a major research initiative on numerical weather prediction. As an AAAS Science & Technology Policy Fellow for the Department of Defense, Dr. Patil directed new efforts to leverage social network analysis and the melding of computational and social sciences to anticipate emerging threats to the US. He has also co-chaired a major review of US efforts to prevent bioweapons proliferation in Central Asia and co-founded the Iraqi Virtual Science Library (IVSL). In 2014 he was selected by the World Economic Forum as a Young Global Leader and is also a Member of the Council on Foreign Relations.
Overview
DJ Patil is one of the most important guests we’ve had and is widely credited with shaping data science as we know it today. As he sees it, his part was helping to steer the community toward a set of problems. He traces that community all the way back to the Mayans and to Indian and Chinese astronomers, who worked with data long before computers. The difference today is the growth of technology and the growth of community connections. In that vein, DJ views himself as a community organizer of data science.
One thing DJ wants to tackle is the question of what data science is not. That is, when is data science being used to cause harm? One example is the Nazis’ use of the phone book, a simple data source. There are more complicated examples in biomedical research, such as Henrietta Lacks and the Tuskegee syphilis experiments, where ethical standards were breached. There’s also the practice of disenfranchising voters through data. We have to get to a place where, as a community, we police what data science is and should be. A good rule: just because we can doesn’t mean we should.
One hang-up is the differing ethics between nations. The US and China are in a battle to dominate the world of artificial intelligence, and there are different ethical concerns between the two countries. DJ believes the question we should be asking is why the US, and the West in general, isn’t funding the space as robustly as China in order to compete. Western values, which prioritize democratic process, are important to prevent actions that hurt human dignity and human life in general. Countries with totalitarian regimes don’t share those values. So, if data science is deployed in a widespread way, we need to make sure it’s done in an ethical way that is grounded in democratic values.
DJ warns against constantly making China the image of the bad guy in these situations. The real competition is against diseases like cancer and the current pandemic, where we were unable to use data to get ahead of the infection curve. So, what’s holding back a cure? The US and Europe have the best medical data in the world thanks to genetic diversity and robust electronic medical records. But the data is fragmented across thousands of databases. If we can bring that data together, we might discover that a cure is already out there in the form of off-label drug use, or identify new forms of disease vectors. We need to find new ways to think about diseases, and central to that are the data scientists who will unlock it. This is why transparency of data is so critical. China has historically been slow to release data. Domestically, we don’t have access to the total number of tests, or how many were positive and how many negative. A concrete example of what simple data sharing can achieve comes from Miami-Dade, where sharing data between the criminal justice and public health systems keeps people with mental health issues from being taken to jail again and again.
Pivoting, we opened the floor to questions on LinkedIn: 
  • What do you think makes a good data scientist and how do you approach any data science problem?
    – The best data scientists have curiosity and have a natural ability to ask questions. Even if you have a full and complete data set, you should have more questions as you explore it.  
  • What are the new challenges where data science is heading towards? What shape is Data Science going to take in the next 5 years? 
    – DJ says data scientists are a new form of first responders. Data scientists can view disaster areas early and safely, and they can use routing algorithms to get rescuers into those areas. Data science can help materials science achieve better manufacturing. We can help people learn faster with data-driven, tailored education. What DJ is most excited about is that data science is central to the success of virtually every institution.
  • Which was your most memorable work memory when you were at the White House?
    – There are moments at the White House where things can be astonishingly positive and astonishingly sad. A moment that stands out to DJ is the day President Obama flew back from the South from a funeral for victims of a mass shooting, and on the same day the Supreme Court ruled marriage equality was the law of the land. The juxtaposition is both tragic and uplifting.
  • Data science can enforce centralized power over decentralized power. Many data-driven companies are monopolists (Facebook, Amazon). How can we use data science as an equalizing force for society rather than a centralizing force? Is that even possible?
    – Recently, it was ruled that a patient’s data is their own. For a long time, hospitals believed they owned the data and you did not. Now, you can donate your data, you can move it, you can destroy it. We also need transparency of data: who has it, who sold it, and what did they do with it? We need more of that, and watchdogs to make sure it’s done. We also need to make sure we’re training data scientists correctly and implementing ethics interviews into the hiring process.
DJ’s parting advice is to think about what is going to move the needle the most for your children and your children’s children. Data scientists get to pick their problems, and if you pick something with that in mind, you can rest easy knowing that problem and solution benefited more than just yourself. It doesn’t matter if you wrote the fastest algorithm in the world, you’re traveling that road alone. 
In this episode you will learn: 
  • How does it feel to be the person who created data science as we know it now? [3:17] 
  • What data science is not [6:01] 
  • Ethics and data science development in different countries [10:00] 
  • What is the “biorevolution”? [16:02] 
  • The importance of data sharing [20:10] 
  • The current state of the Chief Data Scientist role in the USA [24:07] 
  • LinkedIn Q&A [26:03] 
  • What to think about when you think about data science [44:08]

Podcast Transcript

Kirill Eremenko: This is episode number 355 with Ex-Chief Data Scientist of the United States, DJ Patil. 

Kirill Eremenko: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple. 
Kirill Eremenko: Welcome back to the SuperDataScience podcast everybody. Super excited to have you back here on this show and I am very inspired today. Why’s that? Well, because 15 minutes ago I got off the phone with DJ Patil and we recorded the episode you’re about to hear. It’s a super exciting episode with one of the most famous if not the most well-known person in the space of data science. So if you don’t know, if you haven’t heard of DJ Patil, he is the person who co-authored The Harvard Business Review article called Data Scientist is the Sexiest Profession of the 21st Century. If you haven’t read it, read it. Make sure to check it out. That gave rise to the popularity of data science. He also coined the term data scientist. That came originally from when he was working at LinkedIn. He didn’t know what to call himself. Him and Jeff Hammerbacher didn’t know what to call themselves and they came up with data scientists. That’s why we have data science right now. And also he is the ex-chief data scientist of the United States. How amazing is that? 
Kirill Eremenko: And on top of that, in this episode, we covered off some very interesting and important topics. So here are a couple of examples of what you will hear about: data privacy and ethics, data in healthcare and biotech, DJ’s work at the White House and some of his most memorable moments while he was there, his current mission at Devoted Health and what they’re doing, how much progress they’re making, the future of data science, data science for good versus data science for bad or evil, and data science communities. So those are just a couple of topics we’re going to cover off. I’m sure you’re going to love this chat, this conversation, and by the end of it, you’re going to be super-inspired about data science and your career in the field. So without further ado, I bring to you the ex-chief data scientist of the United States, DJ Patil. 
Kirill Eremenko: Welcome back to the SuperDataScience podcast, everybody. Super excited to have you back here on board, and today’s guest is none other than DJ Patil. DJ, how are you going? Welcome to the show. 
DJ Patil: Thanks for having me. 
Kirill Eremenko: Super excited to have you here. Everybody’s heard about you as the person who started this whole data science movement. It’s a huge honor to have you so probably like first question would be how does it feel to be the person who created data science as we know it right now? 
DJ Patil: Well, I think the way I really think about data science and the movement is this is a community. A lot of times when people look to a particular person and say that person started it or this person really is the most seminal person behind it, I think what a better way to think about it is what I got to do was help steer the community toward a set of problems. But the thing that is probably more interesting than anything else is that this community has been going on a long time. If we go back far enough, you get the Mayans and the Indians and Chinese astrologers and astronomers. You move to Kepler and Copernicus doing really amazing things with data and very difficult calculations. You get to people like George Washington who was doing cartography and maybe should be argued as the first real chief data scientist of the United States. 
DJ Patil: And we’ve had a movement and that movement has really manifested in the next wave of people who have unbridled enthusiasm to use data, have incredible technical skills, we have computational power that we’ve never had the likes of that are easily accessible through the cloud, we have storage and we have the ability to collaborate just like we are miles apart. And so that has manifested in ways that we can apply technology in approaches that we had not thought before. And so I really think of it as I’ve had the opportunity to be more of a community organizer than anything around saying this is how data science should be. 
DJ Patil: I think what we can say, if anything, is there are certain things that data science, I would hope, aren’t, then we can talk about some of those and what I think some of the challenges are if we do data science in the wrong way and the impact. But I also think that we should also not get to a place where we are so regimented that we say data science is this one narrow thing. We should really think about data science as a team sport and we all have different roles to play that use data to make really fascinating good things happen. 
Kirill Eremenko: Yeah. I love it. I was reading an interview you had with The Observer and you mentioned there that you are generally opposed to trying to define data science too rigorously. But it would be interesting to hear your thoughts on what you just mentioned. What data science is not. What are your comments there? 
DJ Patil: Yeah, so I think the first thing that when we say what data science is not is the question about, in the most extreme form, is when should we not be using data or data in ways that possibly cause harm. And there is a number of ways to look at this and this is why we’ve been so active, people like myself, Hilary Mason and Mike Loukides. We actually published a book on this around data science and ethics. It’s a small ebook. We made it free for everybody because we want everyone to kind of take away the ideas around how do we actually start having this conversation about the ethical use of data. When we think about it historically and we ask where some of the most egregious human atrocities have taken place… Take the Nazis. One of the most egregious cases is the phone book. The phone book is a database. 
DJ Patil: And so as we head into this next wave of technology being able to do things, what does it look like where we might possibly do harm to people? And it’s very easy for us to say, “Oh, no, that’s not going to happen again.” But remember, we’ve also had a history of biomedical research, particularly in the United States as well as the Western World, where we’ve had issues like Henrietta Lacks and Tuskegee syphilis experiments where we’ve had breaches of the way we do things in an ethical manner. Right now we’re faced, in this time right now, about how do we use technology to ensure that they are implemented with the values we would like? There’s conversations where people are using data that is scraped from websites like social media, Instagram and Facebook, to create the basis for facial recognition technology for police departments and maybe parts of the government. Is that acceptable? Should we allow people to do that? 
DJ Patil: When we think about voting and using data to disenfranchise voters, that’s a bad problem, in my mind, for what data scientists should be. We haven’t figured out how to self-police. Other communities have figured out how to self-police. If somebody works on genomic research and it isn’t considered acceptable, the community knows how to address that situation and then there’s legal ramifications on top of that. We have to get to the place as a community of asking ourselves what is acceptable. And the specific way that that was actually implemented as the US chief data scientist is the mission statement of that role, and that is to responsibly unleash the power of data to benefit all Americans. 
DJ Patil: And I think data scientists should take this to the statement of how do we responsibly unleash the power of data to benefit everyone? Just because we can, doesn’t mean we should. That’s part of the responsibility. I think we should extend that to everybody to make sure we are using data to empower every single person. 
Kirill Eremenko: That’s a very interesting and valid point of view and here I’d like to refer to what I mentioned before the podcast. I was listening to your open hearing on developing and deploying next-generation technologies to, I think it was to the Congress? I don’t know enough about politics to understand the dynamics there. One of the concerns that came up there was I noticed a lot of the questions were around China, around how the US is competing with China for domination in the space of artificial intelligence and other exponential technologies. 
Kirill Eremenko: While these ethical considerations are extremely important, they are crucial, one of the issues that I can see and what I also heard in this open hearing is that they are usually limited to one single jurisdiction or certain set of countries, maybe like the Western World or America or Europe or China and so on. And so what are your comments on imposing these ethical and certain restrictions on development of data science, while absolutely important, can inevitably slow down or inhibit the rate of progress that the US or the Western World will have as opposed to what will happen in China where they have their own ethical considerations which might be very different and they can get much further ahead. What kind of consequences can that carry? 
DJ Patil: Great question. One of the reasons I think everyone is fixated on China is largely due to how aggressively they are investing. And it gets to a place where we can easily point the finger and say, “China is doing all this stuff and so we should slow them down.” I think the better way to look at it is why aren’t we investing as aggressively in our own societies to continue to keep up our pace and our competitive edge? We have dropped the amount of funding that we have supported our basic sciences every year. We continue to have questions, even right now, around the Centers for Disease Control, the CDC, about funding. This current administration wants to cut the funding to those groups and yet we’re seeing the ramifications when we don’t fund research as well as these groups. And that’s not just a US, that’s the Western World. China is increasing the funding. We are entering a space where, within the next 30 years, we will no longer have singular dominance that we’ve seen. 
DJ Patil: As that develops, one of the questions that’s inherent is values and what does it look like with western values? Part of the reason why western values are important is it’s about democratic process. But when we think about science and we think about areas like cloning humans, we have a framework that has been developed through a lot of hardship. Much of that has been in Europe through the Nuremberg trials that turned into Nuremberg code that turned into bioethics after WWII. And we’ve realized that certain things and experimenting on humans has, not only negative repercussions for society, but it takes away not only human dignity but it actually is a road down which you get into all sorts of thorny issues that we have realized that are just not acceptable for people when they don’t have consent. 
DJ Patil: In China, it doesn’t have to be China, it can be other countries with totalitarian regimes, that you run into the same aspects. So when we think about the power that is about to be unleashed through technology and data, we have to ensure that that technology works for us rather than against us. And when we look at some of the technology deployments that are being done where you have groups that are being persecuted through the use of technology, facial recognition or other things, that’s a problem and we have to figure out how, as a society, we are going to make sure that the technology and the focus of how we implement those technologies is really on the side of democratic values. 
Kirill Eremenko: Mm-hmm (affirmative) Gotcha. 
Kirill Eremenko: Hey everybody, I hope you are enjoying this amazing episode with DJ Patil. This is a quick announcement and we’ll get right back to it. We are hiring at SuperDataScience. With the recent pandemic and the coronavirus we all know how a lot of people have lost their jobs and their source of income, so hopefully this will be a breath of fresh air for some people out there. We are a 100% remote team, we all work online, we continue to grow and I’ve just, literally just published 10 new positions at SuperDataScience, which might be suitable to you. 
Kirill Eremenko: And even if they are not suitable to you, check them out, they are at www.superdatascience.com/careers, check them out and send them to somebody you know who may have been displaced by this pandemic and all the lockdowns, who may have lost their job and source of income. You could change their life. We are creating opportunities for people to do their best work, to contribute, to create amazing products, to create amazing experiences for people studying data science. 
Kirill Eremenko: So here are some of the positions that have just been released: VP of Marketing, Product Designer, General Manager, VP of Sales, Junior Media Creator, Sales Representative, B2B events Sales Representative, Event Marketer, B2B Sales Representative and Marketing Strategist. And those are just some of the initial positions that we have available right now. More will come soon, so keep an eye out at www.superdatascience.com/careers. Maybe we’ll even post a data scientist position in the near future. 
Kirill Eremenko: But even if none of these are relevant to you specifically, if you know somebody who’s in marketing, or in sales, or who’s a great general manager, who’s great at creating amazing products in education and learning experiences, or who’s great at running events or somebody who is amazing at creating animated videos, if you know any of these people, any people with the right talents and skills, please send them this link, www.superdatascience.com/careers. This could change their life or career especially in these difficult times. Thank you very much for your help and let’s get back to the episode with DJ Patil. 
Kirill Eremenko: One of your co-panelists on this open hearing, Mr. Chris Darby from In-Q-Tel, he had an interesting comment. He said, “All roads lead to two places…” in technology, I’m assuming, “… microelectronics and biotechnology.” And data science is at the core of all technologies right now, in my perspective, because it’s data, right? And then he proceeded to quote a scientist, as he mentioned, a scientist from China, and he said that according to the scientist, the quote was, “The Europeans won the industrial revolution, the Americans won the IT revolution, and in China, we’re going to win the bio-revolution.” What are your thoughts on that and how can America and the Western World compete with China in the space of the bio-revolution? 
DJ Patil: So I think it’s very easy to try to just highlight China as the bad guy in this kind of situation. And it’s more useful, I think, to ask who are we really competing against? To me, we’re competing against cancer. We’re competing against the pandemic that is already here. We’re going to have far too many people that are going to be killed by this disease because we weren’t able to use data efficiently to know where it is, to test appropriately, and develop strategies to get ahead of the typical infection curve that is the exponential rate of infections. So when I look at that, I look at what’s holding back a cure. Well, one, we have the best data sets right now in the United States and across Europe because we have not only genetic diversity but we have great electronic medical records. The problem is the data is fragmented over thousands of databases and there’s no ability to easily pull that data together. 
DJ Patil: Earlier this week, new rules were passed by the administration to actually make sure that the data remains a patient’s data and you can take your data and move it, and that includes to researchers. The reason that’s so powerful is, if we’re able to bring that data together and you have fantastic data scientists working on that data, maybe there’s cures already out there we just haven’t realized are cures. And when we partner with epidemiologists, researchers, and the traditional drug discovery units, maybe we’ll find something that could be used from off-label use. It’s not already used for one thing but if we use it there it’s going to have fantastic impact. Maybe it’s going to help us identify new forms of disease vectors that we hadn’t thought about and then when we look at them we’ll go, “Oh, wow. How amazing is it that we now have this targeted population that if we find a cure for, we’re going to give them disproportionate value added for life.” 
DJ Patil: We look at something like ALS, Lou Gehrig’s disease. We look at Alzheimer’s. We look at all these things. These diseases don’t care what race you are. They don’t care where you live. These are problems of a species. What I look at as a country, and this was why it was so important that when President Obama launched the Precision Medicine Initiative and put Joe Biden in charge of the Cancer Moonshot, was that we have to put data together along with all sorts of other things, microelectronics, biotech, new sensor designs, all these things together to find new ways to think about these diseases. We cannot be thinking about them in the ways of the previous few decades. Central to that thesis are going to be the data scientists. The data scientists are going to be the ones that are going to unlock this. Whether you call them a data scientist, you call them an epidemiologist, that person who is looking at data right now, that person is going to be key for helping us get ahead of this pandemic that is here now called COVID-19. 
Kirill Eremenko: Yeah, that’s definitely a big problem. I saw recently that Johns Hopkins University released data to the public that you can go and analyze about COVID-19. As you say, maybe somebody will come up with a solution along the way. 
DJ Patil: Well, this is why transparency of data is so critical. Right now, we don’t have great transparency of data between countries. China has been far too slow in releasing the data. That was true during SARS. We’ve seen this also during MERS that there wasn’t enough data sharing. And the Ebola incident, one of the most powerful things that was used to help get ahead of the Ebola incident was Google Docs because people would share their data as spreadsheets and you didn’t know when that spreadsheet was last really updated and by who. So having real-time and somebody filling in the data that they saw in their town and updating it daily gave everyone a clear indicator of where the disease was moving and propagating and allowed us to get infrastructure in place to make sure that you could start helping people. 
DJ Patil: That transparency is not happening fast enough right now in the United States. For example, where are the total number of tests? How many are administered? How many are positive? All of this, if there was very aggressive data sharing across a federal system, across the states, across the cities, across the towns, we’d have a much better realistic picture. And then we could start developing strategies very quickly. We could learn from the Chinese because they’ve dealt with this first. We could learn from the Italians. And then we could share with countries that are going to be impacted that don’t have the quality of healthcare system that we do so the number of deaths in those societies is going to be substantially higher. We could save a lot more lives if we had people just doing something very simple with just data sharing. 
DJ Patil: This is one of the things that’s really important that I have found in my experience around these things is we often look to the AI solution right away. A lot of times, we could just go with the tiny, bare-bones, just share some data and you’ll find a huge amount of lift in the problem. That’s not to say we shouldn’t do the AI solution. I’m not saying that at all. I’m saying that let’s focus just on some of the basics. Can I give you a concrete example? 
Kirill Eremenko: Sure. Sure. 
DJ Patil: In Miami-Dade, Florida, they realize that we have, as many places in the United States, is that we have this problem of too many people in jail. And one of the root causes of that is mental health issues. People with mental health issues get taken to jail rather than actually getting to the treatment centers that they should. Same with drug addiction. So if you see a person who’s constantly getting picked up for mental health issues, why do we keep taking them to jail? That’s kind of crazy. 
Kirill Eremenko: Mm-hmm (affirmative) 
DJ Patil: So, instead, what they decided is they decided to say let’s share data between our public health system and our criminal justice system but in a super-secure way that respects privacy. The data can only flow from criminal justice to the health system, not the other way around. And when somebody gets picked up, they check in with the public health system and if they see that person they don’t take them to jail. It cost something like a million, million and a half dollars, to get this going. In the first year alone it saved 10 million dollars. 
Kirill Eremenko: Wow. 
DJ Patil: But the real value is it closed a full jail. Then a little later on they closed a second jail. All that was done there is sharing data. It’s the spreadsheet. It’s literally a spreadsheet, now with a lot of safeguards in place, but a spreadsheet. 
Kirill Eremenko: Very interesting. Wow. Yeah, so that really shows importance of having this role in the government. It was very exciting to hear when you got the chief data scientist position which was created for the first time by Obama and you were the first chief data scientist for the US. I think it’s very important. Is there a chief data scientist at the moment in the US? 
DJ Patil: They are looking for one, is what they tell me. So maybe somebody here will apply. As much as I might be harsh on this administration, there also have been a number of really good things this administration has done around data. For example, recently President Trump did sign an executive order that basically asked a reevaluation of how we look at organ donations. The fact, right now, is too many people in this country go without an organ when they could easily receive a kidney or a liver or a heart or something that would give them an incredible number of days left in their lives. They would be able to take that. 
DJ Patil: But the reason that happens is there aren’t any quality measures that actually assess when are people doing a good job of actually making sure those organs get to the right person. And so, as a result, many times these groups that actually have the responsibility to do this, they let the organs expire. They’re left in the body for too long. They’re not picked up in time. They’re mishandled. And so a person who’s waiting on the operating table to receive their kidney doesn’t get it. And that’s just a tragedy when it could be so easy to do. 
DJ Patil: We’re not talking, again, any sophisticated AI. We’re talking about just measuring something and having a dashboard that allows us to ask ourselves are we doing a good job or not, and continuously improve it. 
Kirill Eremenko: Gotcha. No. Totally agree. Totally agree. In the interest of time, let’s proceed to our little experiment that we did on LinkedIn asking people for questions. So, as you saw, there are dozens of questions posted for you from people. Very excited to hear from you. Maybe let’s have a few. What is your favorite question out of the ones that you saw on LinkedIn? 
DJ Patil: Oh, boy. I can’t pull it up here simultaneously. 
Kirill Eremenko: Oh, okay. No worries. One of my favorite questions was from Akshey who asked, “What do you think makes a good data scientist and how do you approach any data science problem?” 
DJ Patil: The thing that I have found, Akshey, time and time again is the best data scientists have curiosity. They’re the people that just have this ability to go, “What about this? What about this? What about this?” And the question I used to literally give back in the days of LinkedIn is I used to say, “Pretend you had all of LinkedIn’s data. What would you be interested in knowing? What would be the first thing you would want to know?” And you’d be surprised how many people would just stare at me blankly. The best data scientists, they would just start and they’d have idea after idea after idea. And they would just keep going until I was like, “Okay. Okay. We’re good.” The best ones, oftentimes, would be like, “Have you thought about this or this?” and I’d be like, “Oh my gosh, no. I haven’t.” Or they’d say, “What if you combined data from LinkedIn with this other data set? Have you thought about that? And what about this? Have you tried this? Or could we turn this into a product that would have value this way?” 
DJ Patil: That curiosity plus passion is something that you develop especially at the intersection of multi-disciplinary sciences. So, myself, I was working in non-linear dynamics. Was doing math but was also doing a tremendous amount of weather data. And so you kind of have to sit at these intersections and you’re just trying to find data sets. You’re trying to figure out things. What I tell a lot of data scientists is you need to play with a lot of data sets to just develop intuition, to develop curiosity. Be very fast at plotting something, trying something, getting a sense of what’s going on in the data. For me, sometimes when I get a data set, the first thing I love to do is just kind of tab through the data and just get a sense. There’s this moment like if you use Unix or Linux you’re using the more command and you’re just seeing what’s in this file. Are there characters? Are there just numbers? Are the numbers decimal? You just let it blur and you just get a sense of what’s in there. It just starts to expose. 
DJ Patil: And then I’m trying to find lots of ways to just visualize it. Visualization, for me, oftentimes, is just histograms to get a sense of what’s in this and then trying to go, “What if? What about this? What about that?” The more you can develop that the better I think you’re going to be at being really fast at helping find solutions for another person. 
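The first-look habit described above (skim the raw values, check whether columns hold text, integers, or decimals, then plot a quick histogram) can be sketched in a few lines of Python. This is only an illustrative sketch, not DJ's actual workflow: the sample data is made up, and the helper names `classify`, `column_kinds`, and `text_histogram` are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample data, purely for illustration (not from the episode).
SAMPLE = """city,temp_c,visits
Austin,31.5,12
Boston,18.0,7
Austin,29.0,9
Denver,21.5,7
Boston,17.5,3
"""

def classify(value):
    """Label a raw cell as 'int', 'decimal', or 'text'."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "text"

def column_kinds(rows):
    """Count, per column, how many cells look like ints, decimals, or text."""
    kinds = {}
    for row in rows:
        for name, value in row.items():
            kinds.setdefault(name, Counter())[classify(value)] += 1
    return kinds

def text_histogram(values, bin_width):
    """Bucket numbers into fixed-width bins and return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kinds = column_kinds(rows)
hist = text_histogram([float(r["temp_c"]) for r in rows], bin_width=5)

# Print a quick summary: what kind of values each column holds,
# then a crude text histogram of one numeric column.
for name, counter in kinds.items():
    print(name, dict(counter))
for start, count in hist.items():
    print(f"{start:5.1f} | {'#' * count}")
```

The point of keeping it this crude is speed: a rough text histogram is often enough to prompt the next "what about this?" question before reaching for a plotting library.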
Kirill Eremenko: Gotcha. Curiosity. Wonderful answer. I love it. Suman asks, “What are the new challenges where data science is heading towards? What is your vision for data science in the next five years?” 
DJ Patil: Wow, Suman, great question. So the first is I think there are so many areas I’m so excited about data science impacting. I think data scientists are one of the new forms of first responders. You know, when there’s an earthquake in a remote area of the world, before people can even get in to help, first responders now have the ability to look at satellite imagery and drone footage, being able to tell which roads are washed out or bridges have been wiped out. If it’s a hurricane we could use drones plus just a little bit of computer vision to actually tell which houses people are on. Could we then route boats to quickly get to all those people, just like Uber or Lyft or UPS use routing algorithms? 
DJ Patil: In terms of the biological fields of trying to understand how disease manifests using large data sets to find that basis like the Precision Medicine Initiative. I think about the world of understanding new chemicals and particularly about material sciences and using data science to help understand how to get better manufacturing. That’s a fantastic area. 
DJ Patil: I look at the world of how do we create tailored education and help people learn faster? Myself, I was such a bad student. I think tailored education would’ve really helped someone like me. I could go on and on. If there’s one thing I think that I’m most excited about for the data science field over the next five years is this is central to the success of every institution and every organization from nonprofit to for-profit, the government, everybody will have to have some notion of data. And everybody that’s being trained in undergraduate curriculum will have some element of data literacy. 
Kirill Eremenko: Mm-hmm. Absolutely. Absolutely. Like Andrew Ng says, “AI is the new electricity.” Right? I can’t even think of a single business that doesn’t use electricity right now, whereas 100 years ago I think the residential electrification of the US was around 50%. So it’s massive. 
Kirill Eremenko: Okay, next one is a fun one from Abhishek. “Which was your most memorable work memory when you were at the White House?” 
DJ Patil: Oh, boy. What’s my most memorable? The White House is phenomenal in the way that there are moments where things are astonishingly positive and astonishingly sad. That’s just the reflection of how complex the world is. 
Kirill Eremenko: For example, what do you mean? 
DJ Patil: On a positive, I remember so many positive ones. One that always stands out in my mind was the day the President was flying back from being in the south where he was doing a memorial for a number of people who were shot in a church. And he was flying back but that was the same day the Supreme Court ruled that anybody can get married to whoever they like because love is love. We put the colors on the White House as a rainbow. And I remember the President’s helicopter coming in from such a tragedy, circling around, and we were thinking about the juxtaposition of such vicious hatred at one moment that the President is having to console people over and the next moment having these amazing crowds there to celebrate such a phenomenal activity. 
DJ Patil: So many times meeting with people who have rare diseases and are looking for a hope and realizing that they cannot wait. They can’t wait for bureaucracy to figure out how this is to work. They need the data in people’s hands who are going to figure out how to find this cure for something that their loved one has or they have. Time is so essential. What data science is, is it is an accelerant to solutions. If we’re not careful, it is an accelerant to entropy. It can cause incredible harm. But when used and wielded correctly, it is an accelerant to help to deliver solutions very effectively. 
Kirill Eremenko: Wonderful. Thank you. Siddharth asks a question. Something we touched on already, in this podcast, I think. Maybe we could elaborate. Quite a long-winded question but I think it’s an important one. “Data science seems to enforce centralized power rather than decentralized power in multiple contexts. The best consumer companies driven by data science are monopolies like Facebook and Amazon. The best enterprise data science companies are like Palantir and Databricks, which primarily serve the largest companies in the world as their customers. Data seems to do much more to help the Chinese surveillance state than it does to democratize and improve the way we vote. How can we use data science as an equalizing force for society rather than a centralizing force? Is that even possible?” 
DJ Patil: Yeah, so one of the most important things that just happened this last week is the belief that a patient’s data is theirs. It doesn’t belong to the hospital. It doesn’t belong to the doctor. It belongs to the actual human. For quite some time, the hospitals and physicians have believed that they own the data. They should not. Now, it is codified that it’s your data and you have access to it. If you want to move it? By all means, you should be able to get access to it and you should be able to take it to where you want. You want to donate it? Great. Good for you. Donate it. It’s giving you control. That’s part one. 
DJ Patil: Part two is what we have to ensure is that there’s transparency of data. You have to be able to access it. We still don’t have enough reporting requirements for people to know what data is being collected. Who’s got my data? Who sold my data? We’re starting to see elements of that in different policies, some of which are in Europe under what’s called GDPR and in California under the California Consumer Privacy Act, CCPA. But we need more of that. Right now there are many data brokers who can suck up data and use it without you knowing it. Some of those data sets have real implications for the population. For example, data sets that are collected and used in loans have been shown to actually impact negatively the black population. How do we ensure that safeguard? We need that form of watchdog. Somebody who’s actually looking over the shoulders for people to actually make sure that people are using data in an acceptable way for society. 
DJ Patil: The other part here is how do we train data scientists? As we go forward and we think about the companies and we think about who is there, what’s fascinating is we always talk about data interviews but we never actually talk about giving people an ethics interview around data. So one of the things that anybody who interviews with me, you’ll go through an ethics interview with me because I view ethics as part of asking a question around cultural fit. If we can’t see eye to eye on how we think about the importance of ethical issues, then how do we deal with it? 
DJ Patil: I’ll give everyone an example of one because it’s not hard. Suppose we’re working on a problem and we know we’re not supposed to use race but we find a proxy for race. We also find that if we use this proxy for race, we’re going to help a lot of people. What’s your next step? 
Kirill Eremenko: Oh. That’s a tough… What answers do you normally get to that? 
DJ Patil: Well, I think the real answer that’s interesting is, first, as an organization, what safeguards do you have to make sure I have the resources to be able to address this problem correctly? Is the organization prepared because what if I don’t know the answer? How do we adjudicate this? Who do I ask? Do we culturally have this? Everybody that is interviewing at a company should ask their company how do you handle ethical issues around data and technology? If everyone asked that question when they interviewed Facebook or Google or any of the other companies that were called out, you would start to see a material change in their approach. 
Kirill Eremenko: Yeah, wow. It’s so tempting not to ask, right? You just want the job. You just want the high salary. You have to put the global best interest, the greater good, ahead of yourself in order to ask that question. 
DJ Patil: This is a thing that we have to grapple with as a community. We want the salaries. We want the power. We want the prestige. Where is responsibility in that conversation? To be empowered by society to do things with data and technology means that we have to lead the way, also, on responsibility. We should be leading from the front. We shouldn’t have to have civic groups push us and say, “Have you thought about this or this? What about these issues?” We shouldn’t have regulators saying, “Hey, how are you doing this?” We should be going to them and saying, “Hey, we have the following concerns. We’re not sure we have all the right answers. What should the answers be? Can we work together to figure it out?” 
DJ Patil: We need to push society to understand the implications of what we are developing, the positive, the negative. Otherwise, if we do not, what will happen is that data sets will be harder to access. There will be more restrictions on it. Progress will slow. That also means that people won’t have as many jobs. But more importantly than all of that is that somebody who needs a cure, somebody who needs help in a disaster, somebody who is relying on a technological breakthrough to happen to improve their quality of life or a loved one’s life, will not get the solution in the time they need. 
Kirill Eremenko: Gotcha. Well, I can feel how you’re passionate about this. Now it makes sense to me how or why from working at the White House and doing public service you moved into the healthcare space and doing data there. 
DJ Patil: Yeah, well, the reason I moved into healthcare is, a big part of my portfolio that President Obama had set up intentionally was healthcare. And I think rightly so because he realized that people who are typically in technology don’t work on national security problems or something else. We don’t often gravitate to healthcare. Or that people have been working in healthcare for a while but they haven’t had access to some of the newer techniques that we really pioneered in the consumer and enterprise companies. So what happens if we get people together to do that? That genesis and looking at that left us with the question that we had a chance when we left the administration to ask well, what are we going to spend our time doing? 
DJ Patil: Well, if you look at that, one of the greatest challenges that we have is how do we ensure that people have access to the care they need, they want, they deserve. And so we said the only way that this is going to happen is if we actually show the way forward in what we believe is true. And so we said we were going to do this when the only way to actually make it work, in our model, is through a corporate enterprise. And so we started Devoted Health and the mission is to build a healthcare system that takes care of everyone like their own family. 
DJ Patil: Literally, we have something that we call the prime directive which is if you’re not sure what decision to make, close your eyes, visualize literally in front of you the person that you think of the most, your loved one. What’s the decision that you would want to make for them? And when you have that, run it by other people to make sure it’s legal, it’s safe, it doesn’t have downsides. Then take the action. In healthcare, time is of the essence and so we have to build those solutions. We have to build those technologies. 
DJ Patil: And parts of it are already proving. We find every day somebody who is in a situation where our job is to figure out how to unstick something in the healthcare system for them. And it’s not rocket science. A lot of times it’s just finding out something very obvious and trying to figure out how do they actually get an answer from somebody? Why do they have a drug interaction? Why have they been prescribed drugs that are going to cause some kind of interaction? Has anybody looked? Has anybody double checked with them? Those simple things. 
Kirill Eremenko: Wow. Sounds like you’re making massive progress with Devoted Health and I wish that it goes really well and we all see results, especially- 
DJ Patil: We hope so. It’s not a winner-take-all market. We’re excited that more people are coming to work on these problems. We need more people in this country to work on these things. If we have more people working on these problems together, the “we” wins. It’s what is behind when we say we, the people. We, the people, isn’t just a whole bunch of individuals. It’s we as collective people, as citizens, as community, as companies, as nonprofits, as religious groups. When we all come together against a problem and we decide people should have not only access to healthcare but access to good quality healthcare, and it should be affordable, then we’re going to see the change happen.
Kirill Eremenko: Yeah. Gotcha. Amazing to hear this trajectory and the progress that’s being made. I know you have to go, DJ. 
DJ Patil: Can I give one more thing? 
Kirill Eremenko: Of course. 
DJ Patil: Yep. What I would tell people to think about a lot of times when we’re thinking about data science and we’re thinking about the problems that we pick. As data scientists, we get to pick our problems that we want to work on these days. Ask yourself, what is going to move the needle the most for your children and your children’s children? Because we’re in that inflection point as a society that if we pick the problems that move the needle for our children and our children’s children, we will select a set of problems that will deliver outsized value for decades to come. 
DJ Patil: When that impact manifests and we look back at our careers and we look back at what we’ve done and how many people we’ve helped along the way, then we can rest easy. If we look back and we only say, “Gee, that only benefited me,” what good is that at the end of the day? It doesn’t matter if you wrote the fastest algorithm in the world, you’re traveling alone. And that’s a sad lonely place you could be and it’s a wasted set of skills, in my opinion, because everybody that is working in the data science field has such phenomenal opportunity to have an impact now. And society cannot wait for the impact that every one of you can provide. 
Kirill Eremenko: So much leverage. Data science provides so much leverage. 
DJ Patil: It’s leverage and that’s why we have to do it as a team. It is a team sport and all of us have to be on that team together collaboratively to make this happen. This is why the community that you’re putting together is so important. Without that community, where are we supposed to talk about these hard things? Where are we supposed to have dialogue? Where are we supposed to push each other? Where are we supposed to learn from each other? We have to create those communities. And it’s not just one community, it’s going to be different kinds for different types. Who knows where it’s going to evolve? But without us as a community, we’re going to be struggling to actually be on the right side of this equation over the long arc of history. 
Kirill Eremenko: Gotcha. Wow. Thank you very much, DJ. I think we can wrap on that. I know you have to go but that was very inspiring. I feel so inspired just listening to you right now. 
DJ Patil: Yeah. Well, thank you for everything you’re doing for the community. It’s very much appreciated. 
Kirill Eremenko: Thank you very much. 
Kirill Eremenko: So there you have it, everybody. Thank you so much for being here today and being part of the SuperDataScience community. As you heard from DJ Patil himself, communities in data science are ultra-important because where are we going to discuss these critical issues, the ethical, privacy, and future of technology issues that are on everybody’s mind and that are dictating where this field, and where the world, is going? Because data is underpinning all technologies that are revolutionizing the world and data science is the way to deal with data. And on that, I hope you enjoyed this episode. My personal favorite part was when DJ was talking about the importance of doing data science not just for yourself. Being in the field not just for the purpose of benefitting yourself, but instead, thinking about others. How you’re impacting the world, the communities around you, people around you. Because, as data scientists, we have so much leverage to create impact, in DJ’s words, “it would be such a waste of our skills to just think about ourselves and not think about others.” I found that very inspiring. I hope you did, too. 
Kirill Eremenko: And if you enjoyed this episode, I highly encourage you to follow DJ on LinkedIn where he has over 700,000 followers as well as other social media. We’re going to include all of the relevant links in the show notes, as always, and you can find them at www.superdatascience.com/355. That’s www.superdatascience.com/355. And one thing I would like to ask of you, if you did enjoy this episode, please share it with your friends and colleagues. Let’s spread the word about data science and what missions we have as data scientists across the community. If you know a data scientist, if you know a data science manager, data science leader, data science practitioner, somebody who is getting into the field of data science, send them this episode. It’s very easy to share, just send them the link www.superdatascience.com/355. And on that note, my friends, I really appreciate you being here today. Can’t wait to see you back here next time. And until then, happy analyzing. 