SDS 593: The Real-World Impact of Cross-Disciplinary Data Science Collaboration

82 minutes
Business, Data Science

Professor Philip Bourne, Founding Dean of the School of Data Science at the University of Virginia, joins the podcast for a fascinating discussion that covers his biomedical data science research, the importance of open-source and open-access within the industry, and the most in-demand skills in data science today.

About Philip Bourne

Philip E. Bourne, PhD, FACMI is the Stephenson Chair and Dean of the School of Data Science and a Professor in the Department of Biomedical Engineering at the University of Virginia. Prior to that he was the Associate Director for Data Science (ADDS; aka Chief Data Scientist) for the National Institutes of Health (NIH) and a Senior Investigator at the National Center for Biotechnology Information (NCBI). He has published over 350 papers and 5 books and co-founded 4 companies. Awards include the Jim Gray eScience Award and the Benjamin Franklin Award. His current research focuses on systems pharmacology (notably neglected tropical diseases and protein kinase targets), structural bioinformatics, scholarly communication, professional development, and the development and application of data science methods.

Overview

Since Philip is the Founding Dean of the School of Data Science at the University of Virginia, it was only natural to kick off the episode by discussing the goals that first motivated the school. Philip said that the school brings together many disciplines that are critically important to data science, creating camaraderie but also competition. Essentially, the school aims to “be a point of exchange of best practices” that can be shared across academia and industry.

That led to a discussion of how data science’s role within academia has evolved over the years. Drawing on four decades of experience in computational biology, Philip noted that the Human Genome Project in the 1990s was a significant turning point. “It was the very first time I saw the synergy between experiment and computation within biology,” he said. From there, Philip explained, the use of computation in academic disciplines went from being an afterthought to being integrated within each field and, now, to driving innovation across disciplines.

Regarding the most essential data science skills to master, Philip described the 4+1 model he and his team use at the school. This skills framework begins with analytics, followed by three other elements:

Systems: how we move data around, including hardware, software, and cybersecurity.
Design: how we think about human and computer interaction.
Value: the notion of ethics, policy, justice, and law.

Finally, the +1 component refers to the domains that these elements can be applied to.

As for future-proofing a data science toolkit, Philip and Jon agreed that data engineering is emerging as one of the most valuable areas for preparing students and professionals for the data science industry of today as well as tomorrow.

It then came time to discuss Philip’s extensive biomedical data science research. Among the many technical topics they touched on, the pair explored the practical applications of his research into the structure and function of biological proteins. Philip also shared how computational approaches to predicting protein structure, such as AlphaFold 2, can help us understand the impact of genetic defects, discover new drugs, and help prevent climate change.

Tune in to hear more about Philip’s research and why open-source code and open-access publishing are essential to data science.

In this episode you will learn:

Why Philip founded a School of Data Science [6:08]
How computing and data science have evolved across academic departments [15:55]
The improvements needed in higher education [26:44]
The most important data science skills for academia and industry and the 4+1 model [36:49]
Philip’s biomedical data science research and its fascinating practical applications [43:24]
The essential roles of open-source code and open-access publishing in data science [1:01:27]

Episode Transcript

Jon Krohn:

This is episode number 593 with Professor Philip Bourne, Founding Dean of the School of Data Science at the University of Virginia. Today’s episode is brought to you by Z by HP, the workstations for data science.

Welcome to the SuperDataScience podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today and now let’s make the complex simple.

Welcome back to the SuperDataScience podcast. I’m delighted to be joined by the distinguished data science researcher, Professor Philip Bourne. Philip is the Founding Dean of the University of Virginia’s School of Data Science. He’s Professor of Biomedical Engineering at the University of Virginia as well. He’s Founding Editor-in-Chief of the open-access journal, PLOS Computational Biology. He was previously Associate Director for Data Science of the NIH, that is the illustrious National Institutes of Health in the United States. He’s held roles at the University of California, San Diego and Columbia University in an academic career that began half a century ago. He holds a PhD in chemistry from Flinders University in Australia.
Despite Professor Bourne being a deep technical expert, he conveys concepts so magnificently that today’s episode should be broadly appealing to practicing data scientists and non-technical listeners alike.

In this episode, Philip details why he founded a school of data science and why such schools are uniquely positioned to bear the fruits of applied data science research within universities, what the most important data science skills are in both academia and industry, how computing and data science have evolved across academic departments in recent decades, fascinating practical applications of his biomedical data science research into the structure and function of biological proteins, and the absolutely essential role of open-source code and open-access publishing in data science. All right, you ready for this deeply interesting episode? Let’s go.

Professor Bourne, welcome to the SuperDataScience podcast. It’s awesome to have you here. Where are you calling in from?

Philip Bourne:

Oh, thanks for having me. I’m actually in lovely Charlottesville, Virginia in Central Virginia at the University of Virginia.

Jon Krohn:

I bet that it is pretty nice there year round, probably hot in the summer, but the rest of the seasons are delightful.

Philip Bourne:

Yeah, the weather’s actually pretty good here. I was in Washington for three years, which is much more humid and much more swamplike. I’d say we’re much more pastoral down here.

Jon Krohn:

Nice, lovely. Well, I’ve never been. Someday I hope to check it out. So we know each other through Amy Brand. So Amy was on episode number 567 and she had an awesome episode on open source and the importance of having science and journals and anything that we can think of be open. And we’re going to get to some of these open topics later on in the episode. I understand that you also know Xiao-Li Meng, who was on episode 581, kind of your counterpart in a way over at Harvard. He’s leading various data science initiatives there, though there’s not a school of data science there yet.

Philip Bourne:

No, we actually had… He also runs the Harvard Data Science Review and was very instrumental in setting it up. And we had conversations that actually linked the two of them, him and Amy, together, in that I was very keen that they make it open access, which has actually gone along pretty well. I actually think the model for journals ought to change even further; we ought to have much more alignment with interactive journals, and that’s something that I’m hoping to work on in the future. Particularly in data science, staring at static imagery of data is not necessarily the best way to convey the message. But all things for the future.

Jon Krohn:

This wasn’t something I was planning on talking about, but have you ever come across distill.pub as a data science publication?

Philip Bourne:

No.

Jon Krohn:

It is a really cool… It is designed to be interactive first, so it has lots of interactive blog posts. They typically put a lot of effort into each one, so it’s more about quality than quantity. And in fact, I’ve just now, in real time, gone and looked it up and they actually haven’t published anything since September. So I don’t know how active it is, but there are a few dozen historical articles in there and they are interactivity first, so that might be something that interests listeners looking for [inaudible].

Philip Bourne:

Jupyter Notebooks achieve some of that, but it’s the recognition piece and there lies the problem, it’s the culture of the whole publishing industry that what counts towards your scholarship is typically quite traditional. So the idea, it’s just not well accepted. We actually have a Wikimedian in Residence, for example, but no one gets tenure in a university by publishing Wikipedia pages, which is in my mind actually somewhat unfortunate.

Jon Krohn:

Right.

Philip Bourne:

But that’s the nature of the beast, unfortunately. So it’s changing that system one slow step at a time.

Jon Krohn:

Well, it seems like even this effort, Distill, to have things be quite interactive, now by scrolling down, I’ve noticed that they posted in July 2021 that they’re taking a break.

Philip Bourne:

Yeah, case in point.

Jon Krohn:

Yeah, exactly. Anyway, you are the Founding Dean of the University of Virginia School of Data Science. So what does that mean? What does it mean to have a school of data science and what prompted you to lead the creation of the school?

Philip Bourne:

Originally, I was the Chief Data Officer of the National Institutes of Health and I decided at one point I didn’t want to work in government anymore. So I was looking to go back to academia, and I actually came here for a couple of reasons. One, at the time, it wasn’t a school, but there was already a data science institute. It was just a very small group of people and I just really enjoyed the interactions I had here and I could see that there was an opportunity to build it out and then also do my own research.
And it was only later that the leadership of the university… I have to say that a large gift, in fact, the largest gift in the university’s history, helped launch the school, but that ultimately was not the reason for doing it. It was really that the importance of data science got elevated to that of being a school, which is a major construct within an academic framework. And it hasn’t happened much. I mean, the University of Virginia is a little over 200 years old and this is only the 12th time in its history that a school’s been formed. So I think it sort of speaks to the importance that data science is perceived to have going forward.

Jon Krohn:

Yeah. I mean, you’re preaching to the choir a bit here on the SuperDataScience podcast, but it makes a lot of sense to us. So I guess the idea is that it brings together lots of different disciplines that are critically important in data science. So things like mathematics, computer science, and statistics can all be taught in one place. There’s, I guess, a sense of camaraderie and scholarship between those disciplines that overlap in data science.

Philip Bourne:

Yeah. There’s also competition, of course, with respect to those existing disciplines. I mean, from our point of view, we speak about the school as being a school without walls, and really that takes what you just described, which are core elements that come from statistics, computer science, applied math, and so on. But really, it’s the domains to which that’s applied. I think that’s what makes for data science, in my view. And what we’re trying to do here is to pervade throughout the university framework and essentially be a point of exchange of best practices, where we bring together people and methods and data, protocols and workflows, whatever else it might be, and share those more broadly across what is, frankly, a siloed institution. Not just ours; I’m talking generally. And this, of course, applies not just to academia, but also to the private sector. So it’s really us trying to be that point of exchange, and I’ve got examples of why I think that’s so important that I can go into, if you want.

Jon Krohn:

Yeah, absolutely. Go for it.

Philip Bourne:

So a good example that I use a fair bit is this: one day I’m sitting in my office and there’s a bang on the door and it’s this fellow. And he says, “Well, I’m actually a trauma surgeon.” And I said, “Really, but what are you doing here?” And he says, “Well, I’m actually interested in learning about data science.” So I said, “Why?” He said, “Well, I’ve been doing trauma surgeries for a long time here.” UVA has a health system that’s really quite renowned, got to have that plug in there.
He said, “Quite often trauma comes from car crashes. And what I noticed anecdotally was a relationship between the kind of injury that people were getting and the kind of car crash they were having, whether it was a rollover, a head-on, a rear-ending, whatever it might be.” So he said, “I got interested enough in this that I went to…” This is where a medal is deserved. He went to the Department of Motor Vehicles and got public data on crashes in Virginia. And then he began to try to map that to the electronic health record, which he has access to in the hospital.

Jon Krohn:

Wow.

Philip Bourne:

And he said, “I don’t know what tools to apply. I don’t know how to do this.” So he ended up doing a master’s in data science within our program.

Jon Krohn:

Wow.

Philip Bourne:

And the notion of all of this, of course, is that when someone shows up in the ER after a crash, what was happening is that they don’t know how to treat them, because a lot of injuries are internal. So they start doing a full body scan. And occasionally people die in the scanner because of the time it takes to localize where their trauma is. If you had a better idea, through this kind of correlation between a kind of crash and the internal injury, you could potentially short-circuit that process by examining that part of the internal organs immediately and then dealing with it. So I thought, this is to me the essence of data science. First of all, you’ve got someone who is, in some ways, a sort of citizen scientist getting involved in all of this. And you’re bringing together two very disparate sets of data. I mean, normally people don’t think of transport and health as related areas of study, and here you’re bringing those datasets together to actually create societal benefit. It doesn’t get any better than that.

Jon Krohn:

No, it doesn’t. That’s a really good example.

Philip Bourne:

And we’ve got lots of examples like that that’s happening and it just wouldn’t happen without having this, whether it’s a school or not, but certainly having this entity that people can turn to, to help them with this kind of learning and improvement in society. So, that’s what we’re about.

Jon Krohn:

If you have one more such example like that one on the tip of your tongue, I would love to hear it. That one was amazing.

Philip Bourne:

Well, I would say, I mean, well, I could actually cite from my own work, but I won’t do that. Let me just twist it slightly. People say, well, okay, that’s an example, but is it generalizable, this capability? I had this conversation with someone once. We were sitting having a coffee one morning at a bakery here in Charlottesville, and I said, “Okay, pick an object. And I’ll tell you a data science story about it. There’s a data science story about everything.” So he said, “Okay, that blueberry muffin over there.” I said, “Okay, well, first of all, there’s the data that machines are collecting now. Sticking with manufacturing, they’re fine-tunable, and effectively the data that’s being collected from them is used to tune the instrument over time. So you end up creating a better muffin.” So, that’s the first part of the supply chain.

Jon Krohn:

Right.

Philip Bourne:

Well, it’s not the first part, it’s the central part of the supply chain. How you get the ingredients and how you actually predict how much you need to buy and so on, that’s more data science in the supply chain that precedes actually making the muffin. And then, of course, there’s the whole distribution, and there’s the notion that right now, I can look up on Yelp or wherever it might be and I can see that they have good muffins, or bad muffins, of course. What I can’t do is say, well, it’s 10 past 11:00 in the morning, so the odds of that muffin still sitting on the shelf are one in three. That prediction, based on the sales rate and everything else, is another piece of data science. So it’s all along that supply chain, and there are jobs to be had all along there to make the baker happy.

Jon Krohn:

Super cool. And the consumer.
This episode of SuperDataScience is brought to you by Z by HP. Get rapid results from your most demanding datasets, train data models, and create data visualizations with Z Data Science machines, which come in both laptop and desktop workstation options. The Data Science Stack Manager on these Z by HP machines provides convenient access to popular tools and updates them automatically. So this helps you customize your environment easily on either Windows or Ubuntu. Find out more at hp.com/datascience. That’s hp.com/datascience. All right, now back to our show.
Very cool. Those were amazing examples. I am going to force you to talk about your own work later on. So you haven’t gotten off that easy. So your field is computational biology. We’ll dig into that later on. Is there anything from your decades of experience in computational biology that you’ve brought into the University of Virginia School of Data Science? Are there any relationships there?

Philip Bourne:

Yeah. If I think back, you say decades, of course, people aren’t going to be able to see me, but it’s actually four decades. I think, in a way, what’s happening, computational biology is a precursor to what’s happening in everything, whether it be religious studies or whether it be history, whether it be economics, it’s the same process. And so by looking at that process in computational biology, which I’ll explain to you, I think you could see where it comes from.
When I started doing this, well, it’s like the ’70s, but let’s just call it the ’80s. I mean, the idea of using computers to study biology, there were basically just a few of us, and we were considered weird by the [inaudible]. And then what really changed it all was the Human Genome Project. The Human Genome Project came along in the ’90s, and basically what happened there is that NIH itself, and there are a lot of things I could say about them, but one good thing I’d say right now is that they had the foresight to say, okay, this is going to generate a bunch of digital data. And it’s not just about maintaining and using that data effectively, it’s actually about leveraging it to do new science. And it was the very first time I saw the synergy between experiment and computation within biology, at least.
That notion really got to the point where people got really excited.

And then as the genome was becoming realized, industry suddenly realized, wow, there’s a huge future here. They basically bought up a whole bunch of the computational biologists. And then, unfortunately, the timeframes within which the advances that computational biology was going to bring could be seen were not in alignment with the short product cycles that industry has. So a lot of those people drifted back to academia, but I’d say over the period of the 2000s and then into 2010 and beyond, it got to the point where you could get a tenured position doing just computation in biology. So it really exploded.
And then essentially what happened after that is that it just got to the point of being a very well-accepted part of the field. And whereas, in a way, it was a support mechanism for experiment, I’d say the opposite is now true: the predictive modeling that comes, and will continue to come, from computational study is going to drive what we do experimentally. The experiment becomes essentially a validation of a computational outcome. That is a fundamental reversal.

And I said exactly what I just described to you to the Advisory Board of the Director of NIH when I was actually at NIH and reported to the director. My timeframe was a bit off. I said, 2020, it’s now 2022. And we’re still not really there, but we’re close and it’s going to be profound. And that is something…
And so what drove that? Well, initially it was all this digital data that got everybody’s juices flowing and the compute cycles going, and everything happened from there. That’s now happening in every conceivable discipline. And I would say, sitting here in academia in the 40 years I’ve been in it, there’s nothing like what’s going on now across all of these disciplines. And as I said already, the goal here is to try and use our school to cross purpose a lot of the innovation that’s happening across the different disciplines. So, that’s really the relationship. I’m now doing it in my own field, which, of course, we’re now calling biomedical data science. But it’s going on in all these other disciplines as well.

Jon Krohn:

Nice. So to recap what you said there and paraphrase back to you, from the ’80s until the early 2000s, there were only weirdos like you using computers. They analyzed biological data, and it was a little bit niche, but then starting around 2010, people driven by things like the Human Genome Project and people seeing the value in all of this digital data, people could start to get jobs around 2010 doing computation alone in not only biology, your field, but in other disciplines as well. So the same kind of digitization that we see in biology exemplified by, say, the Human Genome Project, we’re seeing this kind of digitization in a lot of different traditional academic disciplines. And that spans not only the sciences, but the arts as well. We see things like when I was an undergrad, I don’t think there were any faculty at my university who were computational historians, but now lots of people are analyzing digital records, analyzing literature, using natural language processing techniques.

And I suspect that similar to what you’re describing in biology, whether it’s the sciences or the arts, the humanities, a decade ago, people started being able to have their job be focused entirely on applying computational methods to whatever digital data you could get in that discipline. And now what you’re saying is that in recent years and a trend that will no doubt continue in the years to come, these digital records and computational techniques, they’re not the result of trying to think about solving some problem. They actually drive even what problems should be solved, what experiments should be run.

Philip Bourne:

Yeah. No, that’s great. You were actually listening to what I said, I’m impressed. You did remind me, just a little story that really got this sort of thing going. Really, it sort of goes back to why I’m so keen on doing this school is I was at UC San Diego for many years. And at one point I became the Associate Vice Chancellor for Innovation and Industry Alliances, which took me out of my pharmacology department and took me around the campus to talk to all sorts of different people, to try and help them turn their research ideas into product. And I went to see a fellow in the Rady School of Business. And when I walked in there, he had a paper of mine on his desk from computational biology.

Jon Krohn:

A paper man?

Philip Bourne:

I’m sorry?

Jon Krohn:

A paper man?

Philip Bourne:

No, a paper of mine.

Jon Krohn:

A paper of yours. Sorry.

Philip Bourne:

It’s my Virginia accent. And I said, “Oh, you brought that paper out because you knew I was coming to visit you.” And he said, “No, I had no idea it was your paper. We’re using a statistical method from this paper.” I said, “What are you using it for?” He said, “We’re analyzing Corning marketing data.” And I said, “My god, it’s completely different than what I was doing with that method.” And that got us talking. And it was all about the time, this was like 2010, ’12 when big data was just becoming a thing. So we got talking and we ended up organizing a big data at UCSD Day. And it was the most attended event, I think, as I recall, being told by faculty in the university’s history, because they all smelled money. They smelled what was coming.

Jon Krohn:

Right.

Philip Bourne:

It was full of the kind of things you just alluded to. The one I particularly remember was someone talking about “where in the world is Francis Bacon.” So essentially what they’d done is natural language processing on a whole set of digital texts, historical texts that included references to Francis Bacon. And they built a social network of Francis Bacon: where he was on a certain day and time, who he talked to or communicated with. And I thought to myself, however accurate it might be, it is a tool that’s going to change how we study history. No one’s going to sit there poring through massive amounts of text in Latin and goodness knows what else ever again, because that will be the full bat position. Well, we’ll accelerate the way we study history. And I thought, well, that’s just one field. That’s just not everywhere. And now I’ve got all these people doing digital humanities in the school, doing really interesting stuff that is completely alien to me as a researcher, but incredibly interesting.

Jon Krohn:

Did they pick Francis Bacon because of the Six Degrees of Kevin Bacon?

Philip Bourne:

No, I don’t think so, but I guess they were just scholars that somehow got hooked up with these computing geeks who really helped them and opened their eyes to all of this. Yeah. But yes, it’s a good question.

Jon Krohn:

I don’t know. It’s one of the most famous contemporary examples of people trying to connect the dots around someone’s life. So doing that historically with another Bacon… All right. So we’ve talked about the University of Virginia School of Data Science and why you founded it. And now we’ve talked about how the use of computation and data science has gone from being a side show in academic disciplines to being not only integrated in it, but now driving a lot of the innovation that happens across disciplines. So this brings me to another kind of extrapolation question for you related to academia and education, which is, is higher education the way that it’s set up today working well, or do you think that there’s room for improvement in the future?

Philip Bourne:

I actually think there’s room for lots of improvement; I’ll give a couple of examples. And I think this plagues not just higher ed, but also other industries, including the NIH when I was there, which is just this siloing that prevents the free flow. Just as an example, when I went to NIH as the first Chief Data Officer, I said to Francis Collins, the Director of NIH, “Explain to me my job.” And he said, “Well, it’s to take the 27 institutes and centers of NIH, which operate independently, and use data as a catalyst that should flow across the organization. I want you to change the culture of NIH.” And I said, “Oh, you mean this $40-billion-a-year operation?” He said, “Yeah. Yeah.” I said, “Okay, what am I going to do next week to change cultures?”
And it’s true in academia. It’s interesting what happens in academia because you have these silos, you either belong to a chemistry department, a physics department, a biology department, an economics department, but they don’t really exist as purely that field of study anymore. They all involve some aspects.

So you increasingly see the notion of institutes or centers being set up that actually bridge across those disciplines. Or you have faculty who have appointments in two or more of those silos to try and compensate for this rigid structure that really in many ways should not necessarily exist anymore. And the students who come, I mean, they don’t care, they think it’s wrong. They just want to study what it is they want to study, which it goes across this gamut. But changing that culture where you’ve got, they’ve been financed this way and it’s particularly…
There are different financial models in universities, but one of them is called the RCM model, where the money that comes in, whether it be from tuition or extramural money from grants and so on and philanthropy, flows directly into the schools and departments. So they hang onto that, and they don’t want to see that structure changed. I don’t think that if you asked any university president whether, if they were building something from scratch today, they would do the same thing that’s there now, they would say yes.

Jon Krohn:

Right.

Philip Bourne:

It may have made sense years ago. So what we’re trying to do, even when we’re constrained in a university framework is effectively trying to create a university within a university that’s different. In fairness, we have enormous support from our leadership, our president and provost and so on to actually do that. And that really excites me. I’m hoping, we’re not the only ones thinking this way, but just that we and others become exemplars for what higher ed should be in the future. So I’d say that’s one large aspect of it. There’s another aspect.

Jon Krohn:

So if we could dig into that a little bit more, what are the kinds of things that you would like to do differently in the future in your university and a university? What are the changes that you’re making?

Philip Bourne:

Well, I think we’re creating this notion of what we call collaboratories, which are really these entities that we form with another school. So just as an example with education, clearly there’s a huge role that this is our school of education, which is not related to the university, they focus a lot on K-12. But what they’ve realized is that educational analytics and the notion of precision education that instead of taking a class load of K-12 children and essentially teaching them and examining them in exactly the same way, but they’re individuals with all sorts of different learning capabilities and so on.
We recognize to some degree students who have learning disabilities or particularly prodigies or whatever it might be at the other end of the spectrum, but there’s a large, large group that just gets treated as one great blob. And that’s just not appropriate. I mean, I thought that when my own kids were in school. But they don’t have all the expertise in analytics to take full advantage of what they know about education. And we don’t know as much as we should about the education of K-12 children, so when we put the two together, bingo, hopefully some magic will happen.

Jon Krohn:

Gotcha. Right, right, right.

Philip Bourne:

That was quite an example.

Jon Krohn:

Yeah. So similar to the idea of having medical data, be combined with data science expertise and traffic data in your previous car accident example, this is another example. In this collaboratory, we have education researchers with domain expertise in K-12 teaching. And so they can combine with data science generalists who are proficient in organizing datasets and analyzing datasets to help figure out where the insights are, where the significant connections are in the data that the education department has. That’s a really good example.

Philip Bourne:

I could just drive it a little further. Where ever-improving, ever-changing technology has an impact on all of this is that devices for things like eye tracking are now very inexpensive. So you can actually eye-track every student in the classroom. And basically that data tells you a lot about the teacher, but it also tells you a lot about the students. So that’s one aspect, but on the other hand, that also has significant ethical consequences. The responsibility and ethical aspects of data science are something that we’re particularly concerned about in our school. So there are lots of different components to this, but technology is obviously helping drive the move toward precision education, with caveats.

Jon Krohn:

Yeah. Precision education sounds cool. Now, I wonder if you’ll be able to go back. I interrupted you to get an example of the kinds of things that you were trying to do differently in your School of Data Science, and you were about to say something else. You were like, “So I’ve finished my one main point and now I have another,” and then I interrupted you. Do you remember what you were going to say?

Philip Bourne:

No, but I’ll make up something. I think, how we think about ourselves as a group, as a team within the school… Oh, actually, I think we’re trying to reduce hierarchies as much as possible. Academia is very hierarchical and we’re trying to address that, but that wasn’t the point I was going to make. I think the point I was going to make is eating our own dog food. It relates to some extent to the example I just gave about K-12, but applying that broadly across what we’re doing. We in academia, and I think this is generally true, we’re not… I remember saying to our president, and I always liked sticking the knife in… When we formed the school, he congratulated me and said, “It’s great. We’ve got the school off the ground.” I said, “Jim, this is great. Now all we need to do is find a reason to hire our own graduates, because we’re not actually doing enough data science on ourselves. So we’re not actually…”

For example, one of the things that we and others are looking at: there’s obviously great concern about the mental health of our students. What you really want is predictive models that give you a sense of someone potentially going off the rails before they go off the rails. And so the interrelationship between their health record and their performance record, their transcript if you like, though it needs to be much more finely nuanced than that, is a particularly important area of study. Obviously that also has significant ethical considerations, but in principle, the promise is there of really improving the student experience and their wellness through data science. It doesn’t matter where you turn, whether it’s the muffin or the wellness of students, there’s so much to do.

Jon Krohn:

Yeah, we are really at the very beginning of the impact that data and data science will make in our lives. We haven’t realized a fraction of 1% of what’s possible. And I will get to a big question related to that later on in the episode for you. For now, I have a relatively straightforward, pragmatic question for you, Philip, which is, what are the most important data science skills in your view? You run the School of Data Science. You probably have a lot of insight into the substance that students are being taught, whether they’re going off into academia or industry. What are the most important skills for our listeners to have?

Philip Bourne:

Yeah. Maybe I’ll couch that in how we think about data science, because I think that’s a precursor to what we think are the most important skills to provide our students. We’ve come up with a model for data science that we call the 4 + 1. So there are four elements of data science. The obvious one, of course, is analytics, which is what everybody thinks about. But there are three others that we feel are equally important. The second is systems, which is how we think about moving data around: the hardware and software, the cybersecurity, and so on. The third is design, so how we think about human-computer interaction, which is critical for communication, dissemination, and that sort of thing, and is clearly very important to students.
And then the last one is what we call value, which gets back to this notion. I mentioned ethics a couple of times, but really it’s much more than that. It’s really about ethics, policy, justice, law.

And the way we think about value is as the tension that exists between the ability of data science to produce something and the nefarious effects that might have. It might have positive effects, but it also might have negative effects. There’s a tension there, and that’s how we think about value. So those are the four elements, and the plus one is all the domains that we apply it to, which we’ve been talking about all along.
So I would say what’s important is for our students to go away with that balance, thinking about all four of those components. Yes, they’re going to get jobs where they specialize in some aspect of it, but we’re basically providing them with a grounding in the 4 + 1 model at both the undergraduate and the graduate level. And then they’ll specialize in probably one or more of those four areas. They’re not mutually exclusive by any means, but that’s how we think about training our students.

Jon Krohn:

I love it. That was such a great answer. I’m glad that I asked. So then, as a potentially tricky follow-up: these data science skills, the 4 + 1 of analytics, systems, design, and value, plus, of course, domain-specific application, are what’s important today. Do you think that in the future, the kinds of data science skills that are most valuable will shift? Maybe not relative to the broad areas that you’re talking about in the 4 + 1, but more specifically, are there kinds of skills that listeners should be developing to prepare themselves for the data science of the future?

Philip Bourne:

Yeah, I see what you’re getting at. Certainly having a reasonable statistical background, and being able to use the tools du jour and the languages du jour, which today means things like R and Python, is really important. Clearly those basic skills are really important. But, and it’s partly my own bias and background (and actually, it’s interesting when you bring a whole group of people with different perspectives on data science together, you get so many different viewpoints), I also think a lot about what I guess I would call data engineering, which is the sort of precursor work that goes into all the excitement. In my own career in biomedicine, I spent a lot of time thinking about and developing tools and things around this. How you engineer the data to do all these incredible things that we’ve been talking about is really important, so that you can deliver all of that in a timely and accurate way.

Jon Krohn:

Yeah, couldn’t agree more that data engineering is one of the most valuable skill areas for our listeners to be picking up. The things that you mentioned before that, like statistics and being able to program in, say, R or Python, are essential. There’s no question. There’s no getting around that. And something that is becoming more essential as datasets get larger and larger is data engineering: being able to handle very large datasets, more than you can fit into memory on an individual machine, and being able to stream information from these very large datasets, clean it up, and separate the signal from the noise in potentially very noisy data. I couldn’t agree more that data engineering is a hugely valuable skill. In fact, assuming that recording goes ahead as anticipated, the very next guest episode, episode 595, coming up next week, will be with Joe Reis. He has just finished writing a book on data engineering, and we’ll have the whole episode focused on that. So keep an ear out for-

Philip Bourne:

I shall be avidly listening in. Yeah, no, I think it’s really important. We have an amazing advisory board of private-sector experts, including the CIO of Capital One and the former president and CEO of Verisk Analytics, and they tell us the same thing: data engineering, from the point of view of their organizations, is just the need of the day.

Jon Krohn:

Agreed. Cool that you have all of that guidance from industry shaping what you’re covering at the University of Virginia School of Data Science. So we’ve talked a lot about the school that you have founded and that you’re dean of, but I promised earlier that I would force you to talk about your own research as well. So we’ve alluded to it, I called it computational biology, and then you mentioned that more recently it’s being called biomedical data science. Tell us about your particular area of research.

Philip Bourne:

Yeah. So I guess it is fairly broad, but at the core of it is how I think about protein structure. So this is the three-dimensional structure of proteins, and of DNA and RNA for that matter, though I focus particularly on proteins. We were involved in helping develop a resource called the Protein Data Bank, which is the public repository, the open data repository, for all of this material that has been collected now over my lifetime. When I started, there were 77 structures in this data resource. There are now close to 200,000. And that’s had a profound effect on how we understand biology. My research has revolved a lot around that, including developing algorithms that compare structures with each other and that find binding sites within these structures, so that they can be used to design drugs. Those are two areas. We’ve also used structure in other ways. My favorite is actually how we’ve used structure to study evolution. I’ll bore you with just an example of that for one second-

Jon Krohn:

Please. There’s nothing the listeners love more than being bored by intricate data science details. No, and I’m serious. They will never be bored. This is what we’re here for. So hit us.

Philip Bourne:

If you think about the data of proteins, a typical protein consists of about 300 amino acids, and there are 20 different types of amino acid. So if you took all the random possibilities for proteins, that’s 20 to the 300. That is more than all of the atoms in the universe. So why is nature only… We know several million of these, and there are certainly more to be discovered, but the number is extremely small. So nature has basically achieved this incredible reductionism in creating life. What’s even more profound is that those millions all fold up into just a few thousand three-dimensional structures. That was something that I thought was just incredible. Nature’s done this, and there are something like several thousand jigsaw pieces that, when you put them together in different ways, make up everything, every species. Then you start thinking to yourself, what does that mean? One of the things it means is that creating a new jigsaw piece, a new three-dimensional shape, is actually a pretty rare event.
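The “20 to the 300” comparison can be checked in a few lines. This is a minimal sketch that assumes a fixed length of exactly 300 residues and the commonly cited order-of-magnitude estimate of about 10^80 atoms in the observable universe; neither number is meant to be precise, and the constant names are illustrative.

```python
import math

# Assumptions for illustration: a fixed length of 300 residues, the 20 standard
# amino acids, and the commonly cited estimate of ~10^80 atoms in the observable universe.
SEQUENCE_LENGTH = 300
ALPHABET_SIZE = 20
ATOMS_IN_UNIVERSE_LOG10 = 80

def log10_sequence_space(length: int, alphabet: int) -> float:
    """Return log10 of the number of possible sequences, alphabet ** length."""
    return length * math.log10(alphabet)

space_log10 = log10_sequence_space(SEQUENCE_LENGTH, ALPHABET_SIZE)
# Python integers are arbitrary precision, so the exact count is also available.
exact_digits = len(str(ALPHABET_SIZE ** SEQUENCE_LENGTH))

print(f"20^300 is about 10^{space_log10:.1f} ({exact_digits} digits)")
print(f"That exceeds ~10^{ATOMS_IN_UNIVERSE_LOG10} atoms by a factor of "
      f"~10^{space_log10 - ATOMS_IN_UNIVERSE_LOG10:.0f}")
```

Working in logarithms keeps the comparison readable: 300 × log10(20) ≈ 390.3, so the sequence space is roughly 10^390, more than 300 orders of magnitude beyond the atom estimate.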

And so we started looking at that. Now there’s enough data that you can actually see the shapes that exist in many thousands of different species. One afternoon, just on a laptop, we did an experiment where we took that data and basically said, okay, I have all these shapes on one axis, and on the other axis, I have every species that has those shapes. It’s a matrix. In each cell of that matrix, if that shape exists in that species, I give it a one. If it doesn’t exist, I give it a zero. That binary matrix can be converted into a tree. That tree is the tree of life. So it’s just stunning that in an afternoon on a laptop you could actually do what many evolutionary biologists have spent inordinate amounts of time doing, even before doing it with… You can do it with sequences as well. I just got an amazing kick out of that. If that’s not boring you enough, I’ll take it one step further.
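The fold-by-species experiment described above can be sketched as a presence/absence matrix clustered into a tree. The episode doesn’t say which tree-building method the published study used, so this sketch assumes Jaccard distance between species’ fold sets and average-linkage (UPGMA-style) clustering, returning only a Newick topology without branch lengths. The function names and the four-species matrix are hypothetical toy data, not real results.

```python
from itertools import combinations

def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|: 0 if two species share every fold, 1 if they share none."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def presence_matrix_to_tree(matrix: dict[str, list[int]]) -> str:
    """Cluster species by shared folds with average linkage; return a Newick topology."""
    # Turn each species' 0/1 row into the set of fold (column) indices it contains.
    folds = {sp: frozenset(i for i, bit in enumerate(row) if bit) for sp, row in matrix.items()}
    # Map each cluster's Newick string to the list of species it contains.
    clusters = {sp: [sp] for sp in folds}

    def avg_distance(c1: str, c2: str) -> float:
        # Average linkage: mean pairwise distance between members of the two clusters.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(jaccard_distance(folds[x], folds[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties broken by sorted label order).
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: avg_distance(*p))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)

    (root,) = clusters
    return root + ";"

# Hypothetical toy data: rows are species, columns are fold shapes (1 = present).
toy = {
    "A": [1, 1, 1, 0, 0, 0],
    "B": [1, 1, 1, 1, 0, 0],
    "C": [1, 0, 0, 0, 1, 1],
    "D": [1, 0, 0, 1, 1, 1],
}
print(presence_matrix_to_tree(toy))  # ((A,B),(C,D));
```

In the toy matrix, A and B (and likewise C and D) share most of their folds, so they pair up first. A real analysis over thousands of folds and species would more likely use an established phylogenetics or clustering library than this simple loop, which recomputes all pairwise cluster distances at every merge.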

Jon Krohn:

Please. I’m so desperately bored, I need more.

Philip Bourne:

So when we were publishing a paper that described this, I actually gave it to my informatics class at UCSD as the final exam. This is double dipping. So basically: this is your final exam. I want you to propose the next set of experiments based on this paper. What would you do next? With the hope that… Of course, I got a lot of rubbish, but I also got… There was a student who was actually at the Scripps Institution of Oceanography. He pointed out to me, and I kind of knew it, but he highlighted it, that 90-something percent of all evolution has occurred in the ocean over the 4 billion years of the Earth’s existence.

Jon Krohn:

Oh, wow. I didn’t know that.

Philip Bourne:

And he was an oceanographer, and he said, “And you know, in that time, the geochemistry of the ocean has changed dramatically, particularly from what’s called reductive, to euxinic, to oxidative.” And he said, “I wonder whether we can see the fingerprints of those changes in these structures that we’re talking about.” And lo and behold, we found amazing correlations with the changes that have occurred in the geochemistry of the ocean, the change from reductive to oxidative that shaped life. So this gets back to why data science is so important, because you’ve had a whole bunch of people who study geochemistry and a whole bunch of people who study evolution, but very, very few people who ever studied both. And suddenly data can bring them together, and you start making new discoveries about ourselves, about life itself. In the grand scheme of academia and scholarship, that didn’t get me the most citations, but it’s actually the work, in some ways, that we’re most proud of.

Jon Krohn:

That was such a cool example. And it is mind-blowing to me to think… You probably don’t know this about me, but I have a biology background. My PhD is in neuroscience, and I did a lot of biology, particularly as an undergrad, including evolutionary biology courses. And so I know things about genomics and about proteomics, but I did not know this fact that there are only a few thousand kinds of functional protein shapes, and that that very small number… Because I would have assumed there must be orders of magnitude more than that, given the diversity of function that different organisms can have. It’s wild to me that a plant, a single-celled organism like an amoeba, me, a leopard, and a turtle, these wildly divergent organisms, can all come from just a few thousand protein structures. That’s really cool.

Philip Bourne:

Yeah, it is. I think it’s my favorite fact in all of biology, that there’s that level of reductionism. And of course, they come together in many different ways to form… I mean, this actually happens at what we would call the domain level or the fold level. There are lots of nuances to this, but the general principle is correct.

Jon Krohn:

And a thought that occurs to me, given the increasing level of understanding that we have of protein structure: could we then engineer new protein structures to perform some kind of function that maybe doesn’t happen in nature? So could we be engineering protein structures to allow us to clean up the oceans, or to fix carbon dioxide from the atmosphere into a fuel that we could burn, with bacteria doing that? This is quite an open-ended question and it wasn’t one I was anticipating asking, so I know you weren’t prepared for it, but if we’re looking to have these new kinds of functions, if we want to engineer bacteria or some other organism to be doing something functional for us, should we be looking at the thousands of protein structures that already exist and recombining those in new ways? Or could we be thinking about new kinds of protein function that don’t exist in nature?

Philip Bourne:

Well, the short answer is yes. But remember I said that there are millions of sequences and then there are thousands of structures.

Jon Krohn:

Right.

Philip Bourne:

What’s going on now, and this really comes right back to data science, is that AlphaFold 2, as you know, was developed by DeepMind, which Google acquired, to predict structures from those sequences. This was Science’s Breakthrough of the Year last year. And it’s actually a profound development. So what’s happening now, as we speak (it may actually be almost done), is predicting the structure of all of those millions. Of course, when we do that, the question is, are there things that come out which are new, that are not encompassed in the thousands? That then gives us a bigger picture of what protein fold space looks like.
And then the question is, well, where are the gaps in that space? There are clearly going to be gaps between things that we already know. And then there are outliers that we have no idea about.

So yes, there’s the promise, and this kind of protein synthesis is clearly the next frontier. It’s sort of reverse engineering, right? Where do you start from? You engineer a sequence that is then going to give you a new structure. So there’s clearly going to be movement in that direction to address exactly the kind of issues that you’re talking about. But, again, like all of these things, it’s easy to sit here on a podcast and talk about it for a few minutes, but actually there’s a lot of sweat and tears to make these things happen. The potential is there, though. And then, of course, there’s also the aspect that you could engineer something that’s not desirable. So there’s huge [inaudible].

Jon Krohn:

Right, right. Right, right.

Philip Bourne:

And dangers in that as well.

Jon Krohn:

Right. Yes. Like biological warfare dangers.

Philip Bourne:

Yes.

Jon Krohn:

Yes. I hadn’t thought of that. Yeah. And so, one of the great things about this podcast is that I get to leverage the decades of knowledge that you and other guests have built up, you and your colleagues in your field. And then I get to spend like an hour saying, of all the decades of things that you and everyone in your field has ever learned, what is the absolute coolest stuff? And you’re certainly serving up a lot of it in this episode. Thank you.

Philip Bourne:

Well, you called me at the right time because I’m old enough that it’s all going to start disappearing.

Jon Krohn:

What’s going to start disappearing?

Philip Bourne:

Well, what little knowledge I have. That comes back to your study of neuroscience.

Jon Krohn:

Oh my goodness. All right. So another question that occurred to me as we were talking about all of this is that proteins do their work by moving. When we talk about these thousands of functional protein shapes (and maybe, as a result of this DeepMind research that sounds like it’s imminent, we’ll find out there are a few thousand more or something), these proteins are not a fixed shape. Most of the time they’re doing work. They are things like enzymes that allow all of the processes in our cells, across all the different cell types in our body, to operate. And so do you think that we’re close to being able to predict in four dimensions? The breakthrough of the year from last year that you described, DeepMind’s AlphaFold 2 algorithm, only tries to resolve the three-dimensional structure of proteins, but ultimately that fourth dimension of time is presumably going to be helpful for understanding how a protein actually functions, or how genetic mutations lead to dysfunction in that protein. Is there anyone doing research on that fourth dimension?

Philip Bourne:

Oh yeah. There’s a lot of work in molecular dynamics, which is what you’re describing, where you look at the trajectories over time of these proteins, which, as you point out, are not in any way static. The problem is that the computational time to do that, even at the nanosecond level, is quite significant, but we’re doing it. Obviously computing is improving, and specialized hardware devices have been built to do this. David Shaw’s efforts are particularly noteworthy here. So we’re garnering more and more data from these simulations, which, of course, ultimately gets to the point of allowing machine learning to use that data effectively.
So yes, this is definitely coming, and it’s a critical aspect. We think about this in my own work with respect to drug discovery: you can look at a static image and say, well, that drug might bind to that protein, but at a different instant that binding site is going to look a little different, and that drug may not bind to it. There are all of those sorts of considerations, but it’s all part of that evolution. But you actually made me think about something that I’ll just bore you with again for a second.

Jon Krohn:

Please.

Philip Bourne:

What I’ve been teaching students for years is what I call the curse of the ribbon. I think the motion of proteins brings this into focus because of what happens, and this is just human nature. We’ve all seen the iconic structure of DNA as a double helix. And almost anyone now has seen pictures of the various proteins that make up the virus; they were in the New York Times all the time, at least when COVID was on the front page. What you’ll see there is quite often what are called ribbon diagrams, which are a visualization of some very complex data describing the atomic structure of these proteins. So, that’s all good. It creates a visual stimulus that you can work with, and no one in the world would want to do away with that, because it’s been so powerful for many, many years.

However, you get into a mindset where before long, you actually think that’s what a protein is like.
So it’s a useful tool, but it actually whitewashes, to some extent, what proteins are really like, including as it relates to motion, which is what made me think of it. So I think you always have to be asking yourself: this is just one representation, but it’s not necessarily the true representation. And I think this applies more generally in data science. It’s great to have these visualizations. It’s the human-computer interface piece. It’s the design piece. But you’ve got to look at it in the right perspective. It’s not the only way of thinking about these things, because you’ve got to open your mind to alternative representations that might lead you in new directions and to new points of discovery.

Jon Krohn:

Super cool. Thank you for boring us again. All right. Truly, your work is fascinating, and I’m so glad that we’ve been able to dig into it. There’s one last piece of your professional puzzle that we haven’t dug into much, that we’ve only alluded to. Right at the beginning of the episode, we alluded to the connection between Xiao-Li Meng and Amy Brand, both previous guests of the show, and yourself around open access. You are an advocate of open source, open science, and open scholarship. You’ve written several books on the UNIX operating system, and you founded an open-access journal, PLOS Computational Biology. You participate in the open project of the Protein Data Bank. So what has inspired or motivated you to be such a strong advocate for openness?

Philip Bourne:

Yeah, thanks for asking. First of all, data science itself, since it’s a data science podcast, wouldn’t exist if it wasn’t for open data.

Jon Krohn:

Right. And GitHub.

Philip Bourne:

Yeah.

Jon Krohn:

I mean, it’s crazy. We are unbelievably blessed in this field, both by academics and, we’re so lucky, by the industrial giants: not Apple, but other than Apple, Meta, Google, Microsoft. In order to help attract talent, they allow their talent to publish on arXiv, an open-access preprint server, and to publish the corresponding code on GitHub. Imagine if we didn’t… I don’t know what we’d… Would a school of data science exist? Would we even think about it as a separate discipline if it weren’t for all of these open-source contributions? Anyway, I’m probably railroading all the things you wanted to say.

Philip Bourne:

No, no, no. I appreciate your enthusiasm, because it’s exactly what you’ve said. I think we’re trying to get to the point where it just becomes normal practice. It is in data science; there’s no question about that. We make it part of our promotion and tenure policy: you can choose to opt out, but the default is that you’re going to make everything you do open, and fairly early in the research life cycle. I think that’s just so important for the evolution of the field. It’s still not… And it’s very interesting when you’re in data science and working across many different disciplines, because disciplines treat this quite differently.

Even within biomedicine, how clinicians often feel about longitudinal data from a clinical trial that’s been going on for a number of years, and the need to protect their own use of it versus making it public (obviously de-identified), is to some degree understandable.
And this has been an ongoing evolution of scholarly openness. We’re certainly not there yet, but there are clearly lots of signs of progress. The levers within the world of scholarship are, at one end, the publishers. As you alluded to, I was involved with the Public Library of Science for a number of years, and that clearly was a big move by publishers. And then, of course, what you don’t anticipate are the negative aspects of open publishing. You suddenly get all these predatory journals cropping up and all that sort of thing. But anyway, the publishers are one lever.

The other lever, at the other end, is of course the funders. NIH is about to come out with a new data sharing policy, taking effect in 2023, which is more rigorous than the one they’ve had for quite a long time. So they’re actually pushing.
I’d say the next large push is at the institutional level itself. Institutions are actually thinking more about open scholarship, and that’s something that’s happening. There’s an organization called [inaudible] that the University of Virginia is now part of, where a number of academic institutions, I don’t know, 100 or more, are agreeing to push the notion of open scholarship. I wanted to say open science, but it’s more than that, because it relates to the humanities as well. They really look at it at the institutional level: exchanging best practices about what we’re doing to try to promote open scholarship within our own institutions, sharing those practices, and doing better. This is being underwritten, I’m not saying financially, because it’s not, but at least philosophically, with support from the National Academies of Sciences. I think that pushing the whole notion of open scholarship within academic institutions is the big next step. We feel that as a school of data science, for the reasons I mentioned, we have a really important role there.

Jon Krohn:

Very cool, exciting to see that funding bodies and institutions are pushing people in the right direction. And I think, for a lot of people, it’s probably the direction they already want to go. There’s a good feeling to having your scholarship available, along with the data sources and the code that you’re using.

Philip Bourne:

Well, it becomes very obvious when suddenly you find yourself with some medical condition that you’re concerned about and you can get a significant amount of information from the likes of WebMD and tools like that. The moment you try and dig more in depth, you start to hit paywalls. You’ve probably paid as a taxpayer for that research, but you can’t access it.

Jon Krohn:

Right. It is-

Philip Bourne:

Unconscionable.

Jon Krohn:

Yeah. Especially now that I’m in industry… When I was at the university, it didn’t bother me so much, because through the library I had access to all of these journals. I could put my VPN on and I had access to all of it. And now I don’t. So I’m lucky that in data science, a lot of the papers are on arXiv. But as you say, when I want to look up something medical, I can’t dig into the references. I can only read the abstract.

Philip Bourne:

When I was at NIH, we made a big push in biomedicine toward supporting preprints. An organization sprang up around it, and still exists, called ASAPbio. And I remember presenting to all 27 directors of the NIH institutes and centers the idea that we should support preprints as valid research objects that could be included in grant applications and so forth. As we were walking out of that meeting, Francis Collins, the Director of NIH at the time, said to me, “Wow, that was the least contentious meeting of the directors I’ve ever been at,” because they were all really behind it, and that became part of the fabric of research. No doubt there’s always an argument to be made that a preprint is not yet peer reviewed, but at the same time, given the speed of publishing in many areas, I personally think there’s enormous value in it. And I’m really glad to see that it’s being adopted more and more, not just in biomedicine, but in other fields as well.

Jon Krohn:

Cool. Yeah, couldn’t agree more. All right. So we’ve had some big conversations in this episode already. We’ve talked about where data science is going. We’ve talked about where biological research is going. And so it is clear that lots of exciting changes are coming. Every year data storage is cheaper. Every year compute costs are lower. We have exponentially more abundant sensors collecting data all over the place. We’ve got increasing interconnectivity between people around the world, and that interconnectivity is faster and faster, with people getting access to arXiv papers and open-access data in real time at increasing bandwidths. So technology as a whole, particularly as it pertains to data science, is advancing at a faster pace with each year that goes by. So given your decades of experience in computational biology research and, to use your own words, as a bureaucrat in data science education, what excites you about the future? Maybe in the lifetimes of your children, how could their lives be transformed by the work that people like you have been doing in your career?

Philip Bourne:

Yeah. I remember reading a book, a New York Times bestseller, in which a whole group of people who had made amazing contributions, including people like Tim Berners-Lee, tried to predict what was going to happen in 50 years. And frankly, it was all pretty pathetic, in that no one could really go out beyond five years.

Jon Krohn:

Right.

Philip Bourne:

In the near term, obviously from a technological point of view, I think virtual reality and that notion of human-computer interaction is the kind of thing that matters, and all the big companies are getting on board with it. There’s no question that it will have profound implications as a technology. In other ways, it’s just the evolution of what’s happening already with respect to the breadth and depth of data that can be explored. Obviously all the technologies around machine learning and so on are improving, but in the end, the driver is all that data, and it’s coming from more and more diverse places and means; as you mentioned, sensors are the obvious one. As we get more and more success stories around what can be achieved (Science’s breakthrough of last year came from machine learning using a lot of biological data), clearly that’s going to happen in completely different…

And just the pressure for us to be able to apply those capabilities to the problems that we’re facing.

I’d say the other change is going to be the cultural shift in how we think about science. I mean, it’s clear to… If I can give a little plug for it, and I’m not sure if it’s embargoed or not, we have a paper coming out as a science policy forum in a couple of weeks that really says that the way we’re set up to solve the world’s problems through research is very poor right now. There’s the interrelationship between, for example in the US, the federal agencies, the research that they’re doing and the infrastructures they have to support that research, and as a result, the siloing of the data and other aspects that we need to overcome. I mean, we need to create this kind of universal commons of information that can be accessed. And it’s going to happen.

I think the other thing is that it’s going to happen in unusual forums. I mean, the major breakthrough in science last year was effectively made by a small group of, I won’t say they’re at Google, but basically it wasn’t in academia. It’s the notion that someone with a pencil or a laptop can actually independently make great contributions. The citizen science aspect, I think we’re going to see a lot of change there. There are a lot of really well-educated people out there who can do amazing things, and they don’t actually have to be part of groups within companies and so on. I could keep going on about this, but just a quick anecdote. My son works at Industrial Light & Magic. And one of the major rendering breakthroughs they’ve made at the company, it’s a Disney company, was actually, as I understand it from him, made by some young 20-something working on his own, who suddenly sent them renderings that were better than what a company of that size could produce, because he’d come up with a better algorithm.

Jon Krohn:

Cool.

Philip Bourne:

Capturing and nurturing and evolving those kinds of capabilities from all over the map, and thinking about it in the context of, frankly, the diversity of people who can contribute, is just so important.

Jon Krohn:

Yeah. Billions of people across Africa, Asia, Europe and the Americas, all of whom could be plugged into this. I mean, to some extent many of us are already plugged in via the internet to open-access data, open-access publications and open-source software. And yeah, anybody, that 20-year-old, could be devising a better algorithm than a company has been able to think of. And Alphabet can have a group of people create an algorithm that blows lots of individual university groups out of the water on protein structure. Yeah, it is exciting, and no doubt more of that will continue. That was a cool take on what’s to come, Philip. I appreciate it. So we’re reaching the end of the episode, which means it’s time for me to ask you for a book recommendation.

Philip Bourne:

Yes, let’s see. At the moment I’m actually reading a book that was, I think, on President Obama’s list: The Ministry for the Future, which is unfortunately a very scary but very realistic view of what the future’s going to look like. I thoroughly recommend that. If I might actually mention another one…

Jon Krohn:

Certainly.

Philip Bourne:

I was just in Spain on vacation and I read Hemingway’s The Sun Also Rises.

Jon Krohn:

Oh yeah.

Philip Bourne:

I have to say, and this is heresy, that in a way it’s not that it’s such great literature in my opinion; it’s the fact that the person who wrote it was just such an interesting person and was obviously doing an incredible number of crazy things, which I would personally like to aspire to, even at my old age.

Jon Krohn:

Nice. I like that. That’s a great recommendation. I love how we have this nonfiction futurology recommendation, and then a classic from Hemingway that, it sounds like, is inspiring your own character. So it sounds like a great read. All right. And then the final question for you is how people should follow you. You’ve enlightened us with all kinds of boring topics today. And so if we want to see some boring tweets, how should we follow you?

Philip Bourne:

Yeah. Just @pebourne. Yes, they are pretty boring. I don’t tweet anything beyond work stuff. I also have a Dean’s Blog where I blog things around… That’s on WordPress. You can just look up Dean’s Blog, Bourne WordPress. That relates to things that we’re facing in the school. In fact, I’m actually up for my five-year review at the university right now, as leaders go through. And my review group last week asked me, will we need a school of data science in 10 years? So I actually wrote a blog post telling them that of course we’ll need that school of data science in 10 years. That was the latest post, which I did just a few days ago.

Jon Krohn:

Nice. That sounds great. Thank you so much for everything in this episode; this has been fascinating. I have loved speaking to you, Philip. And hopefully we’ll have an opportunity to get you on the show again someday and hear the latest of what you’ve been up to.

Philip Bourne:

Well, thanks, Jon. I’m not sure I’ve got anything left to say, but it was great being part of the show. So thanks for having me. I really appreciate it.

Jon Krohn:

It feels like I say this every week, but yet again, what an incredible guest Professor Bourne was. He’s an adept, humble and charming explainer of seemingly anything, from the technical aspects of his research to real-world practicalities and implications. In today’s episode, Philip filled us in on how sharing approaches across disciplines, such as medicine, transportation and data science, can facilitate discoveries such as the likely bodily location of car crash injuries based on details of the collision. He talked about the 4 + 1 clustering of data science skills into analytics, systems, design, value, and domain-specific application.

He talked about how, alongside essential core skills like statistics and programming, data engineering is emerging as one of the most valuable areas of data science expertise.

We talked about the evolution of computing within other academic subjects, from something fringe, only for weirdos, in the 1980s through to the computation-first nature of many experiments designed today, whether in the sciences, arts or humanities. And we talked about how computational approaches to understanding protein structure, like AlphaFold 2, enable us to understand the impact of genetic defects, discover new drugs, and help prevent climate change.

As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Philip’s Twitter handle and blog, as well as my own social media profiles at www.superdatascience.com/593. That’s www.superdatascience.com/593.

If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show.

Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks, of course, to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another super SuperDataScience episode for us today. Keep on rocking it out there folks. And I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon.
