Kirill Eremenko: 00:00:00
This is episode number 397 with Principal Data Scientist at Booz Allen Hamilton, Kirk Borne.
Kirill Eremenko: 00:00:12
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. Each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
Kirill Eremenko: 00:00:44
Welcome back to the SuperDataScience podcast everybody. Super pumped to have you back here on the show. Today we have the legendary Kirk Borne joining us for a conversation. Super excited to have Kirk joining us today. If you don’t know, Kirk Borne, Dr. Kirk Borne is one of the most prominent influencers in the space of data science and artificial intelligence, named numerous times, countless times, one of the top influencers, top 10 influencers in different adjacent spaces. Kirk used to be a professor of astrophysics, now he’s a principal data scientist at Booz Allen Hamilton. In addition to that, he’s an author, keynote speaker, and as mentioned, an influencer. Today we’re going to dive into many different areas of data science and artificial intelligence.
Kirill Eremenko: 00:01:34
What I loved about today’s conversation is that it brings a lot of structure. You will find out quite a few frameworks, you will refresh. Many of these frameworks you’ve probably heard before, and if you haven’t you’re going to re-enrich and systemize your data science world and how different types of analytics all fit into this world. Let’s look at what we’re going to talk about today specifically. We’ll talk about the power of small data. We’ll talk about the four industrial revolutions and how this fourth industrial revolution is different to the previous third one. We’ll talk about artificial intelligence and what it’s doing to jobs and why you should or shouldn’t be scared or afraid of your job being automated. We’ll be talking about redefining yourself and continuously learning. We’ll talk about data science education. Kirk actually created the world’s first data science undergraduate degree program 13 years ago at George Mason University.
Kirill Eremenko: 00:02:33
Then we’ll talk about four types of data discovery, that’s another framework. Then we’ll talk about graph analytics and why Kirk thinks it’s the most powerful type of analytics, what network science is, and how you can get into that space. We will also talk about the three categories of artificial intelligence and data science applications, yet another framework. Then we’ll talk about the five dimensions of analytics implementations, another framework there for you, and then we’ll dive into community questions. We’ll look at a couple of questions that came in on LinkedIn when I announced this podcast. All in all, a very cool podcast coming up which will help put some of that knowledge about data science and artificial intelligence into very structured bits. And without further ado, let’s get going. I bring to you Kirk Borne, principal data scientist at Booz Allen Hamilton.
Kirill Eremenko: 00:03:26
Welcome back to the SuperDataScience podcast everybody. Super pumped to have you back here on the show. Today’s guest is Dr. Kirk Borne. Kirk, how are you going today?
Kirk Borne: 00:03:41
I’m doing very well, Kirill. Thank you.
Kirill Eremenko: 00:03:44
Where are you in this uncertain time? Where are you located?
Kirk Borne: 00:03:50
I’m located in my house. Oh, just kidding. I’m located in [inaudible 00:03:55], just outside Baltimore, Maryland.
Kirill Eremenko: 00:03:58
Got you, Maryland. You were supposed to be in Peru a couple of days ago or a week ago or so.
Kirk Borne: 00:04:03
Yes, I was a keynote speaker at the Data and AI Summit that was scheduled to be in Lima, Peru, but it went online and virtual.
Kirill Eremenko: 00:04:13
How did you find that? You’ve presented, you’ve done keynotes at a countless number of conferences. How do you compare live events versus virtual events?
Kirk Borne: 00:04:26
It’s hard to compare actually. It’s a completely different experience. For example, I do like being in front of a live audience. Frequently, I tell jokes, I move around the stage a lot. I try to engage with my audience. Sometimes I will stare at them, make them realize I want them to stop and think about what I just said, and it’s hard to do that in an online environment. But on the other hand, I like online because I don’t have to travel. I can be in my comfortable socks and have a nice cup of tea next to me, and it’s fun to do that too.
Kirill Eremenko: 00:05:03
Yeah. It brings a lot of scale with the pandemic, which is unfortunate. But at the same time, a lot of people have tried out and have been open to online events. That way, as you said, you don’t have to travel. Any live event has a limited room capacity. With virtual events, not only can you do them every single day if there’s availability (there’s no limit to how many you could do, or one per day, I guess), but an unlimited number of people can also tune in and watch you. I think it helps deliver that message to many more people.
Kirk Borne: 00:05:45
Oh my God, I discovered last week. I had four consecutive days where I had a keynote at a virtual conference. I was exhausted by the end of the week.
Kirill Eremenko: 00:05:57
Wow. Wow. That’s a lot of time. Why do you do these keynotes? What inspires you to go out there and speak to people and deliver the message of data?
Kirk Borne: 00:06:13
I’m a big believer in the power of data to change lives, change business, change outcomes in our world. Since everything is digital and data is being collected on just about every aspect of life, I think every person in the world needs to have some literacy around this. I’m actually a firm believer in promoting advancement of science and education, and math and education, and technologies and education. This is my way of doing it.
Kirill Eremenko: 00:06:47
Fantastic. Very excited, and thank you for coming on the show. We’re definitely going to dive into this. I’ve got tons of questions for you, and if we have time, we actually also have questions from our community, which hopefully we’ll get to. But before we get started, for those who might not know you, can you please give us a rundown of your background? There’s so many interesting things that you’ve done. In a nutshell, how would you describe your background?
Kirk Borne: 00:07:18
Well, how many hours do you want?
Kirill Eremenko: 00:07:19
We don’t have… I think we only have one hour [inaudible 00:07:25].
Kirk Borne: 00:07:26
Okay. Well, I went to graduate school in astronomy, got a PhD in astronomy from Caltech, and that was 40 years ago. It’s been 40 years of amazing things I’ve been lucky to be part of. I spent 20 years working on NASA projects, first with the Hubble Space Telescope, and then as a contract manager of an office for NASA. All those jobs involved data, I was always working with data systems for scientists. Of course, I would joke with people that my day job was data, but also my night job was data as an astronomer working with data. I worked with data in every aspect of my entire career, but after 20 years of NASA, I moved on to being a professor at George Mason University, which is located in Northern Virginia. I was professor of astrophysics and computational science, but actually I never taught an astrophysics course while I was there, because that was…
Kirk Borne: 00:08:29
I started there 17 years ago with the vision of starting a data science degree program. Because after 20 years of doing data at NASA, I realized the importance of data and data science for the world. We actually launched that program 14 years ago at George Mason. I did that for quite a few years, and then five years ago this company, Booz Allen Hamilton, found me and offered me a job I couldn’t refuse, fantastic position, fantastic people I work with in interesting projects. I almost got to create my own job, so to speak, and do the things that I love the most. That’s how they described the job offer to me. What do I want to do? And I said, “I want to promote this to the world. I want to educate the world. I want to advise, and instruct, and mentor, and tutor people about data science,” and that’s what I do.
Kirill Eremenko: 00:09:22
Well, well, very concise history, but very exciting as well. I love the whole, the 20 years at NASA. As you mentioned before the podcast, given that your name is Kirk, Captain Kirk from Star Trek. Did they really call you Captain Kirk at NASA the 20 years?
Kirk Borne: 00:09:44
Well, I wouldn’t say professionally they called me Captain Kirk, but certainly my friends called me Captain Kirk. It’s like some of them when they see [inaudible 00:09:51]. Even though I left that NASA job many, many years ago, when I see those people they always say, “Hey captain.”
Kirill Eremenko: 00:10:06
Hey, everybody. Hope you’re enjoying this amazing episode. We’ve got a quick announcement and we’ll get straight back to it. The announcement is that DataScienceGO Virtual number two is in town, it’s happening on October 24th, 25th this year. You can get your tickets today at datasciencego.com/virtual and this part, it’s absolutely free. We’ve got some amazing speakers, amazing workshops for you to attend, and of course the super cool part is that we’ve got networking. There’ll be several three-minute speed networking sessions, where for three minutes you connect with a random data scientist from another part of the world, or maybe from your part of the world. You get to chat for three minutes. If you like [inaudible 00:10:48], if you want to connect, you hit the connect button, you stay in touch. This was, by far, one of the top features of DataScienceGO Virtual number one. So many people got such great connections, stayed in touch, and some crazy stories came out of that. We’re going to repeat it and we want you to connect with your fellow data scientist. Once again, it’s absolutely free. Register for your ticket today at datasciencego.com/virtual, and I’ll see you there. Now let’s get back to this episode.
Kirill Eremenko: 00:11:18
About NASA, I have an interesting question for you. In one of your talks you mentioned that in 1997, I believe, you had a story where you were in charge of the astrophysics data… oh no, you were the [inaudible 00:11:35] project scientist for the Hubble Telescope. I believe you had like 15,000 projects on your system, which in total took up a certain amount of space, and then you had an opportunity to bring on another project into the archives and it was two terabyte worth of data. Can you tell us the story again? Also, the question I had there was, what was big data back then and how has big data evolved over time? What is big data now? How is it different?
Kirk Borne: 00:12:05
Well, that’s a lot of questions in one, but that’s good. I did spend 10 years with the Hubble Space Telescope project, and the last three of those 10 years, I was working with the Hubble data archive, which was the science data system for research astronomers around the world to access and to use the data for research. After I did that, I got a position managing a contract. I was not a NASA employee, but I was working for a small company supporting NASA at the Astronomy Data Center, which was part of the astrophysics data facility at NASA. There we were managing not just one experiment, like Hubble was one experiment, but 15,000 different experiments of astronomy and space science data. Now, most of those were small, very small. They weren’t anything the size of Hubble, of course, but small little experiments that maybe sit on the back of a rocket where they collect a little bit of data.
Kirk Borne: 00:13:02
But we preserve those data for all time, because that was the job of the data center where I worked. It was almost like the permanent digital library of the data collected by NASA. Now, this is important because those missions and experiments were funded by tax dollars, by the American citizens. When the scientists complete their projects and research with the data that they collected from those NASA experiments, it was our job to take those data and curate them much like a digital librarian, to preserve that data forever, not to lose it. We had 15,000 experiments and at the start of 1997, those 15,000 experiments totaled less than one terabyte.
Kirk Borne: 00:13:46
But then about that time, 1997, I met a colleague at a conference who, he knew what I was doing, he knew where I worked. He had an experiment, a project, a big team of people he was working with. He told me that they were just finishing their project and they wanted to send their data over to us so that we could preserve it, curate it and save it for all time, and make it publicly available to the worldwide community, which is what our duty was. That was our job. When he told me that that one single experiment was over two terabytes, it literally nearly broke the bank, so to speak, for NASA. We had 15,000 experiments, less than one terabyte, and one new experiment on top of the 15,000, which would have required us to triple the size of the data center, triple the capacity of the data center.
Kirk Borne: 00:14:38
Well, that for me was the first real aha moment about the growth of data in the world, that was 1997. Just unbelievable, I never even imagined one experiment could eclipse, by a factor of three, the sum total of the previous 15,000 experiments. I think that was, for me, the birth of the notion of big data, but also the birth of my interest in what we can do with that much data, i.e., machine learning and data science. In those days primarily we called it data mining. What’s happened over the years, of course, is that everyone has, at some point, come to the same revelation about the amount of data they’re collecting in their organizations, and in their businesses, and even in their personal lives, or in government agencies, sports teams, you name it, everywhere. Everyone’s come to the revelation that this data is different from past data.
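A minimal sketch of the storage arithmetic in that story, treating the round figures quoted above as bounds (the archive held less than one terabyte, the new experiment was more than two). The variable names are illustrative only.

```python
# Round figures from the story, used as bounds rather than exact values.
existing_archive_tb = 1.0  # 15,000 experiments totalled less than this
new_experiment_tb = 2.0    # the single new experiment was more than this

# Capacity needed once the new experiment is added to the archive.
total_tb = existing_archive_tb + new_experiment_tb
growth_factor = total_tb / existing_archive_tb

print(f"Total: about {total_tb:.0f} TB, {growth_factor:.0f}x the existing archive")
```

Because the archive was under one terabyte and the new data set over two, a factor of three is a lower bound on the capacity growth, which matches the "triple the capacity" remark.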
Kirk Borne: 00:15:31
People say, “We’ve always had big data.” Well, that’s a funny statement because, yeah, we’ve had a lot of data, but we’ve never had something so comprehensive and deep and broad in the sense of so many different attributes. For example, we have so many different features and things that we use to describe customers and patients and describe movies and describe books. All of the different ways we could describe every instance of something in our world is what is really most interesting about big data. I think the way it’s evolved is less focused on the depth, that is, how much we have, but on the breadth, how many different features do you have? The more features you have, the greater what we call the 360 view. You create this 360 view of whatever it is you’re studying.
Kirk Borne: 00:16:18
Again, whether it’s a customer, or a hospital patient, or even a baseball player, or whatever it is, you’re collecting so many different dimensions of information you have deeper and greater insight than you’ve ever had before about anything. In some sense, it’s the emergence of small data that’s really interesting right now. In fact, I just saw the latest Gartner report on the emerging trends for 2020 in the area of data analytics, and one of the emerging trends is small data. That is the focus on how much power and insight you can get from a small quantity of data, as long as you have many different dimensions, and perspectives, and viewpoints, which we call features in the data set.
Kirk Borne: 00:17:07
For me, we’re coming back to our roots here which is, first of all, data management, that is, being able to maintain and curate and make publicly, maybe not publicly, but make available to the right community, whoever that community is, even if it’s private data, making that data available in good ways, searchable ways, reusable ways, meaningful ways. But also realizing the more dimensions we have, the more we can integrate those different dimensions, the deeper our insights will be. For me, I really feel like I’m coming full circle to my early days of working with data systems, but in reality, it’s a whole new world because we’ve never been able to get this much insight into so many aspects of our world as we can today.
Kirill Eremenko: 00:17:50
That’s very interesting about small data that Gartner included that as an emerging trend. Why is that relevant now when we have even more storage capacity, even more processing power? Why should we be cognizant of small data sets and delivering value from smaller data sets rather than just put all the possible data we have into the equation, into the model, store all possible data we have? Why can’t we just go bigger? Why are they saying that it’s important to look at small data sets?
Kirk Borne: 00:18:29
Well, first of all, I’m not saying we should not have the big data. We absolutely should. My best analogy for that is this. Let’s say I walk out of the front door of my house. I live in a neighborhood with lots of trees and things like that, sometimes even small animals run through the yard, like the deer. If I were to walk out in front of my house and try to consume every bit of information that’s coming to me, every single leaf on every single tree, and every single piece of grass, and every single thing that moves, and how it moves, and its color, and its shape, and its size, I would be overwhelmed. But instead my cognitive ability, you might say my intelligence, not artificial but natural intelligence, knows how to sift through all of those inputs to find the most important thing that I need to pay attention to.
Kirk Borne: 00:19:25
That will depend on whether I’m going out walking, or I’m going to respond to a neighbor’s request for help, or I’m going outside to trim the bushes, or to mow the grass, or to see if maybe there’s some damage done by the wind on the side of my house. What I am paying attention to in the middle of all that information is the thing that my cognitive ability says, “This is the thing you need to pay attention to right now given the use case, i.e., the reason that I am out today.” Our data collections are like that. We don’t know what the use case will be. It’ll change from day-to-day. You don’t know what are the things that are going to be most important. Not only that, if you don’t collect that data, you’ll have such a narrow view of your world and such a narrow view of your environment, you’ll have absolutely no idea whether you’re making the best decision.
Kirk Borne: 00:20:16
All of the data that we collect helps us to make the best decision at the right time and the right place. But it’s the data science and the AI that helps us to do the cognitive thing, which is to narrow it down and sift through the data. That is to find the most important pieces and elements in all those streams of data for this particular application, this particular moment, in this particular decision the business is making. That will change maybe day-by-day, it will change for the different people in the organization, the different components of your organization, different departments. All those data are contributing to something, it’s just that they’re not all contributing simultaneously, because that would be ridiculous. But they do contribute at the right place at the right time, and that’s what small data is about.
Kirk Borne: 00:21:03
What AI and machine learning does for us, it helps us to sift through the data, to triage the data, identify the most important thing. Really, one of my things I like to say to people, the most important value of AI nowadays might be just the triage that it does on our huge data sets to help us cognitively identify the piece of information and data that we most need to look at and base decisions upon.
Kirill Eremenko: 00:21:29
Okay. I see. Makes total sense. That ties in quite well with the whole IoT. I like your quote that the internet of things is not about just the connectivity of devices. It’s actually the way that these devices put things into context, the way these data points… basically, I think this is a phrase that you used in one of your presentations, “It’s the internet of context.” I’d love to understand a bit better how that ties into the fourth industrial revolution. I think you have a great way of describing it. I understand the first, versus the second, versus the third, but the third and the fourth industrial revolutions sometimes can be seen as similar because one is about internet-based computer systems, the other one’s about big data and IoT. Do you mind speaking a bit to that topic?
Kirk Borne: 00:22:33
Well, first of all, the third industrial revolution, people focus on basically the birth of the internet and personal computing. I.e, it used to be, there were computers before 1969, which is the date that people declare as the start of the third industrial revolution. There were computers before that, but they were basically in big government labs or big university labs, big industrial labs. The internet basically allowed for flows of information between people. That’s what we’ve been doing for many years, and that computing power that we use for that purpose. But the fourth industrial revolution now is all about hyperconnectivity and the flows that mediate it are the data flows. It’s not just information, it’s the data that’s flowing through all of these nodes, if you will, that are informing the world through all these connections.
Kirk Borne: 00:23:33
Industry 2.0, and fourth industrial revolution, 4.0, I should say, industry 4.0, is really about that hyperconnectivity and delivering contextual information from these sensors. I say sensors, it’s not just a mechanical device. Even a social network like Twitter, or LinkedIn, or Facebook is a sensor. People leaving reviews on e-commerce pages to give reviews on different products and services, that’s a sensor. Sensors are anything and everything. It’s what we say, what we do, and it is also mechanical electronic sensors of the world measuring [inaudible 00:24:14]. All of us are being sensors. The internet of things is providing contextual information about all these other things. Remember what I said about small data lives in a world of big data. Small data is the thing you’re paying attention to, but all the rest of the data gives you the context in which that thing is happening, is living, is acting and dynamically moving. Internet of things gives us all that extra context about whatever it is we’re paying attention to.
Kirk Borne: 00:24:46
We’re really now mediating knowledge across the networks as informed by all these data flows. That’s really a different way of thinking about the industry, so to speak, and that is information about customers, information about supply chain, information about warehouse, about customer demand, even about weather and world events that affect supply and demand and so forth. If we don’t have all these extra contextual variables, you might completely miss what’s going on and build the wrong product or deliver the wrong items to the store or not meet the needs of what customers want. We need to basically tap into these additional sources of contextual information to help us make the best decisions.
Kirk Borne: 00:25:32
I call the internet of things, the internet of context, but two other phrases I use to describe it is, context-as-a-service or in other words, insights-as-a-service. The other one is, forecasting-as-a-service. Because once you have these insights, how things are behaving, you start detecting the early warning signs of something changing or something about to happen. Forecasting, so to speak, comes, not just from following a time series and predicting what the next data point will be in a time series, that’s old school forecasting. But modern school forecasting is about realizing there’s all these contextual causal factors, and the more you tap into those contextual causal signals, the better you are to see what’s coming around the corner, so to speak. Forecasting-as-a-service, insights-as-a-service is just a way I like to just describe the internet of things and again, internet of context.
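The contrast drawn above, between following a time series alone and also tapping contextual signals, can be sketched on synthetic data. This is only an illustration under stated assumptions: the demand series, the hypothetical leading "context" signal, and all coefficients are made up, and ordinary least squares stands in for whatever forecasting method one would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical contextual signal (e.g. a weather index) observed one step
# before the demand it influences. Entirely synthetic.
context = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)

# Synthetic demand: depends on its own previous value and on the
# previous step's context.
demand = np.zeros(n)
for t in range(1, n):
    demand[t] = 0.5 * demand[t - 1] + 3.0 * context[t - 1] + noise[t]

y = demand[1:]        # value to forecast
lag = demand[:-1]     # previous demand ("old school" input)
ctx = context[:-1]    # contextual signal available at forecast time
ones = np.ones_like(y)

X_lag_only = np.column_stack([ones, lag])
X_with_context = np.column_stack([ones, lag, ctx])


def holdout_rmse(X, y, split):
    """Fit least squares on the first `split` rows, score on the rest."""
    coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    resid = y[split:] - X[split:] @ coef
    return float(np.sqrt(np.mean(resid ** 2)))


split = 300
rmse_lag_only = holdout_rmse(X_lag_only, y, split)
rmse_with_context = holdout_rmse(X_with_context, y, split)

print(f"lag only:     RMSE = {rmse_lag_only:.2f}")
print(f"with context: RMSE = {rmse_with_context:.2f}")
```

In this constructed example the contextual signal drives most of the variation, so the model that sees it forecasts the held-out points far better. Real data would rarely be this clean, and a correlated signal is not necessarily a causal one.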
Kirill Eremenko: 00:26:29
Okay. Another quote that you mentioned once, or maybe several times, that every industrial age brings a change, not only in the technologies, but also in the work that we do, what kind of change do you see happening now? We’ve been going through this fourth industrial revolution for a while and the change doesn’t seem to stop. It’s constantly more and more change. What kind of change are you seeing and where do you think it’s all going?
Kirk Borne: 00:27:02
Well, the future of work is a major topic actually in the AI and data science world, dealing with automation and the digital transformation of businesses. Oftentimes people focus on the negative aspect, which is the jobs that will be lost. But what they don’t realize is that for every revolution, there are jobs lost, but there are many, many more new jobs created. In fact there was a World Economic Forum report on this two years ago, which said something like there would be, I can’t remember the number, I think 58… I’m going to have these numbers reversed, I think it was 58 million jobs would be lost.
Kirk Borne: 00:27:39
People would spend a lot of time and energy talking about that, and focusing on that, that 58 million jobs lost due to the artificial intelligence and the digital transformation taking place now in the world, automation, and those things. But they did not read the next sentence, and the next sentence says, “And there’ll be 133 million new jobs created.” 133 million new jobs created. Literally there will be twice as many jobs available for every one job that’s being lost due to this automation. The work changes. For example, the first industrial revolution really put a lot of farmers out of work. I like to say that 200 years ago, 95% of the workforce in the United States was farming, 95%. Now it’s only half of a percent, one half of 1%. But those farmers are feeding more than 10 times as many people as those farmers were 200 years ago. Technology has enabled that.
Kirk Borne: 00:28:48
Well, we don’t have 99.5% of the country unemployed because all those people that were farmers 200 years ago are not farmers today. No, the work has changed. What we do is different and that’s the same thing that’s going to happen in this age of automation. When I talk about AI with people I say, “It’s not about artificial intelligence, it’s about amplified intelligence, accelerated intelligence, augmented intelligence.” These things are augmenting the human in the loop. The human is now doing more high level cognitive work while the machine and the automation is doing the repetitive task. That’s actually making work better for people, creating more interesting jobs for people assisted by these automations that have to do the repetitive work that maybe gets boring. I think we have to accept the fact that jobs will go away, but many more new jobs will be created, and that’s just the nature of the evolution of business for hundreds of years.
Kirill Eremenko: 00:29:52
I totally see your point there. However, in one of your presentations you talk about combinatorial growth, that it’s even faster than exponential growth. The reason I bring it up now is with the previous industrial revolutions, whether it was the steam engine, the second one about electricity-based mass production, or the computer and internet-based revolution, there was time for people to retrain, to learn new skills, to requalify for new jobs and to adjust their lives. Many people would say that this fourth industrial revolution is happening so fast, the change is so rapid. Is there time? Is there time for people, for these new jobs that will come, as you said, the 133 million new jobs? Will people have enough time to qualify for them, to learn how to do these jobs, to adjust their lives and lifestyles? Some would say that perhaps it’s very different to how it’s happened in the past. What are your comments on that?
Kirk Borne: 00:31:01
Well, that is definitely true. It is different and it is much faster, but that’s what it is. I mean, we can’t change the course of history, this is what’s happening now. It just so happens that the change is happening within the span of one person’s career. But even from the first industrial revolution, to the second, to the third, the time between those major moments in business and history was already shrinking. I mean, it shrank from maybe 100 and some years between the first and the second, going from steam engine to electric power. Then from electric to the computer age it was like 50 or 60 years. Then from the computer age to what we now call the fourth industrial revolution, it’s like 30 or 40 years. These numbers are decreasing.
Kirk Borne: 00:31:53
30 or 40 years is now less than a typical person’s career, which basically means that you have to accept the fact that this is what it is. I mean, I don’t think you’re going to really stop what’s happening, it’s happening. It means that basically, it’s no longer true, if it ever was true, that you basically finished school with whatever level of schooling you finished with, and then you can stay in that job in that profession until retirement, 45 years later. That’s just not the way the world is now, unfortunately or fortunately. I mean, I personally am excited by that because I believe in lifelong learning, I believe in constant learning. I’ve redefined myself and reconfigured myself over my career several times, from an academic scientist, to a business manager, to an academic professor in all of astronomy, to now data scientist executive advisor and these things. I’m not saying that everyone has to do what I did, but I’m just saying one has to be agile enough because it is an agile world that’s demanding of us.
Kirk Borne: 00:33:00
The changes are inevitable. Focus on lifelong learning is what I tell people. Focus on learning new things, but don’t do it just for the sake of doing it. Find the thing you’re passionate about. Say, for example, if you’re really into medicine, or sports, or finance, all these organizations, banking, every one of the things in the world has, again, these digital revolutions taking place. If you’re in one of those fields, one of those industries, stick with it if that’s what you love doing. But the job you’re going to be doing in that industry is going to be different.
Kirill Eremenko: 00:33:35
Thinking of education, you created the world’s first data science undergraduate degree program at George Mason, that was 13 years ago. What does the data science education space look like now, and how has it changed?
Kirk Borne: 00:33:50
Well, it’s changed quite a bit, and I wouldn’t have believed it myself 14 years ago. 14 years ago, myself and other professors at the university put a proposal in to the state of Virginia because we were a state university at George Mason University. We put a proposal in to build an undergraduate data science degree program, a bachelor’s of science in data science, which was approved and we opened the program to students 13 years ago. Back then I was really thinking very narrowly actually. I was so passionate about bringing data science to the world and bringing it to students and teaching it that I was thinking of it as a profession unto itself. But really today it’s really embedded in business systems, organizations, government agencies, nonprofits, everything, it’s embedded. It doesn’t have to be outside.
Kirk Borne: 00:34:47
It’s fine if you are a data scientist and that’s your entire profession, but data science itself, small d small s, is an embedded thing to do in businesses. It’s a way of doing things I should say. What we see now is the focus more on the data science, data analytics, AI tracks in almost every discipline in universities. When I left George Mason University five years ago, it was quite interesting because, 13 years ago we were the first in the world to have a data science degree program. Now there’s literally hundreds, if not thousands. The other thing is, at my own university we had business analytics in the business school, we had big data engineering in the engineering school, we had health informatics in the health school, we had data-driven education in the education department, we had policy informatics in the school of government and public policy. There was even a data journalism track in the communications department.
Kirk Borne: 00:35:53
It seemed like every department was taking hold of this and saying, “Yeah, our industry, our business, our domain is being overrun by data also, and so we need to train our students in these disciplines of getting data-driven insights from data and improving decision making through data.” Every organization and university department is doing this just like every industry is now doing this. Now I like to tell people that data science is not a thing to do, but it is a way of doing things. That’s really what has changed dramatically in the last 13, 14 years.
Kirill Eremenko: 00:36:34
Would you recommend for people to study data science as a separate degree, or to go and study something they’re passionate about and incorporate data science in their path?
Kirk Borne: 00:36:45
I think the thing that… Your question there, it sort of answered itself. Follow the thing that you’re passionate about. If you want to be focused on data science in itself, that’s great, do that. I mean, one should do that. It’ll teach you, you’ll learn about coding, and machine learning, and algorithms, and visualization, and all the things that will carry you through all kinds of different, interesting paths in life. If you’re interested in doing something in another discipline, like I said, finance, marketing, policy, healthcare, sports, art, communication, you name it, there’s going to be a digital component of that. Learn some data science, learn some analytics, learn some data visualization, learn some coding, to help you in that career track. But first and foremost, follow the thing you’re passionate about. Yes, there will continue to be a data science career track and degree track at the universities, and career track at employers, but that’s just one track. The other tracks are going to be just the normal business functions, but every one of those people are also going to need to have some digital and data literacy.
Kirill Eremenko: 00:37:57
That’s great advice. I’d like to move into a bit more into data science itself. There was something in one of your presentations that I really enjoyed and you spoke about four types of data discovery. You talked about class discovery, correlational causality discovery, outlier or anomaly discovery, and association discovery. Do you mind sharing that with our audience? I think that can be very useful to anybody in the space of data science.
Kirk Borne: 00:38:25
Yeah. The context of that was, frequently I’m asked to describe what data science is and machine learning to people. Oftentimes these are audiences that don’t have the mathematical background and that’s fine. As an educator, that’s perfectly fine for me because I love explaining things. I think it was Albert Einstein or somebody who said, if you can’t explain it to your grandmother, or if you can’t explain it to a third grader, then you don’t really understand it yourself. What I learned over the years is, the more I explain things to people or talk to people about stuff, the better I understand. When people ask me about machine learning and data science, I don’t want to go straight into talking about supervised learning, and machine learning, and training steps, and neural networks, and all those kinds of cool things.
Kirk Borne: 00:39:11
They are cool. I love those things. But what really helps people to understand what you’re talking about is you put it in the context of what people already do. This is my belief, and that is machine learning is emulating the human intelligence. Artificial intelligence is emulating the human intelligence. Data science is emulating what we already do as human beings. We observe our world, we detect patterns in our world, and we learn what those patterns mean to help us make decisions and understand things. What kind of patterns are we talking about? Well, we’re talking about clusters and groups of things. Humans are really good about seeing groups, clusters, and we segment things all the time now. If you put some toys in front of a child, they’ll start segmenting them by color, by shape, by size, they’ll even segment them by function.
Kirk Borne: 00:40:00
They’re not thinking about it, but a toy can be used to build a castle. You could use blocks to build a castle, but you use a ball, you can’t build a castle with a round ball, but you can play a game with a ball. Very natural human thing is to cluster things. Class discovery is what clustering is about. When we started grouping things, we started seeing the different classes and groups that things exist in our domain, whatever your domain is. This is the power of big data once again. The more data we have, the more we are able to discover new classes, sometimes rare classes, but also start learning what separates the different classes, what are the boundaries between those [inaudible 00:40:38]. Class discovery is not just learning the classes exist, but learning what distinguishes them. That’s the first one.
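To make the class discovery idea concrete, here is a minimal sketch of clustering in Python. It is an illustration only, not a method described in the conversation: the toy sizes, the starting centers, and the function name `kmeans_1d` are all assumptions. It groups values around the nearest center, then reports a boundary between the two discovered classes, which mirrors the point that class discovery is about learning what separates the classes.

```python
from statistics import mean

def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move each center to its group's mean."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical toy sizes in centimetres: small blocks and large balls.
toy_sizes_cm = [2.0, 2.5, 3.0, 20.0, 21.0, 22.5]
centers, groups = kmeans_1d(toy_sizes_cm, centers=[0.0, 10.0])

# The midpoint between the two centers acts as the boundary between the classes.
boundary = sum(centers) / 2
print(groups, boundary)
```

With more data points, the same loop can surface small or rare groups, provided a starting center is placed near them.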
Kirk Borne: 00:40:47
Pattern discovery, first of all, is group, cluster, and class discovery. Another example of pattern discovery is the second one you mentioned, which is correlation discovery. Humans are really good at seeing trends and patterns in things. Trends in pattern discovery, correlation, is the second on my list. That basically says that if you see a relationship between two variables, that’s a correlation. That you can say, “Given X, find Y.” Given X, I can tell you what Y will be if X and Y are correlated. Correlation discovery is very powerful for forecasting and for all kinds of applications in life when you start seeing a pattern. Even a child discovers, if I touch the hot stove and burn my finger, I’m never going to do that again. It is a correlation between behavior and data, the data just happens to be a burn to the finger, that’s still data.
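The “Given X, find Y” idea can be sketched with a correlation coefficient and a fitted straight line. This is a minimal illustration under assumptions: the data points are made up, and the function names `fit_line` and `predict` are hypothetical. A straight-line fit is only one possible way to turn a correlation into a forecast.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return slope, intercept, and Pearson correlation r for a least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

def predict(x, slope, intercept):
    """Given X, find Y using the fitted line."""
    return intercept + slope * x

# Made-up observations in which Y rises with X.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = fit_line(xs, ys)
forecast = predict(6, slope, intercept)
print(round(r, 3), round(forecast, 2))
```

A high value of `r` says that X is useful for forecasting Y. It says nothing about whether X causes Y.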
Kirk Borne: 00:41:39
The correlation discovery is good for forecasting, but it’s also good for another thing, not just predictive, it’s got power discovery, but prescriptive. What I mean by that is, when you have extra dimensions, not just X and Y, but you have third, and fourth, and fifth, and more dimension. Some of those extra dimensions, you may discover, have some kind of causal influence on that first correlation that you found X versus Y. Because X versus Y may not have any causal relationship. You’ll learn this in statistics class, that correlation does not imply causation. That is, just because X and Y are correlated, it doesn’t mean one of them caused the other one. However, as you add more data, more dimensions, again, the power of big data is the high dimensionality, the high variety that we’re [inaudible 00:42:22]. We can find that unique dimension, which is the causal variable that maybe we can pause to effect some different outcome.
Kirk Borne: 00:42:33
If we know that Y increases with X, but what if Y is something we don’t want to increase, like risk, or business loss, or customer loss, something like that? How can we decrease that? Well, we know that X correlates with Y, but how do we decrease Y? I just know there’s a correlation between X and Y. I don’t learn anything about how to change it. Well, if we find those causal factors that allow us to reduce Y, then that we can change the outcome of that particular correlation to a lower value. But what if we want to lower Y, decrease Y? We know that there’s this correlation between X and Y, but how do we decrease Y if we want, for example, decrease… I said decrease before I think. Let’s say we want to increase customer sales, or increase customer satisfaction, or increase employee experience, or increase whatever, increase performance on the machine. Again, look for those causal factors that come through those higher dimensions and that correlation space.
Kirk Borne: 00:43:35
Correlation discovery gives us both predictive and prescriptive power discovery. There’s the first two. We have groups and clusters, those are patterns, we have trends and correlations, those are patterns. The third one in the list is outlier or anomaly discovery. Now, I like to just call that surprise discovery. That’s the surprising, unexpected thing in your data. The anomaly or the outlier doesn’t have to be an outlier, it could be an inlier. It could be a data point or behavior that’s right in the middle of a crowd of data points, but you’ve never seen a data point at that particular spot in the middle of that data. That’s an inlier. That’s an unusual, an unexpected, surprising data point. Again, humans are very good at anomaly detection, at seeing things that are out of place that are in a place where we’ve never seen something before, they stick out. We call it sticking out like a sore thumb, that’s the expression.
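The inlier idea can be sketched in a few lines of Python. This is a toy illustration under assumptions, not a technique named in the conversation: the ring of points, the 1.5 threshold factor, and the function name `nearest_neighbor_distance` are all made up for the example. The candidate point sits at the exact center of the data, so a distance-from-the-average check sees nothing unusual, yet it lies in a spot where no data point has been seen before.

```python
import math

def nearest_neighbor_distance(point, others):
    """Distance from a point to the closest of the other points."""
    return min(math.dist(point, o) for o in others)

# Twelve points evenly spaced on a circle of radius 1: the "crowd".
ring = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12)) for k in range(12)]
candidate = (0.0, 0.0)  # right in the middle of the crowd

# The candidate is essentially at the average position, so it is not an outlier.
centroid = (sum(x for x, _ in ring) / len(ring), sum(y for _, y in ring) / len(ring))
distance_from_centroid = math.dist(candidate, centroid)

# Typical spacing between neighbouring points in the crowd.
typical_gap = max(nearest_neighbor_distance(p, [q for q in ring if q is not p]) for p in ring)

# The candidate is much farther from its nearest neighbour than crowd members are.
candidate_gap = nearest_neighbor_distance(candidate, ring)
is_surprise = candidate_gap > 1.5 * typical_gap
print(distance_from_centroid, candidate_gap, typical_gap, is_surprise)
```

The design choice here is to compare each point with its local neighbourhood rather than with the global average, which is what lets an inlier stand out.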
Kirk Borne: 00:44:34
All right, the outlier or surprise discovery, again, could be inlier or outlier. They’re anomalies, novelties, surprises. Those are things humans are good at, and we train our algorithms to do the same thing. We train our algorithms on these three types of patterns, group and clusters, trends and correlations, and these novelty or surprises. The fourth one is association discovery, the fourth type of pattern discovery or like I really call it, just insight discovery. Fourth type of insight discovery is association or link discovery. Finding the associations and links across a network, across a graph. It’s basically graph analytics. I think graph analytics is the most powerful tool in the universe for data scientists. Wasn’t it Shakespeare who said all the world is a graph. Actually, he said all the world is a stage.
Kirk Borne: 00:45:25
But if he were alive today, he might say all the world is a graph, because what is a graph? It’s about entities and relationships. It’s about things and the relationships between them and what those relationships mean. Isn’t that exactly what a Shakespearian play is all about, people have their relationships? In a graph, you can find connections across the graph. A may not be directly connected to C, but A may be connected to C through an intermediary B. You would never see that in a transactional database, but you see it in a graph database. Now, why would that be important? Well, A connect to B and then B to C, but no A to C directly would be an example of money laundering. It’s also an example of marketing attribution. It’s also an example of causal factor discovery. It’s also a factor of illicit goods trading or illicit human trafficking for example.
Kirk Borne: 00:46:20
You never see A and C connected in any transactional database. You can look through all the data you want, you would never see it. But A is connected to B, and then from B to C you discover the connection between A and C. Now, it doesn’t have to be a negative thing like money laundering, but just a typical graph network that we use every day is a web search. When you do a web search, think about Google PageRank, it’s all about the network of knowledge, the network of the links, and the internet. Through that network, we can find things that we didn’t expect to find because we identify relationships between maybe disconnected objects through their connections in the network. That’s not the most easy thing for a human to see. It’s not about a group or cluster, it’s not about a trend, it’s not about an outlier. It’s about the more cognitive thing that humans do, which is seeing the connections among the disconnected things. I like to say, connecting the dots that aren’t connected.
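The A to B to C example can be sketched as a small graph search in Python. This is a minimal illustration under assumptions: the transaction list and the function name `find_path` are hypothetical, and a breadth-first search is just one simple way to find an indirect connection. No single record links A and C, but treating the records as edges of a graph reveals the path through B.

```python
from collections import deque

def find_path(edges, start, goal):
    """Breadth-first search for a shortest path between two entities, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical transaction records: each row links two parties.
transactions = [("A", "B"), ("B", "C"), ("D", "E")]

# Scanning the rows never shows A and C together.
directly_linked = ("A", "C") in transactions or ("C", "A") in transactions

# Following links across the graph connects them through B.
path = find_path(transactions, "A", "C")
print(directly_linked, path)
```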
Kirk Borne: 00:47:20
If you really want to be really a cognitive data scientist, then we should focus more on graph analytics and graph models of our data, not exclusively, but there’s so much knowledge and insight to be discovered there, which is why I put that in its own place in my list of four types of insights discoveries.
Kirill Eremenko: 00:47:40
Well, that’s super insightful. How can somebody learn more about graph analytics and get into that space?
Kirk Borne: 00:47:50
Well, there’s some really interesting new books that are out. You can search on graph algorithms and graph analytics. You might want to just start with a book on network science. I got hooked on this years ago with a book called Linked. Linked talked more about the connections in human society, I’m sure you’ve heard of the phrase, seven degrees of separation. Talks about the connections between people and things and processes in our world. It really wasn’t a data science book, but I read it as a data science book. Start with something that’s just very natural describing our world, like Linked, then you can move on to the more technical books, and graph algorithms, and graph analytics, and graph databases. It’s just a powerful tool.
Kirk Borne: 00:48:36
There is a lot of development this year. I’ve noticed a lot more development around graph analytics this year than I have ever seen. For one reason, one of the ways that we can help track the movement of the, so to speak, the epidemiology of the coronavirus is through network science. Sometimes we call that contact tracing, but it’s really network science. Who are you connecting with and who are they connecting with? Is there a chance that you could be infected from some third party through an intermediary? That’s graph science, network science. It’s not just for that reason, but there’s many, many use cases. Like I said, marketing attribution, causal factor analysis, all kinds of things that are empowered by graph models.
Kirill Eremenko: 00:49:22
Would you say that recommender systems are a subclass of network science?
Kirk Borne: 00:49:27
Yes, absolutely. A recommender system can be. I mean, there’s different types of recommender engine models, but one of them is definitely the graph, the knowledge graph, or the product graph if you want to call it. The knowledge graph, product graph, these are all examples of graph models. A product graph, again, people who bought this product also bought that other product so they recommend this other product to you based upon its place in that product graph. But that’s not the only way recommenders work. There’s other kinds of techniques where they basically predict whether you would like it or not based upon whether other people liked it or not, who had similar shopping patterns to you. There’s lots of interesting things in recommender engine science.
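The “people who bought this product also bought that other product” idea can be sketched as a tiny product graph in Python. This is an illustration only, with made-up baskets and hypothetical function names `build_product_graph` and `recommend`. It is not a description of how any particular retailer’s recommender engine works.

```python
from collections import Counter
from itertools import combinations

def build_product_graph(baskets):
    """Count how often each pair of products appears in the same basket (edge weights)."""
    co_purchases = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co_purchases[(a, b)] += 1
    return co_purchases

def recommend(co_purchases, product, top_n=2):
    """Return the products most often bought together with the given product."""
    scores = Counter()
    for (a, b), count in co_purchases.items():
        if a == product:
            scores[b] += count
        elif b == product:
            scores[a] += count
    # Sort by count (highest first), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]

# Made-up shopping baskets.
baskets = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "lantern", "map"],
    ["stove", "kettle"],
]

product_graph = build_product_graph(baskets)
print(recommend(product_graph, "tent"))
```

The other family of techniques mentioned above, predicting what you would like from people with similar shopping patterns, would compare shoppers with each other instead of comparing products.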
Kirk Borne: 00:50:10
That’s how I really got hooked onto machine learning actually 20 plus years ago, I started reading about the recommender engines. I mean, Amazon was one of the first to do it. I mean, the mathematics and research was done at the computer science community before Amazon. But Amazon is the one who really took it to the bank. I like to tell people, it’s pretty amazing, Amazon is a trillion dollar business. This trillion dollar business, I read 30% of the revenue of Amazon, 30% of the revenue of a trillion dollar business, comes from an algorithm, the recommender engine algorithm. That is phenomenal.
Kirill Eremenko: 00:50:48
Absolutely. You speak about three broad categories of AI and data science applications, namely image understanding, language understanding, and next best action understanding or decision-making. Which one do you think… If I’m a new data scientist entering the space of data science, and I want to build a career here, and at this stage I don’t know what I’m passionate about. Which one would you recommend to focus on? Which one is, based on your expectations, going to skyrocket or really stand out in the coming years?
Kirk Borne: 00:51:24
Well, that third one I sometimes call context understanding. You think about data types, if you will, so images and language, and then all of these other sensors in the world. We’ve got image understanding, language understanding, and context understanding. Context understanding really is about deciding what your next best action, the next best decision, is. I mean, remember that AI is of no value unless it inspires a better decision or better action. Next best action understanding is really what all of it is about. But how does it do that? It does it by understanding the contextual data, not just images and words. Right now I think both language understanding and image understanding are extremely hot topics, computer vision, natural language processing, natural language understanding.
Kirk Borne: 00:52:19
I don’t know if you’ve been following the news, but the reason that it has been in the news, this algorithm, GPT-3, I can’t even remember what it all stands for, but it’s a third generation text autocomplete. It’s really automatic narrative generation. If you think about autocomplete on your, like if you’re sending someone a text message, we all like the little auto complete that helps finish the word for us, or maybe even suggest the next word for us. But total narrative autocomplete is amazing. This GPT-3 algorithm can actually create an article just with a little bit of information, like what is the topic? They can then write multiple paragraphs. It’s actually extremely scary because it’s talking about generation of fake news. But the field is so hot with research right now and applications. Whether you’re coming at it from the research side, or a business application side, or somewhere in between that is you want to build technologies and deliver technology.
Kirk Borne: 00:53:23
No matter what sort of dimension of the world you live in, whether the business application user, or the business application developer, or the researcher developing the algorithm, or the data scientist who is just trying to tweak the algorithms and improve them, there’s a place for you in that space. I don’t know if any one of those I would pick as go here or go there, because they’re all really interesting right now and lots of exciting stuff going on.
Kirill Eremenko: 00:53:48
Gotcha, agreed. One last thing I want to discuss before we jump to the community questions, just quickly, analytics maturity. You outlined five levels of analytics maturity in data-intensive applications. That’s descriptive analytics, diagnostic, predictive, prescriptive, and cognitive analytics. You also have key words for each one of those. Could you just give us a quick outline for those of us who maybe have not heard this breakdown or need a refresher on it?
Kirk Borne: 00:54:24
Yeah, thanks for asking. I’ve actually decided recently not to call it five stages of analytics maturity, because people… I mean, I realized this even when I was saying in the early times I would say it, that that gives the wrong impression, but I never could find the right words. Now I call it five dimensions of analytics implementation. You could be at any level there and that’s fine because it’s just a different dimension of analytics. It’s not really a maturity question, it’s just an application question.
Kirk Borne: 00:54:55
The first one, I’ll just do them in sequence, not saying one is better than the other, again, they’re just five dimensions, is what we would call descriptive analytics. Descriptive analytics basically describes what has happened in the past. Describes, for example, how many things did you sell last year, last business quarter. Every business that’s publicly traded has to do quarterly or annual business reports. This is required by law, they have to do it, so no one should ever say, “Don’t do descriptive analytics.” That’s completely wrong. You must absolutely do it if that’s required by law. But anyway, descriptive is still powerful, but it’s hindsight, it’s looking backwards so to speak.
Kirk Borne: 00:55:35
The next one is diagnostic analytics, which is basically real-time. Not looking back, but what’s happening right now, streaming data. It doesn’t necessarily need to be streaming, but just the current data, the current moment, the data you’re collecting, what does it tell you what’s going on? The diagnostic analytics is real-time. Okay, so that’s basically oversight if you will, what’s happening now. We go from hindsight to oversight when we go from descriptive to diagnostic analytics. The next is predictive analytics. Predictive analytics is looking ahead. It’s taking the training data, which is the backward-looking data, of course, but it’s building forward-looking models from the training data to see what is next. What is an outcome given the past data? That’s foresight. Predictive analytics is about foresight, looking forward, seeing what’s coming. We’ve gone from hindsight, to oversight, to foresight using these first three dimensions, descriptive analytics, diagnostic analytics, and predictive.
Kirk Borne: 00:56:43
The fourth dimension now takes us back to the things I was saying earlier about prescriptive analytics. Prescriptive analytics tells us, for example, if we don’t like the outcome that’s predicted from our predictive analytics model, what can we do to change it? That’s prescriptive analytics and it comes from learning, like I said, those insights from the other dimensions in your data that illustrate, or demonstrate, or discover for you those causal factors, those causal dimensions in your environment, things that you have control over that you can cause a different future to occur than the one you’re predicting. I like to use an example from astronomy when I talk about prescriptive analytics and predictive analytics, and in fact, descriptive analytics also, and it’s called the killer asteroid example.
Kirk Borne: 00:57:33
This is my usual use case. Most people get it, even if you’re not an astronomer. With asteroids, astronomers collect the data, the data meaning the locations and motions of asteroids across the sky, and we see these by the millions. I mean, they’re all over the place. Very few of them ever come near earth, that’s the good news. But we just basically collect data points to see where they are, how they’re moving, how big they are, and other kinds of properties like that. That’s just some descriptive analytics. We can build a predictive model, in other words an orbit, a trajectory, from the data points for an asteroid. For any given asteroid, we can collect multiple data points and build a trajectory, a model, for the orbit, the trajectory. We can predict where it’s going. That’s predictive analytics.
Kirk Borne: 00:58:22
Now, if we predict that this asteroid is going to impact earth, and if it’s a big enough asteroid it could actually wipe out human civilization or all life on earth, like almost happened millions of years ago when the dinosaurs were extincted by an asteroid impact, at least that’s what we think. If we build a predictive model and we say, “Oh, this asteroid is going to wipe out civilization at 12 noon next Tuesday. Have a nice day. See you later.” I think that people would say to the astronomer, “Hey, come back, come back. Can’t you do something about that?” If I was that astronomer I would say, “Oh, you don’t want just a predictive model, you want a prescriptive model.” The killer asteroid is actually, in some sense, relatively easy, at least in its technical application, because we know what the forces are to move an asteroid. We may not be able to move the asteroid, but we know what the forces are.
Kirk Borne: 00:59:22
Change the trajectory, change the path, and how you do that that’s up to engineers. But changing the path, changing the trajectory can move it to a different outcome than the one you predict. If we’re talking about customers, or hospital patients, or employees, or even machines, sometimes we don’t understand necessarily what are the nudges of those things. Prescriptive analytics, looking at all those different dimensions in the data, gives us insights into knowing what we can do, how can we nudge the thing to a different outcome than the one we predict? For example, we predict the customer’s going to leave. We predict the employee is going to leave. We predict the engine will fail. We predict that the patient will get sick and die. Well, if that’s true at the doctor’s office what do you ask for? You ask for a prescription. Prescriptive analytics is finding the prescription to change the outcome from the one that’s coming that you predict, which you don’t want to happen. Sometimes I say prescriptive analytics is like causal predictive, you’re causing a certain future to happen.
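The difference between the predictive and the prescriptive step in the asteroid example can be sketched with a deliberately simplified toy model in Python. Everything here is an assumption made for illustration: the one-dimensional straight-line motion, the made-up observations, the safe distance, and the function names `fit_trajectory` and `required_nudge`. Real orbit prediction is far more involved.

```python
from statistics import mean

def fit_trajectory(times, offsets):
    """Predictive step: least-squares straight line through the observed offsets."""
    mt, mo = mean(times), mean(offsets)
    velocity = sum((t - mt) * (o - mo) for t, o in zip(times, offsets)) / sum(
        (t - mt) ** 2 for t in times
    )
    start = mo - velocity * mt
    return start, velocity

def required_nudge(offset_now, velocity, time_left, safe_offset):
    """Prescriptive step: velocity change needed so the final offset equals safe_offset."""
    predicted = offset_now + velocity * time_left
    return (safe_offset - predicted) / time_left

# Descriptive step: made-up observations of the asteroid's offset from Earth's position.
times = [0.0, 1.0, 2.0, 3.0]
offsets = [9.0, 7.0, 5.0, 3.0]

start, velocity = fit_trajectory(times, offsets)

# Predictive step: where will it be at the assumed arrival time? Zero offset means impact.
arrival_time = 4.5
predicted_offset = start + velocity * arrival_time

# Prescriptive step: what nudge, applied at the last observation, yields a safe miss?
nudge = required_nudge(offsets[-1], velocity, arrival_time - times[-1], safe_offset=1.5)
new_offset = offsets[-1] + (velocity + nudge) * (arrival_time - times[-1])
print(predicted_offset, nudge, new_offset)
```

The prescriptive step only works here because the toy model says which variable, the velocity, controls the outcome. For customers, patients, or machines, finding that controllable factor is the hard part.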
Kirk Borne: 01:00:28
We’ve gone through these four dimensions, descriptive, hindsight. Diagnostic, which is oversight. Predictive, which is foresight, and prescriptive, which I now call insight. That is we have enough insight to affect the outcome that we could actually change the outcome. The fifth one in my list of dimensions of analytics is cognitive analytics. Cognitive analytics is where you put it all together in the same way that a cognitive human being would do. You look at all the data and say, “What is the question I should be asking? What is the next best action, next best decision given all of the data?” Cognitive analytics is taking in your data with that 360 view from all those dimensions. From that, you get what I would call the right sight. Now that’s a play on words. We’ve gone from hindsight, oversight, to foresight, to insight, now to right sight. Right sight is, again, the right action, the right decision at the right time, and the right place, for the right customer, or the right product, or the right thing.
Kirk Borne: 01:01:35
Cognitive analytics is really the opposite, truly the opposite, of descriptive analytics. Because in descriptive analytics, for the most part, you have questions that you need to answer from your data. You go find the results, answers to your questions in the data. Cognitive analytics is about finding the questions that you should be asking. It’s not about answering questions that someone has given you, it’s about asking the new questions. That’s, again, a very human trait. If you see something odd, or weird, or strange, or emerging, or different, or interesting, you’re going to say something about it, “What’s that? What is this doing? Why is this happening? What caused that? This wasn’t here before, what is it?” We ask these kinds of questions all the time when we see things in our real world, why aren’t we doing that with our data? That’s what cognitive analytics is about.
Kirk Borne: 01:02:21
Cognitive analytics doesn’t have a mathematical formula. It’s really about the human in the loop and emulating that human behavior through an AI. You can train an AI [inaudible 01:02:30] machine learning to find interesting emerging trends, anomalies, outliers, new clusters forming, new correlations discovered, new outliers appearing, new links and associations. All of these five dimensions of analytics, every one of them can be applied to those four types of pattern discovery, insights discovery that I previously discussed. They’re not mutually exclusive, they’re just different ways of thinking about the same things.
Kirill Eremenko: 01:02:58
Thank you very much for the detailed step-by-step description. It’s interesting that you say that none of them is better than the other. Does, for example, our organization, need to go through all of them? For instance, to get to cognitive or to prescriptive analytics, is it necessary to go step-by-step through all of these, or can an organization, or even a data scientist, jump straight to prescriptive avoiding the previous steps?
Kirk Borne: 01:03:28
It’s not necessary to go through all of them. That’s why I’m trying to stop using the word, five steps to analytics maturity, because it gives the wrong impression that you need to follow these steps sequentially, and that’s not true. These are just five different dimensions of analytics. You can choose to jump in anywhere, except it helps to, of course, have experience with some things before you go into something too deep. For example, if you want to do prescriptive analytics, it’s probably a good idea to do some predictive modeling first, just so you can see what are the outcomes to find one that you might want to change, and then go exploring the data set to see if you can find causal factors and causal treatments, if you will, that can change the outcome. Just going about prescriptive analytics without knowing what and why you’re changing something is not a good business move anyway.
Kirk Borne: 01:04:18
Really taking it from the business perspective, you just want to think about what the business goals and objectives are. Predictive modeling is very common. Predictive analytics is extremely common in business analytics applications. But you can take it one step further just like the killer asteroid case. If someone predicts a bad thing’s going to happen for the business, you don’t just want to say, “Oh, well, we’re going to have a bad business quarter.” Someone’s going to say, “Hey, can’t you do something about it?” And then that’s the prescriptive model you’re going to try to find by asking questions of the data that you didn’t think of. That’s being cognitive.
Kirill Eremenko: 01:04:49
Got you. Thank you very much. We still have a bit more time. Are you excited to do some rapid fire questions from our community?
Kirk Borne: 01:04:59
I can try. If they’re all, yes, no questions, no problem.
Kirill Eremenko: 01:05:04
They’re a bit more complex than that, but let’s see how we go. Okay. Deep Shah asks, “Can a data scientist work as a business analyst? What extra skills do you need to work as a business analyst?” Basically, I guess, the difference between data scientists and business analysts.
Kirk Borne: 01:05:22
I’m not a business person, but I can assure you that the more skills you have the better that business analyst is going to be. But if you’re coming at it from a data science to becoming a business analyst, and of course what you need to do is understand business. I’m not saying I understand a lot of business, because my background is not that. But I’ve gone to enough conferences and talked to enough business people, and certainly have done a lot of consulting in the last five years at Booz Allen in the consulting of business, that the more you learn about the business, the more you can help the business. Whether you’re going at business from data scientist to analyst or from analyst to data scientist, I think the merger of the two skill sets is what’s really going to make you most valuable.
Kirk Borne: 01:06:08
Learn about the business definitely, because at the end of the day it’s really about solving the business problem and meeting the business goals and objectives, and reaching the business’ core mission. That’s got to start where you start. It’s very tempting as a data scientist to say, “Oh, I’ve learned this new cool neural network or deep learning algorithm. Let me find the use case for it.” Okay, that could be fun and it sometimes leads to good applications in the business [inaudible 01:06:37]. But think first about why you’re doing it, and that’s always a good… Even as a scientist, when I was an astronomer, I certainly wasn’t doing business, but it was about a big goal. It was about a big question that I was trying to answer when I collected data and did my data analysis and those [inaudible 01:06:53]. It’s really about finding, what are the questions you’re trying to answer? That’s really where you need to start.
Kirill Eremenko: 01:07:00
Got you, thank you. Next one is from Kareem, “How is AI shaping the future of data analysis and at which stage is it right now?”
Kirk Borne: 01:07:09
Well, at my company, Booz Allen Hamilton, we have this concept called Analyst 2.0. I guess everyone attaches 2.0 to a lot of things these days. But Analyst 2.0 is basically the AI-enhanced analyst. This goes back to what I said, that AI is really not about artificial intelligence, but augmented, assisted, amplified, accelerated intelligence. The analyst can be helped with the AI to help triage the data as I was saying earlier. Find the specific data sets that are relevant to my business questions and my business use case and help me to analyze that data better using these predictive and prescriptive models. Maybe building predictive analytics or even prescriptive analytics models to do correlation discovery and class discovery, and outlier discovery, and link discovery. The data science helps the analyst to go beyond just analyzing data to doing insights discovery. Like I said, even not just insights discovery from the predictive side, but from the prescriptive side. Analyst 2.0 is about using the AI and the machine learning to really augment the traditional role of an analyst.
Kirill Eremenko: 01:08:25
Got you. Makes total sense. AI is, as we discussed, really invading all areas, and I think it’s absolutely necessary for everybody to consider the implications on their careers from a data perspective. Question from Neil, also about AI, “What do you consider to be the most important ethical issues in artificial intelligence at this moment in time?”
Kirk Borne: 01:08:52
There are a lot, of course. A lot of questions people are asking and a lot of inquiries about what people are doing. I think in broad terms, I mean, it’s leaving the human out of the loop. That’s the biggest issue if there’s to be one. Not just building models and deploying them automatically, but having it looked at by humans, analyzed by humans and understood by humans. Underneath that are things like explainable AI. That is, can you explain the black box? How did it come to this decision? Also within that category of human-centered AI is the concept of trusted AI. What do we mean by trusted? I mean, does that mean that I trust it? No, it means that a broader audience, a more diverse audience, can trust what it’s doing, and how it’s doing it, and what data it’s using to get to that conclusion.
Kirk Borne: 01:09:50
Really a human-centric AI, I guess, would be the biggest challenge in the AI ethics space right now. As much as we want to do this, again, it adds a burden. I mean, it is a true fact that it adds a burden to organizations to have more ethical reviews and more people looking at the things instead of just… When I was doing my astronomy, I wanted to build some model that would explain some property of a galaxy where I could just… I would be working late at night on my model and I can tweak my model and then I would apply it to the galaxy data to see if it worked. Well, I didn’t need anybody’s approval because it didn’t have any effect on anybody’s life because it’s galaxy data. But what we’re doing now, it’s like you just can’t come up with a model and just deploy it because it’s cool. You have to go through a lot more steps. I think organizations are thinking more about this, which is why some organizations have not only chief ethics officers, but chief AI officers, where they do that governance stuff. That’s not inexpensive, but it’s absolutely necessary.
Kirill Eremenko: 01:10:58
Agreed. Nabeel asks… I don’t know. This is an interesting question I think, very relevant in your case. “What is the maximum period you have taken off from your profession and how have you spent that time?”
Kirk Borne: 01:11:16
Oh, I don’t think I’ve ever taken off. Ask my wife, I don’t think I’ve ever. Other than family vacations-
Kirill Eremenko: 01:11:21
That’s what I thought.
Kirk Borne: 01:11:23
Well, one could say, part of my years at NASA I was a contracted manager. This small company I worked for had a policy of giving us sabbaticals, to research scientists in their organizations, if you had a good research plan and you’d been with the company long enough. I put in a proposal to do a research sabbatical on some of the astronomical research I was doing in those [inaudible 01:11:54]. I applied and I won the sabbatical award. Much to my surprise, I discovered I was the first person ever to receive that sabbatical award. They basically covered my income for six months. During those six months, I actually went back to an office at the Hubble Space Telescope, because I used to work there. At that point, I had already been gone from there for five years or something. But they provided an office for me just to come back and just sit in my office and have no other responsibility but to analyze astronomy data.
Kirk Borne: 01:12:30
It was during that period that I started investigating more about machine learning and data mining, as we called it then, and these algorithms of insight and pattern discovery in data, which was actually completely different from the traditional analysis. I had been doing data analysis for many years, but this machine learning was so different from analysis. It really changed the trajectory of my career that I was able to take that time off from my normal day job. Even though I was still doing my profession, that is astronomy research, I was at my day job, which was managing a contract, which is of course a completely different world. But I was able to take six months off from the day job to do this lifelong learning sabbatical as I would like to call it. I spent six months learning a whole new set of skills, and algorithms, and techniques in science that I’d never learned before. That was an incredible moment for me.
Kirill Eremenko: 01:13:29
Amazing, amazing. Have you ever thought of taking some time off and just going and relaxing and not doing anything?
Kirk Borne: 01:13:38
Well, we try to do that every summer, my wife and family [inaudible 01:13:42]. This summer of course is different from all summers, so we haven’t done that this year. But we-
Kirill Eremenko: 01:13:48
[crosstalk 01:13:48].
Kirk Borne: 01:13:48
We go up to Lake George in upstate New York. I don’t know if anyone’s familiar with that, but it’s a beautiful lake country up there in Lake George, absolutely beautiful lake, and we spend some time up there. It’s actually a lot of work because we’ve got to get the house open and cleaned and ready. It’s not like it’s a lazy vacation, but it’s certainly different and fun. Really, once we get everything cleaned and opened up, then it is a lot of relaxation, hanging out in the sun. But the most exciting thing is it’s way far away from any bright city lights. When you go sit out on the dock at night, the stars you can see in the sky are just unbelievable. I mean, that’s how I got interested in astronomy as a child. But nowadays it’s so hard to find a really, really dark sky, especially when I live near a big city. But when we go out to that spot, it can be completely black and you can’t even see your hand in front of your face, like they say, when you walk out the door. It just reminds me of my youth and how I first got inspired to want to understand the universe as an astronomer.
Kirill Eremenko: 01:14:58
Amazing. Well, that sounds wonderful. I hope once the coronavirus situation settles down a bit, you’ll get a chance to go there and enjoy that once again [crosstalk 01:15:10]-
Kirk Borne: 01:15:10
Yes. Thank you.
Kirill Eremenko: 01:15:12
Awesome. Well, Kirk, thank you so much. We’re running out of time. This has been a fantastic podcast. To finish off, I wanted to say to everybody listening that Booz Allen Hamilton is hiring. We actually spoke with Kirk right before recording. You can go to careers.boozallen.com. There are over 200 data science jobs right there now, and over 1000 jobs that mention the word data. I’ve actually learned about Booz Allen and it’s a huge company that does a lot of work with thousands of employees around the US and possibly even around the world. I highly recommend that everybody check this out at careers.boozallen.com. Kirk, please, could you share with us, how can our listeners find you, connect with you, follow your work, attend, I don’t know, maybe yet another keynote session that you will be giving. What’s the best place to stay in touch?
Kirk Borne: 01:16:17
Well, I’m extremely active on Twitter. My Twitter handle is KirkDBorne, the middle initial D for Daniel. KirkDBorne on Twitter. I’m most active there. I’m also on LinkedIn, just as Kirk Borne. Usually when I’m giving talks places, I will post the announcements there. But in general, on Twitter, I’m just sharing a lot of information about data science, and AI, and machine learning, digital transformation, internet of things. A lot of stuff I put there every day. I just call that my micro-education platform.
Kirill Eremenko: 01:16:55
And micro is a huge understatement. You have over 260,000 followers on Twitter. Everybody listening, join the little club that Kirk has created and get all the insights. That’s fantastic. Kirk, one more question before we finish up. What’s a book that you can recommend to our listeners to help them become better in data science or just inspire them in their lives?
Kirk Borne: 01:17:23
Yeah. It may not be one that people have heard of or be surprised when I say it, but there’s this book called Data Mining Techniques. It is actually published now in its third edition, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. It’s written by Gordon Linoff and Michael Berry. The reason people might say that’s unusual, because why would an astrophysicist data scientist recommend a book on marketing sales and customer relationship management? Well, for me, an earlier edition of this book, now this one is much more expanded and larger than when I first found it, which was like the second edition years ago.
Kirk Borne: 01:18:03
When I discovered this book, I realized, in reading it, that it finally made clear to me all of these different algorithms that I was learning. When I first learned about neural networks, I mean, I could mouth the words and talk about hidden layers, and inputs and outputs, and activation functions. I could mouth the words, but I really just didn’t fully grasp what I was talking about until I read this book and the applications, not only of neural networks, but all kinds of different algorithms that we learn in data science and machine learning, beyond the math, but actually in real life business applications, real world applications. I found it very informative, surprisingly so, to understand these algorithms in a business application context. I highly recommend it, and especially make sure you get the latest edition, that’s Data Mining Techniques, the third edition, by Linoff and Berry.
Kirill Eremenko: 01:18:56
Thank you very much, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, third edition. Indeed, it is surprising to hear that recommendation. However, it makes total sense, putting these things into real life examples probably helps remember, understand [crosstalk 01:19:15]-
Kirk Borne: 01:19:15
I’m going to post that on my Twitter feed when we’re done, so people can go search my Twitter feed whenever you hear this. Hopefully you’ll find the link to this article or this book if you haven’t found it already.
Kirill Eremenko: 01:19:30
Fantastic. Well, Kirk, thank you so much. It’s been a pleasure to have you on the show.
Kirk Borne: 01:19:35
I appreciate it very much, Kirill. It’s been great. Thank you.
Kirill Eremenko: 01:19:43
There you have it. Hope you enjoyed this podcast. Hope you picked up lots of valuable insights. I love that we talked about several different frameworks, including the three broad categories of AI and data science applications, four types of data discovery, five dimensions of analytics implementations, and more. My favorite part of this podcast was graph analytics. It was a revelation to me when Kirk said that it is, in his opinion, one of the most powerful if not the most powerful type of analytics out there. It really made me think, and that’s why I asked the question, how can people get into that space [inaudible 01:20:29]? Based on how he describes it through the network sciences and other examples, sample use cases that he mentioned, indeed it is probably one of those more advanced types of analytics that is yet to be explored by many. The people that get there first, they’re going to be highly in demand, they’re going to be bringing lots of value. There’s also going to be a lot of fun discovering this new type or progressing this new type of analytics.
Kirill Eremenko: 01:20:59
I have an important announcement. I just spoke to Kirk right after the podcast and he agreed to come and join us for our DataScienceGO virtual event. If you haven’t registered yet, head on over to datasciencego.com/virtual and register. The event is on the 25th/26th of October. It’s absolutely free to attend and you guys will have lots and lots of fun being there with us. You’ll hear from Kirk and many other exciting speakers. There’ll be workshops, plus you will get to network with other data scientists from all around the world with our speed networking functionality. We’re expecting about 5,000 data scientists this time so make sure to jump in and join the fun. Again, that’s datasciencego.com/virtual.
Kirill Eremenko: 01:21:55
As always, if you enjoyed this podcast, you can get the show notes at www.superdatascience.com/397. That’s www.superdatascience.com/397. There you will find the transcript for this episode, any materials that we mentioned, show notes, and any links, including a URL for Kirk’s LinkedIn, make sure to connect, and a URL for Kirk’s Twitter. Make sure to follow Kirk and see what he gets up to in the near future. On that note, thank you so much for being here today. Kirk and I look forward to seeing you at DataScienceGO number two at the end of October this year. Until next time, happy analyzing.