SDS 391: Data Science Campfire Tales with John Elder

Podcast Guest: John Elder

August 12, 2020

John has been doing data science full-time for 25 years and brings his experience to this episode. We discussed complex math, turning real-world problems into data, finding data anomalies, campfire data tales from John, leaks from the future, how to measure complexity, Occam’s razor, and more.
About John Elder
John Elder founded the world’s most experienced data science company, Elder Research, 25 years ago. He has led teams to solve seemingly intractable challenges to, for example: detect fraud, find insider threats, discover a drug, time stock markets, and analyze vast networks. John has co-authored three books – on data mining, text mining, and ensembles – two of which won “book of the year” in Mathematics or Computer Science. He has invented analytic breakthroughs, chairs international conferences, and is a popular keynote speaker, illuminating complex topics with clarity and humor. Dr. Elder is an (occasional) Adjunct Professor at UVA and was named by President Bush to serve five years on a panel to guide technology for national security.
Overview
John Elder is one of the most seasoned veterans in data science, having worked in the industry since before starting his company 25 years ago. We kicked off the conversation with John’s first bungee jump and how he learned it’s statistically just as risky as walking into a hospital. We moved into the beauty of data and how, as John puts it, analytics can be a crystal ball to help people understand their world and even the trajectory of the future.
Elder Research is about 100 people working across multiple offices and industries to provide data science consulting and education for commercial, federal, and classified clients. Since 1995 they’ve been translating real-world problems into the domain of data to find solutions to help others. Domain knowledge is important for this work; as John points out many problems are strikingly similar from the data perspective but in the real world, in their domains, they can be wildly different. Sometimes their work results in what John calls “rolling a hand grenade into the board room” – they find a dramatic error in the business process via data analysis that causes immediate consternation, but ultimately helps. A frequent pain point is getting stakeholders to pay attention to things they initially think are unimportant in the data. John also stresses the importance of sharing stories and talking frequently as a way for data scientists to stay ahead of the curve and bring value to businesses.
One interesting topic we discussed is how and why ensembles work so well. John has joked before that much research is due to arguments, and he was virtually alone in believing that ensembles don’t negate Occam’s Razor – the principle that simplicity is important to out-of-sample accuracy. He was able to show that ensembles look complex (being combinations of many models) but actually act even simpler than a single model. That is, they are less flexible to arbitrary changes in their training data. So, with the right definition of complexity (see Ye’s Generalized Degrees of Freedom, GDF), Occam’s Razor holds. John describes ensembles as being like a board of directors, offering balance and cross-industry domain knowledge. The board’s consensus decisions are less extreme than those of any one member, so are less variable or complex. It’s a fascinating way to think of the paradoxical simplicity of ensembles.
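To make the “acts simpler than it looks” point concrete, here is a minimal Python sketch in the spirit of Ye’s GDF: perturb the training targets slightly and measure how strongly the fitted values chase the perturbation. The data, models, and probe settings below are invented for illustration and are not taken from John’s work.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

def flexibility(make_model, X, y, n_probes=20, eps=0.05):
    """Crude GDF-style probe: how much do fitted values track small random
    perturbations of the training targets? (near 0 = rigid, near 1 = memorizes)"""
    base = make_model().fit(X, y).predict(X)
    scores = []
    for _ in range(n_probes):
        noise = rng.normal(scale=eps, size=len(y))
        refit = make_model().fit(X, y + noise).predict(X)
        scores.append(np.sum((refit - base) * noise) / np.sum(noise ** 2))
    return np.mean(scores)

def single_tree():
    return DecisionTreeRegressor(random_state=0)

def bagged_trees():
    return BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)

print("single deep tree:", round(flexibility(single_tree, X, y), 2))
print("bagged ensemble :", round(flexibility(bagged_trees, X, y), 2))
# The ensemble, though built from many trees, typically chases the perturbed
# targets less than one deep tree does -- "simpler" in the GDF sense.
```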
John developed and popularized what is called the target shuffling method. He notes there is a huge problem in all branches of science because most published results are not reproducible. Widely-cited observers believe that 50-90% of what’s in journals, even those with high standards, is non-reproducible and therefore false. The p-value makes sense when researchers perform a single experiment, but most are improperly reusing the same formula for multiple trials as they vary conditions to discover something that “works”. The result is that spurious correlations are accepted as real (and papers are accepted for publication). Almost no one is intentionally defrauding the public but it has the same end effect! How can the scientific community allow necessary ingenuity in experiments but also properly account for chance? Target shuffling is John’s answer; it calibrates any measure of interestingness back to a true probability to weed out false positives.
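For readers who want to see the mechanics, here is a minimal sketch of the target shuffling idea in Python. The data are pure noise, and the “interestingness” measure (the best single-predictor correlation found after searching every column) is just one illustrative choice; real implementations work with whatever model and metric the analysis actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 50 candidate predictors, and NO real signal.
X = rng.normal(size=(200, 50))
y = rng.normal(size=200)

def best_abs_correlation(X, y):
    """Strength of the best single predictor found by searching all columns."""
    return max(abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1]))

observed = best_abs_correlation(X, y)

# Target shuffling: repeat the identical search on permuted targets to see
# how "interesting" pure chance looks under the same search effort.
shuffled = np.array([best_abs_correlation(X, rng.permutation(y)) for _ in range(1000)])

# Calibrated p-value: how often chance alone does as well as the real result.
p_value = (shuffled >= observed).mean()
print(f"best |r| on real targets: {observed:.3f}   shuffled p-value: {p_value:.3f}")
```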
As for the future, John looks back (just like a data scientist!) and muses on the series of surprises met along the decades. He notes that by doing excellent work and paying attention to colleagues and customers, Elder Research managed to survive and thrive through the changing landscape. Because of the high level of unknowns, he personally tends to not plan for the future even a few years down the road. But still prepare: learn the technology, gather tools, and surround yourself with people who can help. Pay attention to people, and their needs, and you’ll thrive through changes.
In this episode you will learn:
  • John’s first bungee jump [4:01]
  • Calculus vs. resampling [14:01]
  • Elder Research [21:11]
  • Domain knowledge advice [25:26]
  • The importance of instincts [41:52]
  • Ensembles and simplicity [59:33]
  • John’s opinions on neural nets [1:10:49]
  • Target shuffling method and the crisis in science [1:17:27]
  • What does the future of data science hold? [1:39:53] 

Podcast Transcript

Kirill Eremenko: 00:00:00

This is episode number 391 with Founder of Elder Research, John Elder.
Kirill Eremenko: 00:00:12
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. Each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today. And now, let’s make the complex simple.
Kirill Eremenko: 00:00:44
Welcome back to the SuperDataScience podcast everybody, super pumped to have you back here on the show. We’ve got an outstanding episode coming up for you. You’ll probably notice that this one is longer than usual, that’s because I just couldn’t stop. John and I had so much fun on this episode and I learned so much. So, John is one of the most experienced, you might as well say veterans of the field of data science. He’s actually been doing what we call now data science for the past 25 years through his company Elder Research, which he founded in 1995. He’s got tons of experience through his consulting company that has worked in lots of industries, from healthcare, to stock trading, to the IT industry, to lots of other commercial industries, to government and more. Lots and lots of fun in this podcast, you will learn a ton.
Kirill Eremenko: 00:01:42
Regardless of your level, you’re going to pick up a lot of cool things here. So the things we talked about, calculus, statistics and re-sampling. Taking problems from the real world into the data domain and back. The importance of domain knowledge and how to develop it. Speaking with your clients, noticing anomalies in the data, campfire tales in data science. So John shared some really cool stories of him being in data science, and the projects that he’s done, and that will help you in your work to incorporate some of the best principles. But most importantly, avoid some of the most dangerous mistakes.
Kirill Eremenko: 00:02:14
Then we talked about data leaks from the future, ensemble methods, this was a big focus of the podcast, so if you want to learn about ensembles and why they’re important, why they’re powerful, and why they’re actually simpler, and how do you measure simplicity in data science, this is going to be very valuable for you. We talked about Occam’s razor, complexity of machine learning models, generalized degrees of freedom, neural networks, the crisis of data science, and why P values don’t work. And, what is the method to use instead of P values, or in addition to P values, to make sure that your insights, or your research, especially your research, is valid.
Kirill Eremenko: 00:02:55
So this question has been brought up several times on the podcast before, P values are destroying the world, unfortunately, the world of research. And we have a lot of research that is actually not reproducible. John answers this question, this is the best way I’ve heard it answered ever. And you will learn about the target shuffling method. Highly recommend. Even if you just get ensembles and target shuffling out of this, you’re going to take your data science acumen to a whole new level. I love this podcast, can’t wait for you to check it out, so let’s dive straight into it. And I bring to you founder of Elder Research, John Elder.
Kirill Eremenko: 00:03:41
Welcome back to the SuperDataScience podcast everybody, super pumped to have you back here on the show, and today’s special guest is John Elder calling in from Charlottesville, Virginia. John, welcome to the show, how are you today?
John Elder: 00:03:53
Good. Thank you, Kirill, for having me.
Kirill Eremenko: 00:03:56
So excited to have you. There’s so many things I want to talk about, but I want to start with, rather than an icebreaker, a water breaker of sorts. John, tell us about your first bungee jump.
John Elder: 00:04:07
So, I was visiting my cousin in British Columbia with my son, and her son and my son were doing teenage boy stuff, video games. And here we are in one of the most beautiful places in God’s earth and they’re inside. And I said, “We got to get out and we got to do something. So let’s do something we’ve never done before.” And we find a place where you can do bungee jumping over a shallow river. And the boys, they’re pretending to be interested in this, and you get weighed beforehand on the bottom. And apparently, for some people, that’s the scariest part. They actually write the weight in ink on your hand, in black magic marker, so there’s no confusion at the top, or if anybody’s modest about that.
John Elder: 00:04:56
And when I get up there and show them the number they’re like, “Oh, bring out Big Bertha.” So they get this humongous cable that they don’t normally have to use, they have a towel wrapped around your leg and they wrap this big cable around and they said, “Do you want to be dipped in the water or not?” So 400 feet below, there’s this little trickle of water. And I said, “Sure”. And they must have miscalculated because, in a moment of insanity, I throw myself off and I go head first in and hit the water, go about 10 feet deep in the water and then get pulled back, feet first up. And if you ever needed to clear your nasal passages, this is a very efficient way to do it.
John Elder: 00:05:40
So I get pulled back up through the water and I’m not doing any of the acrobatic things I thought I might do up there. And they’re quickly wheeling me down to be carted off and folks on the sideline are like, “Wow, did you pay extra for that?” And I said, “No, no, I didn’t.” So I bungee jumped once in my life. Now, my son did it, he found it to be fascinating, he ended up doing it again in New Zealand later, where they invented that. The Kiwis invented adventure sports, basically. But anyway, I learned from that, never bungee jump over a parking lot. If you’re an outlier, they haven’t necessarily calibrated the system for you, so I’d beware.
Kirill Eremenko: 00:06:30
That’s insane, that’s a crazy story. So I was watching one of your talks from 2017 at the University of Virginia School of Data Science, and you told that story and the strangest part to me is that it was related to some research, or some statistics, showing that walking into a hospital is as risky as doing a bungee jump. That’s crazy.
John Elder: 00:06:55
Yeah, and I couldn’t help… I do this free association and ended up telling that story, I didn’t plan to tell that story. But I was trying to tell people that’s pretty risky, but just literally, a researcher on risk had shown that literally walking into a hospital, as a healthy person, is as risky as that crazy thing that I did, voluntarily, one time. But I’ve walked into a hospital many more times than that, not aware that it was equivalent to wrapping a big rubber band around a towel around my ankles and flinging myself off a 400 foot cliff, because of what you can pick up in a hospital.
John Elder: 00:07:33
And this was pre COVID. What was going on was that several of us were invited to the post-Challenger, you can tell how old this was, but the post-Challenger analysis of what had gone wrong and how can we assess risk better, and so forth. So it’s amazing how many risky things we do without realizing it. And how people assess risk, and what the real risks are, are often completely different.
Kirill Eremenko: 00:08:06
Yeah, I’m reading a book now called Predictably Irrational and it echoes another thing that you said in one of your talks that-
John Elder: 00:08:13
Was that by David Hand?
Kirill Eremenko: 00:08:16
I don’t remember the author. I can check, but in one of your talks, you said that 95% of our decisions actually are not from our logical brain, they kind of happen spontaneously, they’re irrational. And only 5% is what we actually think and logic and all those things. So no wonder, right. So no wonder we don’t assess risk logically.
John Elder: 00:08:40
Yeah. And that’s not my research, that comes from cognitive bias psychology research and so forth. It’s this, we basically have a model of, there’s a cognitive model that there’s two brain systems, one that’s intuitive and quick reacting and works astonishingly well, and it works with almost no energy expenditure. So when we’re lazy or happy, it’s the one we use, and it’s the one that we want to use. And then the 5% is the cognitive part that analysts are trained to use. And it takes effort and energy and it exhausts us to use it. And you can actually tell when it’s being used, because your eyes dilate. So if you’re asked to multiply three times seven, you can do it without thinking almost. But if you had to multiply two or three digit numbers, you would pause, you would focus.
John Elder: 00:09:32
You would actually, your peripheral vision would go. Your hearing would pause. All sorts of things could go on around you, and you would miss it. I even knew somebody who would walk down a hallway and stop and put their head against the wall. And they were thinking about something, and they couldn’t even walk and think at the same time. And then they would finish thinking and they’d keep moving. So it just takes more effort, and that’s what we have to use to do this analytic stuff, and it’s exhausting. So you can understand why management types never want to use that.
John Elder: 00:10:08
So what I tell people, what I tell people is we analytics, we’re that 5% of a company’s brain, and we’re doing all the hard work and we’re keeping them from just using their gut or using intuition, which does work an astonishingly large part of the time, but it’s just fooled. It’s fooled and it doesn’t even know it’s fooled. And so I’m a big fan of cognitive biases because they tell us how easily we are fooled and made to do things that make no sense at all if you take the time to look into it.
Kirill Eremenko: 00:10:46
Yeah, absolutely. And speaking of hard work, I was thinking about this earlier, that if there was a trophy for the most years in data science, you would have first place.
John Elder: 00:10:57
25 years.
Kirill Eremenko: 00:10:59
25 years.
John Elder: 00:11:01
25 years living off of data science. So, I’ve been doing it longer than that, but saying, hey, I want to do this full time. This is what I want to do, this is what I was built for. And oh, by the way, honey, I have a wife, and had three kids at the time and a mortgage, we’re going to make a living off this. Okay? You know, so that was a bold move in ’95. But I had the great good fortune of working for a small company starting in 1980, that actually did inductive modeling. I was an undergrad doing summer work for a company called Adaptronics in Northern Virginia, that was building inductive models using polynomial neural networks, they were called. They were out of the group method of data handling, which is a Ukrainian method by a fellow named Ivakhnenko.
John Elder: 00:12:03
But anyway, it was a cool method that was a cross between regression and neural nets. And it adaptively built a structure of its model according to how complex the problem was. And it was just a fascinating idea that you could be like a crystal ball to see the future, and you could build a model that adapted to how complex the data was, and how much of it you had, and be able to predict the future. And I studied electrical engineering, that was the most interesting, closest technical thing, and never took a statistics class, unfortunately, a real statistics class, until I had actually been out in the industry and even written a data mining algorithm. So sort of self-taught in statistics. And then when I came, when I had worked for five years in the aerospace consulting industry, wrote a data science algorithm and all, went back to school, got my PhD in systems engineering.
John Elder: 00:12:59
But systems engineering is so flexible that I was able to turn it into an interdisciplinary data science degree before there was data science. Did a postdoc for a couple of years, and then started my company. So, it was real early. It was called data mining. The phrase data mining was coming out then, and I presented at some of the early data mining conferences. And there was really a lot of activity going on then. It was a very, very fun time. Didn’t really get caught up in the dot com stuff, but watched a lot of other people have the boom and bust, and some of them did well. Sorry, go ahead.
Kirill Eremenko: 00:13:40
Maybe that saved you.
John Elder: 00:13:42
It might have.
Kirill Eremenko: 00:13:43
Speaking of statistics, I liked one time you said that, “Statistics is like calculus plus Buddhism, a combination of the two.”
John Elder: 00:13:53
I love my own jokes.
Kirill Eremenko: 00:13:55
And then you elaborated that now… Calculus was designed because we didn’t have fast enough computers, and it was a shortcut to get to the results. But now we have super fast computers and why are we still teaching calculus when we can be teaching things like re-sampling, and brute force methods, which in your opinion, is that going to lead to a better generation of data scientists and research?
John Elder: 00:14:24
Yeah. I mean, obviously calculus has its purposes outside of statistics, but in the statistics world, most of the calculus type things are approximations and shortcuts to what they’re really trying to do. And a much more accurate in many cases, approximation and shortcut can be done through simulation using computers. Using random Monte Carlo and simulation and so forth, and re-sampling is just that. It’s saying, hey, I’ve got a seven sided die. Maybe it’s a seven sided die, or a 12 sided die. And it doesn’t even have evenly spaced, it’s been hand-carved. So it doesn’t even have evenly spaced sides. And so what’s the best way to figure out the probability that each side’s going to show up? Just roll it a thousand times or 10,000 times, and see if you can just measure the probability. That’s really a very simple, and then in the end, a very accurate way, to do things.
John Elder: 00:15:25
So why not do everything that way, roughly speaking, and not try to have a formula to figure it out. And if you teach students these things, then it gets them thinking about the problem and not getting caught up in the math. And it makes so many other problems solvable, and you can get a close, good approximation quickly to almost everything. Instead of saying, wait a minute, I’ve got to spend hours figuring out the theory, or what formula works here, or maybe no formula works here. And you have this sick feeling that you find the best formula that’s closest, but you don’t know how to measure closeness. And anyway, it’s just a wonderful freeing thing. And it’s a tool that works on every problem to some extent.
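John’s hand-carved die translates almost directly into code. A minimal Python sketch (the lopsided face probabilities are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# A hand-carved, uneven seven-sided die (made-up probabilities).
true_probs = np.array([0.05, 0.10, 0.10, 0.15, 0.20, 0.15, 0.25])

# No formula needed: just roll it 10,000 times and count.
rolls = rng.choice(7, size=10_000, p=true_probs)
estimated = np.bincount(rolls, minlength=7) / len(rolls)

for face, (est, truth) in enumerate(zip(estimated, true_probs), start=1):
    print(f"face {face}: estimated {est:.3f}  (true {truth:.2f})")
```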
John Elder: 00:16:18
And it’s just coming from a computer science engineering background, it’s like, wow, now statistics makes sense. And so anybody, when they’ve done experiments, teaching people the old way and the even older way, the counting versus the equation, people who do the counting, which is now possible when you have this blindingly fast, obedient servant of a computer, and you don’t have to do it yourself. Everyone who uses the re-sampling, not everyone, every class that uses the re-sampling method to learn solves problems better and more accurately.
Kirill Eremenko: 00:16:57
Very, very cool. How long do you think it will take for us to shift from that old mentality to the new approach?
John Elder: 00:17:06
I think it’ll take a long time, because there seems to be a huge bias that if I went through the pain, you have to go through the pain. I mean, if someone used metrics, if academia had a performance based metric that said, how hard is it to achieve a goal of teaching this concept and attaining this capability, it would be over quickly because you can achieve the capability and teach the concept much more quickly with re-sampling. But that’s not, apparently, the way academia thinks about much of anything. I think the key, in my joke about statistics being a combination of calculus and Buddhism, is there seems to be with statistics, a philosophy that people have to break through. The idea that you don’t truly know where something is, but you have a distribution that describes the possible places that something is. Like the Heisenberg uncertainty principle, like the quantum view of the world rather than the deterministic view of the world. And I think there’s, again, it’s almost this philosophical or religious sort of breakthrough about the nature of matter. I have to admit, I was a kind of a determinist. As an electrical engineer, physicist type person, F = ma. If I have a particle and it’s traveling through space, then if I knew everything about it, I would know exactly where it is at a given time.
John Elder: 00:18:55
And statistics says, no, there’s more of an uncertainty principle around it, there’s more of a distribution around it, is kind of the philosophy you have to have. And because even though you sample and you measure something at a given time, the question is what’s going to happen in the future. And the measurements you have now are just hints about the future. So I always tell my colleagues, anytime, don’t be satisfied with a point estimate, be a statistician and get a distribution, and that’s just the biggest breakthrough from statistics.
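One everyday way to follow that advice is the bootstrap: resample your own data to get a whole distribution for an estimate rather than a single number. A minimal sketch, with a simulated sample standing in for real measurements:

```python
import numpy as np

rng = np.random.default_rng(7)
sample = rng.exponential(scale=2.0, size=80)   # stand-in for observed data

point_estimate = sample.mean()

# Resample with replacement many times to see how the mean could vary.
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"point estimate: {point_estimate:.2f}")
print(f"95% bootstrap interval: [{lo:.2f}, {hi:.2f}]")
```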
Kirill Eremenko: 00:19:30
Okay. Gotcha.
Kirill Eremenko: 00:19:34
Are you subscribed to the Data Science Insider? Personally, I love the Data Science Insider. It is something that we created, so I’m biased, but I do get a lot of value out of it. Data Science Insider, if you don’t know, is a free, absolutely free newsletter, which we send out into your inbox every Friday. Very easy to subscribe to. Go to SuperDataScience.com/DSI. And what do we put together there? Well, our team goes through the most important updates over the past week, or maybe several weeks, and finds the news related to data science and artificial intelligence. You can get swamped with all the news, even if you filter it down to just AI and data science, and that’s why our team does this work for you. Our team goes through all this news and finds the top five, simply five articles that you will find interesting for your personal and professional growth.
Kirill Eremenko: 00:20:27
They are then summarized, put into one email and at a click of a button, you can access them, look through the summaries. You don’t even have to go and read the whole article. You can just read the summary and be up to speed with what’s going on in the world, and if you’re interested in what exactly is happening in detail, then you can click the link and read the original article itself. I do that almost every week myself, I go through the articles and sometimes I find something interesting, I dig into it. So if you’d like to get the updates of the week in your inbox, subscribe to the Data Science Insider absolutely free at SuperDataScience.com/DSI. That’s SuperDataScience.com/DSI, and now let’s get back to this amazing episode.
Kirill Eremenko: 00:21:09
We completely skipped the intro. Tell us a bit about Elder Research. What do you guys do over there?
John Elder: 00:21:16
Yeah, well, we started 25 years ago and we’re about a hundred people now in five offices. Charlottesville, Virginia, Raleigh, North Carolina, Arlington, Virginia, near DC, Linthicum, Maryland, near Baltimore. And London is our most recent office. We do data science consulting, education, with three main areas, commercial, federal government, and cleared top secret work. We got into that after the 9/11 attack, and a lot of people were really motivated to try to do whatever they could to pitch in and help that sort of thing not happen again. And the commercial work has been a lot of fun because we work with a lot of companies, big and small, and work in a lot of great areas, and learn something new every time. What astonishes a lot of people who look at us is the variety of things we work with, and part of that is my fault.
John Elder: 00:22:19
Because I’m a little ADD, and I’m like, wait, a squirrel. No. I like doing all sorts of things and just love the variety. Love the fact that our very first big, huge project was helping hedge funds on Wall Street, and we helped discover a drug with Pharmacia & Upjohn, now Pfizer, that they were basically thinking was a dud and we helped show them it was a really fantastic compound, and it became one of the three drugs that they discovered in a decade due to our work. And we didn’t know, I don’t know anything about pharmacology. Chemistry was my worst subject in school. But I know a lot about data. And that’s the thing, is you can… And we’ve worked in anti-fraud, and uncovered billions of dollars worth of fraud for our customers, and so forth.
John Elder: 00:23:12
So we’ve done recommendation engines, and diagnosed diseases, and found new ways to recognize Parkinson’s for instance, and just all sorts of fantastic things, and learned something at every stage. So we’re always learning new things. And then when you learn something in one project, it can often help you with breakthroughs in another project, because you’re learning about data. New patterns of data, new types of features, you’re learning very cool things. So in electrical engineering, I was a signal processing guy, and in the signal processing world, there’s one main trick. You take data that’s in the time domain, and you translate it into the frequency domain where sometimes things are really, really clear. And you can filter it, and take out a low pass filter, or you can take out high-frequency stuff, or whatever. You can do all sorts of cool things and then translate it back into the time domain.
John Elder: 00:24:08
And so some things that might be really clear in one domain and trivial to solve there, well, that’s what we do with data science. We transfer the real world problem into the data domain, and sometimes things are really clear there, and go back into the real world. But the really cool thing is you can take your pharmaceutical problem or your aerospace problem or your stock market prediction problem, or your fraud problem, and they all are not that different in the data domain. And so tricks you’ve learned from one adventure in the data domain can often be used in the other, and we just have to learn the vocabulary of the client. The kind of problem they’re looking at, some of their special cases. And then we can use all of our tricks and wisdom to their benefit in a very, very efficient manner.
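The signal-processing “main trick” John describes fits in a few lines. A minimal Python sketch (the 5 Hz signal, the noise level, and the 10 Hz cutoff are all made up for illustration): go into the frequency domain, drop the high frequencies, and come back.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 1000, endpoint=False)          # one second, 1 kHz sampling
clean = np.sin(2 * np.pi * 5 * t)                     # the pattern we care about
signal = clean + 0.5 * rng.normal(size=t.size)        # buried in noise

spectrum = np.fft.rfft(signal)                        # time domain -> frequency domain
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
spectrum[freqs > 10] = 0                              # crude low-pass filter at 10 Hz
filtered = np.fft.irfft(spectrum, n=t.size)           # back to the time domain

print("residual noise before:", round(np.std(signal - clean), 3))
print("residual noise after :", round(np.std(filtered - clean), 3))
```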
Kirill Eremenko: 00:25:02
That’s a fantastic way of looking at going into the data domain and then coming back. It’s like going into the matrix from wherever you are-
John Elder: 00:25:10
Exactly.
Kirill Eremenko: 00:25:11
Kind of the same. That’s amazing. So, a couple of questions, but let’s start with this one. What’s the data domain? Okay, okay. Well, domain knowledge. That’s the thing. So going from the real world to the data domain, and going from the data domain back to the real world requires understanding of domain knowledge. And that’s what you said, like sitting with the clients. Do you have any advice, tips, tricks you’ve learned over the 25 years, how to master this domain knowledge quickly? Because that’s the main roadblock for most people.
John Elder: 00:25:43
Yeah. Yeah, no, that you have to have access. You have to have on demand access with the subject matter experts, the SMEs, and this sometimes scares the client. Because they imagined you’re going to need a lot of their time, and that’s going to be a big hidden cost to them. So we say, no, look, we’re talking two hours a week. We’re talking, this is not going to be much time, but when I need them, I need them. It was like, I’ve got to be able to get them within a couple of hours of the call or something, because it’s going to hold us up. We’ve got to meet with them, we’ve got to know about what their problem is, what their aims are. We’ve got to talk about what’s your pain, with the business owner.
John Elder: 00:26:22
Who’s got the problem, what are they looking at? There’s a key thing, it’s interesting. Systems engineering is an interesting domain because there’s a principle in systems engineering that the client doesn’t always know what they want. If they knew what they want, they could kind of do it themselves. And sometimes, that’s interpreted as the client is an idiot. That is not right. The client doesn’t know how to exactly define what they want. They know they have pain, something that hurts. They get this pain in their side, but they don’t know how to define it always in a technical way, in a way that you can make it a problem in the back of an engineering textbook.
Kirill Eremenko: 00:27:04
They know the symptoms.
John Elder: 00:27:05
They know the symptoms, and you’ve got to be able to help figure out are they trying to maximize throughput? Are they trying to get rid of things that cause downtime? What are the aims? What are the objectives? What’s the criteria of merit for this thing that I’m going to turn into an optimization problem? And then, or maybe there’s multiple criteria of measurement. There’s multiple problems. And then you’ve got to figure out which ones are causing the most pain. And so there’s a lot of communication. So there’s that part, sort of the consulting problem of listening to them. And I’m not the world’s best listener.
John Elder: 00:27:43
So one thing I’ve learned is, oh, I want them to listen to me, so I’ve got to take along somebody who is a good listener. Because I’m thinking, what am I going to say next? What am I going to say next? So I got to take along somebody who takes good notes and listens to them. And people use the vocabulary words in different ways. We had a client, we were doing oil and gas well predictions.
John Elder: 00:28:03
We were trying to predict which gas wells were going to freeze. Now, what does that word mean to you? Freeze. This is in the upper Northwest U.S. In the cold times. You think that means temperature. And above ground, these pipes would freeze, literally freeze, but below ground, they don’t freeze, but they use that word freeze for any clogging of any kind. And there’s a clogging that occurs when these particulates get together and they have to use methane to dissolve it, or they use a steel plunger to literally, a huge steel plunger to plunge and break up these… This is where fracking has occurred and so forth. But it was a year into the project before we realized that the vice president of this major international oil and gas company also thought of freeze as including gas production stoppage of any kind, including scheduled maintenance.
John Elder: 00:29:04
So to him, freeze meant, “We are not getting money. Money is not coming out of the ground.” And that makes perfect sense after the fact, but of course, why would that occur to us? No one told us that. We asked probably 50 times, so this thing called active listening, if you’ve ever done marital counseling or anything like that, it’s like, “What I hear you saying is…” You say back to the person what they said, and you try to… Well, marital counseling is really good practice for consulting. “What I hear you say is freezing is this,” and they’re like, “No, no, it’s…” But nobody ever said that, but that’s a whole different problem. And by the way, they didn’t keep records of their scheduled maintenance.
John Elder: 00:29:48
So we had to predict when their scheduled maintenance would be and have that be part of our problems. So anyway, it’s just so you might’ve gleaned from this that there’s a lot of communication that has to occur. You have to continually be asking somebody, “So, what does this common word that everyone’s using mean? And here’s what I think it means.” And they’ll say, “Oh, but,” and new stuff will come up all the time. There’s nobody that ever gives you a full list of requirements. It’s an evolving, rolling thing. And you can’t let that shock you. You have to keep going at it and keep poking at it. And [inaudible 00:30:27] thing is you’ll find stuff that’s weird, and you’ve got to show it back to them and say, “What does this mean to you?”
John Elder: 00:30:33
And what I tell my folks. And by the way, not every analytic person is outgoing. I’m like an extreme extrovert. I’m an outlier. And I find it hard to talk to people because we all want to have this enormously finished and polished and beautiful thing that we present and we get our blue ribbon, right? Well, we will have polished the wrong rock if we do that. We have got to communicate with them while it’s still very crude and completely ugly and we have nothing to show for our work because we’ll be solving the wrong problem if we don’t do that. So I force our people to talk to the client every week for an hour, at least, and get stuff back much more often than anyone’s comfortable doing, especially some of the introverted, technical people. It’s just very painful, but it’s essential.
Kirill Eremenko: 00:31:30
Wow, outstanding. I love it. It really reminds me, I’m reading this book called The Lean Startup right now. And it’s about how to build products, no, nothing to do with data science, but it’s about how to build products and businesses in a lean way, meaning with minimum amount of wasteful work. So the work you do is useful. And what you’re saying really just resembles the book very, very closely, because there Eric Ries talks about speaking to customers all the time. And this is just a practice we started in our company a few weeks ago now and already we’re getting so much information. Even just speaking to them when we don’t really think we need to speak to them.
John Elder: 00:32:11
Exactly.
Kirill Eremenko: 00:32:12
Still scheduling those calls.
John Elder: 00:32:13
You have to. You don’t think you have anything to say. And you’re like, “Well, we just looked at the data. We don’t have anything to show for it, and of the 600 items,” and they’re like, “600 items. What are you talking about?” And they’re like, “Aren’t there 3000? I thought there were…” And you’re like, “Wait, what do you mean?” And so you already learned something, even when you don’t think you’re learning something. For instance, we did some work for a large software company and they had a whole bunch of data and we were supposed to look for sequences that the users used. Part of the problem was just data engineering. Just, they had recorded a whole bunch of data. People said, “Yes. I agree to record all of my keystrokes.” And they had never looked at that data. One definition of big data, my definition of big data, is data no one’s ever looked at.
John Elder: 00:33:07
And so they’d been recording this for a long time and it was very unstructured. So these sequences could be extremely brief or they could be days long of all these keystrokes and no one had ever really looked at it. So, we have some good software engineers who were able to build a tool to ingest all this data and look at these sessions. And one of the questions is what kind of keystroke sequences lead to crashes? What kind of customers do we have, users? Can we cluster them into different segments? Are there some keystrokes that maybe we should build shortcuts for? Are there training opportunities where we could see people using it in inefficient ways? All sorts of cool things you could learn from this unstructured data. In some cases, it was supervised learning because there were some outcomes, like crashes versus non-crashes, but in others, it was unsupervised data. So it was open ended, but we found something that trumped all of that. It was a data artifact. Basically, there were a lot of users that weren’t registered in their system and it turned out that there were a lot of users that hadn’t paid for the software. Now, these are users that voluntarily said, “Track my every keystroke.”
John Elder: 00:34:24
That’s not all the users that were using the software, but these are the ones that said, “Yeah, sure. Track every keystroke.” This was a great example of what I call rolling a hand grenade into the boardroom. Sometimes we have a finding and we don’t know what impact that has on the company. Sometimes people get fired over a finding we have. We report something and here was a finding, “Okay, there are users, a good chunk of your users that haven’t paid for your software. This is not supposed to be possible.”
Kirill Eremenko: 00:34:56
Do you remember what percentage?
John Elder: 00:34:58
I’m not going to say. And so they halted our project. This kills the project right away. Who cares what the clustering of the users is at this point. Something much more important has come up. So in the short term, it’s bad for us. But of course in the long term, we just found something that nobody thought of that’s way more important than the project we were asked to find. It’s a data artifact. It’s early in the process, but it comes from looking intelligently at the data, asking questions, is there something weird here? And then getting back to the client and, “This is important. Look at this,” and them going, “Oh my gosh, you’re right.” And ultimately they fixed that problem and then came back to us and said, “Proceed.” So that’s the kind of value you can bring. You can bring value that no one even dreamed of, because you’re the first to look at it and you’re looking at it intelligently with your client’s interest at heart.
Kirill Eremenko: 00:35:59
You can do that even if you’re not a consultant, even if you’re working in the company, if you have access to the data.
John Elder: 00:36:04
Absolutely. If you know what you’re doing.
Kirill Eremenko: 00:36:09
Do you need to consciously look for it? What’s the advice here? How does one make sure to incorporate that type of analytics [crosstalk 00:36:23]?
John Elder: 00:36:20
That is a huge question. Isaac Asimov, who was a great science fiction writer, actually wrote about real science too. And he had a phrase. He said that the most interesting phrase in science is not, “Eureka, I’ve found it.” It’s, “That’s odd,” and it even gives me chills when I say it. Now, think about it. To think that something is odd means you have to have expectations. So when you’re looking at data and you say, “Huh, that’s off,” that means that you had to know something about what should be there. You can’t be just looking at numbers. You have to know something about what they mean and what the business problem is, and so that means you have to know something about the data and the problem and what you’re expecting, and that domain knowledge has to be translated into what you think you’re looking for. You have to have some kind of a hypothesis. And then you see something and you’re like, “That’s weird.” And then you follow that.
John Elder: 00:37:27
So it’s been a few years now, but there was a show called Monk, which was a Sherlock Holmes as a modern day person who was obsessed with details but couldn’t really function in the real world. Actually, he would do quite well during the pandemic because he was a germaphobe and he wouldn’t shake hands with people without a wipe or something. Anyways, it’s a good show, but you have to be like that. You have to be a real weirdo and be obsessed with some details. And we had a problem once with, well, we had data, I was working with a guy I really liked, and he was a former air force guy who started his own company and was early, early recommendation engine thing and trying to figure out what characteristics of different customers made them good prospects for some of their woodworking tools that they were selling through catalogs.
John Elder: 00:38:25
And I was finding some weird relationships in who was responding and I pointed out, “Well, there’s something weird in the data.” And he had given me a very specific task that he wanted me to do. This was early on in the company and he was a [inaudible 00:38:44] guy. And eventually he says, “God dammit, stop talking to me about that and get on with the task.” And so I use it sometimes as an example, what do you do? You’ve got a client or a boss that’s told you to do this, but you’ve noticed something weird in the data that you believe it’s in his best interests that you really need to look at that, and leave that as an open ended question for somebody and see how they struggle with it.
John Elder: 00:39:05
And of course, as a boss paying someone’s salary, they can’t say, “Ignore the boss and do this.” That is not the right answer, but it’s also not the right answer to ignore, and there’s kind of the answer that I’m looking for, which is what I did, which is you had to do both. Do the thing for your boss, or you haven’t earned the pay, but if you can’t convince him, obviously, but then in your own time, you got to follow up on your own thing or you’re just not in the right business. And I did. And I found the problem. The merge/purge house that puts all the different lists together and then does the final mailing out, they were automatically excluding international clients, so the international clients were not being sent catalogs. They weren’t being pinged, but those guys were ordering some of these tools, and they would get a catalog if they ordered something. So sometimes they were just ordering something just to get a catalog.
John Elder: 00:40:12
So they would have ordered a lot more if they’d been prompted with catalogs, but they were being thrown out. So this little relationship between orderings and mailings and so forth that fit for everybody else wasn’t holding up for them. And anyway, we built some models. We found that. They improved the quality of the paper to make a more attractive catalog and they doubled their sales per catalog. So unfortunately they did all those things at once. So it didn’t really say how much that one finding mattered, but doubling their sales was a huge step. And part of it was that, and so I won a friend for life by ignoring him a little bit and [crosstalk 00:40:57] the thing that I knew what was best.
John Elder: 00:40:59
I did what he wanted to and so he was super happy with that. But you just have to follow your instincts, not to the exclusion of doing what you were told, and you get these instincts from experience. So there’s almost really no shortcut to it. But if you can get together with other folks and swap stories and especially stories of mistakes, that’s one reason I emphasize mistakes. The Top 10 Data Mining Mistakes is one of my chapters in the data handbook, the data science handbook that we wrote a few years back. And the mistakes stories are so much more interesting because best practices are boring. Brush your teeth, call your mom, that kind of stuff is boring. You want to read about the guys who did everything wrong.
Kirill Eremenko: 00:41:50
Absolutely. Absolutely. Yeah. It makes total sense. And it’s interesting you mention follow your instincts for data scientists because people expect the opposite, that data scientists are very rigorous, logical, straightforward. So it’s comforting to hear that instincts do play a role because one of the recent trends, I guess, that people talk about is, will data scientists ever get replaced by tools like AutoML and all these automated machine learning tools and data science tools. So what do you think? Do you think data scientists will have a place always or one day even machines will be able to follow these instincts?
John Elder: 00:42:36
There’s more and more that can be done automatically. And that’s in general, a good thing. I think there’s, of course, always a place for that translation of the problem from a new problem from the real world to the technical problem. I think it’s great to have well-established protocols and practices for setting up a testing protocol with out-of-sample data and making sure that there’s no leaks from the future and so forth like that. So those are good places to have a well established protocol and to understand why you’re doing that and so forth, but where it’s really, really good to have instinct and have experience and a really wise person is one who can learn from the mistakes of others and doesn’t have to make the mistakes themselves. So if you can sit around the campfire and tell stories, that’s extremely valuable.
John Elder: 00:43:37
Some idiots have to learn only by doing it themselves. So it’s going to take a lot longer. And I count myself mostly in that camp, but luckily I’ve made a lot of those mistakes now, so, but if anybody can benefit from the stories, I think my colleagues get a little tired, because I’ll say, “Oh, that reminds me.” And then they’re like, “Oh, another story,” but then sometimes it saves them a month. So we did some training of, it’s like, if you’ve seen Karate Kid, it’s kind of an old movie now, but Mr. Miyagi gets the young man to paint his fence in a very particular way. And it seems utterly crazy to the youngster why he’s doing that. And then all of a sudden in a combat situation, he’s using those moves, those particular moves that he’s done hundreds of times.
John Elder: 00:44:40
And this thing happened recently. We’ve got a nice relationship with a large consumer goods company, several actually, but one of them, we’re doing a lot of training, and the training is very intensive. We’re training their data scientists to up their game a little bit. And we’re talking about techniques and stuff, but I’m also telling some stories and early on, they’re like, “Oh my gosh, when’s he going to get to the point? Why is he telling stories?” And then after one story, I asked folks to just write down one point that they heard from that story that they thought might be useful, and amongst them, they had maybe four or five things that came up and then a week later revealed, “Okay, here were the 12 points from that story that would have helped you on this homework problem that you had that were embedded in that story, in the lessons that were learned.”
John Elder: 00:45:38
And after a few weeks, light bulbs started to go off and a lot of people were grinning and telling me, “I get it now,” and what they were being really annoyed with early on is like, “Oh my gosh, this old guy is wasting our time telling these stories.” They weren’t realizing, “Wait a minute, this is the best part of the course.” And actually, a little bit of that was given away because the director of the whole group, there’s hundreds of people reporting to him, dropped in for part of the course, was there for one of the stories, and I didn’t tell these stories constantly, but he was there for one of the stories and then his colleague told me he retold that story six times that next day to different groups all over, because he thought it did such a great job of illuminating a particular problem that he had never thought of before. And so I was like, “Okay, that was a win.”
John Elder: 00:46:38
So the most senior people got it. And it took a little while for the more junior people to realize, “Oh my gosh, these are like proverbs. These have nuggets in them.” So anyway, I wasn’t getting the love at first, but it eventually came around. Those campfire tales are extremely efficient ways to pass on the wisdom from the previous generation. It’s like you’re talking about a particular beast that you ran into in the forest, and if the younger folks can recognize it when it comes around, I’m telling you, it’ll save you a month of work and you’ll be way better for it.
Kirill Eremenko: 00:47:30
I’m so intrigued. Can you tell us that story? What is this story?
John Elder: 00:47:35
Okay. The one that I told there, this is going to be such a letdown after that buildup, but it’s just one of the top 10 data mining mistakes: leaks from the future. And that’s the idea that anything that was in your training data, you might’ve thought you did a good job of saying, “Here’s what I knew at the time,” but really something from the future leaked and got in there. And this can come in very subtle ways. And in this particular instance, we were working with a small startup company, not something we do much anymore. By the way, you know, there’s this problem. There’s a side story here. And this is one of the problems with telling stories, is sometimes a side story comes with a… Well, I’ll get back to that about startup companies. I’ll get back to the side story, why we don’t work with startup companies very much anymore.
John Elder: 00:48:22
But startup companies get a little desperate. They are so invested in solving the problem because everything depends on it, that they sometimes lose sight of the truth. We are servants of the truth. We have to give the people the answer they don’t want if that’s what the data says. Startups tend to not do that. They tend to get the answer they want, come hell or high water. And this particular guy, a PhD in computer science, was very smart, very productive. He’d write 300 lines of code at night, but he kept making the same mistake over and over, no matter how much I tried to teach him not to. And so the mistake was, they had a fixed amount of data because it was a biological problem. They were trying to predict a particular disease in a very clever way, shining infrared light through the skin and have it reflect on the blood, and the blood chemistry would be revealed by the spectrum of light that reflected back.
John Elder: 00:49:34
A lot of information in that. There’s a lot of cool things you can do with that. They were actually getting enough information back to do some really cool things. The problem was, they had to have a certain level of accuracy better than blood tests to unseat the barbaric blood tests that were being done currently for that particular diagnosis. They were close. They actually had published a journal article proving they were getting a certain level of accuracy, but when we came in and did it right, we showed them that their real accuracy was worse. So our first contribution was to show them that they were actually doing worse than they thought they were doing. So, yay us. They were not real happy with that.
John Elder: 00:50:16
But we eventually got them back up to where they thought they were, like, “Okay, great. You’ve gotten us back up to where we thought we were when we first hired you, but we need to get up to here.” And that was a struggle. The problem is they had a fixed amount of data. They had spent a fair amount of money getting experimental results from a whole bunch of people with the hardware that they built. So they built the hardware, they built the software. It was new technology. There wasn’t a lot of data. This is a hard thing. You need more data. Big data is not the problem. Small data is the problem, because then you’re going to learn too much from it. It’s not going to be able to get good out of sample results. You’re going to know too much about your data.
Kirill Eremenko: 00:50:55
Overfitting.
John Elder: 00:50:56
Overfitting. And this was a huge problem for them. They knew too much about their data. They had a hard time separating data out, keeping it completely unknown to whatever learning they were doing, and then having a true out of sample test. So here’s what happened. They take the matrix of data they’d built. We got them onto principal components. We did principal component features of the entire dataset, then split it into training and testing, trained the model on the principal components of the set, and then used the model on the out of sample principal component values and got a prediction. What did they do wrong? Well, they calculated-
Kirill Eremenko: 00:51:40
Oh, I know, I know, I know. When they were doing the principal components, they did it on the whole thing, not on the separate parts.
John Elder: 00:51:47
Exactly. Exactly. So the principal components, you’ve got it, and not many people do, so way to go. Good job. [inaudible 00:23:58].
Kirill Eremenko: 00:51:56
Thank you.
John Elder: 00:51:57
And you would think that wouldn’t be that big a deal because the principal components are just redrawing the axes, but the principal components are peeking ahead. If there’s any kind of outlier in the out of sample data, the principal components are pointing at it. The principal components are affected by it. They’re not so surprised by it. It’s part of defining the component.
Kirill Eremenko: 00:52:21
Principal component.
John Elder: 00:52:22
Yeah, and it’s astonishing how much difference it made when you did it right. When you calculate the principal components based on the training data alone, and then use those weights as features on the out of sample data, you do much worse in this particular case. And it’s like, “Dang.” I didn’t think it would do that much worse, but I knew we had to do it that way. And actually what happened, I don’t know if it was on that particular one or if it was on another one, because it happened like 20 times, but one time they called up, we were physically with them a lot of the time, and it was one of the times we were apart from them. And I had a brand new PhD employee who I said, “Oh, sit with me on this call. You’ll learn how to do consulting.” And he was a very polite, well educated guy, and he’s sitting right with me. And I’m an evangelical Christian. I’m a super polite person.
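The “peeking” John describes is easy to reproduce. A minimal Python sketch, with invented data and one planted outlier in the held-out rows, comparing principal components fit on all the data (the leak) against components fit on the training data alone:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X_train = rng.normal(size=(100, 30))
X_test = rng.normal(size=(20, 30))
X_test[0] = 10.0   # one big outlier hiding in the supposedly unseen data

# WRONG: components fit on train + test together -- they get to see the outlier.
pca_leaky = PCA(n_components=1).fit(np.vstack([X_train, X_test]))
# RIGHT: components fit on the training data alone, applied later as fixed weights.
pca_clean = PCA(n_components=1).fit(X_train)

outlier_dir = X_test[0] / np.linalg.norm(X_test[0])
print("alignment of the first component with the test-set outlier")
print("  fit on all data   :", round(abs(pca_leaky.components_[0] @ outlier_dir), 3))
print("  fit on train only :", round(abs(pca_clean.components_[0] @ outlier_dir), 3))
# The leaky components point almost straight at an outlier they were never
# supposed to have seen; the clean components are indifferent to it.
```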
John Elder: 00:53:26
Anyway, I’m on this call. And they said, “Oh, we got these great results, and we did this well.” And I stood up and I said, “I don’t give a F.” I said the word out loud. I don’t think I’ve ever done that in public as best I know. And especially to a client on the call. And my colleague’s eyes are this big. “What result you got, you did it wrong.” I said, “I feel like a guy standing at the edge of a cliff trying to keep you from driving off. You’re doing it wrong. Listen, do it right, and if you get good results, don’t pay our bill, fire us. You’re doing fine. If you get the results that I think you’re going to get, which are wrong, pay our bill, and then decide whether you want to keep using us or not. But don’t call me again with this same mistake.”
John Elder: 00:54:20
And three days later, the boss called and said, “You were right. What do we do?” But I was so embarrassed and my colleague, my new hire, said, “I learned a lot about consulting today.” I was like, “Please, please ignore, erase. Don’t ever do that. Whatever I just did, don’t ever do that.” I just lost control of myself. Yeah, I’d had it. And yeah, our relationship with that company didn’t go on much longer, but the friendship did. We’ve stayed in touch, but that startup did go bust, yeah.
Kirill Eremenko: 00:55:01
Okay. Okay. Interesting. Oh, that’s not the first time you’ve done that. There was a project you spoke about at one of your talks where the client didn’t believe that you can bring value. So you said, “Okay, if we don’t bring value, you pay half our bill. If we bring value, you pay double our bill.” And they were so happy with that offer. How did that story go?
John Elder: 00:55:23
Yeah. And that’s one of the only times we’ve gotten value based or closer to value based pricing. That ended very well for both of us. Yeah. It was interesting, because it was with a big company. It was Capital One. They’ve given me permission to talk about that. It was years and years ago. It was 20 years ago and Capital One built a whole business around analytics and they do an extremely good job of it, but they were told, “Get you some of this new data mining stuff.” Elder Research is pretty good at this.
Kirill Eremenko: 00:55:54
Sorry, just for everyone. What does Capital One do?
John Elder: 00:55:58
Credit scoring. So [crosstalk 00:56:01] better credit risk for credit cards. So they’re a bank now, but at the time they were just credit cards, and that’s how they made all their money. As they said, “We can do better than the normal banks at telling who is a good credit risk and who’s not,” and so they would take the Experian-type credit scores and then add their own intelligence to it and be able to … and they invented the idea of buying other people’s customers, doing a bank transfer. “We’ll transfer you over to us,” so stealing other customers to give them a better deal. Kind of a cool idea, but this idea was also pretty clever.
John Elder: 00:56:47
They had the idea of offering credit to people who had never even been considered before, and because people had never applied before, there was no analytics on them. So they were getting … so there was no data on them either. So they actually had to invest years to give people credit … had people apply and basically gave them credit no matter what. Very small amounts of credit, $300, $350, something like that, but they basically said, “Yeah, here’s credit,” and then kept track of them for a few years, and if they hadn’t paid after 90 days in the first couple of years, they considered them late, and if not, they didn’t. Now they had a machine … they had a very, very good … and still do, of course … a constantly updated way of building credit that’s very good, but it’s possible that the data science could be better, and so they were kind of told, “Okay, go see if any of these new things … get this consulting company to look into some of these new ways of doing it and keep some out-of-sample data and do a bake-off.” I don’t know if they were told that or not, but we suggested it.
John Elder: 00:58:05
Well, in dealing with the individuals who were sent, I could tell they were very reluctant. It was like, “We were forced to do this.” So where there’s a difference of opinion, there’s a betting opportunity. So I suggested the half price versus double price … I really should have gone higher than that, but this was good and they took me up on it. “Hey, great, half price deal,” and the end result was we beat them pretty handily. Now we didn’t beat them by much, but in credit scoring, if you can beat them by this much, that’s tens of millions of dollars because of the leveragability of it. If you can reduce the default rate from X percent to X minus 0.1% or whatever, that’s tens of millions of dollars. So the accuracy matters a lot, and one of the secrets was ensembles, using competing models, but that wasn’t the only secret. Some of the more modern modeling techniques at that time were pretty useful as well. And I ask people sometimes … I ask when I’m teaching classes and I say, “Well, who was excited by that?” Obviously we were, but it turns out for us that was just tens of thousands of dollars. For them, it was tens of millions of dollars. So yeah, they were really excited about that.
Kirill Eremenko: 00:59:31
Yeah. Yeah. That’s awesome. Let’s talk about ensembles, or in the words of your Spanish translator in Santiago, Chile, tell the pig Christmas is coming. Tell us that story.
John Elder: 00:59:47
Yeah, soon after I did the … I did a study of five different techniques. We had some friendly arguments in the office. Again, this was 20 years ago … friendly arguments in the office about which techniques were better. It was really during this time too, doing this work for Capital One, we had some friendly competitions going on, and so after we did the Capital One work, we did some out-of-sample tests on some other academic problems using different techniques, and I have a graph where I show that every one of the five techniques we used on these six different academic problems came in first or second at least once. So I said, “Every dog has its day.”
John Elder: 01:00:38
I was at a conference in Santiago, Chile and I said that, and the simultaneous translator in Spanish said something with no perros in it, said something completely different, and I asked her later what she said, and she said … it was amazing. She translated it into a local idiom on the fly. She said, “Tell the pig Christmas is coming,” and I was like, “What?” And she said, “Well, you know, in the barnyard the pig thinks it’s hot stuff. Christmas is coming, you’re going to be dinner,” and I said, “That’s kind of the opposite of what I was saying, but yeah, that is the same point.” Yeah, the technique that was all hot stuff on one problem is dinner on the next one, is utterly the worst, and that’s sometimes called the no free lunch theorem, that there is really no one best technique, that the assumptions around the mechanisms of certain techniques better match certain problems.
John Elder: 01:01:31
And so a lot of times people are trying to look, “Well, what kind of characteristics does this problem have? And therefore, what technique should I use?” And that has its moments, but the very next slide here is: if you just put the competing models together in a reasonable way, it does really, really well, and in fact, it does almost as well as the best, and then even some of the fancier ensembles do as well or better than the one that won this particular contest of the five techniques we tried, which was neural nets. So ensembles … even simple ensembles like averaging the predictions of the five different methods to get a new prediction worked very well, or voting, or things like that.
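A minimal sketch of the simple ensemble John describes, averaging the predictions of a few competing model families. The dataset and the particular models are placeholders, not the ones from his study.

# A minimal sketch of the simple "average the competing predictions" ensemble.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=600, n_features=15, noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Three competing model families, each fit on the same training data.
models = [Ridge(),
          DecisionTreeRegressor(max_depth=6, random_state=1),
          KNeighborsRegressor(n_neighbors=10)]
preds = []
for m in models:
    m.fit(X_tr, y_tr)
    p = m.predict(X_te)
    preds.append(p)
    print(type(m).__name__, mean_squared_error(y_te, p))

# The ensemble is just the average of the competing predictions.
print("average ensemble", mean_squared_error(y_te, np.mean(preds, axis=0)))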
John Elder: 01:02:20
So you don’t have to necessarily do fancier ensembles like boosting or ones I invented called adviser perceptrons or things like that. So ensembles have been a really cool thing and have been adopted all over the place, and I had the opportunity with Giovanni Seni to write a book on that. So I was one of the earlier inventors of the idea, and if you look back … but I argue that if you look back in Proverbs, Solomon is talking about, if you’re going to war, ask a multitude of counselors for advice. If you’re making a really big decision, get a lot of people’s advice before you do it. So that can be thought of as an early ensemble model, in my opinion. So it’s really an idea that’s pretty ancient, but one of the big questions … and Pedro Domingos, who’s a fantastic researcher, even as a grad student won a best paper award at one of the KDD, one of the Knowledge Discovery and Data Mining conferences.
John Elder: 01:03:21
And I was on the award committee, and I have to mention this … sorry, Pedro, I voted it second best, but everybody else on the committee voted it first best. So I thought it was an awesome paper, but I disagreed with it, because Pedro was a great writer and a great researcher, but the title of the paper was, essentially, Occam’s Razor Is Dead, and the Occam’s razor idea is that simplicity is better. If two things are equally accurate at describing something, you should take the simpler explanation as the more likely true one, which is really a philosophical idea of William of Occam in the 14th century, and it’s sort of been a principle that’s guided statistics, that you regulate complexity and you prefer simplicity, and it’s almost a religious belief, if you will, but he brought out a bunch of heretical critiques of that idea.
John Elder: 01:04:19
And one of the biggest was ensembles. Look at ensembles: they’re more complex. You’ve got multiple models, you’re adding them together in some way, and they generalize better. They are doing better than single models at predicting things. So obviously simplicity isn’t all it’s cracked up to be, and I said, “You know, I think the problem is, I don’t think ensembles are more complex. I think we’re measuring complexity wrong,” so I sort of had a background task of proving Pedro wrong. He got the paper award. I thought it was second place, but anyway, I was like, “Pedro, I’ll prove you wrong.” I love Pedro. He’s great. So I did. In the studies I did, I stumbled across a concept called generalized degrees of freedom by a guy named Ye, who I haven’t met yet, but who thanked me for advancing his career.
John Elder: 01:05:19
But anyway, he had a measurement of complexity that measured the flexibility of a model. So he said that a model is complex to the degree that its predictions change when the inputs change. So he would add noise to the inputs. So you’ve got a black box here, and if you add noise to the inputs, you refit the model. If the predictions change a lot, then the black box is complex. If they don’t change much, then the black box is simple. But let’s say this is an average. You add random noise to your inputs. The average isn’t going to change much. Your prediction is going to be almost the same, but if this is some kind of very complex, nearest-neighbor type thing, then your prediction vector might change a fair amount. So it turns out with regression, it works perfectly.
John Elder: 01:06:20
The complexity count it gives you is the number of coefficients in your regression, but that’s, by the way, the only technique for which your degrees of freedom equal your number of coefficients. People have known for a long time that if you build a neural net, your neural net actually has fewer degrees of freedom than all the weights it has in it. The weights are rather weak. They only have a fraction of a degree of freedom. But if you build a decision tree, the degrees of freedom are more than the number of parameters in the decision tree; people have estimated at different times that a decision tree has probably three or four degrees of freedom for each parameter. So you look at these fractional things in the literature about how powerful the parameters in your modeling method are, and it depends on your method, how much data you have, how many variables you have, all sorts of things, and it’s kind of confusing. Well, what Ye didn’t realize is he had come up with the answer to this. He’d come up with a way of measuring, using re-sampling, the complexity of any modeling method.
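A rough illustration of the resampling idea behind Generalized Degrees of Freedom, not Ye’s exact estimator: perturb the data, refit, and measure how much the fitted predictions move. Ye’s published formulation perturbs the target values, while the data, models, and noise level below are placeholders. The regression slope of fitted-value changes against target perturbations, summed over observations, serves as the complexity estimate; for linear regression it comes out near the number of coefficients, and a bagged ensemble of trees typically scores lower than the single tree it is built from.

# A rough sketch of a GDF-style complexity estimate via resampling.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=n)

def gdf(make_model, X, y, n_rep=30, sigma=0.25):
    base = make_model().fit(X, y).predict(X)
    deltas_y, deltas_f = [], []
    for _ in range(n_rep):
        noise = rng.normal(scale=sigma, size=len(y))
        pert = make_model().fit(X, y + noise).predict(X)
        deltas_y.append(noise)          # target perturbations
        deltas_f.append(pert - base)    # resulting fitted-value changes
    dy, df = np.array(deltas_y), np.array(deltas_f)
    # Per-observation sensitivity (slope of df on dy), summed:
    # higher means a more flexible, more "complex" model.
    return sum(np.dot(dy[:, i], df[:, i]) / np.dot(dy[:, i], dy[:, i])
               for i in range(len(y)))

print("linear regression :", gdf(LinearRegression, X, y))  # near the coefficient count
print("single tree       :", gdf(lambda: DecisionTreeRegressor(min_samples_leaf=5), X, y))
print("bagged trees      :", gdf(lambda: BaggingRegressor(
    DecisionTreeRegressor(min_samples_leaf=5), n_estimators=30, random_state=0), X, y))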
Kirill Eremenko: 01:07:28
Wow.
John Elder: 01:07:28
So you have this really … and it’s not well known, for some reason. I don’t read much of the literature, I have to admit, but as far as I’m aware, people haven’t done … there could be so many cool things done with it. Somebody could just do a big survey and figure out what the true complexity of a whole bunch of different methods is. But I did a little survey, and it showed the complexity of a few different methods, including the complexity of a single decision tree versus the complexity of an ensemble of decision trees, and the ensemble of decision trees is simpler than a single decision tree.
Kirill Eremenko: 01:08:10
Well, more stable, right?
John Elder: 01:08:12
More stable, more simple, less complex, less flexible, and it kind of makes sense after the fact, if you think about it. Think of an ensemble as a board of directors. All right. So I’m the majority stockholder of Elder Research. I reign supreme as a dictator for life, which is a horrible situation, but if I had a board of directors … well, I’m on a board of directors for a nonprofit, for instance … there, when we make a decision, there are experts in different fields, and we all argue … very politely, we argue about things, and the decision we make is a consensus, and it’s less extreme than if any one of us was dictator. And that’s kind of what the ensemble is like. So the decision is a less extreme, more consensus decision, and therefore it is less complex. It is less variable given changes in the inputs than if any one of the individual models was in charge. So if you measure the complexity of something, not by how complex it looks, but by how complex it acts, the ensemble is less complex, and therefore it does not overfit … no matter how many models you add to it, if the models are independently built without knowledge of the other models, then it does not overfit and therefore it generalizes better.
Kirill Eremenko: 01:09:41
Wow. Amazing. That’s a very interesting perspective, but by that token, if you just output a zero every time, then that’s the least complex thing you can ever come up with.
John Elder: 01:09:55
Right. Right. I mean, yeah. Obviously if you’re so simple that you’re useless, yeah. There’s the appropriate level of simplicity.
Kirill Eremenko: 01:10:08
Makes sense. Makes sense. Okay, fantastic. We’ll find that research paper and link to it in the show notes. I think it’s fabulous. And maybe somebody listening will take on the challenge and measure complexity over the methods that we have right now.
John Elder: 01:10:21
Yeah. It would be a fun project to do. I’ve always wanted to do it, but yeah, it’s out there. Generalized Degrees of Freedom, GDF, by Ye. Ye is his name, and then the paper that I wrote might be a good starting point. It’s in the Journal of Computational and Graphical Statistics, JCGS, and I also basically reprinted that same journal article in my ensemble book.
Kirill Eremenko: 01:10:48
Amazing. Fantastic. You mentioned neural nets. So as far as I understand, that experiment that you conducted was in 2017, and back then you made a comment that you’re not able to know in advance that neural nets will be the best method. Three years have passed; have your opinions on neural nets changed?
John Elder: 01:11:10
Well, you have to realize I hated neural nets as a youngster. Remember, I grew up … okay, they have a saying: to a little boy with a hammer, all the world’s a nail, and my hammer was polynomial networks. Polynomial networks adapt their structure, unlike stupid neural nets, which have a fixed structure, but polynomial networks go off to infinity at the edges, unlike well-behaved neural nets that have that beautiful sigmoid that stops them. But neural nets got all the love. Polynomial nets got no respect, and neural nets got the Gartner hype cycle. Oh my gosh, they’ve now been through it three times. Polynomial nets never had their moment in the sun. Okay, so polynomial nets were doing just all sorts of good things. Neural nets didn’t even have to tie their own shoes to get a press release, whatever.
John Elder: 01:12:07
It was just not fair. So I was terribly, terribly biased against neural nets. That can give you so many stories. It was just not fair. It was just not fair. So I am a reluctant convert to neural nets. I’m like, “Okay. These things actually are pretty cool. They actually do some pretty cool things.” I mean, I did my dissertation on a global optimization method, so I’m no fan at all of backpropagation because it’s such a crude and ugly optimization method that completely misses out on all sorts of things, but hey, it works. It’s like, “Golly.” Some of its weaknesses are actually strengths. It doesn’t overfit. The fact that the neural net’s using only a fraction of the descriptive power of its weights is actually a strength, in terms of, again, not overfitting.
John Elder: 01:13:06
It drives some people crazy that you can get a different answer with the same exact data because of the random starting points and the random searching, and I was like, “Ah,” but again, that can be seen as a … as neural net people say, “That’s not a flaw, that’s a feature,” and they’re not entirely wrong. So I’m a reluctant fan of neural nets. I do watch with amusement the over-hype. I was glad to see that deep neural nets are starting to get some of the negative press. They’re going over the peak into the trough of disillusionment. It’s like, “Oh, they haven’t really set up their experiments quite right.” “Oh, there are other techniques that can do just as well when they do that correctly,” and so forth, but it’s still impressive, some of the problems that they’ve been able to solve, and they’ve been able to do a lot more without customized features.
John Elder: 01:14:04
They’ve been able to discover a lot of the features, but the people I’ve talked to who have done work with customizing features can get better results than just plugging in the raw data, roughly on the order of about 10% better. But still, just the fact that you can plug in raw data and get 90% of the accuracy that you could get when you’re taking more care, that’s still very impressive. So neural nets are one of the top techniques that I teach, and I was a long time coming to that level of respect, and I’m trying to figure out why they work well and so forth. But yeah, I feel like if you pare down the inputs to the ones that really matter, it helps a lot. If you control the distributions of your inputs to be roughly normal, it helps a lot. So if you transform the shapes and get the hidden nodes to be roughly the right shape … and you see that I’m talking about old-school neural nets and not deep neural nets, which I don’t have much experience with.
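A minimal sketch of the old-school neural net housekeeping John mentions: pare the inputs down to the ones that matter and push their distributions toward roughly normal before fitting. The dataset, the selector, and the network size are placeholders, not recommendations from the episode.

# A minimal sketch: feature paring + distribution shaping before a small net.
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import PowerTransformer
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=400, n_features=30, n_informative=6,
                       noise=15.0, random_state=2)
net = make_pipeline(
    SelectKBest(f_regression, k=8),   # keep only the inputs that matter
    PowerTransformer(),               # push distributions toward roughly normal
    MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=2),
)
print(cross_val_score(net, X, y, cv=5).mean())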
Kirill Eremenko: 01:15:19
Okay. One time you said AI as an alternative source of wisdom is not as good as inducing models from real data. What did you mean by that?
John Elder: 01:15:29
So where I’m expert and where I’ve seen things work is when you have data that reflects history and you learn from that. So you induce models from data. So this is the machine learning way of doing things, and the traditional AI way of doing things would be more deductive or top-down: the expert system, the rules-based approach, not machine learning. So I think AI is used today to encompass machine learning and everything else, but I’m using the distinction between top-down deductive and bottom-up inductive, and saying data science and machine learning are the bottom-up, where you’re training on data, and AI is the top-down.
John Elder: 01:16:27
You’re training a car to say, when you see a stop sign, do this, and a stop sign is defined by eight sides or six sides, and red, whereas inductive modeling would be just taking in a bunch of images and saying the expert driver stopped the car when the images looked like this. Figure out why. And a lot of the success that’s being ascribed to AI is really a combination of those two systems. It’s a lot more efficient if you’ve got well-defined rules for driving to start with; you can immediately get some results, and then refining it with the inductive method is how they’re getting … so all the really cool results that have made the headlines have been a combination of those two kinds of systems, from what I understand.
Kirill Eremenko: 01:17:19
Going back to the ensemble, right?
John Elder: 01:17:22
Yeah, kind of ensemble. Yeah. Yeah.
Kirill Eremenko: 01:17:25
Yeah, interesting. All right. I’d like to talk a bit about something that you popularized, and I read and watched your video about it. It’s fantastic, I think, and it’s called the target shuffling method, and that method’s an alternative to the overused and over-relied-upon p-value test. Tell us a bit about that. I found that method … I’m surprised not more people are using it for evaluating models.
John Elder: 01:18:00
Well, thank you. Yeah, I think it’s probably the biggest problem in science: there’s a crisis in science, and it’s understood that most of the articles that are published in science are unreproducible. So they’re basically false. And even the Lancet, which is one of the top medical journals, published in Britain, said in an editorial that they believe half of what they publish is unreproducible or false. They just don’t know which half. So it’s kind of like marketing for companies: we know that half of what we spend on marketing is worthless, we just don’t know which half. And their standards of course are very high. They only accept 5% of the papers that are submitted, and not just anyone submits to the Lancet. There’s a lot of self-selection before they submit, and of course they’re using p-values as a way to say … and the p-value makes sense if you’re doing one experiment.
John Elder: 01:19:05
So if you’re rolling one 20-sided die, the chance of it coming up 20 is 5%, but if you roll that die 15 times, the chance that your best roll is a 20 is much, much higher. If you roll it 80 times, you’re getting close to certain that you’re going to get a 20, and the problem is that people are using the formula as if they rolled it one time when they really have rolled it 17, 35, 170 times. They’ve really gone back to the well in their data or in their experiment many times. I play pickup basketball. I know it doesn’t look like it, but I enjoy that … or maybe golf is a better example. I only play once a year in a charity tournament, and luckily it’s best ball, it’s captain’s choice.
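The die arithmetic here is easy to check: one roll of a 20-sided die shows a 20 with probability 0.05, but the chance that the best of n rolls is a 20 is 1 - 0.95^n, which is what quietly happens when an experiment is retried.

# The chance of at least one 20 in n rolls of a 20-sided die.
for n in (1, 15, 80):
    print(n, "rolls:", round(1 - 0.95 ** n, 3))
# 1 roll: 0.05, 15 rolls: about 0.54, 80 rolls: about 0.98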
John Elder: 01:20:01
So if I hit one good shot, maybe it’ll get used by the team. Well, it’s like people are playing best ball, but scoring it as their own card. We all hit it, we take the best shot, and then we go forward, and it’s like, “Wow, what a great game I had.” Or I get a mulligan. I hit it into the pond. It’s like, “Well, I’m going to take another shot,” and I get that one and I count that as my game. That’s just not a fair card, and –
Kirill Eremenko: 01:20:29
Sorry, so they disregard a lot of the bad outcomes.
John Elder: 01:20:37
It’s not even that bad, because they’re not really aware that they’re doing it, because what’ll happen is they do an experiment and then it doesn’t really work out right, and they think, “Well, what if I use a lower temperature?” And so they’ll do another experiment with a lower temperature, or they say, “Well, what if I paint the samples blue first?” And, “What if I raise the ultraviolet frequency more?” Or, “What if I …” and this is ingenuity. Edison, he got the brilliant idea that heating a metal would be great to create light, and he tried a thousand different metals. They didn’t work. “I haven’t failed. I’ve just learned a thousand things that don’t work.” Well, he eventually got the idea of doing it in a vacuum so it wouldn’t burn up, but he tried a lot of things that didn’t work, and this is all American ingenuity. People are trying … they’re changing the experiment and they’re trying all sorts of things and they’re learning, but what they’re really doing also is rolling the dice again, rolling the dice again, rolling the dice again. They’re trying something different, but they’re also having another roll of the dice, and chance is going to work sometimes.
John Elder: 01:21:47
You’ve got to think about that. They’re not just trying something different. They’re also rolling the dice again. So this is the philosophy of statistics. Every time you try something, there’s a chance it’ll work by chance. That idea has to enter your head. I’m doing something, and it might be because the physics or the biology or whatever works, but it also might be that I got lucky, because there’s a casino, and it’s not the way you pull that handle down, it’s just that that machine was going to win that time. So anytime you do an experiment with psychology undergrads doing something, it could be just that the order that they came in lined up with their heights, or whatever it is. There’s always some chance. So you’re rolling the dice with each one, but you’re also feeding them carrots versus feeding them crackers or doing whatever, and you might think it’s the carrots versus the crackers when it’s really just that, that time, it’s just the dice. It’s just the roll of chance. So every time you change your experiment and you do something, it could be the thing you did, but it’s also the dice. And one of them works, but you rolled the dice a hundred times and you published that one thing. See, nobody thinks they cheated, because they were just trying all sorts of different ideas, but they didn’t have embedded in their brain that chance is one of the possible explanations for any result they get, or the question of whether this result could be replicated if they did it again. They’re just grateful to get it done and to get it published, to get it going on. They truly aren’t cheating. They aren’t defrauding anybody intentionally, but they really are.
Kirill Eremenko: 01:23:59
Yeah, okay. I think –
John Elder: 01:24:00
Because that result doesn’t work.
Kirill Eremenko: 01:24:03
I think I got it. So, their test is statistically significant given that the dice is rolled to the same number that it was rolled when they ran the test.
John Elder: 01:24:14
That’s right. They’d rolled a 20. They really rolled a 20, and they did something different from the other times. They painted it green, or they turned down the temperature, or they used their left hand. And it finally worked and they published it. But so would have changing nothing and rolling it, and rolling it, and rolling it, and rolling it.
Kirill Eremenko: 01:24:38
Yeah.
John Elder: 01:24:38
Or would it have? That’s what they have to test it against. What they have to test it against is: okay, you did it 31 times. What would just rolling the dice 31 times have given you? How many times would it have worked? What’s your true probability that it would have worked?
John Elder: 01:24:55
You would have gotten that level of a p-value with no changes to your experiment, just with the level of variation that you had in your data and the level of significance that you’re looking for. Not making any changes, but just from the number of experiments, number of tries, number of rolls of the dice you did. What would have been the chance? That’s the test you need to use.
Kirill Eremenko: 01:25:19
And that’s where your target shuffling comes in.
John Elder: 01:25:21
That’s where target shuffling comes in.
Kirill Eremenko: 01:25:22
Wow.
John Elder: 01:25:22
It says, “What is the real test?” Now that’s one level of target shuffling. There’s another level that’s even more crazy, that’s even more real for when you use machine learning to do your thing. It says, “Okay, now that you’ve changed your data, now you need to use your model to see what the model could extract from the data, and you’ve got to beat that.”
John Elder: 01:25:42
So, anyway, that’s deep target shuffling. So, anyway, but there’s more about that. We have a two-minute animated video to explain target shuffling on our website, elderresearch.com. And sometimes that gets the concept across better than I can, because some people like animated videos. But anyway, the concept. There was a cancer researcher, he and his crew, who had published this great paper, and a company wanted to replicate the results.
John Elder: 01:26:15
And they couldn’t. They tried to follow what was in the paper. And this was reported on here in Charlottesville a few years back. The company tried to follow the paper and they just couldn’t get the results. So, they got an NDA, a nondisclosure agreement, with the professor and the students, and flew them all to the company’s site, had them watch the experiment. And the professor said, “You know what? You guys did it all right. You did everything right. Don’t worry about it. It didn’t work for us the first six times either.” So, of course, there was no mention of that first six times in the journal article, but…
Kirill Eremenko: 01:26:55
Classic example of what you said of the dice rolling.
John Elder: 01:26:58
Yeah.
Kirill Eremenko: 01:26:58
Gosh. All right. Okay, got you. So, how does this target shuffling work in a nutshell?
John Elder: 01:27:06
Yeah, so target shuffling in a nutshell… So, you have a matrix of data, and what you do is you detach the Y variable from everything else. Here are the characteristics of the data point. I use the example of a classroom of people. A bunch of people have answered a lot of personal questions about themselves. So, this is one person and all the questions they’ve answered about themselves, and here’s their score on the statistics test.
John Elder: 01:27:36
And so everybody has taken a test and they’ve answered a lot of questions. How many sisters they have, whether they were in the glee club, whether they played sports, whether they’re on the test team? All these sorts of things. So, which ones are predictive of whether they’re good at statistics or not? And so you’ve got this class of 30 people. Now what you do is you give everybody this data set and see if they can build a model to predict it.
John Elder: 01:27:58
But what you didn’t tell everyone is you’ve given everyone a different dataset where you’ve shuffled the Y values. Now, when I’ve actually done this, and I’ve been cruel and done this, I give everyone a dataset where their own answer is correct. They know who they are, even though no names are there. They can identify their questions and they know about what they scored, but I shuffle everybody else. So, anyway, I give everyone a different data set where the target value, the output variable, has been shuffled and reassigned to everybody else. So, everyone should have-
Kirill Eremenko: 01:28:33
So, complete garbage data set?
John Elder: 01:28:35
Complete garbage. Exactly. There should be no real relationship between the output variable and the input variables, but all the input variables are… If there’s any relationship in the input variables, it should be real, like if someone’s pregnant, they should be female. That sort of thing. There should be no illegal input combinations, because they haven’t been changed. It’s just that they’ve been assigned somebody else’s test grade.
Kirill Eremenko: 01:29:01
The targets.
John Elder: 01:29:02
Except the target variables, except for one person’s. And that’s fine. So, any relationships that they actually find are from the null world, the world of the null hypothesis. They’re from the world of noise. There you’ve created a universe where nullness rules. And you say, “That model comes from that universe, and so that is an indication of what the distribution of noise models is, the distribution of models where nothing is real. Where does my model fit in that distribution? Where does my model of real data fit in a distribution of models of not-real data?”
John Elder: 01:29:49
And there’s probably a model from the null world that’s better than my model. There’s probably a stronger relationship that was random. But am I in the 1%, the 10%, the 50%? If I’m close to 50%, then my model is no better than the random models and it’s pretty clear that I shouldn’t trust it. If my model is up in the 5% or 1%, then that’s my true p-value.
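A minimal sketch of target shuffling as John describes it, with placeholder data and a placeholder model: refit the same procedure on many copies of the data whose target column has been shuffled, and see where the score from the real data falls in that null distribution. The fraction of shuffled runs that do at least as well is the calibrated p-value.

# A minimal sketch of target shuffling with placeholder data and model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
X = rng.normal(size=(n, 10))
y = 0.4 * X[:, 0] + rng.normal(size=n)   # a weak real signal plus noise

def score(X, y):
    # Any "interestingness" measure works; here, cross-validated R^2.
    return cross_val_score(LinearRegression(), X, y, cv=5).mean()

real = score(X, y)                                    # score on the real targets
null_scores = [score(X, rng.permutation(y))           # scores on shuffled targets
               for _ in range(200)]
p_value = np.mean([s >= real for s in null_scores])   # fraction of null runs that beat it
print(f"real score {real:.3f}, shuffled-target p-value {p_value:.3f}")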
Kirill Eremenko: 01:30:23
Oh, okay.
John Elder: 01:30:24
I mean, actually, that value is the true p-value. The value you get is what the p-value is supposed to be. See, the question in statistics, the significance question, and I should have said this at the beginning, behind chi-squared tests, F-test p-values, t-tests, all the questions in statistics, I finally realized, is: how likely could I have gotten this result by chance?
John Elder: 01:30:53
How likely is this? How interesting is this? How likely is it that a result better than this would have occurred by chance? And so that’s what you’re physically measuring. You’re saying, “Oh, this is the distribution of results by chance, and this is my real result. What’s the area under the curve better than my real one?”
John Elder: 01:31:16
And that’s the p-value. That’s the real p-value. So, p-value is a measure of interestingness, but it’s not a probability. And what you’re doing is you’re calibrating this to find out where it translates to a probability. And that’ll differ depending on how many experiments you did, how deep your data is, how powerful your data mining algorithm is. So, it depends on the particulars of your experimental situation. But just by running this experiment, you can calibrate it and get an answer that’s universal, which is what the p-value was supposed to be.
Kirill Eremenko: 01:31:49
Wow.
John Elder: 01:31:52
So, problem’s solved.
Kirill Eremenko: 01:31:54
Mind blown, mind blown. This should be really used across the board by researchers.
John Elder: 01:32:01
It should be used everywhere, everywhere.
Kirill Eremenko: 01:32:04
Yeah.
John Elder: 01:32:04
And I did invent it, but it’s probably out there. There are some other things with similar names, or some other ideas, but I’ve been in this business for 30 years and I never heard anyone talk about it. I’d never heard anyone use it. And so when I invented it, when I realized what I had, I saw a cover article of the Economist that said, “What’s wrong with science?” or how science has gone wrong, or something like that, as I was going through the London airport.
John Elder: 01:32:36
And I was very intrigued by that and saw it. And it was a very good summary of this problem. And I actually didn’t realize how widespread the problem was, how there was a crisis in science. I see a lot of problems, but of course I thought, “Well, I’m like a doctor. Of course I see sick patients.” I’ve been called in to solve this problem. I didn’t realize that it was so pervasive that it’s in every field, that the majority of research was unreliable everywhere in every field: medical, chemistry, engineering.
John Elder: 01:33:09
Every field of published papers, which is the highest standard. Peer-reviewed published papers, much less conference work, everything. I didn’t realize the problem was… So, since I learned that, I’ve tried to speak on this at every opportunity. So, thank you for giving me this opportunity, because the message is so important.
John Elder: 01:33:25
It’s like, “How many lives are lost? How many billions were lost in wasted effort, because of people going down blind paths? Because they think that something’s worthwhile, but it’s just a spurious correlation that was stumbled across and somebody thought it was real.” And if we can just share this technique, or things like that, that can separate truth from not-truth, we can focus our efforts on things that really matter.
Kirill Eremenko: 01:33:54
Yeah. And it’s billions, okay, but the lives. Doctors and medical researchers use this and they publish results, and that drug actually doesn’t do anything, and people drink it and take it, and whatever else, and in the end, it’s all wrong.
Kirill Eremenko: 01:34:14
So, it’s just such an important topic. Why do you think it’s so hard to get it out there? Is it because people… If researchers use methods like target shuffling in addition to, or instead of, the p-value, then they’re less likely to publish a research paper, and that means more work for them. Is that the reason?
John Elder: 01:34:36
Yes, I think there’s a huge hurdle, because… For instance, one of the first places I taught this, a young researcher came up to me and said, “You’re now making it harder to publish than ever. What have you done?” And it’s like, “Well, yeah, I think she’s right.” I think so. Absolutely, I’m making it harder to publish than ever if people are still using the old method.
John Elder: 01:35:03
And then anyone who uses my method would be like tying one leg and one arm behind their back, competing against somebody using the other method. If you’re allowed to publish with a method that lets you get by cheating, then if you’re holding yourself to the rules, you won’t be able to compete unless you’re extraordinarily lucky or good.
John Elder: 01:35:26
But even if everybody was able to do it right, there’s going to be less publishing, because half the papers wouldn’t make it, right? Even doing it right, some papers are going to get through that aren’t really publishable, because there’s always… Even when you’re measuring it right, there’s going to be an unknown fraction. Right now, they think things are calibrated in such a way that one out of 20 papers is supposedly unreproducible, when it’s really more like 19 out of 20 that are unreproducible.
John Elder: 01:36:00
That’s how bad the problem really is. I’ve talked about over half. Well, John Ioannidis of Stanford, who’s the most quoted doctor, the most referenced doctor of all time, and an expert on this problem of the crisis in science, I’m in the same camp that he is and believe that the problem is in the 90% to 95% range in terms of papers. Not 50%; like 90% to 95% of the papers published are unreproducible. So, the problem is huge.
John Elder: 01:36:33
So, let’s just say it’s 90%. So, 10% of the papers that are published today probably should be published. So, once people are doing it right, the amount of publishing is going to go down a lot. So, there’s going to have to be other ways of measuring people’s productivity and so forth. So, it’s just going to be an enormously painful transition, which is going to be fought at a lot of different levels. And so I’m kind of tilting at windmills, but there’s a lot of good work going on.
John Elder: 01:37:11
There are some advances where people are realizing that it’s a problem and they’re trying to do some things about it. Half the problem is the business of science. Some journals are saying you have to publish your data to get published, although that’s failing for the most part. But Science Magazine a few years ago gave one of its awards to a landmark study that was actually led by the University of Virginia, just a mile away from me, in which about a hundred different psychology papers, the top psychology papers published a few years earlier, were revisited by teams all over who were reproducing the work.
John Elder: 01:37:56
And they were only able to reproduce 36 of the hundred papers. And that big study got an award. So, the problem of reproducibility, and giving people recognition for doing reproducibility studies, is getting some attention. But what they need to do is change these metrics for publishing and use techniques like target shuffling. And that is going to be a really tough battle. But, yeah, I’m all for it, obviously. I’ll do everything I can to help.
Kirill Eremenko: 01:38:33
We should start something like a fund for… There’s the Nobel Prize award and fund. Maybe there can be a fund for people doing reproducibility, so that can become a career in its own right. So, you go and you police these research papers, and you try to reproduce them, and then you debunk them. And that way, eventually, people will get the idea that, “Hey, don’t publish stuff unless you’ve tested it properly.”
John Elder: 01:39:04
Yeah. I guess there is already the Ig Nobel Prizes. But, no, that’s a great idea. That’s a great idea. Anything would help. And there’s just a vast number of scientists, and they’re all under a lot of pressure to advance their career. And they’re doing the best they can. There’s very, very little intentional fraud, although that does occasionally happen. And there’s a lot of this, like I said, trying a lot of things and not realizing that they’re not accounting for it properly.
Kirill Eremenko: 01:39:40
Fantastic. Got you. John, we’re running out of time. It’s such a wonderful conversation. I’d love to keep going. I’ll ask you one more question, which I’m really curious about. From your experience, 25 years in the field, and how it’s rapidly, explosively changing right now, what do you think the future holds? What should people entering this field now, or who are already in this field, prepare for so they’re ready for what’s coming in, like, three or five years from now?
John Elder: 01:40:11
Oh, my gosh. You’d think as a data scientist, I would look at the future. It’s funny. Plan is a four-letter word for me. So, right outside my window is a Lewis and Clark statue. Charlottesville was where they came from, Lewis and Clark and Sacagawea.
Kirill Eremenko: 01:40:32
Oh, that one’s the three year trip, yeah?
John Elder: 01:40:35
Yeah.
Kirill Eremenko: 01:40:35
To the West and back.
John Elder: 01:40:36
Yeah, and they explored through the West, going through St. Louis all the way to Portland, I guess, and back. And I think only one person died on the trip. He had appendicitis, I think. So, it was a tremendously successful trip, and it went through what could have been a lot of hostile tribes, but actually made very positive contact with them.
John Elder: 01:41:04
A lot of amazing things. Sacagawea, for instance, had been captured from one tribe by another. She spoke both languages. That was very helpful. The fact that she was along, and also had a child along the way, meant to some of the tribes that they met that they weren’t hostile, because why were they traveling with a woman and child if they were hostile? One of the chiefs they met turned out to be her brother. It was astonishing, some of the things. So this was enormous, and it was a very scientific trip.
John Elder: 01:41:42
They brought back samples of… They met grizzly bears, by the way, which on the east coast, no one had ever conceived of, the concept of a grizzly bear. And the grizzly bears were not stopped by rifles. The fact that they survived encounters with grizzly bears is amazing. That really shook them up. But anyway, I sometimes use that trip as an example of: they planned only so much, and after that, it was a whole lot of improvising. They had a lot of skills, they had a lot of courage, and the party actually was quite democratic and made decisions by consensus along the way.
John Elder: 01:42:27
And they had to build canoes and travel by rivers. They had to scale mountains. In 25 years of business, I had so many surprises along the way, so many positive and negative shocks to the system. And it’s like whatever three-year or five-year plans I had just got blown to bits, but by having good people and paying attention to the customers, and doing good work, we survived and thrived, and rode the rapids and went over waterfalls we didn’t know were coming. And felt like we had our own Lewis and Clark expedition.
John Elder: 01:43:10
We learned a lot along the way. We were well prepared. We were alert. And I laugh about planning, like, “Oh, what good would planning be?” Luckily I have a colleague who’s really good at planning. And he hears me tell this story and he kind of pushes me aside. And he says, “Lewis and Clark took with them this and that, and things to trade with the people they would encounter, and all this good equipment. They didn’t just set out one day.”
John Elder: 01:43:43
So, it was really a mix between the two. And I think that the common theme has been I love learning about new things, and the people that we get, there are some that are really good about learning about the greatest new thing. You heard me say, “I don’t know. Deep neural nets, that’s pretty cool, but I don’t know much about them.” But I’ve got some friends who do, and thank goodness, because you have to know about the new technology, but not everybody has to know everything.
John Elder: 01:44:12
But you have to have people that like doing that in their spare time. They actually like this. Like I said with that earlier story about, “Hey, do what your boss said, but you also look into it in your own time.” If you’re not that kind of person, you’re in the wrong field. Not that you spend all the extra hours all the time, but when you’re on something, you’re intrinsically interested in it and you want to solve that mystery. You like to solve puzzles. You like to solve mysteries, and you get a real high out of that discovery.
John Elder: 01:44:46
And also it’s an opportunity to serve people. This is such a fantastic field, because we go in there and, say, on anti-fraud work, we’re working with a government agency and we’re often tripling their productivity. They’re working in a system where things are siloed, and we’re bringing data in from all sorts of places. And we’re helping them, with the same amount of effort, triple the amount of fraud that they discover and prosecute. They’re getting awards.
John Elder: 01:45:19
But more than that, they’re getting fired up in their career again. They’re getting energized. They may have been there 20 years, but now they are fired up. They’re doing stuff they didn’t even think was possible. And when they see you coming, they’re excited to see you coming. And they’ve got new ideas that they want to try out. And so we’re continually having these very positive experiences. And so it’s a fantastic field to be in.
John Elder: 01:45:45
I don’t think about retirement. I think about maybe working fewer hours a day, but it’s just not really work. It’s fun. So, if you like it, you stay. You learn something that adds to the team. You don’t have to know everything, but if you do something that makes you valuable to the team, keep up with it, and pay attention to people and their needs, you’re going to thrive.
Kirill Eremenko: 01:46:21
Love it, love it. Thank you very much, John. Amazing words of wisdom and a great point to wrap this podcast up on. I want to thank you. It’s been a huge pleasure having you on the show. And before you go, please tell us the best way for people to find you and your work. We didn’t mention it, but, of course, if you’re interested in consulting work, John and Elder Research are there for you. And they’re always hiring talented people.
John Elder: 01:46:46
Yes.
Kirill Eremenko: 01:46:46
So, where can they find you and get in touch?
John Elder: 01:46:48
Yeah, so elderresearch.com is our website, and my email’s elder@elderresearch.com. So, lots of elders in there. We’re not elderly care people; we just weren’t very creative with the name when we started. But, yeah, we’d love to hear from you. And with our clients, our goal is to be trusted partners, to pay attention to the needs that the client has, and give our honest advice, and be there when they need us, and not be there when they don’t need us. And be the ones that they think of and trust to put their needs first.
Kirill Eremenko: 01:47:37
Amazing. Thank you. And it’s okay for people to connect with you on LinkedIn as well?
John Elder: 01:47:40
Absolutely. I’d be honored.
Kirill Eremenko: 01:47:42
Wonderful.
John Elder: 01:47:43
Thank you, Kirill.
Kirill Eremenko: 01:47:44
All right. John, it was a pleasure. Amazing. Thank you so much.
John Elder: 01:47:47
Thank you. Take care.
Kirill Eremenko: 01:47:54
Thank you everybody for being here and sharing this time with us. I hope you enjoyed this episode. So many cool insights and so many amazing jokes by John along the way. I had a fantastic, fabulous time talking with John. My favorite parts… there’s just so much to choose from, from campfire tales in data science to ensemble methods to neural networks and p-values.
Kirill Eremenko: 01:48:21
I’m going to probably pick the p-values and target shuffling, because I have often wondered why p-values are failing us. Why are they not good when they’re used in practice for research purposes? Why are so many research papers not reproducible? John explained it beautifully with his analogy of rolling a dice and changing your experiment at the same time. Finally, I was able to understand it. And for the whole target shuffling method, we’ll put a link to the animated video in the show notes.
Kirill Eremenko: 01:48:59
I’d highly recommend checking it out. I’ve watched it several times now. Very, very powerful method. And if you can incorporate that in your data science projects, and especially in your research, let’s make the world a better place. Let’s include a method like that. If you know another method, please use that, but otherwise try including target shuffling in your research and validate it that way rather than just with a p-value.
Kirill Eremenko: 01:49:23
So, there we go. That was our podcast with John Elder. As usual, you can get the show notes at www.superdatascience.com/391. That’s www.superdatascience.com/391. There you’ll get a link to the animated video, the URL to John’s LinkedIn. Make sure to connect. A link to elderresearch.com, which is quite easy to find as well. If you’re looking for a job, or for a super experienced consulting company to help you out with your projects, then check it out. That’s elderresearch.com.
Kirill Eremenko: 01:49:57
Plus you’ll get the transcript for this episode, and the video version of this episode is also available at that link. If you enjoyed this episode, if you got a lot out of it, please share it with somebody you know who is as excited about data science as you are and wants to learn. And/or if you know somebody in research, send them this episode. Again, let’s make the world a better place. Let’s move on from p-values and find and apply better methods to validate our research. It’s always very easy to share. Just send the link, www.superdatascience.com/391. On that note, I hope you enjoyed this episode and I look forward to seeing you back here next time. Until then, happy analyzing.