Welcome to episode #175 of the Super Data Science Podcast. Here we go!
In this episode, I talk to Gregory about his journey to data science, how KDnuggets started, why you should start honing your machine learning engineering skills at this very moment, what’s the future of data science, reinforcement learning, and a lot more.
About Gregory Piatetsky-Shapiro
Gregory Piatetsky-Shapiro is the President of KDnuggets, an online platform for AI, Big Data, Data Analytics, Data Mining and Data Science. He is also the co-founder of the Knowledge Discovery in Databases (KDD) conferences and SIGKDD.
We spent the first 15 minutes of the episode talking about Gregory’s musings and passions, which led him to establish KDnuggets, one of the best-known media platforms in the world of data science. Who knew that his fascination with science fiction and robots would ignite a fire to learn everything about computers? The story runs from writing a program to play battleships, to organizing his KDD workshops, to finally creating KDnuggets News.
If you want to find the right inspiration from data science experts like Gregory, then DataScienceGO 2018 is a can't miss! Join us in October and take your career forward…
KDnuggets has over 200,000 subscribers and followers, and its website has 500,000 visitors a month. It started as an e-newsletter sent out to about 50 attendees of the KDD-93 workshop. For 25 years, it has been providing some of the most sought-after knowledge about data science, analytics, and machine learning, among other topics.
When asked why he pursues it, Gregory said that he loves the process of exploring data: the analysis and the visualization. Curiosity is an essential trait for data scientists, along with finding the best way to analyze data and present it.
Gregory suggests that data scientists look into the Machine Learning Engineer profession. In a KDnuggets poll, Machine Learning Engineers had the highest job satisfaction, with Researcher/Professor and Data Scientist/Statistician coming in 2nd and 3rd respectively. Lately, he has also been seeing growing demand for their skills.
As data scientists, we try to predict the future. When Gregory was asked about the latest trends, he said that data science itself is becoming automated. There is also AI hype, which is growing even faster than AI capabilities, so we have to be wary of it. Gregory also opens up on the EU General Data Protection Regulation (GDPR) and how it affects the processing of EU citizens' data. GDPR could push companies out.
One of the highlights of this episode was Gregory discussing how reinforcement learning could be a key game changer in the progress of AI. Reinforcement learning concerns how software agents act in an environment to maximize a reward. It is a key component of AlphaGo Zero, the computer program that started with zero knowledge of the game Go and within a few days of self-play learned Go so well that it defeated its previous version, called AlphaGo, which used a lot of human knowledge to defeat the human Go world champion Lee Sedol.
Gregory can help you see where the trend is going. Should we step on the brake pedal, given how fast data science, machine learning, and AI are progressing? Or should we step fully on the accelerator?
In this episode you will learn:
- How did KDnuggets start? (02:12)
- How frequent are the blog posts on KDnuggets? (14:43)
- Machine Learning Engineer profession might take on the Data Scientist profession. (18:27)
- What is a recent win in Gregory’s career? (21:20)
- Curiosity is an essential trait for a data scientist. (30:00)
- Where is the future of data science and analytics going? (32:48)
- Data Science is becoming part of the ML and AI field. (33:30)
- GDPR could affect the growth of data science. (34:20)
- The skepticism on the concept of “citizen data scientist.” (35:46)
- Data Science could be fully automated in the very near future. (37:30)
- Data Science vs. Reinforcement Learning. (38:30)
Items mentioned in this podcast:
- Blog: Machine Learning Engineer, Researcher, Data Scientist have the highest job satisfaction by Gregory Piatetsky, KDnuggets
- Blog: Will GDPR Make Machine Learning Illegal? by Gregory Piatetsky, KDnuggets
- Blog: The Mirage of a Citizen Data Scientist by Gregory Piatetsky, KDnuggets
- Blog: Which Data Profession Has The Highest Job Satisfaction?, by Gregory Piatetsky, KDnuggets
- Blog: Top May Stories: KDnuggets Poll: Software for Analytics, Data Science, Machine Learning; How to Learn Machine Learning in 10 Days by Gregory Piatetsky, KDnuggets
- Blog: Cartoon: Citizen Data Scientist At Work by Gregory Piatetsky, KDnuggets
- Blog: Excerpt from The Journey of Knowledge Discovery by Gregory Piatetsky-Shapiro
- Blog: An Introduction to Monte Carlo Tree Search by Gregory Piatetsky-Shapiro
- Blog: Exclusive: Interview with Rich Sutton, the Father of Reinforcement Learning by Gregory Piatetsky, KDnuggets
- Blog: Data Science in 30 minutes, Artificial General Intelligence, and Answers to your Questions by Gregory Piatetsky, KDnuggets
- Blog: 10 Free Must-Read Books for Machine Learning and Data Science by Gregory Piatetsky, KDnuggets
- Blog: 10 More Free Must-Read Books for Machine Learning and Data Science by Gregory Piatetsky, KDnuggets
- Blog: Top 10 Essential Books for the Data Enthusiast by Gregory Piatetsky, KDnuggets
- Blog: Top 8 Free Must-Read Books on Deep Learning by Gregory Piatetsky, KDnuggets
- Journeys to Data Mining: Experiences from 15 Renowned Researchers, Mohamed Medhat Gaber (Editor)
- Deep Learning (Adaptive Computation and Machine Learning) by Goodfellow, Bengio, and Courville
- Machine Learning Yearning by Andrew Ng
- Confident Data Skills by Kirill Eremenko
- EU General Data Protection Regulation (GDPR)
Kirill Eremenko: This is episode number 175 with President and Editor at KDnuggets, Gregory Piatetsky-Shapiro.
Welcome to the Super Data Science Podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. Each week, we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today, and now let's make the complex simple.
Welcome back to the Super Data Science Podcast, ladies and gentlemen. Today, I've got a very exciting guest for you on the show, the legendary Gregory Piatetsky-Shapiro, who is the founder of KDnuggets, is joining us. I actually met Gregory quite a while ago. It was over a year ago in May 2017 at the ODSC Conference where we chatted and I invited him to the podcast, but it took this long for us to organize everything, and now he's finally come on the show. If you don't know who Gregory is, then this will just put things into perspective for you. KDnuggets is one of the most popular data science resources out there.
They curate news on data science, they provide their own articles, they conduct polls on data science, and many, many more exciting things in the space of data science. They've been around since 1997. Here's another perspective for you: Gregory has 256,000 followers on LinkedIn, so that should just tell you what kind of an influencer in the space of data science Gregory is, how much he's actually contributed to the community, and how many things he's given back to the space. Today, we are welcoming him on the show.
In today's podcast, what will we be talking about? Today, we're going to cover off quite a few topics. Of course, we'll go through the foundations of KDnuggets. A very exciting, very interesting story of how it all started, where Gregory began his journey into the space and what KDnuggets has grown into, but also we will cover off some of the more recent advances that have been happening in the space of data science that KDnuggets has been highlighting or has been participating in.
For instance, we'll talk about the whole concept of data scientist being the sexiest profession of the 21st century, what it has turned into now, and what role machine learning is playing in there. We'll also talk about what GDPR, the new General Data Protection Regulation, means for data scientists. It came into play in Europe earlier this year, and it's one of the first changes in decades in the European data protection regulations.
We'll talk about the concept of the citizen data scientist. We'll talk about reinforcement learning, and quite a lot of other very exciting things. As you can imagine, Gregory sees all these new updates in the space of data science on a daily basis. He is the editor for KDnuggets, so all these articles that you're seeing on KDnuggets actually go through him, and today he's sharing his best and most exciting insights with us.
All in all, a very exciting episode full of the most recent technological advancements and interesting stories on how this all came to be. Can't wait for you to check it out, so let's dive straight into it, and without further ado, I bring to you Gregory Piatetsky-Shapiro, founder and editor at KDnuggets.
Welcome, ladies and gentlemen to the Super Data Science Podcast. Today, I've got a very exciting guest, Gregory Piatetsky-Shapiro on the phone. Gregory, welcome to the show. How are you today?
Gregory P. S.: Thank you, Kirill. I'm excited to be here. It's a pleasure to be on your podcast.
Kirill Eremenko: It's so wonderful to have you. We met in May 2017, I think. Last year in May, and it's been over a year, and I've been wanting to get you on the show for a year now, and finally we're here. This is super, super exciting. Gregory, where are you located right now?
Gregory P. S.: I am in Boston, Massachusetts.
Kirill Eremenko: Is that-?
Gregory P. S.: Actually, I'm-
Kirill Eremenko: Yep.
Gregory P. S.: ... working at home, so we have beautiful sunny weather, and all of my cats, I think, are outside. As a data scientist in the daytime, I do have the cats, but hopefully they don't interfere in the middle of this conversation.
Kirill Eremenko: Fantastic. Yeah, I was just about to ask that. That is your home base, Boston. Is that correct?
Gregory P. S.: Yes.
Kirill Eremenko: Wonderful. It's so great to hear that you've got sunny weather in Boston today. Last time I was there, it was in May last year, it was surprisingly chilly. Yeah, so it's good to hear that the weather is nice today. All right, so let's dive straight into the podcast. Gregory, you are the Founder and Director or President and Editor of KDnuggets, a very popular data science media outlet and news aggregator and a platform that shares research about data science. You've been running this platform for 21 years now. Tell us a little bit about how it all started. Where did this idea come from?
Gregory P. S.: Yes, thank you. It probably started when I was a kid. I was very fascinated by science fiction, and I loved stories about robots, especially from Isaac Asimov and other writers like Stanislaw Lem and [inaudible 00:06:39] that was known in the West. I was always curious about the idea of AI, and this probably motivated me to learn computers when they first appeared. In my first year in college, when computers were still programmed with punch cards, I remember spending several weeks of my free time in the summer writing a program to play battleships, which was still a very advanced program for that period. And I used APL. That was a special language developed by IBM. It stands for A Programming Language, and it had special symbols for every different array operation.
You can think of it as like R but with Greek letters. After spending several weeks programming it, I played one game and I was very soundly defeated by my own program. I think as a result, I became much more interested in creating programs than playing them. I studied for my undergraduate degree in Mathematics, then I came to the United States to study computer science at NYU, and I got my PhD applying machine learning to databases. I think the idea was a self-organizing database system that automatically selects different indices and does something intelligent.
Then I worked at GTE as a researcher. GTE was a large telephone company in the United States. Now, it is part of Verizon, which is an even larger telecom company. I remember around 1986 or so, I attended a workshop, which was called Expert Database Systems. That was a very interesting name, but the concept was very fuzzy, and the workshop papers and talks were all over the place. I thought we could focus on something more clearly defined: analyzing databases and finding interesting patterns. In one of our projects on applying some intelligence to databases, I discovered that a particular query would run 10,000 times faster if we knew that there was a particular rule, a kind of functional constraint that always held. So there was a real application: can you find some useful rules in databases?
I was, at that time, young, energetic and naïve, and I thought that I could organize a better workshop. At that time, a popular term was data mining. It's interesting to note, just as an aside, how the terminology changes and reflects the time. It went from data fishing and data dredging, which were bad terms, and data mining became the next popular term. Then the popular term was data science, maybe until last year. Now, it's machine learning and artificial intelligence. In any case, I organized a workshop. I thought data mining was not sexy enough, so I came up with the name Knowledge Discovery in Databases, or KDD. That was the first workshop, back in 1989, which attracted, I think, about 70 people, including several leading researchers.
Kirill Eremenko: Wow.
Gregory P. S.: Then I organized a couple more workshops, and later in 1994, one of my best ideas was to stop doing it myself and to recruit Usama Fayyad, who was then just a fresh PhD from Ann Arbor. His advisor, Ramasamy Uthurusamy, was then a researcher at General Motors, and they agreed to run the '94 workshop. The next year, that workshop turned into a conference, and later, with the help of Won Kim, who was the chair of SIGMOD and very experienced with ACM, the Association for Computing Machinery, which is a leading professional organization, we created a special interest group, SIGKDD, that has been running the KDD conference ever since.
I think we've had about 22 KDD conferences since then. I'm very pleased to say that KDD remains the leading research conference in the field based on citations and other indices. Now, I can stand back after many years of organizing [inaudible 00:11:50] and, like a grandparent, enjoy the baby doing really well. That was kind of one track of my activity.
How did I get to where I am? After the third KDD workshop, I decided to send a newsletter to people who attended the workshop, and I called it Knowledge Discovery Nuggets. The first issue, which is still online, went to, I think, about 50 people who attended that workshop. Now, it's almost 25 years, actually 25 years [inaudible 00:12:32], so KDnuggets has about 200,000 subscribers and followers across email, Facebook, and LinkedIn, and our website gets about 500,000 visitors a month.
Kirill Eremenko: Wow, congratulations. That's huge.
Gregory P. S.: Big goals. Thank you. But we're focusing on analytics, data science, and machine learning. If I try to talk to my people you realize that as a data scientist at heart, I just tried to select a few interesting things to write about or select things on the web that we can publish. I guess that was a second track in my career.
In parallel, organizing conferences and publishing a newsletter was not a full-time activity. All the conference organizing that I've done was always as a volunteer [inaudible 00:13:43]; I never received any payment for it, but it was probably one of the more rewarding things that I've done, because I enjoyed doing it with interesting people and helping to put good things together. Another interesting thing that I've been doing in terms of research and data mining involved consulting and being involved in the world of startups. In 1997, which was still very much part of the [Dot-com Rush 00:14:23], I left the GTE research lab and I joined a startup that was doing analytics and data mining consulting for the financial industry, mainly banks and insurance companies.
We worked with the largest names like Credit Suisse, Chase Manhattan, Citibank. I was a chief scientist, and managed a small team of perhaps about 10 people. Then around 2000, our smaller startup was bought by a big startup. For a very short period of time, the value of the big startup exceeded $1 billion.
Kirill Eremenko: Wow.
Gregory P. S.: It became the wanted unicorn, but before anyone, including me, could do anything foolish with the stock options, the larger startup's stock crashed almost all the way down to zero. I left it in 2001, I think maybe a couple of months before that stock went all the way to zero. I have been self-employed since about 2001, mainly publishing KDnuggets and doing consulting in data mining.
I think one interesting question for all the younger people listening is synergy. In my case, I've done these three parallel and mutually supporting activities: as a researcher and consultant in data mining, as a founder and chair of KDD conferences, and as publishing editor of KDnuggets news and website. Each one of those activities was in some way helping the others. I know, Kirill, that you're also teaching courses and you have a very nice book, Confident Data Skills.
Kirill Eremenko: Thank you, yes.
Gregory P. S.: And probably doing other things. I guess a helpful suggestion for young people who are trying to do interesting things is to think: is there a synergy between this activity and some other [inaudible 00:16:42]? If there is not, then maybe it's not the best thing to do. Where there is synergy, it generally helps you to succeed. Just to finish on this: in the last few years, I think, maybe riding the big data and data science wave, KDnuggets became so popular that I stopped consulting. Now, I only publish KDnuggets, and we have another excellent full-time editor [inaudible 00:17:17] based in Canada. We have several interns based in London and other places. KDnuggets is global in its reach.
Kirill Eremenko: Gotcha. Wow, that's such an interesting career, and I love that you mentioned that wonderful takeaway for your career [inaudible 00:17:41] about synergy. I can totally agree that when you're working on A and B, you should be aiming to make sure that A plus B is more than just A plus B. It's A plus B plus an extra value. So it's not one plus one equals two. If you truly have a synergy in the things that you're working on, one plus one equals three or four or five, because they complement each other, and they help your audience, and they help you propel your career forward. That's a very interesting takeaway, and looking back, I can see I've probably done that in my own career, but that was always unconscious. That was just like a gut feel, but if you think about it consciously, I think you can make much faster progress in the things that you're doing and how you're going forward.
Thank you, and it's really exciting to hear that KDnuggets has got so many followers, 200,000 subscribers and 500,000 visitors per month. Those are truly astonishing numbers. You mentioned that you select those blog posts. How many blog posts do you publish on KDnuggets? How frequently do they come out?
Gregory P. S.: Well, we publish every weekday, and we try to select maybe two or three interesting blog posts a day. Now, we get a lot of submissions. Occasionally, myself and Matthew [May 00:19:14] we also write our own editorial pieces, and if we see some interesting blog posts around the web, then we'll also ask the authors for reposting those as guest blogs on KDnuggets, but there's so much stuff on the web that we try to select only a small number, maybe two or three per day.
Kirill Eremenko: That's quite a lot as well. Already, that makes it 10 or 15 or more per week. How do you find the time to go through all of them? You probably get a ton of submissions sent to you. How many submissions do you get, just out of curiosity?
Gregory P. S.: Well, it's hard to say, but I think we probably get something like three to five submissions per day, not a very large number, because we have clear guidelines, and we also focus on more technical content. Our audience is mainly data scientists and machine learning engineers, so we'll not publish something like why your business should use data science. I assume our readers already know, but we would publish something that explains how to create a pipeline in Python or some ideas on how to use Python [inaudible 00:20:38], or maybe some interesting polls that I run every month or so. There are some interesting observations, like from our recent poll, our most popular annual poll, on what software you use.
I've been running this poll, actually, since 2001, amazingly.
Kirill Eremenko: Wow.
Gregory P. S.: Yeah, this is the 19th such poll. Now, the latest poll is out to show that there is kind of a clear ecosystem emerging around Python, Spark, Anaconda and TensorFlow. Now, it's becoming this integral part of data science tool box. Python seems to have more significant [inaudible 00:21:31] ahead of R. There're a lot more tools that use Python than R. There are some other interesting observations that your readers can see on KDnuggets.
Kirill Eremenko: Wonderful. Is it just like on the main page of the blog or is there a specific page for all these insights? 'Cause I-
Gregory P. S.: Well, on the main menu, we have a section called top stories, and if you scroll there, then you will find more interesting things.
Kirill Eremenko: That's so cool.
Gregory P. S.: Yeah, being data scientists, we always analyze the results, so we always like to see what's more popular, and we publish separate posts with just the top stories.
Kirill Eremenko: Gotcha. Wow, this is really cool. I'm on the page right now, and I highly recommend for people to check it out. It's KDnuggets.com, and then you can, at the top, find top stories and look through those. All right, well, that's really interesting, very powerful insight. Actually, before today's podcast, I was reading your most recent blog about why data science is no longer the sexiest profession of the 21st century, even though it's still satisfactory, there's a new profession that is the sexiest. Do you mind sharing a little bit on that with us?
Gregory P. S.: Sure. We recently did a poll of our readers, and I think we asked them basically, "What's your title and how satisfied are you?" [inaudible 00:23:05] from very satisfied, which we converted to +2, to very unsatisfied, which we converted to -2. Surprisingly, the profession with the highest job satisfaction was machine learning engineer. As a researcher, I have to say that the average satisfaction was like 0.7, and the standard deviation was around 1.0, so it's not like all the machine learning engineers were highly satisfied. There were still a lot of unsatisfied ones, but on average, I think there was a significant difference between the job satisfaction for this profession, machine learning engineer, and the second and third place, which were researcher and data scientist.
Data scientist is still the most common job title. I see that on the web and [inaudible 00:24:10] and on job [inaudible 00:24:12] to get on KDnuggets, but there is more demand coming for people with machine learning engineer skills. I guess the difference I would describe is that a machine learning engineer is building machine learning systems, probably now using deep learning, while data scientists perhaps do more work on analyzing and trying to understand what is happening with companies, not necessarily building production systems.
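To make the poll scoring above concrete, here is a minimal Python sketch. Only the two endpoints come from the conversation (very satisfied = +2, very unsatisfied = -2); the intermediate labels, the `summarize` helper, and the sample answers are assumptions for illustration, not KDnuggets data.

```python
from statistics import mean, pstdev

# Endpoints (+2 and -2) are from the episode; the middle labels are assumed.
SCORE = {
    "very satisfied": 2,
    "satisfied": 1,
    "neutral": 0,
    "unsatisfied": -1,
    "very unsatisfied": -2,
}


def summarize(responses):
    """Return (mean, population standard deviation) of the numeric scores."""
    scores = [SCORE[r] for r in responses]
    return mean(scores), pstdev(scores)


# Hypothetical answers for one job title, not real poll results.
sample = ["very satisfied", "satisfied", "neutral", "very unsatisfied"]
avg, spread = summarize(sample)
print(f"mean={avg:.2f}, std={spread:.2f}")
```

On a five-point scale like this, a mean of 0.7 with a standard deviation near 1.0 is consistent with a sizable share of answers at or below zero, which is the caveat Gregory raises about the top-ranked title.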
Kirill Eremenko: Gotcha. Very interesting. That's a little hint, I guess, to our listeners. If you're looking for the new job that's coming to take on the data scientist, it might be machine learning engineer. Very interesting. Thank you for that. All right, so I wanted to ask you a couple of questions. You've obviously had a very diverse and interesting career, filled with lots of different roles, different engagements, and different things that you've worked on. I just wanted to find out some of the highlights. What is a recent win that you can share with us? Something that you've had [inaudible 00:25:43]?
Gregory P. S.: I will mention maybe a couple of interesting things. Maybe they're not as recent, but they're still very instructive. I think one of the most interesting projects that I worked on when I was still at GTE Laboratories was called Key Findings Reporter, which we called KEFIR. It was a system for analysis and summarization of key changes in large databases, and we applied it to healthcare data. Healthcare in the United States is a scandal and also very, very expensive. I think we spend here twice as much as other industrialized countries, and we get no better results, and trying to understand where all that money goes is an essential part of the equation.
Our system automatically analyzed changes in [inaudible 00:26:47] variables and selected the important ones, and it was combined with a small expert system to add recommendations on what to do about the changes. For example, if you have a particular type of medical problem, then the expert system will recommend how to solve it. It presented visualizations and it looked at changes in trends. One good way to identify what is more important is always to look at changes. For example, if you just look at the associations, you can find a huge number of significant associations in data. How do you filter the important ones? You look at ones that change over time: what is true this period and was not true in the previous period.
It was all combined in one very nice system, and it was applied to all our GTE healthcare data, and it identified some significant potential savings. We did win the highest technical award from GTE. Unfortunately, I guess I would still regard it as a failure, because the system was not deployed.
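The idea of filtering findings by how much they change between periods can be sketched in a few lines of Python. This is only an illustration of the general approach, not KEFIR's actual method; the function name `rank_changes`, the 10% threshold, and the measure names and values are all hypothetical.

```python
def rank_changes(previous, current, min_relative_change=0.10):
    """Rank measures by relative change between two periods.

    previous, current: dicts mapping a measure name to its value per period.
    Measures whose absolute relative change is below the threshold are dropped.
    """
    findings = []
    for measure, old in previous.items():
        new = current.get(measure)
        if new is None or old == 0:
            continue  # cannot compute a relative change for this measure
        relative = (new - old) / old
        if abs(relative) >= min_relative_change:
            findings.append((measure, relative))
    # Largest changes first, regardless of direction.
    findings.sort(key=lambda item: abs(item[1]), reverse=True)
    return findings


# Hypothetical per-period costs, for illustration only.
last_year = {"inpatient": 100.0, "outpatient": 200.0, "pharmacy": 50.0}
this_year = {"inpatient": 105.0, "outpatient": 300.0, "pharmacy": 40.0}
for measure, relative in rank_changes(last_year, this_year):
    print(f"{measure}: {relative:+.0%}")
```

Ranking by change rather than by raw association strength keeps the report short: a measure that is large but stable drops out, while one that moved sharply since the previous period rises to the top.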
Kirill Eremenko: Why is that?
Gregory P. S.: Probably that's connected to another question we discussed: what's the most difficult thing to do? I think the most difficult challenge in my work as a data scientist was getting the results deployed, because that requires change in organizational culture and support from the top. In the case of KEFIR, the system was working technically, but there was no place for it in the organization. It was not clear who would use it, or how it would affect the work of the people who were analyzing healthcare data. That is probably the fate of many data science projects. You can easily build a great prototype, but unless there is a clear path to deployment and support from the organization, it is still a failure.
Kirill Eremenko: I see.
Gregory P. S.: That's, I guess, another interesting story I can tell. I worked on many different projects. Probably the ones I enjoyed the most were on bioinformatics data. I had one project where we worked with mass spectrometry data, trying to develop early indicators of Alzheimer's. The problem with analyzing biological data is that you have a huge number of variables. You could have 20,000 different compounds, but you don't have a large number of patients. Typically you could get maybe several hundred patients. Imagine you have a hundred or so patients and thousands of variables: it creates very significant problems in determining what's significant and what is just random noise. In that particular case, we did discover very strong biomarkers that were 100% accurate. There were, I think, about a dozen of them.
One of them actually had biological significance because it was like vitamin C, so our initial results suggested that people who had more vitamin C were less likely to get Alzheimer's. Even so, my intuition as a scientist told me, "Beware of perfect results." This was [inaudible 00:30:46] it was 100% correct, so it doesn't matter how you split the data: if it's 100% correct, it will still be 100% correct. Myself and my friends, we all started to drink more orange juice and take vitamin C, but we were still skeptical about the results. The only way to test them was to get another population. We did that, and we found that the original data was probably contaminated in some form. I guess don't trust the results if they're too good. That could be a useful lesson.
Probably the most success that I had in my career of data mining [inaudible 00:31:41] was when we had to help organizations make some strategic decisions. We would examine whether they should use this particular strategy or that one. Some of that work was deployed, but as a consultant, I unfortunately cannot tell you the details. I know that there was a pay-off of, I think, seven digits based on our results. Those results were easy to deploy because it was a choice between decision A and decision B, which did not require change in the entire organization structure.
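A back-of-the-envelope check shows why perfect markers deserve suspicion. The Python sketch below assumes each feature is continuous noise with no ties and no relation to the diagnosis, and asks how many such features would split cases from controls perfectly with a single threshold. The function names and the patient counts are illustrative assumptions, not figures from the study Gregory describes.

```python
import random
from math import comb


def expected_perfect_separators(n_patients, n_cases, n_features):
    """Expected number of pure-noise features that separate cases perfectly.

    Requires 0 < n_cases < n_patients. A noise feature separates perfectly
    only when the cases hold the top n_cases ranks or the bottom n_cases
    ranks, which is 2 of the comb(n_patients, n_cases) possible rank sets.
    """
    return n_features * 2 / comb(n_patients, n_cases)


def simulate_perfect_separators(n_patients, n_cases, n_features, seed=0):
    """Count perfectly separating noise features in one simulated dataset."""
    rng = random.Random(seed)
    labels = [1] * n_cases + [0] * (n_patients - n_cases)
    hits = 0
    for _ in range(n_features):
        values = [rng.random() for _ in range(n_patients)]
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        if min(cases) > max(controls) or max(cases) < min(controls):
            hits += 1
    return hits


if __name__ == "__main__":
    # Illustrative sizes: 20,000 noise features, half the patients are cases.
    for n in (10, 20, 100):
        print(n, expected_perfect_separators(n, n // 2, 20_000))
    print(simulate_perfect_separators(10, 5, 2_000))
```

Under these assumptions, chance alone yields many perfect markers with ten patients, but essentially none with a hundred. So when a dozen markers are 100% accurate on a larger cohort, luck is an unlikely explanation, and a problem in how the data was collected or processed becomes the more plausible one, which is why testing on a second population matters.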
Kirill Eremenko: Gotcha. Thank you very much. That's interesting. We just talked about the wins and the challenges, and I appreciate you sharing your experience. It's sometimes difficult to share experiences, especially if it's a project like the one you worked on for the Key Findings Reporter, where you work on it for a long time and you're really proud of the results but it's not deployed. It is a great example for our listeners, especially for those starting out, of some challenges that they might come across. In this case the takeaway is that even if your project is great and you see that it's got a lot of value, it might not be deployed in the end. Of course, you should do as much as you can to avoid that situation, but if it does happen, then don't let it bring you down. It sometimes happens, even to the best people in the industry.
Also, the other example is also great where the results are too good. Even in data science, sometimes intuition plays an important role. Like you said, when the results for that vitamin C example were too good, your intuition was saying that don't trust the results [inaudible 00:33:50] I think it's also a good thing to look out for if your results are too good to be true, then find another place to check them, verify them, and make sure that the test or the example is repeatable.
All right, so we talked about the wins and we talked about the challenges. How about your one most favorite thing about being a data scientist? What's the one thing that's kept you going through this career for more than 20 years?
Gregory P. S.: Well, I really enjoy the process of exploratory data analysis and visualization. Analyzing the data, running the algorithms: what does the data reveal? It's like discovery of new and unknown realms. I think curiosity is an essential trait for a good data scientist. Along with discovering something, I try to see what's the best way to visualize and present it. For example, if I'm looking at data for recent KDnuggets posts, there are many ways to organize it, and I think about what is a good story that the data tells and what is a good image that is worth 1,000 words in a story. Generally, I think probably the most-read thing, and I think I read a study somewhere that confirmed it, is the captions on images.
If a picture is worth 1,000 words, then a good caption on that image may be worth 10,000 words. Think of how to present the data, present the story and visualize it and describe the image that you just presented.
Kirill Eremenko: Thank you. It's definitely one of my favorite parts of data science as well. Well, Gregory, I know that you will need to go very soon, so I really wanna jump to the part where I'm very curious. As you said, an important part of being a data scientist is curiosity. I'm very curious to get your answer to the following question. It's a philosophical question, one I ask very often on the podcast, almost every time. I always get different answers. Different people have different perspectives. The reason why I'm so curious to get your perspective on this is because of the amount of experience you have in the field, your worldview and how it's developed over time. On top of that, you just interact with so many people, over hundreds of thousands of followers. You influence them, you reply to their comments on KDnuggets, you get these emails, you have aggregated so much, such a wealth of information in the space.
Here it goes. From all this experience, from everything you've seen in the field of data science, where do you think the field of data science and analytics is going, and what should our listeners prepare for to be ready for the future that's coming?
Gregory P. S.: Thank you, Kirill. I think that's a great question. I guess as data scientists, we should always try to predict the future, and as a data scientist with a lot of experience, I can say that we're not very good at predicting human trends, but I'll try nevertheless.
Kirill Eremenko: All right.
Gregory P. S.: What I see now is that data science is becoming part of a larger machine learning and AI field, which is really progressing very fast. Capabilities, especially in deep learning, are growing at an amazing rate. Every day we see some really amazing stuff, like this recent Google Duplex demonstration, where they had completely human-quality calls with unsuspecting humans. But I think AI hype is growing even faster than AI capabilities, so beware of the hype. I guess that could be one warning.
Another interesting trend that I'm watching is what's being called the citizen data scientist. I think this term was introduced by Gartner a couple of years ago, and the idea was that tools also become so good that any citizen can use them and do data science. I have been very skeptical of the citizen data scientist. I think, do you want a citizen dentist to work on your teeth or a citizen pilot to fly your airplane? Probably not. I think data science can either be fully automated, and this was a direction taken by companies like DataRobot, H2O and others that offer kind of fully automated solutions, or you can have positions that require training and expertise in data science. Kind of having people with no training who use tools that are semi-automated, I think it's very dangerous, because you can easily draw wrong conclusions. Just think of my example with vitamin C and Alzheimer's, where citizen data scientists would say the results were correct but would lack the training and intuition to warn them when they're going in a wrong direction.
Now, I think there's a golden age for data science. There are amazing tools that allow one person to do what hundreds of people could not do 10 years ago, but data science, like most data-driven activities with some relatively clear rules and goals, is also becoming automated. We had a poll recently on KDnuggets that asked readers when data science will be automated, and the median answer was 2025. For our data science listeners: enjoy this great period, but beware of the coming automation. In terms of future trends, of course, [WebID 00:42:30] has heard many times about deep learning. Another important technology that I think is now coming to the forefront is reinforcement learning, and especially deep reinforcement learning.
Data science really involves learning from data that has already been recorded, kind of learning from the past, whereas reinforcement learning is applied to agents that are active in their world, do experiments, and can learn from their experiments. This was the key to successes like AlphaGo, which defeated the world champion in Go by essentially learning by playing against itself. If I can make one more interesting observation about the future: AlphaGo was developed initially by learning from human masters or experts in Go, and later, people at DeepMind developed a more general version which they called AlphaZero, Zero to indicate that it started with zero human knowledge, essentially just playing with itself using reinforcement learning and deep learning. It achieved, in about four hours, a superhuman level in chess. That was very disappointing for me as a former chess player.
It took it, I think, three days to achieve that superhuman level in Go. This AlphaZero version played the strongest chess player in the world. It's no longer human. I think the strongest chess player in the world is now a computer. I think they had a program called Stockfish, which was programmed old-fashioned style with [inaudible 00:44:38] human opponents and [inaudible 00:44:40] millions of positions [inaudible 00:44:43], and when AlphaZero played Stockfish, it defeated it something like 10 to 0.
I looked at some of the games, and it made completely inhuman moves. I don't know Go but I do know chess, so I could appreciate how amazing those moves were. If humans made them, we'd call them amazing examples of human intuition and creativity, but I think somebody described it as, "It would be like aliens landed on earth and they learned to play chess." I guess, kind of looking forward, this gives us a sneak preview into artificial general intelligence. I've got no idea when it will be achieved, but people who interact with it will probably be hard pressed to understand why it does what it does. That is the experience of chess masters looking at how the superhuman AlphaZero works.
It has a completely different intuition, and people who understand Go report similar things, that it plays in a completely different way that humans have never even thought about. Not always, and there are still moves that humans can understand, but occasionally it makes a completely superhuman move. That's kind of a preview, a small window into artificial general intelligence.
Kirill Eremenko: Well, fantastic. Thank you very much. I noticed that you have a blog post about this as well, which is very exciting, so if there are any chess players listening or even if you're just interested in artificial intelligence, Gregory has got a blog post about data science in 30 minutes, artificial general intelligence and answers to your questions, so you can read more about this, and I'm definitely curious about this. I'm gonna jump onto this and check it out, 'cause I'm also a chess player myself. It's a very good lens to put it in. I've heard about the developments of Google DeepMind in the game of Go and AlphaGo Zero, and how it was able to win with a huge advantage.
In the same way, I don't play Go. I'm not a Go player so it's quite hard to relate, but with this chess situation, I definitely would like to know a bit more about that inhuman move with the knight and things like that. I'll have a look at that. Thank you so much for sharing. Yeah, it's definitely an interesting area, and of course, I'd also like to just recap the things that you mentioned about the trends. I knew this was a good question. Gregory, you were a great person to answer the question, and you did give us so many tips. So, ladies and gentlemen listening to this podcast, here are some takeaways from Gregory's answer to our question of what to prepare for in the future. AI capabilities are growing, and machine learning as well, but beware of the AI hype.
GDPR, so look at that, the European data protection regulation which went into action May 25th this year. Does it make machine learning illegal or not? There's a blog post on KDnuggets about that as well. Citizen data scientists, that's a concept that was introduced by Gartner, but is it really a good thing, or is it something that sounds good but might actually cause more problems if people don't really know what they're doing? How is that related to automation of data science, things that companies like DataRobot and H2O are looking into?
Then the fourth thing was data science and automation. You're moving on from that. You had a poll that asked your readers and the median answer was 2025. That's when data science will be fully automated, so something to look into as well, and keep following the trends on KDnuggets to see how that changes, and if it does, and finally a new addition into this whole mix of AI, deep learning, machine learning and data science is reinforcement learning. It's picking up more and more these days, so another important technology to look out for in the future.
Gregory, all I can say is a huge thank you. I know we've gone a bit over the available time you had. Before you go, could you please let our listeners know how they can contact you, find you, follow you, get in touch, or just learn all these amazing things that you're sharing with the world?
Gregory P. S.: Well, thank you, Kirill. Well, our listeners can find our website, KDnuggets. They can contact me by email, editor1, the "editor" followed by digit 1, [email protected], or tweet to @KDnuggets, and they can also like our Facebook, KDnuggets, or join our KDnuggets LinkedIn group. We welcome readers' comments, submissions or blogs. We always look for good technical submissions. As I mentioned, we publish two, three blogs per day, although currently I have to say we've already scheduled all the blogs until July 2nd, but good blogs will certainly get published.
Kirill Eremenko: Gotcha.
Gregory P. S.: Kirill, thank you very much. I enjoyed the discussion, and hope to see you again at another conference somewhere.
Kirill Eremenko: Thank you. Thank you very much, Gregory. Very lovely having you on the show, and I do also hope we'll catch up soon.
There you have it. That was Gregory Piatetsky-Shapiro, and all of his amazing and exciting and insightful stories from the years of experience in data science and all the people he's interacted with, all of the articles and news that he's aggregated through KDnuggets and all of the amazing events that he's been through.
I'll be interested to find out what your favorite part of today's podcast was. For me personally, it was the example that Gregory gave about KEFIR, that situation where a technically excellent system was developed but it wasn't used because it didn't have a place in the organization. A very telling example and something that can happen to anybody. It can happen on any project, so it's always important to understand, I guess, what you're working towards, and to learn from experiences such as this one. Even when they're not your own, you can still learn from them and understand that situations like that can happen, and how you can try to avoid them in your own career. Of course, among other things, there were a lot of very valuable insights that Gregory shared with us.
On that note, we're gonna wrap up. I highly encourage you to check out KDnuggets and follow that website, and follow the news that they're sharing. Get onto their email list, so you get all the updates, all the very important and most recent updates in data science. Of course, follow Gregory himself and connect with him if you're not following him already on LinkedIn. I'm sure he'll be happy to get in touch and stay in touch. Of course, you can find all of the show notes for today's episode at www.superdatascience.com/175. We'll also include a ton of links that we mentioned on the show, so head on over to superdatascience.com/175 and check them all out, look up those articles, look at those polls, and see where the world of data science is going. I can't wait to see you back here next time. Until then, happy analyzing.