SDS 342: History of Data Science – Part 2

Podcast Guest: Kirill Eremenko

February 21, 2020

Welcome back to the FiveMinuteFriday episode of the SuperDataScience Podcast!

Today we’re continuing our series on the history of data science, looking at data science at the start of the 21st century.
Data science probably did not live up to the expectations of science fiction. I recently watched clips from 2001: A Space Odyssey, and the expectation was that, by 2001, we'd have fully sentient robots and artificial intelligence. Now, in 2020, we're starting to get closer.
But starting in 2001, influential people began talking about data science becoming more of a science. That period saw the first call to merge data analysts and computer scientists, as well as the argument for algorithmic models over statistical data models. It's fascinating to see where that started, since modern data scientists mostly rely on algorithmic models. Two data science journals were launched as well.
The first data science research center in China was established in Shanghai in 2007. The big boom began in 2008, and DJ Patil, who later served as Chief Data Scientist of the United States from 2015 to 2017, was one of the first to recognize it. After that, multiple articles came out studying this career and what exactly it can and should be. What we truly see at the end of the 2000s is growing interest in data science and a lot of hype building around this emerging role. The 2000s became the golden age of data science, and we'll explore more about what that means in the next episode.
ITEMS MENTIONED IN THIS PODCAST:
DID YOU ENJOY THE PODCAST?
  • How does it feel to look back and see data science practices that are common today being debated for the first time in the early 2000s?
  • Music Credit: Light by Krakn [NCS Release]

Podcast Transcript

This is FiveMinuteFriday, the History of Data Science, episode number 2 out of 5.

Welcome back to the SuperDataScience podcast, ladies and gentlemen and everybody listening in. Super excited to have you back here on the show.
Today, we're continuing our excursion into the history of data science. Last time, we looked at the origins of data science from 1950 to 1999. If you missed that episode, check it out in this podcast's episode archive, because today we're moving forward. In this episode, we'll look at how data science continued to develop at the start of the 21st century, between 2001 and 2010.
The first thing that we should point out is that data science probably didn't live up to the expectations of science fiction. If you've seen 2001: A Space Odyssey, there's this AI, HAL 9000. I personally haven't seen the movie yet, but I've learned a bit about it on YouTube and now I'm very curious to see it.
For instance, if you haven't seen it either, you can check out this short video, which has over 2 million views. The video is called I'm Sorry, Dave. I'm Afraid I Can't Do That. That's HAL 9000, the AI in the movie, refusing to open the pod bay doors. So, yeah, very interesting.
The expectation was that by 2001 we would have artificial intelligence as sentient as humans. Of course, we didn't have that then and we still don't. However, I recently watched an interesting interview, which we'll link to in the show notes, between Tony Robbins and Sophia. You know Sophia, the robot that was granted citizenship in Saudi Arabia a few years back? Tony Robbins interviewed her recently, and she gave very good responses. It really feels like the robot is having a conversation. So, if you haven't seen that either, check it out. It's pretty cool.
Anyway, at the start of the 2000s, we didn’t have AI, still don’t. But, instead, what we saw were some influential people talking about data science becoming more of a science.
The first one worth mentioning is an article by William Cleveland, a professor of statistics and computer science at Purdue University known for his work in data visualization. He called for a new type of profession, a merger between data analysts and computer scientists: taking computer science beyond programming and developing software, and bringing it into analytics.
The second was a set of arguments from Leo Breiman. You may not have heard the name, but he was a statistician at Berkeley who bridged the gap between statistics and computer science.
You're probably going to be shocked to hear this, but he invented the random forest algorithm. There's a high chance that you've used, or will very soon use, random forests in your work. And that's just one of the things Leo Breiman did for machine learning. Random forest is something data scientists use on a daily basis; there are probably thousands, if not tens of thousands, of random forests being built every single day, and he was the person who invented it.
Leo Breiman argued that we need to start employing algorithmic models, which we use in machine learning now, rather than the stochastic data models that predominate in statistics. Back then, these were arguments about where people predicted the field would go. Now, it's impossible to imagine a modern data scientist working any other way: we mostly use algorithmic models to model our data.
Also in the early 2000s, two journals were launched, the Data Science Journal and the Journal of Data Science, which are still going strong. You can find them online.
A few years later, in 2005, the National Science Board published a definition of data scientists as those whose primary activity is to conduct creative inquiry and analysis. So it's not just crunching numbers; it's creative inquiry and analysis. That definition recognized the importance and variety of the work that we still do as data scientists today.
The first data science research center in China was established in Shanghai in 2007. Researchers from there argued in 2009 that data science is a completely new science since, unlike natural sciences and social sciences, its research object is data in cyberspace.
So that was all very exciting. However, the big boom truly began in 2008. One of the first people to recognize it was DJ Patil, who was the chief data scientist of the US from 2015 to 2017. How cool is it to be the chief data scientist of a country? Not just any country, the United States. That's a huge responsibility, not only to deliver the work but also to represent data scientists worldwide.
DJ Patil is also well known for the Harvard Business Review article he later wrote with Tom Davenport, titled Data Scientist: The Sexiest Job of the 21st Century, published in 2012. If you've heard of it but haven't read it yet, I highly recommend checking it out on Harvard Business Review.
In 2009, we saw the publication of numerous articles seeking to define what exactly this field was and how to develop it further. For instance, Harnessing the Power of Digital Data for Science and Society argued that data scientists are key to the current and future success of the scientific enterprise, but also that they often receive little recognition for their contributions.
Google's chief economist, Hal Varian, noted that with the increasing availability of free data, the bottleneck is the ability to understand that data and gain insight from it.
And astrophysicist Kirk Borne argued that "training the next generation in the fine art of deriving intelligent understanding from data is needed for the success of sciences, communities, projects, agencies, businesses and economies." He is now the principal data scientist at Booz Allen Hamilton and one of the leading evangelists in the data science space, educating lots of people, including his many followers on LinkedIn. He's a very interesting person to check out.
So you can see how, towards the end of the 2000s and heading into the 2010s, things really started heating up. Articles about data science were flying around, the field stepped into the limelight and became very popular, and a hype even started to develop around it. But as we know from the previous episode, it's not one of those hypes that came out of nowhere.
There were decades of work leading up to this, and this is when it all started coming out into the world, with many people worldwide hearing about it. We could keep going and quoting others who presented data science in an increasingly glowing light, but they all said essentially the same thing. Welcome to the 2000s: the golden age of data science has begun.
That's where we'll leave it today. It was a very exciting time. In the next episode, we'll discuss the following stage in the history of data science. I look forward to seeing you there. And until next time, happy analyzing.