SDS 119: Data Science Trends for 2018

Podcast Guest: Kirill Eremenko and Hadelin de Ponteves

January 5, 2018

Welcome to episode #119 of the Super Data Science Podcast. Here we go!

Happy new year and welcome to 2018!
After the previous reflective episode, today we are going to look into the future. Today, Hadelin de Ponteves and I discuss the key data science trends to look out for in 2018, including which ones we are most excited about and why. You will hear us draw from our own experiences and learnings across a broad range of topics, and perhaps even hints of what you can expect from some of our upcoming courses.
Excited? Tune in now to hear our insights!
In this episode you will learn:
  • Blockchain and AI: A Powerfully Disruptive Combination (10:03)
  • Cyber Security and its Relevance to Data Science (19:10)
  • Deep Learning Technology is Becoming Mainstream (27:43)
  • Persistent Growth of the Market for Big Data Systems (33:05)
  • Digital Twins and Their Uses (41:13)
  • Augmented Reality (43:58)
  • Self-Serve Analytics (47:07)
  • Data Science Teams and the Motivations Behind this Trend (52:10)
  • Upskilling Executives in Data and the Importance of Data Strategy (55:24)

Podcast Transcript

Kirill: This is episode number 119: Data Science Trends for 2018.

(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
(background music plays)
Happy new year everybody, and welcome to the very first episode of the SuperDataScience podcast for 2018. Very, very excited to have you on board. Super pumped about the beginning of a new year, a new adventure. And we’re kicking off the year strong and on an exciting note. So Hadelin and I got together to record a webinar a week or so ago, which some of you actually attended. We were discussing trends. The trends that are coming up in 2018, what to expect, what to look into in the space of data science.
So on this webinar, we discussed topics such as AI, blockchain, data security, self-service analytics, digital twins, and many, many more. So those are just some of the highlights of what we talked about. Prepare yourself for this exciting adventure, and also note that this webinar is available in video version. You can find the video at www.superdatascience.com/119. And if you have the opportunity, then it’s probably best to watch it that way. But also you will get exactly the same insights from listening to the podcast.
And on that note, without further ado, I bring to you the all-time favourite, the incredible Hadelin de Ponteves and the trends of 2018.
(background music plays)
So today we’re talking about trends, data science trends. It’s sometimes a bit complex to separate tech trends from data science trends because they come hand in hand. You can’t really imagine, a lot of times, technology without the power of data science behind it, and also data science usually is used to empower certain applications that have some technological aspect to them. We can highlight the ones that are more data science specific.
Hadelin: That’s right. And anyway, today when I talk about data science, I mostly talk about AI because that is the most exciting part, first. And second, data science has been kind of automated by AI actually, and so that’s why the demand for machine learning AI is growing as the demand for analysts is slightly decreasing its acceleration. So I mostly talk about AI, and we’re going to see that the big trends coming in 2018 will all be around AI.
Kirill: I’ll try to talk less about AI to mix it up, because Hadelin is super passionate about AI, and he will be doing most of that. Ok, so thank you very much to Leonid, our resident data scientist, who has helped us put together a list. We have two lists, actually. We have a list of trends that are predicted to be popular in 2018 and to pick up, and we have a list of trends that were predicted to be popular in 2017. So the way I think we’ll structure it is we’ll first go over the 2018 ones, see what’s coming, and so we make sure we cover them. And if we have time at the end, we’ll review what happened in the previous year and see how correct those predictions were with what we found there. Sounds good?
Hadelin: Sounds very good.
Kirill: Alright, let’s get started. So first trend in 2018: An AI Foundation. Funnily enough. We’re talking about how businesses are going to start using AI more and more and I’ll probably steal the quote here from Andrew Ng that AI is the new electricity. So Hadelin, what are your thoughts on that? How are businesses going to be using AI more and more in 2018, and what are going to be the major developments there?
Hadelin: That’s right. So lots of companies have integrated AI into their business process. And actually I have some figures. I think there’s around 40% of companies that have already made the move to integrate AI into their system, into their company, into their business processes: to automate processes, reduce costs, or become an AI-based company. And only around 30-40% are starting to think about this but have not made the real step yet. So definitely that’s a big trend not just of 2018, but already of 2017. Companies are adopting AI and trying to build some AI teams and become AI-optimised. So yes, this is really happening.
Kirill: And we talk about the difference between general artificial intelligence and narrow artificial intelligence, right? And in the sense that when we say AI, we don’t mean that there’s a robot controlling the whole business or anything. It’s like very narrow applications, in say one specific area, in marketing, in operations, or in another part of the business, there is AI.
Hadelin: And also in the decision-making process. They use AI lots for decision making. It’s sometimes the AI that makes the decision, because they apply the machine learning models on the data and they get some great [inaudible] tools to help the decision process, sometimes at the highest level. The executives use AI to help make their decision.
Kirill: This statistic is very interesting, because I was surprised at the 40%. I think it sounds really high, that 40% of businesses would have already in some shape or form started adopting AI. But I guess what that’s saying is that it’s not across the board, not across the whole business. What are your thoughts on that? Do you think more and more businesses will be adopting AI, not just in one specific application, because that’s what I think it’s talking about, that in one area there is some sort of AI that they’ve introduced and that’s how they get the 40% number?
By the way, guys, if anyone is interested, some of these stats come from the Gartner reports. We can mention where else we get them as we go along, but for example, that came from one of Gartner’s recent reports. So do you think, Hadelin, that businesses are going to limit themselves to having one application or two applications of AI across the whole business? Or do you think they will start doing narrow AI in many different spaces?
Hadelin: I think they will just leverage AI. Most of them leverage AI to improve their business, to optimize their business, to reduce their cost. There are a lot of applications that you can leverage AI from to improve your business process. So I think that’s mostly what’s going to happen. Then you have on the side the real AI-based companies that do general artificial intelligence, that do artificial intelligence at the core of their business. But if we’re talking about most companies, well, they’re definitely starting to leverage AI. And mostly thinking of the consulting companies, each one of them is building a team of data scientists and using AI, as I said, for their decision making process.
Kirill: Okay. All right. Any other comments on AI and how it’s going to lay foundations in business in the coming year?
Hadelin: Well, it can go really far. AI can be applied in some field that is not covered yet today. You know, when we talk about AI, there are actually a lot of branches of AI. You have computer vision, so computer vision can be used in many businesses to improve it. You also have deep natural language processing like what we’re doing with chat bot, and that can help significantly some companies by bringing some chat bot systems that can help people in the company and that can help navigate or whatever. You also have some other branches like robotics. Robotics can definitely automate the processes and everything.
And you also have those data robots that can leverage the data automatically and provide some outputs that will be insightful for decisions you have to make. So, there are tons of applications and there are even some applications that we haven’t thought about. So that’s definitely going to develop in the coming years.
Kirill: What’s your favourite application? Is there something that pops to mind that you’ve recently heard of that you’re like, “Wow, that’s a really cool application of AI?”
Hadelin: Yeah, I have two in mind.
Kirill: Okay, let’s go.
Hadelin: I have augmented reality and blockchain. Blockchain and AI is going to be the big trend in 2018, the combination of both. I actually don’t know which one exactly is going to be the new electricity. Remember, you said AI is going to be the new electricity, but when I see blockchain developing and what it’s capable of doing and all the things that are happening right now, I have some doubt what is going to be the real new electricity.
Kirill: Nice. And that’s a good segue for us to get into blockchain. Blockchain is a shared distributed decentralized ledger that basically takes out the middleman, takes out the bank or the polling system or something and helps people have trustworthy transactions with each other, secure transactions with each other even when they don’t know each other. I know you’re also very excited about blockchain and there actually was a TED Talk recently where they were saying that blockchain, as you just mentioned, is going to be the technology that’s going to change and shape the world in coming years, but especially in 2018, we’re going to see some major shifts because of blockchain, and those shifts are actually going to be even bigger than the ones that we’ve seen with AI. So what are your thoughts on blockchain and why are you so excited about this technology?
Hadelin: Oh, my goodness. It’s so disruptive. For example, it could build a new Internet because the fact that it is totally decentralized all over the world makes it extremely powerful at, for example, compressing data. AI and data science is all about compressing data so that we can have faster and faster transfer of data or faster and faster transactions and even more secure. And blockchain will play a significant role in that because since everything is decentralized and since everything is encrypted and since everything is well-organized into flows that are in such a way that you cannot go back in the flow and modify anything, well that makes it a super solid, safe, and fast system.
And why did I say that it could build a new Internet? Since everything is decentralized, we could have the data divided into very small parts all over the world and that would make some kind of peer-to-peer compression that would make everything super powerful, like fast compression, you know, everything decentralized so that you have some extremely fast connections around the globe and that would be thanks to blockchain. So that could go really far. I think maybe two years or three years from now that that could go really, really far. And AI, of course, has a part to play in that because AI automates everything, it automates the processes so it will optimize the process inside the blockchain and that’s why the combination of both these technologies will make something super powerful.
Kirill: And also I wanted to mention that a lot of people, and I used to do the same, when we hear blockchain we think Bitcoin and when we hear Bitcoin we think blockchain, but those are not synonymous. Bitcoin is one of the things that leverages blockchain, that is built upon blockchain, and at the same time blockchain can be used for many other things. I really like the example somebody suggested of using blockchain to do voting. You know how in the U.S. there were elections and people were voting and then you go and you submit your vote? There is always an organization in the middle, like a big organization that counts the votes, that makes sure everything is safe, that there’s no cheating and that everything is accounted for, everything is trustworthy.
The organization that’s in the middle to ensure trust, that’s where blockchain comes in. Blockchain can remove that organization and basically you could do voting from your computer and the way blockchain works—we are not going to go into detail for me personally because I’m not an expert in blockchain, not yet, but I really want to get deep into the stuff and understand it better because it’s so interesting and disruptive.
But at the same time, you take out the organization, you put a blockchain, and what that enables, through the cryptography it has and through this decentralized and hyperconnected system, what happens is that now all of a sudden it’s completely trustworthy. There’s a ledger of everything that happens, this ledger is decentralized and it’s distributed to lots of people. You would have to hack hundreds of thousands of computers at the same time with the highest level of encryption in order to break into that, and that’s much harder than to hack into an organization or something like that – I guess, I’m not a hacker. So, that’s the power of blockchain. For instance, voting during elections, that’s a big deal. Like, imagine you wouldn’t have to get out of your house; you’d just vote on your laptop and do it like that. There are a couple of other examples. Anything else pops to mind?
Hadelin: Yeah, that’s right. You actually raised a very important point, which is actually there will be a massive trend in the coming years. And that trend is security. Blockchain will have a huge part to play in security. As you said, blockchains are safe because you need to hack thousands of computers to hack the system. And for this reason we’re going to definitely leverage blockchain to make some more secure and safe systems like the example that you gave about voting or any other ones.
Well, the main application of blockchain today is the new money, Bitcoin, which I actually doubt is going to last because it has signs of a bubble. But with this blockchain we can make a totally decentralized and safe financial system, so that indeed we improve the security. That’s just an example; the security brought by blockchain can be applied to many other fields and that’s definitely going to be a big trend in the coming years.
Blockchain will not only play a part in security, but AI will have to play a part on itself for security, because AI is developing pretty fast and at some point we will reach some powerful artificial intelligence that will go beyond human capacities and therefore we will have to control AI and that will be another part of this big security trend.
Kirill: Yeah. I had another thing I heard about blockchain, that it can be used for distributing music. There are a couple of people who distribute their music through blockchain and what that does, if you’re just a user, you can download it for free and listen to their song and so on, but if you want to use it for a project, like in a movie or in a trailer or inside your own YouTube video or something like that, then you just get it through blockchain and that way the transaction happens automatically. So, again, it’s this whole trust thing that nobody is going to get your music on its own.
Hadelin: Yeah. And what’s crazy is that you don’t even have the music on your phone. That’s because it’s totally decentralized. You have one part of the music somewhere, another part of the music somewhere else, sometimes 1,000 kilometres away from each other, and that’s a peer-to-peer system which makes it fast compression that allows you to have the music very quickly on your phone without adding it literally on your phone.
And that’s the same for movie compression or streaming. You can leverage blockchain by having some parts all around the decentralized system to get your streams, movies on your computer, and this will come from all this decentralized system brought by this blockchain technology. Again, there’s the safety component that is really improved by blockchain, but also the speed of the transfer, and also the data compression.
Kirill: Nice. So, the question I actually have that I was thinking about was, it’s really cool to know about blockchain and understand that, “Oh, cool, it goes into the foundation of Bitcoin or this can go into foundation of the voting system or content distribution and so on.” But the question is, is it something that’s completely out of reach, or is it something that we can create ourselves? Can we just sit down and program a blockchain in Python or something like that? Is it possible?
Hadelin: Yes, it is. And we will do it.
Kirill: We will do it?
Hadelin: Yes, we will do it very quickly.
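As a rough illustration of the idea discussed above, a ledger where you "cannot go back in the flow and modify anything", here is a minimal sketch of a hash-linked chain in Python. It is a single-machine toy under stated assumptions: there is no network, no consensus and no proof of work, and the names `block_hash`, `add_block` and `is_valid` are hypothetical helpers chosen for this example, not part of any blockchain library.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a block that stores the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True


# Made-up votes, purely for illustration.
ledger = []
add_block(ledger, {"voter": "A", "choice": "yes"})
add_block(ledger, {"voter": "B", "choice": "no"})
print(is_valid(ledger))  # True

ledger[0]["data"]["choice"] = "no"  # tamper with an earlier record
print(is_valid(ledger))  # False
```

Changing an old record changes its hash, so the stored hash no longer matches and the check fails. In a real blockchain many independent machines hold copies of the chain and run this kind of check, which is what makes quietly rewriting history so hard.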
Kirill: (Laughs) Awesome. Okay, so that’s our two cents on the blockchain. Okay, so the other one is not on our list right now, it’s on our list of stuff for 2017, but since we mentioned it I think it’s an important trend to point out: security, so the whole concept of security of data on the Internet and how that is important and how that is progressing. I have a few interesting examples. In 2017 we had some major, major security breaches in the world and that is a huge indication that in 2018 security is going to start growing again. Two major breaches: WannaCry, cyber attack in May 2017, people have probably heard about that. Microsoft computers in many countries around the world were locked down [indecipherable 20:07] companies from FedEx to the Ministry of Foreign Affairs of Romania were impacted by that. That was a major thing, there’s a Wikipedia article about it and so on and that was all over the news. That was a big one. That was in May 2017.
And just when you think it can’t get any worse, one of the biggest attacks in the world history also happened last year. You might know about a company called Equifax, it’s a credit rating company that has about 800 million customers around the world, it’s like one of the top three biggest credit agencies in the world. And 143 million customers were affected by that attack. That happened last year and it was announced on the 7th of September that they had a data breach, but that happened actually ages before that or months before that. People’s first names, last names, addresses, Social Security numbers, dates of birth and more information were stolen.
And if you think about it, 143 million people – there’s like 324 million people in the U.S. on its own. So that’s 143 million in the U.S. alone that were affected. There were people in Canada, in the U.K. that were affected. So, 143 million out of 324 million, which is the population of the U.S., that’s almost 50% of the U.S. population was affected by this attack. How crazy is that? And that’s just in the U.S. alone. That just stands to show.
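For reference, the share quoted above can be checked with a couple of lines of Python, using only the two figures mentioned in the conversation. The variable names are just for this example.

```python
affected = 143_000_000       # people affected, as quoted above
us_population = 324_000_000  # U.S. population, as quoted above

share = affected / us_population
print(f"{share:.1%}")  # 44.1%
```

So the figure is closer to 44% than to 50%, which is still well over four in ten people.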
And there were tons of other smaller examples, like the Uber hacking where they paid the hackers not to say anything and then some executives were fired for that. And other companies as well that we’ve heard in the news that have fallen victim to attacks. It’s on the verge of different trends. Data is becoming more and more popular, more and more all over the place. And with the proliferation of data, what’s happening is it’s harder and harder to keep it safe, it’s harder and harder to keep it secure and then also hackers have access to much more sophisticated tools, not even talking about AI and machine learning. Even the algorithms and ways they can infiltrate systems are much more sophisticated. There’s a huge threat and I think that security is going to be a major trend in 2018 and onwards because companies understand the importance of protecting their data and their customers’ data. What are your thoughts, Hadelin?
Hadelin: That’s absolutely right. Customers’ data actually already had issues in the past. People were complaining that the data was actually used in some non-ethical ways and that’s because there were some data breaches, security breaches. So, yeah, this is another essential part of the security trend that is coming: we not only need to protect ourselves against powerful AI, and we not only need to protect ourselves against hacking, but we also need to protect the data. For this, again, blockchain will definitely play a part, because since everything is encrypted, that can be a way of protecting the data in such a way that it would be very difficult to hack it thanks to the multiple systems decentralized all around the world.
So, I think we will always have to work on that because it’s the war of technology. Once you find the technology that can protect your data, what comes after that is that somebody who has a better technology and can break your technology that protects your data. So, we’re going to have some leap over leap, so the more we will get into some improving technologies, the more we will have to pay attention to the fact that it’s going to be difficult to keep up. And we’re going to have more and more experts in depth of all the technologies that are protecting the data and therefore less and less people because we’re going to reach a higher level of expertise that’s going to be very high.
And that’s another trend, and possibly a danger, that has to do with security: the research on AI and everything is becoming so high-level that fewer and fewer people manage to get into it at the state-of-the-art. That’s a danger because imagine the state-of-the-art research falls into the wrong guys’ hands. You know, the black box will only be understood by the wrong guys. That would lead the world into some kind of danger. So, that’s why we not only need to protect the data and develop those technologies, but we need – and that’s the most important thing – to educate the world, to teach people how this works and to explain to them how the state-of-the-art models work, so that the black box doesn’t become that black for too many people.
Kirill: I think that’s a great answer, a great comment on that. I’m just checking the questions occasionally. We just had Halper say that “Sorry, guys, this is cyber-security. Can we please get to things more related to data science?” So, I want to respond to that because I have seen the world of cyber security. I’ve been working at Deloitte and I’ve seen what a market this is. It’s a huge, massive area which is so underrated by data science practitioners for the reasons that we mentioned.
One is that any kind of data science work that you do can fall victim to cyber-crime. Also, because it intersects with data science. There are so many different data science applications. Like, the machine learning algorithms that we’re using, that we’re learning, they can all be applied in the space of cyber-security and in the space of data security. There are definitely inherent algorithms that are specific. Like, what’s that best one called? I forgot, but basically there’s a mathematical equation that is specifically used in the space of cyber-security, at least one that I know of.
But all the other algorithms with the increasing rise of machine learning, AI, deep learning, that is slowly shifting into the space of cyber security and that’s what we had witnessed in 2017. We saw the infiltration of deep learning and machine learning in the space of security to help find these anomalies, find these possible areas where things can be breached, and to help mitigate those risks. So, I personally think that data science trends, we will cover off as many as we can right now, but one of the biggest trends overall is the one we just talked about. You can see how it’s on the intersection of different things like data science, AI and blockchain.
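To make "finding these anomalies" a little more concrete, here is a minimal sketch in Python. It is deliberately not deep learning: it uses a plain statistical z-score check from the standard library as a stand-in for the idea of flagging unusual activity. The function name `find_anomalies`, the threshold and the login counts are all assumptions made up for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Made-up hourly counts of failed logins; the spike is planted for illustration.
failed_logins = [3, 5, 4, 6, 5, 4, 3, 5, 97, 4, 6, 5]
print(find_anomalies(failed_logins))  # [(8, 97)]
```

Real security tooling would typically look at many signals at once and learn what "normal" looks like from historical data, but the principle of scoring how far an observation sits from the usual pattern is the same.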
Okay, moving on. So, next one we have is deep learning technology is becoming mainstream. We’ve seen a lot of things happen just recently from image classification, machine translation, facial recognition, chat bots and other things that use deep learning insights. They’re starting to rise now. Hadelin, what do you think is happening now? Because a lot of these algorithms, tools and technologies have been around, some since the 80s, but some have been around since 2012. Why is it all happening now in 2017/18?
Hadelin: Because of the applications that we realize we can do with it. You know, we can do some crazy applications with deep learning now and we don’t have to wait for one or two years to do them. So, they have become extremely popular. We can see, for example, the GANs that managed to create some fake images of a real-looking human person, or those computer vision applications that can detect any object in videos. You know, these are some very cool applications.
And the other reason why they have become so popular is that it is being democratized. Thanks to all the open-source platforms like GitHub, you can get the code, and as we show in our course, apply them very easily on your videos or on your applications to do these very exciting applications. That’s why, it’s mainly thanks to everything becoming open-source and more and more easy to apply, because as you know, 4 or 5 years ago only experts could use that, only an expert had a good understanding of how it works and everything. You know, it’s like people had trouble in the beginning getting from old phones to smartphones. Four years ago people had trouble getting on all these codes and all these models, but today more and more people, even if they don’t have any notion of coding, they manage to code and use the deep learning applications.
However, that being said, I read some statistics about the models used in data science, in companies, or in general, and those deep learning models are still at the bottom of the ranking. The most used models in companies today are still logistic regression, random forest, XGBoost, decision trees, and the deep learning models like CNNs, RNNs or GANs actually are still at the bottom of the list. It is growing, it is definitely growing. However, Geoffrey Hinton has just issued a new paper on capsule networks. If that works, if it can be implemented easily, and if it works fast enough, that could mark the end of the actual deep learning model because this would be revolutionary. But we’re not there yet. The implementation takes very long to run, so we still have some way to go to realize that.
Kirill: Okay. I agree with those points. Capsule networks are definitely some interesting disruption that’s coming our way. And you can tell about the importance of deep learning just by the number of students that sign up to our course. When did we release the deep learning course?
Hadelin: We released the deep learning course last March, actually.
Kirill: March, yeah?
Hadelin: Yeah.
Kirill: So that would be like nine months. And how many students signed up so far?
Hadelin: We have in total 65,000 students.
Kirill: 65,000 students just in that one course signed up in nine months. That’s quite insane. That’s like 7,000 people per month signing up to that course alone, is that right?
Hadelin: Yes, absolutely.
Kirill: That’s hard to believe. That’s showing where the world is going. We can kind of get a sense for those things. You know, before, machine learning—and still machine learning is very powerful, but now we’re slowly going into the space of deep learning because deep learning can solve any problem machine learning can solve, but better and more accurate.
Hadelin: Yes. However, for simple applications, models like logistic regression and XGBoost will still stand above deep learning because deep learning is not the fastest to execute, you have to iterate over many epochs, to train and apply forward propagation and backward propagation many times to train your data. Whereas XGBoost, you just put the data as your input, and in a flash it returns the output you want. So, for simple applications, for simple problems, which can definitely give you some insights for your company, I think the logistic regression models will still be among the first. So it’s not like deep learning is going to erase the other one, it’s not that it’s going to make the other ones disappear. It’s just that for the powerful applications that are extremely demanding, deep learning will become the best models.
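The training loop described above (iterate over many epochs, run forward propagation, then backward propagation, then update) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it trains a single logistic unit with plain gradient descent rather than a deep network, the data is made up, and `sigmoid` and `train` are hypothetical helper names for this example.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, epochs=2000, lr=0.1):
    """Fit a single-input logistic unit with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward propagation: predictions with the current weights.
        preds = [sigmoid(w * x + b) for x in xs]
        # Backward propagation: gradients of the mean log-loss.
        grad_w = sum((p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        # Update step: move the weights against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data made up for illustration: the label is 1 when x is large.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train(xs, ys)
print(sigmoid(w * 1.0 + b))  # probability of class 1 for a small input
print(sigmoid(w * 4.0 + b))  # probability of class 1 for a large input
```

A deep network repeats the same forward, backward and update cycle, but across many layers and far more weights, which is why each epoch costs so much more than fitting a simple model.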
Kirill: Gotcha. Very, very good points on that. Okay, moving further: Persistent growth of the Hadoop market. So, what we are seeing is that Hadoop and any Big Data systems like Spark, for example, which is kind of the new thing out there—remember that we were at the ODSC, I think, and Spark 2.0 came out, that was in May this year. That was like a new big thing. So why are these technologies, why is Hadoop and Spark, why are they becoming more and more prominent and why are more and more companies going to them? What do you think?
Hadelin: Well, that’s because the amount of data is constantly increasing. You need systems like Hadoop and Spark and Pig and Hive to handle all this data, to handle all the Big Data systems, because otherwise it would be really slow to handle them. Those systems are faster and faster to manage your data and to organize it and to leverage insights from them. You definitely need those systems. And actually I heard that... well, actually there’s an important point to say about data science, it’s that Python and R are still by far the software that’s the most used in companies to do data science or machine learning or deep learning.
But then there is something growing up, it’s called Scala and it’s based on those Hadoop systems that handle Big Data. That is growing because you need more and more powerful systems to handle bigger and bigger data. That’s why it’s something to definitely consider. Actually, on LinkedIn I see a lot of recruiters’ posts and in these posts I see the skills that are needed, and I see now almost all the time, besides Python or R, I see Hadoop, Spark, Hive, Pig and Scala. Among all of them, if I had to choose one, if I had to recommend one, I would say Scala, because it’s extremely powerful at handling Big Data.
Kirill: And also I’d like to add that what we discussed before, deep learning and AI are contributing a lot towards the rise of Hadoop and Big Data systems. Because to train deep learning models and AI algorithms, you need a lot of data. You need that data to be stored somewhere, you need to be able to access it quickly, so it’s just natural that those two come hand-in-hand. The more the world turns to AI and deep learning, the more we’ll see Big Data systems such as Hadoop, Scala, Spark and so on.
And also a lot of it is going into the cloud. It’s like a trend that we’ve seen in 2017, that it’s not just Big Data, but it’s also Big Data in the cloud. And the reason for that is the cutting of costs, right? If you have servers on your premises for a large organization, that’s one thing. Then you have servers and you need to scale them, you need to broaden them, you need to update them even as new technology comes out and new hardware comes out. That’s millions and millions of dollars, tens of millions of dollars depending on the size of the organization; hundreds of millions of dollars for some organizations.
Whereas if you have things on the cloud, it’s less likely or it’s not as trusted yet by executives, especially old school executives who don’t want to let go of their data, they don’t want it to be somewhere else, they’re worried about security and so on. But, it can actually be even more secure, it can be very easily accessible, and it’s very easily scalable, and all the things can be updated very quickly so you don’t have to worry about updating your hardware. You can just click a button and your hardware gets updated, or the team that is managing it, because now they have economies of scale, the company that is managing it, they’re doing it for many businesses, so it’s easier for them to upgrade their hardware and also you just click a button and it’s scaled. That’s a huge thing and that’s why a lot of start-ups that are starting out, they don’t even consider having their servers on premises, but straight away in the cloud. It’s harder for companies that have been around for a while to make that move, but the ones that are doing it, those are the ones that are going to be ahead of the curve. My question to you, Hadelin, is, are we going to make a course on Big Data one day?
Hadelin: Big Data? Well, we are going to make a course on Big Data once Scala or any other system stands out. Because right now it’s not standing out that much. We still have Python and R, but as soon as one of these Big Data systems is the most used system in the companies that can have some tremendous and powerful impacts on the companies, we’ll definitely make a course on that. And actually, I did a lot of that when working in Google, I worked on a lot of the Big Data systems, so I could share with you my experience and how this works. Yeah, we can definitely do it. What do you think?
Kirill: I agree with that. Because we’ve been asked by students quite a lot about this. I think the reason we haven’t yet is because this industry is still in early stages, it’s very much forming, it’s very much shaping up. Like, Spark 2.0 came out this year, at the start of 2017. You constantly have new technologies, new versions come out and so on. Like, if we record a course today, two months from now we will need to re-record it, we will need to update things and so on.
It’s a good point, as soon as there’s a prominent market leader in that space and we know where this whole thing is going, then we can give you a course and also understand for ourselves and help you guys understand where this whole thing is going and how to keep updated with these things. It’s not in the pipeline yet, but it’s in our vision to create this course some time soon.
By the way, Leonid, our resident data scientist, asked me to make this little plug, so a little bit of advertising here. We don’t have a course on Big Data, but we do have a series of tutorials which apparently are amazing on YouTube, which are about PySpark. So, if you want to learn about PySpark, no cost involved and you don’t have to purchase anything, just subscribe to our YouTube channel, make Leonid happy, give him a Christmas present, because he is in charge of our YouTube channel, and you will also get updates about the PySpark series that we are releasing and maybe that will be something that you’re looking for. And also, of course, there are other things apart from Big Data that we talk about on the YouTube channel. Check it out. Okay, any other comments on Big Data?
Hadelin: Yeah. Well, PySpark is amazing. I really encourage students to subscribe to that channel. You know, it’s not standing out. If it was standing out, we would make a course on it, but it’s definitely useful. So that’s a great thing. That’s amazing.
Kirill: Awesome. Okay, we’ve already talked about AI. There’s another side of things which is applied AI, applying it in different spaces, different areas. I think we’ll skip that for now in the interest of time. Let’s talk about digital twins – interesting concept.
Hadelin: Digital twins! Yeah, I hear that more and more. That’s the Internet of Things, right? It’s like you’re connected to your objects and you can transfer some information between the digital twins and yourself, so that you can use them at a better and better rate. Is that correct?
Kirill: Yeah, yeah. And it’s not just for people. Like, an airplane will have a digital twin, or an airplane engine will have a digital twin and there’s like a data connection between them, or like a whole city could have a digital twin and you can basically model different scenarios that can happen in the city or in the turbine of an airplane by analysing the way that the digital twin is behaving and having the inputs from the actual object to the digital twin and also adding your own inputs. Or you could take the inputs from your own object to your digital twin and then also take inputs from the other hundred airplane engines that you have in your company and you compare it to the average, tweak things, and see model scenarios.
It’s very useful, for instance, in things like airplanes for preventative maintenance, so you know when the issues might come up even way before they are going to come up. And, of course, cities, to understand the behaviour, social and demographical things, transportation and things like that. So, you can model traffic jams. For example, a city is growing. You constantly feed inputs from all the sensors that you have in the city into that digital twin and then you’re like, “Oh, I wonder what will happen during Thanksgiving when we block off these three roads.”
Because you have a digital twin, which is pretty much the identical copy of the city, you can actually block off the roads. This is my understanding of things. And then you see what happens to the traffic all over the city simply because you have been inputting those data points. It’s not just like a model that stores data points, it’s actually a model that learns how they behave, how they interact and what dependencies are in there.
Hadelin: That’s right. And there is another term we haven’t spoken about. At the beginning of this webinar I said, like, “The biggest trends that I’m most curious of or that I see coming the most tremendous way,” so we talked about blockchain. And then I don’t remember if I said it, but the other one was augmented reality. We didn’t speak about this one, right?
Kirill: No.
Hadelin: Yeah, that’s right. I heard that this could be the first AI trend that could reach the trillion dollar market, augmented reality, because it has tons of applications.
Kirill: Like Pokémon Go.
Hadelin: Yeah, but not that kind of applications. Like, long-lasting applications. And I heard this has huge potential, so that’s definitely something to follow. What do you think? Do you think it could be reaching such a huge market?
Kirill: Yeah, definitely. I heard that there’s VR, virtual reality, but augmented reality actually has the potential to be bigger. We saw initial attempts at that with the Google Glass, and that was like ages ago. What was it, 2015 or ’11?
Hadelin: Funny, when I worked at Google, they actually introduced them and I was there when it happened so it was 2015. What did you say, 20—?
Kirill: I don’t remember, 20-something. (Laughs)
Hadelin: Quite recent. Yeah, it was definitely 21st century.
Kirill: Yeah. And then you left Google and the whole project fell apart.
Hadelin: Yeah. (Laughs) Well, that happens.
Kirill: Yeah. Okay, augmented reality is an interesting one. It was funny to see how Pokémon Go just boomed and then was gone. I don’t hear about it.
Hadelin: I’m not sure what the reason is exactly. I don’t know, but it’s crazy how this was like what we call these ‘bubble trends.’ You think it’s a trend, you think it’s growing and at some point it bursts out and nobody hears about it anymore. Like Bitcoins, for example. I hear debates on Bitcoin all the time right now, like every day. The debate is whether it’s a bubble. “Do you think it’s a bubble? Do you think it’s not a bubble?” It’s based on blockchain technology, but still it has all the signs of a bubble, so everybody is talking about this. What do you think? Do you think it’s a bubble?
Kirill: That’s a good question. It really reminds me of how the first time Bitcoin really spiked, I think it was 2014 or something like that, and it just went up and then people were like, “No, it’s going to keep going up forever,” and then – Bam! So, I don’t think it’s a fad, I don’t think it’s something that will go away. I think we will be using more and more cryptocurrency, Bitcoin or others, but I have doubts that it will keep growing forever like that. A lot of it is fuelled by hype, by media and stuff like that, and as soon as something else, the next big thing comes along, I think there will be a correction. This is not financial advice, by the way, guys watching this webinar. It’s just our opinions.
Okay, so we talked about digital twins, we talked about augmented reality. What do you think about self-serve analytics? Data science is growing. Let’s get down to the basics. Forget about AI and stuff like that for now. So, business intelligence, and we’ve got lots of different tools, lots of different approaches, and the amount of data, the volume of data, the velocity, variety, veracity, etc. of data is growing all the time and very quickly. So, with that amount of data, organizations are slowly starting to realize that it’s unsustainable to have only data scientists look into that data and pull the insights out.
It’s still very important to have data scientists, and more and more organizations are getting into that, but at the same time, what if everybody in your organization can look into data and get insights from it to some extent? And that is self-serve analytics. What are your thoughts on the trends of self-serve analytics in 2018?
Hadelin: That’s a very good question. Indeed, it’s like what I said about the black box. You’re absolutely right. Right now, only data scientists can leverage the data to gain some insights and help with decision making and everything. At the same time, we have those automated systems like this company DataRobot that basically makes what they call ‘data robots’ that take your data as input and will return the output without needing the work of a data scientist doing all the process of data analysis.
That’s what I said at the beginning of this webinar. I think that it has the potential to be automated, like self-managed data systems, and it’s actually going to come pretty quickly, but it will not replace data scientists. We will always need data scientists to improve these systems, check these systems, control that they give the right insights, check that that makes sense because sometimes the decisions can only be good decisions if you include the human factor. So, we’ll always need some people to have a complementary job on that, because the machines cannot do everything. So, I think self-analysing systems, as you call them—
Kirill: Self-serve analytics.
Hadelin: Self-serve analytics will grow, but will never grow to the point that it will replace the data science jobs.
Kirill: That’s a good point. A little bit of reassurance there. Yeah, I agree with that. And I also think it’s important for those of you out there who are data scientists or who are aspiring data scientists, it’s an important trend to keep in mind. It’s been around for a while now, but it’s going to be picking up more and more that people in organizations, regardless of their level, they are going to need to have some sort of data literacy. And it’s your job, or you can make it your job as a data scientist, to spread that, to create data advocates and to create people who are excited and inspired by data.
It’s going to make your job easier because that way, the people you talk to in the organization, they know about what you’re doing, they know the value and the importance of data and data science, and that’s cool. But also, it’s going to help the organization to grow into that right direction. If you really care about the organization, and I really hope you do in the sense that you’re working in the company that you love and that you believe in their mission. If you do, then that will be your contribution into putting them onto that right pathway where not only you are doing the data science work, but everybody in the organization is contributing, some people can do a simple regression, some people are better at understanding the different types of data, or some people have access to BI dashboards that you’ve created and now instead of you redoing it every time, you’ve created them in an interactive way so that everybody can get their own insights.
It’s an important trend for data scientists to consider, because one thing is just doing data science on your own and being the rock star that’s cool; another whole big thing is about educating others in the space of data science. At the end of the day, you will help them out not just in their roles, but also in personal growth because that’s where the world is going. You have to be data literate to be up-to-date with everything that’s happening and have other opportunities, you know, have a broad spectrum of ways that you can develop your career.
And speaking of not doing data by yourself, we had an interesting trend that we haven’t talked about yet, and that other trend is that companies are going to look more into not just hiring data science geniuses or wizards standalone, but actually building out data science teams. So, a slight difference there, but a very important one at that. What do you think, Hadelin? Why do you think companies are going to be steered away a little bit just from one super genius data scientist? That’s cool, but how about we build a team of five or ten that work together very well?
Hadelin: Because the goal in the end is to get as many people as possible on data and getting the skills to manage the data. It’s what you said a couple of minutes ago. Most people should be able to leverage the data to gain some insights as everybody is using a smartphone today. Everybody knows how to use a smartphone. We need everybody to know how to leverage data to gain some insights. That’s why I think they are making the teams. They don’t want to leave that to the experts because this is not democratization. If we leave that to the experts, we will miss out a lot on other capabilities because data science is not that hard. Everybody can do it. Everybody can apply the models. You just need to understand the intuition. And sometimes you don’t even need to understand the intuition, you just need to understand how you have to get your inputs into the system and apply the models and gain your insight.
The data is becoming so abundant. Data is everywhere. There is more and more data, so of course we need more and more people, and the only way to get more and more people is, instead of leaving that to the experts, building teams of many data scientists or many people that can at least do the basic stuff in data science to gain some powerful insights.
Kirill: I agree with that. Like, when you have a team of people, you have one expert that’s awesome, but you’re dependent on them. Like if they leave, or if they decide to do certain things in a certain way rather than exploring other possibilities, other tools, you will be very dependent on that kind of stuff. I think your opinion here—people watching this or listening to this, you guys really should listen to Hadelin on this because you’ve worked in data science teams, right? You’ve been in Google and your other jobs that you’ve been in—I have been in that situation where I was the one data scientist and I was doing all the things.
From that I can totally speak to, yes, I tried to do my best in good faith and do really amazing work as much as I could, but at the same time it was very highly dependent on my subjective opinions, on my subjective ways I think the company should go and do things. You know, that might be wrong, it might be right, but you don’t want a large organization depending entirely on the opinion of one person. So I think what you said is valid here.
And the other thing that pops to mind is executives, so let’s talk about executives for a bit. There’s two sub-trends in the trend for executives that I see. We have more and more organizations hiring CDOs, Chief Data Officers, and the other one is that more and more executives, like Chief Executive Officers, the guys that are directors and heads of the companies, they are looking to get educated in the space of data science. Like, it’s not their jobs to be data scientists, but they want to find out more about algorithms, about applications, about AI, about deep learning, about all these different things data science-related to not become technological or data science dinosaurs so that they can see what this is all about. What are your thoughts on that, Hadelin? Why do you think more and more executives are jumping on board with this trend, and do you think it’s necessary?
Hadelin: Of course that’s 100% necessary, and a simple reason for this is that executives are the ones who make the decisions. They are the ones who decide the next move in the company. And since data science is so powerful at leveraging the data to get the right insights that will help in a significant way to take the right decision, well, executives definitely need to be connected to data science; not necessarily be experts, but be connected to data science to understand and be convinced how data science can help them make the right decisions.
And I say that not only from a logical point of view, I also say that based on experience. I had on the phone a lot of executives that asked me for some advice on how they would leverage data science to take decisions. They said mostly that the problem was that there’s a huge pyramid between them and the data scientists, so they are far from the data science teams and therefore they need some better data visualization tools to understand how the data is leveraged and the insights are extracted to help them take the right decisions. So the executives want to get more and more into data science and they actually need it for the simple reason that they’re the ones making decisions and data science is so powerful at helping them to take the right decisions.
Kirill: Interesting. So, let’s talk about strategy because decisions, they link up into strategy. What are your thoughts on data strategy for large organizations? Is that a thing? Is it important for an organization not just to think through their marketing strategy or let’s say operation strategy, growth, expansion and so on? Do you think that executives should be thinking about data strategy? And what does that mean, what does it mean to think about data strategy?
Hadelin: If you talk about strategy, I think strategy has a lot to do with intuition as well. It has a lot to do with intuition, experience and not only data. Data can help in the strategy because in the strategy you have to take some decisions and data helps in taking the right decision, but there is so much more than decisions in strategy. It’s a combination of things. It’s pretty complex, by the way, but you also need intuition a lot, and I think the intuition is the opposite of data. That’s why data will never replace everything because you always need intuition, and you mostly need intuition and strategy. So that’s a very interesting question, actually, which I think the answer is that data is not everything for strategy.
Kirill: I agree, but what I’m referring to is—data is not everything for strategy, I totally agree, but in the sense that let’s say we have strategy overall, but inside strategy we have everything to do with data, like the tools that we’re going to use. Are we going to install Hadoop or are we not going to install it? Are we going to go to the cloud or are we not going to go to the cloud? Do we add more data points? Do we have enough data points about our customers? Do we need more inputs? Do we need more unstructured data? Do we need to handle unstructured data? What insights can we gather from our data, or what is our current data saying about where our organization is going and how can we leverage that more, how can we implement deep learning or AI algorithms and so on? That’s the stuff I mean for the strategy around data.
I think it’s quite important for organizations to start keeping that in mind. I don’t know if it’s just going to happen on its own, the way it happens, and that might be a bit more reactive than proactive. Data strategy helps you be proactive in the sense—it’s really hard to be proactive in the first place because there are so many technologies that are coming out that you don’t even know about and that’s going to come out next year or a few months down the track, but at least you put in effort to be on top of your organization. You know your pitfalls and you know where you need to patch things up, you know where you’re not keeping up to speed with everything that’s going on in your organization in the sense of data. But if you don’t even think through data strategy, that leaves you way behind everybody else and I think that takes away a huge competitive advantage for companies.
Hadelin: Yes. And I’m reading something interesting here, so I’m going to read that to you. According to Gartner, 59% of organizations are still building their enterprise AI strategies while the remaining 41% of the organizations have already made the plunge. So, yeah, there is definitely something happening with the AI strategies for companies right now. 59% is a lot.
Kirill: Yeah, so they’re at least thinking through how they’re going to—
Hadelin: Yes, leverage AI for strategy.
Kirill: Gotcha. Okay. Yeah, very cool stuff. What else? Do you have anything else that we have missed?
Hadelin: No, I have mentioned all the trends I wanted to speak about. The ones that I’m very curious about and I will be following very closely for the next year, in 2018, will be blockchain and maybe augmented reality.
Kirill: Nice. And for me probably blockchain, I definitely want to get deep into that topic and understand a bit more about blockchain, what’s going on there, and how we can apply it in the world, how it’s going to be transformative. And I think AI, I will be interested to see how that goes. I’d say more deep learning, less AI for me. It’s kind of more basic than AI, but I like the concept of narrow applications. So something like, “Okay, there is a problem. Let’s apply deep learning and solve it.” That’s pretty cool.
Okay, we’re kind of running out of time, so I think that’s all the trends that we’ve covered. I think that was pretty cool. Thanks a lot, guys, for coming on the webinar.
Hadelin: Thanks so much, guys. That was my first webinar and I really enjoyed it.
Kirill: Yeah. All right, take care guys, and hopefully we’ll have more of these, we’ll see these coming up more. And good luck in 2018. Let’s stay in touch.
Hadelin: Yes, keep up the good work.
Kirill: All right. See you, man.
Hadelin: See you.
Kirill: There we go. Those were the trends for 2018 that we were able to identify. Of course, some of them will happen, some of them will happen less, but overall those are the most exciting things to look out for in this coming year. Which was your favourite trend? Which is the one that you’re most excited about, the one that you’re looking into the most?
Personally for me, I like the concept of anything to do with AI and digital twins and security as well, but the one I’m most curious about is blockchain. I have this new project of my own that is going on that I’m learning about blockchain and I want to learn more and more about blockchain, I want to find out how it works, what exactly goes into it, what the security, encryption and other implications are, and what are the use cases and so on. So definitely that’s the one for me. But again, yours might be a bit different.
In any case, I hope you enjoyed these trends and now you know what to look out for in 2018. If you know somebody in the space of data science that could benefit from this episode, then forward it to them and help them also get prepared and maybe you’ll have something to discuss and debate after they listen or watch, because this episode is available in video mode, and you’ll have something to discuss with them. Plus you can get all the links from this episode and the show notes at www.superdatascience.com/119. There you can also find the video recording. And on that note, thank you so much for being here. I can’t wait to see you back here again soon. Until then, happy analysing.