Kirill Eremenko: This is episode number 375 with Senior Vice President and Chief Technology Officer at Oracle Cloud Platform, Greg Pavlik.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week we bring inspiring people and ideas to help you build your successful career in data science. Thanks for being here today, and now let’s make the complex simple.
Kirill Eremenko: Welcome back to the SuperDataScience podcast everybody. Super pumped to have you back here on the show, because today we’re talking about the cloud. As data scientists, we don’t often think about the concepts and mechanics behind the cloud that we often do use for our computations and data storage. We don’t often stop to think what it’s all about, what are the different vendors, how does it all work, what is the future, what are the trends in this space? Today is a great episode to educate yourself. I personally was a learner on this episode, I was learning and soaking up all this knowledge. And who is better positioned to teach about the cloud than the Senior Vice President and Chief Technology Officer at Oracle Cloud Infrastructure, Greg Pavlik.
Kirill Eremenko: So in this episode you will learn a ton about the cloud. For instance we’ll talk about AutoML and what it means for the future of data science. The trends in data science, for example big data and how we saw the rise and fall of Hadoop, the number of data scientists growing in the world, natural language processing, and why it is starting to dominate cloud computations. Data science and business intelligence, and what that intersect means for the profession. Small data versus big data, and much, much more.
Kirill Eremenko: So this is an episode to jump in and learn. It may at times feel complex. I definitely found it quite complex in certain areas, but that’s why I asked a lot of questions. And this is an opportunity to educate yourself about the cloud and understand the future, understand where all these trends are going, and take your professional skills as a data scientist to the next level in a domain that we constantly use for our work. And so on that note, I can’t wait for you to check out this episode. Without further ado, I bring to you Greg Pavlik, who is the Senior Vice President and Chief Technology Officer at Oracle Cloud Infrastructure.
Kirill Eremenko: Welcome back to the SuperDataScience podcast everybody. Super pumped to have you back here on the show. Today I’ve got a very special guest, Greg Pavlik, calling. I believe you’re calling from the West Coast of the U.S., right, Greg?
Greg Pavlik: Yep, yep. We’re in the Bay area. So good place to be for technology, good place to be for machine learning.
Kirill Eremenko: Fantastic. How long have you been there for?
Greg Pavlik: About 12 years now.
Kirill Eremenko: 12 years?
Greg Pavlik: Yeah. Though we’re not really long timers, we showed up from the East Coast about 12 years back. We’re from the New Jersey area.
Kirill Eremenko: What made you move?
Greg Pavlik: Work. Yeah, yeah. I got to a point where I flew out once a month, then it was every two weeks, then it was every week. So after about a year of flying out weekly-
Kirill Eremenko: Wow.
Greg Pavlik: … we decided it was time to move. Yeah, I think it was 50 out of 52 weeks of the year that I was on the road.
Kirill Eremenko: Wow.
Greg Pavlik: Now because of the pandemic I don’t travel at all. But the baseline for travel now is usually mostly around the West Coast and mostly once a month.
Kirill Eremenko: Very interesting. I once met a gentleman who was flying in and out to mining sites, also like every week, because he was a dragline operator, those massive machines. And those operators are very rare, hard to come by. So this mining company was flying him in and out, every week, for seven years. So yeah, when I saw him on the plane I felt a little bit sorry for him, because the shape of his back perfectly fit into the seat of the plane.
Greg Pavlik: What I found when I was traveling all the time is, you get out on Monday morning on the 6:00 AM flight, and it’s the same people, every week, week after week. They just had their pattern. A lot of consultants, sometimes managers, but it’s not a good lifestyle if you can avoid it. I definitely recommend something a little bit more stable.
Kirill Eremenko: Yeah, I got you. And do you miss it now with the pandemic, that you have to stay at home? Is it something you reminisce about?
Greg Pavlik: Well, the big issue for me is, I don’t miss the travel, but it’s more the face-to-face teamwork. I mean, one of the things I’ve always felt is that the whiteboard is hard to beat; it’s the number one engineering tool. And I’ve still not found a great substitute for a face-to-face conversation at a whiteboard.
Kirill Eremenko: That’s interesting.
Greg Pavlik: And there’s that social capital you build talking to people, when you’re really in the same room, sharing a cup of coffee. Then the other problem is, and I think this is one that people underestimate, the value of these ad hoc hallway conversations, especially not so much when you’re trying to do a technical problem, but when you’re trying to work across teams and get teams to coordinate, keeping people on the same page. There’s a lot that happens informally. And it’s very difficult to do the informal thing, when you have to start a Zoom meeting in order to start a conversation. I think there’s a lot of discussions that just don’t happen or if they do happen, they’re email exchanges that can be interpreted in different ways.
Greg Pavlik: So that’s been a bit of a tax. The flip side is, and this is actually a concern I have, is people seem to be working more hours now than ever. About a week ago, we gave everybody in the organization a mandatory day off. And we’ll probably do that again in another four to six weeks, just to let people pace themselves. Because there’s this tendency: get up in the morning, log in, start working, then take a break to get something to eat, work, work, work, another break to eat, work, and then the next thing you know, your day is over. It’s great in terms of trying to advance the ball and moving things forward, up until you start to hit burnout. So we’re really trying to figure out ways to keep people productive, but also make sure they don’t wear themselves out.
Kirill Eremenko: Absolutely, yeah. Definitely something. Our team is fully remote. So definitely it was something we noticed as well. People need to take a break.
Greg Pavlik: Yeah. One of the things we’ve been trying to do is start to take lessons from companies and organizations that do work fully remote all the time. We have some people that have come in through open source communities, that we’re trying to adopt best practices from open source, especially from the Apache Software Foundation, in terms of how we do our internal development. That’s helping, I think improving things from a quality perspective, overall. But learning more from organizations that have been, especially companies that have been remote full time, is something that we’re working on as well. It’s really important.
Greg Pavlik: And it’s not the same. Things go reasonably well, I think, as people adapt. But really getting things dialed in and making sure that we’re keeping the bar high, from a quality perspective and from a work-life balance perspective, those are probably the two biggest challenges we have right now.
Kirill Eremenko: Absolutely. And coming back a bit to the points you mentioned about the value of those ad hoc conversations in the hallways, it was very interesting to hear that coming from you, since you are in charge of a big part of Oracle to do with the cloud. And one of the goals is to move on-premise workloads to the cloud. Question: do you think sometime in the future, maybe triggered by this pandemic, maybe just over the course of time, we will be able to come up with a solution, whether it’s VR or AR, where we will move those ad hoc talks? For instance, we could all wake up, put on virtual reality goggles and be walking around a virtual office?
Greg Pavlik: Yeah, it’s possible. And I would certainly say the way things have developed, there’s a global search for talent. You can’t just go to any one country, any one state and say, “Hey, this is the talent pool we want.” So I think that there’s a strong potential for more and more organizations to adopt VR for things like international team integration. When you’re local to an office, though, I think there’s just a human element that’s hard to replace. Unless the VR gets sophisticated enough that you can’t distinguish between reality and the virtual environment, I think people are still going to want to have the face-to-faces.
Greg Pavlik: At the last company I was at, our management team was pretty distributed. But we made a real point to get together at an offsite every quarter, at least once a quarter. And it was an interpersonal relationship dynamic that got reestablished quarterly. And I think those are hard to replace with current technologies. But yeah, I think we’re going to see a lot more technology evolution toward facilitating better team dynamics. Right now, in some ways, the state of the art seems to be Slack, and Slack is great, but it’s also a strange, interrupt-driven technology. It’s not the same thing as when I’m walking down the hall to get a cup of coffee and I run into someone. You’re both already out of the zone of work, trying to get something else done, so it’s not quite the same as that hardcore problem-solving focus. So that kind of thing I haven’t seen a way yet to really replace.
Kirill Eremenko: Maybe Oracle can build something.
Greg Pavlik: Yep. [inaudible 00:10:25].
Kirill Eremenko: Gotcha. I hope you’re enjoying this amazing episode, we’ll get straight back to it after this super quick announcement. DataScienceGO Virtual. Have you registered to attend yet? If not, make sure to check it out at datasciencego.com/virtual, the dates are coming up, June 20th to 21st. It’s a weekend. On Saturday we’ve got talks and workshops for newcomers and transitioners. And on Sunday we’ve got talks and workshops for practitioners and managers. So whatever level you are, this is the virtual event for you. And it’s absolutely free. Yes, it’s absolutely free. But the number of seats is limited, so apply to attend now; you can find the event at datasciencego.com/virtual.
Kirill Eremenko: Come, enjoy the talks, have lots of fun, network with your peers. Even if you don’t manage to get in for whatever reason, you will get the recordings afterwards if you register for the event. Once again, the website is datasciencego.com/virtual. No reason not to attend, no reason not to register, so make sure to jump on this opportunity, it’s only a matter of days left until this happens. And I look forward to seeing you there, and now let’s jump straight back into this amazing episode.
Kirill Eremenko: Well, Greg, you are a Senior VP and CTO at Oracle Cloud Platform. What I’d love to dig into is understanding your journey. You’ve had a very interesting career just judging by your LinkedIn, and you’ve spent, I was counting, over 12 years at Oracle in total. So-
Greg Pavlik: In total, yeah. I wound-
Kirill Eremenko: Could you walk us through-
Greg Pavlik: … up here by accident to be honest.
Kirill Eremenko: Sorry?
Greg Pavlik: I say I wound up here by accident.
Kirill Eremenko: How did that happen?
Greg Pavlik: So my background is not actually in computer science. It’s really solid state physics and physical chemistry.
Kirill Eremenko: Oh, wow.
Greg Pavlik: And I took a job to develop high-temperature ceramics for satellite nose cones, back in the ’90s in Colorado. And I showed up at the job on day one, and they said, “Well, you can do this ceramics engineering work that you’ve got prepped up and ready to go, or we need people to do software development. And with this project, we’re building a simulation for a spacecraft, really interesting stuff.” I said, “Well…” And, “Oh, by the way, we’ll pay you more.” And I said, “Well, I’ll tell you, I’m willing to take more money, but would you guys be willing to put me through a master’s in computer science?” So they said yes. And-
Kirill Eremenko: Wow.
Greg Pavlik: … I just shifted my focus quite a bit. But it was a really great project actually. We developed basically a simulation of not only the spacecraft, but also the full space environment. So that when they took the actual command and control hardware and plugged it into the software simulation, it thought it was controlling a spacecraft. And as the spacecraft was doing things, moving solar panels or firing off reaction control thrusters, the simulation was producing all the dynamics you would expect in the space environment, for full testing.
Greg Pavlik: So it was really, really a cool project, one of my favorite work projects I’ve done in my career. And that started me down the journey of software, and I wound up going through a series of startups. The last one before the first day at Oracle was a company called Bluestone Software, which was an early app server company, in the heyday of the dot-com boom and app server mania. And so we were one of probably four vendors at the time that were pure plays on the app server side. The incumbent that really won the day was BEA Systems, who had launched their own app server, and then they eventually were acquired by Oracle. So they wound up at Oracle too.
Greg Pavlik: But when the dot-com market busted, we wound up being acquired by HP, and that didn’t go very well. And Oracle was looking for a team of distributed systems and middleware engineers to start to build out their own app server platform. So I wound up taking a job at Oracle and thought [crosstalk 00:15:03].
Kirill Eremenko: Wow.
Greg Pavlik: That was going to be a couple of years, and I wound up… let’s just say it’s been about 12 years in total.
Kirill Eremenko: Wow.
Greg Pavlik: I think nine and a half the first go around and almost three now in my second-
Kirill Eremenko: Yeah. You had a bit of a break from Oracle for some time. What happened there?
Greg Pavlik: Yeah, I think we had gotten… When I joined Oracle the first time, there were about 200 people in the middleware division. By the time I left it was probably between 4,000 and 5,000.
Kirill Eremenko: Wow.
Greg Pavlik: We really built that business up, both organically and then incrementally by acquisition, and eventually consolidated that whole Java middleware space between the BEA acquisition and then Sun Microsystems with Java itself. And we [crosstalk 00:15:49].
Kirill Eremenko: Sorry. What is middleware?
Greg Pavlik: Oh, middleware. Middleware is your connectivity software that sits between the application logic and your backend systems and databases. So app servers or messaging systems, Kafka; in some sense, Kubernetes is now playing the role of middleware in a lot of systems. I think the heavyweight app servers have become largely displaced. People are moving more toward containerized applications. But back in the day for modern app development, the Java Enterprise Edition app server environment was the normative standard. And then that started to get displaced by the open source Spring Framework. And then, while Spring is still around, I think people have gotten much more freeform in the technologies they’re using for app implementations.
Greg Pavlik: I mean, it was a great journey, very interesting. We really got to develop the market, the business. But we got to a phase, this was probably around 2011, late 2010, where Oracle was really focused on ingesting and integrating all the acquisitions they had done and consolidating their platform around the app portfolio. Which is important work for the business, but I’m a hardcore technologist at heart. And I was getting more and more interested in the emerging big data segment. And so it was clear at the time that to really go out and work with Hadoop and HBase, and a bunch of other technologies that were coming together in that whole ecosystem, that was going to have to happen outside the company.
Greg Pavlik: So I wound up getting hooked up with the team that was spinning out of Yahoo that had built Hadoop from day one, and building out one of the two pure plays in the market around big data, specifically the Hadoop ecosystem. So we went on a tear there; that company IPOed remarkably fast. I think from inception to IPO was probably three and a half years.
Kirill Eremenko: Wow.
Greg Pavlik: And things were going quite well until, I’d say, 2016. And there was a pretty dramatic shift. If you think about Hadoop, it was, I suppose, an important evolutionary technology. It opened up a lot of new use cases for non-specialists really, say your typical enterprise business, to start to deal with both multi-structured data and very, very large data sets in ways that they couldn’t before. Economically they never could, because the technologies really didn’t cater to their use cases, but Hadoop opened that up. The problem with Hadoop was it was this big monolithic system that was hard to stabilize, hard to run and just expensive. The open source bits were really the least expensive part of the equation, because you had to rack and stack all these machines, put them in your data centers or in a colo, and pay for power all the time.
Greg Pavlik: And by 2016, I think people got comfortable enough with the public cloud infrastructure, they began to take the same data sets and just put them into object storage, which in that case, you basically shift the whole operational problem off to the cloud vendor, and you’re only really paying for what you use. The object storage, it’s pretty cheap. So-
Kirill Eremenko: What is object storage?
Greg Pavlik: Something like S3 in Amazon. Every cloud platform has some variant; we just call ours the Object Storage service, at Oracle Cloud Infrastructure. Azure has had a couple of different permutations in their environment, but the latest they’re calling it Azure Data Lake Storage. But every cloud platform has this ability to take binary objects and just put them into-
Kirill Eremenko: Okay. And so they’re not… Whether it’s Amazon S3 or Azure, they don’t use Hadoop in the backend?
Greg Pavlik: You can. I mean, it’s one option. So if you put the data into object storage, you can spin up a Hadoop cluster, pull it from object storage, process it, shut the cluster down. It’s a very heavyweight infrastructure to do that. The approach we’ve taken… One of the things, when I came into Oracle is, like I say, I really saw a lot of value in this space for end users, on the one hand. On the other hand, the technology just seemed really too cumbersome and difficult to use. So what I wanted to really do was step back and say, “How do we maintain and preserve all the good parts of this ecosystem, but eliminate the overhead, eliminate the cumbersome nature of it? The unwieldy nature of it.”
Greg Pavlik: So we’ve taken a very different approach. We have a cloud service called Data Flow. And it uses Apache Spark to do the data processing, which is the dominant data crunching framework in that whole Apache Hadoop ecosystem. But it’s entirely clusterless. It’s not just serverless, it’s clusterless. We pre-allocate a bunch of resources in the backend, and all you have to do as a user is say, “Okay, I want to run this job, I want to use this much processing power, and I want to touch this data.” And then within seconds or tens of seconds, we’re off processing arbitrary workloads.
Greg Pavlik: But the beauty of it is, not only at the storage layer do you have nothing to maintain or deal with as an end user from an operational perspective, but even at the data processing level. It’s about as close as you’re going to get to a zero ops model. The difference with Hadoop is, you can do the same workload with Hadoop over object storage, but spinning up Hadoop clusters probably takes five, 10 minutes. Like I say, it’s a lot of overhead. And you really don’t get any real benefits beyond what you [inaudible 00:21:59] process with the actual Spark packages.
Greg Pavlik: So we tried to take a look at this as a gen two approach, to learn from what other people have done, both good and bad, and scrap the bad parts. So I’m pretty excited about this. I look at this as big data done right, and really Oracle being the first vendor to go out and not just utilize the open source technology as it was designed for the on-premise data center, but to really re-envision it for cloud-native use cases that are actually tractable for real businesses. Enterprise businesses, you go to your, say, typical steel manufacturer or insurance company and so forth, you’ll have specialists. You’ll have people, for example in insurance, that are good at data science, because they come in with strong statistical backgrounds. But you’re not going to get the same kind of population of technologists that you would have in an eBay or a PayPal or the backend for Apple, where people are doing lots of data management, data crunching with a staff that specializes in distributed systems, experts in open source, fully resourced to keep this machinery running.
Greg Pavlik: So I think that the goal is to not really lose anything in terms of the capabilities that those companies can bring to bear on the problems they’re trying to address, but at the same time, make it tractable for a pretty much universal population.
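To make the object storage plus clusterless Spark pattern described above concrete, here is a minimal PySpark sketch of what such a job can look like from the user’s side. This is generic Spark code, not the Data Flow API itself; the oci://bucket@namespace/path URIs, bucket names, and column names are illustrative assumptions. The point is that the script only declares the data and the computation; provisioning and tearing down clusters is the service’s problem.

```python
# A minimal sketch of the "clusterless" pattern: the script only says what
# data to read and what to compute. Where it runs (a managed Spark service,
# a local session) is outside the script. Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-rollup").getOrCreate()

# Read raw events straight out of cloud object storage (hypothetical bucket).
events = spark.read.parquet("oci://raw-events@mytenancy/sales/2020/")

# Aggregate: total revenue per region per day.
rollup = (
    events
    .groupBy("region", F.to_date("event_ts").alias("day"))
    .agg(F.sum("amount").alias("revenue"))
)

# Write the result back to object storage; nothing to tear down afterwards.
rollup.write.mode("overwrite").parquet("oci://curated@mytenancy/sales_rollup/")

spark.stop()
```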
Kirill Eremenko: Got you. Wow, thank you for the description. I remember, between 2012 and ’14 or ’15, I was working at some point with a company that was about to invest on the order of tens of millions of dollars to spin up an on-premise Hadoop cluster. That’s when Hadoop was big, and cloud was only just becoming popular, and they were like, “Should we go to the cloud? Should we build Hadoop on-premise?” From what you just described, I gather that the age of Hadoop has gone. It had its rise, it had its fall, and now we’re moving to something post-Hadoop.
Greg Pavlik: Yeah. I think that, like I say, it was an evolutionary technology. I think it was important. But I think that, and I’ll be honest with you, the rise of the cloud, cloud-based data lakes, I didn’t see it happening in 2014. If you go back to 2014, Hadoop was in its heyday. I think we IPOed in 2014 actually. So it was an exciting year.
Kirill Eremenko: Good timing.
Greg Pavlik: But the cloud platforms at that point were seen as less stable and less secure. So I think there was a lot of skepticism that people were going to be able to take mission critical datasets and just have them live in the cloud. I think by 2016, things had flipped over. There was a lot of hardening, a lot of maturation, and the cloud platforms were starting to become the de facto data lake infrastructure of choice. And I think that’s only continued to strengthen.
Greg Pavlik: So I think, yeah, the days of Hadoop are effectively over. But there’s still, look, there’s still organizations that for one reason or another are not able or ready to make that transition into the cloud yet. And from an on-premise, scale-out, multi-structured data management perspective, there aren’t really good alternatives to Hadoop. So there’s still a market there, and I think there will be for the foreseeable future. But our mantra at the time was 50% of the world’s data in Hadoop in 10 years. And I think 50% of the world’s data will wind up in the cloud, not in Hadoop.
Kirill Eremenko: Probably, probably.
Greg Pavlik: But again, all these learnings and the stuff that happened there, they were super important. I mean, they really helped-
Kirill Eremenko: Oh, of course.
Greg Pavlik: … to open up a tremendous amount of value for not just the tech industry, but I think for all industries. And that was one of the interesting things with the big data landscape. We speculated at the time that there were certain industries that would be investing very heavily in big data, and a lot of industries that wouldn’t. Actually, that wasn’t the case. Retail, healthcare, finance, manufacturing, we had a really strong presence across just about every vertical. So I think it was a very important technology that we learned a lot from, but now we’re moving into a world where there is a platform, in the sense that you’ve got to manage your data and be able to access it, keep it secure, govern it. But the frameworks and tools that you apply on top of that data set are highly variable. Within an organization, the great thing about cloud infrastructure is it doesn’t really constrain you; you can run whatever you want and have it access the data in the object store.
Greg Pavlik: So for example, we did the serverless Spark infrastructure; it’s one way to access the data. But it’s not the only way. You can bring in your own frameworks, you could spin up a neural network and grab GPUs, crunch the data through a whole bunch of training exercises, release the GPUs when you’re done with the training, and maybe a month later you’re doing something different. There’s almost this infinite flexibility that the cloud opens up in terms of the tools that you can bring to bear on the problem domain. And as you know, especially with machine learning, there’s been a lot of evolution in the toolset. A lot of advances in algorithms.
Kirill Eremenko: Yeah.
Greg Pavlik: And that’ll continue apace.
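As a rough illustration of the “bring your own framework, grab GPUs, release them when you’re done” flexibility described above, here is a minimal PyTorch sketch. The network, synthetic data, and hyperparameters are made up for illustration; the only cloud-relevant part is that the same script runs unchanged on a GPU instance you rent for the duration of training and release afterwards.

```python
# Minimal PyTorch sketch: train on a GPU if one is attached, else fall back
# to CPU. The model, data, and hyperparameters are illustrative only.
import torch
from torch import nn, optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in for a real training set pulled from object storage.
X = torch.randn(10_000, 20, device=device)
y = (X.sum(dim=1, keepdim=True) > 0).float()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
loss_fn = nn.BCEWithLogitsLoss()
opt = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")

# Persist the trained weights (e.g. back to object storage), then release the
# GPU instance: you only pay for the compute while the job is running.
torch.save(model.state_dict(), "model.pt")
```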
Kirill Eremenko: And it also helps smaller companies get started faster. Because there are a lot of startups which are crunching huge data sets and even IPOing, not because they have a huge team or lots of money to spend on servers, but because they can use Amazon servers.
Greg Pavlik: Yeah.
Kirill Eremenko: Or your servers.
Greg Pavlik: Or OCI. Yeah.
Kirill Eremenko: Yeah.
Greg Pavlik: No, that’s absolutely the case. And like I said, with Hadoop there were interesting patterns we were seeing. People wanted to start doing more with machine learning and started to do more with, say, TensorFlow. The problem was Hadoop assumed that storage and compute were conjoined. They were having [inaudible 00:28:31]. So we had, at the time, seen organizations that were going out and buying Nvidia appliances, and they’re sitting next to a Hadoop cluster, copying a bunch of data out of the Hadoop cluster into this Nvidia thing. And these were expensive and unwieldy architectures to do what was becoming more and more fundamental work. Whereas now, you’re on the cloud, I can spin up a neural network on top, spin up GPUs, process the data. I don’t pre-spend anything. I spend what I use.
Kirill Eremenko: Yeah.
Greg Pavlik: There’s a lot of flexibility. And I think the economics tend to be much better if they’re done in a controlled way. I mean, the flip side to it is, if you get into the cloud and you’re not careful about matching your compute consumption to when you’re actually using it, and releasing it when you’re not, you can run up some pretty substantial bills. So-
Kirill Eremenko: Yeah, yeah. You got to be careful.
Greg Pavlik: … this almost shifts the problem, in terms of operations, from keeping infrastructure running to managing the finances of the organization. Which is healthy, I mean, that’s the way it should be.
Kirill Eremenko: Yeah, yeah. That’s true.
Greg Pavlik: And I think the same thing now with a lot of data science, is you get more and more teams looking closely at the business problem, as opposed to the algorithm problem, in, say, a typical enterprise organization. So these convergent trends are really more and more toward meeting the goals of the business versus trying to wrestle with the technology, which is where we want things to be heading.
Kirill Eremenko: Those are two very valuable insights. Thank you for that. So one of Hadoop’s problems was that it assumed storage and compute are together. By separating those out, we now have cloud platforms which are much more efficient. And in addition, using cloud platforms allows the objectives of data science and machine learning to be aligned with the objectives of the business financially.
Greg Pavlik: Well, yeah. So I think the cloud element helps quite a bit on the data science side. I think the other thing is, the state of the toolset available to data scientists has changed quite a bit. If I go back four years ago, you didn’t have things like ubiquitous AutoML. So if I’m a data scientist four years ago, even if I’m using a pre-implemented algorithm, I still have to bring a lot more art, this dark art of trying to do feature engineering, algorithm selection, hyperparameter tuning. And if you look at where things have progressed with the availability of these AutoML capabilities, the machinery and the tools around the data science toolkit can do a reasonably good job, in many cases as good a job as humans, at actually getting you to production with a good model.
Greg Pavlik: So then what does it mean for me as a data scientist? It means as a data scientist, I spend less time trying to do a lot of tweaking and tuning and instinctual adaptation of the tools and libraries, and more time focused on the actual data, understanding the data, understanding the business problem, and moving more and more into the business domain in terms of getting a focus on better results. That to me has been a big sea change, for sure. And, I mean, not to be too vendor-specific per se, but one of the great things about Oracle is, after we did the Sun acquisition, we got a large research organization.
Greg Pavlik: And so Oracle Labs, one of their main pillars of focus is machine learning. And we work really closely with the Labs group around an AutoML toolkit, which we think is getting better results than what you can get in the public domain. But we package it together with open source technologies and make it part of a collaborative platform. So if you come into Oracle Cloud, you have a platform for data scientists to work together as teams. But built into it, for free for all intents and purposes, you have all these AutoML capabilities just as a default part of the Python toolkit we provide.
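For a flavor of what AutoML automates, namely the algorithm selection and hyperparameter tuning described above as a hand-tuned “dark art”, here is a generic open-source sketch using scikit-learn’s randomized search over two candidate pipelines. This illustrates the idea only; it is not Oracle Labs’ AutoML toolkit, and the candidate models and parameter grids are arbitrary choices.

```python
# Generic sketch of automated model and hyperparameter search with
# scikit-learn. It illustrates the kind of work AutoML tooling takes off
# the data scientist's plate; it is not any vendor's AutoML implementation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two candidate "algorithm families", each with its own search space.
candidates = {
    "logreg": (
        Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression(max_iter=5000))]),
        {"clf__C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "forest": (
        Pipeline([("clf", RandomForestClassifier(random_state=0))]),
        {"clf__n_estimators": [50, 100, 200], "clf__max_depth": [None, 5, 10]},
    ),
}

best_name, best_search = None, None
for name, (pipe, grid) in candidates.items():
    search = RandomizedSearchCV(pipe, grid, n_iter=4, cv=5, random_state=0)
    search.fit(X_train, y_train)
    if best_search is None or search.best_score_ > best_search.best_score_:
        best_name, best_search = name, search

print(best_name, best_search.best_params_, best_search.score(X_test, y_test))
```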
Kirill Eremenko: Wow. Fantastic. Just before the podcast, your PR director Victoria told me about the new division that you’re heading in data science and AI. Is that what we’re talking about now or is that something else?
Greg Pavlik: Yeah, we’ve started a fairly substantial investment. Well, actually, Oracle has a lot of investment in machine learning overall. It goes from Labs all the way up through the apps. Now there’s a whole division of our applications organization that is basically just developing models for domain problems specific to the applications. So if you’re doing HCM, HR-type applications, we’ll do resume matching. Or supply chain optimization, all kinds of problems-
Kirill Eremenko: So products? Effectively.
Greg Pavlik: Yeah. We deliver… You consume the benefits of the machine learning models, but you don’t have to go build them yourself.
Kirill Eremenko: Yeah, yeah.
Greg Pavlik: And that’s always, I think that’s clearly where we’re going to see the most uptake of machine learning from end users. At the end of the day, it’s the same thing, you pick up your phone, and you’ve got image recognition and all that. You’ve got billions of people now, using machine learning models, but they don’t even know it.
Kirill Eremenko: Yeah.
Greg Pavlik: At the same time, on the cloud team, we’ve started up a fairly significant investment around data scientist enablement within the cloud infrastructure. So adjacent to the big data space, adjacent to the data warehousing space. And that’s really derivative of an acquisition that we did about two years ago, datascience.com. So we brought in this platform that allows you to take standard notebooks, standard Python libraries, stand them up and make them available for your team, but it puts a wrapper layer around them that ties it into source code control, helps you do easy model deployment. You get a manager or an administrator-
Kirill Eremenko: What does-
Greg Pavlik: … for the project.
Kirill Eremenko: What does that mean for data scientists?
Greg Pavlik: Well, so one of the things we saw a lot with data scientists is that they love open source. There’s a lot out there for free, it’s all great. And so they would grab it, they’d put it on their laptop, they’d go grab some data, and they’d start mucking around and building models, and then pop out something. And it’s, well, three months later, we’ve got a great model, but what were the datasets used? How did you get here? What was the history? Can I reproduce it? We want to, in some ways, bring the more mature practices that you would see in software development and apply them in, I’d say, non-intrusive ways for the data scientist. So if you come into our environment, you’ll start up a session, working on a notebook. It’ll be all the tools and libraries data scientists are familiar with. But you’re-
Kirill Eremenko: With open source?
Greg Pavlik: Open source, yeah, for sure.
Kirill Eremenko: That’s really cool.
Greg Pavlik: Yeah. I mean, we do provide additional libraries. We have this accelerated data science toolkit, which is a set of Python add-ons that makes it easy to connect to cloud resources. So if I want to do something like access data in a cloud-based data lake, or I want to spin up GPUs to run algorithms more efficiently, those kinds of convenience tools are there. We have the AutoML capabilities that I talked about before. And then we also have a bunch of capabilities for model explainability, and some visualization as well.
Greg Pavlik: So we do add in and augment with IP that we’ve developed, but there’s nothing that constrains you to use that. You can work with the open source tools. I think the real benefit for teams is that now there’s a single environment: you can share notebooks, you can publish models into a model catalog. So you start to bring all this governance and control and source code management into an environment. So as a data scientist, you don’t really lose anything; you have everything you like and are familiar with. But at the same time, if you’re running a data science project, now you’ve got a little bit more accountability and, I think, much better collaboration and consistency.
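As a generic illustration of the reproducibility problem being described, independent of any vendor’s model catalog, here is the kind of provenance metadata one might capture alongside a model artifact so that “what data and versions produced this?” still has an answer three months later. The file names and metadata fields are illustrative assumptions.

```python
# Generic sketch: save a model together with the provenance metadata that a
# model catalog would otherwise track for you. Not a specific catalog API.
import hashlib
import json
import platform

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Fingerprint the training data so the exact inputs can be verified later.
data_hash = hashlib.sha256(X.tobytes() + y.tobytes()).hexdigest()

joblib.dump(model, "model.joblib")
with open("model_metadata.json", "w") as f:
    json.dump(
        {
            "model_file": "model.joblib",
            "training_data_sha256": data_hash,
            "sklearn_version": sklearn.__version__,
            "python_version": platform.python_version(),
            "hyperparameters": model.get_params(),
        },
        f,
        indent=2,
        default=str,  # fall back to string for non-JSON-native values
    )
```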
Kirill Eremenko: What would you say to the comments, which I’ve heard in various forms previously quite a few times, that Oracle is more suited for larger organizations that have a large budget, or enterprise-level companies? Is Oracle suitable or beneficial… Some of the things you’re talking about are amazing. I don’t have to have GitHub separate from my Jupyter notebooks or from where I’m storing the data; all of that is integrated. That’d be really cool. But what if I’m a small organization, a startup-type level? Can I also get the benefit of these tools?
Greg Pavlik: Yeah. So it’s a great question. I mean, first of all, if you look at Oracle historically, that’s substantially true. The statement you made is pretty accurate. The cloud business, we built it from scratch, de novo. And we did it with the intention of providing a hyperscale cloud that is as accessible as an Amazon or an Azure or Google. And that was the assumption from day one. So if you want to come in as a developer, there’s a free tier, you can get started. It doesn’t cost you anything. If you’re a small organization, it’s really easy to get bootstrapped; you can get on board with a credit card and start to work in the environment.
Greg Pavlik: So there is a certain sense in which the historical on-premise portfolio really was targeted more at the enterprise level, a step up from the SMB segment. I don’t think that’s true for the cloud. In the cloud, clearly, we want to be the best at the enterprise game. And that’s really not the strength of the other players in the cloud market. But at the same time, you’ll never get there with the enterprise unless you win the hearts and minds of developers, and really your average user. And what you’ll see now is, with the cloud capabilities, our customer profile has shifted quite a bit.
Greg Pavlik: So there’s a lot of customers that were never going to be large Oracle customers, or even small Oracle customers, which have been onboarding onto OCI. Lots of startups taking advantage of our services, for a couple of reasons. One, again, even with the cloud overall, we had this advantage of what we call a gen two approach. So we brought in a lot of architects and implementers that had worked on other hyperscale clouds, and the attraction of coming in to work on OCI was, “You get a chance to solve the problems that you realized you couldn’t solve because you had engineered your way into a corner.” So it was a clean-room environment where a lot of the engineers had an opportunity to learn from the mistakes in the first generation and just do a better job.
Greg Pavlik: So we wound up with both a more efficient environment, especially strong at the network level, but also pricing-wise, I think it’s more attractive than the competitors. Again, because we have the ability to do a more streamlined implementation, really, at the base IaaS level. So that’s been a real boon for us in terms of just attracting a new set of users into the cloud. It’s not just startups, not just small businesses, I mean, it’s also individuals and developers, students. Much different than what you would have seen, certainly five years ago, in terms of the customer spread that was typical for Oracle. The other thing I will say is true in terms of the SMB segment, not just at OCI, not just on our cloud-
Kirill Eremenko: OCI is Oracle Cloud Infrastructure?
Greg Pavlik: Oracle Cloud Infrastructure, yeah. So that’s really our-
Kirill Eremenko: And that’s the same as OCP?
Greg Pavlik: … IM.
Kirill Eremenko: Oracle Cloud Platform?
Greg Pavlik: [crosstalk 00:41:26] a whole bunch of rebranding.
Kirill Eremenko: Okay.
Greg Pavlik: So the standard unified term that we use now is OCI.
Kirill Eremenko: Got you. Got you.
Greg Pavlik: All cloud services done right in the gen two approach. I will say though, we’ve also picked up quite a few SMB customers, small businesses, medium-sized businesses, just in our SaaS portfolio as well. Partially because that was a sweet spot for NetSuite which is now a part of Oracle, but even in the more conventional segments for Oracle Applications on the SaaS side. Quite a few startups, quite a few younger companies have gone with Oracle. A lot of competition with Workday and others.
Kirill Eremenko: That’s great. So by SaaS, you mean the applications you mentioned, like for instance resume matching, those types of things? Ready-made products?
Greg Pavlik: Yeah. Your HR apps, all that could be financials, could be supply chain management.
Kirill Eremenko: Okay. Okay. Very interesting. You actually answered my next question, which was about the differences with Amazon and Azure. Sounds like you’ve been able… Because you’re building it from scratch and laser-
Greg Pavlik: I think there’s two fundamental differences in my view. One is, we might say at the base infrastructure, at the IaaS layer, we’ve had a chance to really do this clean-room gen two implementation. And if you start looking at benchmarks, you look at price performance. And in fact, there’s a new price calculator that’s up on Oracle’s website. I mean, the differences are dramatic. So that’s been a big draw, not just for smaller businesses, but large businesses that are getting these huge bills from Amazon. You come in, you can do your cost calculation, and in some cases save tens of millions of dollars.
Greg Pavlik: That’s why you’ll see companies like Zoom, or others that are doing video conferencing, moving over to OCI, because they’re getting much better cost performance outcomes. That’s on the one hand. On the other hand, what those vendors are lacking, and one of the core strengths of Oracle, has of course always been this enterprise readiness at the cloud infrastructure level. From a security perspective, from a governance perspective, from an accountability perspective. But you marry that together with the apps and you really have a complete environment to run the entirety of the business. And today, to a large extent, Amazon and Azure are just missing that; they don’t have those core capabilities moving up into that SaaS or apps tier. So Oracle really does, I think, have the first cloud that it’d be fair to classify as an enterprise cloud, all in.
Kirill Eremenko: Okay, very interesting. Do you think they are catching up, Amazon and Azure?
Greg Pavlik: Well, who knows what’s going to happen with acquisitions? Organic development in this space is hard. To build out an [inaudible 00:44:36] portfolio, you’re talking about… In the mature apps vendor cases, you’re talking about decades of investment. And even the quote-unquote startups that have come in from a SaaS perspective, Workday, Salesforce, they’re no longer young companies. So it’s a big investment over a long period of time. I doubt that organic investments can fill those gaps for some of the other competitors.
Kirill Eremenko: Got you. We’ve talked a bit about trends. And we talked about big data, or Hadoop for that matter, having its rise and fall, cloud picking up, gen two cloud. We talked about data science, and how with AutoML, data science is probably going to become more of a soft-skill type of profession, where you need to get the business knowledge and understand what the questions are and how to communicate them. What other trends are you seeing in the space of data science or data management?
Greg Pavlik: Yeah. That’s a great question. One is the number of data scientists, functional data scientists, has just exploded. And that’s great. Because it means you’re… Let me go back to, say, 2014: we used to talk about data scientists being unicorns. The best you could do was go to a university and hire somebody with a PhD or master’s in statistics, and hope to train them up. The toolsets weren’t really there. So you had this really wonky problem, and that’s changed quite a bit. I mean, the tools that are available have gotten a lot more sophisticated. And then just the number of people that are capable of doing meaningful work has exploded. That for us, especially as vendors, is great, because it means we can bring more and more people onto the platform, do more and more useful workloads.
Greg Pavlik: The other thing is NLP. One of my leads for the accelerated data science toolkit I mentioned, he likes to say that text is now as fundamental for businesses as ints and floats were just 20 years ago. And there will be continued innovation, but the results that we’re seeing in terms of text summarization, topic modeling, etc., I mean, they’re infinitely better than they were a few years ago. We’ve been doing a lot of work with BERT and other techniques. And we expect to see that continue to accelerate in ways that I think businesses haven’t even yet started to tap into. Think about all the contracts, emails, documents, Word documents.
Kirill Eremenko: Phone calls.
Greg Pavlik: Everything is sitting there waiting to be mined. And I always like to say the real promise here, from an analytics perspective or from a machine learning perspective, is that you can start to answer the questions you didn’t even know you were going to be able to ask. And I think that that’s been a sea change over the last couple of years. And we’re doing… For example, one of my groups in the cognitive services area is heavily focused on text analytics. And we’ll be looking at applying that both inside of our own applications more and more aggressively, but also just opening it up to end users to use directly.
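For a sense of how accessible this kind of text analytics has become, here is a small sketch using the open-source Hugging Face transformers library, which exposes BERT-family models behind one-line pipelines. This is a generic open-source example, not Oracle’s text analytics service, and the sample contract text is invented.

```python
# Generic sketch: off-the-shelf transformer pipelines for the kinds of text
# analytics discussed here (summarization, sentiment). Pretrained models are
# downloaded on first use; this is open-source tooling, not a vendor service.
from transformers import pipeline

contract_clause = (
    "The supplier shall deliver all components no later than 30 days after "
    "receipt of a purchase order. Late deliveries incur a penalty of 2% of "
    "the order value per week, capped at 10% of the total order value."
)

summarizer = pipeline("summarization")
sentiment = pipeline("sentiment-analysis")

print(summarizer(contract_clause, max_length=40, min_length=10)[0]["summary_text"])
print(sentiment("We are very unhappy with the repeated late deliveries.")[0])
```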
Kirill Eremenko: Very interesting. Why would you say that we are seeing a rise of NLP?
Greg Pavlik: I think it’s just the convergence of enough investment, enough innovation and enough hardware-based acceleration that it’s almost like a perfect storm event. But that’s a big one. The other thing, as I say, is people are comfortable working with terabytes, petabytes of data. Again, that was hard before. So I think this big data thing continues to be important, but it’s just not constrained by a technology footprint that was hard to utilize or stand up. That’s certainly part of the cloud trend that’s enabling these use cases to unfold. I’m trying to think.
Greg Pavlik: The other thing about this is we are seeing more and more bleed-over into the conventional BI analytics side of the equation, where you’ve got people who were looking at business problems, but largely in data warehouses, largely silo-oriented, that are starting to also pull in and mind-meld with data science groups. So that’s, again, pulling the core ML capabilities closer into the lines of business in useful ways. I mean, it’s a fantastic time to be working in this space right now.
Kirill Eremenko: Yeah, absolutely. So I’m really glad you mentioned this, business intelligence merging with data science; we’re getting closer, because yeah, a lot of times it depends on your definition. People say data science and they actually mean dashboards, or they mean Tableau and Power BI and those tools. It depends.
Greg Pavlik: Yeah, that’s right. So that’s a bit of confusion that’s going on as well. On the one hand. On the other hand, that community is starting to draw from the work of data scientists more and more. So you will see ML-powered dashboards, for sure. One of the things, Oracle has got a large analytics business, the Oracle Analytics Cloud. On our data science service, you can publish models into the model catalog, and you can browse and consume those models from within the analytics tools. So you can start to build predictive analytics directly into your dashboarding and reports, with more sophisticated models than you would typically have been able to use even just a year ago.
Greg Pavlik: So there’s a kind of, it’s not so much a convergence. Just think about a Venn diagram, and you’ll see an area of overlap, an area of synergy. But at the same time, I don’t see the world of Tableau specialists suddenly becoming data scientists overnight either. I think you’ll see the intersection points. I should mention, we talk a lot about big data, but we’re also getting really good at building good models with small sets of data. There’s more sophistication in transfer learning.
Greg Pavlik: So while big data has played a role in terms of accelerating the quality of models, we’re seeing more and more the case that you can build progressively good models for your own specific problem domain with relatively small data sets. For example, let’s say you’re trying to deal with a problem that is specific to an application that you’ve developed in-house, and you’re collecting some data that you’ve got accessible within an operational database under the app; there may not be tons of data there. But if you can start to apply transfer learning techniques, you can often exploit the smaller data sets in conjunction with work that’s already been done in terms of initial seed training and get good results as well.
Greg Pavlik: So I think you’re going to see more attention paid to how to get more effective models for specific problems with less and less data as well. So I think that’s going to be another area that’s going to be hugely beneficial overall from an enterprise perspective.
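To ground the transfer learning point above, here is a minimal Keras sketch of the pattern: start from a network pre-trained on a large public data set, freeze it, and train only a small task-specific head on your own modest data. The random arrays stand in for whatever small in-house data set you actually have.

```python
# Minimal transfer learning sketch with Keras: reuse ImageNet features,
# train only a small classification head on a small in-house dataset.
# The random data here is a placeholder for your real (small) dataset.
import numpy as np
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet", pooling="avg"
)
base.trainable = False  # keep the pre-trained "seed" features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # small task-specific head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stand-in for a few hundred labeled images from the in-house application.
X_small = np.random.rand(200, 96, 96, 3).astype("float32")
y_small = np.random.randint(0, 2, size=(200, 1))

model.fit(X_small, y_small, epochs=3, batch_size=32, validation_split=0.2)
```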
Kirill Eremenko: Very interesting. So a lot of these things that we talked about, again going back to the question of enterprise versus a smaller business, are quite clear, and I even see now that as a small business, I could come onto Oracle and benefit from all of these features, especially the gen two type of cloud. The question is, apart from the compute side of things, if I have all these free tools available to me, if I can technically do things on my laptop and I can get version control through free tools like GitHub, why would I choose Oracle and stick with Oracle, as opposed to not choosing anything and just going with all the open source tools all the time?
Greg Pavlik: Yeah. I don’t think it’s either or. You want a hub in a sense, where you can bring the work together. And you want to make sure you’ve got the resources that you need to actually do training effectively. And that changes over time. So if you’re trying to roll your own, you’re stuck in this static snapshot-
Kirill Eremenko: Okay, got you.
Greg Pavlik: … versus you come into a cloud platform, you can use all the open source stuff. But you’re not constrained in the same way. You can continue to evolve, you can evolve from a hardware perspective, you can evolve from a software perspective.
Kirill Eremenko: Got you.
Greg Pavlik: As I said, for example, on the data science side of the equation, when we developed the data science service, the idea was: make sure that you’re not taking anything away from data scientists. Foundationally, open source is the center of the model. And just make sure that it works well, so that you can do a well-managed, collaborative set of projects on the one hand, and that you can share those models and outputs with other parts of the business easily. And then you can continue to leverage and take up every new wave of hardware, every new wave of software as it becomes available. So for us it’s a hub that facilitates those things, rather than a competition with them.
Kirill Eremenko: Fantastic. And I love that, because I worked a bit with… I don’t know if it’s still around. There was a provider of Hadoop called Greenplum. And they acquired Pivotal, which was I think a consulting firm. And in order to work with their instance of R on Greenplum Hadoop, you had to learn not R but PivotalR, and it was like-
Greg Pavlik: Kind of a data warehouse. Yeah.
Kirill Eremenko: Yeah. So all right that-
Greg Pavlik: Look, all these data warehouses do have a legitimate need to include libraries and capabilities for running algorithms directly on the data. I mean, if you have data in a data warehouse, there’s a time and a place for that, and all the major data warehousing vendors provide it. But I don’t think that’s the general purpose data science problem. I think that’s a specialized problem specific to the data warehousing domain.
Kirill Eremenko: Got you. Okay. Understood. So we talked a bit about existing trends and things that are becoming hot or important picking up traction. How do you see the future? If we took a snapshot of the future, in three years from now, not too far, but not too close. Three years from now, what will the future of data management look like?
Greg Pavlik: Well, okay, so let me go out a little further than that.
Kirill Eremenko: Sure.
Greg Pavlik: Because I think present trends are going to persist in the near term. And like I said, what we’re really focused on is driving people more and more toward a zero ops model.
Kirill Eremenko: What is zero ops?
Greg Pavlik: Where you’re not managing infrastructure.
Kirill Eremenko: Got it.
Greg Pavlik: We want people to basically say, “I’ve got data, I’m able to put the data under management, and I’m able to process it, with the focus being on problem solving, not on infrastructure.” And I think you’re going to see that be one of our main focuses. Same thing with our data warehousing, the autonomous data warehouse. The idea here is that the data warehouse is actually being run by machine learning models, by and large. So things that DBAs used to do, index management, tuning and so forth, the data warehouse is just getting better and better at doing itself.
Kirill Eremenko: Just to clarify, so data warehousing is the storage. Data ops is the processing.
Greg Pavlik: Yeah, I think data warehousing is… Today, when you have a data warehouse, you start up a database, typically a scale-out, multi-node database. The ops around the database at most organizations is a combination of IT and DBAs. We want to drive as much of that overhead as possible down to zero.
Kirill Eremenko: Okay, got you.
Greg Pavlik: So if you’re in a relational data warehouse, we want your focus to be how you get the most out of your data, not how much you invest in IT and in running databases and tuning databases. When it comes to these big data workloads, same idea. Put your data in object store. Things like Data Flow, with a serverless implementation, let you get the value out of the data without having to run a bunch of machinery and maintain a big IT staff to keep a bunch of clusters going.
Greg Pavlik: So I think that will continue over the next three to five years to be the major trend in the industry. I think we’ve got a big head start on both those dimensions. I think you’ll see others start to follow suit over time. The reason I said let’s look out longer than that is, I think ultimately where we want to be is to think about the cloud as your database, so to speak. So you don’t think about individual technologies for storing the data. And ultimately, you don’t think about individual technologies for processing the data. You just push your data to the cloud; how and where it gets stored behind the cloud interface is entirely a vendor problem. And then you will more and more want to be able to just ask questions about your data without having to project into technologies that are very specific to data processing.
Greg Pavlik: So you can imagine where I can come in, and I can speak to my computer, which is hooked up to the cloud, and say, “Hey, I want to see how sales forecasts compared to actuals in North America for April.” And the result comes back. Almost like when you go into Google and you type a search, you get back a result. And the algorithms in Google are trying to figure out as best they can what the most relevant results are for your need, but you lack precision. And today at least, there’s a degree of personalization, but it’s not hyper-personalized. I think over time, you’ll be able to get almost the same interaction that you have with Google, except you’ll be able to ask very specific and very sophisticated questions and get very specific and very sophisticated responses back.
Kirill Eremenko: Wow.
Greg Pavlik: The response may be a spreadsheet that comes back. Okay, thinking of it-
Kirill Eremenko: Fantastic.
Greg Pavlik: … I didn’t have to say it; it just knows I work with spreadsheets, that this is going to be the best outcome for me as a user. And you’re not looking at individual databases, you’re not looking at trying to parse through abstruse data structures and so forth. That’s the level of sophistication that you’re going to get out of the cloud in another decade or so. And the thing about that, it goes back also to language processing. If you think about speech, if you think about text analytics, just being able to say something, have that interpreted and in some sense understood, have that translated into an optimal set of queries that happen in the backend, and then come back with an optimal set of results, that will largely be driven through machine learning.
Kirill Eremenko: Well, thank you. That’s a great vision. I have one more topic that just popped to mind that I wanted to touch on: 5G and Edge computing. And what I’ve heard, I’m not an expert in this by any means, but what I’ve heard is 5G is here to partially enable Edge computing, and Edge computing is computing things locally. For instance, Siri right now won’t work if you have no internet connection, but if we have on-device computing, then it will work. Whereas Edge computing is somewhere in between: it’s between the cloud and locally in your area. So is Edge computing going to disrupt Oracle’s business model?
Greg Pavlik: No. I mean, I think in general, the capabilities of the cloud will progressively look more and more like they’re just a part of the natural landscape we work in. But you’re still going to need to do a lot of core data processing, a lot of core data management at scale, within a centralized context. What the promise is, at least in the near term with Edge computing, is that you can start to externalize what you might call auxiliary processing down toward devices. And I think 5G… I mean, 5G will be important because it’s opening up bandwidth. But it’s also going to be processing power at the Edge, which is going to be a determining factor for what we can do over time as well. But for sure, I mean, we’ll certainly see quite a bit of model execution occurring outside of a centralized context.
Kirill Eremenko: Is Oracle planning on becoming part of that Edge computing game?
Greg Pavlik: Yeah. I mean, it’s unavoidable now, so it’s all part and parcel. Right now we’ve got a whole bunch of work around digital assistants, chatbots, and so forth. Those things will be the first wave you’ll see projected more toward the Edge: app functionality, disconnected modes, etc. Those are all going to be things that we’ll see moving more and more onto the Edge. I still think it’s not going to be either-or. This is a complementary set of developments, which will allow us to do things that frankly would have been impossible today; they’ll become doable on the Edge. But it’s unlikely anytime soon that you’re going to supplant the need for internal, centralized systems. Thirty years out, who knows? That’s a different question. But in the near term, I think these are more or less entirely complementary.
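[As an illustration of model execution moving outside a centralized context, here is a minimal sketch that trains a model centrally, exports it to ONNX, and runs inference locally on a device. It assumes scikit-learn, skl2onnx, and onnxruntime are installed; the synthetic data, the four-feature input, and the file name are assumptions made purely for illustration.]

```python
# Minimal sketch: train centrally, export the model, execute it at the edge.
# Assumes scikit-learn, skl2onnx, and onnxruntime are installed; the data,
# feature count, and file name are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as ort

# --- In the cloud / data center: train and export the model ---
X = np.random.rand(200, 4).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
model = RandomForestClassifier(n_estimators=20).fit(X, y)

onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))]
)
with open("edge_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# --- On the edge device: load the exported model and run inference locally,
# --- with no round trip to a centralized service ---
session = ort.InferenceSession("edge_model.onnx")
sample = np.random.rand(1, 4).astype(np.float32)
prediction = session.run(None, {"input": sample})[0]
print("local prediction:", prediction)
```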
Kirill Eremenko: Okay. Understood. Yeah. So that wraps up all my questions, and we’re also running out of time. But before we wrap up, I wanted to ask you for some guidance. A lot of people listening to this are data scientists and aspiring data scientists who want to progress their careers and learn as much as possible. And personally, I’ve learned a lot from you today. For me, it was a very insightful conversation to get up to speed with the world of cloud, because normally as a data scientist you don’t think about it that much. You’re not up-to-date with these trends and things that are going on.
Kirill Eremenko: So what would your recommendation or wish be, if you could make one wish for the people listening to this, for data scientists, in terms of their relationship with the cloud and staying up-to-date with what’s going on in the cloud?
Greg Pavlik: Well, I think there are a lot of advantages to having a hub so that teams can work together and get more productive. The better the outcomes we get, the more auditability we get, the more control the teams have, and the more traceability we have in terms of libraries and versions and so forth, the more ubiquitous the outputs from data science teams are going to be in organizations that might otherwise have been a bit conservative about accepting work whose provenance was harder to understand.
Greg Pavlik: And like I say, I think keeping up with the processing demands of doing a lot of artificial intelligence work is going to be impossible unless you’re able to take advantage of the latest hardware as it progresses rapidly. So if I want the latest generation of GPUs, yes, some organizations will buy them if they’re building out large HPC clusters, things like that. But for most businesses it’s just not practical. So I think the cloud should be looked at as an enabling tool and not as an impediment. It doesn’t take anything away; it only makes it easier to get good results, as opposed to being stuck in the laptop-based world.
Greg Pavlik: The other thing I would say, just in general, for data scientists is: don’t be afraid of getting close to the line of business. Because again, the value of the technology is what’s going to drive investment, which is what’s going to drive innovation. So we need to continue to really be driving powerful outcomes. And I know as a technologist it’s easy for me to just get excited about technology, but on the other hand, we all need this stuff funded. So get closer to and understand the business, because we’ve seen a couple of examples. One of the organizations Oracle has worked with was the health system in the UK, and they brought a bunch of machine learning algorithms in from our Oracle Machine Learning Platform.
Greg Pavlik: And the turnaround there: they applied it to patient outcomes, they applied it to fraud detection, and within, I think, a year, with a 20-person team, they were saving something like a billion pounds plus in net savings on a year-over-year basis.
Kirill Eremenko: Wow.
Greg Pavlik: So when you can show that kind of result for an organization, where you get a return on investment that you’re just not going to see through any other mechanism, that’s really going to build up the business confidence to continue to invest and to really make sure that this whole ecosystem becomes more and more mainstream.
Kirill Eremenko: Wow, fantastic. Great advice. Thank you. Thank you very much. The cloud doesn’t take away from your experience, but adds to it. And make sure to keep the business objectives in mind. Greg, on that note, it’s been a huge pleasure. And before I let you go, could you please help us out: where can we follow you, get in touch, or learn more about Oracle Cloud Platform or Oracle Cloud Infrastructure?
Greg Pavlik: Yeah. So, periodically, I’ll put up snippets, updates, and news of interest on LinkedIn. I don’t do as good a job of it as I should, but it’s probably the easiest place to quickly follow what I’m up to when I’m not heads down in our development work. For Oracle Cloud, the easiest thing to do is just go to Oracle Cloud, open up a free account, and start to play with it. I think people will be impressed right off the bat.
Kirill Eremenko: Fantastic. Fantastic. And one final question, do you have a book that you can recommend to our listeners?
Greg Pavlik: It depends where you’re at in terms of maturity in the industry from a data science perspective. One of the books that we’ve found to be pretty helpful for our customers has been one of the O’Reilly books, Data Science from Scratch. It’s a Python-oriented book, and while I think other languages have been going down a little bit, Python’s been on the upswing. So in terms of languages to really try to get mastery of from a data science perspective, Python’s pretty much where it’s at today. Who knows whether that will still be the case five years from now. And it really walks you through building out algorithms and understanding how to really get value from data in a fundamental way. So it’s a good starting point.
Kirill Eremenko: Great, thank you. Data Science from Scratch, right?
Greg Pavlik: Yep.
Kirill Eremenko: Got you. Data Science from Scratch by O’Reilly. On that note, thank you very much, Greg, for coming on the show. It’s been a huge pleasure. I personally learned a lot, and I’m sure many, many other people will too.
Greg Pavlik: Yeah. Thanks for having me.
Kirill Eremenko: So there you have it everybody, that was Greg Pavlik, the Senior Vice President and Chief Technology Officer at Oracle Cloud Infrastructure. I hope you enjoyed this episode as much as I did, and I hope you learned quite a few things about the cloud and were able to pick up on some of the interesting trends that are going on in the world: what the future of the cloud looks like, how to compare the different vendors, and why this kind of service actually exists. What’s the purpose of encapsulating everything together? Personally, that was my favorite part of the episode: the whole notion of not just using open source tools, but having a wrapper around them that allows you to scale with time, because indeed, having your data on a laptop only takes you so far. Then you need to start thinking about, “Okay, how do I add cloud services to this? How do I add traceability or versioning of the different algorithms that I’m writing, and also of the data I’m using?” Things like that.
Kirill Eremenko: And to me, it sounds quite exciting that solutions like this, like what Oracle is providing on top of its object store, exist and can actually benefit the community. And I’m curious, too, what your favorite part of the episode was. There were definitely lots of interesting gems that Greg shared. As usual, you can find all the show notes at our website, at www.superdatascience.com/375. That’s www.superdatascience.com/375. There you can find the transcript for this episode, any materials that were mentioned on the show, plus the URLs to Greg’s LinkedIn and the Oracle Cloud Infrastructure website, where you can check out all the amazing things we talked about today.
Kirill Eremenko: And on that note, thank you so much for sharing your time today with us and for being here and learning together on this journey, hopefully the insights were exciting and interesting to you and I look forward to seeing you back here next time. Until then, happy analyzing.