61 minutes
Data Science, Deep Learning

SDS 265: Data Science in the World of Big Data

Today I talked with big data expert and educator Frank Kane about his work at Amazon and how big data and data science are finally beginning to work together to expand the field.

About Frank Kane

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

Overview

Frank, who started out as a video game engineer and flight simulator developer, worked at Amazon for almost ten years after getting a recruitment call in 2003. Back then, data science didn’t even exist as a term. He worked his way up from software engineer to senior manager, and by the end of his time at Amazon he was running the engineering department at IMDb. Wanting to escape the rain of Seattle, he moved to Orlando, where he’s been working for himself ever since. His time at Amazon was extensive and educational, especially working directly with Jeff Bezos, whom he describes as the smartest man he has ever met.

Afterwards, because of his noncompete, he began working in visual simulation before moving into educational work. He took freelance gigs to help make ends meet as a newly self-employed person after 9 years of salaried work at Amazon. One of those gigs was putting together a curriculum for data science and machine learning. From there, he started creating his own courses and built on his audience until he reached the point he’s at now. Frank teaches mainly big data, but there is overlap between big data, data science, and machine learning. Big data is the distribution of data processing at massive scale on platforms such as Hadoop and Apache Spark. Data science and machine learning grew up separately, with Jupyter Notebooks and neural networks running on a single machine, and the fields are only now converging. Frank describes the result as doing machine learning and data science in the environment and context of big data.

Frank’s work at Amazon dealt mainly with recommender systems. The widget on an Amazon page that tells you what other people bought after looking at an item came from a team Frank ran. Its core was a database of item-to-item similarities, built from a 2-D matrix of customers and items by measuring how many customers each pair of items had in common. From that starting point, you can take everything a customer has shown interest in, gather the similar items, de-duplicate and score them, and produce personalized recommendations. Today, recommender systems have been changed by the advent of deep learning. The challenge is getting a neural network to work with sparse data, something Amazon has achieved but few others have, which keeps the simpler, traditional approaches relevant for those who don’t have the resources of a company like Amazon.

Recommender systems also come with a double-edged sword of privacy vs. accuracy. You want YouTube to give you relevant content recommendations but you don’t want it to know everything about you. The tradeoff for giving up your privacy is better service and better products. The younger generations are already losing their concept of privacy online, and as this continues, the concept of privacy might become foreign. Still, what do we do with all that information people have given up? It’s a question without a concrete answer as more and more people quit Facebook every day and people start to wonder what data and information governments have access to.

As for becoming an expert in recommender systems? Be a good software engineer. Have a solid foundation in linear algebra. From that point, there’s a lot you can start to learn. On that note, Frank also served as a “bar raiser” at Amazon as part of the hiring process. He had veto authority over every hire and made sure candidates met Amazon’s quality standards. The role exists so that the desperate need for software engineers at Amazon doesn’t cause hiring managers to lower their standards. His number 1 tip for getting into Amazon is to put the customer first in your thought process and problem solving. Work backwards from the customer experience. The best thing talented engineers can do is “build a beacon” above their head. Amazon and Google will come to you if you’re talented enough and put your work out there.

In this episode you will learn:

Who is Frank Kane? [7:41]
How did Frank end up as an educator? [12:30]
Big data vs. data science/machine learning [17:20]
Recommender systems [20:19]
Advice for newcomers [41:00]
Hiring at Amazon [45:00]

Items mentioned in this podcast:

Data Science Insider
Elasticsearch 6 and Elastic Stack Course – Get free video course with code (active 1 month): elstc6-D5CF
The Ultimate Introduction to Big Data Course – Get free video course with code (active 1 month): ultbdi-2D38
Manning Publications – permanent discount code for SuperDataScience podcast listeners: podsuperdatasc19
Amazon DSSTNE
IMDb
Recommender Systems Handbook by Francesco Ricci, Lior Rokach, Bracha Shapira and Paul B. Kantor

Podcast Transcript

Kirill Eremenko: This is episode number 265 with top instructor in the space of big data, Frank Kane.

Kirill Eremenko: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.

Kirill Eremenko: This episode of the SuperDataScience podcast is brought to you by our very own Data Science Insider. The Data Science Insider is a weekly newsletter for data scientists, which is designed specifically to help you find out what have been the latest updates and what is the most important news in the space of data science, artificial intelligence and other technologies. It is completely free and you can sign up at www.superdatascience.com/dsi. And the way this works is that every week there’s plenty of updates and seemingly important information coming out in the world of technology. But at the same time it is virtually impossible for a single person on a weekly basis to go through all of this and find out what is actually really relevant to a career of a data scientist and what is actually very important. And that’s why our team curates the top five updates of the week, puts them into an email and sends it to you.

Kirill Eremenko: So once you sign up for the Data Science Insider, every single Friday you will receive this email in your inbox. It doesn’t spam your inbox, it just arrives and has the top five updates with brief descriptions. And that’s what I like the most about it, the descriptions. So you don’t actually even have to read every single article. So our team has already read these articles for you and put the summaries into the email so you can simply just read the updates in the email and be up to speed in a matter of seconds. And if you like a certain article, you can click on it and read into it further.

Kirill Eremenko: And so whether you want great ideas that can be used to boost your next project or you’re just curious about the latest news in technology, the Data Science Insider is perfect for you. So once again, you can sign up at www.superdatascience.com/dsi. So make sure not to miss this opportunity and sign up for the Data Science Insider today and that way you will join the rest of our community and start receiving the most important technology updates relevant to your career already this week.

Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen, super excited to have you back here on the show today. And the guest for today is somebody who I’ve wanted to interview for quite a while now, Frank Kane. Frank is an expert in the space of big data. He worked at Amazon for over a decade and you might actually know him quite well from his courses on Udemy where he’s one of the top instructors in the space of data science and big data. And today’s conversation was very interesting because we approached it from two spaces, from the space of data science and the space of big data. And in this podcast you’ll find out how the two areas have been different but are now slowly but surely converging into something that is very intertwined and why it is important or why it is becoming more and more important for a data scientist to be well adept in the space of big data as well.

Kirill Eremenko: Also in this podcast we will talk about Frank’s background, which was very interesting, spending over a decade at Amazon and working on lots of different systems. There you’ll find out very useful tips on recommender systems such as user-based and item-based collaborative filtering as well as other types of recommender systems and where this space of recommender systems is going. So you can probably already tell that this podcast is quite heavy on recommender systems. So if that’s your thing, then this podcast is definitely for you. And you’ll also find out why recommender systems are important across all spaces, not just in retail, so how many different industries can use recommender systems.

Kirill Eremenko: We’ll also touch on singular value decomposition or SVD, model-based methods, deep learning and Amazon DSSTNE. And finally towards the end of this podcast we will talk about hiring. So Frank had a huge say at Amazon on who’s hired and who’s not hired into the teams and he’s got some really exciting tips to share with you on this podcast. So can’t wait for you to check out all the great insights from Frank here. And without further ado, I bring to you Frank Kane, one of the top experts and instructors in the space of big data.

Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen, super excited to have you back here on the show because I’ve got a great colleague of mine and a great online instructor and entrepreneur. On the phone, Frank Kane calling in from Orlando, Florida. Frank, how are you doing today?

Frank Kane: Doing great, Kirill. How are you?

Kirill Eremenko: Doing well as well. Such an honor to talk to you again. We met at Udemy Live, I think it was last year and had some interesting chats and now we’re here on the podcast. How’s things been for you over the past almost year now?

Frank Kane: Yeah, it’s been going great. Things continued to grow and as I’m sure you know, there seems to be a boundless demand for online education in the fields of data science and machine learning and big data. So we’re all kind of riding that wave.

Kirill Eremenko: Yeah, yeah, for sure. And exciting to see new courses popping up from you. And you mentioned you’re working on some really exciting things right now. What are the courses that you’re working on right now?

Frank Kane: Well, I just released an update to our Elasticsearch course. So kind of lately I’ve been focusing on the big data side of things and Elasticsearch is a really interesting technology that kind of diverged from its original purpose. That’s kind of the cool thing about it. So you hear about Elasticsearch and you think it’s just for search engines, right? Like powering search on Wikipedia or something. But it’s sort of morphed into this tool for doing large scale data analytics and web log dashboards and things like that. So that’s the latest thing I’ve been up to. Prior to that I released a new course on recommender systems, which I imagine is something we want to talk about as well.

Kirill Eremenko: Yeah, very cool. And a big shout out goes to Manning Publications for helping us arrange this podcast. And it’s really funny, like you mentioned, they reached out to us to arrange the podcast and to promote your new work while we already knew each other. So like you said, it’s very serendipitous how these things happen sometimes.

Frank Kane: Yeah, I love that word. Serendipitous. And I mean, that’s a big part of what we do in recommender systems too, is what we call serendipitous discovery. This is like a serendipitous connection, small world kind of a thing.

Kirill Eremenko: Awesome. Awesome. Okay. So yeah, we’ve got so much to talk about. You have such a broad, I mean, such an interesting career path with your time at Amazon and how you moved to courses. So to kick us off for, I’m sure a lot of our listeners, those of you who take my courses on Udemy or SuperDataScience or those of you who take Frank’s courses, there’s a huge overlap in the sense like there’s a lot of people who already know you, but for somebody who doesn’t know you or doesn’t know you well, give us a quick rundown, who is Frank Kane and what has your career been like? Where has it taken you?

Frank Kane: Yeah, man, it’s a long story. I kind of started off as a software engineer in the video game development industry of all things. And from that, I went on to developing flight simulators and one day I got a call out of the blue from amazon.com in Seattle. And they said, “Hey, you know, we’re looking for good engineers. Do you want to do a phone interview?” I’m like, “Sure.” And next thing I knew I was moving to Seattle. Right? And they hired me into their personalization department and that’s basically what we call recommender systems today. So this is back in 2003, I think. So real early days of this field and we didn’t even call it-

Kirill Eremenko: Yeah. Data science didn’t even exist back then.

Frank Kane: That’s exactly what I was going to say. That wasn’t even a thing. That term wasn’t even coined yet, but we were doing it. It was kind of a-

Kirill Eremenko: Yeah, the minor seventh year of data science.

Frank Kane: Yeah. And we were kind of inventing it as we went. Right?

Kirill Eremenko: Yeah.

Frank Kane: So it was exciting to be a part of that. And yeah, I stuck it out at Amazon for 10 years, almost 10 years anyway.

Kirill Eremenko: Wow.

Frank Kane: And worked my way up from software engineer to a senior manager. And by the end of my career there, I was actually running the engineering department of IMDb.com, which is a subsidiary of Amazon. So that was fun. It’s a big movie website if you’re not familiar with it. But yeah, after 10 years, it was time for something new. My family was itching to get out of the rainy environment of Seattle. So we decided to make a go of it on our own and packed up and moved down here to Orlando and been working for myself ever since.

Kirill Eremenko: Yeah. Orlando is great. Right? I was there once and you guys have Universal Studios parks, theme parks there, right?

Frank Kane: Yeah. Universal, Disney World, SeaWorld. It’s definitely a fun place to be, especially if you have kids.

Kirill Eremenko: That’s awesome. How many kids do you have?

Frank Kane: Two daughters, they’re both grown up now, but when we moved here, they were still young enough to enjoy it. So it’s been fun.

Kirill Eremenko: That’s awesome. Okay. And so 10 years in Amazon, amazing. Really, really cool. And you moved there from a software engineer to senior manager and then managing a whole department at IMDb. What was that like? What was it like working at Amazon?

Frank Kane: It was exciting. I mean, the thing that I love the most about it was that you’re always surrounded by really smart people and you’re never going to have a problem finding people that are smarter than you to learn from. Right? So a lot of people say that if you’re not learning, you’re in the wrong job, right?

Kirill Eremenko: That’s true.

Frank Kane: So you’re always learning at Amazon because they’re just so picky about who they hire. And there were just some amazing people there that you can learn new techniques, new ways of thinking from, and not just in engineering too, right? Also from the business side, just being able to sort of absorb how Jeff Bezos thinks in itself is hugely valuable as well. Right? So it was tricky sometimes.

Kirill Eremenko: Got you. Did you ever get to meet him?

Frank Kane: Yeah, yeah, quite a bit. I mean, back then Amazon was a much smaller company than it is today. So we were all in the same building and you’d find yourself in the men’s room next to him for all you knew. But yeah, I had a lot of meetings with him and got to talk to him quite a bit actually.

Kirill Eremenko: What was he like as a person?

Frank Kane: He’s intense, but super smart. Definitely the smartest guy that I’ve ever met in my life and that’s saying a lot. But yeah, just his ability to sort of analyze any situation and just be right about it really quickly is pretty admirable.

Kirill Eremenko: That’s awesome. That’s fantastic. All right. And so then you moved to Orlando and you founded Sundog Software. Why the name Sundog?

Frank Kane: Oh, that’s a long story. So it actually has nothing at all to do with data mining, sorry, data science or machine learning. After I left Amazon, I had a noncompete agreement like a lot of people do, so I couldn’t really do anything directly related to what I was doing at Amazon. So instead I got into the field of visual simulation, basically making 3D simulations of clouds and weather and oceans for simulation and training products. So that’s where Sundog Software came from. A sundog, if you don’t know, it’s actually an atmospheric effect that is like a rainbow on either side of the sun, under certain conditions. So since I was building software that simulates the sky, we kind of drew our name from that because it was basically the only thing that wasn’t trademarked yet. So that is the genesis of the Sundog. It wasn’t actually named after a dog.

Kirill Eremenko: Got you. Okay. And so you were providing, creating this software for simulation and how did that morph into online education? I’m always curious about these stories, because so far nothing in your story even flagged that you are going to be a super successful online instructor. When did that transition happen?

Frank Kane: Yeah, I didn’t see it coming either. So I mean, how did it go down? Basically, after I quit Amazon and decided to go on my own, I was kind of freaking out. Right? Because I left behind these hugely valuable stock options and stuff and I came down here with enough money to get by for a while, but I was still pretty nervous about it. Right?

Kirill Eremenko: Yeah.

Frank Kane: If you’ve never been self-employed before, it’s a very scary thing to jump into. So I started doing some freelance work on the side to sort of supplement what I was getting from selling my own software that I had written. And one of those freelance gigs was actually doing curriculum development for a company called General Assembly in New York City. So they were looking for someone to put together a data science curriculum for an in-person training class that they were putting together. So I did that and somehow, because I had this Amazon pedigree, they plastered my face all over their websites saying, “This course was developed by an Amazon guy.” And basically, so what happened then was someone from Udemy was trying to recruit new instructors in the field of machine learning and data science. And they somehow found me spelunking on the Internet and gave me a call out of the blue and said, “Hey Frank, we’re looking for instructors on Udemy to teach big data and data science topics, want to give it a shot?” And I’m like, “Oh, why not? How hard can it be?” Right?

Kirill Eremenko: Yeah.

Frank Kane: Little did I know. It’s actually really hard. But yeah, that’s kind of how my first online course came to be. They reached out to me and I said, “Well, let’s give it a shot.” And the funny thing is, the first course that I made was really kind of a flop. The first month that we put it out, it made like 200 bucks or something. I’m like, “Well, all that crap.” Well, we tried. But after putting in so much effort into a course, I mean, as you know, it takes many months to put all of these things together, right?

Kirill Eremenko: Yeah.

Frank Kane: I didn’t want to give up on it that soon, right? So I’m like, “Fine, I’ll try making another course and see if I can sort of like build up on this and not give up quite yet.” And as a result of that, things started to actually take off. So it was just sort of a hockey stick of growth after that for a few years where you kind of have this compound interest effect where you make one good course and the students from that course, they’re people that you can sell your next course to and so on and so forth. And you just keep building upon that audience. Right? So that’s kind of how it all snowballed.

Kirill Eremenko: Very, very interesting. Yeah, totally, totally can relate to that story. It’s [inaudible 00:14:55] but I guess as long as you have that inner drive or you get this feeling of not just accomplishment but fulfillment when somebody takes your course and feels that they’ve learned something and that they can now use these skills and especially if they tell you about it, if they say, “Hey, Kirill or Frank, I took your course and I feel empowered to do something in my job.” Or “I actually already did something with that knowledge and I finished the project, I got a promotion or I helped a colleague learn.” It really gives you that additional inspiration to keep moving forward and not to give up. Would you say that you get that feeling as well?

Frank Kane: Oh yeah. There’s so much to keep you motivated, right? I mean, like you said, just that positive feedback of how you’re actually changing people’s lives in a positive way. I mean, what’s not to love about that? LinkedIn has been great for that. Right? Like I’ll see people posting online, “Hey, I actually got this certification because of you or I got this job because of you, or thanks for your career advice on getting interviewed at Amazon. Thanks to you, I actually got a job.” I’m like, “That’s awesome.” Everyone wants to make the world a better place. Right?

Kirill Eremenko: Yeah.

Frank Kane: Yeah, so that’s awesome. And also just the scope of the impact, right? I mean, I had no idea there was so huge of an audience for this stuff out there in the world. And if you think about how many football stadiums you’d have to fill out to put all of our students in them at one time, it’s some crazy number, right? It’s just hard to visualize even.

Kirill Eremenko: That’s crazy. Yeah, I’m looking at your Udemy profile. You have 248,000 students for those out there, it’s like almost a quarter of a million students. That’s crazy, one fit-

Frank Kane: Yeah, that’s not new to me.

Kirill Eremenko: Yeah it’s just-

Frank Kane: Then there’s also Manning and all the other platforms that we’re on too. So it adds up to quite a bit.

Kirill Eremenko: Yeah, for sure. And so what I wanted to touch on here is that our area of expertise and the area where we teach overlaps to some extent, but it’s also slightly different. So you mostly teach in the space of big data, plus how it overlaps with data science, machine learning. And that’s what I wanted to touch on. With the passing years since data science came around, big data appeared. These two have been kind of close and also the relationship between them has been developing over the years. So can you tell us a bit about that? How has the relationship between big data on the one hand, and data science and machine learning on the other hand, how has it developed over the past couple years and what is it like now?

Frank Kane: Yeah, I mean, kind of my perception of it is that they started off going in kind of their own directions, right? And now they’re kind of all starting to converge it seems. I mean, that’s kind of my high level take of it. So originally when we started teaching data science, it was all about messing around with Jupyter Notebook on your own individual PC somewhere or individual Linux host or whatever. And it’s messing around with smaller data sets. And to be fair, you can analyze a lot of data on one machine if it’s a beefy enough machine. Then we have machine learning, which is off playing around with the neural networks and stuff these days. And you can still do quite a bit on a single GPU or a machine with multiple GPUs.

Frank Kane: And then almost orthogonally, we have this world of big data where people are using things like Hadoop-based platforms like Cloudera or Apache Spark and things like that to distribute the processing of data at massive scale. And there’s been these efforts to kind of slap one on top of another. Spark has their Spark MLlib library, for doing machine learning on Spark. Obviously, tools like Cloudera have tools for doing large scale data analysis using their platforms. But it’s only recently I think that it’s starting to converge. Right? We have things like the data pipeline on… Sorry, the deep learning pipeline on Spark coming out where you can actually do large scale machine learning and deep learning on Apache Spark. So that’s coming together.

Frank Kane: We have TensorFlow being distributed on clusters, that’s coming together. So it seems like there’s still like 10 different ways to do everything, but at least we’re starting to all come together at the same thing, that it’s not just about data science, it’s not just about machine learning, it’s not just about big data, it’s about doing machine learning in a big data environment.

Kirill Eremenko: Right. Why do you think now, why is the time now that they’re converging?

Frank Kane: That’s a great question. I mean, I think it’s just sort of a natural process that’s happening. There’s definitely a lot of interest and market forces that are behind this. But really I think it’s just that these technologies have all been maturing at a similar rate and now they’re all at a point where they’re like, “Okay, how do we all get together and do something even better together?” Right?

Kirill Eremenko: Yeah. Okay. Fair enough. Fair enough. What has been your favorite course to teach? What has been your favorite topic to share with the world?

Frank Kane: Oh, I always have a soft spot for recommender systems because that was kind of what I specialized in during my time at Amazon. So if I had to choose one child that I love the most, it would probably be my recommender system course.

Kirill Eremenko: Okay. Got you. So you did recommender systems at Amazon, are you able to tell us a bit about that? To go into a bit of detail without sharing any IP or sensitive information?

Frank Kane: Yeah, I mean it was seven years ago when I left Amazon. So everything that I can tell you is well beyond the range of their nondisclosure agreements because it’s history at this point. Right? But there’s still some good stories about it that I-

Kirill Eremenko: Good, let’s talk about it. Sounds like it’s still a very relevant and really cool topic and a lot of companies really enhance their sales. Netflix, Amazon, online marketplaces, they… Even Udemy itself, right? You take a course and then you get recommended to other courses on what to take. So please do tell us about that. What was your role at Amazon? I mean, what kind of recommender systems were you exploring back then?

Frank Kane: Yeah, I mean, let’s see. I mean, originally I was working on things like “people who bought this also bought…” I actually ran the team for that for a while. So if you’re shopping on amazon.com and you’re looking at specific items, there’ll be a little widget that says, “People who bought this also bought this.” Or “People who viewed this also bought this.” Or something along those lines. So that was kind of like the heart of the whole thing and this is all published publicly, so I can definitely talk about it. So kind of like the main component of doing any recommender system back in those days was this item-to-item similarity matrix. Right? So we would take these vectors of everybody that bought a given item, right? And make this 2D matrix, and try to find similarity distances between every item based on what customers they had in common. And by doing that you can create a database. It’s basically like, “Okay, here’s item ID, whatever corresponds to this book, and it is similar to this list of other books sorted in order by similarity.” Right?
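
The item-to-item similarity table Frank describes can be sketched in a few lines. This is only an illustration under stated assumptions, not Amazon's implementation: the toy data, the function names, and the choice of cosine similarity as the "similarity distance" are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: item -> set of customers who bought it.
buyers = {
    "math_book_a": {"ann", "bob", "cat"},
    "math_book_b": {"ann", "bob"},
    "cookbook": {"cat", "dan"},
}

def cosine(a, b):
    """Cosine similarity between two binary purchase vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def similar_items(buyers):
    """Return item -> list of (other_item, score), most similar first."""
    table = {}
    for item, custs in buyers.items():
        scored = [
            (other, cosine(custs, other_custs))
            for other, other_custs in buyers.items()
            if other != item
        ]
        # Keep only items that share at least one customer.
        scored = [(o, s) for o, s in scored if s > 0]
        table[item] = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return table

table = similar_items(buyers)
```

Each entry of `table` is the "this item is similar to this list of other items, sorted by similarity" record from the conversation. Comparing every pair of items is fine for a toy catalog but would not scale to a large one.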

Kirill Eremenko: Okay. Could you tell us a bit more about that. So how is the vector created? What are the dimensions of this vector?

Frank Kane: Well, it’s a very, very sparsely populated matrix, right? So the main problem of recommender systems is that most people did not buy most items. So a given person only bought a very, very small percentage of everything that Amazon sells. Right?

Kirill Eremenko: Yeah.

Frank Kane: So, basically these are all sparse vectors that you think of as a matrix, but when you actually get down to the code of actually constructing that matrix it’s not really a 2D matrix. Basically you have customers on one dimension and items on the other dimension, right? And you just try to find how it’s all interrelated.
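
One way to picture "not really a 2D matrix" is to store only the purchases that happened and count co-purchases per customer basket. This is a minimal sketch assuming a hypothetical purchase log; the names and data are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase log as (customer, item) pairs. Only purchases that
# actually happened are stored, so the empty cells of the matrix cost nothing.
purchases = [
    ("ann", "math_book_a"), ("ann", "math_book_b"),
    ("bob", "math_book_a"), ("bob", "math_book_b"),
    ("cat", "math_book_a"), ("cat", "cookbook"),
    ("dan", "cookbook"),
]

def build_baskets(purchases):
    """Group the log into customer -> set of items bought."""
    baskets = defaultdict(set)
    for customer, item in purchases:
        baskets[customer].add(item)
    return baskets

def co_purchase_counts(baskets):
    """Count customers in common for each item pair that co-occurs at all."""
    counts = defaultdict(int)
    for items in baskets.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)

baskets = build_baskets(purchases)
counts = co_purchase_counts(baskets)
```

Item pairs that no customer bought together never appear in `counts`, which is the point of the sparse representation: the work grows with the purchases made, not with customers times items.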

Kirill Eremenko: Got you. Okay.

Frank Kane: Yeah, I mean that’s kind of like the building block for doing other cool stuff because once you know what items are similar to other items, first of all, that’s a very permanent relationship, relatively speaking. So a math book will always be similar to another math book, is how we put it. These relationships aren’t going to change overnight. So you can get away with computing that relatively infrequently. Right? And once you have that, you can actually do things like build up personalized recommendations by saying, “Okay, here’s the vector of everything that I personally have liked either by buying it or looking at it or reading it or something.” Some indication of interest, I can go out and get all the similar items that are similar to everything that I expressed interest in, de-duplicate those, score them and that becomes your personalized recommendations. So that’s what we call item-based collaborative filtering, basically.
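
The gather, de-duplicate, and score steps can be sketched as follows. The similarity table here is a hypothetical, hand-written stand-in for a precomputed one, and scoring a candidate by summing its similarities is an assumption made for illustration; the conversation does not say how scores were actually computed.

```python
from collections import defaultdict

# Hypothetical precomputed table: item -> [(similar_item, similarity), ...]
similar = {
    "math_book_a": [("math_book_b", 0.8), ("physics_book", 0.5)],
    "stats_book": [("math_book_b", 0.6), ("math_book_a", 0.4)],
}

def recommend(liked, similar, top_n=5):
    """Item-based collaborative filtering over a precomputed similarity table."""
    scores = defaultdict(float)
    for item in liked:
        for candidate, sim in similar.get(item, []):
            if candidate in liked:
                continue  # never recommend what the user already has
            # Keying by candidate de-duplicates; summing is an assumed rule.
            scores[candidate] += sim
    # Highest score first; ties broken by name for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]

recs = recommend({"math_book_a", "stats_book"}, similar)
```

Because the similarity table changes slowly, it can be rebuilt offline while `recommend` runs per request against the user's latest interests.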

Kirill Eremenko: Okay. Got you. So that was back then. How have recommender systems progressed now, in the courses for instance, you teach these days, how are they different?

Frank Kane: Yeah, I mean obviously the thing that’s changed everything has been the advent of deep learning, right? So, now the modern way of doing it is to actually build a big deep neural network. And again the challenge there is getting a neural network to work with sparse data. But Amazon for one has cracked that nut. They have a system they’ve published called DSSTNE. You can find it on GitHub, that does that and it works really, really well. I was actually very impressed with the results. But it’s still hard to beat the old school way of doing it. Item-based collaborative filtering still produces great results. So while it is true that a deep neural network can be a great tool for solving just about any machine learning problem you can dream up, these simpler approaches still give it a run for its money.

Kirill Eremenko: Yeah. And also they’re more cost effective I guess in terms of computing power and time, to create and things like that.

Frank Kane: Oh, absolutely. I mean, you’d be amazed how little computing power we needed to actually produce those item to item similarities, because it was all very highly optimized code written in C. It was really, really tight. But we used to really… A very Amazonian way of thinking is to really favor simple solutions over more complex solutions given the choice. Right? So given a solution that will run on one system versus one that’s going to run on a hundred, if the end result to the customer is going to be the same, we’re going to take the simpler solution because it’s going to be easier to maintain.

Kirill Eremenko: Yeah. Makes sense. So is that just a question of how, it’s not a same result if it’s only 80% of the original results and that’s the question. Do you use the simplest solution and get 80% of the results or do you go for the more complex one, aim for the 100% of the result? That’s kind of the trade off, but probably-

Frank Kane: Yeah. I mean we definitely spent a lot of time trying to squeeze every percentage point of improvement that we could get out of it. Because it was such a huge lever. Right? I mean, you can imagine, I think it’s been published that like 20% of Amazon sales was attributed to personalization at that time. And that’s not really the real number, which I can’t tell you the real number, but that’s the one that people talked about and it’s not that far off.

Kirill Eremenko: Yeah. It’s crazy.

Frank Kane: But, yeah, when you have a lever that big, you think about how many billions of dollars Amazon makes every month, a one percent improvement is a really big deal. Right? So if it really came down to a more complicated solution giving us a 1% boost in sales, then yeah, we would do that. But generally speaking, you didn’t have to, you know what I mean? The algorithms themselves can still be relatively simple and you can still have a simple framework for blending different algorithms together. So there are ways of experimenting and trying simple changes and simple solutions that will achieve those results.

Kirill Eremenko: Got you. And what would you say to somebody who first of all, do you think any kind of business can benefit from a recommender system or is it only just B2C?

Frank Kane: Ooh, well, I wouldn’t say any business can, but it’s obviously a useful thing. I mean, it depends mainly on the size of your catalog, right? So if you’re like the New York Times and you have like a jillion articles and somehow they’re all still timely, which isn’t actually the case. Great, a recommender system might help people find content that’s relevant to their interests. Maybe a magazine would be more of a relevant example there, but if you’re just running a little mom and pop ecommerce store where you’re selling five greeting cards that you’ve made by hand, a recommender system isn’t going to be helpful. Right? You’d be better served just manually creating those pairings, based on your human intuition, than by trying to build some algorithms. It’s not going to have enough data to work with in the first place.

Kirill Eremenko: Okay. Yeah, I know. It makes sense, makes total sense. Tell us a bit about the difference between when you have a recommender system that looks at content, like for instance, you as an individual, you consume certain content or you purchase certain items and then it looks at similarities between items to recommend to you versus recommender systems that look at your similarities as an individual to other individuals. And then it looks like what purchases they made, what content they consume and makes recommendations that way.

Frank Kane: Yeah, I mean that’s basically what we call user-based collaborative filtering, as opposed to item-based collaborative filtering. So the idea of user-based collaborative filtering is that instead of finding similar items, you find similar users by flipping the problem on its head basically. And then you recommend stuff that the similar users like that you didn’t indicate an interest in yet. That works too. The problem is that people are more fickle than things, right? So before I said that a math book will always be similar to a math book. But Kirill might not always be similar to Frank. I might go off and get interested in astronomy tomorrow and say, forget about all this data science stuff.

Kirill Eremenko: Which you are, which you are interested in astronomy, which is really cool.

Frank Kane: That is my latest side hobby for sure. But still sticking with the big data stuff for now. That’s my day job.

Kirill Eremenko: Okay. And so people are more fickle and so therefore it’s harder to create those recommender systems, is that what you’re saying?

Frank Kane: I wouldn’t say it’s harder. It’s actually exactly the same technique. Just flipping the dimensions, one for the other, but the results aren’t going to be as good, I would posit.
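
The "same technique, flipped dimensions" point can be sketched in a few lines of Python. This is only an illustration on made-up ratings, assuming cosine similarity as the similarity measure; the function names are hypothetical and this is not a description of Amazon's actual code.

```python
import numpy as np

# Made-up user-by-item ratings (rows: users, columns: items); 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

def cosine_similarity_matrix(m):
    """Cosine similarity between every pair of rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = m / norms
    return unit @ unit.T

# Same function, flipped dimensions: rows are users in one case, items in the other.
user_sim = cosine_similarity_matrix(ratings)    # users x users
item_sim = cosine_similarity_matrix(ratings.T)  # items x items

def recommend_user_based(user, k=1):
    """Score items by what similar users rated, skipping items already rated."""
    weights = user_sim[user].copy()
    weights[user] = 0.0  # a user should not vote for themselves
    scores = weights @ ratings
    scores[ratings[user] > 0] = -np.inf
    return np.argsort(scores)[::-1][:k]

def recommend_item_based(user, k=1):
    """Score items by similarity to the items this user already rated."""
    scores = item_sim @ ratings[user]
    scores[ratings[user] > 0] = -np.inf
    return np.argsort(scores)[::-1][:k]

print(recommend_user_based(0), recommend_item_based(0))
```

On a real catalog the matrix would be sparse and far larger, so a production system would not build dense similarity matrices like this.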

Kirill Eremenko: Got you. Okay. Are there any other types of recommender system, in addition to user-based and item-based collaborative filtering, more innovative or newer experimental types of recommender systems that you can share with us?

Frank Kane: Yeah, definitely. Before I forget though, on the previous point, another downside of user-based collaborative filtering is that there’s usually more users than things in a given website. So you have much greater computing requirements to actually compute user similarities than item similarities.

Kirill Eremenko: Interesting. I wouldn’t say that about Amazon though. They have so many things that they sell. I guess it’s a bit debatable question-

Frank Kane: They do. Yeah, I mean, I actually don’t know what their current numbers are, but you’re right. It’s probably not that far off at this point. They sell everything you can imagine they can sell.

Kirill Eremenko: That’s crazy-

Frank Kane: I think there’s still more people interested in things that they can buy-

Kirill Eremenko: And there’s new things that are popping up. For instance, I’m here, I’m in Bali right now and people use this thing called AliExpress from China. I’m not sure if it’s related to Alibaba or not, but then there’s also Alibaba, there’s eBay and Amazon seems to be… I was thinking about this the other day, Amazon seems to be very dominant in the US, Australia, now they are in Australia as well. Some European countries, but more in the Asia space, in the Asian market, something that people don’t recognize or realize is that there’s these other players that are gaining so much momentum, that are growing so fast, that there’s some countries here where the people haven’t even heard of Amazon and yet they’re shopping online, buying everything. Even in China, what’s it called? That platform WeChat. I think you can get anything on WeChat. You can get a car wash through WeChat, it’s ridiculous. It’s crazy how big these things have gotten and yet we just simply don’t hear about them for now until they come and start disrupting the normal world that we are used to living in.

Frank Kane: Absolutely. I mean, right when I left Amazon was when they were trying to get into the Asian market a little bit more. And I mean, it’s been a real challenge for pretty much every US tech company that I can think of. Right? I mean, it’s just a completely different political climate, completely different culture. And unless you partner with a big company that’s out there existing already, which is hard to do by the way, it’s hard to break in there for sure.

Kirill Eremenko: Yeah, yeah, for sure. All right. What about the different recommender systems like new, innovative?

Frank Kane: Yeah. I mean, kind of the thing that evolved after collaborative filtering was what we call model-based methods. So basically matrix factorization. So the idea is if you can think of the recommendation problem as multiplying two matrices together, that’s basically like your matrix of interests as an individual by some matrix that ties those interests to other things. That’s just another way of approaching the problem basically. So we have things like SVM that are used for that, SVD rather. SVD++ is a specific variation on SVD that’s used for recommender systems that has really good results.

Kirill Eremenko: What does SVD stand for?

Frank Kane: Singular value decomposition. So basically it’s a matrix factorization technique. But yeah, I mean that was basically one of the winning approaches in what they call the Netflix Prize a while ago, I don’t know if you’ve ever heard of that one.
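
As a rough picture of what matrix factorization does, the sketch below applies a plain truncated SVD to a made-up ratings matrix with NumPy. It treats missing ratings as zeros purely for simplicity; recommender-specific variants such as SVD++ fit only the observed ratings, so this illustrates the idea rather than implementing those methods. The function name is hypothetical.

```python
import numpy as np

# Made-up user-by-item ratings (rows: users, columns: items); 0 marks a missing rating.
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

def low_rank_approximation(matrix, k):
    """Rebuild the matrix from its k largest singular values (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # u[:, :k] ties users to k latent "interests"; vt[:k, :] ties those interests to items.
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# The cells that were 0 now hold estimates derived from the latent factors.
predicted = low_rank_approximation(ratings, k=2)
print(np.round(predicted, 2))
```

The number of latent factors `k` is a tuning choice; a larger `k` reproduces the known ratings more closely but generalizes less.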

Kirill Eremenko: Yeah.

Frank Kane: So Netflix put out this, I think it was a $1,000,000 bounty was it? If I remember right. For anyone that could like make a recommender system that was, I’d have to look at the number, but I think it was 10% better than what they had, measured by RMSE score. And as I recall the winning entry actually used SVD as part of their solution. It was actually more of a hybrid approach. But that was part of how they did it.
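
The Netflix Prize scored entries by root-mean-square error (RMSE) between predicted and actual ratings. A minimal sketch of that metric in Python follows; the function name and the sample numbers are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Lower is better: 0.0 would mean every predicted rating matched exactly.
print(rmse([4.0, 2.0], [5.0, 4.0]))
```

Because the errors are squared before averaging, one badly wrong prediction costs more than several slightly wrong ones.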

Kirill Eremenko: Yeah.

Frank Kane: So that was kind of like the next generation of recommender algorithms at that point. And after that we entered the age of deep learning, right? So now it’s all about, “How do I use a neural network to solve this problem?” And that’s where we get into things like Amazon DSSTNE. And that’s also how companies like YouTube are doing it as well. They published a really interesting paper that details exactly how they’re doing their recommendations, using a deep neural network.

Kirill Eremenko: Why do you think they’re not afraid to disclose their intellectual property like that?

Frank Kane: Well, I mean, they’re part of Google and Google’s always kind of had this open academia-friendly stance, right? So I think it’s mostly just a company culture thing. Plus they realize that nobody has their data. So one thing that I learned at Amazon is you can have… The quality of your data matters way more than the quality of your algorithm. In Amazon, if you know everything that everybody’s actually bought, that they’ve actually spent their money on, you’re not going to get better data on their actual interests than that. Right? So having that powerful interest data to start with means that you can do pretty much anything on the algorithm side and still get awesome results. And I think the same is probably true of YouTube as well. They actually know if you’re actually watching a video and for how long, did you actually stick with it all the way through, and they can use that view data to actually figure out what you’re actually interested in. Right?

Kirill Eremenko: Yeah. This ties into an interesting question, that value… And this is for real business owners out there and for heads of departments and executives. The value is not in your algorithms, the value is in your data.

Frank Kane: Right.

Kirill Eremenko: I find, still to this day, companies sometimes sit there and think that they’re going to create some miraculous world-changing algorithm. They’re super protective of it. They either patent it, or in most cases, from what I understand, they keep it as a trade secret so that nobody even [inaudible 00:34:50] get access to it. But realistically, we live in a world where Google publishes more than one research paper per day about machine learning, AI, computer vision, deep learning. Per day. That’s crazy. So there’s no way, and that’s all open source. So Python-based, predominantly TensorFlow or PyTorch for Facebook. Those things are open source. You can go and download them and there’s no way you’re going to beat Google.

Kirill Eremenko: There’s no way you’re going to invent something that’s so bespoke that Google’s never going to be able to create that on their side. And it’s just going to take so much resources and effort from the perspective of a small, medium, even large business. It’s just much easier to go out there, read these research papers, track what you need, apply it. It doesn’t matter that it’s open source because at the end of the day, the value’s not in the algorithm, the value is in the data that you have.

Frank Kane: Absolutely. I think another motivation for them to share this research is from a recruiting standpoint too, right? They want to get smart engineers out there, learning about how to use their systems and get excited about them and hopefully they can recruit them to work at Google. I mean, that’s ultimately their goal. I mean, that’s really the number one concern of these tech companies. They just cannot hire enough experts in these fields to meet their demand.

Kirill Eremenko: Yeah, yeah, totally. And for recommender systems, we’ve seen this evolution that you kindly walked us through on how they’ve changed. What I’m noticing is that they’re getting really good. They’re getting crazy. As a user, I go on Netflix and I… Something pops up and I’m like, “Whoa, that’s really cool. I didn’t even know that existed. So glad that I found out about this.” Or I gave this example, I think a couple of podcasts ago, where my mom has a special relationship with YouTube in that she doesn’t even search for videos herself. She just relies on YouTube to recommend things. And then she already knows she’s going to love it. And she just goes with the flow and just watches whatever recommendation comes up. And so whenever somebody else touches her iPad, she gets a bit protective of it because she doesn’t want-

Frank Kane: Yes, I have a feeling.

Kirill Eremenko: My dad’s interests in her YouTube because that’s going to mess with her recommender system. So examples like that illustrate that they’ve gotten really good, very powerful and they sometimes know us better than ourselves. What kind of future do you see for recommender systems? Where’s this whole space going? If it’s already that good, what can we expect to appear next?

Frank Kane: Well I think you’re right in that the algorithms aren’t going to get that much better. Already I would say that the difference in quality between deep learning systems and some of the older systems or matrix factorization is pretty minimal, quite honestly. It really comes down to the quality of the data, like you were saying. So the big leap forward is going to be as people amass more and more of this data to learn more and more about you. But now we’re like starting to get into this world of ethics, right? And privacy. So it’s going to be interesting times for sure. Because at the same time, we don’t want these… You don’t want YouTube to know everything about you necessarily, but you still want good recommendations from YouTube. Right? You can’t have both.

Kirill Eremenko: Yeah.

Frank Kane: So, I’m not really sure how that’s going to play out right now. It’s an interesting time for that.

Kirill Eremenko: What do you think of this notion? I was discussing this with somebody, I think a few podcasts ago as well, but I’d love your opinion on this, that 100 years from now, privacy will be such a foreign concept. People will be looking back on it and be just thinking, “Why was this even a thing? What did privacy even mean? What’s the definition of privacy?” Because we’re so rapidly moving to a world where people, especially millennials, are trading in their privacy and anything, any information they have on themselves, trading it in for better services, better products, better user experiences. And that’s not even a question to them.

Kirill Eremenko: So this whole privacy issue, from my conversations, I see it as, I’m more of a… My generation, older generations, that’s a concern for us. But the new generations are coming around, they don’t really worry about that stuff so much. So right now, yes, there’s some legal struggles and barriers that are being put in place, but there is a theory that 100 years from now there will be no such thing and everything will be completely publicly available, fully exposed. What do you think?

Frank Kane: Yeah, I mean I think like you said, the younger generations are already there. They don’t really have a concept of what privacy even means, right? At least online. They definitely want physical privacy still, but online, it’s not even a thing. It’s not a concept. What does that even mean to them? I don’t know. So I think we’re already there to some extent, honestly. The question is, what do we do with all that information that people have given up? And if government started abusing that information, to persecute people or something, then people are going to care about privacy real fast. Hopefully that won’t happen. But the other thing too is, we’re using all this personal information to… This is a very real problem right now, filter bubbles, trying to create these echo chambers online. Where we’re using a lot of the same technologies that we developed way back in the day to try to recommend better books to you to figure out what are your interests personally and how do we connect you with more news and information and people and viewpoints that are consistent with what you already like.

Frank Kane: This is how you end up in these online bubbles, right? And that’s very much a pressing issue right now. And you have people quitting Facebook because they don’t want any more part of it. So that’s what I’m kind of talking about when I say, it’ll be interesting to see where this all goes. I mean, I myself quit Facebook in January because of this stuff and I know a lot of my friends have as well. So as for millennial they-

Kirill Eremenko: Tell a bit more about that, I didn’t know you worked at Facebook.

Frank Kane: Oh, no. I mean I meant I quit Facebook as a user.

Kirill Eremenko: Oh as a user. Okay, yeah.

Frank Kane: Yeah. I deleted my account.

Kirill Eremenko: Yeah. Got you. No, yeah, definitely some of these things that are very controversial. Yeah, it’ll be interesting to see where it goes. But one question that you might be able to help guide our audience in the right direction is, if somebody wants to get into the space of recommender systems, right? There’s lots of spaces in data science, machine learning, deep learning that are, sorry and big data that are very exciting. But I guess recommender systems is one of those that is kind of on the verge of these converging or on the overlap of these converging areas that we talked about of big data. In recommender systems, there’s often these big data, there’s a lot of data.

Kirill Eremenko: At the same time machine learning and data science, it could be an interesting place for people to dive into if they want to be in between these fields. So what would your advice be for somebody who wants to get into recommender systems, but doesn’t have much experience in the space? Zero to not much. Where should they start? What should they look into? And in general, how would you recommend going about getting into this space of recommender systems?

Frank Kane: Well, I would say first and foremost to be a good software developer. When I was at Amazon, we hired software development engineers primarily. We didn’t really care what their specialization was, we just cared that they were smart enough to write code and do it well. And we figured if you can do that, you can learn anything because this stuff changes every freaking day. Right? So we didn’t really focus on hiring people for specific skills. Like in my case, they hired a guy that did visual simulation in video games and just taught him how to do this stuff when he came in. So step one is to be a good software engineer and maybe that means Python, if you want to start off easy, that’s certainly still a great choice, but just get proficient in some sort of programming, if you aren’t already.

Frank Kane: Beyond that, you’re going to need some background in linear algebra. To understand the algorithms, you need to have at least that level of mathematical background to understand what’s going on. Right? And from there you can start to actually learn the actual algorithms and techniques, either from my course or a book or however you want to do it, or online resources. Everyone learns different ways. That’s cool. And then you can actually start playing around with small datasets, on your own PC. One that I like to use is called the MovieLens dataset. I don’t know if you know that one. Basically they have… Really? Yeah, go to grouplens.org, I think it is. And they have this free dataset of movie ratings that I love to play with, probably because I used to work at IMDb. So I have a soft spot for movies but they have different sizes you can mess with.

Frank Kane: So they have like a 100,000-rating dataset and then they have a 20 million dataset. And so you can work your way up to bigger and bigger data, but you can start just playing around on small datasets, get a sense of how these algorithms work, experiment with them, try different ways of doing it. That’s really what it’s all about. Just experimenting with different ideas and different tweaks and different parameter tuning and well, hyperparameter tuning I guess, that’s the technical term for it all these days. And then you can think about scaling it up, right?
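
For anyone following that advice, a first step is simply getting the ratings into memory. The sketch below assumes the tab-separated layout of the MovieLens 100K ratings file (user id, item id, rating, timestamp); the function name is hypothetical, and the sample rows are illustrative stand-ins in that layout rather than a copy of the real file.

```python
import csv
import io
from collections import defaultdict

def load_ratings(handle):
    """Read tab-separated (user, item, rating, timestamp) rows into {user: {item: rating}}."""
    ratings = defaultdict(dict)
    for user_id, item_id, rating, _timestamp in csv.reader(handle, delimiter="\t"):
        ratings[int(user_id)][int(item_id)] = float(rating)
    return dict(ratings)

# Illustrative rows only; with the real download you would pass an open file instead.
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "196\t302\t4\t881250950\n"
)
ratings = load_ratings(sample)
print(ratings)
```

With the downloaded dataset, something like `load_ratings(open("ml-100k/u.data"))` would be the equivalent call, assuming that is where the archive was unpacked.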

Kirill Eremenko: Yeah.

Frank Kane: So then you can start to think about how do I blend this with tools like Apache Spark. If I’m going to be using neural networks, can I use TensorFlow to distribute this across a cluster? That would be kind of the final stage. And once you’re at that stage, I would say, start messing around and do some freelance work. Prove that you can actually do this and build something. And at that point you will probably be able to find a job in this field.

Kirill Eremenko: So the jobs are there, people want to hire people for recommender systems?

Frank Kane: Yeah, I mean that’s just central to a lot of the big technical companies out there, right? I mean, Amazon, we talked about huge part of their revenue, YouTube huge part of their views. Netflix, it’s what they’re all about. Their entire company is about recommendations, fundamentally. They’re just built around the whole thing. And a lot of people don’t realize that. Yes, I mean, deep neural networks are hot, but really it’s recommender systems that these companies are built around and they cannot find enough people who know this stuff.

Kirill Eremenko: Yeah, no, that’s really great advice. Thank you so much. At this stage I wanted to shift gears a little bit and talk about what you mentioned just before we started the podcast that at Amazon you were part of the hiring and recruiting process. We’d love to learn a bit more about that and maybe there’s some tips and tricks you can share for people to get hired at Amazon or maybe even beyond that.

Frank Kane: Yeah, definitely. Yeah, so part of my duties at Amazon is I was what they called a bar raiser. And this is basically a role where you spend a lot of your time doing interviews, both phone interviews and in-person interviews, mostly in-person interviews. So whenever there’s an interview loop at Amazon there was one person on that loop called a bar raiser that interviews you and it’s not necessarily someone that’s in the team you’re interviewing with or even the same department.

Frank Kane: Their role is to sort of make sure that Amazon standards for hiring are being applied consistently across the entire company. So I was that guy. So it meant that I had veto authority over every hire that came across my desk basically. And I led all the hiring discussions where we decided whether or not to hire someone. Right? So a lot of influence there. And as a result, I ended up interviewing over a thousand people, I think while I was there or some crazy number.

Kirill Eremenko: Wow.

Frank Kane: Yeah. So as far as tips go for getting into Amazon, my number one tip is to always think in terms of the customer. It’s not just lip service when Amazon says that they’re customer focused, it really does permeate their entire culture. And anytime that you can tie a question or a problem that you solved to the viewpoint of the customer, you’re going to get major brownie points. All right. So anytime you’re asked to design a system, work backwards from the customer experience, start with what will the customer get out of this system? What do they want to see? What are their requirements? How fast does it need to be for them? Right? What results do they want to see? And then figure out what technology you’d have to build to deliver that experience. Don’t start from the bottom, don’t say, “I know this cool algorithm and I would use this cool algorithm and build it out and hopefully customers would like it.” That’s the wrong answer. So, always start with the customer experience is tip number one.

Kirill Eremenko: Great tip, great tip. What else?

Frank Kane: Well, you can go online and look for Amazon’s leadership principles. And customer obsession is number one, but there’s others as well. And I would just encourage you to familiarize yourself with all of those leadership principles. The other ones are ownership, invent and simplify, are right a lot, learn and be curious, insist on high standards, think big. And really internalize what these all mean and come up with stories that you can talk about where you’ve exhibited these qualities on your own. Because again, you’re going to get a lot of interviews with managers and bar raisers like myself who aren’t necessarily part of the team that you’re going to be put on. And these are the things they’re really looking for. Do you fit with Amazon’s culture and way of thinking?

Frank Kane: Obviously you need to be technically competent as well. It’s going to be a very long and grueling day there writing code on the whiteboard and solving design problems on the whiteboard. So by all means, you have to be ready to do that. You have to have really strong coding skills, really strong systems design skills. That’s going to be the case for any interview. But what’s different about Amazon is they actually care about what they say about their values and principles that they live by. And you need to demonstrate that yourself.

Kirill Eremenko: Very interesting. What would you say has been the biggest recurring mistake that you’ve seen people make in interviews?

Frank Kane: Oh man, you’d be amazed. It’s just like not knowing how to code.

Kirill Eremenko: No way.

Frank Kane: Yeah. You’d be amazed how many, especially in phone interviews, usually they get weeded out by the time they actually come in-house.

Kirill Eremenko: Yeah.

Frank Kane: But we used to have a… Have you heard of fizzbuzz?

Kirill Eremenko: Nope.

Frank Kane: Okay. This is one of the interview questions that we use for screening out people and it’s widely known, so I’m not giving away anything secret here. The problem is this, iterate through the numbers one through 100 and write code that, if it’s an even number, prints fizz, and if it’s an odd number, prints buzz, or something like that. I forgot the exact structure of the problem, but it’s just that simple, right?
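
For reference, the commonly cited statement of FizzBuzz is slightly different from the even/odd recollection above: print the numbers 1 through 100, but print "Fizz" for multiples of 3, "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both. A minimal Python sketch of that common version, which may not match the exact wording used in Amazon's screens:

```python
def fizzbuzz(n):
    """Return the FizzBuzz output for a single number (common 3/5 version)."""
    if n % 15 == 0:  # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

if __name__ == "__main__":
    for number in range(1, 101):
        print(fizzbuzz(number))
```

The usual stumbling block is the order of the checks: testing for 3 or 5 before testing for both means "FizzBuzz" is never printed.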

Kirill Eremenko: There’s no catch trick?

Frank Kane: No, that’s it.

Kirill Eremenko: Okay.

Frank Kane: I’d say about 5% of the people couldn’t do it.

Kirill Eremenko: No Way. That’s like a five minute exercise.

Frank Kane: Yeah, yeah. You’d be amazed. So make sure you can write code guys. That’s my main tip. But beyond that, just make sure you’re well rested. A lot of people come in kind of like low energy because they flew from someplace far away the night before and didn’t have enough coffee or whatever. But you just got to have a lot of stamina to get through the day, if you do come in-house. So make sure you’re rested, drink whatever beverages you want to drink to stay alert and whatever hack you have to do to make sure that you keep your energy level up throughout a very challenging day.

Kirill Eremenko: Very interesting. So know how to code and keep your energy up. I wasn’t expecting those two tips as the most common mistakes. All right. What would you say is like the biggest, I don’t know biggest advantage of somebody who comes in for an interview if they have this skill or have this experience, or can demonstrate something that… They’re almost right away. Everybody knows, “Okay, this is the person.” Have you ever had that feeling, you see a person, you haven’t interviewed them much, but almost right away you can tell this person is going to make a great addition to the team. We definitely want them on board.

Frank Kane: Ooh. I’m always careful in those situations because sometimes your gut is wrong. Right? I mean, human brains are fickle things as I’m sure you know, now that we know how to simulate them to some extent. So I’ve been a manager long enough to know that it is very easy to make bad hiring decisions on someone that looked great on paper or came across as very charismatic. Right? You really need to separate that charisma from how are they going to be able to interact with your team? Are they going to be a “team player” that doesn’t have a huge ego to deal with, things like that. So I’ve never been in a situation where like, “Oh my God, I talked to this person once and we absolutely have to hire them right now.” But after two or three interviews, yeah, there’ve definitely been cases where I’m like, “We really got to get this person here. Pull out all the stops, make them an awesome offer. Whatever they want, give them twice that.”

Frank Kane: But when it comes to stock grants and things like that, they often had quite a bit of discretion as to what they can offer people to get people that they really wanted.

Kirill Eremenko: Got you. And what has been the most common trigger for you to use your veto power and not hire somebody that maybe even others thought was kind of those?

Frank Kane: Yeah, I mean after doing that many interviews, you kind of like learn what a good engineer looks like. And I guess the thing that would probably give me the most pause would be someone who pretended that they had more experience and knowledge than they really did. They’re kind of a little bit deceptive on their resume. I can uncover that pretty quickly. That’s not cool. So don’t do that. Or someone whose coding ability at that point just wasn’t up to snuff. Right?

Kirill Eremenko: Yeah.

Frank Kane: The main problem that… The reason that the bar raiser existed is because there’s huge pressure to hire at Amazon or any big technology company because there’s just not enough good engineers in the world to go around. And a lot of these teams are really desperate to fill positions. That is their number one goal is to just fill seats within their team and get more engineers working on whatever they have to deliver. And my role is to make sure that they don’t get so desperate that they lowered their standards. Right? So that’s what that’s all about.

Kirill Eremenko: It’s interesting, isn’t it? That there’s so many, as you say, seats in the companies and they’re just so eager to hire people and on the other hand, we have such a huge pool of candidates, so many data scientists, engineers out there who want to get hired. It’s just like the bottleneck is that weeding out process and finding the talented people, which there’s plenty of as well, but they’re rare, right? Compared to millions or hundreds of thousands of people who want to get hired. Those hundreds or dozens of people that are really talented and still also want to get hired. They really need to stand out somehow for… If they had a beacon above their head that said, “Hey, I’m talented.” You’d hire them in a heartbeat. But it’s not the case.

Kirill Eremenko: You have to go through this process. So is there anything that talented people, whom I’m assuming many of our… most of the people listening to this podcast are, they care about their careers already by definition because they’re listening to career advice on these topics. Is there anything that they can do to help recruiters such as yourself, or such as who you were back in your past life at Amazon, to identify them, to make that whole process easier and that match happen faster?

Frank Kane: Yeah, I mean, it’s like you said, you’ve got to build that beacon above your head, right? So here’s the reality of the situation. Everyone applies to Amazon and Google and all these big companies and they don’t even look at the resumes that are submitted to them because there’s just so many of them. And weeding through them all is impossible. Instead, they will come to you, right? So you want to make sure that you’ve done something that’s going to catch the attention of a hiring manager or recruiter at the company that you want to get hired at. One way to do that is to know somebody, right? So if you know somebody who already works at the company you want to work with, oftentimes they get referral bonuses, if someone that they recommend gets hired. And that’s probably the best way to get your foot in the door.

Frank Kane: So really scour your social network, scour LinkedIn, see if you know anybody or if you have a friend of a friend at the company that you want to get into because that might be your best way to get noticed. But beyond that, if you don’t, make sure you’re winning coding competitions. Make sure you have stuff on GitHub that people can find, get published. Put out a blog, make sure you’re on LinkedIn and have the right keywords that they’re looking for there. Because the recruiters are looking for you. They’re not waiting for you to come to them. Right? Beyond that, I mean, obviously the more traditional channels like college recruiting are an important source of new hires for these companies as well.

Frank Kane: Career fairs and stuff at colleges or obviously, if you graduated from Stanford, you’re probably going to get a call from all of these people, right? But not everybody can afford to go to Stanford. So for everyone else, you just have to make sure that your profile stands out online and your accomplishments are easy for them to find.

Kirill Eremenko: Fantastic. And I just want to add to that, that in the process of you putting up all these things online, whether on GitHub, on Medium, blog posts, videos, whatnot, you’re going to make connections, right? People who are already at Amazon, they’re not just sitting there and twiddling their thumbs and just doing Amazon work or whatever other company they’re in. They also go out there and they also read, they also want to know what’s new, what’s been happening in the competition space, what’s new on GitHub, what’s new… a recommender system that somebody is exploring. So inevitably the more stuff you put out there, sooner or later somebody from Amazon’s going to read it and they might ask you a question and then you talk to them and then you can build that connection.

Kirill Eremenko: So you don’t have to just go and set yourself a target of “I have to know somebody.” And even if you don’t, like Frank said, even if you just do this part of building your online presence, eventually you’ll build these connections in a very natural way. And sooner or later somebody from Amazon or Apple or whoever else you want to get into is going to come across your work. So yeah, these two go hand in hand and it’s a self-fulfilling prophecy as long as you invest the time and effort and energy into it.

Frank Kane: Yeah, I agree completely. I mean everybody at these companies is invested in hiring. It’s not just the hiring managers and recruiters, and if they come across something you’ve done online and they like it, they very well may reach out to you. So you’re absolutely right.

Kirill Eremenko: Fantastic. Well thanks a lot Frank. We’ve slowly come to the end of this podcast and super pumped about the chat that we had. Before I do let you go, please tell us a couple of places where our listeners, our audience can follow you, get to know you better and see what new things you’ll get up to in the coming months and years.

Frank Kane: Yeah, I mean, if you want to check out what I’m up to, you can head to my website, which is sundog-education.com. And from there you can follow me on whatever social media you wish. And also you’ll find… we’ve got to give a tip of the hat to Manning Publications at manning.com and you can find my couple of new courses from them under their live video tab there. The Elasticsearch 6 and The Ultimate Introduction to Big Data are found there.

Kirill Eremenko: Fantastic. And is it okay for our audience to connect with you on LinkedIn as well?

Frank Kane: Absolutely. The more the merrier. So bring them on.

Kirill Eremenko: Fantastic. Awesome. Well, Frank, thanks so much. One last question I have for you today is what’s a book that you can recommend to our listeners that’s changed your life?

Frank Kane: Ooh, that’s changed my life. The most recent one that I read is a big thick book called… let’s see. I have it right here, Recommender Systems Handbook. And it’s basically a huge collection of papers from various researchers in the field including Netflix and people like that. So as I was preparing my recommender system course, that was a hugely valuable resource for getting caught up on the current state of the art. And for someone new to the field, I think it’s sort of required reading for figuring out what’s out there and getting a broad lay of the land of the techniques that are being used today.

Kirill Eremenko: Awesome. Is this by Francesco Ricci? [crosstalk 00:58:38] up on Google.

Frank Kane: Yeah, it’s published by Springer.

Kirill Eremenko: Springer, yeah. Published by Springer. Yeah, I found it-

Frank Kane: Yes, I’ll pull it out here at my bookshelf, yeah, Francesco Ricci, that’s right they’re the editors.

Kirill Eremenko: Okay.

Frank Kane: It’s not cheap but it’s worth it.

Kirill Eremenko: Yeah, definitely. The best things in life, sometimes they’re free, sometimes you’ve got to buy them and then they’ll change your life. Okay. On that note, Frank, thanks so much once again for coming on the show and sharing all the insights and knowledge. Really cool chat and yeah, catch you soon. Maybe at Udemy Live this year. You’re going?

Frank Kane: No, I’m not going this year, but definitely next year.

Kirill Eremenko: Okay, no worries. We’ll catch up then. Thanks so much for coming on the show.

Frank Kane: All right, good talking to you.

Kirill Eremenko: Thank you ladies and gentlemen, boys and girls for being part of this conversation. My favorite part was about the convergence of data science and big data. It’s very interesting how these two fields are becoming more and more intertwined. And of course there were plenty of other great and useful insights throughout the podcast. A huge shout out goes to Manning Publications, which are hosting some of Frank’s courses. So you can find Frank either on Manning Publications or on Udemy, and if you haven’t taken any of his courses yet, highly recommend checking them out, especially if you’re interested in getting into the space of big data after today’s podcast.

Kirill Eremenko: As usual, you can get all the show notes at www.superdatascience.com/265, that’s www.superdatascience.com/265. There, you’ll find all of the resources and materials that were mentioned on this episode plus the transcript for the episode. And of course, any links to Frank’s social media where you can get in touch with him, follow his career or simply check out his courses. On that note, thank you so much for being here today. I am very grateful that you’re part of the SuperDataScience podcast and the SuperDataScience journey, and the community that we’re building. If you’re not aware yet, we actually just launched a Slack channel for SuperDataScience members.

Kirill Eremenko: So if you’re a member at SuperDataScience, you must have gotten an email. Make sure to join that Slack community that we’re building. It’s not just one Slack channel, it’s actually a multitude of Slack channels in a Slack community, where you can chat to each other, to me, to instructors. And if you’re not a SuperDataScience member yet, then make sure to check out www.superdatascience.com where we’re adding new features all the time. On that note, thank you so much and I’ll look forward to seeing you back here next time. Until then, happy analyzing.
