Jon Krohn: This is episode number 435 with Erica Greene, manager of machine learning engineering at Etsy.
Jon Krohn: Welcome to the SuperDataScience podcast. My name is Jon Krohn, chief data scientist and bestselling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today. Now, let’s make the complex simple.
Jon Krohn: Welcome to this episode of the SuperDataScience podcast. I’m your host, Dr. Jon Krohn. We’re joined today by our very special guest Erica Greene. In this fun and hugely informative episode, Erica provides us with countless pieces of highly actionable guidance on scaling up machine learning model development to a massive scale; machine learning operations, including how to avoid frightening feature drift in production; how to prioritize research and development projects for a large team of data scientists and ML engineers; the three critical areas of expertise ML engineers should try to master; and whether a Ph.D. is advantageous to professionals who apply machine learning.
Jon Krohn: This episode is particularly well-suited to practicing data scientists or ML engineers who’d like to learn how to scale their ML models up to massive, high-throughput levels. That said, we also have tons of guidance for relative beginners who would like to know what it takes to be hired as an ML engineer, an extremely in-demand and highly rewarding career specialization in the data science world.
Jon Krohn: Beyond the technical considerations we covered, Erica also drew upon her rich managerial experience to provide us with lots of thoughtful, practical tips for managers who deal with data science or software engineering projects. Wow. So much to learn from this episode. Let’s go.
Jon Krohn: All right, Erica. Welcome to the show. Wonderful to have you on. We’re going to dig right into some meaty topics. We know that you work at Etsy. Tell us about the kinds of problems that you solve at Etsy. Tell us about the kind of company that Etsy is as well. Probably, many people are aware of it, but you could do a better job explaining it than I could.
Erica Greene: Sure. Thank you again for having me. It’s good to be here. Etsy is a two-sided marketplace for selling and buying vintage and homemade goods, handcrafted goods. The company has roughly 500 to 600 engineers, a little over –
Jon Krohn: Wow.
Erica Greene: …people. Yeah. It’s pretty big.
Jon Krohn: That is 10 times bigger than I would have guessed ahead of time.
Erica Greene: Yes, it’s also not really a startup anymore. It is 15 years old. It’s based in Brooklyn, New York, and we have offices in San Francisco and Dublin. I’ve been with the company just under a year and a half. I actually worked there eight years ago right out of graduate school for a year, and then had circled back and rejoined Etsy. I manage a team of eight machine learning engineers. We work on their ad system.
Erica Greene: Etsy runs an onsite advertising platform, where sellers can choose, if they want to, to advertise on our site. We reserve real estate on our apps and our website, where sellers can advertise. We run, essentially, a full ad platform, a second-price ad auction between sellers and listings on the site. My team owns all the machine learning systems that power that auction.
Jon Krohn: Nice. Obviously, there’s a lot of companies out there that exist solely for creating these kinds of ad platforms. To have that internally, I think, it’s a non-trivial engineering task.
Erica Greene: Yes, for sure. It’s an interesting problem space because we own everything. We’re in some ways both the buyer and seller side of the ad auction. It runs at a very large scale. Etsy has been very successful. We have a lot of people on our site. We represent, essentially, both sides. We want to make sure that buyers… We show them listings that are interesting to them, that are relevant to them, that help them find the thing that they want to purchase. Then, we want our sellers… We want to give them a good return because we know if we don’t offer them a return on their ad money, they can easily go and advertise somewhere else.
Jon Krohn: Right. I understand. What kinds of problems do you need to be solving specifically as managing the machine learning engineers on the ad platform at Etsy?
Erica Greene: I refer to us as search, but harder. We have to do the search problem. Basically, when somebody is looking for something on our search, they put in a query. The queries are often very funny. I mean, they’re like, polka dot, red face mask, or something like that. Our system needs to return 24 listings that are relevant to that query, which is basically exactly the same problem as the search problem. But on top of that, our systems need to return a price with each of those listings.
Erica Greene: The onsite advertising platform at Etsy, it’s called Etsy ads, is a CPC model or cost per click. We charge sellers when a buyer clicks on their listing. We charge them a small amount. That price is not set. It’s not the same for everybody. It’s dynamic. We want to set that price such that sellers… We call it a bid, even though they’re not actively bidding. We bid on their behalf. But they bid higher if their listing is a really good match for the buyer and a really good match for the query. They bid lower otherwise because that’s how you ensure them a good return on their money. You want to bid up when you’re very, very confident that your listing is a good fit, and you want to bid down when you’re not. Those are the high-level problems that we’re solving. We’re solving the ranking problem just basically classic search, and then we’re solving the bidding problem.
Jon Krohn: I understand. Makes perfect sense. In the beginning of what you were explaining there when you’re saying we have to return a price, it isn’t the price of the item, which is just something that’s stored in a database. It’s that dynamic price for the cost per click that you need to calculate on the fly and figure out how buyers should be bidding for this based on how relevant the search is to them in particular. That does sound like an interesting problem.
Erica Greene: Yeah, sorry. Right. It’s not the price of the listing. It’s the cost; I guess the bid price. People just refer to it as the CPC, the cost per click, but the ad world is littered with terrible acronyms.
Jon Krohn: Three letter acronyms, the TLA.
Erica Greene: Yeah.
Jon Krohn: I know it all too well. Tell us some major problems that you’ve had to solve in the machine learning realm. I know we talked about some of these already before. Things like migrating models to TensorFlow and real-time neural networks for solving these kinds of problems. Do you want to tell us about that?
Erica Greene: Sure, yeah. We have two core modeling tasks that we work on. One is click-through rate prediction, which is a very classic ad tech problem. We use that to do ranking. If we’re showing a particular listing to a particular buyer at a particular time, when they’re looking for a particular thing, it’s because we think that listing is going to have a high click-through rate. That’s, essentially, analogous to it being relevant. There’s a model that we use to do that prediction. We do that in real-time for 600 listings for every query. That is one.
Erica Greene: Then, another one is conversion rate prediction. On the bidding side, we want to bid up when we think that somebody is going to actually purchase the item if they click. What is the chance that they’re actually going to purchase the item? That’s a conversion rate prediction problem. They’re similar, but they’re not exactly the same. They’re trained on historic data of what people have clicked on and purchased in the past. Etsy’s been around for a long time. This ad system has been around for eight years or something. Originally, they were using linear regression, logistic regression, to do that. They migrated to trees. This was before my time. In the past year, we’ve migrated to neural networks and moved the entire system over to TensorFlow. Each of those has been a boost in performance and accuracy. We’ve seen that.
Erica Greene: These are just classic make-the-model-more-accurate problems, but then there’s the bigger system. How do you set up the auction? How do you weigh the conversion rate and click-through rate objectives against each other? How do you scale the system? There are engineering problems there. How do you measure success? How do you weigh these different outcomes for these different stakeholders? There is the make-the-model-better problem, but then there’s also the how-do-you-tune-and-run-the-entire-system problem.
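To make the two modeling tasks concrete, here is a minimal sketch of what a neural click-through-rate model of this kind might look like in TensorFlow. The architecture, feature width, and metric are illustrative assumptions, not Etsy’s actual system.

```python
import tensorflow as tf

def build_ctr_model(num_features: int) -> tf.keras.Model:
    """Binary classifier over (query, listing, buyer) features that
    estimates P(click); trained on historic click logs."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # predicted click-through rate
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model
```

The conversion-rate model would be structurally similar, but trained on purchase-given-click labels rather than click labels.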
Jon Krohn: Yeah. I hadn’t thought about how if you have two different objective functions, one for click-through rate and then the other for purchases, to somehow combine those in a way. Is there some meta-objective function, or that’s something that you have to make decisions on outside of the models?
Erica Greene: Yeah, that’s an interesting question. We don’t run it as a meta-objective function.
Jon Krohn: I made that word up.
Erica Greene: Yeah, yeah. I mean, you could. I’ve thought about it. I think, when I first joined the team a year and a half ago, it’s a fairly complicated system. The engineers and the team… I pulled them into conference rooms for hours at a time and had them whiteboard the whole system out for me. I remember one of the first things I said was, “Why are we running this like an auction? Isn’t it just a black box, where query comes in, and we need to return listings and prices? But why do we need to have all these different parts of the system?”
Erica Greene: Part of that is historic: Etsy used to allow the sellers to pick their CPCs. They would say, “I am willing to pay seven cents per click,” or, “I’m willing to pay up to 20 cents per click,” or something like that. For various reasons, the company decided to move everything onto this auto-bidding system. But in the past, it had to be possible to set a CPC, a bid price.
Erica Greene: I mean, I think it doesn’t need to be run in different ways. It could be run as a joint optimization problem. But there’s a sort of efficiency to doing it that way. If you think of the bidding system as separate, if you thought of there being a church-state divide between the bidding side and the ranking side, then certainly it is optimal for the company running the auction if we rank based on your bid times the click-through rate. That is a weighted chance that something is going to be clicked. That’s the revenue-optimal thing, but we own both sides, so we really are trying to both get good return and good-
Jon Krohn: I totally understand. I think it makes sense to have those as a separate model.
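For readers unfamiliar with the mechanics, here is a toy sketch of the textbook generalized second-price auction Erica alludes to: rank candidates by bid times predicted click-through rate, then charge each winner the minimum CPC that would still hold its slot. This is the standard formulation, not necessarily Etsy’s exact logic, and the candidate fields are illustrative.

```python
def run_auction(candidates, num_slots=24):
    """candidates: list of dicts with keys "listing_id", "bid", "p_click".
    Returns (listing_id, cpc) pairs for the winning slots."""
    # Rank by expected revenue per impression: bid * predicted CTR.
    ranked = sorted(candidates, key=lambda c: c["bid"] * c["p_click"], reverse=True)
    results = []
    for i, winner in enumerate(ranked[:num_slots]):
        runner_up = ranked[i + 1] if i + 1 < len(ranked) else None
        # Second-price rule: pay just enough to outrank the next candidate.
        cpc = (runner_up["bid"] * runner_up["p_click"] / winner["p_click"]
               if runner_up else 0.0)
        results.append((winner["listing_id"], min(cpc, winner["bid"])))
    return results
```

Under the second-price rule, a winner’s payment depends on the runner-up, which is part of what lets a platform bid on sellers’ behalf without them systematically overpaying.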
Jon Krohn: This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience, it’s the namesake of this very podcast. In the platform, you’ll discover over two and a half thousand video tutorials, more than 200 hours of content and 30+ courses, with new courses being added on average once per month. All of that and more you get as part of your membership at SuperDataScience. So don’t hold off, sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level.
Jon Krohn: That migration from decision trees to neural networks and TensorFlow, that’s something that happened since you joined. Was that an obvious decision? Was it always something that you were confident would be successful, or was it easy?
Erica Greene: It wasn’t easy. Very few things are easy. I pushed for it. Yes. In this world where every model maps one-to-one to a library, you need to figure out how to scale each library, and it makes doing model development very slow. The company had been going along that line. We had a library we were using for logistic regression. Then, it was a new library for tree-based models. Then, if you wanted to use neural networks, you would need a new library. That meant you needed integration at all these points in the process of training the models, and evaluating the models, and then serving the models. It just made the full development process so slow.
Erica Greene: The beauty of TensorFlow and the beauty of PyTorch (and I honestly don’t have such a strong opinion between them, other than that Etsy runs on GCP, Google’s Cloud, and it just fits better if you stay in the Google world) is that you have a lot of modeling flexibility. They’re frameworks and languages, not just single implementations of models. I was very convinced.
Jon Krohn: That makes perfect sense to me. That is one of the great things about TensorFlow: it has so many modules that allow you to be training locally or on a single server, and then using TensorFlow Serving to have your in-production deployments run across a large number of servers. I imagine with the scale you have, that is critical for real-time bidding and calculating these CPCs. I imagine those performance issues must have been top of mind the whole time.
Erica Greene: Yeah.
Jon Krohn: That decision makes a lot of sense to me.
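As a rough illustration of the serving pattern Jon mentions, here is a minimal sketch of querying a model hosted by TensorFlow Serving over its REST API. The model name, port, and feature vector are hypothetical.

```python
import requests

# Assumes a server started with something like:
#   docker run -p 8501:8501 \
#     --mount type=bind,source=/path/to/saved_model,target=/models/ctr \
#     -e MODEL_NAME=ctr -t tensorflow/serving
response = requests.post(
    "http://localhost:8501/v1/models/ctr:predict",
    json={"instances": [[0.1, 0.4, 0.7]]},  # one illustrative feature vector
)
print(response.json()["predictions"])  # e.g., [[0.83]] -> predicted CTR
```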
Erica Greene: One thing I will say is that at some point, we were thinking, “Oh, TensorFlow has all these… Of course, people use it for neural networks and deep learning, but they actually do have implementations of other types of models in the library, in the framework. They do have gradient boosted decision trees.” And really, “Oh, hey. This could be great; win-win. If somebody wants to continue using these tree models, we can stay within the TensorFlow universe.”
Erica Greene: One thing we tried early on was getting that to work, and we had such problems getting it to work. I think it just isn’t well maintained and doesn’t work well with AI Platform Training. We were back and forth, I think, for months and months and months with the Google reps, trying to figure out how to get this working, how to get it working with hyperparameter tuning. So the real dream of only using TensorFlow for everything, even the things it’s not primarily used for… I don’t know. I would check that out first before convincing yourself that it’s actually going to work.
Jon Krohn: Yeah, that makes a lot of sense. I would have actually assumed that those extra libraries for decision trees would work. That’s a really good tip to look into that, if that’s one of the key reasons why you’re thinking about using TensorFlow. To my students, I often say, “You shouldn’t think of TensorFlow and PyTorch as deep learning libraries. Think of them as automatic differentiation libraries, libraries that allow you to descend a gradient and optimize objective functions to, say, find the right cost per click. They should work across any kind of model. But you don’t want to have to code up a random forest algorithm from scratch in TensorFlow if you could do it much more easily with XGBoost or something.” That is a great pro tip.
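A minimal sketch of Jon’s suggestion, using XGBoost for the tree model rather than forcing it into TensorFlow; the dataset and hyperparameters here are illustrative.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic stand-in for click logs: 20 features, binary clicked/not-clicked label.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# Gradient-boosted trees from a library built for them.
model = xgb.XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X, y)

print(model.predict_proba(X[:5])[:, 1])  # predicted click probabilities
```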
Erica Greene: Yeah. This universe is changing so quickly. You have TensorFlow 1. There’s TensorFlow 2. There are different versions of each. I don’t know what your students are using for training, but we’ve been using the AI Platform, which is just a Docker image with TensorFlow, NumPy, and whatnot built in. Then, you can run it in the cloud to train a model, but that has certain versions pinned. You get into this, and then these less common implementations of models are not necessarily maintained. Google is not great at keeping things backwards compatible. It’s not that they never worked, but they don’t necessarily work for the setup that you have.
Jon Krohn: Yeah. When people ask me, they’re like, “Are you ever worried that machine learning is going to put you out of a job?” I’m like, “No, absolutely not because there’s all these version issues that machines couldn’t possibly figure out.” Maybe they’ll figure out how to model perfectly, but they won’t be able to get everything up and running and installed.
Erica Greene: Yeah, yeah, yeah. Although that’s not really the fun part of the job. But yeah, it is [inaudible 00:17:55].
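One lightweight defense against the version churn being described: have the training script assert the library versions it was developed against, so a silently updated image fails fast instead of training a subtly different model. A minimal sketch; the pinned versions are illustrative.

```python
import numpy as np
import tensorflow as tf

# Versions this training code was validated against (illustrative pins).
EXPECTED = {"tensorflow": "2.3.0", "numpy": "1.18.5"}

assert tf.__version__ == EXPECTED["tensorflow"], (
    f"TensorFlow {tf.__version__} != expected {EXPECTED['tensorflow']}")
assert np.__version__ == EXPECTED["numpy"], (
    f"NumPy {np.__version__} != expected {EXPECTED['numpy']}")
```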
Jon Krohn: Yeah. I think that doing it the way you’re doing it is the best way to do it: to have everything inside a Docker container and maintain the versions within. Absolutely. You get into these situations where you have a big version update. At my company, a couple of years ago, we made the switch from Python 2 to Python 3. It was a huge engineering undertaking to make sure that our production systems were going to work as well or better in Python 3. Those are the biggest changes, but even, as you say, TensorFlow 1 to TensorFlow 2… Did you start right off the bat when you were at Etsy with TensorFlow 2, or did you have to make that TensorFlow 1-2 transition as well?
Erica Greene: We had to make that transition. At the time, AI Platform Training, I think, did not support TensorFlow 2, and so we did it in TensorFlow 1. We have had to make that transition since. It has been much more painful than I expected. We have trained models in TensorFlow 2, just made the small code changes to make them compatible with the new APIs, and then found that the predictions… We were predicting on listings, and the prediction for every listing was shifted by 0.15 or something.
Jon Krohn: Wow.
Erica Greene: Exactly shifted for everything. We-
Jon Krohn: You’d never expect that. If things break, you’re like, “Okay.” But for things to work, but be off by a little, that is bizarre.
Erica Greene: Yeah. It’s really bizarre. Because this system has a bidding side and a click-through rate side. Then, we take these two different numbers coming from either side, and we multiply them together and do a little logic on them. Actually, them shifting is [inaudible 00:19:42]. There’s a page on the TensorFlow Google website of all the things that have changed between TensorFlow 1 and TensorFlow 2. Some of them are very low-level, very frustrating. Like, they calculate the loss as the sum of the losses of all the examples in each mini-batch instead of the average, or vice versa, but it changed. Or in the Adam optimizer. This is the stuff that you don’t expect to be changing. If anyone from Google’s listening, maybe they could tell me why that happened.
Jon Krohn: Yeah. I always think of it, in my mind, as just an API change in how you’re interacting with things behind the scenes. I did not even know that. That is hugely important to know. Maybe I need to go back and look at some changes that I made in the past from TensorFlow 1 to TensorFlow 2, because I was not aware of things changing under the hood like that, like how values are being calculated. I’ll note that down.
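Two small habits help guard against this class of migration surprise: pin behavior explicitly rather than relying on defaults that may change between major versions, and keep a frozen batch of inputs with predictions saved from the old version so the migrated model can be checked against them. A sketch, with hypothetical fixture file names:

```python
import numpy as np
import tensorflow as tf

# Pin the loss reduction explicitly instead of trusting the default,
# which is exactly the kind of setting that shifted between TF 1 and TF 2.
loss_fn = tf.keras.losses.BinaryCrossentropy(
    reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)

# Regression test: the migrated model must reproduce predictions saved
# from the pre-migration version on a frozen batch of inputs.
frozen_batch = np.load("frozen_batch.npy")           # hypothetical fixtures
expected = np.load("pre_migration_predictions.npy")
model = tf.keras.models.load_model("migrated_model")
np.testing.assert_allclose(model.predict(frozen_batch), expected, atol=1e-3)
```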
Erica Greene: Yeah. Yeah. The backwards compatibility stuff is… It is important when running an enterprise service. We run in Google Cloud at Etsy. We have been increasingly using Google managed services on my team. It’s different across different teams in the company. But all these different pieces fit together: Dataflow, AI Platform Training, and the hyperparameter tuning service, which is another managed service. They’re great. They’re incredibly powerful, but the versions change very quickly.
Jon Krohn: Yeah. Something that you’ve mentioned, now, managed services, and you talked about the AI Platform. I’m actually not very familiar with that. Are those the same thing? Are they related?
Erica Greene: Oh, yeah. Sure. I just meant managed services as a broad category of these cloud-based services, where you’re not hosting your database, even if you’re renting the servers. You’re not running the database. They’re running the database, and then you’re calling out. You’re using it. Dataflow is their service for doing large-scale, parallelized computation on data.
Erica Greene: Then, they have this service called AI Platform Training. It used to have a different name, but I can’t remember it. But basically, it’s a wrapper. If you’ve got a Python script that trains your model, you can actually call that same script and run it locally. The real benefit is that you can run it in a Docker container locally very easily, their Docker container with all their pre-installed data science libraries, or you can run it in the cloud.
Erica Greene: You can get hardware to run it on. You can run it on GPUs. You can run it on big machines with lots of memory or smaller machines. There’s a web UI. You can log in and see how your job is progressing and see the logs of what’s going on. You can basically run any arbitrary Python script, as long as you have the libraries. You can add different libraries if you want them included in the Docker container. This is great. This means that you don’t have to SSH into some computer somewhere. You don’t have to keep those resources running. You can scale up and scale down. You can run against lots of hardware, GPUs, TPUs, and whatnot.
Erica Greene: It’s been great. The other real benefit of this service is that they have the capability to do hyperparameter tuning. If you can run your model and train it on the AI Platform, then you can run it 100 times. You just specify a config with the arguments that you want to tune, and it will even do something smart, some Bayesian search of the parameter space, and give you all the runs and give you all the metrics.
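For reference, the tuning spec Erica describes looks roughly like this. Below is the Python-dict equivalent of the config you would pass to AI Platform Training; the metric tag, parameter names, and ranges are illustrative, not Etsy’s.

```python
# Equivalent of the config.yaml passed via
# `gcloud ai-platform jobs submit training ... --config config.yaml`.
hyperparameter_spec = {
    "trainingInput": {
        "hyperparameters": {
            "goal": "MAXIMIZE",
            "hyperparameterMetricTag": "val_auc",  # metric your trainer reports
            "maxTrials": 100,                      # "you can run it 100 times"
            "maxParallelTrials": 5,
            # Default algorithm is Bayesian optimization over the search space.
            "params": [
                {"parameterName": "learning-rate", "type": "DOUBLE",
                 "minValue": 0.0001, "maxValue": 0.1,
                 "scaleType": "UNIT_LOG_SCALE"},
                {"parameterName": "hidden-units", "type": "INTEGER",
                 "minValue": 32, "maxValue": 512,
                 "scaleType": "UNIT_LINEAR_SCALE"},
            ],
        }
    }
}
```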
Jon Krohn: Yeah. I love that. Anytime I come across a Bayesian way of doing anything, including hyperparameter tuning, I think that’s a smart way to do it. That’s the smartest way we could possibly be doing this. Well, that’s awesome. I didn’t know about that AI Platform. I’m going to check it out because I spend way too much time SSHing into servers and not enough time up on GCP.
Erica Greene: Yeah. Yeah, I think it’s great. I had a big argument with my partner about this. He runs a research lab. He has these computers. He has his students SSH into them. I was like, “This is crazy. You can’t scale that.” But he doesn’t have hundreds of students. I’m interested in making sure that the tooling that we have for doing model development and training is available to very large teams. I don’t want everyone to SSH into their own personal servers.
Jon Krohn: Yeah. Thanks a lot, Google, for another dinner ruined. Nice work. All right. That was hugely interesting: the problems that you’re solving at Etsy, these migration issues we dove deep into, the AI Platform. Another thing that I would love for you to talk about is interesting failures. For example, data quality issues that you’ve had.
Erica Greene: Yeah. Data quality issues. Fascinating. We’ve had some interesting failures. I can chat about them. The most interesting ones, recently, have fallen into the category of feature distribution changes. We just didn’t have the monitoring in place. This was over a year ago now. It still feels traumatic. It was very, very difficult to identify and then to track down. Basically, it looked like, all of a sudden, performance in production and the production online metric starts going down. Like, “All of a sudden, fewer people are clicking on our app. That’s strange.” Then, trying to figure out what’s going on, identifying that the offline model accuracy has decreased, and then trying to figure out what happened. If we didn’t change the model code, we didn’t change the model, what could have happened?
Erica Greene: Finally, we traced it down to one feature. It had been assumed that the distribution of that feature was never going to change. It was coming from a different system. It was an incorrect assumption, but I don’t blame the person for making it. The distribution changed by 10X, and we were training these models-
Jon Krohn: Wow.
Erica Greene: …on 30 days of data. We were training it every night. All of a sudden, this one feature is essentially becoming incredibly noisy. It was, unfortunately, a useful feature. The model had put a high weight on it. It was totally messing up the predictions. It was messing up the model. It was a really difficult problem to recover from because we have these data pipelines that generate our training data. They’re fairly slow to rerun. Rerunning them on all the data, on 30 days, is expensive. We couldn’t really recreate the old data, and we couldn’t go back to the other system and make the change back. Eventually, enough time had passed that we had enough days of data that a model trained only on the newest data, without this feature coming from two different distributions, did well enough. Essentially, we switched to that.
Erica Greene: But these silent changes in the distributions of the features coming in keep me up at night. We’re working on monitoring everything. We have a long way to go. That’s one of those things that I put solidly in the ML engineering problems and not traditional data science problems. I didn’t know this term until the last year, but [crosstalk 00:28:05] MLOps.
Jon Krohn: Oh, damn.
Erica Greene: Were you going to guess that?
Jon Krohn: No. Because you’ve been talking about features changing, I thought you were going to say feature drift specifically.
Erica Greene: Oh, feature drift. No, no, just the area that-
Jon Krohn: ML-
Erica Greene: MLOps.
Jon Krohn: Yeah.
Erica Greene: Which is DevOps for ML. This is a classic MLOps problem.
Jon Krohn: Yeah. I love that you’re talking about feature drift and MLOps. Because on a recent episode of the podcast with Ben Taylor from DataRobot, we were talking about this. We were talking about data science trends that we think are going to accelerate in 2021. These were two of the main topics that we talked about. It’s great to have you reinforcing that these are the things that are happening in the real world. I think, probably, you’ve done a thorough job of explaining what this is for our listeners, but feature drift is this phenomenon that is common in the real world. If you’ve been a data scientist that has worked with some classic data science dataset, like the MNIST digits or the ImageNet database, the data don’t change. You can, over decades, in the case of MNIST, come up with gradually better and better models for handwritten digit recognition.
Jon Krohn: For things like handwriting, that probably doesn’t change semantically over time. Though, you can imagine that maybe the sensors that you’re using for reading handwritten digits could change, all of a sudden at some point, or maybe gradually. Basically, in the real world, you don’t have a fixed dataset. When you’re at Etsy, you’re dealing with products that are constantly changing, you’re dealing with behavior that’s constantly changing, and so a model that is built for one set of data may eventually not fit the data as well as it could because the underlying features change. It sounds like, in this case, one of the inputs changing by 10X is obviously going to make a big impact on the model given that, like you said, it was weighted highly.
Erica Greene: Yeah. I think slow drifts don’t keep me up at night. We retrain our models fairly regularly, and over a fairly short window, I mean several weeks or months. But the real thing that keeps me up at night is that there is some feature that is tied not to direct user input, like user preferences changing or something like that, but to another system, and that system changes. Etsy’s a large company. There are lots of engineers. People are not aware that the data that they are generating is being used to train models. Generally, those worlds are fairly separate.
Erica Greene: Another example (this hasn’t happened, but it very well could) is if we’re using some taxonomy, some structured data that we’re asking our sellers to fill out about their listings. Then, there’s some team in charge of making that better or easier, and they make a change, and many layers downstream from that change, our features are suddenly radically different. That is the thing that really is worrisome.
Jon Krohn: What keeps you up at night is somebody’s not telling you that something major has changed.
Erica Greene: Yeah. I think everybody just checking in with each other before making a change that impacts one of our features is not a scalable approach. We should have monitoring and alerting, et cetera, et cetera, to identify [inaudible 00:31:58].
Jon Krohn: Exactly. That’s what we were talking about with Ben Taylor on that recent episode as well: that you can monitor. This is something for the listeners. If you’re worried, if you, too, are losing sleep, you can get a good night’s rest yet again by having monitoring in place and making sure that the distributions are fitting the particular parameters that you were expecting, the assumptions you were making. If the features drift away from those expected distributions sufficiently, then you’ll be informed.
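A minimal sketch of the kind of monitoring being described: compare each day’s live feature values against a trusted reference window with a two-sample statistical test, and alert when the distributions diverge. The test choice and threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_has_drifted(reference: np.ndarray, live: np.ndarray,
                        alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test between a trusted reference window
    of a feature and today's live values. Returns True when the distributions
    differ enough to page someone before the nightly retrain picks it up."""
    _statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha
```

A 10X shift like the one Erica described would trip this check immediately; at very large sample sizes, subtler drifts are better judged by thresholding the KS statistic itself rather than the p-value.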
Erica Greene: Yes.
Jon Krohn: Awesome. Tell us about how you decide what to work on. How do you decide as a team? When you’re presented with particular problems to solve, how do you decide which ones are worthwhile, which ones are worth tackling?
Erica Greene: Yeah. That’s a great question. I think it’s challenging. I think it’s challenging in the same way that doing product development is challenging. You’re given a product. There are a million things you could change, but what’s high opportunity versus the cost? A lot of times, with teams of data scientists or teams that own ML systems (these are fairly technical, algorithmic systems), the way they decide what to do, and I have seen this at other companies, is that the people on the ground go out and read papers or read blog posts or listen to podcasts about data science and hear something that’s buzzy. Then-
Jon Krohn: Got to make sure I get some feature drift sensors in. That’s a product of data [inaudible 00:33:39].
Erica Greene: Then, they come back, and they get excited about it, and, in the better case, they write up a proposal, and in the less great case, they just start coding it. Then, you have a hodgepodge of darts that you’ve thrown at a dartboard. Some of them work out, and some of them don’t work out. You can get lucky in that approach, where more work out than not. It doesn’t engender a lot of confidence, as a manager, to just be like, “Well, we have eight random chances. We’re hoping that some of them hit.” I don’t know.
Erica Greene: I’ve tried to take a little bit more of a strategic approach to this. We’ve actually been running a discovery process not dissimilar from what people do for product discovery, a classic design-thinking type product discovery process. So what does that look like? It looks like we brainstorm high-level areas that we could go into. What about incorporating more image processing, as a broad scope thing? What about more personalization? What about approximate nearest neighbors, or what about better candidate selection, some more sophisticated things there? We brainstorm roughly eight of them. Then, we take a week or a little bit more, where we split into teams, several people in each one of these areas. The goal output is a one-pager, which is a go/no-go recommendation, and an argument about why go or no-go. What we’re looking to do is not to try to get as far as possible in the implementation. It’s to pick the areas of highest risk, pick the areas that we know the least about, and go out and try to get answers to those questions.
Erica Greene: That might look like prototyping it. It might look like running a load test. It might look like going and talking to somebody who’s worked in that area, inside or outside the company. It might look like reading papers. We spend a week and a half in aggregate on that, twice a year or once every three quarters or something. Out of that come bigger bets. We’re going to take a bet on images. Why? Because we have a one-pager, and we can make that argument to you.
Erica Greene: Currently, no one has asked me for them. We have them ready. It’s for us. It’s to convince ourselves. We still read papers. We have a reading group. We still get excited about buzzy things, but we try to be a little bit more strategic about it.
Jon Krohn: I love it. That all sounds so amazing. That’s the kind of thing where I’m like, “I would love to be a part of that process. Wow, that sounds like so much fun.”
Erica Greene: It’s fun. Yes. But then you have to hold yourself to it. That’s the harder part.
Jon Krohn: Then, you’re like-
Erica Greene: [crosstalk 00:36:55]-
Jon Krohn: …this didn’t turn out like I wanted it to be-
Erica Greene: Or they’re just like, “What I really want to do is implement that paper I read last week.”
Jon Krohn: Right. But yeah, I guess that’s the whole point: to make sure that the work that your team is doing is grounded in practical business realities, even when it’s using potentially some shiny new state-of-the-art approach, making sure that there’s a good basis for that. I think that’s brilliant. I think that that thinking doesn’t happen enough, and it’s great that you’re spearheading it. Yeah, absolutely brilliant. Thank you for telling us about that because it’s another action item that I should be taking out of this podcast and doing.
Erica Greene: Yeah. I basically think: if I were my boss, and I were pretty skeptical about what I was doing, how would I convince myself? You can’t 100%. It’s different in that way from classic engineering work. Not that classic engineering work is no risk, but a lot of things you know you could do. It might take more time. It might take more money. There are still bad engineering decisions-
Jon Krohn: [crosstalk 00:38:13]-
Erica Greene: If you chose React versus choosing Angular, you can make a [inaudible 00:38:18]. Certainly, one may be a better answer, but either one will get you to the finish line. With this data science stuff, it might not work. There is, I think, an inherently higher level of risk. But I think you can do things to try to mitigate that risk.
Jon Krohn: Brilliant. Yeah. I mean, that’s, again, another one of the trends for 2021 that we talked about in this recent episode with Ben Taylor, which was that we need to be focusing on… It’s all too easy with AI and machine learning projects to have them spiral out of control and not lead to any tangible outcomes. That’s going to happen some proportion of the time.
Jon Krohn: Elon Musk recently said, and I’m going to completely butcher the quote, but it was the idea of, “If you’re not failing, you’re not pushing research and development.” You’re not doing something innovative unless some proportion of the time you’re failing. We have to accept that. But following a process like the one you outlined, Erica, being thoughtful about those risks, I think that this is a really great idea. I love it.
Jon Krohn: I had a really interesting conversation with you recently about engineering at scale. I guess we touched on this a little bit earlier on by talking about the Google AI Platform. But what can you do at Etsy to allow the models that you build to do real-time bidding on your huge platform?
Erica Greene: Yeah. Almost all the credit for this actually goes to a team that I don’t run. We have a really wonderful team, we call it ML Platform, and they build, own, and maintain the infrastructure to serve models in real-time. We build on top of that. We rely on that and use that. I recommend you talk to them [inaudible 00:40:27] that particular thing, but it’s Kubernetes clusters, scaling up, et cetera, et cetera. Because that team has this part basically solved, the stuff I think about more is model development at scale-
Jon Krohn: Perfect.
Erica Greene: …and ML systems at scale, instead of the infrastructure to do it. Some things are hard to scale, as I have seen working in this industry at different places. One is just collaborating on an experiment or on a model. It tends to be really difficult, and there’s not, I think, a lot of guidance or many resources for doing it. We often see people just working in silos, not because their collaborators are bad communicators but because the tooling to do it collaboratively just isn’t there in the same way it is for classic software engineering.
Erica Greene: You try 10 different things. How do you do that? Are they different branches? Are they different commits? Where have you recorded them? Do you record them on [crosstalk 00:41:44] spreadsheets that are 150 lines long, with experiments only the person who wrote the spreadsheet could possibly decipher? You start [crosstalk 00:41:53]-
Jon Krohn: If they’re lucky.
Erica Greene: If they’re lucky. How do you map it back to the commit in code that produced it? All of that is hard to scale. You end up writing a bunch of code that could be shareable but isn’t. I don’t know. I just think that the best practices aren’t really there yet. So we migrated everything to TensorFlow. We’re writing all of these transforms in TensorFlow Transform now. We’re just writing them for our team, but there are lots of other teams building models. We’ve been talking, at the company, about having a shared TensorFlow Transform repo and having it be pulled in as an external library. It is, I think, the right thing to do, but it takes some tooling and commitment from these different teams, so that there’s a canonical transformation for the canonical data from Etsy. We have an in-house tool for featurization right now, and there are lots of teams that are going to want to reimplement the standard featurization functions that this in-house tool has. We should do it once, not 10 times.
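For the unfamiliar, a TensorFlow Transform “transform” is just a preprocessing function, which is what makes it natural to share from one repo. A minimal sketch, with illustrative feature names:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Canonical feature transforms, written once and imported by every
    team's model instead of being reimplemented ten times."""
    return {
        "price_z": tft.scale_to_z_score(inputs["price"]),
        "query_ids": tft.compute_and_apply_vocabulary(inputs["query"]),
    }
```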
Erica Greene: I’ve been thinking a lot about third party… or not third party necessarily, but just offline experiment tracking tooling. There are now several companies that have offerings in this space. We just started working with one. I see that as a trend in 2021 and into the future because the spreadsheets are no good if you really want to [inaudible 00:43:31].
Jon Krohn: I mean, if you’re comfortable saying, what was the vendor that you ended up going with? There are lots of startups out there that do this ML flow management. I think MLflow is either the name of one of the libraries or one of the companies that does-
Erica Greene: Yeah. MLflow; Weights & Biases is another one, and then Comet is another one. We ended up going with Comet. But I think it’s one of those things where going from zero to anything is night and day. It’s like, if you weren’t using version control and you started using version control: it’s night and day and such a huge win.
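Erica’s team went with Comet, whose API differs, but the open-source MLflow API shows the core idea, including tying each run back to the commit that produced it, which addresses the spreadsheet problem above. Parameter and metric names here are illustrative.

```python
import subprocess
import mlflow

# Record the exact code version so every run maps back to a commit.
commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()

with mlflow.start_run():
    mlflow.set_tag("git_commit", commit)
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_auc", 0.84)
```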
Erica Greene: These different companies do have their specialties in that space, but we had nothing. Now, we have something, and that is a huge game-changer. The other thing is just canonical datasets. Google has an internal Kaggle service. You can put up a dataset or a task and have people compete internally. Great idea. That’s really cool. That makes it really easy to quickly recreate baselines for these different models.
Erica Greene: Etsy’s a big company, but we don’t actually have so many different ML tasks that we’re training models for. Making it possible for an intern to come in, or someone in their free time or something like that, to say, “Hey, I wonder how well this thing works for this task,” and freeze a training set and freeze a test set, and then have all the different metrics on a scoreboard or something internally, makes it possible for people to try out ideas really quickly. I think that’s a great idea.
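Freezing splits can be as simple as assigning examples deterministically by hashing a stable ID, so anyone can recreate exactly the same train and test sets later. A minimal sketch:

```python
import hashlib

def assign_split(example_id: str, test_fraction: float = 0.2) -> str:
    """Deterministic train/test assignment: hash a stable example ID into
    100 buckets, so the frozen split is reproducible by anyone, anytime."""
    bucket = int(hashlib.md5(example_id.encode()).hexdigest(), 16) % 100
    return "test" if bucket < int(test_fraction * 100) else "train"

print(assign_split("listing-12345"))  # always the same answer for this ID
```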
Jon Krohn: That’s brilliant. You are providing so many amazing, applicable tips for me and, I assume, many of the listeners as well. This is brilliant. We’ve talked about applied ML, the problems that you’re tackling. We’ve talked about scaling up machine learning, or at least machine learning engineering. We touched on it a little. Now, I’d love to hear what your working day is like, how you manage your team. I’d love to hear that kind of thing.
Erica Greene: Sure. I’m not the best at time management or processes. I don’t know if I’m going to be able to provide-
Jon Krohn: That is a shock, given everything you’ve said so far. That is literally surprising.
Erica Greene: Let’s see. In the COVID world, I and my team have been trying to make as many decisions async as possible and have as few team meetings as possible. We do a daily stand-up, so we can all see each other’s faces and, as quickly as possible, cover what’s going on. Then, we try to do meetings when there are actual decisions to be made. We are refactoring the whole repository where all the models live. We’re doing a big refactor, and there will be ad hoc meetings about how we structure that.
Erica Greene: I let the team run those as much as possible. I don’t necessarily need to be involved. We used to have a modeling meeting and an infrastructure meeting regularly, but people just don’t like being on video chat when we’re all [crosstalk 00:47:12]-
Jon Krohn: I totally get it.
Erica Greene: I totally get it, so we canceled all those meetings. I think people are happier; I mean, they’ve told me that, when they’re working collaboratively. I think that matters even more in this remote age. If somebody could do something by themselves before and not feel lonely and isolated, because other people were around them and they could talk about it over lunch and stuff like that, it feels so much more remote and isolated now. Even though I know they can do it by themselves, I’ll put two people on a project or three people on a project, so that we can slice it up smaller, and it just seems more collaborative.
Erica Greene: I spend a lot of my day… I do one-on-ones with my team every week, with my manager, and then with my partner managers in the ad space. We run a reading group every other week, which is really fun. We read applied ML papers, KDD-type papers. It’s open to anybody at the company. I run that. I mostly give feedback or iterate on project proposals. I care a lot about things being put in writing, small to large, and then we’ll iterate asynchronously on ideas in written documents. That’s my preference: one-on-one meetings with people, and then constantly iterating on ideas on paper. I miss whiteboards desperately. I know that there are these-
Jon Krohn: Me, too.
Erica Greene: …services that try to recreate them. But I’ve not found anything to be remotely satisfactory. I can’t wait to go back to the office.
Jon Krohn: Same. I think the whiteboard thing, it isn’t just because… I mean, thinking, in your head, “Okay, I need to design an application that emulates a whiteboard,” it seems comically easy. But it’s never going to solve the… I would run, maybe every other week, what I would call a local science conference with my team, where we’d book a separate room. We were all clustered together in one little office. But for these local science conferences, it was a big to-do.
Jon Krohn: We’d prepare. I would work with team members on what they were going to be talking about at the conference. Then, we’d book a separate room. It had a big whiteboard. It didn’t have a screen. Nobody brought laptops. We brought notebooks. We had a whiteboard. It’s that presence with the problems that people are presenting that you can’t recapitulate, in the same way that you can’t recapitulate real-world meetings, because people are inherently distracted. You can’t sit people at a computer, which has all of the internet on it, and all of their other applications that could be lighting up in the background, and maybe they actually are lighting up. You could ask people to turn notifications off or have only this one screen open, but there’s still just the habit of this being the device that does all of that other stuff. You can’t be focused like you are in a real-world meeting around a whiteboard. That’s-
Erica Greene: Yeah. I mean, this is a very hot topic to have a hot take on: the future of work, the future of remote work, the future of flexible work. There are a million think pieces on this. Etsy did a survey of, “Do you want to go back to the office? What would you feel comfortable with?” They’re trying to figure it out just like any company’s trying to figure it out. They’re doing it in a really thoughtful way. I very much appreciate it. I don’t think they’re going to get rid of their office. I can’t imagine staying long-term at a company where there’s no office.
Jon Krohn: Yeah. Yeah, I agree. I certainly hope that we’ll go back to having an office. We got rid of our offices in New York. I think in all of our other locations around the world we kept the office. But in New York, with the way it turned out, with a lease rolling over, it was the perfect time-
Erica Greene: Yeah, yeah, yeah. That makes sense.
Jon Krohn: We were like, “All right. Let’s put everything in storage and wait until this is over.” Now, there is some discussion about us possibly not going back into offices. With my company, there are other parts of the business where that might be totally possible. Our productivity is off the charts. Our revenue is better than ever. Profitability is better than ever. Remote working is working. But for the stuff that I’m doing with my team, the science R&D, it is not the same. I think at least having a couple of days a week together, maybe that’s the hybrid future, could work well.
Erica Greene: Yeah. I have found that [inaudible 00:52:12] execution is not affected. It might even be better remote, but coming up with new ideas is harder.
Jon Krohn: Yep. Preaching to the choir, Erica. All right. Speaking of working with you, I know that you, or Etsy, is hiring for data scientist and ML engineer roles right now. What are the things that you’re looking for in people that you hire? How can people be looking to get hired at Etsy, or, even more generally, what tips do you have for data scientists and ML engineers out there?
Erica Greene: Right. Etsy is hiring. You can apply through the website, et cetera, et cetera. You can reach out to me-
Jon Krohn: We’ll provide a link in the show notes.
Erica Greene: Yeah, great. Advice about getting hired at Etsy or getting hired in general. I don’t know how broadly applicable this is, but for me, I think this area of machine learning engineering is really interesting. It’s one of these areas that didn’t have a name for a long time. It’s fairly new, and what it means and what skill set is required are still fairly amorphous. It’s like data engineering was. What does that mean? Are you a DBA? But you’re not a DBA. It’s taken a while for that to mature, and the same is true for machine learning engineering.
Erica Greene: I think of it as three things: data engineering, backend software engineering, and then machine learning modeling knowledge. To me, all three of those are important. But you often can’t find people who have expertise in all three of those things, which is fine. I think of building a team that is strong in all of those, but not everyone necessarily has to be strong in all of them. But they certainly have to be interested.
Erica Greene: I don’t want hard lines between, “I just do the modeling,” or, “I just do the data engineering.” I like to hire people who are honest about what their background is and what they’re good at, and then passionate and excited about learning the other stuff. I’m happy to hire people who have strong ML and data science backgrounds but are excited and eager to learn the software side and the engineering side. I’m happy to hire people who have strong software backgrounds, took some Coursera classes, and are excited to learn the ML stuff.
Jon Krohn: That is such a beautiful answer, just like the whole rest of this podcast has been full of such clear and actionable items for the audience. I love the way that you broke that down into those three categories. That makes perfect sense to me. I think that that idea of being interested in all three, and being able to dabble or at least understand what other people are talking about in that space… Maybe over the course of a decade-long or multi-decade-long machine learning engineering career, somebody can really master all three of these areas. But as you say, it’s very difficult to find people who are strong in all three right off the bat.
Jon Krohn: You weren’t sure whether that was going to be great advice for just Etsy or more generally. I think that that is great advice, generally speaking, for ML engineer roles. Even more generally, it’s never going to hurt if you’re applying for a data engineering or data science role and you’re interested in all three of those things. Data engineering was the first, backend engineering was the second, and then machine learning is the third. All three of those things are more and more important as the amount of data that we have on the planet grows exponentially every year. That is just going to accelerate with 5G, sensors everywhere, better connectivity, cheaper storage, cheaper compute. Even as a data scientist, being able to handle some of the backend engineering and the data engineering is going to be critical going forward.
Erica Greene: Yeah.
Jon Krohn: Nice. You mentioned to me that you had some thoughts on diversity in hiring as well. Do you want to share those?
Erica Greene: Oh, yeah. I would love for there to be more women and Black and Hispanic engineers, machine learning engineers, and data scientists. I care deeply about it. Etsy cares deeply about it. We could have many, many long conversations about it. The one thing I’ll say here, because it’s such a complicated and interesting topic, is that I heard this piece of advice recently, which has really stuck with me, about this idea of growth mindset and hiring people for potential, not only experience. If we only hire people for experience, which I see happening so often… As a manager, when you hire someone who’s not good, it’s really painful. It’s a painful process to try to help them and then, if it’s never going to work out, to manage them out, with performance review plans and so on. People tend to be very conservative, looking for people who have done exactly the thing that you want them to do. That feels the safest so many times.
Erica Greene: But if you want to hire underrepresented minorities into these roles, you have to… There are not enough people who have exactly these experiences, but there are tons of people who have something similar and have a ton of potential.
Erica Greene: What do you ask? How do you evaluate potential? It’s much easier to evaluate experience. It’s these behavioral interview questions: “Tell me about a time when the feature distribution shifted, and you had to…” et cetera, et cetera. But how do you evaluate for potential? There are some very easy ways. One of them is just to ask, “What are you most proud of that you’ve worked on?” Some of these open-ended questions give people space to let their passion show through and let their competence show through. I always ask what people are most proud of having worked on. I don’t know. That’s my one piece of advice: we should be hiring for potential, and we should design interview panels and interview processes that allow people with potential to get hired. If we only hire people who’ve done exactly what we’re going to have them do, we’re not going to end up with a more diverse workforce.
Jon Krohn: Yeah, it would be a chicken-and-egg scenario. It’s like, if you only ever hired for exactly what is already in the market, then you’d be stuck at exactly the demographic distribution that we have today.
Erica Greene: Yes. I mean, I certainly had a manager early on… My first job out of graduate school, I was coming from a Ph.D. program. I left this Ph.D. program. I had not, at all, thought about what I was going to do as a job to make money. I got referred to Etsy. Someone I knew from college worked there. I really wasn’t a strong programmer. I had only done academic programming; I didn’t know anything about web development, software engineering, and-
Jon Krohn: I love that. You have a degree in computer science-
Erica Greene: But they don’t [crosstalk 01:00:05]. I come from a liberal arts school. I have a degree in math. I have a master’s in computer science. I basically taught myself, took optimization classes and stuff. I didn’t know anything about web development. I was interviewed for potential. Someone took a chance on me. I feel that very strongly: someone took a chance on me early on, and people have taken chances on me since then. I try to keep that in mind when hiring.
Jon Krohn: I love it. That’s such a great message. Maybe starting to round off the interview now, but a question that I get asked all the time, and one you have some very interesting insight into given your background: Do you need a Ph.D. to be a data scientist, or what’s the advantage of having a Ph.D.? Maybe even tell us about your decision to be in a Ph.D. program but go into the workforce before finishing it.
Erica Greene: Yeah. I went into a Ph.D. program right after college. It was not the most well-thought-through decision in my life. I was a math major undergrad and didn’t want to do theoretical math but wanted to stay in this academic space. This was 10 years ago. More than 10 years ago. The field of AI and ML has changed a tremendous amount since then. I would say, again, this is a very interesting topic that we could talk about for hours. If you want to be a research scientist, if you want to be a professor, or you want to be a research scientist at the very few “pure” research labs that there are, of course, you do need a Ph.D. for that. If you do love it and are willing to be paid a very, very small salary for five or six years, and it’s a chance to do the academic exploration that you wouldn’t otherwise get the chance to do, then by all means. But most applied ML or data science-type positions don’t require a lot of the things that you would learn in a Ph.D.
Jon Krohn: Yeah. I couldn’t agree with you more. I have people with Ph.D.s work for me and others who don’t. I do not give them work differently based on that. Of course, a lot of the things in terms of applied ML that you need to be learning, you learn outside of the academic environment anyway. I don’t think anyone’s ever going to say that it’s a disadvantage that you have a Ph.D. As you say, the roles where it’s an absolute necessity, like being a research scientist, are few and far between, and that the most important thing is getting the experience. If you’re interested in these things, doing Coursera courses on your own, on applied ML either before you start a job in data science or machine learning, or on the job, any of this continuous learning is critical. That can show just as much potential or even capability as having a Ph.D. or more.
Erica Greene: Yeah. Yeah. Right. Even if I had finished my Ph.D. program, basically almost nothing that I would have written my thesis on would be very relevant right now. The field has changed so much. I don’t know. What else? Management, communication, writing skills, software engineering, data… I mean, there’s just so much. There are so many things to learn. If what you are really most passionate about is the academic part of that, then by all means, but in industry you can shine in lots of ways…
Jon Krohn: What a wonderful message to end on. Thank you very much, Erica, for being on the program today. We just have two quick questions before I let you go. The first is: do you have a book recommendation for us?
Erica Greene: Yes. Yes. I wrote down a few, but I will give you one. It has nothing to do with machine learning. I am-
Jon Krohn: Great.
Erica Greene: …a wine nerd.
Jon Krohn: Wow.
Erica Greene: Yeah.
Jon Krohn: Most of our listeners, about 90 or 95% of them, listen via audio-only platforms, but I do encourage you to check out the YouTube format, the video format. Is there a wine bottle over your right shoulder?
Erica Greene: This is Château Mouton Rothschild 1988, which is the year I was born. My dad bought each of us a case from our birth year. This is a Bordeaux. We open it on special occasions. I don’t know when we opened this one, but I use it for watering plants now. I love the bottle. Anyway, I thought I’d give some wine reading. A lot of writing on wine is a little dense and hard to get through if you don’t know a ton about wine, and even if you do know a ton about wine. But there’s a wonderful book called Bursting Bubbles, which is about the champagne industry and the history of the champagne industry, which is fascinating. Champagne, really more than any wine region, is all about branding. The chateaus and the region itself have developed brands; people pay money for the brand in the same way the fashion industry works. The history of that region, how it’s played into culture, and what’s happening now, it’s been a bit of a renaissance in the last 15 years, is fascinating.
Erica Greene: I think that we will hopefully be drinking a lot of champagne in the coming year as the world becomes brighter and we crawl ourselves out of the pit we’re currently in. If you want to learn about the Champagne region before you start drinking a bunch of champagne, Bursting Bubbles is a great read.
Jon Krohn: I love that. It’s such a good recommendation, maybe one that I’ll check out myself because I barely drink. But when I do, it is champagne.
Erica Greene: Okay. Good. You see, so maybe it’s worth it, yeah. Yeah.
Jon Krohn: Perfect. Then, final question is, how can people get in touch with you? Is there social media that you use or any way that you’d like people to get in touch?
Erica Greene: Sure, yeah. I’m on Twitter. I tweet so infrequently, but it’s one of my new year’s resolutions to be more active on Twitter. I am Erica Greene, E-R-I-C-A G-R-E-E-N-E, on Twitter. I grabbed that username when I was in college. You can follow me there. You can friend me on LinkedIn. I will accept. You can message me if you’re interested in a role at Etsy. I’m happy to connect you with the right people.
Jon Krohn: Beautiful. Thank you so much, Erica. This interview has been amazing. I’ve learned so much, and just really enjoyed chatting with you.
Erica Greene: Yeah. It’s been great.
Jon Krohn: Yeah. Thank you very much, and we’ll catch up again soon.
Erica Greene: Yeah. Happy New Year’s.
Jon Krohn: Amazing. I learned so much from Erica Greene in this episode. From the cool models they build at Etsy to represent both sides of their marketplace, to the machine learning operations and management best practices that need to be in place to allow for efficient data science collaboration at massive scale. From the three critical areas of expertise ML engineers should be interested in, data engineering, back-end engineering, and ML, to the relative lack of need for a Ph.D. if you’re going to be applying data science in the field.
Jon Krohn: As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and URLs to Erica Greene’s LinkedIn and Twitter handles, as well as my own LinkedIn and Twitter handles at www.SuperDataScience.com/435. That’s SuperDataScience.com/435.
Jon Krohn: If you enjoyed this episode, kindly leave a review on your favorite podcasting app or on YouTube, where you can enjoy a high fidelity video version of today’s program. I also encourage you to tag me in a post on LinkedIn or Twitter. I’d be delighted to hear your thoughts on this episode and would love to respond to them in public. All right, it’s been so much fun. Thank you for listening, looking forward to enjoying another round of the SuperDataScience podcast with you very soon.