Jon Krohn: 00:00:00
This is episode number 451 with Dan Shiebler, staff machine learning engineer at Twitter.
Jon Krohn: 00:00:12
Welcome to the SuperDataScience Podcast. My name is Jon Krohn, a chief data scientist and bestselling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today and now let’s make the complex simple.
Jon Krohn: 00:00:42
Welcome to the SuperDataScience Podcast. I’m your host, Jon Krohn, and I am very grateful to be joined today by Dan Shiebler. Dan works full-time as a staff machine learning engineer at Twitter in New York while he’s simultaneously pursuing a PhD in machine learning from the University of Oxford.
Jon Krohn: 00:01:02
During this episode, Dan fills us in on how to get started on a PhD while working full-time, what it’s like to actually juggle the two, and tips for succeeding if you ever choose to go down that route; what the mathematical field of category theory is and how it’s relevant to machine learning; proven strategies for labeling huge datasets; what revenue science is; what a staff software engineer really does; and, finally, the software tools used at Twitter and the skills they look for when hiring.
Jon Krohn: 00:01:35
This episode will be of special interest to anyone who’s considered pursuing a PhD but might not want to give up their job. That said, much of the episode is broader than that, providing unique and fascinating insight into little-known fields such as category theory and revenue science that I think will appeal to anyone. Here and there, Dan provides technical guidance and practical software tools that will be beneficial to hands-on data professionals.
Jon Krohn: 00:02:10
Dan, welcome to the program. You were last on the show a year ago, on episode 345 in March 2020. Has much changed since then in the world? What’s your life been like?
Dan Shiebler: 00:02:23
Things have certainly changed quite a lot. I miss the office, where I recorded in the last episode.
Jon Krohn: 00:02:30
Nice. Well, you’re still in New York, right? So, you work at the Twitter New York office, but you’re still based in New York.
Dan Shiebler: 00:02:39
That’s right, less the free food.
Jon Krohn: 00:02:43
Oh, yeah. Do they do delivery or anything to make up for it? Probably not.
Dan Shiebler: 00:02:47
I wish. No. I’ve had to learn how to cook.
Jon Krohn: 00:02:51
Oh, maybe that is something that will become useful in the long run.
Dan Shiebler: 00:02:55
Perhaps.
Jon Krohn: 00:02:56
Yes. You were on the episode right before the lockdown started in New York. I remember it all too well. Here we are a year later; we probably couldn’t have imagined back then that a year later we’d still be at home.
Dan Shiebler: 00:03:12
Yup, yup. Constant suspension of disbelief.
Jon Krohn: 00:03:17
Nice. Well, I’m excited to have you back on the program. So, for real, absolutely loved having you for episode 345. So, you were a star guest, and that’s why Kirill recommended that you come back on. I was blown away when Kirill introduced me to you because we have, from my perspective, an uncanny number of parallels. So, we both studied neuroscience as undergrads, and I continued … So, my master’s and PhD are nominally in neuroscience, but, actually, I was focused on machine learning through my entire PhD, and that PhD was done at Oxford University, where you are currently doing a machine learning-focused PhD just like I did. The final one is that we both have taught at the New York City Data Science Academy. So, I’m going to go through those one by one, but let’s start off by talking about your neuroscience undergrad. So, tell us about it and how that led to a machine learning career.
Dan Shiebler: 00:04:22
Totally. So, really, neuroscience was my first interest. I went to Brown for undergrad, and one of the main reasons why I wanted to go to Brown in the first place is that they had a neuroscience major that people paid a lot of attention to and that seemed really exciting. I thought that there was nothing that could be more important in the world than studying how people work, and, really, the best way to understand how people work is to look at the science of how people work. Psychology seemed a little bit too soft for me, and neuroscience was really the most technical detail that you could get into in terms of how our brains actually work.
Jon Krohn: 00:05:10
I couldn’t agree more. I thought exactly the same way.
Dan Shiebler: 00:05:10
Yeah. So, that was what started me down the path of neuroscience. At the beginning, I had no strong attachment to coding or math or machine learning. I don’t think I heard the term machine learning once before maybe my senior year of college. I didn’t code in high school. I only learned how to code when I started to do data analysis for my undergrad when we did neuroscience research.
Dan Shiebler: 00:05:44
My experience starting off with basic biological data analysis in my neuroscience undergrad (it wasn’t really called data science at the time, but it really was data science) is what got me excited about data science, about machine learning, about coding in the first place, writing MATLAB scripts.
Jon Krohn: 00:06:09
I was just going to ask. Was it MATLAB? Yeah. Me, too.
Dan Shiebler: 00:06:09
Of course, of course. Yup, yup, yeah, yeah. I mean, it convinced me to take on a second major in computer science, actually, which I started my junior year of college. So, it wasn’t really a pivot because I continued doing neuroscience throughout my undergrad, but I added it on partway through.
Jon Krohn: 00:06:33
So, did you get in to … So, I guess in your junior year, you were taking first year computer science classes and you’re like, “Hey, guys. Check out these MATLAB scripts.”
Dan Shiebler: 00:06:44
Yup, yup. Yeah. A lot of the computer science classes had very different mindsets from the kinds of things that happened in the neuroscience department, the things that I needed to learn in order to do data analysis there. There was this MATLAB class that computer science students would usually TA but not take, and it was offered to engineers. It was like a scientific computing in MATLAB course. So, I TA’d that course because I knew MATLAB very well from my research, and then I was a computer science student. I TA’d it one year, and then I was the head TA another year. I got to make some of the projects for it, grade the tests and such, which was nice.
Jon Krohn: 00:07:30
Oh, nice.
Dan Shiebler: 00:07:31
That was a nice experience to blend near the end of my undergraduate experience is the computer science or MATLAB sides.
Jon Krohn: 00:07:40
Nice. So, here’s a question for you. What happened to MATLAB? I mean, through my PhD, I had one … So, for the chapters of my thesis, which for people who have done a PhD, which is obviously most people, you tend to break up your research into a few chunks, which are chapters of your doctoral thesis. So, I think I had five different strands of research that became the five core chapters of my thesis. One of those, I actually used MATLAB for, but the others were all in R and Python, and I coded in R and Python during the PhD. It’s interesting how much MATLAB was used in science, and maybe still is taught to undergrads in science, but nobody uses it professionally. I don’t think I know anyone.
Dan Shiebler: 00:08:29
Yeah. So, fun fact, actually, I did work at MathWorks for several months.
Jon Krohn: 00:08:34
Wow! You’re the person to talk to.
Dan Shiebler: 00:08:40
I have a bit of a soft spot for MATLAB from that experience. Yeah. After I graduated college, I had a last-minute pivot and decided I did in fact want to pursue a software engineering career, and I knew MATLAB very well from having TA’d that course. So, going to MathWorks was a relatively easy path. I mean, when I was there, they were doing quite well. This was six years ago.
Jon Krohn: 00:09:14
I think they are. It just shows that I’m … I don’t know what it is about my experience professionally, but when I go to conferences, it seems like today, particularly, so on the machine learning side, people are almost always using Python.
Dan Shiebler: 00:09:29
MATLAB is expensive. I mean, I think that’s one of the really big problems. It’s not open source software. Really, this is something I didn’t understand before I worked at MathWorks, but I came to understand that the MATLAB product is not really the programming language. It’s the customer support. It interfaces very tightly with NASA, with Boeing, with these massive companies that employ masses of engineers who utilize MATLAB for modeling as part of their engineering jobs. When they face various eccentricities of the language, they don’t want to be chasing down the sorts of issues that someone who wants to spend all their time thinking about coding, or thinking about building things in a programming language, would want to think about. They want to just have their things work properly.
Dan Shiebler: 00:10:31
So, having the ability to talk to someone, send in questions, and then have someone spend a lot of time looking through those questions and helping you work through things as part of a MathWorks contract is a very attractive deal. So, it’s a totally different model. I don’t even consider it necessarily a direct competitor to Python at all. Of course, this could very well have changed in the years since I’ve been there. I don’t work at that level now.
Jon Krohn: 00:10:59
Well, I’m sure you’re right. I do know people who work in MathWorks still today and they’re incredibly clever people just like you. So, I have no doubt MathWorks is still crushing it. It’s just interesting how they don’t come up in conversation as much as I would have expected when I was using it all the time as an undergrad and even a bit in my PhD.
Jon Krohn: 00:11:20
Another interesting thing, I mean, it follows directly from what you’re saying, another great thing about MATLAB is because it is a relatively contained ecosystem, everything works exactly as it should, and everything is documented to the same high level standard. So, you never run into … You talked about bumping up against eccentricities. I don’t remember that ever happening in MATLAB in the same way that ends up happening if you’re using an open source library where you run into versioning issues all the time, you run into situations where there is no documentation all the time, and that’s just part of the open source life.
Dan Shiebler: 00:12:01
Totally. Yeah. I mean, it’s a trade-off. There’s a lot of flexibility that open source gives you: there’s just more code that you can access that other people have written, and a larger community of people whose work you can look at. On the flip side, there are all these weird issues that you need to be able to deal with, environments that you need to handle that nobody has sorted out before you.
Jon Krohn: 00:12:27
Yeah. It takes a lot of trust. I’m sure this method works perfectly behind the scenes.
Jon Krohn: 00:12:35
You may already have heard of DataScienceGO, which is the conference run in California by SuperDataScience, and you may also have heard of DataScienceGO Virtual, the online conference we run several times per year in order to help the SuperDataScience community stay connected throughout the year from wherever you happen to be on this wacky giant rock called planet Earth. We’ve now started running these virtual events every single month. You could find them at datasciencego.com/connect.
Jon Krohn: 00:13:07
They’re absolutely free. You can sign up at any time, and then once a month, we run an event where you will get to hear from a speaker, engage in a panel discussion or an industry expert Q&A session, and critically, there are also speed networking sessions where you can meet like-minded data scientists from around the globe. This is a great way to stay up-to-date with industry trends, hear the latest from amazing speakers, meet peers, exchange details, and stay in touch with the community. So, once again, these events run monthly. You can sign up at datasciencego.com/connect. I’d love to connect with you there.
Jon Krohn: 00:13:49
Cool. So, neuroscience was your first love, but you went into the workforce. You worked at MathWorks. You now work at Twitter, which we’ll get to in a bit. But while you’re working at Twitter full-time, you are also doing a PhD on the side. So, fill me in more on this: what’s the timeline? Are you planning on doing it in the same three-to-four-year timeline that you would usually do a PhD at Oxford, or a little bit longer?
Dan Shiebler: 00:14:21
Yeah, a little bit longer, but not much longer. I’m targeting around four and a half years, but we’ll see exactly how things shake out, but I’m thinking about a nice path. I actually just completed my transfer of status the other day.
Jon Krohn: 00:14:38
Oh, nice. Congratulations. Was it stressful?
Dan Shiebler: 00:14:42
Not too bad. I felt like I was pretty prepared.
Jon Krohn: 00:14:46
Nice. So, I think it’s an Oxford-specific thing, really, this transfer of status. In the entire three or four years that I was doing my PhD, other than the final evaluation at the end, where you have your draft of your dissertation and you get formally examined, this was the only other assessment in that entire time.
Jon Krohn: 00:15:12
So, it does leave a lot of room for procrastinating, in my experience. I missed having, from my undergrad experience, all these quizzes, all these tests to keep me in check. I was so studious. Sometimes at Oxford, months go by and you’re wondering when the last time you really did some work was. It’s because you’re like, “Well, I have this one transfer of status evaluation coming up in a few years. I might as well enjoy the weekend.” Well, congratulations. It is no small feat to be well-prepared for that and get through it.
Dan Shiebler: 00:15:48
Thank you.
Jon Krohn: 00:15:50
So, yeah, so you’re planning on doing it on roughly the same kind of timeline. Are there synergies with what you’re doing at Twitter or is it relatively independent? Tell us about the research that you’re doing.
Dan Shiebler: 00:16:01
Totally. So, there are synergies. My research is based around the applications of category theory to machine learning. Category theory is, depending on who you ask, either very hip or very esoteric: it’s esoteric if you’re in the applied world and hip if you’re in the pure math world. It’s a branch of mathematics that is useful for characterizing things in terms of how they behave rather than in terms of the direct set of axioms that they satisfy.
Dan Shiebler: 00:16:34
There are a lot of nice ways that category theory lets you reason about the invariances of transformations and the invariances of objects: what sorts of structure some kind of transformation preserves, and what sorts of structure things are susceptible to. It lets you reason about these things formally. So, my thesis, essentially, is that studying these properties through the lens of category theory (the words that I use are compositionality and functoriality, which are the two main things that category theory has at its core) and framing machine learning components in this perspective will give us new ideas about how to extend algorithms, how to design new algorithms, and how to better understand how algorithms work.
Jon Krohn: 00:17:29
All right. I’m going to take a crack at trying to explain back to you what you just said.
Dan Shiebler: 00:17:33
Totally.
Jon Krohn: 00:17:35
I’m probably going to land flat on my face. So, the idea here is that instead of … So, if you had some mathematical concepts, instead of trying to describe them by specific axioms, like specific proofs, you’re describing these mathematical functions or objects in terms of what they can do, in terms of their actual behavior. Is that roughly? Am I roughly in the right ballpark?
Dan Shiebler: 00:18:06
Essentially, yes. A lot of things in category theory are defined implicitly. Rather than saying a group is something that satisfies this set of constraints, or a matrix is a 2-D array, you would say that a group is the only object that has this kind of transformation to this thing, and this sort of transformation to that thing. This is the only thing that satisfies these transformations; it behaves this way.
Dan Shiebler: 00:18:41
That’s a powerful perspective. It gives you the ability to make very strong claims in certain circumstances that are harder to come to from other perspectives. It seems so esoteric, but there are actually many different ways to apply something like category theory to a field like machine learning. There’s a burgeoning field of applied category theory, where people take these category theory ideas and try to apply them to economics, to engineering, to quantum physics, to theoretical computer science. Some of these are very difficult to get purchase on. For some of them, there are already tens of papers being written that describe this.
Dan Shiebler: 00:19:34
My approach is to try to stay as close to the application as possible and try to stay as grounded in particular machine learning algorithms or particular settings that we can describe still theoretically, but from a more concrete setting, and then try to identify what’s a slightly different way to view this concrete thing that then lets us expand a little bit higher.
Dan Shiebler: 00:19:59
I’ve been focusing a lot on clustering algorithms recently, and manifold learning algorithms, just asking, “What is the simplest way that you can describe a clustering algorithm?” There’s a natural hierarchy of different kinds of transformations in category theory. So, if we can express a certain kind of clustering algorithm as one of these transformations, then there’s this natural question: when we move to these other kinds of transformations, what do we get out of that? What kinds of new algorithms come out? Then we could actually implement these and run them on data and have novel approaches, or write proofs about them.
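To make the functoriality idea concrete, here is a minimal sketch (plain Python, not from the episode) of the classic observation, due to Carlsson and Mémoli, that single-linkage clustering at a fixed threshold is functorial: any distance-non-increasing map between point sets sends each cluster of the source into a single cluster of the target. The point sets, map, and threshold below are invented for illustration.

```python
from itertools import combinations
from math import dist

def single_linkage(points, eps):
    """Cluster points into connected components of the eps-neighborhood
    graph (single-linkage clustering cut at threshold eps)."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for a, b in combinations(list(parent), 2):
        if dist(a, b) <= eps:
            parent[find(a)] = find(b)   # union the two components

    clusters = {}
    for p in parent:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())

# A non-expansive (distance non-increasing) map: project onto the x-axis.
def f(p):
    return (p[0], 0.0)

X = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
X_clusters = single_linkage(X, eps=1.5)
Y_clusters = single_linkage({f(p) for p in X}, eps=1.5)

# Functoriality check: every cluster of X lands inside one cluster of f(X).
for c in X_clusters:
    assert any({f(p) for p in c} <= d for d in Y_clusters)
```

This is the kind of statement the category-theoretic framing makes precise: the clustering construction respects a whole class of maps between datasets, not just a single dataset in isolation.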
Jon Krohn: 00:20:41
Wow! Cool. So, then I guess clustering might be a part of what your work is at Twitter.
Dan Shiebler: 00:20:51
Yeah. So, I would say that the intersection points between what I do in my research and what I do at Twitter come in at this space where there’s an algorithm that I might use in my research and it’s like, “Actually, this works nicely. I wonder how this would work on the Twitter side,” or maybe I’ll come across a challenge at Twitter and I’m like, “Oh, that’s an interesting case where you have this sparse-to-dense relationship. I wonder what that might look like on my research side.”
Dan Shiebler: 00:21:22
In general, I would say that they’re very different, largely by design. I purposely chose not to do my PhD in something that was more applied, like graph research or natural language processing, which are things that come up a lot at work, because I knew that if I was going to do these at the same time, they would only work if I was able to view my PhD work as closer to a hobby than to something that takes a lot of stress and mental energy.
Dan Shiebler: 00:21:53
So, I tried to pick what are the major stressors at my job and make sure my PhD has none of those. It has its own set of stressors, but there’s no duplication. So, when I switch from my job work to my PhD work, it feels like I’m switching from my productive job to my hobby, where now I’m doing some things to unwind. So far, it’s been successful.
Jon Krohn: 00:22:21
That is great to hear. I’m glad to hear it’s going well. That is a pro tip in terms of how someone might approach doing a PhD while simultaneously working full-time. Can you provide us with some more context on how that happened, how it came about? Did you have to bring that to your employer? Does Twitter have existing support for people who are doing PhDs? Do you know other people that have done it? So, basically, for me and for our listeners, what is the experience like of getting started on a PhD while you’re working full-time? How easy is it to get that going? Are there specific special things that you need to do to be working remotely like that? And presumably … Well, actually, there are a number of reasons why your Oxford arrangement is interesting. So, not only are you in a different timezone.
Jon Krohn: 00:23:12
So, I was going to say, typically, people might be expected in a lab during work hours, but you’re at work during work hours. On top of that, you’re also on the other side of the Atlantic Ocean. So, maybe that’s another thread that we can go down. So, I’ve now posed a number of questions. It’s probably going to be hard to keep track of all of them, but, please try to answer all of them.
Dan Shiebler: 00:23:36
Totally. Totally. Totally. Yeah. I’ll give it my best shot. So, I guess for the first question of how it happened in the first place: I mean, I was working at Twitter. I had been forming in my mind this idea that I wanted to do a PhD, but I really enjoyed my job at Twitter, and I felt like I had good momentum in my job, and I didn’t want to leave to do a PhD.
Dan Shiebler: 00:23:58
So, I was exploring. My original plan was that perhaps I would try to negotiate something with my job where I would move to wherever I got accepted for a PhD and work part-time, or maintain full-time work but actually live wherever my PhD was. Then I started looking into different programs, and I saw programs at Oxford. Oxford offered some programs part-time on their website, a few things in the Department of Engineering, a few things in the Department of Statistics. So, I reached out to a professor, told him what I was interested in and my situation, and managed to work out this arrangement. It’s an enormous organizational lift. It took almost eight months, I think, of discussion, flying to Oxford, and working closely with him on a number of projects before I was actually accepted as a student, but it worked out very well. I’m very happy with how it worked out.
Dan Shiebler: 00:25:12
The situation on the Twitter side is much simpler. I didn’t really discuss anything until all the details were finalized at Oxford, because I figured there was no real point in having both lines of discussion open at the same time. But once things were settled and I knew the situation, I went to my boss at the time, and he was very supportive. He thought it was an interesting thing to do.
Dan Shiebler: 00:25:39
At the time, I think there was nobody else who was doing this sort of thing. I knew a few other people who were doing master’s programs at the same time, so that was not too uncommon. There were at least two or three other people who I worked with regularly who were doing master’s programs at the same time. The fact that I was doing a PhD and they were doing master’s didn’t seem to be something that meant I needed to be put in a different category, although things have actually changed since then.
Dan Shiebler: 00:26:11
There are a few people now at Twitter who have joint appointments at universities, and there are some employees who are both a Twitter employee and a student at a university, where their boss at Twitter and their adviser in their PhD is the same person. Of course, their PhD work is extremely tied to their Twitter work. So, it’s a different sort of circumstance, but that wasn’t an option when I first went down the path of looking for where I ended up.
Jon Krohn: 00:26:48
Let me make sure I’m getting this right. So, the person, their supervisor also works at Twitter-
Dan Shiebler: 00:26:54
Yes.
Jon Krohn: 00:26:54
… and is a lecturer or a professor at a university.
Dan Shiebler: 00:26:59
Yeah. So, I don’t know how many people have this. I know for sure Michael Bronstein, who’s a professor at Imperial College, he’s also head of graph learning research at Twitter. So, there’s a few people in his group who are also his PhD students. To my knowledge, they were Twitter engineers first and then applied to Imperial College.
Jon Krohn: 00:27:23
Nice. Yeah, and, for people who aren’t aware, Imperial College is an outstanding British university in London, which, mind-bogglingly, people outside of England have almost never heard of.
Dan Shiebler: 00:27:36
Great research in machine learning.
Jon Krohn: 00:27:38
Great research. They’re often the top ranked research university in the UK above more internationally recognized places like Oxford and Cambridge. It’s a really interesting situation. Someone needs to work on their branding.
Dan Shiebler: 00:27:52
Yeah. I mean, I think, in general, people in the US have relatively limited visibility into universities outside of the US. I mean, there’s a number-
Jon Krohn: 00:28:02
Yeah, I guess that’s true.
Dan Shiebler: 00:28:04
Yeah.
Jon Krohn: 00:28:04
Yeah. I don’t know what it is about it. Maybe, yeah, I don’t know. Somehow I think Oxford and Cambridge are names that people know. Maybe they show up in movies and TV shows more often than-
Dan Shiebler: 00:28:13
There’s a lot. I mean, the word Oxford is very common, the Oxford Dictionary. There’s a bit of a … Cambridge, as well, we have Cambridge, Massachusetts.
Jon Krohn: 00:28:31
Yeah, which is interesting. I think it’s not a coincidence that Cambridge, Massachusetts is called Cambridge, Massachusetts because … So, I’m stretching a little bit here, but Harvard University, which is based in Cambridge, Massachusetts, used the same degree name conventions as Cambridge, which was a little bit unusual at the time. At that time, it was more common to use the Oxford convention. So, we’ve been talking about the degree that you’re pursuing all this time as a PhD, but, technically, it’s a DPhil, which is the same Latin words but the other way around: Doctor Philosophiae, versus PhD, which is Philosophiae Doctor. So, they’re the same Latin words, but the other way around.
Jon Krohn: 00:29:23
Cambridge was super unusual to call that degree a PhD at the time and it was Harvard University that somehow seeded those degree names becoming common around the world. Anyway, so now I’m feeling it’s not a coincidence that the city at Harvard is-
Dan Shiebler: 00:29:43
Yeah. I can’t say no. It would not surprise me at all.
Jon Krohn: 00:29:48
Yeah. All right. Well, somebody is hopefully researching this as they listen to this, and you can send me a LinkedIn message or something and let me know how wrong I was or, hopefully, I was right on the money. If I was right on the money, then make it a public LinkedIn post and tag me. If I was wrong, make it a private message. No, I’m kidding. Either way, you can make it a public post. I am shameless. Cool.
Jon Krohn: 00:30:13
So, you told us a bit about your research at Oxford, and we have a sense of what it’s like to study while you’re working. It sounds like you’re enjoying it, and I guess you don’t have any reservations about it. You would probably recommend it. It sounds like the logistics of getting it approved on the university side might not always be the easiest thing in the world, but it’s possible and really exciting.
Dan Shiebler: 00:30:37
Totally.
Jon Krohn: 00:30:39
All right. So, then we’ve finally made our way to the third and final thread that is common between us, which is teaching at the New York City Data Science Academy. So, I only ever taught there on Saturdays. I had a deep learning curriculum that I taught there for years, and it formed the basis of my book, Deep Learning Illustrated. But, actually, I didn’t look it up. I don’t know how I noticed. I think I might have just seen it on your LinkedIn that you’re an instructor there. So, I don’t know what you taught or anything else about it. Feel free to fill me in.
Dan Shiebler: 00:31:10
Totally. Yeah. So, I didn’t have a full-time position as an instructor there. I gave several talks. I think I gave talks to three separate cohorts over the course of about a year and a half. It was all during the time that I was working at TrueMotion, which is the company I worked at before Twitter. My talks were all based around something that I liked to talk about a lot then and still talk about now when I give public talks, which is label engineering: the process of, when you’re in a situation where you don’t have access to high-quality labels, figuring out how to get high-quality labels to train a machine learning model with. That was a very common problem at TrueMotion. It’s less of a problem at Twitter, or less of a central problem, but still very important; at TrueMotion, it was a do-or-die problem.
Jon Krohn: 00:32:07
Yeah. Tell us about TrueMotion. I guess something that is probably obvious to most listeners is that a label is just, if you have a bunch of pictures and some of them are of cats and some of them are of dogs, that you actually have the ones that are cats labeled as cats and the ones that are dogs labeled as dogs. That way, you can train what we’d call a supervised learning algorithm. Supervised learning often allows us to do a lot more: when we have those labels, there’s a lot more inference and prediction that becomes possible relative to if we just have a dataset of a bunch of pet photos, although there’s still a fair bit you can do with that. But labeling data is key to building great machine learning models in a lot of cases, certainly in a lot of real-world use cases.
Jon Krohn: 00:32:59
Anyway, tell us about TrueMotion and why you didn’t have labeled data as well as maybe some tips as to … Give us a little taste of what you would lecture about.
Dan Shiebler: 00:33:09
Totally. Totally. Absolutely. So, TrueMotion is a company in Boston that I worked at for about two years. They develop technology for insurance companies, particularly car insurance. The technology integrates into insurance company apps as an SDK and it’s utilized for usage-based insurance pricing, so basically to determine how much somebody should be charged for their car insurance based on how they drive.
Dan Shiebler: 00:33:41
So, a lot of the problems that we solved were: given a stream of GPS and motion sensor data, do inference on how good of a driver somebody is, how likely they are to get into an accident, how likely they are to cause various kinds of wear and tear on their car. There are a lot of machine learning subproblems that go into this. Most of them were caused by the fact that the way the software ran was not something where we could ask the user, “Before you drive, load the app, open it up, place it on the mount.” Everything needed to be completely in the background.
Dan Shiebler: 00:34:20
I mean, nobody is getting tricked. Everyone knows they’re signing up for a usage-based insurance program and downloading this app on their phone, but it needs to be non-obtrusive, and it needs to be accurate without requiring any input from the user.
Jon Krohn: 00:34:33
I love the idea of how if you were to ask them for feedback, it could be the app asking them for feedback that leads to accidents.
Dan Shiebler: 00:34:41
Yeah. Very real, actually. That is something that we’ve had to discuss, the degree to which we would involve the users in the app, especially during driving time. Can you prompt them for labels or things like that?
Jon Krohn: 00:35:01
Let us know if you’re encountering horrifically icy conditions.
Dan Shiebler: 00:35:04
Yeah.
Jon Krohn: 00:35:05
Please pull the app out of your pocket at that time.
Dan Shiebler: 00:35:09
Yeah. Send a notification. Are you looking at your phone right now? Yes. No. So, the kinds of problems were: identify when someone is texting and driving based on the motion sensors of the phone, or identify when they’re taking a turn too hard, or identify when they’re driving, which is actually the hardest of all the problems, driver identification: identify when they’re driving, and if there are multiple people who own a car, identify who’s driving. Distinguishing between a car and a bus and a train and a bicycle is also difficult. So, there’s a mode-of-transit problem. For a lot of these, labeled data was extremely sparse or nonexistent. So, there’s a lot of creativity that goes into the proper derivation of these labels, or how we can utilize a small amount of labeled data to make much larger inferences, or transform somewhat labeled data into more labeled data.
Jon Krohn: 00:36:09
Nice. So, are there any specific practices or pieces of technology that you recommend for any of those kinds of situations, like taking a small set of labeled data and being able to infer what the labels would be on a bigger dataset?
Dan Shiebler: 00:36:28
Totally. So, I mean, in terms of technology, one piece of technology that I think is great is Snorkel, which is out of a lab at Stanford that does a lot of this label augmentation work. We utilized Applause, which was, I think, very similar to Amazon’s Mechanical Turk, although a little bit more interactive, as the company that we contracted with to hire people to use the app and then provide labels on what they did.
Dan Shiebler: 00:37:02
So, a lot of the time, the bottom line with generating labeled data or utilizing this label engineering is a thought process of proxy labels. It’s often the case that there’s a hierarchy of information that you have. Sometimes users will provide permissions only in certain circumstances. Those permissions will give you some amount of information, or their battery will be high enough that you can tap into the GPS. So, if you’re trying to build an algorithm that only operates on motion sensor data, then you can utilize your more accurate algorithm, the one that incorporates GPS data and other kinds of user signals, as the labels for the algorithm that needs to operate under more stringent circumstances.
Dan Shiebler: 00:37:54
That higher level algorithm might be trained on some even higher resolution type of label that involves manual user feedback. Often, what would happen is we had these little accelerometers that we’d strap to cars, and we’d drive around and do all sorts of crazy things, and then we’d also fill the car with hundreds of cellphones, put in every possible position, and we would train the full-resolution, maximum-battery, all-the-sensors kinds of algorithms on those accelerometer labels. Then we’d train low-battery versions of all these algorithms on the high-battery versions of the model that was trained on much more data, and then we’d have something that was actually trained on a very large, very diverse dataset, all starting with just a very small amount of very high resolution data. So, an outward expanding process of going from high resolution to low resolution signals.
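The cascade Dan describes can be sketched in a few lines of Python. This is a toy stand-in, not TrueMotion's actual pipeline: the data is synthetic, the nearest-centroid "models" are hypothetical placeholders for their real algorithms, and "rich" versus "cheap" features just mimic the full-sensor versus motion-sensor-only split. A high-fidelity teacher trained on a small labeled set generates proxy labels for a much larger pool, which then trains a student restricted to cheaper inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hand-labeled set with "rich" features (stand-in for GPS + full sensors).
X_rich = rng.normal(size=(200, 6))
y = (X_rich[:, 0] + X_rich[:, 3] > 0).astype(int)

# Large unlabeled pool where the rich features were also recorded.
X_pool = rng.normal(size=(20000, 6))

# Step 1: fit a high-fidelity "teacher" on the small labeled set
# (nearest-centroid in the rich feature space, as a minimal stand-in).
mu0 = X_rich[y == 0].mean(axis=0)
mu1 = X_rich[y == 1].mean(axis=0)

def teacher_predict(X):
    d0 = np.linalg.norm(X - mu0, axis=1)
    d1 = np.linalg.norm(X - mu1, axis=1)
    return (d1 < d0).astype(int)

# Step 2: the teacher's predictions become proxy labels for the whole pool.
proxy = teacher_predict(X_pool)

# Step 3: train a "student" that only sees cheap features (columns 0-2,
# standing in for a motion-sensor-only, low-battery deployment).
X_cheap = X_pool[:, :3]
c0 = X_cheap[proxy == 0].mean(axis=0)
c1 = X_cheap[proxy == 1].mean(axis=0)

def student_predict(Xc):
    d0 = np.linalg.norm(Xc - c0, axis=1)
    d1 = np.linalg.norm(Xc - c1, axis=1)
    return (d1 < d0).astype(int)

agreement = (student_predict(X_cheap) == proxy).mean()
print(f"student/teacher agreement on the pool: {agreement:.2f}")
```

The point is the shape of the process, not the models: 200 hand labels became 20,000 proxy labels, and the student never needed the expensive sensors at training or inference time.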
Jon Krohn: 00:38:58
Nice. That is super clear. That is a great tip. So, since TrueMotion, you now find yourself at Twitter. We’ve talked about that a little bit, obviously. We’ve talked about specifically how your PhD research is not related to your Twitter work. So, first off, congratulations. Since the last time you were on the show, your title has changed from senior machine learning engineer to staff machine learning engineer. So, I understand that’s a promotion, not a demotion. So, tell us what this staff title means. Many of the big tech companies have these kinds of staff scientist or staff engineer roles. What exactly does it entail?
Dan Shiebler: 00:39:41
Totally. So, staff is the fourth level up the ladder, not including the apprentice or intern roles. Usually, there’s a SWE1, as we call it at Twitter, then a SWE2 level, then a senior level, then the staff level above that.
Jon Krohn: 00:39:56
SWE, I guess, is software-
Dan Shiebler: 00:39:58
Software engineer, yeah, or MLE is the other term, which is machine learning engineer.
Jon Krohn: 00:40:03
SWE, but W is the most annoying letter to abbreviate with because it’s almost always more syllables than whatever you’re abbreviating.
Dan Shiebler: 00:40:10
Yes. Absolutely.
Jon Krohn: 00:40:12
All right. SWE, I like that.
Dan Shiebler: 00:40:13
Yeah, or MLE, which I guess is probably more accurate because it’s machine learning engineer.
Jon Krohn: 00:40:17
I think it’s MLE, MLE.
Dan Shiebler: 00:40:20
MLE. I haven’t heard that one. Yeah. So, usually, staff engineer is a role that corresponds to having cross-org responsibility, having a role in leading a relatively large team or driving multiple initiatives across organizations. Usually, one of the hallmarks of staff engineers is that they have input and influence outside of the projects that they’re working on. They’re often called on to review proposals or to have input into larger scale technical decisions that affect multiple organizations or the whole company.
Dan Shiebler: 00:41:08
So, this is something that I think is a very good thing for a company to have, a staff level on the ladder, to give engineers a voice in these higher level organizational decisions. In a lot of companies, in these sorts of roles, I mean, some staff engineers barely code at all. I certainly code much less than I did as a senior engineer, and some do almost no coding, but have a lot of influence on the technical direction of the company, in terms of setting the roadmap for work streams that require a lot of connecting work: connecting upstream data pipeline work with downstream work on the models that will consume it and the parts of the front end that then consume the predictions of a model.
Dan Shiebler: 00:42:04
A lot of this requires organizational alignment. So, at some other companies that don’t have this level of technical ladder, this would normally be folded into the role of managers, with just a larger staff of managers, who were maybe hired from outside of engineering backgrounds, playing a role in these decisions. At a company like Twitter, the role of manager is separated a little bit from these more explicit technical decisions, which are delegated to higher level engineers. Although it depends on exactly the organization and exactly the project, that’s largely the way that it breaks down.
Jon Krohn: 00:42:47
Nice. That is hugely informative. I was not aware of most of that. In fact, I had an inkling that it was almost the opposite. So, I had this idea that a staff machine learning engineer or a staff software engineer might almost be more academic, that you had more time to be doing research, but, actually, you’re playing a bigger role than ever in the organization.
Dan Shiebler: 00:43:14
Yes. That’s absolutely the case. So, I would say a lot of people say that the workhorses of an organization are the senior engineers and the SWE2s, who churn out most of the code. The vast majority of code is probably written by senior engineers and SWE2s when you look at what actually gets done. I think research is a little bit of a tougher question because there are always research directions, but certainly, looking at who’s published the most papers at Twitter, I wouldn’t be surprised if the largest share, by far disproportionate to their percentage of employees, came from senior engineers.
Jon Krohn: 00:43:56
Right, right, right. Cool. All right. Now I know. I would never have talked about my assumptions out loud, but it’s nice to now be able to speak knowledgeably about it in the future. So, now, tell us perhaps about some specific projects that you’ve been working on at Twitter. How does being a staff machine learning engineer provide you with scope over more of the organization and let you do more? Yeah. Fill us in.
Dan Shiebler: 00:44:28
Totally. So, previously, when I was last on the podcast, I was still a part of Cortex, which is the part of Twitter that’s responsible for the core machine learning pipelines and core machine learning models. I was on Cortex for about three years. I branched out, which is something Twitter encourages, moving from one part of the company to another, at almost exactly the same time that I was promoted from senior to staff, because an opportunity opened in the revenue science organization, which is within the ads part of Twitter, essentially to lead the website direct response advertising product, building out and improving that.
Dan Shiebler: 00:45:18
So, this is basically the space of performance ads. There are advertisers who will pay to have their ads shown on Twitter, and they’ll pay an amount that is directly related to the value they expect to get out of showing these ads. So, these aren’t the kinds of ads where you just show a billboard and you’re like, “Hey, this is an announcement. This thing exists,” or something more long-term: put this in front of people for their consideration at some point in the future.
Dan Shiebler: 00:45:50
These are high intent ads. The goal is to show this to somebody, have them click on it and make a purchase, or have them click on it and visit the website. There’s a very technical machine learning breakdown of what happens here. We have a large set of such ads that we might show. Our goal is to display them on the Twitter app in certain locations, and we choose where to display them based on the expected value of showing them: how much money the advertiser has bid on a particular event, like a click or a purchase or a download if it’s an app that we’re advertising, and then our models’ estimates of how likely this event is to occur.
Dan Shiebler: 00:46:42
So, this all goes into an auction, which has its own set of interesting engineering and mathematics, and the crazy dynamics that happen, and then the actual ads are displayed. The quality of these estimations of how likely somebody is to do these sorts of things with an ad, as well as a whole host of other models that check whether or not an ad is trying to game the system in a bad way, or whether it will produce a poor user experience, or if it will be redundant with other ads that we’re showing to users, are all part of this infrastructure and system.
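The core scoring idea here, bid times predicted event probability, can be sketched very simply. This is not Twitter's auction code: the numbers are made up, the candidate list is hypothetical, and a real auction also folds in quality, redundancy, and pricing rules that are omitted here.

```python
# Rank candidate performance ads by expected value: the advertiser's bid
# on an event (click, purchase, install) times the model's estimated
# probability of that event occurring. All values are illustrative.
candidates = [
    {"ad": "shoes",  "bid_usd": 2.00, "p_event": 0.010},
    {"ad": "game",   "bid_usd": 0.50, "p_event": 0.060},
    {"ad": "course", "bid_usd": 5.00, "p_event": 0.004},
]

for c in candidates:
    c["expected_value"] = c["bid_usd"] * c["p_event"]

# Sort by expected value; the top candidate wins the slot.
ranked = sorted(candidates, key=lambda c: c["expected_value"], reverse=True)
print(ranked[0]["ad"])  # → game
```

Note that the highest bid ("course" at $5.00) does not win: a cheap bid on a likely event can beat an expensive bid on an unlikely one, which is exactly why the quality of the probability estimates matters so much.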
Dan Shiebler: 00:47:25
My role is leading the development of a subset of these kinds of ads: ads for advertisers who are particularly trying to drive users to their websites to make purchases.
Jon Krohn: 00:47:40
Nice. I mean, other than driving people to websites to make purchases, how else would you get people to buy something right away?
Dan Shiebler: 00:47:50
Another very large area is mobile apps: getting people to download mobile apps and then make purchases within mobile apps. For many people, that’s a major-
Jon Krohn: 00:47:59
When people see the ad, they immediately get up out of their seat and go to their local grocery store.
Dan Shiebler: 00:48:05
Yup, yup. Another is video views, which is really somewhere in between what you’d call performance ads and reach ads. It’s a performance ad because we’re not just showing it to someone and saying, “That’s enough.” The person needs to actually click on the video and watch it. So, there’s a modeling component, a model of user behavior. You model how likely someone is to actually click on and watch this video after we show it to them. So, the revenue science organization is also responsible for those. Those are very important, too.
Jon Krohn: 00:48:37
Nice. So, what kinds of software tools would people in the revenue science organization use everyday? So, presumably, everyone’s using MATLAB all the time.
Dan Shiebler: 00:48:48
Number one most used tool. No. I don’t think that Twitter has any relationship with MathWorks that I’m aware of. So, a lot of our backend is JVM-based. Twitter is very Scala-focused. I mean, it’s one of the largest companies that has Scala at its core, although a lot of the ad tech is in Java, partially for legacy reasons, partially because a lot of the people within the ads organization are more familiar with it, but the two are very interoperable, and there are many, many places where they meet, and it smoothly blends from one to the other.
Jon Krohn: 00:49:29
JVM is Java Virtual Machine?
Dan Shiebler: 00:49:32
Yes, that’s right. Yeah. Scala runs on the JVM, which is why Java and Scala play nicely together. For that reason, we use TensorFlow, which also has a really nice integration with the JVM. TensorFlow, which is mainly Python-based, is our primary system for deploying models. Twitter has some in-house systems that are built in Java, from a very, very long time ago before TensorFlow existed, that are also still running in certain circumstances, but TensorFlow is our primary modeling tool.
Jon Krohn: 00:50:09
Yeah. It’s interesting how much people associate the TensorFlow library with the Python language when in fact it was created in C++ and ported over to Python. I think the most supported interfaces for TensorFlow today are C and Python, but tons of other languages like Scala are supported in an official capacity. So, TensorFlow is really nice for portability between languages and between devices. TensorFlow graphs can be moved between languages; you can move them from a large number of servers to an embedded device in a car or a mobile phone, or even execute them in someone’s browser. It is certainly a versatile library. I think it will be entrenched in our devices for decades to come.
Dan Shiebler: 00:51:08
I think that’s probably right. TensorFlow really is a great piece of technology. It had a bit of a bumpy start in terms of usability. It still has a very steep learning curve, but its interoperability is very high. It’s pretty well-supported. I like it as a tool.
Dan Shiebler: 00:51:27
Really, most model training on TensorFlow at Twitter happens in Python. It’s part of a larger Python-based data science ecosystem, but model execution happens within a Java environment. So, there are optimizations of Twitter-specific TensorFlow operations that are written in C++, there are the deployment engines, which are written in Java, and then all of the actual model training is in Python.
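The train-in-Python, serve-on-the-JVM split comes down to exporting a language-neutral model artifact. In the stack Dan describes, that artifact is a TensorFlow SavedModel executed by JVM-side engines; the toy sketch below substitutes a JSON file for the SavedModel purely to make the boundary concrete, so the model, weights, and scoring function are all hypothetical stand-ins.

```python
import json

# Toy stand-in for "training in Python": a linear model's learned weights.
model = {"weights": [0.8, -0.3, 1.2], "bias": 0.05}

# "Export" to a language-neutral artifact. In the real stack this role is
# played by a TensorFlow SavedModel; JSON just keeps the example tiny.
artifact = json.dumps(model)

# A Java or Scala serving process would parse the same artifact and score
# requests with it, never touching the Python training code.
loaded = json.loads(artifact)

def score(features):
    return sum(w * x for w, x in zip(loaded["weights"], features)) + loaded["bias"]

print(round(score([1.0, 1.0, 1.0]), 2))  # → 1.75
```

The design choice is that the artifact, not the training code, is the contract between the Python and JVM worlds, which is why the two sides can evolve independently.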
Jon Krohn: 00:51:57
That all makes perfect sense, and it’s a perfect use case for that kind of interoperability. It definitely shows how that can be useful. So, if somebody was looking to work at Twitter, what kinds of things do you look for in the people that you hire?
Dan Shiebler: 00:52:16
It really is very role dependent, and depends on the level that someone is coming in at. When someone is earlier in their career, what exactly they’ve worked on in the past and the exact technology that they’re familiar with are a lot less important than whether or not they’re a culture fit, whether or not they have core fundamentals and are clearly able to be competent with the tools that they do know, and there’s less of an expectation that somebody will be able to come in and make a huge impact on our legacy systems immediately.
Dan Shiebler: 00:52:58
For someone who’s at a higher level, having a holistic understanding of the kinds of problems that appear in production software systems, especially if it’s a machine learning role, which is really the only kind of role that I’ve ever interviewed someone for, having an understanding of the long list of production machine learning problems that always occur is really, really critical.
Dan Shiebler: 00:53:30
This is in addition to the general technical knowledge that is usually used as a bar at most large tech companies, and then, of course, culture fit and having a good attitude. I think we do a disproportionately large number of interviews that are based around someone’s ability to have self-reflection and fit with our culture.
Jon Krohn: 00:53:57
Oh, nice. So, when you talk about a long list of common machine learning problems, this would be things like feature drift, the kinds of problems that, if you had only studied machine learning academically, or even if you are doing it hands-on through courses provided by Udacity or SuperDataScience, you might not know about: the specific kinds of problems in production that you’re always running into.
Dan Shiebler: 00:54:28
Yeah. Yes, that’s the case. I mean, that’s why I say this as something like, if someone’s coming out of a university, we don’t expect them to have all the experience of someone who’s worked machine learning jobs in the past. When we hire someone at a higher level, we look for the traits that will allow them to make an impact, and in order to make an impact, it’s important that you have thought deeply about the kinds of problems that are likely to occur in reality. So, yeah, I mean, like feature drift.
Dan Shiebler: 00:55:05
Another huge part is organizational alignment. It’s very common that you consume features or produce features or have models running in environments where people leave, the organization shifts, code [inaudible 00:55:19] on some part of a library gets deprecated. There’s an enormous number of these software issues that have huge impact on production machine learning systems. Most of our time is spent designing things in a way that minimizes the impact of these issues and then also directly mitigating them.
Jon Krohn: 00:55:42
Yeah. I think somebody who is thinking about getting started in data science and machine learning probably expects that when you’re working in the field, you spend all of your time thinking about particular modeling approaches, and maybe new data sources, interesting ways that data you have, or could have access to, could be used in a model. In fact, those are the minority of the time. It’s a smaller part of the time, though some of the most enjoyable, but the reality is that, especially the closer you are to the production end of the spectrum, the more time is consumed with the kinds of issues that you’re describing, which are-
Dan Shiebler: 00:56:29
Yeah. I mean, that’s definitely the case. What I’ve found is that it varies a lot depending on exactly where in the organization you sit. During my time at Cortex, I got to experience several different roles within Cortex, including a stint on Cortex’s applied research organization and part of their machine learning platform tools.
Dan Shiebler: 00:56:54
So, right now, I’m what would be called a product engineer. I’m directly working on a product that ships things to users, and for any launch that I have, I can launch it as an A/B experiment and directly see how much revenue it increases. Whereas when I was on Cortex in a research capacity, it was a lot more difficult to identify exactly how the things that I was doing affected revenue. When I was in a platform capacity, it was even harder, because the systems that I would build would then plug into some other team’s systems. I mean, who knows which of those teams’ wins were actually due to the things that I did to make their systems easier to use.
Dan Shiebler: 00:57:39
The bottom line, the point I would make, is that people working on each of these different parts think about very different things. On the product part, I think a lot about how the things that I’m working on correspond to the general product vision and the things in the product timeline that are coming up later, whether or not the upstream datasets that my models need to consume are able to meet all of the proper SLAs, and whether we need to change our strategy to counter that.
Dan Shiebler: 00:58:13
When I was on Cortex applied research, the challenges were very different. There was a lot more time spent doing literature review versus time spent implementing some really gnarly TensorFlow code to do some crazy sampling strategies. There were still a lot of these kinds of organizational issues, and in a lot of ways, they were more frustrating, because sometimes I had less visibility into why these issues were occurring. We built something, we thought it worked really well, and then it was like, “Oh, it can’t go into production. Sorry.” Some product thing, some series of reasons why this wasn’t the best thing to do right now, that we didn’t have visibility into.
Dan Shiebler: 00:58:53
So, I guess it’s a trade-off. When you spend more of your time doing these more fun parts, you have less control over how much impact you’re actually able to drive in the business. You will be able to spend more time using cool technology. You will be able to spend more time thinking about these cool ideas, but when the time comes to actually ship software that makes a real difference, you can very well have been working on something that just was not aligned with the core company priorities, or less aligned than you thought it was. But if you spend all your time thinking about the core company priorities and how to align your work with them, you’re very unlikely to be working on the coolest technologies and the coolest kinds of machine learning challenges. So, I think there’s a fundamental dichotomy here, and it really just depends on which path someone wants to go down.
Jon Krohn: 00:59:49
That dichotomy is a great insight. I hadn’t thought about it that way before, and you articulated it perfectly. Well, my questions are going to start getting us toward wrapping up. In a vein related to your position at Twitter, one of the world’s biggest tech companies, and your experience as a staff machine learning engineer there, you have a lot of insight into where technology applications are going, and through your PhD, you’re also developing a lot of insight into what kinds of research are becoming more and more important; your particular research straddles pure and applied. So, what kinds of machine learning approaches or ideas do you think are going to be important or prevalent in the coming years that listeners can maybe get in front of?
Dan Shiebler: 01:00:51
So, one thing that I’ve seen a lot of recently, a growing trend both within machine learning and outside it, is no code tools and tools that make training machine learning models easier. I recently tried using BigQuery. BigQuery is a SQL engine that Google provides.
Jon Krohn: 01:01:14
Google, yeah.
Dan Shiebler: 01:01:15
Yeah, it’s part of Google Cloud platform. It’s extremely scalable. It’s great. Twitter signed a contract with them and now we get to use BigQuery. It’s awesome. One of the things-
Jon Krohn: 01:01:26
We also use it.
Dan Shiebler: 01:01:28
Great. Yeah. I mean, one of the things that blew me away about BigQuery was how easy it made it to train machine learning models within BigQuery. It’s the BigQuery ML pipeline. There are tiny changes to the way you write a SQL query, and then suddenly you have machine learning models that are using top of the line technology, that are extremely easy to export, and you can then schedule the jobs to run offline. It really amazed me how it’s possible to do so much with so little configuration.
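To make "tiny changes to a SQL query" concrete: BigQuery ML's documented `CREATE MODEL` statement wraps an ordinary `SELECT`. The dataset, table, and column names below are hypothetical, and the snippet only builds the statement as a string; actually running it requires a Google Cloud project and credentials.

```python
# BigQuery ML: training a model is one DDL statement around a normal query.
# `my_dataset`, `click_model`, and the columns are made-up illustrations.
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.click_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['clicked']) AS
SELECT clicked, hour_of_day, device_type, past_ctr
FROM `my_dataset.ad_impressions`
"""

# Submitting it is just another query call, e.g. with the official client:
# from google.cloud import bigquery
# bigquery.Client().query(create_model_sql).result()

print(create_model_sql.strip().splitlines()[0])
```

Everything after `AS` is a plain `SELECT`; the `OPTIONS` clause is the only machine learning-specific addition, which is what makes this a genuinely low code workflow.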
Dan Shiebler: 01:02:04
I think what that tells me is that these low effort or low code or no code approaches to building new models that then go into all of these different places, that seems to me to be the future. I think we’ll always have … A company like Twitter is going to need to have a very sophisticated structure of feature engineering and monitoring and deployment, and some models are going to be extremely well-optimized. So, it’s not like all the jobs are going to be lost to no code tools. I think there are a lot of jobs that are going to appear where people spend most of their time thinking about business problems and utilize low code or no code machine learning tools as part of their job, similar to how people use Excel now. I think that this kind of no code modeling is going to find its way into a lot more people’s toolboxes.
Jon Krohn: 01:03:07
Nice. So, as a bottom line, if you’re looking to become better at data science, forget how to code and focus on business applications.
Dan Shiebler: 01:03:20
I don’t know if that comment was really driven towards people who are-
Jon Krohn: 01:03:23
I know. I’m just making a joke. No, that makes perfect sense. I understand exactly what you’re saying, and you’re absolutely right. I think that there are a lot of companies that have been working for years on building these kinds of no code or low code tools to make life easier, and it means that we can have more people working on data-related problems, which is great, and it means that some people will be able to spend less time fiddling about with some parts of their machine learning process. So, maybe with BigQuery allowing data acquisition to be easier, you have more time for either learning about some crazy new application like quantum machine learning and spending some of your time on that instead of worrying about data ingestion, or it frees you up to spend more time on business problems or change management or product, really having your data-related applications make a big impact.
Dan Shiebler: 01:04:39
I think, I mean, one other data point here, just specifically from Twitter: when I look at the set of people who work entirely on machine learning projects, where the core tool is the model, the prediction models for ad ranking, the prediction models for timeline ranking, or the prediction models for sending notifications, I look at the people whose main job is to make these models run and make them run better.
Dan Shiebler: 01:05:08
A very small percentage of them actually train models. Most of these people are software engineers, essentially, who are familiar with the challenges related to machine learning, people who may have trained models in a previous job or may have branched in from a team where they trained models. I mean, they certainly don’t lack the skillset of training models. It’s just that so much of the work that goes into the deployment and maintenance and operation of machine learning models in a production system is software engineering work, especially at a company like Twitter, where the size of our data, the speed with which our models need to respond, and the changes in data distribution require enormous software lifts.
Dan Shiebler: 01:05:49
So, I mean, I almost feel like for someone who wants to work at a company like Twitter, much more of the job is that software engineering part. That was not the case at all at TrueMotion, which is interesting. I think at TrueMotion, the way the balance worked was that the really hard problems were the machine learning problems. The actual deployment of our software was not that challenging. The number of people who were really working on developing machine learning models was twice the number of people working on any part of the actual maintenance of models. Most of the software engineers were working on the app development and the backend parts that weren’t particularly tied to the machine learning models.
Dan Shiebler: 01:06:36
That was just the dichotomy of the two companies and the way they work, but at large companies like Twitter, and then companies that are way larger, like Facebook or Google, my guess is that this software engineering adjacent to machine learning models is a much larger percentage of the jobs, and often a larger share of the overall impact, than training models directly.
Jon Krohn: 01:07:06
Yeah. Beautifully said again, and another great dichotomy contrasting TrueMotion and Twitter. It shows there’s a huge number of skillsets that are valuable in a really broad range of companies, and these kinds of engineering considerations, making sure that the data structures are performant and efficient, are, again, maybe not the kind of thing that, if you’re getting started in data science or probably even in machine learning, you’re thinking is going to be hugely important in your job, but in lots of roles like you’re describing, they can end up being one of the most important aspects.
Dan Shiebler: 01:07:51
Absolutely.
Jon Krohn: 01:07:52
Cool. All right. So, I’ve learned a lot in this episode. I have no doubt that our audience did, too. One last thing that I’d like to learn from you is a book recommendation that you have.
Dan Shiebler: 01:08:03
Totally. I would have to recommend Coding the Matrix by Philip Klein, one of my professors at Brown. It’s a great book, basically an intro to linear algebra from a computational perspective. I’ve thought about this book a lot recently because I don’t think I fully appreciated it when I took the class. It was my only linear algebra class in college. I mean, most people take one linear algebra class. It went through linear algebra from a computer science perspective. It was offered by the computer science department rather than the math department.
Dan Shiebler: 01:08:41
We spent a lot of time discussing algorithms that you don’t talk about in a normal linear algebra class. We talked about gradient descent. We talked about Newton’s method. We spent a lot of time talking about the singular value decomposition, which I think might be the single coolest algorithm that exists. It’s so critical for so many different applications, but it’s often not talked about in linear algebra classes. When I’ve talked to some of my friends who took linear algebra classes in the math department or at other universities, they’d never heard of the singular value decomposition. They learned about it first in their machine learning course or at some other point in their career.
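For readers who have not met the singular value decomposition, here is a minimal NumPy illustration of the two facts that make it so useful: any real matrix factors exactly as U, S, Vt, and truncating to the top singular values gives the best low-rank approximation (the idea behind PCA, compression, and recommender systems). The random matrix is just an example input.

```python
import numpy as np

rng = np.random.default_rng(42)

# Any real matrix A factors as A = U @ diag(S) @ Vt,
# with S holding the singular values in decreasing order.
A = rng.normal(size=(6, 4))
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# The factorization reconstructs A exactly.
assert np.allclose(A, (U * S) @ Vt)

# Keeping only the top-k singular values yields the best rank-k
# approximation of A in the least-squares sense.
k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
print(np.linalg.matrix_rank(A_k))  # → 2
```

The compression intuition: A has 24 numbers, while the rank-2 pieces U[:, :2], S[:2], Vt[:2, :] have 22, and for larger matrices the savings grow dramatically while the approximation error stays as small as any rank-2 factorization allows.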
Dan Shiebler: 01:09:21
I really appreciated that. I got a little bit of a headstart on that. When I think of linear algebra and I think of the core ways that linear algebra works, I think about it from this computational perspective. I think those are good vantage points. So, for someone who’s taking linear algebra and who’s looking to brush it up or has never taken linear algebra before and is interested to learn from a computer science perspective, I’d highly recommend the book, Coding the Matrix.
Jon Krohn: 01:09:47
Coding the Matrix. I’m looking forward to reading that. So, you probably don’t know this, but I recently rolled out, in partnership with SuperDataScience, a course called Machine Learning Foundations, and it covers linear algebra, calculus, probability, statistics, and algorithms and data structures from the perspective of somebody who would like to be applying data science and machine learning techniques. Right now, all of the linear algebra content is available on Udemy. So, specific topics like the singular value decomposition are a big part of the intro to linear algebra class. I suspect, however, that your book recommendation digs a lot more deeply into the math than I had time for in this survey of all of these subjects. So, that is something that I think would be good for me. It’s great to hear people reinforce what I think are the important things that people need to know in terms of these foundational subjects that underlie machine learning.
Dan Shiebler: 01:10:55
Yeah. I mean, I think that’s good. I think that that kind of mathematical foundation for machine learning, especially for someone who’s pivoting from another career path, it can feel very intimidating to pick all of those things up, but it’s true that there’s really just a relatively small subset of things that are really important to understand within these fields. I think it’s great to have options to learn those sorts of things in an accessible way.
Jon Krohn: 01:11:27
Cool. Well, thank you so much for being on the program. I, yeah, really loved it. I’m sure a lot of audience members did, too. So, how should people follow you? Do you have a Twitter account?
Dan Shiebler: 01:11:41
I do have a Twitter account.
Jon Krohn: 01:11:45
Great. So, we’ll provide your Twitter account for people to follow you and hear the latest on your insights. That will be in the show notes. Yeah. Any other ways that people should stay in touch?
Dan Shiebler: 01:11:58
I think my Twitter account is probably the best bet. It’s certainly what I pay the most attention to. I am on LinkedIn. People can feel free to reach out to me there, but, I mean, I think I’m much more likely to say interesting things on Twitter than on LinkedIn.
Jon Krohn: 01:12:14
Perfect. That makes a lot of sense. When people use Twitter in the workplace, does it ever … Is there any stigma around that feeling like you’re not really working? Do you get my question?
Dan Shiebler: 01:12:30
Yeah.
Jon Krohn: 01:12:32
If you see someone on Twitter and you’re like, “I really needed them to be working on this particular task,” are they working or not when they’re on Twitter?
Dan Shiebler: 01:12:41
Well, I mean, first, just in general, most of the people I work with are in San Francisco anyway, so I’m not spending much time looking over people’s shoulders, even in New York. People work in different ways. I knew one person who had two monitors, and on one monitor he always had either a TV show or a video game livestream playing, no matter what. On the other screen, he’d have code, docs, everything. That’s just how he worked. That’s how he was most productive. Using Twitter is fun, but I do feel like it’s productive, because whenever I have experiences on Twitter, I’m like, “I really feel like that piece should be better. That was a bad recommendation. I wonder why that got recommended to me.”
Jon Krohn: 01:13:36
That’s funny. I thought you were going to say that, because even for me, Twitter can be very useful in terms of work, since almost everyone I follow is someone in data science or machine learning. So, I get to learn about new trends, and it’s an educational process for me, too. Yeah. Of course, from your perspective, it’s also a product.
Dan Shiebler: 01:13:56
Yeah. I mean, I also almost entirely follow machine learning and category theory and ad tech people. So, my feed is a giant stream of math, which is exactly how I like it, but there’s this other part of it, too.
Jon Krohn: 01:14:12
Nice. All right. Well, thank you once again, and we’re looking forward to having you on the show again at some point in the future.
Dan Shiebler: 01:14:19
Absolutely. I had a great time.
Jon Krohn: 01:14:26
Many thanks to Dan for his articulate communication and comprehensive, illustrative examples of the topics he brought up. I hope you enjoyed learning as much as I did about the logistics of getting into and succeeding at a PhD while working full-time, how to translate almost purely mathematical subjects like category theory into real-world, cutting-edge machine learning innovations, guidance and tools for labeling huge datasets with semi-supervised approaches, the influential position of staff engineers at big tech companies like Twitter, the software languages and libraries used for machine learning at Twitter, how revenue science can boost ad performance, and the efficiencies afforded by low-effort or no-code tools.
Jon Krohn: 01:15:16
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URL for Dan’s Twitter profile at www.superdatascience.com/451. That’s www.superdatascience.com/451. If you enjoyed this episode, I’d of course greatly appreciate it if you left a review on your favorite podcasting app or on YouTube. I also encourage you to tag me in a post on LinkedIn or Twitter, where my Twitter handle is @JonKrohnLearns to let me know your thoughts on this episode. I’d love to respond to your comments or questions in public and get a conversation going.
Jon Krohn: 01:15:55
All right. It’s been a great episode. I’m looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.