SDS 583: The State of Natural Language Processing

Podcast Guest: Rongyao Huang

June 9, 2022

Tune in to this week’s episode to hear natural language processing (NLP) expert and Lead Data Scientist at CB Insights, Rongyao Huang, deliver a masterclass in NLP. From her thorough review of NLP over the past decade to how we will overcome the limitations of today’s approaches, this is a jam-packed episode for fans of this exciting field. 

About Rongyao Huang
Rongyao is a data scientist bootstrapped from stats and social science training and an industry-honed engineer. Having worked across domains from anti-corruption research to digital advertising and finance, she specializes in end-to-end problem solving from ideation to deployment. She has spent the past four years absorbing the explosion of deep learning and NLP and leveraging them to scale information extraction at CB Insights. She is equally proud/humbled to be a parent, constantly inspired by the tiny humans in both work and life.
Overview
Calling in from Staten Island, natural language processing expert and Lead Data Scientist at CB Insights, Rongyao Huang joins Jon Krohn for a masterclass in NLP history.
Jon and Rongyao don’t waste any time diving into the short but rich history of natural language processing and quickly start reviewing the beginnings of this impactful and fast-evolving field.
Starting with the ‘prehistoric’ age of NLP, Rongyao begins by reviewing the super-sparse bag-of-words era. Next, she moves on to the ‘stone age,’ and discusses the word2vec model, single-layer neural networks and LSTMs. Third, Rongyao talks about the current state of NLP and the development of large transformer models, which are deep neural networks that use attention to encode the context of words and deliver exceptional results.
With NLP history behind her, Rongyao reflects on the NLP developments we can expect in the future – the iron age. With Jon, she highlights two main themes. First, how the scaling law of increasing model parameter counts by orders of magnitude suggests we’ll continue to obtain dramatic improvements in NLP model capabilities for the coming 5-10 years.
Rongyao’s second prediction touches upon how the coming iron age of NLP could involve overcoming the 512-token sequence-length limitation of today’s models as well as meta-learning algorithms.
Next, Rongyao introduces her Bauhaus-inspired framework for effective data science. She honed this framework after working in data science for the past eight years and observing breakthroughs and processes across a number of roles. Initially inspired by a design book she came across at the MOMA Design store, her framework aims to make data science fit seamlessly into all fields and team members.
“Sit really close to the problems in your domain. Keep an eye out for all the cutting-edge stuff that is happening. Have a really big toolbox and utilize as much AutoML as possible,” she says of her design-inspired analogy, before adding that a data scientist should prioritize “forming partnerships with other people in your organization…It’s important to have an end-to-end mindset.”
Tune in to learn more about previous and future developments in NLP and to hear about Rongyao’s long-term career pathfinding framework.
In this episode you will learn:
  • The evolution of NLP techniques over the past decade [4:14]
  • What’s next in the coming iron age of NLP [35:33]
  • Rongyao’s Bauhaus-inspired model for effective data science [43:12]
  • Rongyao’s long-term career pathfinding framework [51:50]
  • Rongyao’s top tips for staying sane while juggling career and family [1:00:30] 
Episode Transcript


Jon Krohn: 00:00:00

This is episode number 583 with Rongyao Huang, lead data scientist at CB Insights. 
Jon Krohn: 00:00:11
Welcome to the SuperDataScience podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple. 
Jon Krohn: 00:00:41
Welcome back to the SuperDataScience podcast. On today’s episode, I’m joined by the deeply insightful natural language processing expert, Rongyao Huang. Rongyao is lead data scientist at CB Insights, a real-time market intelligence platform based in New York. Previously, she worked as a data scientist at a number of other New York startups and as a quantitative research assistant at Columbia University. She holds a master’s in research methodology and quantitative methods from Columbia. Today’s episode is more on the technical side, so it will appeal primarily to practicing data scientists. However, the second half of the episode does contain general sage guidance for anyone seeking to navigate career options, as well as to balance personal and professional obligations. 
Jon Krohn: 00:01:25
In today’s episode, Rongyao details the evolution of NLP techniques over the past decade, through to the large transformer models of today. She talks about the practical implications of this dramatic NLP evolution, how the scaling law will impact NLP model capabilities over the coming decade, the major limitations of today’s NLP approaches and how we might overcome them, her Bauhaus-inspired model for effective data science, her pathfinding model for making effective career choices and her top tips for staying sane while juggling career and family. All right, you ready for this fun, content-rich episode? Let’s go. 
Jon Krohn: 00:02:11
Rongyao, welcome to the SuperDataScience podcast. I’m so excited you’re here. You are this outstanding speaker that I was introduced to at MLconf, the machine learning conference in New York. We did the first-ever live filmed episode of SuperDataScience at that event. In it, we had Noam Brown, in episode number 569. You can check that out, listeners. So that was the first episode filmed live with a live audience. And earlier in the day, before we recorded the SuperDataScience episode, Rongyao had a talk and it absolutely blew my socks off. I learned so much, and so I asked Rongyao right away if she’d like to be on the show. And now here she is for you. So Rongyao, thank you so much for being here. Welcome. Where in the world are you calling in from? 
Rongyao Huang: 00:03:01
I am calling in from the least New York City part of New York City, which is Staten Island. 
Jon Krohn: 00:03:09
Oh, I almost guessed that. I had a feeling, but that’s probably pretty nice. You probably have a backyard. 
Rongyao Huang: 00:03:17
Oh, I do. I moved in here for the space, and I do have a lot of it. 
Jon Krohn: 00:03:23
Lucky. Do you have a car? 
Rongyao Huang: 00:03:24
Oh yeah, I have my driveway. 
Jon Krohn: 00:03:26
Wow, you have a driveway. That’s amazing. 
Rongyao Huang: 00:03:30
Yeah. 
Jon Krohn: 00:03:30
No, I don’t know anybody in New York with a driveway. That is pretty cool. Well, so welcome. I’m not far away from you in Manhattan. I’m in lower Manhattan, not too far from Staten Island. And yeah, so excited to have you on. I would love to just dig right into your talk. I learned so much, I’d love the audience to hear about your talk as well. So you’ve been working on natural language processing for about seven years. And in those seven years, there’s been a huge amount of progress in NLP technology, in what we’ve been doing in research, and in the applications that have made their way into consumer technologies. And so in your presentation at MLconf, as well as in a Medium article, which we will link to in the show notes, you compare this transition in natural language processing to going from prehistoric times to the bronze age of human history. And so can you elaborate for us on what you mean by that and what you talked about at MLconf? 
Rongyao Huang: 00:04:39
Sure. It’s a very broad topic, but just to give a little bit of a personal reference, I started my career in data science about seven, eight years ago. And back then, if you looked at all these sub-domains within machine learning or AI, whether it’s computer vision, NLP or other areas, they were kind of moving along at the same pace. So if you asked any data scientist back then, oh, you’re working on NLP, what methods are you using? You were going to hear things like TF-IDF, I’m doing structural topic modeling and all that. But things really started to change, I would say, since 2018. And that’s when I would say we entered the bronze age, and we saw this huge explosion that happened in NLP, and the artifacts generated by that have since actually been benefiting all the sub-domains within AI. And they’re converging in terms of the methodology that they’re using. I don’t know, for our audience, if you’ve heard of anything like ELMo or BERT or GPT-3. I bet you have, because they’re out there in every single news article. 
Rongyao Huang: 00:06:03
What’s really underneath are large language models. So you might ask, what are language models? These are actually usually deep networks that are able to encode an amazing amount of information about what I call human knowledge, represented in text or voice. And that effective way of encoding information is really the breakthrough in NLP. Now that has been carried over to work in computer vision, and we are seeing increasingly multimodal generalists. There’s actually a paper that just came out of DeepMind, I believe May 12th, that talks about a generalist agent, which is a decoder-only transformer. And we’re going to talk about that in a bit, what transformers are. But it’s a big transformer that is trained on vision, text and also control environments, so a lot of game playing, like Atari games and all that. It’s trained in a multimodal way, and all of these tasks are handled by the same set of weights. And it is for that reason it’s called a generalist. So you’re seeing that the breakthrough that happened in NLP has transferred into all the other fields and has enabled increasing generalizability, which really is an early sign of AGI in my opinion. 
Jon Krohn: 00:07:44
Wow. What you’re saying is the transition from the prehistoric age through the bronze age was the development of these large NLP models, which typically are transformer models. And so a transformer is a particular kind of component that we can include in a deep learning model architecture. And these large NLP transformers, around 2018 they started to become prominent, architectures like you mentioned, ELMo, and now more recently, GPT-3. They transformed not only natural language processing, but lots of other fields like machine vision. And we’re starting to see these multimodal models that in your view are a step towards AGI, artificial general intelligence, or an algorithm that has all of the learning capabilities of an adult person. 
Rongyao Huang: 00:08:41
That’s right. Yeah. 
Jon Krohn: 00:08:43
Wow. 
Rongyao Huang: 00:08:43
And this very problem starts with the representation of information, or the representation of human knowledge. In the prehistoric time in NLP, we’re talking about bag-of-words: this high-dimensional space where each word in the vocabulary is its own dimension and it’s encoded as zeros and ones. So it’s very sparse and there is no innate relationship among the words. And we kind of forced that relationship statistically by looking at the distribution of words in documents. Now, that is prehistoric, and the limitations are pretty obvious, because if you think about how our brain represents knowledge and information, there is a lot of association between words. Every word carries with it a context. The famous example in NLP would be the bank of a river versus if you’re going to a bank to deposit money. We know the bank is different, but in the early days, when words were represented as bag-of-words, there was only one representation for every single word. We know obviously something is missing. 
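To make the sparsity Rongyao describes concrete, here is a minimal bag-of-words sketch in Python; the two-document corpus is made up for illustration:

```python
from collections import Counter

# Toy corpus: each document becomes a sparse count vector over the vocabulary.
docs = [
    "the bank of the river",
    "deposit money at the bank",
]

# Build the vocabulary: one dimension per unique word.
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc):
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

vectors = [bag_of_words(doc) for doc in docs]
# Both documents use "bank", but the representation has a single
# dimension for it -- the two senses of "bank" are indistinguishable.
```

With a realistic vocabulary of tens of thousands of words, almost every entry in each vector is zero, which is exactly the sparsity problem the stone age set out to fix.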
Rongyao Huang: 00:09:58
Now, the stone age, which we haven’t talked about, started off with word2vec and doc2vec, and they’re still popular right now, right? That started in 2013. And you start to see this distributed representation of words. What that means is simply, if you’ve learned a foreign language or done English exercises, there are those cloze tasks where you mask out a word and you’re asked to fill it in based on the context. So that is exactly what distributed representation is. You’re representing a word by its context. And when we start to do that and train even a very simple model, like in the case of word2vec, it’s a single-layer neural net, could not be more simple, right? It’s still able to capture very interesting information in the syntax and the meaning. I believe for a while there were a lot of articles giving examples about this equation where you look at king versus queen and man versus woman. And you can do this semantic subtraction or addition in the vector space to hop around the conceptual universe. And people were amazed by how a very simple neural net is able to encode that information. But that really is the start of the power of distributed representation. 
Jon Krohn: 00:11:31
Yeah. To make that example kind of concrete for listeners who aren’t aware of it, in one of these vector spaces you find the … so you can kind of imagine it like a 3D space, because that’s the most dimensions you could imagine in your brain. But in practice, this is a higher-dimensional space. It might have a hundred or 200 dimensions. But every word in your vocabulary gets a location in that high-dimensional space. And so if you find the location of the word for king, you subtract the location for man and you add the location for woman, you’ll end up at approximately the location of the word for queen. So queen is equal to king minus man, plus woman. And then there’s lots of fun examples of these kinds of things, where you could say, “Okay, let’s take Musk, subtract Tesla and add Facebook, and you’ll end up at Zuckerberg.” So yeah, there’s all these kinds of arithmetic operations that you can do in the space for encoding meaning. Anyway, I interrupted you because I think that’s just kind of giving an example of what you mean by that. And absolutely, this stone age, these kinds of vector-based representations, as opposed to the really sparse, like you said, representations like bag-of-words in the prehistoric age, is a huge jump forwards. And yeah, so I don’t know if you remember where you were before I interrupted you. 
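A toy version of that arithmetic: the four 2-D vectors below are hand-crafted for illustration, whereas real word2vec embeddings are learned from data and have a hundred or more dimensions.

```python
import numpy as np

# Hand-crafted toy embeddings over two made-up axes: (royalty, gender).
embeddings = {
    "king":   np.array([0.9,  0.9]),
    "queen":  np.array([0.9, -0.9]),
    "man":    np.array([0.1,  0.9]),
    "woman":  np.array([0.1, -0.9]),
    "banana": np.array([-0.8, 0.1]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(vector, exclude):
    """Return the vocabulary word whose embedding is most similar."""
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(vector, candidates[w]))

# king - man + woman lands at queen in this toy space.
result = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # queen
```

With learned embeddings the analogy only holds approximately, which is why the nearest-neighbor lookup (rather than an exact match) is the standard way to evaluate it.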
Rongyao Huang: 00:13:02
Yes. I was closing up the stone age and about to move into the bronze age. 
Jon Krohn: 00:13:08
Nice. Exciting. Perfect. Eliminating unnecessary distractions is one of the central principles of my lifestyle. As such, I only subscribe to a handful of email newsletters, those that provide a massive signal-to-noise ratio. One of the very few that meet my strict criterion is the Data Science Insider. If you weren’t aware of it already, the Data Science Insider is a 100% free newsletter that the SuperDataScience team creates and sends out every Friday. We pore over all of the news and identify the most important breakthroughs in the fields of data science, machine learning and artificial intelligence. Simply five news items. The top five items are handpicked, the items that we’re confident will be most relevant to your personal and professional growth. Each of the five articles is summarized into a standardized, easy-to-read format and then packed gently into a single email. This means that you don’t have to go and read the whole article. You can read our summary and be up to speed on the latest and greatest data innovations in no time at all. 
Jon Krohn: 00:14:17
That said, if any items do particularly tickle your fancy, then you can click through and read the full article. This is what I do. I skim the Data Science Insider newsletter every week. Those items that are relevant to me, I read the summary in full. And if that signals to me that I should be digging into the full original piece, for example, to pore over figures, equations, code, or experimental methodology, I click through and dig deep. So if you’d like to get the best signal-to-noise ratio out there in data science, machine learning and AI news, subscribe to the Data Science Insider, which is completely free, no strings attached, at www.superdatascience.com/dsi. That’s www.superdatascience.com/dsi. And now let’s return to our amazing episode. 
Jon Krohn: 00:15:09
So even that jump from the prehistoric age of bag-of-words to the stone age of vector representations was massive. And for me it was revolutionary in what I could do with natural language models at work and make available to our users, so yeah. Tell us about that transition from the stone age vector spaces to the bronze age of transformers. 
Rongyao Huang: 00:15:32
Right. So that is when the scaling law kicks in. With the stone age, we really laid the foundation: now we’re able to represent a word by its context. So we got context-aware embeddings of words. Now going into the bronze age, instead of a single-layer neural net, we have evolved into really deep neural networks that can be used to encode information. Because if you think about how scale impacts the performance or capability of models, it’s useful to think about the, I guess, biology analogy. Compare a sea worm versus a frog versus a jellyfish versus an octopus and then a human or elephant. If you think about the number of neurons and synapses in a person’s brain, that number does matter. Even if the mechanism is almost the same, the sheer number of connections that you’re able to use to encode information and to retrieve information, that is very important. 
Rongyao Huang: 00:16:47
And you will see that this trend has continued and will continue from this point for five, 10 years. That is actually a big piece of the AGI view, to continue climbing up this curve of the scaling law. We might talk a little bit more about that in a bit. But coming back to the bronze age, I think really what has happened there is we took that context-aware embedding idea and we scaled it up. We’re able to encode information in huge networks that now have hundreds of millions of weights. And it has evolved very quickly, because if you look at the start of 2018 with BERT, we’re looking at hundreds of millions. And then if you look at now how many parameters we have in these super models, you’re looking at 540 billion and above, so it has, yeah. 
Jon Krohn: 00:17:54
And I think there’s a Chinese one, [inaudible 00:17:56] 2.0- 
Rongyao Huang: 00:17:55
Oh, yeah. 
Jon Krohn: 00:17:55
Wu Dao 2.0 with over a trillion. 
Rongyao Huang: 00:17:59
Over a trillion. Yes. They’re definitely getting bigger, and there are, I would say, different things happening beneath a statement like “models are getting bigger.” One is that you need to be able to effectively train these models, right? And you need enough data to train these models. Because if you come from a statistical background, the textbooks used to say the number of data points that you use to fit the model needs to be at least one more than the number of weights that you have. Otherwise this is considered [inaudible 00:18:41] because you don’t have enough data to estimate it. But in the world of deep learning, this has kind of been turned upside down, as you’re talking about these hundred-billion-parameter models that can be fine-tuned with a couple thousand data points, which was unbelievable in the past. And the reason this is happening is because of technology breakthroughs, or methodology breakthroughs, that happened in AI. Some of them are really old, like gradient descent. You heard people talking about them 10 years ago; they’re still front and center. 
Rongyao Huang: 00:19:19
If you are learning about AI, this is the thing that you start with: how gradient descent works. But there are also new things, like the attention mechanism. And that is really at the core of transformer models; it’s the mechanism that people use to encode information effectively. And by saying effectively, it’s helpful to compare that with how we used to do it in NLP, which is sequential. If you look at a model like an LSTM, a long short-term memory model, you are feeding in word tokens one at a time, and then you are generating embeddings that way. This is computationally a sequential task that just takes time to finish. With attention, you’re able to batch-feed in your sequence of tokens, and they’re going to travel through these layers of attention blocks in parallel. Even if you have a huge model and you have a lot of data to train it on, it’s faster actually as compared to the sequential way of training, so that is one. 
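As a sketch of why attention parallelizes where an LSTM cannot, here is single-head scaled dot-product attention in NumPy; for brevity, the raw token embeddings stand in for the learned query/key/value projections a real transformer would apply:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Every token attends to every other token via two matrix
    multiplications -- no token-by-token loop as in an LSTM.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)     # each row: attention over the sequence
    return weights @ V                     # context-mixed token representations

seq_len, d_model = 5, 8                    # toy sizes
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))    # stand-in token embeddings
out = attention(x, x, x)                   # whole sequence processed at once
print(out.shape)  # (5, 8)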
Rongyao Huang: 00:20:33
And the second is when it comes time to training data. Where do we get a lot of training data for these deep models? And it used to be a barrier I think for a lot of people in data science. If you ask, have you been doing deep learning at all? I think the honest answer would be for most people, no. Because you simply do not have that many label data in your pocket that you can even use to train a deep learning model. But that has changed, right? In a way, the transformer models and the paradigm shift it has brought into the world of NLP and data science has made a poor man’s deep learning dream come true. Even if you don’t have a lot of label data points, you just have a couple thousand, you can now entertain with really deep models, like hundreds, millions parameters, and you will be able to get very good performance out of it. 
Rongyao Huang: 00:21:38
This is like the transfer learning or fine tuning part of the paradigm shift, which we probably should spend a little more time on. But really the empowering people to use deep learning model that also come from these powerful pre-train model that are pre-trained in a self supervised fashion. What that means is, look, in order to encode all these human knowledge and the model you don’t even need a lot of people hand labeling the meaning of these sentences or these tokens. You could just take the entire corpus of text out there on the web and feed them into the model. And you use a very clever training techniques that is masking, you’re randomly taking out words from these training corpus and you ask the model to figure it out. And even though it’s seemed like a very simple training task, the model is actually learning a ton through this task. It’s picking up the syntactic, the semantic and a lot more from this huge amount of data that’s unlabeled, the model is self supervising in the training process, 
Jon Krohn: 00:23:00
Right. 
Rongyao Huang: 00:23:01
Yeah. I think I lost myself because I had so much in mind. 
Jon Krohn: 00:23:06
Oh, no. Well, I can kind of pick up a little bit there just with kind of summarizing some of the points that you were making. Which is that this idea of self supervised learning allows these natural language models to train without label data, because you can just feed in all of the English language articles on Wikipedia or all of the English that you can find on the internet or all of the language you can find on the internet in any language. 
Rongyao Huang: 00:23:36
That’s right. 
Jon Krohn: 00:23:37
And the bigger these models get, the more nuance you can get in your results as you use these larger and larger data sets. 
Rongyao Huang: 00:23:48
Yeah. These are I think methodology on the algorithm side. There are also stuff happening on like how do I more effectively utilize more distributed TPUs or GPUs. And that would be the pathway system research that came out also earlier this year. 
Jon Krohn: 00:24:11
Oh, yeah. It’s really cool stuff. I did a Five-Minute Friday episode, episode number 568 on PaLM on this. Tell us about it, it’s super cool. 
Rongyao Huang: 00:24:23
Yeah. That paper is like … look, I wanted to just get bigger, right? I want my model to get bigger because we are seeing clear benefit of scaling up the model. So coming back to this point of the scaling law that is really front and center in the evolution and the current development of AI is if you are to scale your computational resource, where do you allocate that to get the best performance out of your model. You can either scale the number of parameters in your model. You could scale the amount of data you used to train, and you can scale the training iterations to make it converge more. What people find is that scaling the model itself is orders of magnitudes more important than scaling the data and iterations. 
Rongyao Huang: 00:25:20
So you can go from a hundred million to a hundred billion and trillion and your data probably only increased by two magnitudes and iterations are maybe even constant. You’re not adding everything to converge, but you increased your model parameter by like six folds and the performance absolutely skyrocketed. And you did not see any plateau either, which means you can just keep going. And that becomes the question of how do I make this feasible. How do I train such a huge model? And I believe this pathway system or methods that came out of the paper is to make it more practical to train huge models on distributed infrastructure. That is the breakthrough that this PaLM model and system bring to us. 
Jon Krohn: 00:26:15
And it’s also kind of cool, because it creates regions that are useful in particular data flows. I think kind of the idea, I think you know this probably a lot better than I do, but my memory from having reading about PaLM is that it allows us to have parts of the neural network that are relevant to some task. Those don’t always need to be engaged like the kind of- 
Rongyao Huang: 00:26:42
Right. 
Jon Krohn: 00:26:44
Yeah. The typical with a neural network or these large transformer models that aren’t PaLM, typically you are propagating information through all of your model parameters. But with PaLM, we sometimes don’t, we only go to the regions that could be most useful. 
Rongyao Huang: 00:27:02
Right. Sparsely activated huge network. 
Jon Krohn: 00:27:09
Yeah. 
Rongyao Huang: 00:27:09
And I think that is going to be a area where a lot more innovation needs to happen before we get to the human performance level. And it carries a very interesting parallel with the system one and two. Thinking in human brain. 
Jon Krohn: 00:27:29
Oh, Daniel Kahneman. 
Rongyao Huang: 00:27:30
Yeah. I’ve been brushing upon that book lately. 
Jon Krohn: 00:27:36 Thinking, Fast and Slow. 
Rongyao Huang: 00:27:38
Thinking, Fast and Slow. Well, for the audience, there are two helpful, they call that like little characters that you can use to help you understand how you bring functions. The system one is the effortless association machine that is responsible for achieving information and generate fast response. And it’s always on and monitoring the environment, but it’s sometimes stupid. And then system two is effortful. I remember the author talking about like pupil dilation, whenever you carry out a mathematical computation, like the shift digit holding three numbers in your working memory type of thing. And how system two can jump into check on system one, but it’s also lazy. So if you think about that and what’s happening in AI, starting with these huge language models. And I think we had a breakthrough in our system one counterpart in AI. 
Rongyao Huang: 00:28:53
So you have these very deep models. They’re able to encode a lot of information and association and without doing any back prop you can solicit information from it. It’s going to give you a quick answer. Well, it needs to get quicker actually as compared to human performance, which is the efficiency of inference, talking about that aspect, that needs to be still improved. But if they think about the fundamentals in these language models, it’s more like system one, where you have encoded a huge amount of information dissociation and you can just solicit answers in real time response. But where is system two? Right. And part of system two I feel is deliberate back prop. Because back prop is definitely more expensive than feed forward. So when your system two is engaged, it’s like I need to be optimizing given this objective and the set of variables. 
Rongyao Huang: 00:30:03
I’m actually doing intense computation there. You’re fine tuning your huge network over here in real time where you engage system two. But it’s also about is system two inter … kind of planning things out for system one, right? I would like you to put these information in your working memory for fast retrieval. I would like you to recode these pathways because I’ve learned new information. It kind of struck me one day when I was reading the book that this is the analogy that we’re seeing in AI. 
Jon Krohn: 00:30:45
That’s such a cool analogy. And for listeners out there, if you’ve ever been listening to an episode in the past, and I said something wrong or really dumb on air, it was my system one. It was just rolling on its own. I’m engaging in conversation sometimes several hours at a time with our guests. And I’m taking notes at the same time. And my system one is doing its best. I’m trying. But yeah, sometimes my system too is too lazy to come in and fix everything that system one is doing. I’m sure there’s been the occasional mistake. But that’s an interesting example how most of your conversation that you carry out, the vast majority of things, like your facial expressions, your tone of voice, and even a lot of the content of what you’re saying is your system one just like making lots of guesses. 
Rongyao Huang: 00:31:42
That’s right. 
Jon Krohn: 00:31:44
And it’s very rare that your system two comes in and slowly it’s like really focused on like … and sometimes I have to, I’m like, oh, what was the name of that model that just came out or who’s the author of that paper? And so my system two is kind of kicking in consciously in the background while my system one is still talking. And then I’m even more likely to make a mistake, system two watching while my mouth is moving. 
Rongyao Huang: 00:32:09
And it’s an evolutionary advantage, to have a effortless system one. And that’s how AI system should be built too. There is a real opposite [inaudible 00:32:21] when it comes to inference time performance. It’s still too slow. You’re able to encode a lot. But for example, in this DeepMind paper of the generalist agent, they’re using a very small model. I believe it’s like 500 million probably on that level. They didn’t use a billion level model because in these tasks that they are trying to get the model to handle, a lot of them evolve real time actions like stacking blocks and stuff. So you cannot use a big model for that because the performance is not there. 
Jon Krohn: 00:32:58
Yeah. That’s something that I talk about actually in the most recent Five-Minute Friday episode that just aired, episode 582. In that episode, I talked about how you have this trade off between model speed and accuracy. And so when you’re thinking about taking your model from a Jupyter Notebook where it’s performing really accurately, and you’ve got this amazing AUC. But when you put that model in production, if it’s really big, if you’re using a giant transformer model, it’s not going to produce results for your users in real time or in that deeper reinforcement learning paradigm that you’re describing that Deep Mind is using. It’s not going to be able to act in the environment in real time. And so while some of these giant transformer models are super cool and they produce amazing results, for your real time production deployments, it’s often going to be the case that you want a much, much smaller model that is way faster. Not only does that provide results to your users in real time, but it’s also cheaper for you to run. 
Rongyao Huang: 00:34:06
Right. Definitely. 
Jon Krohn: 00:34:10
Anyway, I’ve taken you off track. 
Rongyao Huang: 00:34:13
We’ve been both taken off track, which is fine. 
Jon Krohn: 00:34:15
I need system two back to bring me back to where we should be in the conversation. Well, that was super fun. 
Rongyao Huang: 00:34:24
I think we were kind of talking about what has happened in NLP, all the excitement, and where it’s going, right? 
Jon Krohn: 00:34:33
Yeah. Where it’s going. That’s what I wanted to do next. 
Rongyao Huang: 00:34:38
Our system two helping each other out right here. 
Jon Krohn: 00:34:41
Nice. System two high fives. Exactly. We’ve talked about the prehistoric age back with their super sparse bag-of-words models. Then we had the stone age with word2vec, single-layer neural networks creating embeddings, and then maybe having those embeddings flow into LSTMs. And then we talked about the bronze age that we’re in right now, where we have these large transformer models, which are deep neural networks. They use attention to encode the context of words, and we get really exceptional results. And you’ve already talked about this scaling law, so that’s one thing that I know is going to happen in the future without a doubt. And you gave us a five to 10 year guess, and I’d be interested to hear why you speculate five to 10 years. But you said that for the next five to 10 years, scaling model size is likely to still give us huge improvements in what we can do with these large transformer models, not only in natural language processing but also in other areas like machine vision alone or multimodal processing. And so yeah, what do you think is happening next? I mean, maybe in addition to, or other than, continuing to scale the bronze age up, what do you think the next age is going to be like, the iron age? 
Rongyao Huang: 00:36:08
The iron age, I think there are two big parts to it. The first is scaling: continuing to climb the scaling curve is definitely still going to be the case. And the five to 10 years is my system one thinking, just putting it out there. 
Jon Krohn: 00:36:27
Okay. 
Rongyao Huang: 00:36:29
Right. But a useful reference would be to look at the number of neurons and synapses in the human brain, and compare that to the current top-level AI models to see where the gap is. In order to scale further, I think, specific to the current transformer architecture, there is the sequence length limitation that is still not solved. 
Jon Krohn: 00:37:01
Oh yeah. 
Rongyao Huang: 00:37:02
For the audience: there is a paradigm shift that happened in NLP, away from fully supervised learning, meaning I need labeled data and I’m going to just train on it, optimize, and there you go. It shifted into pre-train and fine-tune, which is: pre-train my model in a self-supervised fashion on a massive amount of data, and then fine-tune that same model downstream using a smaller dataset for a specific task. And now the third paradigm is called pre-train, prompt, and predict. Here the key point is prompting. You have a model that knows a lot; a lot of information is encoded. Now, when you try to solicit information from the model, you need to prompt it. So in GPT-3 style, say you wanted to compose a novel: you would start off with the beginning of the novel, basically, and then the model would go on. And that same paradigm can even be used for solving math problems too. There are papers out there on this, using a pre-trained large language model. 
Rongyao Huang: 00:38:22
You show the model a couple of examples of how you would solve certain math problems, and then you prompt it again with the new question, not giving an answer. And the model is able to spit out an answer following the patterns shown in the examples. But there is a challenge in the amount of memory consumption, which is quadratic in your sequence length. Basically, if you feed in text, for a lot of these transformer models the max sequence length is typically 512 tokens, and you cannot go bigger than that because the memory consumption would explode. But that’s often not enough to fully represent the context that you pass to the model to solicit an answer. So that is still something that will need a breakthrough, either on the hardware side, like how we might do this in a distributed fashion more effectively, or it would call for an architectural breakthrough again, maybe a better version of attention that doesn’t require such memory allocation. 
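The few-shot prompting pattern Rongyao describes can be sketched in a few lines. This is a minimal, invented illustration: the arithmetic examples and the `build_few_shot_prompt` helper are made up for demonstration, and the resulting string would be sent to whatever completion API you actually use.

```python
# Sketch of "pre-train, prompt, predict": instead of fine-tuning, we pack
# a few worked examples plus a new, unanswered question into one prompt
# and let a frozen pre-trained model continue the pattern.
# The examples below are invented purely for illustration.

def build_few_shot_prompt(examples, question):
    """Format (question, answer) pairs, then the new question with no answer."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")  # the model is expected to complete this
    return "\n\n".join(parts)

examples = [
    ("What is 12 + 7?", "12 + 7 = 19"),
    ("What is 30 - 4?", "30 - 4 = 26"),
]
prompt = build_few_shot_prompt(examples, "What is 9 + 15?")
print(prompt)
```

The point is that no weights change: the "training signal" is entirely in the prompt text, which is why this style of use depends so heavily on how much context the model can fit.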
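The quadratic memory cost mentioned above can be made concrete with back-of-the-envelope arithmetic. The numbers here are assumptions for illustration (12 attention heads, 4-byte fp32 scores, one layer), not any particular model's real footprint:

```python
# Why attention memory is quadratic in sequence length: a full attention
# matrix holds seq_len * seq_len scores per head. Head count and precision
# below are illustrative assumptions, not a specific model's configuration.

def attention_matrix_bytes(seq_len, n_heads=12, bytes_per_score=4):
    """Rough size of one layer's full attention score matrices."""
    return seq_len * seq_len * n_heads * bytes_per_score

for seq_len in (512, 2048, 8192):
    mb = attention_matrix_bytes(seq_len) / 1e6
    print(f"seq_len={seq_len:5d} -> ~{mb:,.0f} MB per layer")
```

Doubling the sequence length quadruples this term, which is why stretching context past the common 512-token cap calls for either hardware tricks or a cheaper attention variant.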
Rongyao Huang: 00:39:39
So yeah, that is on the scale side of things. And then the second part, and this is really, I don’t have evidence to back me up, it’s entirely analogy-based, is how do we get more of system two in there? How do we get more planning, meta-algorithms that help you solicit information more effectively? How do we enable a hybrid? Here is my knowledge base that’s pre-trained, and I’m also going to be doing very small-scale online learning, maybe in a reinforcement learning fashion or in a [inaudible 00:40:18] learning fashion, that allows you to keep evolving your knowledge base without throwing away all that you’ve known. Because that is a problem with fine-tuning models: sometimes you forget all the good stuff from pre-training when you fine-tune. How do we get that sweet spot of keeping all the useful stuff while continuing to learn? 
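The forgetting problem described above can be seen even in a toy setting. This is not a real NLP model, just a one-parameter caricature: fine-tuning on a new task alone erases the old skill, while replaying some old data (a simple continual-learning idea) keeps a compromise.

```python
# Toy illustration of catastrophic forgetting: a one-parameter "model"
# trained on task A, then fine-tuned on task B with no memory of A,
# loses its task A skill. Replaying task A data during fine-tuning helps.

def train(w, targets, lr=0.1, steps=200):
    """Gradient descent on mean squared error to the given target values."""
    for _ in range(steps):
        grad = sum(2 * (w - t) for t in targets) / len(targets)
        w -= lr * grad
    return w

w = train(0.0, targets=[1.0])              # "pre-train" on task A (wants w = 1)
w_forgetful = train(w, targets=[-1.0])     # fine-tune on task B only (wants w = -1)
w_replay = train(w, targets=[-1.0, 1.0])   # fine-tune on B with task A replayed

print(f"after task A:            w = {w:.2f}")
print(f"fine-tuned on B only:    w = {w_forgetful:.2f}, task A error {abs(w_forgetful - 1):.2f}")
print(f"fine-tuned with replay:  w = {w_replay:.2f}, task A error {abs(w_replay - 1):.2f}")
```

Real continual-learning methods are far more sophisticated, but the trade-off they navigate, plasticity for the new task versus stability on the old one, is the same one this toy exposes.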
Jon Krohn: 00:40:45
Yeah. That is a huge difference between the way that today’s machine learning algorithms learn and the way our human brains learn: all of the weight shifts associated with fine-tuning a machine learning model mean that some valuable connections can just be lost, because in the task that we’re fine-tuning for, some general aspects of language maybe aren’t relevant. But then in production, they suddenly should have been, and we’ve forgotten those things. And so that’s very different from how humans learn, where we’re able to keep piling things on top. 
Rongyao Huang: 00:41:26
Right. Exactly. Yeah. And there’s also this concept of confidence, right? System one is very heuristic-based and biased, and it doesn’t really think about what the level of confidence should be when I say this, whereas system two actually has more of that concept. So when it comes to data drift or out-of-distribution inputs: I’ve actually never seen this before, I should be hesitant in giving this answer. All of that needs to be happening. 
Jon Krohn: 00:42:01
Yeah. The algorithm should say something like, “I’m going to need to check with my supervisor.” 
Rongyao Huang: 00:42:08
Right. I actually don’t know this one. 
Jon Krohn: 00:42:12
Try asking another algorithm. 
Rongyao Huang: 00:42:16
Here’s a referral. 
Jon Krohn: 00:42:17
Yeah, exactly. That’d be really funny. You’re like, “Yeah, I know this other algorithm uptown who has been working on this problem, maybe you’d like to give them a call.” 
Rongyao Huang: 00:42:27
Right. 
Jon Krohn: 00:42:28
I think they’ve been fine-tuned for a task more relevant than the one I’ve been fine-tuned for. 
Rongyao Huang: 00:42:33
Right. It would even be funny because we have like Alexa and we have Google Home and we have Siri at home. It would be funny if they can cross reference each other. 
Jon Krohn: 00:42:44
Yeah. Exactly. 
Rongyao Huang: 00:42:47
Right. 
Jon Krohn: 00:42:48
Super funny. All right. I love how we’re personalizing these algorithms so much, and also getting our system ones and system twos involved in here. Another completely tangential topic to everything we’ve been talking about so far, but something else that I know you’ve talked about before, is this idea of Bauhaus for data science. And so we’re going to get another analogy in the picture here. Bauhaus is a famous German school of design. And what comes to mind for me when I hear Bauhaus is minimalistic design that’s useful, resource-efficient, and mass-producible. I’m guessing that your Bauhaus for data science wants some of those principles in there: we want resource-efficient data science that’s mass-producible. How can we redefine or refine, fine-tune, the data science process to better follow these kinds of Bauhaus design principles? 
Rongyao Huang: 00:44:03
Right. So for the audience out there, this is probably a very interesting shift of topic, but let me give you a little more context. This is basically my thinking after working in the field of data science for seven, eight years and seeing, on the one side, all of these technological breakthroughs, right? The theoretical breakthroughs, exciting models coming out. But on the other side, the industry itself is also maturing, in the sense that the ecosystem is getting bigger. You’re seeing huge models offered as a service. You’re seeing a lot of automation happening, whether it’s MLOps or AutoML, and you’re seeing functional specialization within the field. Before, “data scientist” was one big hat and you could be doing a gazillion different things depending on where you went. But now it has become more clearly defined: these are data engineering responsibilities, these are business intelligence responsibilities, these are machine learning engineer responsibilities, and these are research data science responsibilities, so on and so forth. 
Rongyao Huang: 00:45:25
I started to think about where this whole field is going and what that means if you are a data scientist or you’re coming into this field. Are we going to be, say, out of jobs very soon because all this automation is happening? I’ve been noodling on that myself. And one day I was at the MoMA Design Store, the bookstore, reading a book about design. And from there I got a very interesting analogy, which is Bauhaus. It’s about how design transformed artistic creation and fit it into the modern world, so that now every aspect of our life is influenced by design. But it used to be this fine art thing that only very talented, elite people could create, and that only a very small group of people could appreciate. Art used to be like that. And that’s kind of what data science used to be as well: these very cool algorithms that a small group of people understand and a small group of people know how to utilize to create value. But really, we’ve gotten to the point where the field has matured enough to start thinking about the design parallel in data science, which is: if you have a business problem. 
Rongyao Huang: 00:47:01
And if there is a data piece to it that you can solve the problem with, then there is a need or demand for data science. And regardless of whether you are a big enterprise or a small one, even a startup with two people, there should be the possibility to make data science useful at your organization. And that would mean that, as an individual data scientist or a small data science team, you are able to make a bigger impact by focusing on the really human-centered part. A lot of other stuff can be automated, and automation is happening: try this bunch of models and give me the best one, automate my deployment pipeline, even the testing aspect, monitoring my data drift and all that and sending me alerts. All of those should be automated, and as data scientists, we will benefit greatly from them, in my opinion. 
Rongyao Huang: 00:48:03
But the part that’s always going to be the data scientist’s job is tied to the domain, the problem that you’re solving. Very much like a designer, who should always be listening to and surveying the users and audience to really get to know their needs and transfer that knowledge into the creation of a design and product. Similarly, for data science there is a domain-coupled aspect that’s always going to require a human to be there. How do you marry the business problem and the methods that are out there, whether that’s the state of the art, or even an old method that you find a lateral use case for that can really work out? You need to be versed in both worlds to actually solve problems and make an impact. 
Rongyao Huang: 00:49:02
Therefore, I feel that at the center of this Bauhaus for data science analogy is this: whether you are a data scientist or a data science team, sit really close to the problems in your domain, keep an eye out for all the cutting-edge stuff that’s happening, have a really big toolbox, and utilize as much automation as you can. Don’t be an enemy of these tools. They’re not going to automate our jobs away; they’re going to allow us to make a bigger impact. And then make sure you are turning this flywheel as fast as you can. And this flywheel is: you learn about the business problem, you run experiments and learn from them, and then you deploy and make an impact, and then you continue the cycle. By knowing the business problem well and knowing the methods well, you’re able to spin that wheel as fast as you can. And there’s also a third aspect, which I forgot to mention, about forming partnerships with other people in your organization. That is a topic all of itself, but I do think a data scientist should care about it. Regardless of whether you have a product manager or not, it’s in everybody’s best interest if you have at least an end-to-end mindset: knowing what your users need, and after you’ve produced what you have, continuing to learn what can be improved. So that’s the Bauhaus idea. 
Jon Krohn: 00:50:50
It’s super cool. And the way that I first found out about that was from the end of your MLconf talk, which we’ve kind of been summarizing and riffing on so far in this episode. The talk was mostly about the prehistoric, stone age, bronze age transition, but now we’ve also been able to talk about your guesses about the iron age, which wasn’t even in that talk, and then to get your thoughts here about how data scientists can take advantage of automation and specialize while also solving commercial problems in your Bauhaus framework. It’s so cool. Let’s now move on from your MLconf talk, but stay on this idea of data science teams and careers. I wonder if you have any tips for listeners who are getting started in data science. I think you have kind of a framework for finding one’s path. 
Rongyao Huang: 00:52:01
Oh, yeah. Thank you for bringing it up, because it is a topic that I’m very passionate about. And part of that is because I am path finding myself still. I shared with Jon early on that we know there are people out there, folks like Elon Musk probably, who figured out what they wanted and what they wanted to do for the rest of their lives early on. 
Jon Krohn: 00:52:32
At the time of recording, even Elon Musk seems to have cold feet about his Twitter purchase. I don’t know what’s going to happen between now and when this episode is released. But in recent days at the time of recording, he’d gone from this big Twitter deal to saying, “I think you have too many bots.” Cold feet. Even he, I think, sometimes is path finding. In fact, before you get talking too much about the path finding thing: I think that the more you are straying away from the well-trodden paths, the more your pathfinding framework is probably helpful. So somebody like Elon Musk might actually find your pathfinding framework even more useful for guiding decisions. And I don’t know who out there is on a fixed, well-trodden path, but probably not listeners of this show very often, because data science is such a fast-moving field. It’s such a new field. Probably anybody who’s listening could make use of some path finding, because they’ve either recently been finding their path or they’re continuously finding their path. 
Jon Krohn: 00:53:41
There are careers out there that change maybe a little bit less, but as technology changes and evolves, I think no matter what career, and I was trying to think of one in my mind, maybe if you’re a local government worker doing some task that’s been the same for the last few decades, even a job like that will change in these coming years, these coming decades. There’s no job that existed 10 or 20 years ago that will be the same 10 years from now. 
Rongyao Huang: 00:54:18
Right. 
Jon Krohn: 00:54:20
Because of AI and automation. Anyway, I think your path finding framework is going to be useful to everyone. 
Rongyao Huang: 00:54:25
That’s great to hear. Because if you think about it, adaptation is actually at the core of how homo sapiens have been successful. We’ve adapted so many times, and when it comes to personal growth, this is also a very useful mindset to have. I started thinking about it because I basically became a parent when I was really young, and you get into this struggle of, okay, I’m trying to figure things out for myself and also figure things out for my little one, and I only have 24 hours a day and I want to do everything perfectly. What do I do? That is a struggle that I’ve gone through myself. What made it worse is society’s pressure, and you probably find more of it in China than here. They’re saying, “If you haven’t figured out what you want to do or your career before 30, or even 25, you’re doomed.” 
Jon Krohn: 00:55:27
Oh no. 
Rongyao Huang: 00:55:27
Yeah. You should go into college knowing what you want, study really hard, and then just go on that path. So that probably echoes more with people coming from that kind of culture. It’s a lot better here, but there is still that pressure. If you’re making a career shift, a constant question you’ll be asking yourself is: everything is path dependent, I’ve already spent so much time doing this, and if I change to this new career, I will have lost a lot and I will be behind. All those thoughts are very natural. But I have a mindset that might help you get out of it, and I call it the long-term perspective. 
Jon Krohn: 00:56:12
The long term perspective. 
Rongyao Huang: 00:56:15
The long term perspective. 
Jon Krohn: 00:56:17
All right. 
Rongyao Huang: 00:56:18
What it means is that very few people, probably nobody, will be able to set a five-year, 10-year, 15-year goal, a very clear goal for themselves, and then just run at full speed toward that set of goals. That’s not going to happen. What really happens in reality is that you always start with hunches: hunches about yourself, hunches about what’s happening out there in the world that might be a fit for you. So you start with very loose hunches. But what’s important is to be able to run quick, small experiments along the way. And those are pretty much like engineering: you want to be agile. You want to be agile when you are path finding. And that means shifting from a plan-and-implement mode, where I’m going to plan really far ahead and then just implement, to test-and-learn. Maybe I’m good at writing or presentation? Let me test it out. Let me write a little blog post, or even a small article for my family and friends, and get some feedback on it. 
Rongyao Huang: 00:57:34
Continue to run more of these experiments, and the direction will become a little clearer. It’s never going to be 100%; it’s going to zigzag a little bit. But you’ll know roughly, your hunches will tell you, that you’re on the right path. And it’s also important, while you keep running these small experiments, to start collecting fundamentals about this direction. For example, if I compare myself now versus when I started off in data science seven, eight years ago, I have a better understanding of my talents and what I enjoy doing. I enjoy learning a lot. I’m a fast learner. I’m very analytical. I am very curious. So any job that would put me in a solely execution mode, just do what I tell you to do, would kill me. I need room to do some creative stuff and have ownership. I’ve learned about myself better, and I’ve learned about the field better: what demand it currently has, where it is going, whether I see myself aligned with that. Those are fundamentals that, if you keep noodling on them as you do your experiments, I guarantee are going to become a little clearer as time goes on. So yeah, that is my framework, I guess, for the long-term perspective. Whether you are entering data science or looking to make a career change, just having that mindset is really helpful. And then separately, I can probably share a couple of very concrete tips that I use as part of my daily routine to keep myself mentally sane and stable. 
Jon Krohn: 00:59:36
Please, I need those tips soon, right now. 
Rongyao Huang: 00:59:40
Oh my God. 
Jon Krohn: 00:59:41
I love that path finding framework, Rongyao. And some of your examples sound like they might have come from your life, like the idea of seeing if writing or talking is working. I know, for example, that you just recently started giving talks, and I feel so fortunate that you had the hunch that you could give talks, because your MLconf talk was so extraordinary. It was one of the best talks I’ve seen in years. You really are a talented presenter of technical information, and I can’t wait to see how this evolves as you run more experiments and they continue to prove your hunch correct. 
Rongyao Huang: 01:00:25
My pleasure. 
Jon Krohn: 01:00:28
Yeah, my pleasure. And so another aspect of you that I think you might have some guidance on for some of our listeners out there is parenting. The pandemic has either helped or worsened work-life balance for many, and I think those with children have been especially impacted, with things like schools closing and all of a sudden your kid being at home all day, you being at home all day, while you’re trying to work. So, you are a proud parent, and you’re an incredible data scientist and presenter of information. What tips have you learned to keep a good balance while maintaining personal and professional growth? 
Rongyao Huang: 01:01:12
I think I do have a lot to say there, because the pandemic, for better or worse, I learned a lot from it. For one year I stayed home with my little one; I have a five-year-old monkey. I see a lot of interesting parallels between parenting and being a very productive individual contributor or manager, that human aspect. I’ll share two of them here. One is the power of being vulnerable. I used to try to set up this very strong image of a parent: I can just do this, and you should always be strong. But what I find helpful is that the moment you become vulnerable, you actually create room for initiative from other people, and you’re loosening up the soil for collaboration. An example of me collaborating with my little one: we’ve gone out for a day of holiday fun, and by the time we’re home at like 8:00 PM, everybody’s exhausted. 
Rongyao Huang: 01:02:26
And it’s very typical that he would just refuse to walk anymore, and I’d carry him and do everything for him. And one day I was really tired too, so I just let it out. I was like, “I’m super exhausted. I cannot walk.” And magic happened. From that moment on, he actually started to say, “Mama, if we just carry on for a little bit more, we’ll be home.” And then he was even holding my hand and walking me up the stairs. The same thing applies to being a good collaborator and even a good manager: sometimes it’s perfectly fine to be vulnerable. This is the stuff I don’t know how to do, that I have little knowledge of. And the moment you do that, you are empowering other people to step in and offer their value and their expertise. That is one thing. 
Rongyao Huang: 01:03:28
And the second thing is about always being curious and flexible in collaboration, and then you’ll see common ground from there. Because one thing you learn as a parent is that there’s never 100% perfect control. It’s impossible to be a parent and at the same time be close-minded, because kids surprise you all the time. And oftentimes, behind every single one of their behaviors there is a reason. If you remain curious and open-minded and learn about that, you will often see through the behavior and find a way to collaborate. And it’s the same thing, I feel, in an organization. Whether it’s within a team or you are collaborating across teams, remain curious about the other person’s needs and be flexible about the approaches, because oftentimes there’s never a closed door. You can always seek common ground. Those are two interesting parallels that I see between my work and parenting. The second thing I wanted to share from my pandemic experience is, obviously, how do I keep sane and be productive when there’s a five-year-old jumping around the house while I’m trying to do my work? I have a couple of concrete tips [inaudible 01:05:06], part of my daily routine, that I would like to share with the audience. One is probably not a surprise: I found meditation really helpful. 
Jon Krohn: 01:05:19
Oh yeah. 
Rongyao Huang: 01:05:20
And I used to feel that I couldn’t get into it, sitting there for even 10 minutes, because there’s a lot of “I” in that exercise at first. I need to be focusing. I need to not be thinking about that. I need to watch my breath. But really, the power of meditation comes when you accept the state as it is. When you are able to remove the “I” out of it and focus on this moment, this is what’s happening, this is the work that needs to be done, then you remove a lot of the anxiety and stress tied to it, and you are able to get into the flow. People often talk about flow, the state of mind where you cannot even sense time and you’re just into it. And I feel like the meditation exercises, however short they are, 10 minutes in the morning or 10 minutes before bed, help you to recognize your inner state. Oh, I’m actually thinking about that. Oh, I’m actually consciously or anxiously planning for this all the time in the back of my mind. Seeing that and acknowledging it actually relieves that burden, and you can go about your work afterwards, which is a lot easier. 
Rongyao Huang: 01:06:47
Also related to how you prime yourself for productivity: one trick I use is what I call the silly little target. Taking my writing as an example: I’m trying to write more and speak more, but it can feel like a very scary task. But all you need to do is get started. And to get started you can say, “Okay, I’m just going to have this one silly little target for myself today, which is to write a hundred words. About anything. I put it in a document and I’m done.” Not focusing on the outcome helps you get fear out of the door and get over the discomfort, which really helps you start off on the experimental path that I talked about earlier. 
Rongyao Huang: 01:07:54
And then you can use specific techniques like the Pomodoro. You’re like, “I have so many things on my mind, but let me just set this Pomodoro for 25 minutes and I’m going to get into it.” That helped me a lot. Another priming trick that worked for me is exercise. No matter how busy I am, I try to exercise three times a week, about 30 minutes each time, and that primes me: I find myself focusing a lot more easily and feeling stress-free after exercise. That works differently for different people, but basically the theme is that you need to find your own routine that keeps you centered as a person, and just carry on with your little experiments in your wonderful life. 
Jon Krohn: 01:08:49
Nice. I love all of those tips, Rongyao. I have so many more questions for you; we’re going to have to get you on the show again sometime. There are so many things that we didn’t have time to dig into, like even what you do at work. We haven’t even mentioned that you work at CB Insights, which is a platform that uses machine learning, data, and algorithms to help large enterprises ask and answer compelling questions. We won’t have time to talk about it in this episode, but I know that you have some open roles that you’re hiring for, and I wanted to mention those. It should be obvious from hearing Rongyao speak that she’s an incredibly talented data scientist who would be wonderful to work with. I don’t know if you have anything else to say about the hiring that you’re doing right now, the roles that are open. 
 
Rongyao Huang: 01:09:49
Right. We’re looking for a data scientist to join our team. We are an R&D team embedded in the engineering organization. We do end-to-end development: everything from question to deploying something in production happens within the team. And we’re looking for somebody who has solid programming skills and is a fast learner. It would be great if you have ML and NLP fundamentals and you are a great communicator. Then by all means, talk to us. Let’s do interesting stuff together. 
Jon Krohn: 01:10:37
Awesome. Yeah. It sounds like a great role, and no doubt an amazing team to work with if you are on it. Nice. And then, as a regular podcast episode listener, which is so flattering for me, I love it, Rongyao, you mentioned before we started recording that you’d been listening to a lot of recent episodes, so you already know what’s coming up next as we wrap up this episode: it’s a book recommendation. Do you have one for us? 
Rongyao Huang: 01:11:02
Yes. If I’m going to recommend one, it’s going to be Range, R-A-N-G-E by David Epstein. 
Jon Krohn: 01:11:11
Cool. 
Rongyao Huang: 01:11:11
I think it’ll be particularly helpful if you are starting out on a career or you’re making a career shift. You will find it very reassuring, encouraging, and practical at the same time. 
Jon Krohn: 01:11:28
Cool. That sounds great. And then, for people who want to follow you and hear your thoughts, you had such awesome thoughts today: technical things like natural language processing, and also professional development things like how data science teams could be structured to be more effective, as well as general tips for living a sane life and managing balance across everything. Lots of thoughts that I’m sure people would love to hear in the future. How can people follow you? 
Rongyao Huang: 01:12:03
I would say, just check me out on LinkedIn. That’ll be the best way. 
Jon Krohn: 01:12:08
Perfect. We’ll be sure to include your LinkedIn page in the show notes. Rongyao, I’ve really enjoyed you being on the episode today. I’ve had so much fun, so many laughs and learned a ton as well. You’re a really gifted communicator and I hope to have you on the show again sometime in the future, so we can check in on the amazing things that you’re up to then. 
Rongyao Huang: 01:12:34
It’s my pleasure too and my honor. Thank you, Jon. 
Jon Krohn: 01:12:44
I had an absolute blast filming with Rongyao today. I hope you had fun listening to our conversation. In today’s episode, Rongyao filled us in on the history of NLP approaches, from the prehistoric bag-of-words era through the word vector stone age to the large transformer bronze age of today. She talked about how the scaling law of increasing model parameter counts by orders of magnitude suggests we’ll continue to obtain dramatic improvements in NLP model capabilities for the coming five to 10 years. She talked about how the coming iron age of NLP could involve overcoming the 512-token sequence length limitation of today’s models, as well as meta-learning algorithms. She told us about her Bauhaus-inspired approach to data science that allows data scientists to take advantage of automation tools and increased specialization to solve commercial problems. And she briefed us on her long-term career path finding model, wherein we carry out small, agile experiments on ourselves to test our hunches about our own capabilities. 
Jon Krohn: 01:13:47
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Rongyao’s LinkedIn profile as well as my own social media profiles at www.superdatascience.com/583. That’s www.superdatascience.com/583. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show. 
Jon Krohn: 01:14:24
Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks of course to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another deep and stimulating episode for us today. Keep on rocking it out there folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 