SDS 513: Transformers for Natural Language Processing

Podcast Guest: Denis Rothman

October 12, 2021

Denis fills us in on what transformer architecture is, what natural language processing is, and a whole lot more for anyone looking to keep on top of trends in artificial intelligence!

About Denis Rothman
Denis Rothman graduated from Sorbonne University and Paris-Diderot University, writing one of the very first word2matrix embedding solutions. He began his career authoring one of the first AI cognitive natural language processing (NLP) chatbots, applied as a language teacher for Moët et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers, and then an advanced planning and scheduling (APS) solution used worldwide. Denis is the author of artificial intelligence books such as Transformers for Natural Language Processing.
Overview
Denis is an incredibly prolific writer in the field of data science. His most recent book, Transformers for NLP, takes it back to the basics of what NLP is: linguistics and computing. Over his long history in artificial intelligence, he watched plateau after plateau of people writing one algorithm per problem, when he believed a single universal algorithm could solve them all. Fast forward to 2017: big tech is being grilled by the Senate over individual posts and articles, and Google decides it needs something industrial to tackle those issues. They abandon the recurrent neural network and begin creating uniformly sized layers, each split across multiple processors at once, to have, as Denis calls it, words analyzing other words. The results of this model on raw data were excellent. OpenAI took the technology and expanded it, ending up with a supercomputer-trained model that had essentially learned the language and could perform far more tasks than originally conceived.
The applications for this are vast. Beyond Google Search, these models can summarize text at a grade-school level, generate code, and much more. With transformers, the way we ask determines what we get. For example, Denis took a BERT model, fed it the writings of Kant, and then proceeded to ask Kant, via BERT, questions on philosophy. From there we moved on to one of Denis’s previous books on explainable AI, which he defines as model-agnostic. Ultimately, explainable AI deals in being understood by humans, as opposed to a black box where we often can’t explain how the results were found.
From here we dove into some audience questions for Denis:
  • What XAI methods does Denis use most for transformer models?
    – The best explainable AI is model agnostic. You want to be able to take the input, look at the output, and tweak it until you find the trigger.
  • Do we really need this many parameters in transformers for a limited vocabulary?
    – Neurons shouldn’t be conflated with parameters. In the brain, the connections (the equivalent of parameters) outnumber the stars in the universe. We need the parameters because we haven’t come close to matching the complexity of human and animal minds.
Denis answered further questions on his LinkedIn, so check that out below!
In this episode you will learn:
  • What are transformers and their applications? [7:54]
  • Denis’s book on explainable AI [25:08]
  • AI by Example [35:53]
  • LinkedIn audience questions [42:00] 

Podcast Transcript

Jon Krohn: 00:00
This is lucky episode number 513 with Denis Rothman, an award-winning author on artificial intelligence.
Jon Krohn: 00:08
Welcome to the SuperDataScience Podcast. My name is Jon Krohn, chief data scientist and bestselling author on Deep Learning. Each week we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple. 
Jon Krohn: 00:42
Welcome back to the SuperDataScience Podcast. Today’s guest is the colorful and ethically industrious Denis Rothman. Denis is the author of three technical books on artificial intelligence, all of which have come out in the past two years. These books are on AI in general, with particular focuses on Explainable AI and the giant transformer models that have revolutionized Natural Language Processing, or NLP for short. His most recent book, Transformers for NLP, led him to win this year’s data community content creator award for technical book author. Prior to becoming a full-time author, speaker, and consultant, Denis spent 25 years as the co-founder of a French AI company called Planilog, which was acquired three years ago. All told, Denis has been working on AI for 43 years, since 1978, and has been patenting AI algorithms, such as those for chatbots, since 1982.
Jon Krohn: 01:50
In today’s episode, Denis leverages vivid analogies to fill us in on what natural language processing is, what transformer architectures are, and how they’ve revolutionized NLP in the past few years. He also talks about tools we can use to understand why complex AI algorithms provide a particular output when provided a given input. This episode should be well-suited to anyone who’d like to keep on top of what’s possible in AI today, regardless of your background. Practicing data scientists in particular will also appreciate Denis’s mentions of particular modeling approaches and software tools. All right, you ready? Let’s do it.
Jon Krohn: 02:39
Denis, welcome to the SuperDataScience Podcast. Where in the world are you calling in from? 
Denis Rothman: 02:45
Okay, thank you, and thank you for inviting me. Right now I’m 150 kilometers from Paris. I’m out in the country in the Champagne region then you have Burgundy and all that. I’m around that place. 
Jon Krohn: 03:02
Wow. That does not sound unpleasant. 
Denis Rothman: 03:05
It’s very pleasant.
 
Jon Krohn: 03:07
It sounds amazing. Is that like a COVID thing or you’re out there all the time? 
Denis Rothman: 03:12
No, no, no. I like Paris and like to be out of Paris. It’s like being in Manhattan. And then, you go out a bit to the northwest, just have to go 20 miles and you’re in the woods. You’re in the forests in New York State. So, around Goshen or places like that. 
Jon Krohn: 03:30
I’ve heard. Yeah, I hope to someday spend time outdoors just like this thing. So, as we discussed before the episode started, I’m Canadian. And so, people often have this idea of you being outdoors. But I grew up in Downtown Toronto and now I live in Downtown Manhattan. And I haven’t experienced much outdoors at all. But I’ve heard it’s wonderful. And someday I’ll experience that, too. 
Denis Rothman: 03:56
Toronto is a nice place, too. 
Jon Krohn: 03:58
Toronto is nice. It doesn’t have a Champagne or Burgundy region around it. We got the Niagara region, which is our best imitation. 
Denis Rothman: 04:06
That’s why I chose France in fact because I could live anywhere but I found that the quality of life like you have medieval culture that you can’t find in North America, medieval culture, universities that go back to the 13th century. I like that part. And then, you go to modern Paris. I like that. But I like to travel, so it’s not really a problem.
 
Jon Krohn: 04:30
I’ve noticed from videos that I’ve seen of yours in the past, you have very interesting art in the background. I think you studied history at points in your career. 
Denis Rothman: 04:42
Yeah. I paint. I play the piano. I was born in Berlin, in fact, and my father was a military lawyer for NATO, so I traveled all around all the time. But my dream was to go to Sorbonne University. That was my thing. Because in those days, the president of the university says, “If you came here, it’s because you’re really interested in history, geography, archeology, mathematics, linguistics.” So, you can major in something. But in this university, you can go to any class and you can get credits for any. So, I would go into this cross-disciplinary education, which was very fascinating. That’s why I spent so much time there. I went to three Sorbonne universities, in fact. I just couldn’t stop learning in there. So, yeah.
Jon Krohn: 05:37
Wow. 
Denis Rothman: 05:38
Yeah. So, I studied a lot of everything. 
Jon Krohn: 05:42
That sounds amazing. That’s like my dream retirement. I wonder if they’ll accept me then. 
Denis Rothman: 05:46
And I wanted to start my life like that, like thinking like that. Because at one point, I was working a lot in the States for student money, college money. And I was driving cars, this driveaway thing where they give you a car, and then you can take it anywhere. So, I crossed around the States. And one day, I was sitting in Florida, and I said, “Do I want to live here? What do I want to do?” Okay, I really want to go to Sorbonne University, because I could have stayed down in Palm Beach and had a nice life, studied there. But no, I wanted to come back to Paris and live this educational thing. And there are so many cultures right next door: Germany, Spain, Italy, Portugal, the UK, Belgium. It’s incredible. I’m forgetting countries. I don’t want to leave the viewers out. Like the Netherlands, Luxembourg, Denmark. You just sit there and you have all these people there. You’re living in the world.
Jon Krohn: 06:45
Yeah, it’s rich in culture. I am jealous. It sounds like you’re in the right spot to be. 
Denis Rothman: 06:50
No, Manhattan is great. 
Jon Krohn: 06:53
Yeah. It’s a very concentrated piece of culture. And then, as you say, you go 20 miles out and you’re just in the woods. 
Denis Rothman: 07:03
That’s right. People don’t realize that. But you’re only 30 minutes from beautiful nature, just right northwest, just go through Washington Bridge out there and that’s it. 
Jon Krohn: 07:13
Yeah. So, amongst all of the learning that you’ve been doing in recent years, there’s been a fair bit of learning and teaching of mathematics and artificial intelligence, machine learning to the extent that you’ve published books at an incredible rate. So, this year, you published Transformers for Natural Language Processing. Last year, you published two books, Hands-on Explainable AI with Python, and just a few months before that, AI By example. So, I’d like to dig into each of these books. I’ve got a copy of Explainable right here but I want to go backward chronologically. So, let’s start with Transformers for NLP. What is this book about? So, I can give a little bit of color maybe for the audience but you could do a much better job. So, natural language processing is the application of data science or machine learning to make use of natural language in some way like written language or spoken language and yes, to maybe automate things, and transformers are particularly interesting in recent years because they’ve been shown to have unprecedented accuracy at a lot of natural language tasks. 
Denis Rothman: 08:34
So, yeah, well, let’s take this back a second. When you’re talking about NLP, you’re talking about linguistics. If you’re talking about linguistics and machines, it’s computer linguistics. Okay. So, we’re going back to theory. And there’s one little thing we have to understand, which is that we’re getting inputs with data. You get a lot of data. So, that’s the input. You have all this raw data, billions and billions of pieces of data. And on the other side, you have to do some representation of reality so it doesn’t look like murky results, right? So, up to now, you had all that input and then you had to get good representations. But there were several models: you would do k-means clustering, then you do parsers, then you do recurrent neural networks, and then you can do CNNs and all. So, it was a bit like having a lot of tools to do it all. So, every time you had to do a task, you had to find another tool, like an SVM. So, for 35 years, because I started very early in artificial intelligence, okay, I saw no change, and I said, “Where is this going?” These people are writing a lot of algorithms. And I wrote one algorithm 25 years ago that’s running all over the world while we’re speaking. So, I said, “Why do they write all these algorithms when you can get one universal algorithm to do the job?” Of course, I wrote it for supply chain management and not NLP.
Denis Rothman: 10:17
So then, all of a sudden, Google around 2017 has this problem. We have 5 billion searches per day. We’re having problems with the US Senate because they keep asking us questions like… I’m speaking like big tech, like Mark Zuckerberg is called to the Senate. This is the reason transformers exist. He’s called to the Senate and they say, “You know there’s that post.” And he’s thinking, “What post? There are 2 billion posts a day.” Yeah. But that post. He’s thinking, “What are you talking about? I’m a multi-billionaire. I’m surfing all the time. And they’re asking me about post number 1,500,000,000. I don’t even know what’s in that post. I’m trying to do my best here.” And he says, “I don’t have the tools.” So, he’s thinking, “Go see my team.” And the team says, “Well, we can’t. We just can’t. We have 100 algorithms in there. We’re not making it. We’re not making progress.” Twitter has the same problem, Amazon, Microsoft. So, at one point, Google says we have to stop all that. We need something industrial.
Denis Rothman: 11:28
So, instead of having a convolutional neural network, we have layers but none of these layers are the same size, none of these layers do the same thing. That’s like a 1930 car. No, what we want is a V8. A V8 engine looks beautiful inside like eight engines here, right, V8. So, they come up and say, “Let’s forget about this recurrent neural network stuff. We want a V8.” So, let’s start with eight heads, which are like a V8 engine. Let’s start with eight heads. Forget about recurrent stuff and all these layers. And we want to write a layer and we’re going to put the layers, and every layer is the same size. Let’s make every layer the same size that way we have an industrial model. It’s like a rectangle. And we just stack these same layers, same size. They come up and say, “Well, that’s not enough. We’re not going to go fast enough with that. Wait, let’s take one of these layers and split it into a V8. Wow. And now, we’re going to run those eight layers, those eight parts of a layer on eight heads on eight GPUs on eight processors at the same time.” Wow. They’re going to run there. All these words are just going to analyze other words. We just want to say, Denis and Jon. Jon has a guitar behind. Does he play the guitar? Let’s put all that together and see where that word fits into context. 
Denis Rothman: 12:56
And once that layer is over, let’s not mix it all up. Just add it and send it to the next layer that will do the same thing, building on what it learned in the first layer, but it’s always the same size. So, at one point they reached this model and no one knew what it was going to do, because it was training on raw data and it wasn’t really labeled data. They used labeled data just to show people. It got excellent results. And then, all of a sudden OpenAI comes along and says, “You know what? That’s a good idea. Why don’t I create a stack not with 16 layers but 96 layers? Instead of using 5 billion params, why don’t I do 175 billion parameters? And why don’t I ask Microsoft for a supercomputer, a $10 million supercomputer with 10,000 GPUs and tens of thousands of CPUs?” And now you have a factory. So, now you have this industrial model V8, and it’s just there and it’s going fast, fast, fast.
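To make the V8 analogy concrete, here is a minimal NumPy sketch of the idea Denis describes: several identical attention heads, each letting every word score every other word, whose outputs are concatenated back to the original layer width so identical layers can be stacked. This is an illustrative simplification, not the production Transformer; all names and sizes below are invented for the example.

    import numpy as np

    def attention(Q, K, V):
        # Every token's query scores every token's key: Denis's
        # "words analyzing other words".
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax
        return weights @ V

    # Toy sizes: 4 tokens, model width 64, split across 8 identical heads.
    seq_len, d_model, n_heads = 4, 64, 8
    d_head = d_model // n_heads                  # 8 dimensions per head
    rng = np.random.default_rng(0)
    x = rng.normal(size=(seq_len, d_model))      # stand-in token embeddings

    heads = []
    for _ in range(n_heads):
        # In a real model these projections are learned weight matrices.
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        heads.append(attention(x @ Wq, x @ Wk, x @ Wv))

    # Concatenating the heads restores the original width, so an identical
    # layer of the same size can be stacked right on top.
    out = np.concatenate(heads, axis=-1)
    print(out.shape)   # (4, 64)

Because every layer keeps the same rectangular shape, the eight heads can run on eight processors at once and the stack can be made as deep as the hardware allows, which is exactly the industrial property Denis emphasizes.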
Denis Rothman: 14:03
And then, all of a sudden they wake up and they say, “Uh-oh, what does it do? How is it possible to do all these tasks?” And in fact, they discovered what’s called emergence. Emergence is when you don’t know what’s going to happen but it just emerges out of all that training: in fact, the system, a GPT-3 transformer or a BERT, they just learn the language. And once they learn the language, it’s based on what you ask them, the prompt. So, if you type nice prompts, it will analyze it as a sequence and it will try to find out what follows. So, in the end, you end up with the GPT-3 model trained on a supercomputer and you can ask it anything you want. Give me the synonyms of headphones and stuff. You can invent your own tasks, or give me a grammatical breakdown of the sentence, or recently, why don’t you just take my… when I’m writing, translate it into Python instead of translating it into French, or translate it to JavaScript. And just to finish the little story, it bounces back to Google, which says, “Why don’t we create a trillion, a trillion parameter model?” And that thing is going to be so big that you know it’s going to exceed human capacity. And people are saying, “Gee, where are you going to get the computer? The other one was $10 million and it’s one of the top 10.” Google says, “Yeah, but why are we bothering ourselves with all those floating points? We don’t need all that.” So, let’s build our own TPUs and just cut all that floating-point stuff out of there so we have a domain-specific machine.
Denis Rothman: 15:47
And now, they’ve created supercomputers that we can rent for just a few hundred dollars an hour, which is not much for a corporation. That’s even more powerful than what OpenAI has. And then, you can train what you want. And then, the beautiful thing is it bounced back into Google, which has BERT, and Google Search now is based on BERT. Everything is BERT in Google Search. So, you see how, in a few years, we went from prehistoric artificial intelligence to a super-industrialized society. And big tech did that miracle. I mean you can say anything you want about them. But what people don’t understand is that, just like people like you and me that are working, these are small teams of maybe 10 people. They’re in their corner. They’re trying to find something. They’re not the billionaires. They’re the guys like us just trying to do stuff. And they come up with incredible things. So, we do have to admire big tech in that respect. You can say anything you want, but no one’s going to do as… what they’ve just done, it’s industrial. So, that’s transformers.
Jon Krohn: 16:58
Yeah. You may already have heard of DataScienceGO, which is the conference run in California by SuperDataScience. And you may also have heard of DataScienceGO Virtual, the online conference we run several times per year, in order to help the SuperDataScience community stay connected throughout the year from wherever you happen to be on this wacky giant rock called planet Earth. We’ve now started running these virtual events every single month. You can find them at datasciencego.com/connect. They’re absolutely free. You can sign up at any time. And then, once a month, we run an event where you will get to hear from a speaker, engage in a panel discussion, or join an industry expert Q&A session. And critically, there are also speed networking sessions, where you can meet like-minded data scientists from around the globe. This is a great way to stay up to date with industry trends, hear the latest from amazing speakers, meet peers, exchange details, and stay in touch with the community. So, once again, these events run monthly. You can sign up at datasciencego.com/connect. I’d love to connect with you there.
Jon Krohn: 18:11
So, these transformers, like OpenAI’s GPT-3 that you mentioned, like BERT that you mentioned. What applications… we’ve got Google Search that you talked about. What applications do you teach?
Denis Rothman: 18:25
You can do things like question answering. One of my favorites is summarization for second-grade students. So, you’re going to say, “Denis, this guy, I’m in an interview with this guy that’s supposed to be super intelligent, and he’s interested in second-grade summarizing. Maybe I will re-edit this and cut that part out.”
Jon Krohn: 18:50
No, no, I know that that’s hugely valuable. 
Denis Rothman: 18:53
So, my second grade summarizing thing, and I can give you like, I’ll give you many others in just the list. It’s one of the most interesting ones. Because, in fact, when we’re talking here, we look smart. You’re talking- 
Jon Krohn: 19:08
You do. 
Denis Rothman: 19:08
… on artificial intelligence. Whoa, do I look smart? Ask me about plants, ask me about the names of flowers, ask me why these insects live with these flowers in that forest and they don’t live in another. What? I’m not a second grader. I’m a baby. So, what I like to do now is I’ll take an article that’s new for me, where I’m a baby, not even a second grader. I’m not even a first grader. I’m nothing. And I’ll feed it to GPT-3, and I get this nice explanation where I understand everything. I say, “Wow. I just really liked that feature.” So, it got me thinking, “Why don’t I go to the question-answer thing?” So then, from there, I’m going to go ask the questions, but it’s prompt engineering. You can see what I’m getting at. It’s the way you ask it. If you ask it to explain like a college student, you’ll get something you won’t understand and you’ll feel like a second grader.
Denis Rothman: 20:08
You’re inventing the usage, in fact. So, you can go, “Now, I go to question-answers.” And I say, “Well, can you explain black holes to me like you would to a child?” And then, it does. Now, I understand. Can you explain like you would explain to a high school student? Now I can understand better. Can you explain the same black hole like a college student? Wow, great. Could you explain some math with it? Okay. Could you give me some equations? Okay. Now, can you explain quantum computing? Right. Can you give me the Heisenberg equation? Okay. Can you break it down for me? Could you write some code now for me, where it’s an HTML page? I see the equation. I just want a little graph to show the waves.
Jon Krohn: 20:58
Wow. 
Denis Rothman: 20:59
Yeah. So, now I have my HTML page. How am I going to deploy it? Okay. He’ll explain. Oh, I have a problem. I’d like to put OpenAI in a Jupyter notebook. But what’s the code? Can you give me the code so I can just copy and paste it? Okay, great. Okay.
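The kind of prompt engineering Denis describes can be reproduced from a Jupyter notebook with the OpenAI Python library as it existed at the time of this episode, roughly as in the sketch below. The engine name, the sampling parameters, and the placeholder API key are illustrative assumptions, not recommendations from the episode.

    import openai  # the OpenAI Python client, as it existed in 2021

    openai.api_key = "YOUR_API_KEY"  # placeholder; requires approved API access

    # The same question, asked at a chosen reading level: prompt engineering
    # is simply changing the way you ask.
    prompt = "Explain black holes to a second-grade student:\n\n"

    response = openai.Completion.create(
        engine="davinci",     # illustrative engine name
        prompt=prompt,
        max_tokens=100,
        temperature=0.3,
    )
    print(response.choices[0].text.strip())

Swapping “second-grade student” for “college student”, or appending “as Python code”, changes the task without changing a single line of the model, which is the point Denis keeps returning to.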
Jon Krohn: 21:19
So, these examples, you’re doing this on a daily basis. 
Denis Rothman: 21:21
Yes. 
Jon Krohn: 21:21
You’re constantly querying GPT-3. 
Denis Rothman: 21:23
Yes, I’m doing it right here. 
Jon Krohn: 21:27
Right now. Actually, every- 
Denis Rothman: 21:29
I’m doing it like from a TV. It’s like people playing video games. Like I’m here all the time playing around with that stuff. It’s insane. It’s like I don’t know where you’re going. It’s an adventure. 
Jon Krohn: 21:42
Yeah. If you’re not watching the video version of this, Denis is pointing at his phone and tapping away at it. But not everyone has access to GPT-3, right? Don’t you need to be approved? I had to wait months. I just submitted an application to get access to the API, and then finally got it. So, it seems like not everyone today could just access GPT-3 unless you have a workaround. 
Denis Rothman: 22:08
And that’s the trick. Let’s go back to linguistics. Okay. You go back to linguistics. What are we talking about? We’re talking about having a lot of raw input. We have a model. And it’s the way we ask things that determines what we get. And what’s interesting is to play around, like I just said, right? That’s the interesting part. But if you go back to my book, you can get GPT-2. And you can take GPT-2 and you can train it. Because what I did, for example, in chapter three, I took a BERT model. And I took the works of Immanuel Kant, the German philosopher. And I fed all those books into it just to have fun. Then I began to ask him, Kant, some questions: “Where does human logic go? How does human thought work?” The goal here is to play around with it. You have to have a lot of fun, otherwise you’ll never understand transformers. And you’ve got to get to talk to them to explain. Now, what did I just say before? Google BERT drives Google Search. What I did is I take prompts, like the second-grader stuff, and I just copy them into Google Search. And I’m deviating the use of Google Search by giving it long sentences, not just keywords. Could you explain to me the solar system through the eyes of a second-grade student? Please don’t show me any videos. I just want some text. Skip all that stuff. And I just deal with these two things.
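The Kant experiment Denis mentions from chapter three can be approximated with the Hugging Face transformers library along the lines of the sketch below. The default checkpoint, the excerpt, and the question are stand-ins for illustration; the book’s own notebooks may use a different or fine-tuned model.

    from transformers import pipeline

    # An extractive question-answering pipeline backed by a BERT-style model.
    qa = pipeline("question-answering")

    # A short excerpt in the spirit of feeding Kant's works to the model.
    context = (
        "Thoughts without content are empty, intuitions without concepts "
        "are blind."
    )

    result = qa(
        question="What are thoughts without content?",
        context=context,
    )
    print(result["answer"], round(result["score"], 3))

Feeding longer passages and varying the questions is the “playing around” Denis recommends for getting an intuition of what these models actually learn.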
Jon Krohn: 23:48
Wow. 
Denis Rothman: 23:51
Yeah, it’s a transformer so it can absorb all of that. That’s what’s new. So, in fact, you can train having fun with transformers with Google Search. You can ask it questions. Could you tell me this? Could you tell me that? And then, you go on as it gives you answers in the system. You can ask it for more difficult questions. Oh, yeah, I got that. I got the Heisenberg equation. I understand. And we keep [inaudible 00:24:14]. But now, could you tell me more? You can talk to it, but people don’t know it’s a transformer. 
Jon Krohn: 24:20
Right. So, we’re filming today on September 20th. And I just happened to be on your YouTube channel before we started filming. And it was today, September 20th, that you published a how-to video with more detail on exactly what you just described, on how to use BERT behind Google queries to get lots of interesting information. So, that’s something that listeners can check out. So, Denis, Transformers for NLP, that was a beautiful introduction to what natural language processing is and the history of transformers. You got a lot of great analogies in there, particularly the V8 analogies. But that was just your book this year, Transformers for NLP. Tell us a bit about Hands-on XAI with Python, which came out in the summer of 2020. So, I could do my little spiel. Explainable AI is where we apply algorithms to very complex models, I guess like BERT or GPT-3. And we apply algorithms to those so that we can try to understand, to get an explanation for how a particular output was reached. Is that right?
Denis Rothman: 25:35
Well, yeah, so let’s go back to linguistics again. Okay. So, basically what we’re saying is, “What is an algorithm?” You have an input and you have an output, and you have this thing in the middle called an algorithm. So, one problem here is there’s a confusion in many people’s minds that Explainable AI is explaining the algorithm. Okay. So, that’s an area you can explore. But no, that’s not Explainable AI. Explainable AI is model-agnostic. I don’t even care about the algorithm. What do I care? Google Search. Let’s go back to Google Search. I’m on Google Search. And I type, explain the Heisenberg equation. Okay, what do I get? I can see the result. I don’t need Explainable AI. I know that I won’t like the result because I don’t understand anything on that page. Okay. So, now I’ll do something called Shapley. It’s the theory of games, okay? It’s like a basketball player. You have a team, and you just take one player out. You’re not scoring anymore. You put that player back again, you’re scoring. That’s Shapley. That’s as simple as that algorithm is. Just pull something out, see what happens, put it back again, and calculate the impact. So, I’m saying, “Explain the Heisenberg equation”, which is, in fact, an interesting one because it shows that you can’t find the position and the speed of a particle at the same time.
Jon Krohn: 27:09
Right, yes. 
Denis Rothman: 27:10
If you’re looking at the speed, you won’t find the position. If you look at the position, you won’t find the speed. 
Jon Krohn: 27:15
I’ve known that one since I was a kid because in Star Trek, The Next Generation, I learned- 
Denis Rothman: 27:19
That’s right. 
Jon Krohn: 27:19
Right? So, it’s where you [crosstalk 00:27:22]. Yeah, exactly. In order to be able to teleport, you’d have to know this information, you’d have to know the speed and the direction of all of the electrons and everything and pass that information over to somewhere else, beam it over. And so, often, I think they have issues with that, right? It happens all the time. 
Denis Rothman: 27:42
That’s right, that’s right. 
Jon Krohn: 27:42
They’re like, “The Heisenberg’s uncertainty principle on Dewar is broken.” And now, you ended up with a nose on your ear or whatever. 
Denis Rothman: 27:53
Yeah, that’s right. So, when you go back to Star Trek, the Star Trek thing is you just take the input, like with Google Search, and you see you don’t like it. So, you say now, “Could you explain the Heisenberg equation in Star Trek terms?” So, now you’ll get this nice explanation that you just gave. And you say, “Well, maybe I can’t tell it, I can’t write about it that way.” Well, can you explain the Heisenberg equation like for second graders? So, you can see that when you add things and you subtract things, you get different results. So, that’s Shapley. That’s also LIME. That’s also Anchors. All the algorithms are in that book. And it’s model-agnostic. People keep trying to look into layers. I would encourage someone to try to look into a GPT-3 model with 96 layers and 170 billion parameters and tell me which parameter influenced the output of the record that was in position 2,100,000,000. It’s impossible.
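As a rough illustration of the model-agnostic, input-in/output-out approach Denis describes, here is a minimal sketch using the shap library’s KernelExplainer, which only ever calls the model’s prediction function. The dataset and classifier are stand-ins chosen so the example runs on its own; they are not taken from Denis’s book.

    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    # Any black-box model with a prediction function will do.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # KernelExplainer never looks inside the model: it perturbs the inputs
    # (Denis's "take a player off the team") and watches the outputs.
    predict_positive = lambda data: model.predict_proba(data)[:, 1]
    explainer = shap.KernelExplainer(predict_positive, shap.sample(X, 50))
    shap_values = explainer.shap_values(X.iloc[:5], nsamples=100)

    # One contribution per feature per explained row: which "ingredients"
    # pushed the prediction up or down.
    print(shap_values.shape)   # (5, number of features)

Because nothing here depends on what the model is, the same recipe applies whether the black box is a random forest, a BERT classifier, or a GPT-style model wrapped in a prediction function.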
Jon Krohn: 28:56
Yeah, It’s meaningless. 
Denis Rothman: 28:58
You can do it with small parts. People from Facebook do that. They just plug it in to see some things. But in fact, the funnier thing is the [inaudible 00:29:10], which was around in France in the same days I was at the Sorbonne, and there were big fights between people on artificial intelligence. He wrote about interpretable AI and he says, “We’re going to peek into a transformer to see what it means and we’re going to use LIME.” But LIME is model-agnostic. So, what I’m saying is it’s model-
Jon Krohn: 29:41
Right, right, right, right, right. 
Denis Rothman: 29:42
We don’t care. If I go to a store and I buy a phone, and I go home and it doesn’t work, I don’t care what’s in that phone; it doesn’t work. Or if I buy a phone and the ringtone is always wrong, I don’t care. So, it’s model-agnostic. So, you take the input, you look at the output, and you play around with the input again to see how it influences the output. And you see which word or which image… that’s Explainable AI in a nutshell. And you can do a lot of things. One of the fun things I did, which is a very funny one, is I took the US Census data. I had a lot of fun with that one. It was the US Census data. And I had this program that was, in fact, given by Google. But they’re always very careful about this now. I was explaining how you can figure out why someone’s going to earn more than $50,000 or less than $50,000 based on the US Census data. And I was looking at the fields in the data set. It’s in my book somewhere, in chapter four or five.
Denis Rothman: 30:51
And I said, “Gee whiz.” Eighty percent of what’s in there is forbidden in Europe. You had race. It’s strictly banned in Europe. Because there’s a legal problem with Explainable AI. In Europe, you have to explain why your algorithm did that. And if you have race in there, you can get a fine of up to 20% of your sales. You’re talking millions, and for big tech, billions. So, you want to be careful. So, I said, “Gee, how can they do that?” You can only do that in the States. Right? But what does race have to do with revenue? Wow. So, let me take that out. I just pulled that out of there. And I reran the algorithm, tweaked it a bit. And I said, “Yeah, I’m getting results as good as theirs.” And in fact, you had Jamaican as a race. I mean that was [crosstalk 00:31:45]. So, I just took the whole field out and said, “Get this out of here.” Then I take another field, so we’re back to Shapley again, right? I’m taking another field out and I’m saying, sex, female, male. Does it really matter in 2021 in the States if you have a college degree? Is a woman with a PhD, a female doctor, going to earn less than a male doctor? I don’t think so. So, let me take all that out too. Forget it. Take all that out. Because that’s discriminating and it’s bad.
Denis Rothman: 32:24
And today we don’t really want that, because you have transgender people that don’t… or people that are transgender, or people that don’t want to be considered as male or female. We’re in the new world, the new era. What am I going to do? Put other? So, we’re going to have other on the stat… I just pulled that field, it’s useless. So then, I go to another field and I’ll stop on that one. I’m saying, “Now they’re saying marital status.” Is the person married, divorced? So, let’s sum it up. I took every field out of there and I just left two fields in there, age and years of education. And I’m saying if someone has 15 years of education starting from elementary school all the way to college, that person has a higher probability of earning more than a person that has no education at all and just drops out in 10th grade. So then, I go back and say, “But age is a factor.” Because if I’m five years old, I’m not going to earn as much as when I’m 20, 25, or 30. So, I just found out in the 25 to 45 or 30 to 50-year bracket, you earn a lot more. And then, when you’re older, your brain is not so fast. So, it goes back to baby [inaudible 00:33:36]. And with just these two fields in Explainable AI, look at all the noise in your data. You could just kick all that stuff out. So, it’s explaining in a model-agnostic way. I didn’t speak about a model here, just data input and output. And it’s trying to be ethical at the same time. I say, “Get all that data out of there and get the bias out of there. You don’t need it.” Because there’s nothing to talk about. Age and revenue, that’s it… age and education.
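For readers who want to reproduce the spirit of this experiment, a minimal sketch follows using the public “adult” US Census dataset from OpenML and scikit-learn. The model, the train/test split, and the exact column names are assumptions for illustration; this is not the notebook from Denis’s book.

    from sklearn.datasets import fetch_openml
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # The OpenML "adult" dataset is the US Census income data discussed here:
    # the target is whether a person earns more or less than $50,000.
    adult = fetch_openml("adult", version=2, as_frame=True)
    X, y = adult.data, adult.target

    # Drop every sensitive or noisy field and keep only the two Denis ends
    # up with: age and years of education (column names per this version).
    X_small = X[["age", "education-num"]]

    X_tr, X_te, y_tr, y_te = train_test_split(X_small, y, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    print("accuracy with age + education only:", round(model.score(X_te, y_te), 3))

Running the same split with all fields included and comparing the scores is the Shapley-style move Denis describes: pull a field out, watch what happens to the output, and decide whether it was ever needed.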
Jon Krohn: 34:12
Makes a lot of sense to me. So, in my day job, we build a lot of algorithms for predicting who’s a great fit for a job and it’s the same thing. Things like gender or race, those cannot be in your model.
Denis Rothman: 34:26
Of course, they’re useless. It’s not even a question. It’s ethical. And it’s useless, because it has nothing to do with it. The subject, you can come in. And if that person has either education or the experience that can compensate for a lack of education, we’re looking for competence, for abilities. We’re not looking for insane stuff for where they came from. I don’t care. I’ve hired tons of people in my life. I don’t even care where they came from. I don’t even look at their resume. Half of the time, I don’t even care. They come in. I don’t want to see. Are we in sync or are we smiling together? How do we feel together? And you understand my questions about the job. You understand what you’re going to do. Okay, you have some college degrees, that’s fine. Okay, well, let’s get to work and see what happens. And if you like it, you’ll stay, and if you don’t like it, you’ll go. So, you can spend a few months here. We’ll see what happens. That’s the best way to hire people. Because they really love you. And then, you get into this thing. You say, “Gee, he hired me. He didn’t ask me any stupid questions. I want to work hard. I want to stay there.” 
Jon Krohn: 35:35
Right. That makes a lot of sense. All right, so we’ve talked about Transformers for NLP, the book that came out this year. We’ve now talked about, just recently, Hands-on XAI that came out last year. So, that Hands-on XAI book came out in July of 2020. Just a few months before that, in February 2020, you had another book called AI By Example. So, maybe just quickly, what is that book all about?
Denis Rothman: 36:00
Oh, generally, I write a book in between two and a half and three months.
Jon Krohn: 36:05
Wow. 
Denis Rothman: 36:09
That’s how I work. Now, why do I write so fast? You have the whole- 
Jon Krohn: 36:16
Because you get GPT-3 to do it. 
Denis Rothman: 36:18
That’s right. I didn’t even write the book. 
Jon Krohn: 36:21
Alright. 
Denis Rothman: 36:22
I have [crosstalk 00:36:22] 
Jon Krohn: 36:25
You can just do it in Google with BERT. You just say BERT- 
Denis Rothman: 36:27
That’s right, write the book. So, what I’m saying here, go back to what I was talking about, Sorbonne University and education. I have a cross-disciplinary education. And my first patent, a word-to-vector patent, word piece, was in 1982. I registered another patent in 1986 for expert system chatbots. In 1986, I got my first artificial intelligence contract in aerospace with the company now called Airbus. And at the same time, I entered [Luxury 00:37:06]. So, I have so much practical experience in corporations. I never went through the AI winter. I didn’t even notice there was an AI winter. If you told me there was an AI winter then I’d say, “Well, where is it? Because it’s pretty hot out here.”
Jon Krohn: 37:28
Yeah, not in Burgundy. There was no way a winter in Burgundy. 
Denis Rothman: 37:33
No, no. So, Artificial Intelligence By Example is a very simple story. Tushar Gupta from Packt noticed my LinkedIn profile. He says, “You have a lot of experience. Why don’t you share it?” And I say, “Well, I don’t really need the money.” Because I had just sold my company. Because in 2016, AI became fashionable and everyone was talking to my company. All of a sudden, I said, “Yeah.” I told my wife, “Let’s sell it.”
Jon Krohn: 38:05
That was Planilog. 
Denis Rothman: 38:08
Yeah, that’s right. I sold it. We sold in 2016. And then, I trained people for two or three years. And then, in the meantime, Tushar says, “Why don’t you share all that experience, these patents and stuff you wrote, with people… it would give a nice book where people get case studies and all that.” And I say, “Why would I write a book? I don’t need the money. I just sold my company. I don’t want to do anything. I want to stay home and play video games.” And he’s saying, yeah-
Jon Krohn: 38:37
He said you don’t write a book for the money. 
Denis Rothman: 38:42
No. Well, you don’t write books for money. 
Jon Krohn: 38:43
No. 
Denis Rothman: 38:44
Any author will tell you that. You don’t earn a lot of money writing books, technical books. You earn money writing like Stephen King, but not writing technical books. I don’t see how you can earn money with that. But I was thinking maybe he’s right, maybe I should share this with my family and friends. Because I never had time to explain my job. And then, there are a lot of people on LinkedIn asking me these questions all the time. Maybe it could help them. And maybe I’ll meet a lot of people that way because I have people from 100 countries on LinkedIn. And maybe I’ll learn stuff from them because I like culture. I like every country. I like China. I like the United States. I like Iran. I like Israel. I like Germany, any country. You give me any country, you always find nice people. Because people are always thinking governments. So, they don’t like the government, which means the whole population is sentenced to death. No, you have a nice people everywhere. 
Jon Krohn: 39:47
For sure. 
Denis Rothman: 39:48
Yeah, right. So, he got me into that. I wrote the book in three months. So, it wasn’t that much of a big deal. What happens is the book is written in my mind. Like I’m watching TV. And the book is just up there. And it’s like a woman carrying a baby, and all of a sudden, it’s just a pain. I have to get out of my system. So, I’ll be writing at full speed. You just can’t stop me. It won’t stop me. It’s the wham, wham, wham, wham, wham. Sometimes, I even get a chapter done in a day. So, I was writing and writing and writing, and I can’t stop. I just can’t stop it until I get to the last page and I say,” Okay, now I’m okay.” So, it’s a compulsion. It’s not something… and I’ve been thinking about the stuff for decades. I spend every day, I spent at least five hours thinking. Even when I was working in my company, I was always spending two… I stopped working generally around 4:00 pm for operational stuff and I would think until 9:00 in the evening. I read books, philosophy, sociology, linguistics, computer science, math. So, I was constantly building up my theories and I have a theory of artificial intelligence in my mind. So, I just have to organize it for the book. I know where it’s going. I know the next step after transformers. I’m just waiting for things to happen. 
Jon Krohn: 41:22
Alright. So, Denis, you’ve given us amazing context on your books. So, you had AI By Example that you explained last there, which, based on your 35 years of consulting experience, you were able to provide to the audience very quickly. That’s how you got these books written so quickly. In just a few months, you were able to distill your 35 years of consulting experience with artificial intelligence, and surely the readers benefit greatly from all that experience. We also talked about Hands-on XAI and Transformers for NLP. So, now let’s jump to some audience questions. We had tons of great ones on LinkedIn. Your audience is so engaged because you do answer all their questions online. And so, today, we’re not going to have time to go through every single question that’s come up. There are so many. But I think Denis is probably going to end up… you’re going to end up going over these.
Denis Rothman: 42:21
Yeah, I think I might even make a post, maybe at the end of the week, where I mention our podcast. And I’ll take all the questions from the comments in your post and make sure that all of them are answered.
Jon Krohn: 42:37
Nice. Well, that sounds really great. 
Denis Rothman: 42:38
And then, just tag all the people that ask the questions. 
Jon Krohn: 42:42
Perfect, they will greatly appreciate that. So, the first question here is from Serg Masis, who is also an author of a book on Explainable AI. And so, he was curious what XAI methods or libraries you use most for transformer models. 
Denis Rothman: 42:58
Okay, so let’s say… let’s go back to Explainable AI. The best Explainable AI is model-agnostic, unless you’re a developer and you want to see what’s going on inside. But you might have problems with a 96-layer GPT-3 model and 170 billion parameters. So, you can do it. So, you just need the input, the output. And it’s like in a soccer team: if I take this player out, what happens? So, Explainable AI is model-agnostic, like LIME is model-agnostic. Shapley is model-agnostic. So, you just want to take the input, look at the output, and then tweak, play around with the input and see what happens to the output until you find the trigger. And so, it’s model-agnostic. So, you can use any model-agnostic Explainable AI on any algorithm, and it doesn’t even need to be transformers or artificial intelligence, because Shapley existed before. So, it applies to anything. Think of it like a recipe for a nice cake you like, and the person says, “I like your cake, but I’m not really a specialist. I can’t tell you what I like in your cake.” So, the person can say, “I like you so much that week by week, when I bake that cake, I’ll take some ingredients out until we find which one is missing.” And then, at one point, the guy says, “Yes, cinnamon, it’s the cinnamon I like in your cake. It’s cinnamon.”
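To make that concrete for text models specifically, here is a minimal sketch with the lime library’s LimeTextExplainer. The little TF-IDF classifier is only a stand-in black box; any function that maps raw text to class probabilities, including a transformer’s, could be passed in its place.

    from lime.lime_text import LimeTextExplainer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Stand-in black box trained on a toy sentiment task.
    train_texts = ["great movie, loved it", "terrible film, boring",
                   "wonderful acting", "awful plot, waste of time"]
    train_labels = [1, 0, 1, 0]
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(train_texts, train_labels)

    explainer = LimeTextExplainer(class_names=["negative", "positive"])
    exp = explainer.explain_instance(
        "loved the acting but the plot was boring",
        model.predict_proba,   # LIME only calls this; it never looks inside
        num_features=4,
    )
    print(exp.as_list())       # which words pushed the prediction, and how much

LIME works by perturbing the input sentence, watching how the output probabilities move, and attributing the change to individual words, which is exactly the “take an ingredient out of the cake” idea in Denis’s analogy.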
Jon Krohn: 44:37
It’s a great analogy. 
Denis Rothman: 44:39
We have the input, the output. That’s it, the ingredients and the result. 
Jon Krohn: 44:47
Beautifully said, I love that analogy. There’s another question here that is something that we’ve already talked about, you and me, Denis. So, I’m going to give a summary answer. So, there’s a question here from Jean-Charles Arald, if you pronounce it the French way. And it’s this point about how these transformer models are getting so big. So, trillions of parameters. Do we really need this many for human language, given that we have a limited vocabulary, maybe only a couple of thousand words for most people? And he makes the point that bees, he seems to suggest, only have dozens of neurons and that’s efficient for them. So, what are we missing in our models? So, I did a neuroscience PhD, so I’m going to quickly give some summary thoughts here. And then, I’m going to open up the floor to you, Denis.
Jon Krohn: 45:36
So, bees don’t just have dozens of neurons, unless I’m misreading something here. They definitely have at least hundreds of millions, maybe billions of neurons. A human brain has 90 billion neurons, but the key thing here is that we shouldn’t conflate neurons with parameters. So, the question says, “Why do we need trillions of parameters?” Well, even in a human brain with its 90 billion neurons, the connections, which are equivalent to the parameters in a model, there are more connections than there are stars in the universe. It’s an obscenely large number. And so, I think that’s the answer, but yeah, so our transformer models today couldn’t even approximate that. They’re still not as good, although you may disagree.
Denis Rothman: 46:21
Yeah, no, they’re not as good. They can’t be. So, let’s go back to neuroscience, because machines are not brains. So, that’s one thing you need to know, too. Like a calculator: a Texas Instruments calculator is not built like our brain. So, that’s like a projection we have to take out. It’s like children looking at a puppet show and thinking that that’s a real person. No, the machine is not a real person. It has nothing to do with the brain, in fact, but let’s keep on that topic and not try to elude it. So, we say, of course, you have a lot of neurons, and you can’t mix them up with the connections, and the connections are the parameters in the transformers. So, trillions is nothing. But there’s another problem, much deeper for both systems, for both machines and humans, which you know as a neuroscientist, which is that when we build a representation, we don’t build it in one part of the brain. Like if I want an apple, okay, apple, no. I have the language part that’s going to do something, then I have the color part.
Denis Rothman: 47:31
There are so many things lining up in there. And it’s not exactly common to every human being. It’s like that because I can have an apple associated with someone who threw me an apple when I was six years old and hit me. So, I have another part of my brain saying an apple, no. And then, another person says an apple a day keeps the doctor away. And then, you have this dopamine part. So, it’s extremely complex to see what is going to fire up in a brain with a word. And it’s equivalently difficult with a transformer because of the billions of opinions you have on the web. So, one person can say, “I like gun control.” The other says, “No, I don’t want gun control.” I want to take a vaccine. I don’t. So, it adds up to different representations. So, the model has to take all that into account. And they have to feed some ethics into it. So, it’s big. So, trillions is nothing, in fact. It’s just the connections. And there are very few neurons, in fact, in models like that. And then, the question about whether we can find easier models?
Jon Krohn: 48:42
Yeah, exactly. So, there’s a question from Dr. Chiara Baston, who’s in Italy. And yeah, she asks can we do better with simpler models? 
Denis Rothman: 48:49
Well, you can’t find simpler than Shapley. The thing is, probably when you’re looking at posts or books, people show you all these diagrams. And no, it just adds up. I’m in my kitchen, and I forgot to put enough sugar in my cake. So, when it comes out, my children say there’s not enough sugar in there. So, that’s bias, right? They’re saying, I want more sugar. Or if there’s too much sugar, like in the United States they can pour a lot of sugar into a pastry, and we have less in Europe. So, I’m going to say that’s bias. Why did you pour all that sugar in there? So, you have the ingredients, you have the recipe. But think of that, how many people can go to a restaurant, I would say even a McDonald’s restaurant, because people are always making fun of burgers. But yeah, well, how is the bread made? Tell me the ingredients in the bread. No one can do that.
Denis Rothman: 49:47
We’re in a complex world. It’s not easy. Even if you have Shapley, which is very simple, or LIME, it still takes work. And even talking about bees, that’s a problem too because we’re forgetting something: their memory, the patterns they’re using with their bodies. The bees go around in certain ways to signal things to other bees. And they’re using a language that we don’t… we’re trying to understand. So, nature is extremely complex as well. Like an ant. An ant has a few neurons. Yeah, well, what about a whole group of ants? Wow, lots of brain. And no one can understand it.
Jon Krohn: 50:35
Yeah, that’s a really great point as well. So, as I mentioned, we did not have time in the episode unfortunately to get a response on air to every question, but it sounds like Denis is going to make a- 
Denis Rothman: 50:47
Yeah, yeah, I’ll answer all the questions. 
Jon Krohn: 50:48
He’ll answer all these questions. 
Denis Rothman: 50:48
Yeah, sure. 
Jon Krohn: 50:48
And so, it’s- 
Denis Rothman: 50:51
Somewhere at the end of the week. 
Jon Krohn: 50:54
Nice. Well, so, that’ll be up before the episode airs. So, Denis, obviously, people should follow you on LinkedIn. That’s a great place to get in touch and ask questions. Are there any other places that people should follow you? 
Denis Rothman: 51:06
No, I just work on LinkedIn. 
Jon Krohn: 51:08
Perfect. 
Denis Rothman: 51:09
Because I work on LinkedIn. And when I’m finished, I go see my family and friends. 
Jon Krohn: 51:15
That sounds great. When you’re in the region that you’re in, it must be very nice visiting- 
Denis Rothman: 51:20
Yes, yes. Yes. Because we get along with our neighbors. We can go downtown and eat right in front of a medieval cathedral, and in Paris and the places I live and go around. And we have monuments. There’s place where there’s even a castle across the way from where I stay. So, yeah. 
Jon Krohn: 51:40
It’s beautiful. I can see why. Yeah, one social medium is enough.
Denis Rothman: 51:43
That’s right. 
Jon Krohn: 51:44
All right. Denis, thank you so much for being on the program. And we’ll have to have you again sometime. Thank you so much for your time. 
Denis Rothman: 51:49
Sure, when you want. Okay, bye-bye. 
Jon Krohn: 51:57
What a character Denis is. I had an absolute hoot filming this episode. Today, he filled us in on the history of transformer architectures, particularly highlighting OpenAI’s GPT-3 model and Google’s BERT model. He talked about how, with his Transformers for NLP book, you can learn how to fine-tune GPT-3’s precursor algorithm, GPT-2, to perform state-of-the-art natural language processing capabilities like question answering and text summarization. And he talked about how SHAP and LIME can be used to explain how an AI algorithm is arriving at its output, no matter whether it’s a simple algorithm or a billion-parameter transformer model.
Jon Krohn: 52:40
As always, you can get the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Denis’s LinkedIn profile as well as my own social media profiles at www.superdatascience.com/513. That’s www.superdatascience.com/513. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter and then tagging me in a post about it. To support the SuperDataScience company that kindly funds the management, editing and production of this podcast without any annoying third-party ads, you could consider creating a free login to their learning platform. It’s www.superdatascience.com. You can check out the 99 days to your first data science job challenge at www.superdatascience.com/challenge. Or, you could consider buying a usually pretty darn cheap Udemy course published by Ligency, a SuperDataScience affiliate, such as my own, Mathematical Foundations of Machine Learning course. 
Jon Krohn: 53:47
Thanks to Ivana, Jaime, Mario, and JP on the SuperDataScience team for managing and producing another terrific episode today. Keep on rocking it out there, folks, and I’m looking forward to enjoying another round of the SuperDataScience Podcast with you very soon. 