SDS 473: Machine Learning at NVIDIA

Podcast Guest: Anima Anandkumar

May 25, 2021

We cover the cutting-edge interdisciplinary research Anima carries out, the world-changing innovations this research kickstarts, what a workweek for her looks like, the skills to develop and the tools to know, how biological neuroscience research informs deep learning, and more!

About Anima Anandkumar
Anima Anandkumar is a Bren Professor at Caltech and Director of ML Research at NVIDIA. She was previously a Principal Scientist at Amazon Web Services. She has received several honors, such as the Alfred P. Sloan Fellowship, the NSF CAREER Award, Young Investigator Awards from the DoD, and faculty fellowships from Microsoft, Google, Facebook, and Adobe. She is part of the World Economic Forum's Expert Network. She is passionate about designing principled AI algorithms and applying them in interdisciplinary applications. Her research focus is on unsupervised AI, optimization, and tensor methods.
Overview
We started off by discussing our shared practice of yoga. Anima has been learning yoga through social media and digital resources, which she said is a great example of the communal aspect of the practice. For me, yoga was a hugely transformative force that helped me be more present outside of my work. As Anima says, life is precious, and we need to find ways to be productive that are sustainable and healthy, both during the pandemic and beyond.
In any given week, Anima serves both as a faculty member at Caltech and as a researcher at NVIDIA. Luckily, the two roles often go hand in hand and inform each other. Currently, Anima's team is developing broadly generalizable AI that can be adaptive, which involves not only developing algorithms but also benchmarking their progress. They also train robots at scale in simulation, with ever-expanding computational abilities (she talks about one day having, in effect, a supercomputer as a car). NVIDIA's sweet spot is problems with a lot of computational complexity, and building broadly applicable AI is, according to Anima, central to this revolution. Today people are using thousands of NVIDIA GPUs in parallel to train models. Anima and NVIDIA are focused on scaling this up and on going beyond narrow tasks toward robustness in real-world applications and much harder problems.
To this end, we discussed tensors, which have always been at the core of Anima's research. She saw the limitations of linear algebra in unsupervised learning; a tensor extends a matrix to more dimensions. In practice, this makes it possible to examine higher-order relationships and context in data (for example, “apple” could be a fruit or a company, depending on context). We discussed some of the finer points of tensor and matrix operations, which can be studied further in a resource Anima shared below.
Omniverse and Isaac are two of Anima's favorite projects at NVIDIA. They bring infrastructure and tools together in an “AI factory”: a digital twin that mimics a physical assembly-line factory and includes a graphics aspect, a physics aspect, and a process for closing the gap between simulation and reality. To accomplish these tasks, Anima's team uses PyTorch and Weights & Biases, along with native PyTorch tools that streamline their processes.
We pivoted into discussing interdisciplinary science, and specifically how neuroscience can inform the creation of neural networks, something that interests me. Anima points out that we ourselves learn unsupervised from very early in life; how can we apply that to machines, so that they can use generative feedback and, in effect, ponder? We discussed what is needed to get there: domain knowledge in addition to data, zero-shot generalization, faster deep learning methods, and other tools and practices. She wants to build strong foundations for a potential future generalized artificial intelligence on top of her work.
A foundation of strong mathematical skills and fluency with deep learning tools is key for getting into this field and making an impact. Don't be afraid to get your hands dirty: ask questions, test hypotheses, and do it all over again. Be a quick learner who is willing to take risks, but understand what the risks mean. We discussed the parable of the blind men and the elephant, in which each man examines one low-level part of the elephant without understanding what that part means in the larger picture. You need to keep a high-level understanding of, and viewpoint on, your work.
In this episode you will learn:
  • Anima’s recent discovery of yoga [5:20]
  • How does Anima balance her work? [12:25]
  • Applications of Anima’s work [14:45]
  • Tensors [22:55]
  • Anima’s favorite NVIDIA projects [35:35]
  • What tools does NVIDIA use? [41:55]
  • Caltech interdisciplinary science [47:41]
  • The path to generalized artificial intelligence [57:19]
  • The skills to have to get into this field [1:00:27]
  • LinkedIn questions for Anima [1:07:03] 
Items mentioned in this podcast:   
Follow Anima:
Follow Jon:
Podcast Transcript

Jon: 00:00:00

This is episode number 473 with Dr. Anima Anandkumar, Director of Machine Learning research at NVIDIA, and professor of computing and mathematical sciences at Caltech. 
Jon: 00:00:12
Welcome to the SuperDataScience podcast. My name is Jon Krohn, chief data scientist and best-selling author on deep learning. Each week we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let's make the complex simple. 
Jon: 00:00:43
Welcome back to the SuperDataScience podcast. I am so excited to have professor Anima Anandkumar on the show today because she is an absolute rockstar. Anima obtained her electrical engineering PhD at Cornell before carrying out postdoctoral research at MIT, and then landing a professorship at the University of California, Irvine. She is now distinguished as a Bren Professor at the California Institute of Technology, typically known by its short form Caltech, a university ranked by several sources in recent years as the world's number one research institution. 
Jon: 00:01:19
As if that didn’t already merit my absolute rockstar moniker for Anima, she’s also Director of Machine Learning research at NVIDIA, the world’s number one hardware manufacturer in the artificial intelligence space, and a top research institution in its own right. In today’s episode, we cover the cutting-edge interdisciplinary research Anima carries out at NVIDIA and Caltech, applying highly optimized mathematical operations to allow state-of-the-art machine learning models to be executed on NVIDIA’s state-of-the-art hardware such as on their prized GPU’s. This blending of leading software and leading hardware enables world-changing innovations across disparate fields, from healthcare to robotics, and Anima provides countless examples of such real-world applications. Anima tells us what it’s like in the workweek of a researcher like her bridging the academic and industrial realms, including the open source data science tools and techniques that she most highly recommends.
Jon: 00:02:21
We discuss the skills you should develop if you'd like to devise or deploy state-of-the-art artificial intelligence approaches yourself. And Anima blows my mind by filling me in on how biological neuroscience research is inspiring deep artificial neural networks that can learn from just a few training examples, as an infant human can. This episode is relatively technical at times, so it might appeal most to seasoned data scientists. But I did my best to break down and rephrase the main points of all the technical content we delved into, to enable anyone to follow along, including folks only getting started in data science. This episode will be hugely interesting to anyone aspiring to carry out cutting-edge AI research, or even if you'd simply like to better understand how AI is changing everything. All right, you ready for this incredible episode? Let's go. 
Jon: 00:03:21
Hi Anima, welcome to the show. I’m so excited to have you here. Where in the world are you? 
Anima: 00:03:28
Thank you Jon. It's a pleasure to be on the show. I'm currently in California, in the Bay Area. Such a beautiful day outside, but like you know, most of us are still in virtual meetings. But the good thing is we can connect to people across the world now. 
Jon: 00:03:46
It’s true. We do have a beautiful day here in New York as well. I haven’t been to my office now in a year but it’s starting to get better. It’s amazing. Over the last month I’ve been watching every day the positive infection rate for coronavirus on Manhattan. A month ago it was 3% and today it’s under 1% so getting there. 
Anima: 00:04:09
Indeed, I’m fully vaccinated and it’s just such a load off. You feel mentally relieved, and slowly getting to meet people, and getting back to some form of normalcy. On the other hand back in India where I’m from, where I grew up, the sad thing is there is such a big surge. And I hope we can get all the supplies, and vaccines there, and do our best to save lives. 
Jon: 00:04:45
At the time of recording, at the beginning of May, it is one of the biggest coronavirus tragedies that we're aware of on the planet. There could be places where there's less testing, and maybe it's also very bad. But in terms of the places getting tested, it's really terrible how it's skyrocketed in the last few weeks. I don't know, maybe that's something if listeners are listening, there are lots of places to contribute. You'll probably see on social media people are posting places that you can donate so that supplies can be provided. Shortages of oxygen, for example, are a big issue right now in India. 
Anima: 00:05:19
Yes, indeed. 
Jon: 00:05:20
Speaking of India and the pandemic, you and I before starting recording we were talking about how you just discovered yoga, which is interesting because you’re from a region that is the birthplace of yoga, right? 
Anima: 00:05:33
Yes, indeed Jon. I'm actually from Mysore, and it was the King of Mysore who, in the 18th century, in a way rediscovered yoga. Yoga was in the scriptures, but he really encouraged people to practice it and formed yoga schools. And that's where it took off to the world from there. But growing up I did Indian classical dance, Bharatanatyam, which has elements similar to yoga. But I wasn't doing yoga in a very disciplined manner. And when the pandemic struck and we were all inside, the question was, what can I do without a gym, without any equipment? And that's when I started doing it daily and saw such an immense improvement in my overall wellbeing, in my flexibility, and in my emotional and mental state, especially when we were going through so much uncertainty around the pandemic. I really hope more people discover the joys of doing yoga. And I got to learn that you're a yoga teacher, which [crosstalk 00:06:47]. 
Jon: 00:06:49
There are some really funny… Maybe at some point I should add them to my YouTube channel, because people will probably find it really funny. But years ago I was a yoga instructor and there are even videos from… Cosmopolitan Magazine created a portal for fitness called CosmoBody, and I did yoga videos. 
Anima: 00:07:08
Wow! [crosstalk 00:07:09] CosmoBody, that’s something I’m going to find. 
Jon: 00:07:15
I don’t know. But there are some clips on YouTube, and I guess I could add them to some random thing on my channel. Maybe I’ll do that. I don’t know. Maybe I’ll be too embarrassed. 
Anima: 00:07:23
I should totally go and take a look at those, maybe I can get useful tips to include in my practice. 
Jon: 00:07:29
I don’t know. What have you been using to… If you weren’t studying it before, and now you’ve been doing it every day, how have you been studying it? 
Anima: 00:07:39
I think that’s where there is the positive of social media, there are lots of videos. But also an Instagram just looking at how people are correcting poses. It’s kind of like now we are acquiring knowledge in so many different ways, and I think that’s useful because you can no longer have a physical instructor amidst a pandemic. But there are so many other ways to discover doing yoga and so many interpretations, which I think is very healthy to me. That’s how we evolve culturally, when people start bringing in their own ideas into also what yoga is. 
Jon: 00:08:23
Yeah. I agree. Basically you’ve gotten an eclectic mix from various sources so you’re kind of like, “I’m going to do it every day or pretty much every day, but exactly my source for the particular flow that I’m going to do today is variable.” That’s cool. 
Anima: 00:08:39
Yes. And sometimes I do it with no video at all at this point. It's like [crosstalk 00:08:44]. 
Jon: 00:08:43
Nice, you’ve really gone along. 
Anima: 00:08:45
Yeah. Kind of movement as a dancer, kind of just moving, so I fuse everything together. And I think that's the most important part of yoga that's been helpful: to look inward. I mean, it's not a competition of how much can I balance, or am I better than these other people? Leaving all that and just looking inwards and listening to my body, I think that's been the most beautiful aspect of yoga. 
Jon: 00:09:16
Yeah. I mean, I don’t want to focus on any part of my journey. This is all about you on this episode, but yoga was a huge transformative force in my life that brought me… I used to be a trader at a hedge fund and I started doing regular yoga practice, and I would have experiences on the mat where for a few seconds I was like, “Wow! I’m here in my body in this room.” Instead of being caught up in the trades that I was making that day, which is what was going on in my mind all the time. And I was like, “Wow! It would be nice to feel present for more than just a few seconds.” And eventually I realized that I wasn’t going to be able to keep being a trader. 
Anima: 00:09:57
That’s really beautiful though that you could discover that, because so many people are just caught up in the stresses of daily life. I mean, I think that’s what the pandemic has also shown us, how life is precious, and how we want to make the best of it. And find ways to be very productive, but in a sustainable way that’s also healthy for us. 
Jon: 00:10:23
Beautifully said. Eliminating unnecessary distractions is one of the central principles of my lifestyle. As such I only subscribe to a handful of email newsletters, those that provide a massive signal to noise ratio. One of the very few that meet my strict criterion is the Data Science Insider. If you weren't aware of it already, the Data Science Insider is a 100% free newsletter that the SuperDataScience team creates and sends out every Friday. We pore over all of the news and identify the most important breakthroughs in the field of data science, machine learning, and artificial intelligence, in simply five news items. The top five items are handpicked, the items that we're confident will be most relevant to your personal and professional growth. 
Jon: 00:11:18
Each of the five articles is summarized into a standardized easy to read format, and then packed gently into a single email. This means that you don't have to go and read the whole article, you can read our summary and be up to speed on the latest and greatest data innovations in no time at all. That said, if any items do particularly tickle your fancy then you can click through and read the full article. This is what I do, I skim the Data Science Insider newsletter every week, and those items that are relevant to me I read the summary in full. And if that signals to me that I should be digging into the full original piece, for example to pore over figures, equations, code, or experimental methodology, I click through and dig deep. If you'd like to get the best signal to noise ratio out there in data science, machine learning, and AI news, subscribe to the Data Science Insider which is completely free, and no strings attached, at www.superdatascience.com/dsi. That's superdatascience.com/dsi. And now let's return to our amazing episode. 
Jon: 00:12:25
All right, speaking about being productive in a sustainable way, you are an incredibly accomplished person and I've been so excited… I've known for months that you were going to be on the show, and I've been excited for months. You have two primary roles: you're the Director of Machine Learning Research at NVIDIA, and you're also a professor at Caltech. I guess my first question is, how do you strike that balance in a given week? Do you have kind of like, “I spend a certain number of days on one or the other.” Or how does that work? 
Anima: 00:12:58
Yeah. That’s a great question Jon. I think right now I’m really privileged to be in a place in the AI revolution where there’s a lot of open research. What we are doing at NVIDIA is open sourcing, and publishing widely, and making that available to the community. Democratization of AI is such an important part of the goal. And that’s where university and industry research can go hand in hand, and find ways that we can benefit both a university and the open source community. In that sense there are certainly projects that are ongoing in both the places, and there are sometimes good synergies and we exploit that. It’s [crosstalk 00:13:49]. 
Jon: 00:13:48
That’s perfect. 
Anima: 00:13:50
And especially with the pandemic, things going virtual, it’s a lot more seamless to look at different projects and how they are progressing. But I do miss sometimes the in-person interaction. 
Jon: 00:14:03
100% [crosstalk 00:14:04]. Actually, we’re interviewing right now, we’re hiring data scientists. And I had an amazing interview with someone last week and I was like, “Would you be comfortable meeting in-person?” We don’t have an office right now, we decided to save money by not having an office. And so he came over with my existing data science team to my apartment and I got them sandwiches. And we got to do this interview in-person, have a physical whiteboard, and it’s just so nice. There is really something about it. Anyway, it’s great to hear that there’s this perfect synergy between both of those parts of your life. Do you want to fill us in on some specific projects, some specific applications that you’re carrying out? 
Anima: 00:14:51
Yeah. Absolutely. At NVIDIA my team is looking at how to develop next generation AI algorithms. Meaning, from the current narrow-domain, mostly supervised learning, how do we go to broadly generalizable AI that can work with unlabeled data, that is robust, that's adaptive? Ultimately it becomes embodied, meaning you have robots that are agile and intelligent. All lofty goals, but grounded with foundations in terms of: how do we develop these new algorithms? How do we have benchmarks and infrastructure to evaluate them? And that's where the synergy of NVIDIA's large-scale GPU infrastructure and data centers comes in, but also the verticals. If you see what NVIDIA has been doing over the last few years, it is to build robust infrastructure for different verticals. 
Anima: 00:15:52
NVIDIA Clara for healthcare; NVIDIA Omniverse for graphics, used as a one-stop shop for all kinds of graphics processing along with AI methods; Isaac for robot learning, how we can train robots in simulation. Because we can do that at scale, we can get lots of simulated data, and then take that and deploy it onto real robots on our EGX or Edge platforms with Jetsons and Xaviers. And now we have cars that will soon be supercomputers. We'll have such amazing computational abilities in our future cars. Self-driving is one aspect of it, but so are the integration of all the sensors, the availability of good user interfaces, and driver assistance. 
Anima: 00:16:46
All these are aspects where we are partnering with companies like Mercedes-Benz and BMW, from having cars that are very intelligent to having manufacturing facilities that are also intelligent. And you can have robots that are social, and that can navigate and do all kinds of tasks in such complex environments. On the NVIDIA side there's so much activity going on, and it's really about looking at where we can make a very big impact on all these challenging problems. And that starts with building robust infrastructure, great GPUs, great hardware efficiency, and then enabling algorithms that are able to exploit that. 
Jon: 00:17:37
I guess in a way, NVIDIA's sweet spot probably is in problems that have a lot of computational complexity, where those kinds of efficiencies that you have in being the world's foremost GPU manufacturer really pay off. This might be known to many listeners, but we should probably spend two minutes on it just in case: why do people train machine learning models, particularly deep learning models, on graphics processing units, the GPUs? I could do a little spiel, but I feel like you could do it much better than me. 
Anima: 00:18:12
Yeah. Thank you Jon. Many people don't realize that this deep learning revolution would not have happened if not for NVIDIA GPUs. And taking a step back, when we think of the deep learning revolution, we all know deep learning consists of deep neural networks. It's the flexibility of these networks to learn features that was really critical, but they've been around for a long time. What made the difference in the past decade? One was data, we have web-scale data. Dr. Fei-Fei Li created ImageNet and the rest is history. But you had to finally run them on compute infrastructure that was scalable, and that could easily implement neural network primitives. And that's the common point between GPUs and deep learning: the availability of linear algebra primitives. And then quickly building on top of that, like cuDNN and other CUDA frameworks, to enable the scale, otherwise we would never have realized this. But that is again the beauty of it, the common language of math, linear algebra, that's the foundation of almost all algorithms and for any- 
Jon: 00:19:32
Matrix multiplication everywhere. 
Anima: 00:19:33
… Yes. The world is about matrix multiplication, and now tensors which I can get into in a bit, but- 
Jon: 00:19:41
Yeah. Oh, sorry. Go ahead. 
Anima: 00:19:41
… The important aspect is for NVIDIA to have had the foresight to not just focus on graphics acceleration back in the 2000s, but to say, “We need CUDA because we need to democratize GPU programming. And that's how we can then look at applications beyond graphics, and other scientific applications, and ultimately AI.” I think that kind of foresight into building platforms that are broadly applicable is so important for this AI revolution. 
Jon: 00:20:14
Beautiful. In the 2000s, NVIDIA GPUs, which are by far the most widespread GPUs, by far the state-of-the-art GPUs from any chip manufacturer, were primarily being used for rendering graphics: people who are editing videos, people who want to be able to play video games with a really fast frame rate. Anybody in the 2000s who wanted to do those kinds of things at the state-of-the-art level was buying NVIDIA GPUs. But because of that opening up of CUDA and allowing people to program with CUDA, it allowed people like Alex Krizhevsky in Geoff Hinton's lab in the early 2010s to take Fei-Fei Li's dataset that you mentioned, the ImageNet dataset, and allow that to flow through a neural network architecture now called AlexNet, after Alex Krizhevsky. And he had this idea of, let's take two NVIDIA GPUs, he knew how to program them with CUDA, and split the training over these two GPUs. And now that kind of idea happens… I mean you probably know better than me, but there must be individual projects where people are using thousands of NVIDIA GPUs in parallel to carry out this matrix multiplication that's a key part of training a neural network. 
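To make the parallelism idea concrete, here is a minimal PyTorch sketch of splitting a batch across whatever GPUs are available, the same trick AlexNet pioneered with two cards. The model and sizes here are invented purely for illustration; large-scale systems like Megatron use far more sophisticated distributed setups.

```python
import torch
import torch.nn as nn

# A toy model; in practice this would be a deep network like AlexNet.
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))

if torch.cuda.device_count() > 1:
    # nn.DataParallel replicates the model on each GPU and scatters the batch,
    # so each card runs its matrix multiplications on a slice of the data.
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(256, 1024, device=next(model.parameters()).device)
logits = model(x)      # forward pass runs across all GPUs in parallel
print(logits.shape)    # torch.Size([256, 10])
```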
Anima: 00:21:32
Absolutely. When we are now looking at the large-scale language models, how do we enable such heavy computational requirements? We have open-sourced Megatron. Again the source code is available online, and it's a mix of data and model parallelism: how do you get the best efficiency out of, say, a thousand GPUs or even more? And I think it's exciting times. What is the power of scale? What can scale bring to AI? But also, what is the power of good hardware design and good algorithm design? Now we are going beyond the somewhat low-hanging fruit. In the beginning, looking back, it's obvious, because once you have data, and the GPUs, and neural networks, magic happens. But now the question is, where do we go from here? And that brings us to going beyond narrow tasks to broadly generalizable AI. Looking at robustness; real-world applications require fairness, detection of bias. These are much harder problems, but now we are in a position to address them, we are in a position to do research on them. And that's where I see great opportunities. 
Jon: 00:22:55
Nice. There’s a million places that we could go from right here, but you’ve mentioned tensors a few times. Is clever programming around tensors and how they’re applied to GPU’s? Is that part of the key to being able to generalize more and more? I guess we should explain what tensors are too in a few sentences. 
Anima: 00:23:12
Yeah. Absolutely Jon. Thanks for bringing that up. Tensors have been at the core of my research even before deep learning was in the picture. The way I got into tensors was, as I said, linear algebra is at the foundation of most machine learning methods. But we were also starting to see the limitations of what linear algebra can do, and that was in the framework of unsupervised learning, where you want to discover underlying latent factors. For instance, think of a huge collection of documents, say news articles, and you want to automatically categorize them. But each news article could have multiple topics in there, so you don't have just one monolithic classification per article; you have lots of mixtures of topics, and they're also not labeled. 
Anima: 00:24:11
And so for these kinds of hard discovery problems you can ask, “How can I use different machine learning methods to discover the underlying topics?” And if you're looking at simple methods such as spectral methods, like principal component analysis, which extracts the most informative part of the signal through linear algebra, that gives you some information but not everything. And so the natural question was, can we go to more dimensions? 
Jon: 00:24:42
[crosstalk 00:24:42]. Sorry. Principal component analysis allows us to take unlabeled data like you mentioned. We could have a dataset of millions of news articles, and we don't know whether they're politics or sports. And then as you said, there could be situations where a politician goes to a sports game, and is that news article… Does it fit into politics or sports? It's hard to tell, and you have these mixture topics. With principal component analysis we can estimate what we call the principal components, the dimensions in the data that correspond to the most variation in the data, that account for the most signal. But you and I even prior to recording, and now that we've been recording, I think you mentioned the word for the first time, spectral. Actually I don't know what that means in this context so you're going to have to explain it to me. 
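As a quick aside, what Jon describes boils down to an eigendecomposition of the data's covariance matrix, and the "spectrum" Anima explains next is exactly that eigenspectrum. A minimal NumPy sketch, with random data purely for illustration:

```python
import numpy as np

# Toy data: 1,000 samples, 20 features (random, for illustration only).
X = np.random.randn(1000, 20)
X -= X.mean(axis=0)                      # center the data

# The covariance captures only pairwise (second-order) relationships.
cov = X.T @ X / (len(X) - 1)

# The eigenspectrum: eigenvalues rank the directions by explained variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
top3 = eigvecs[:, order[:3]]             # top-3 principal components
print(eigvals[order[:3]])                # the most "energetic" directions
```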
Anima: 00:25:37
Yeah, absolutely. When we’re doing principal component analysis, we are extracting the spectrum of the matrix. Spectrum means Eigenspectrum. What are the principal directions where there is the most energy? People use the term spectrum to mean the Eigenspectrum, mostly the top eigen components. That principal has been at the foundation of a lot of machine learning methods. But the limitation is principal component analysis looks at only the co-variants of the data matrix, meaning only pair wise relationships. In the case of the document example, you can think of it as looking at co-occurrence of two words in a document. Let’s say the word apple occurs a lot in the document, you don’t know is it about fruit, is it about a company? 
Anima: 00:26:29
Let’s say the word apple and orange go together… Even then orange could also be a company and a fruit. But let’s say you keep expanding now to more dimensions, you have co-occurrence of apple, orange, banana, grapefruit. And then now at some point you become more confident that it’s a fruit because you have now information of these higher order relationships. And that’s the main principle of how tensors can be much more powerful in giving access to such information. And those are the structures we exploit in the algorithms we are designing. And back in the early 2010s when we started working on these tensor methods, the idea was to apply to such document corpus in an unsupervised way. And at scale over billions of documents, be able to extract topics automatically. 
Anima: 00:27:22
And when I went to Amazon Web Services, I helped build this onto their cloud platform in the topic detection tool. It was really nice to go from a theoretical concept, understand its properties, analyze that, build robust code now, and have that be running at scale. Those are the end-to-end kind of things we can do in AI now, and in a very short period of time. And that’s very exciting. 
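To make the idea concrete, here is a minimal sketch of tensor-based topic discovery (toy random counts, not a real corpus, and not the exact AWS implementation): build a third-order word co-occurrence tensor and factor it into rank-one components with TensorLY, where each component plays the role of one latent topic.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Toy third-order co-occurrence tensor: entry (i, j, k) counts how often
# words i, j, and k appear together in a document. Triple co-occurrences
# like (apple, orange, banana) are what disambiguate fruit from company.
vocab = 50
cooccur = np.random.poisson(1.0, size=(vocab, vocab, vocab)).astype(float)

# CP decomposition writes the tensor as a sum of rank-one components; in the
# topic-model view, each component's factors are one topic's word weights.
weights, factors = parafac(tl.tensor(cooccur), rank=5)
print([f.shape for f in factors])  # [(50, 5), (50, 5), (50, 5)]
```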
Jon: 00:27:50
Yeah. That’s another beautiful example of the way that you straddle the academic world and industrial applications. You’re doing research at the time… I don’t know, was it your postdoc research at MIT? 
Anima: 00:28:06
It was the beginning of my research as faculty at UC Irvine. 
Jon: 00:28:10
There you go, at the University of California, Irvine campus. And you're able to take that theoretical research, this ability to use tensors, which we still should define, to perform this kind of topic modeling in a more powerful way than had been done before, and then apply that directly at Amazon Web Services, which at that time, like they still are today, was the biggest cloud provider on the planet. And when you're talking about at scale, there is no bigger scale than [crosstalk 00:28:42]. 
Anima: 00:28:43
Absolutely. And that’s where I think it’s the combination of core algorithmic foundations with scale, I think that would create a lot of magic. And coming to your point about defining tensors, think of them as extension of matrices to more dimensions. At the very kind of basic definition is matrices have rows and columns. Now, if you extend it to three dimensions it’s like a cube, it’s a multi-dimensional array. But the difference is it’s not just a collection of elements. Just as in a matrix, you can now think of a low rank matrix, you can think of projecting onto a subspace. We can similarly use the tensor as well to do operations. What would a low rank tensor look like that’s effectively using fewer parameters to describe this huge cube? And you can extend this notion of spectral analysis to tensors, and if you’re interested we have a book called Spectral Learning on Matrices and Tensors by Now Publishers. It’s openly available, you can download it. 
Jon: 00:29:56
Oh, it’s freely available? 
Anima: 00:29:57
Yes. It is. 
Jon: 00:29:59
Wow! That’s nice [crosstalk 00:30:00]. Beautiful, Spectral Learning on Matrices and Tensors, your book which is on all of these topics that we’ve been talking about the last few minutes. Linear algebra, tensors, matrix operations, identifying the most important components of a tensor or a matrix like the eigen spectrum, the eigen vectors, and eigen values. Very cool that that’s open source. 
Anima: 00:30:26
Yeah. Absolutely. And we also have TensorLY as an open source framework that has multiple backends. It's Python based, and you can connect it to PyTorch of course, but also JAX and MXNet. The idea with this is you can easily use tensor methods now. If you want to decompose a tensor, there are already efficient methods available, running efficiently on GPUs. But you can also now define tensor operations as layers of a neural network. As we mentioned in the beginning, neural network layers primarily consist of matrix multiplication and other simple operations, but there's no need to make them just matrices. 
Anima: 00:31:10
We can expand to more dimensions, and that's what tensors provide: the ability to have a much better inductive bias for the data. Think of your data as multi-dimensional. Think of video or multimodal data; there's no need to flatten it all into a matrix in these layers. If you keep track of the different dimensionalities, you can come up with networks that are much more compact. So you compress them, you have fewer parameters, but actually better accuracy, because they generalize better. And they're also more robust, so you're getting multiple benefits together by expanding your neural network architecture to have these huge possibilities of tensor operations. 
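As a minimal sketch of the compression idea (shapes invented for illustration, not a drop-in replacement for a trained layer): a 4-D convolution kernel factorized with TensorLY's Tucker decomposition ends up with far fewer parameters than the dense weight, yet can be expanded back when needed.

```python
import torch
import tensorly as tl
from tensorly.decomposition import tucker

tl.set_backend('pytorch')  # TensorLY runs on top of PyTorch (or NumPy, JAX, ...)

# A hypothetical conv layer's 4-D weight tensor: (out_ch, in_ch, height, width).
weight = torch.randn(64, 32, 3, 3)

# Tucker decomposition keeps a small core tensor plus one factor per mode.
core, factors = tucker(weight, rank=[16, 8, 3, 3])

# Far fewer parameters than the dense kernel, and it can be reconstructed:
reconstructed = tl.tucker_to_tensor((core, factors))
print(weight.numel(), "dense vs",
      core.numel() + sum(f.numel() for f in factors), "factorized parameters")
```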
Jon: 00:31:58
Very cool. As an example I’ll try to illustrate and you can tell me if I’m right or wrong. You mentioned videos, with video we have of course… If you pause the video you have the rows and columns which is a matrix. But even just with that paused video if it’s in color, you have to have three color channels, red, green, and blue. Already you can’t capture the information in a single matrix so you could say, “I’ll have a red matrix, a green matrix, a blue matrix for this paused frame of video. But why not put that in a tensor with depth?” Now you have rows and columns for all the pixels, and three layers of depth for the three color channels, but we can press play on the video. 
Jon: 00:32:42
We can have that three-dimensional tensor have a fourth dimension, which is time. And you can't picture that in your mind. Our brains don't do it, but computers can represent any number of dimensions. And I guess what you're saying is, instead of trying to deconstruct that video into matrices and force that through a neural network, we can leave it in its four-dimensional initial state and the operations are more efficient. And also I guess we identify more signal. And then you said something about inductive bias. I don't- 
Anima: 00:33:19
Yes. I mean the inductive bias means trying to mimic the structure that's present in data. We're doing that with convolution, because we expect translation invariance in the way images are formed. And similarly if the data is multimodal, you have all these four dimensions or more dimensions. You're trying to mimic that structure, and that means you should have a much better neural network with that. 
Jon: 00:33:51
… That makes perfect sense too. Instead of trying to break it into pieces and study them in that kind of reductionist way, you're looking at the whole picture, or the whole video, holistically, and identifying patterns in that. That makes a lot of sense to me. The library that you've open sourced for that out of NVIDIA is TensorLY. Tensor, T-E-N-S-O-R, LY, L-Y? TensorLY? 
Anima: 00:34:12
Mm-hmm (affirmative). Check it out. We also have notebooks and tutorials. Recently we gave an NVIDIA GTC talk at the GTC Conference that's also openly available. Lots of resources. I should also add another tensor framework that we have open sourced from NVIDIA, which is the Minkowski Engine, which focuses on sparse tensors. If, further, you have sparsity as the structure, which is the case say with point clouds and other 3D processing, can you also propagate that kind of information? Can you get your network to do sparse tensor convolutions efficiently, and do that on GPU as well? That's also, I think, an excellent framework for the upcoming area of 3D vision and beyond. 
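The sparse-tensor idea can be illustrated with PyTorch's built-in COO format (the Minkowski Engine has its own, richer API; this toy example only shows the representation): a point cloud occupies a handful of cells in a huge 3-D grid, so only the occupied coordinates and their features are stored.

```python
import torch

# Three occupied points in a 128^3 voxel grid, each with a scalar feature.
coords = torch.tensor([[0, 10, 99],
                       [5, 42, 7],
                       [63, 2, 31]]).T       # shape (3, num_points)
feats = torch.tensor([1.0, 0.5, 2.0])

sparse = torch.sparse_coo_tensor(coords, feats, size=(128, 128, 128))
print(sparse.coalesce().values().numel(), "nonzeros out of", 128 ** 3, "cells")
```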
Jon: 00:35:06
Nice. And so Minkowski, I’m taking a stab at spelling this correctly. M-I-N-K-O-V-S-K-Y. 
Anima: 00:35:13
K-O-W-S-K-I. 
Jon: 00:35:16
Come on you Russians. That's what we get for trying to take the Cyrillic alphabet and throw it into ours. All right, so Minkowski. Brilliant, and very cool to hear about these particular open source projects that you're working on at NVIDIA. Earlier when you were talking about NVIDIA work, you were talking about so many different kinds of applications: Clara for healthcare, and Isaac, I assume named after Isaac Asimov, for robot learning? 
Anima: 00:35:50
Absolutely. Absolutely. 
Jon: 00:35:52
Are there any particular projects at NVIDIA that you’re really passionate about, or involved in that you’d like to dig into some detail on? 
Anima: 00:36:00
Yeah. I’m very passionate about Isaac and Omniverse. Because want Omniverse does is bring together the two important strengths of the company together in a very synergistic way. All the graphics processing, rendering, and AI tools, how do we bring the infrastructure? And it starts with first having all the assets. We are creating an AI factory which mimics what the BMW manufacturing plants look like. And for that, everything from cars, to humans who are assembling it, and other machinery. How do you bring all this to the simulation environment? 
Jon: 00:36:42
A digital twin. 
Anima: 00:36:45
Absolutely. A digital twin. Creating that is an involved process, and that's where having great ray-tracing technology, which renders highly photorealistic images, lets you achieve that visual quality. But more importantly, there's also the physics of it. If you're now, say, holding onto some object, it should be physically valid, because then we can transfer it to a real robotic arm by first doing it in simulation. There is the graphics side of it, and there is the physics simulation. And doing all this first in simulation and asking, since there's a domain shift between simulation and reality, how do you close that gap? What are robust methods that can handle it? One of the projects we've been doing is with a four-legged robot called Laikago, where we know we have very realistic simulations that are physically valid. And we teach it all kinds of skills, and that's the beauty of simulation. 
Anima: 00:37:49
You can give it all kinds of curricula. You can let it first walk on a slow treadmill, maybe where only one side is moving. You can try to get it to turn, and you can ultimately get it to skateboard in simulation, which we also did. And then you can bring it to the real robot, and we do hierarchical reinforcement learning, which means you don't learn from scratch in simulation. If you learned everything from scratch, the way it's done in game play, for robotics this would just not transfer; it's too hard to do everything from scratch. The idea is to bring in all the control and robotics background, and ask how we add learning on top of it, and have the right mix. With that we can do efficient transfer to the real robot. 
Anima: 00:38:37
Same with the robotic arm: how do we get something that humans do so easily? We pick up all kinds of objects, and we can do all kinds of crazy juggling tricks. Can we get a robot to juggle? The idea is that it's much more feasible in simulation first, because you can have a lot more trials, and that's the beauty of it. We're very excited about the possibility of sim-to-real: using the powerful graphics rendering and physics-based simulations where NVIDIA has deep expertise, and combining that with AI methods. 
Jon: 00:39:19
Nice. It’s kind of an overarching theme that I’m starting to get from this. I mean you said this, but guess I’m bringing it back up because it’s becoming increasingly clear. That one of the most beautiful things about what you get to do at your job at NVIDIA is that NVIDIA has all of these hardware assets available to them. GPUs, I didn’t know about the robotics aspects of it. And then you get to figure out how can we efficiently create software tools? And that goes down to the lowest levels. Tools like TensorLy and Minkowski that allow you to efficiently perform not just matrix multiplications, but other more advanced tensor operations at this very small level. 
Jon: 00:40:12
And then, how can we build up those very small linear algebra operations at scale across GPUs? Use a whole bunch of GPUs to perform these linear algebra computations in parallel, efficiently learn, and then take those model weights that maybe we trained up in a simulated environment and put them into an actual physical real-world device, whether it's the BMW factory or… I didn't catch the name of the four-legged robot. It's something like- 
Anima: 00:40:51
Laikago. 
Jon: 00:40:57
… Laikago? 
Anima: 00:40:57
Yeah. Absolutely, you’re right Jon. The idea of NVIDIA as a company, there’s always the full stack view, and the barriers to collaboration are non-existent. That’s where I can think about methods that go to the core of hardware efficiency. For instance, better quantization methods to having frameworks like TensorLY, and Minkowski engine that can do efficient tensor operations, but also have keen Python fronted for developers to use. And then on top of it we have all these amazing platforms like Omniverse, and Clara, with very application focused infrastructure. And how do we contribute that impact all the way through? And that’s where I think NVIDIA is being part of a big family. 
Jon: 00:41:49
That [crosstalk 00:41:49] sounds great. 
Anima: 00:41:51
Innovation is so seamless. 
Jon: 00:41:53
Wow! For people who do machine learning at NVIDIA like you, what kinds of tools do you use? You mentioned PyTorch already. I imagine to get really accelerated… I mean, obviously things like CUDA, the library for programming GPUs, and cuDNN, which allows CUDA to perform deep neural network operations. Obviously those are some of the kinds of tools that NVIDIA is closely involved with, but how about you? What do you use day to day? 
Anima: 00:42:23
My team is working with a variety of different tools. PyTorch is absolutely the one we tend to use a lot, because of the ease of programming. But also tools like Weights & Biases; TensorBoard is the common one, for instance, but with Weights & Biases you can track your training very well and keep track of experiments. Those kinds of tools are a great value addition. Our team values that quite a bit. 
Jon: 00:42:52
It’s an MLOps tool? 
Anima: 00:42:54
Yes. 
Jon: 00:42:55
A Machine Learning Operations tool, Weights & Biases. They're cool, and they do a lot of education as well, which is neat. 
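For anyone curious, tracking a run with Weights & Biases is only a few lines. A minimal sketch, assuming you have logged in with an API key; the project name and loss curve here are placeholders:

```python
import wandb

# Start a tracked run; "my-project" is a placeholder project name.
wandb.init(project="my-project", config={"lr": 1e-3, "batch_size": 64})

for step in range(100):
    loss = 1.0 / (step + 1)          # stand-in for a real training loss
    wandb.log({"loss": loss})        # each call adds a point to the dashboard

wandb.finish()
```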
Anima: 00:43:03
Yes. Indeed, and PyTorch Lightning is another one I'm very excited about. Because of the modularity on top of PyTorch, you can now make the code reusable, you can put different modules together. It's very clean code writing, so I think it brings better practices into how we organize our software code. Those are the ones I would suggest. And the other important one is AMP, or Automatic Mixed Precision. That's- 
Jon: 00:43:36
I haven’t heard of that. 
Anima: 00:43:37
… natively available with PyTorch and TensorFlow, and the idea is that… This was developed by NVIDIA to maximize GPU efficiency, so it automatically figures out what level of precision you want in different operations. And that's where we are heading, where [crosstalk 00:43:56] don't need full precision. You can get away with really low precision. And tools like that are really great at improving efficiency. 
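Here is a minimal sketch of AMP's native PyTorch interface (toy model and data, and it needs a CUDA GPU): the pattern is an autocast context for the forward pass plus a gradient scaler for the backward pass.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = GradScaler()       # rescales the loss so fp16 gradients don't underflow

x = torch.randn(64, 512).cuda()            # toy batch
y = torch.randint(0, 10, (64,)).cuda()

optimizer.zero_grad()
with autocast():            # ops run in fp16 where safe, fp32 where needed
    loss = torch.nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()              # scaled backward pass
scaler.step(optimizer)
scaler.update()
```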
Jon: 00:44:07
Cool. That’s really useful. I hadn’t heard of a specific tool for doing that. We’ve hacked together ways of reducing precision within Tensor Flow or within PyTorch, but it’s cool that there is a library for optimizing that. In case the listener isn’t aware this is hugely useful, because when you’re using hundreds of GPU’s in parallel for doing your computation on this massive dataset, the way that you’re representing your numbers, the precision that you have on those, the number of places after the decimal that you’re handling, it turns out that you don’t need very many. 
Jon: 00:44:44
And by getting rid of many of the numbers after the decimal, you can save a lot of space. So your compute is a lot cheaper, things progress a lot more quickly, training progresses more quickly. Inference in production runs more quickly, so maybe an application that you thought you couldn't run in production, now, “Oh, we just found we can get rid of some of this precision, and we can run it in production at the cost that we were looking for.” So hugely valuable, especially when you're working at a massive scale like you are. 
Anima: 00:45:16
Indeed, and it turns out we can push this even further. One of the papers we had at NeurIPS last year, where NeurIPS is the main machine learning conference- 
Jon: 00:45:27
The premiere machine learning conference, no question. 
Anima: 00:45:29
Indeed. There we figured out that in most of these, if you're looking at the floating point representation, there's a part for the exponent and there's a part for the mantissa. Essentially, the exponent gives you the wide dynamic range, and then within that, the precision lies in the mantissa. And we figured out you can just throw away the mantissa, so it can just be [crosstalk 00:45:57]. 
Jon: 00:45:57
What! Wow! That’s crazy. 
Anima: 00:46:01
Yeah. And the inspiration comes from neuroscience: Markus Meister, who's a neuroscientist at Caltech, with whom we collaborated. And there's a lot of evidence that in the brain that's how information is stored. That's what leads to low power requirements, and still the ability to represent signals with a large dynamic range. And we're able to now train networks with as little as eight bits in the logarithmic number system that are as good as full-precision frameworks, say with the [inaudible 00:46:38] all of standard models, which is very exciting. If you can get down to a very low bit requirement, you can then run them on Edge devices. You can really decrease the cost of training and inference, and make this widely available. And that's where you see there are existing tools, but we are also looking futuristically at how much precision you can get rid of and still have useful models. 
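As a toy sketch of the logarithmic-number-system idea (not the paper's actual training scheme): keep a sign and a rounded integer exponent and throw the mantissa away entirely, so every value becomes a signed power of two, preserving a wide dynamic range at very low bit cost.

```python
import torch

def log_quantize(x, bits=8):
    # Keep only a sign and a clipped integer base-2 exponent; the mantissa
    # is discarded, so every value becomes a signed power of two.
    sign = torch.sign(x)
    exp = torch.round(torch.log2(x.abs().clamp(min=1e-38)))
    lo = exp.max().item() - 2 ** (bits - 1)   # bits-wide exponent window
    exp = exp.clamp(min=lo)
    return sign * torch.pow(2.0, exp)

w = torch.randn(1000)
print((w - log_quantize(w)).abs().mean())  # coarse, but spans a huge range
```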
Jon: 00:47:10
That’s super cool. I didn’t know we were that far along. And I’m excited to hear that, I’m also excited to hear the neuroscience link. Something you probably don’t know about me yet Anima, is that I have a PhD in neuroscience. In my book Deep Learning Illustrated, whenever I have the opportunity, I love drawing parallels between the neuro scientific inspiration of artificial neural networks, including deep learning neural networks. And I’m glad to be able to add another one to my arsenal here. And it also gives me the perfect segue to talking about Caltech interdisciplinary science in general. It sounds like by being able to work with neuroscientists… I mean you just gave an example so it doesn’t just sound like. It’s obvious that through these kinds of interdisciplinary approach that you get through working at Caltech, you get exposed to ideas that lead to truly transformative applications like you just described in the precision space. Do you want to tell us a bit about what Caltech interdisciplinary science is all about? 
Anima: 00:48:13
That’s great Jon, you wear so many hats. And I do believe that neuroscience is very much crucial for getting AI to the next level, going from narrow AI to broadly generalizable AI. We humans have amazing abilities in our reasoning skills. In our abilities to compose entirely new aspects say in music or an art that was never seen before. We are able to learn as babies in an unsupervised way. Just look at the world around us get such a robust representation of the world. How do we manage to do this? In another project with [inaudible 00:48:57] who’s a neuroscientist at Caltech, what we’ve been exploring is the feedback mechanisms. In our brain we are looking at… If you are looking at me, you’re processing through the retina, and it goes into the visual cortex in a feed-forward manner. 
Anima: 00:49:14
But we also have feedback between eye and cortex and other parts of the brain, based on an internal model. And the idea is that it carries some kind of generative information. That's how we dream; we have this innate ability, because we can visualize without even opening our eyes. And that ability is currently missing in AI models. What if we build this in? What if we can provide the ability to not just do a feed-forward prediction, like we do on the ImageNet dataset to recognize what's in an image, but also have feedback? Have a model that's learned in a generative way, and let it kind of ponder more. The idea with feedback is you're not just making a one-shot decision. You are instead kind of thinking, of course without realizing it, “Is this really part of the internal model? And if not, can I filter it?” 
Anima: 00:50:16
And that gives us inherent robustness in our visual perception. And we see the same happen with artificial neural networks: that feedback is able to handle different kinds of noise and corruptions that were never seen during training. You can have this ability to be much more robust to unseen corruption, to have zero-shot generalization. We are now working on few-shot generalization benchmarks. And that's a very promising aspect, I think, of taking inspiration from neuroscience to build much more robust mechanisms for artificial neural networks. 
Jon: 00:50:58
Nice. I’m going to try to in my own words repeat back some of what you said to distill it for the audience. Another big idea here is that we can use… Our understanding that in the brain we’re not just in real time taking in the visual information, but exactly like you said, through inferotemporal cortex, prefrontal cortex, we can loop and we can bring up visual imagery. And that if we can simulate that better in our artificial neural networks, in say our deep learning neural networks, we might be able to reduce the number of examples that we need to train a model. 
Jon: 00:51:40
You talked about few-shot learning or zero-shot learning. And it sounds like we're at a stage now where we're addressing that few-shot idea, where based on just a few training examples, this feedback loop kind of allows the neural network to simulate more real-world circumstances, or be more robust to circumstances it hasn't seen before. Today, if we think about a standard feedforward neural network like the AlexNet architecture that we talked about earlier, it can't do anything like that if it hasn't seen examples exactly like… Well, not exactly like this; you can have some generalization, but it's very limited. Over the last 10 years, deep learning networks have generally generalized poorly to out-of-sample test images. 
Jon: 00:52:34
The idea here is that by looping, like in a biological brain, we could learn from fewer examples and maybe even have zero-shot learning, where, like even an infant human, a network can deduce things about the real world and kind of know what output is going to result from a given input, without having had any training examples? 
Anima: 00:53:04
Mm-hmm (affirmative). Absolutely. I think that's at the core of [fatuity 00:53:10] with the current neural networks. If we can make them much more robust, like humans, and have them learn in a self-supervised way, which is another important development. Instead of having explicitly labeled data, can you create supervision on your own? Can you use the different kinds of data transformations that we know are invariant to what's in the image? Can you use that information? And there's also evidence that babies, when they're trying to understand the world around them, moving their heads, are trying to reason about occlusions, the foreground-background separation. These kinds of self-supervised learning could also be biologically motivated. The generalization aspect that we're talking about is so important: for one, of course, to make AI products, but also for the sciences. 
Anima: 00:54:03
At Caltech, we have the AI for Science initiative that aims to bring AI into all scientific applications. But the primary challenge in so many of those is that we require zero-shot generalization. For instance, if we are looking at discovering drugs, we may want to make a molecule that's probably never been made before. How do you reason about properties of molecules that are not well represented in the training data? And this is where we also need domain knowledge. It's not just data driven; there's a lot of knowledge that's available. For instance, with a molecule, we know the Schrödinger equation, the famous equation that tells us about the energy of the molecule and all the properties: how soluble it is, how toxic it is. And these are aspects that we need to bring in. In one of the projects at Caltech we've been working with Tom Miller, who's a professor of chemistry, bringing in that domain knowledge and asking, “What are some features, like molecular orbitals, that we can compute cheaply?” 
Anima: 00:55:14
Think of that as a domain-specific feature. Combine that with graph neural networks, and you're getting a hybrid model that combines the best of domain knowledge and the flexibility of neural networks. And that's able to do the zero-shot generalization. You only need to train on small molecules, say up to size 30, where getting training data is cheap enough, and then directly generalize to much larger molecules, 10 times the size or more. And that ability means we can now replace traditional methods with deep learning based methods that are thousands of times faster. You get this amazing speed-up and zero-shot generalization when we collaborate with domain experts. That's one great example. Another one is in the domain of partial differential equations, which are at the core of so many scientific applications. Navier-Stokes is a famous one, for modeling fluid turbulence. 
Anima: 00:56:15
That it’s highly multi-scale, very expensive. That’s what if you’re doing for climate modeling, you would need supercomputers of the world to be able to simulate that because of the complexity of those calculations. And what we’ve seen is the ability of neural networks, what we call neural operator that is based on Fourier transform which is again a fundamental operation. These Fourier neural operators are able to replace traditional solvers, and get thousands of times speed up, and still [inaudible 00:56:50] the fidelity. And the cool thing is you can just use low resolution training data, and directly generalize to higher resolution evaluations. This form of Zero-Shot generalization or Few-Shot generalization becomes so critical in scientific applications. What we are building say with neuroscience, and getting inspiration hopefully has a broad impact in so many of these areas. 
Jon: 00:57:18
That’s really cool. This generalization is obviously an important theme to you. Being able to take what might be considered narrow applications of a machine learning approach, and seeing how this generalizes to other domains. And I guess as you’ve mentioned artificial general intelligence, this is a theoretical algorithm that could learn in the same way as a person. The same diversity of things that an individual person could, and I guess we’re making steps in that direction. That you are specifically making steps in that direction by coming up with these kinds of ideas, Few-Shot learning, Zero-Shot learning. And by being able to apply them broadly across domains, this is increasingly generalized artificial intelligence. 
Anima: 00:58:09
Absolutely. And the idea is to build strong foundations for that, and to be able to carefully test what the generalization capabilities are. Scientific applications give a great way to test that, because we have ways to verify whether this is a drug-like molecule or not, for instance, or whether this fluid simulation has the physical properties we expect it to have. There are other metrics we can evaluate with, which is great. But also, going back to how we visually process information: can we have good benchmarks where humans do very well, but which AI methods find challenging? So a method can't just over-fit to some examples and cheat its way to doing well. 
Anima: 00:58:57
We’ve created a benchmark called Bongard-LOGO, that’s in fact inspired by a classical Bongard challenge that was created to look at human cognitive reasoning abilities. But now we’ve made it ready for the deep learning age, and asking how well do these Few-Shot learning methods work on visual reasoning, and concept learning challenges? Something that’s very natural for us humans, how well can machines do? And again, having challenges like that also means that researchers can develop new algorithms and test them carefully. Because if we can’t test them, then you don’t really know their true abilities. 
Jon: 00:59:42
That Bongard, I’m going to… How do you spell that? 
Anima: 00:59:46
B-O-N-G-A-R-D. 
Jon: 00:59:48
I almost guessed that. I almost said that; you heard me kind of say, "I'm going to guess." And that's what I was going to guess, I promise. I wouldn't lie to you.
Anima: 00:59:55
I believe you. 
Jon: 00:59:55
That's exactly how I was going to spell it, but I didn't, so I can't prove it. I'm glad that you believe me, Anima. And if you're ready, let me move on from these amazing ideas, tools like Bongard that allow us to evaluate the increasing generalization of these AI approaches. Obviously the work that you're doing is extremely fascinating, cutting-edge research. It sounds to me, from the discussion we've been having so far, that there's some programming ability that people need to be doing this work. We talked about tools like PyTorch and MLOps tools like Weights & Biases. And I assume that kind of thing comes in handy when you're working at the kind of scale you're working at; with distributed learning over many GPUs, that kind of MLOps is hugely useful to ensure that we're doing it efficiently.
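For anyone curious what that tooling looks like in practice, below is a minimal sketch of multi-GPU data-parallel training in PyTorch with Weights & Biases experiment tracking. The toy model, random data, and project name are placeholders; this is a generic pattern, not NVIDIA's internal setup.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    import wandb

    # Launch with, e.g.: torchrun --nproc_per_node=4 train.py
    def main():
        dist.init_process_group("nccl")               # one process per GPU
        rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(rank)

        model = DDP(torch.nn.Linear(128, 10).cuda(), device_ids=[rank])
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)

        if dist.get_rank() == 0:                      # log from one process only
            wandb.init(project="demo-distributed-training")

        for step in range(100):
            x = torch.randn(32, 128, device=rank)    # stand-in for a real batch
            y = torch.randint(0, 10, (32,), device=rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()                           # DDP averages gradients across GPUs
            opt.step()
            if dist.get_rank() == 0:
                wandb.log({"loss": loss.item(), "step": step})

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()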
Jon: 01:00:56
And I dare say it sounds like knowing the fundamental mathematics well matters too. So linear algebra, and you talked about partial derivatives and calculus, and that's music to my ears. We already know about Spectral Learning on Matrices and Tensors, your book for learning about linear algebra and tensor methods. I've been working on this book, Mathematical Foundations of Machine Learning, that focuses on linear algebra, calculus, and probability theory, because I've seen this same gap. I totally think that the most fun and interesting way to get into machine learning initially is by learning how to call things with high-level APIs: learning how to make PyTorch layers, seeing the amazing power of the way a neural network can approximate a mapping from inputs to outputs.
Jon: 01:01:46
But then you start to run into barriers. There are limitations on what you can do by calling these kinds of high-level functions, and that's why I created this curriculum. It's a Udemy course that I have in collaboration with SuperDataScience, so people can check that out. But it sounds like those are all things that people need to be doing research like you do at NVIDIA or Caltech. What other critical skills do people need to work with you? What do you look for in people that you hire?
Anima: 01:02:17
Yeah. Those are all great resources, Jon. And thank you for doing the very important work of making linear algebra and these mathematical foundations accessible to the broader public. I think that combination is so critical: having that strong mathematical foundation along with the deep learning tools. And you want a mix, somebody who is not worried about getting their hands dirty, who is hacking their way a little bit and using their intuition, but not relying solely on that. You should also try to make sure the experiments are reproducible, and try to reason about what principle is emerging here. Can we experiment scientifically, come up with hypotheses, and test them carefully? That, I think, is the important aspect of having a strong theoretical foundation. I teach a Foundations of Machine Learning course every winter at Caltech, and the videos are available online as well.
Jon: 01:03:18
Nice. Beautiful. 
Anima: 01:03:20
That's the kind of foundation I emphasize. And many times we start with just classical machine learning. For instance, PCA, which we talked about. And the idea is: is it applicable to all real-world applications? Certainly not, but understanding why it works and where it doesn't work is critical to building that foundation. So take the time to work on that, and at the same time don't be afraid to run experiments at scale, and be ready to do some engineering work as well, because a lot of what we are doing is cutting edge. If we are, for instance, looking at these simulation tools, we may want to build new simulations, or we may want to build new Python bindings for them. There are also those aspects; some of them can be a bit mundane, but you need them to bring the overall idea to fruition.
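As a small aside on the PCA point, the sketch below shows both why it works and where it breaks. PCA finds the directions of maximum variance via a singular value decomposition, which summarizes linearly structured data almost perfectly but says little about data lying on, say, a circle. The data here is synthetic, purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Linear structure: points near a line; one component captures nearly
    # all the variance, so PCA "works".
    t = rng.normal(size=(500, 1))
    linear = np.hstack([t, 3 * t]) + 0.1 * rng.normal(size=(500, 2))

    # Nonlinear structure: points on a circle; variance splits evenly, so
    # no single linear direction summarizes the data and PCA "fails".
    theta = rng.uniform(0, 2 * np.pi, 500)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])

    for name, X in [("line", linear), ("circle", circle)]:
        Xc = X - X.mean(axis=0)                   # center the data
        s = np.linalg.svd(Xc, compute_uv=False)   # singular values
        var = s**2 / (s**2).sum()                 # fraction of variance per component
        print(name, np.round(var, 3))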
Anima: 01:04:16
Somebody who is a quick learner, who is willing to take risks but understands what the risk means. Because research is always about doing new things, and that is risky. And having that balanced approach: being really motivated and thinking positively about taking these risks into the unknown, while at the same time enjoying the journey and asking, "Let me take a step back sometimes and think about what these experiments are telling me. Is there a theoretical reasoning behind that? Even if I can work it out only for the linear case, is this something that is meaningful? Or maybe it's not, and there's a reason behind that." That aspect of back and forth…
Anima: 01:05:06
The analogy I give is of chemists or biologists who go into labs and do experiments; that's how we want to think of ourselves as AI researchers. You're now going and peeking into neural networks, and you're trying to tease out answers by designing good benchmarks and good experiments. And they should be reproducible, with good software practices, but ultimately it's also the scientific understanding you glean out of that. The idea is, as with the elephant and the blind men, if they're each only looking at small parts of the puzzle, you won't get the big picture. You have to have the ability to see the big picture, but enough focus to also get some of the details right.
Jon: 01:05:51
Do you want to elaborate on that analogy, the elephant and the blind men? Do you want to flesh it out? I don't know if everyone knows it. It's a good one.
Anima: 01:06:00
Okay, sure. The elephant and the blind men: the idea is that if the blind men are each just figuring out parts of the elephant, they don't understand what this overall being is. If you're too focused on some low-level details but you miss the big picture, if you don't have an understanding of other parts of the framework, then you can be in trouble. The ability to both zoom in and zoom out is so important.
Jon: 01:06:29
Yeah, exactly. It's something like one blind man is feeling the trunk of the elephant and deduces, I don't know, that it's like a snake or something. I can't remember how the story goes. Or you're feeling the tail and you think it's a fluffy animal. I don't know.
Anima: 01:06:46
Yeah. I don’t remember [crosstalk 01:06:47]. 
Jon: 01:06:47
It's too hard to do. I don't know. I can't memorize things like that. But perfect, the analogy is understood. That is beautiful insight, and I love all those topics. It shows just how fascinating the work that you're doing is. All right, I asked on LinkedIn a week before we recorded whether people had any questions they'd like to ask you. And there were tons of really great questions there; maybe you'll have a chance to answer some of them asynchronously, because we've already been talking for so long and I know that I've already made you run over into a meeting. I know that Anima is already going to have to go and apologize to somebody because I didn't tell her what time it was and how long we've been recording for.
Anima: 01:07:30
Because we are having such a great time here. 
Jon: 01:07:32
I'm being very selfish. I'll just take one question, which is from Noah Gift, who was actually a recent guest on the show, on episode 467. He has a lot of experience with Caltech, and he said, "Do students at Caltech still play ultimate Frisbee at lunch? This is how I originally learned Python, while playing ultimate Frisbee. Maybe they learn AI and ML this way too." Do you know anything about ultimate Frisbee being used to learn AI and ML concepts?
Anima: 01:08:05
Yeah, absolutely. Caltech has such a great mix of nerdiness and sports. Even before the pandemic, we had all these outdoor blackboards so people could write math equations, discuss a bit, grab a ball, play, and come back. We really encourage a great atmosphere for learning, and of course the California weather cooperates with us most of the time in that respect. But that's a great memory to think of.
Jon: 01:08:43
That's very cool. I didn't know anything about that, and I guess I'm going to have to visit Caltech at some point. And then the final question for you, which is one that we always ask, is for a book recommendation. Obviously we already have your Spectral Learning on Matrices and Tensors, which sounds like an awesome book for digging into linear algebra and tensor operations. Do you have any other recommendations for us?
Anima: 01:09:06
Yeah. I'm really excited about a new book that you can preview online called Learning Deep Learning, or LDL, by Magnus Ekman, who's also a fellow NVIDIAn. And I was very happy to write a foreword for it, because what I saw was just such a great way to get started with deep learning: having a firm understanding of the principles, like we discussed before, but also program code that's efficient and runnable, something you can do hands-on. It's such a great mix. It starts from the foundational principles of deep learning but goes up to the latest techniques, such as GANs and language models. It's very up to date while also making sure the fundamentals are present.
Jon: 01:09:53
That's a beautiful recommendation. And given that that book has the same acquisitions editor as I do, Debra Williamson Pearson, I know that she will be delighted to hear you make that recommendation. And I hear that you wrote the foreword to it, so I'm sure Magnus and Debra very much appreciate that as well. All right, Anima, this has been such an amazing episode; I learned so much. I wish I could keep you here for hours more and ruin your calendar for the rest of the day, but I will let other people get to you. Thank you so much for being on the program, and hopefully we'll get a chance to have you on again soon.
Anima: 01:10:25
Thanks a lot Jon. I really enjoyed it and I also got to learn so much. This was such a pleasure. 
Jon: 01:10:37
I almost never get nervous about anything in my life, but I must admit that I was nervous in the weeks leading up to filming this episode, because Anima is such a massive name in the machine learning world. As it typically turns out with almost everything we worry about, there was nothing to be anxious about at all. Anima was so personable, down-to-earth, and fun. I absolutely loved filming this episode with her, and I hope you enjoyed it as much as I did. Anima engrossed us with so many fascinating topics today, including the invaluable open-source projects she works on at NVIDIA, like TensorLy for tensor operations and Minkowski Engine for working with sparse tensors. Her favorite data science tools include PyTorch Lightning, AMP, and Weights & Biases. We talked about the importance of linear algebra, calculus, software engineering, experimentation, and being willing to take risks in order to be an AI researcher at institutions like NVIDIA and Caltech.
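For listeners who want to try the tensor side of this themselves, here is a minimal TensorLy example: a CP decomposition that factorizes a three-way tensor into rank-one components, the multi-dimensional analogue of a low-rank matrix factorization. The tensor here is random, just to show the calls.

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    # CP (CANDECOMP/PARAFAC) decomposition: approximate a 3-way tensor as a
    # sum of rank-one components, one factor matrix per mode.
    X = tl.tensor(np.random.rand(10, 8, 6))
    weights, factors = parafac(X, rank=3)             # three rank-one components
    print([f.shape for f in factors])                 # [(10, 3), (8, 3), (6, 3)]

    X_hat = tl.cp_to_tensor((weights, factors))       # reconstruct from factors
    err = tl.norm(X - X_hat) / tl.norm(X)
    print(f"relative reconstruction error: {err:.3f}")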
Jon: 01:11:36
And we talked about how cutting-edge GPUs, the clever application of linear algebra within software, and ideas from biological neuroscience are allowing machine learning to become more and more generalizable, enabling AI models to learn from far fewer training examples and to be much more robust to data they haven't encountered before. Amazing. This is definitely an episode I'll be listening to myself, because there's so much to learn; I can't imagine I took it all in during filming. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and URLs for Anima's LinkedIn profile, as well as my own social media profiles, at www.superdatascience.com/473. That's www.superdatascience.com/473.
Jon: 01:12:24
If you enjoyed this episode, I'd of course greatly appreciate it if you left a review on your favorite podcasting app or on YouTube, where we have a video version of this episode. To let me know your thoughts on the episode, please feel free to add me on LinkedIn or Twitter and then tag me in a post. Your feedback is invaluable for figuring out what topics we should cover next. All right. Thanks to Ivana, Jaime, Mario, and JP, and the whole SuperDataScience team, for managing and producing another amazing episode today. Keep on rocking it out there, folks, and I'm looking forward to enjoying another round of the SuperDataScience podcast with you very soon.