SDS 568: PaLM: Google’s Breakthrough Natural Language Model

Podcast Guest: Jon Krohn

April 21, 2022

Welcome back to the Five-Minute Friday episode of the SuperDataScience Podcast!

This week, Jon updates listeners on one of the industry’s biggest breakthroughs to date: Google’s new natural language processing model, PaLM.

 

On April 4th, Google announced a new large natural language model called PaLM — short for Pathways Language Model — that is a truly remarkable breakthrough.
On 28 of the 29 English-language natural language processing tasks on which it was evaluated, PaLM achieved state-of-the-art results, outperforming well-known large language models of recent years like GPT-3 by sizable margins in many cases. These tasks span a broad range of capabilities, from question-answering to sentence-completion and from common-sense reasoning to natural-language inference. PaLM even performed well on multilingual NLP tasks like translation between languages, despite only 22% of its training data being non-English text.
Not only that, but PaLM goes beyond natural language alone. It can also solve programming problems, like converting C code to Python, or taking a natural-language-only problem like you might see in an undergraduate computer science course and responding with working code as a solution. And all of this happens despite PaLM having 50 times less Python code in its training data than previous state-of-the-art programming-language models.
So, how does PaLM achieve these feats? First, PaLM is quite a large model, with 540 billion parameters, roughly three times as many as GPT-3’s 175 billion, yet only about a third the size of the largest model that Jon’s aware of today, China’s WuDao 2.0. The single biggest innovation within PaLM is that it leverages Google’s Pathways approach to training large language models, which allows concept-specific modules to be shared by myriad different computational pathways within the model.
What’s next for PaLM? Tune in to find out what the future has in store when it comes to natural language processing breakthroughs. 


Podcast Transcript

(00:05):
This is Five-Minute Friday on PaLM: Google’s Breakthrough Natural Language Model.

(00:19):
On April 4th, Google announced a new large natural language model called PaLM — short for Pathways Language Model — that is a truly remarkable breakthrough.
(00:30):
On 28 of the 29 English-language natural language processing tasks on which it was evaluated, PaLM achieved state-of-the-art results, outperforming well-known large language models of recent years like GPT-3 by sizable margins in many cases. These tasks span a broad range of capabilities, from question-answering to sentence-completion and from common-sense reasoning to natural-language inference. PaLM even performed well on multilingual NLP tasks like translation between languages, despite only 22% of its training data being non-English text.
(01:03):
Beyond these established natural-language tasks, Google also tested PaLM on a new benchmark suite called Beyond the Imitation Game Benchmark (or BIG-bench, for short). These BIG-bench benchmarks span 150 different natural language tasks that are particularly tricky as they require machines to, for example, distinguish cause and effect, understand context-specific language, and guess movie names based on emoji. Not only did PaLM outperform prior machine models on many of these tasks, but it also exceeded the capability of an average human on some of them — although it still fell well short of expert human capability.
(01:43):
As a staggering example of PaLM’s capacity, it can even explain brand-new jokes that it couldn’t possibly have learned from its internet-based training data. Here’s one such novel joke that PaLM was prompted to explain in natural language: “Did you see that Google just hired an eloquent whale for their TPU team? It showed them how to communicate between two different pods!”
(02:07):
Alright, so PaLM responded with an outstanding explanation of that joke; here it is: “TPUs are a type of computer chip that Google uses for deep learning. A ‘pod’ is a group of TPUs. A ‘pod’ is also a group of whales. The joke is that the whale is able to communicate between two groups of whales, but the speaker is pretending that the whale is able to communicate between two groups of TPUs.”
(02:29):
Wow! That’s crazy! That is really amazing that it’s able to understand the intention behind these jokes. Well, not to literally “understand” it, but to be able to explain it. Not only that, but PaLM goes beyond natural language alone. It can also solve programming problems, like converting C code to Python, or taking a natural-language-only problem like you might see in an undergraduate computer science course and responding with working code as a solution. All of this happens despite PaLM having 50 times less Python code in its training data than previous state-of-the-art programming-language models.
(03:08):
So, you may wonder, how does PaLM achieve these feats? First, PaLM is quite a large model, with 540 billion parameters, roughly three times as many as GPT-3’s 175 billion, yet only about a third the size of the largest model I’m aware of today, China’s WuDao 2.0. The single biggest innovation within PaLM is that it leverages Google’s Pathways approach to training large language models, which allows concept-specific modules to be shared by myriad different computational pathways within the model, and PaLM did this at a hitherto untested parameter scale. To repeat that, the key innovation within PaLM is scaling up this powerful Pathways modeling approach to half a trillion parameters, many-fold more parameters than had previously been trained using the Pathways approach. And that also explains how PaLM, again an abbreviation of Pathways Language Model, got its name.
(04:06):
What’s next? Well, according to Google’s results across models trained at several different scales, it appears that PaLM could benefit from growing to trillions of parameters or more. Google hasn’t disclosed what it will do next with this model, but given how large language models like PaLM reliably exhibit more and more emergent behaviors the larger they become, I’d say it’s a good bet that we’ll be hearing about mind-blowing new PaLM feats soon from a new variation that has over a trillion parameters. As for what you can do next, check out the link in the show notes to read more examples of the incredible linguistic-inference, cause-and-effect, and coding capabilities that PaLM has today.
(04:43):
All right, that’s it for this Five-Minute Friday episode. Keep on rockin’ it out there, folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 