Welcome back to the Five-Minute Friday episode of the SuperDataScience Podcast!
Continuing last week’s FMF theme, Jon updates listeners on another machine learning breakthrough: DALL-E 2, OpenAI’s text-to-image model.
Last week’s Five-Minute Friday episode featured Jon reviewing Google’s PaLM, and this week he’s back with OpenAI’s text-to-image model DALL-E 2.
When the first iteration of DALL-E was released last year, Jon could not stop talking about its outstanding results! Named as a portmanteau of the surrealist painter Salvador Dalí and Pixar’s robot WALL-E, that model used a smaller version of the well-known GPT-3 model.
While GPT-3 is trained purely on text, DALL-E is trained on pairs of images and the text that describes them. This makes DALL-E multimodal: it is both an NLP model and a machine-vision model. This multimodal functionality enables DALL-E to churn out staggering visual examples of whatever text your mind can dream up. Want an illustration of a baby shark in a tutu serving ice cream? Provide that as an input to DALL-E, and it returns countless examples of precisely that bizarre illustration. Want to see a teapot shaped like a Rubik’s cube? Again, just ask DALL-E, and voilà, you’ve got it! Want to see examples of cameras from every decade of the 20th century? Even temporal information like this is encoded in DALL-E, so no problem.
But all of that incredible text-to-image functionality was already available in DALL-E last year, so what’s improved in the brand-new DALL-E 2 model? Well, to start, it’s simply more proficient at the same kinds of tasks:
- It has 4x greater resolution, resulting in larger, more realistic-looking images.
- In comparisons judged by human evaluators, DALL-E 2 was also preferred over the original DALL-E 72% of the time for caption-matching and 89% of the time for photorealism.
But DALL-E 2 is not just better at the same kinds of tasks; it can also accept images as part of its input, which opens up entirely new kinds of tasks. For example, DALL-E 2 can receive any given image as input and create variations inspired by the original. They’ll be roughly consistent with the original in terms of both style and composition, but the variations will all be unique and never seen before.
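For listeners who want to experiment with DALL-E 2 programmatically, OpenAI exposes it through an Images API with separate endpoints for text-to-image generation and for the image-variations feature described above. The sketch below is a minimal illustration, not a definitive client: it assumes the documented JSON fields (`prompt`, `n`, `size`) and only assembles request payloads, so it runs without an API key or any network access.

```python
# Sketch of request payloads for OpenAI's Images API, which serves DALL-E 2.
# Field names (prompt, n, size) follow OpenAI's documented parameters; actually
# sending a request requires an API key and a network call, which this
# illustration deliberately avoids.

ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}  # output sizes DALL-E 2 supports


def build_generation_request(prompt: str, n: int = 4, size: str = "1024x1024") -> dict:
    """Assemble the JSON body for a text-to-image generation call."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    return {"prompt": prompt, "n": n, "size": size}


def build_variation_request(n: int = 4, size: str = "1024x1024") -> dict:
    """Assemble the form fields for an image-variations call; the source
    image itself is uploaded separately as multipart form data."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    return {"n": n, "size": size}


payload = build_generation_request("a teapot shaped like a Rubik's cube", n=2)
# POST this payload to https://api.openai.com/v1/images/generations with an
# Authorization: Bearer <API key> header to receive image URLs in response.
```

The same `size` check applies to both endpoints because DALL-E 2 only renders square images at those three resolutions.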
Tune in to this week’s episode to learn more about DALL-E 2.
DID YOU ENJOY THE PODCAST?
- What amazes you the most about DALL-E 2? What real-world applications can it already address for you?