Kirill Eremenko: This is episode number 271 with the legend of visual journalism, Alberto Cairo.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur, and each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
Kirill Eremenko: This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. We’ve got over two and a half thousand video tutorials, over 200 hours of content and 30 plus courses with new courses being added on average once per month. You can get access to all of this today just by becoming a SuperDataScience member. There are no strings attached. You just need to go to superdatascience.com and sign up there, cancel at any time. In addition, with your membership, you get access to any new courses that we release plus all the bonuses associated with them. Of course there are many additional features that are in place or are being put in place as we speak, such as a Slack channel for members where you can already today connect with other data scientists all over the world or in your location, and discuss different topics such as artificial intelligence, machine learning, data science, visualization and more, or just hang out in the pizza room and have random chats with fellow data scientists.
Kirill Eremenko: Also, another feature of the SuperDataScience platform is the office hours, where every week we invite valuable guests in the space of data science and interrogate them about their techniques, about their methodologies in the space of data science, and you actually get a presentation from the guest and you get an opportunity to ask questions at the end. In some of our office hours, we just present some of the most valuable techniques that our hosts think are going to be valuable to you. All of that and more you get as part of your membership at SuperDataScience, so don’t hold off, sign up today at www.superdatascience.com, secure your membership and take your data science skills to the next level.
Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen. Super, super pumped to have you back here on the show today because the guest for today, I’ve been hunting this man down for months. We’ve been inviting Alberto or trying to get a spot in Alberto’s super busy schedule for months now, and finally it’s happened. I just got off the phone with nobody else, but Alberto Cairo, and we had an amazing, amazing chat about data visualization. If you’re not familiar with who Alberto is, Alberto is a journalist, he’s a speaker, an author. He’s also the Knight chair in visual journalism at University of Miami. The knight chair means that he’s endowed by the Knight Foundation, which recognizes and puts certain journalists into leading positions as tenure professors in academia. There’s only a handful of Knight chairs in the US, maybe a couple of dozen, and Alberto Cairo is one of them.
Kirill Eremenko: All of these credentials should speak for themselves as to what kind of calibre of a journalist and data visualization expert Alberto is. He’s presented at numerous conferences and he’s actually published two books already. You might be actually familiar with them. The first one is called The Functional Art: An Introduction to Information Graphics and Visualization, which came out in 2012. The second one is The Truthful Art: Data, Charts, and Maps for Communication, which came out in 2016. What’s exciting is that Alberto’s third book is coming out, it’s called How Charts Lie: Getting Smarter about Visual Information. It’s coming out in October this year, October 2019, and you can actually already pick it up on pre-order. We talked about Alberto’s book and you get some very useful insights from this book for your visualization practices, and also for understanding visualizations better.
Kirill Eremenko: Plus, we talked about plenty of other things on this podcast. Here’s a couple of teasers of what you’re about to experience. Why do people misinterpret visualizations? The Simpson’s paradox, the ecological fallacy, four kinds of literacy, being conscious about visualizations, exploratory data analysis versus communicating results, how to design effective visualizations, and ethics in data visualization. Those are just a few topics that we touched on. As you can imagine, it’s going to be a value packed podcast. Without further ado, I bring you Alberto Cairo, the legend of data visualization.
Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen. Today I’m super excited because I’ve got a legendary guest on the show, Alberto Cairo calling in from Miami. Alberto, how are you going today?
Alberto Cairo: Hey, doing good. How are you?
Kirill Eremenko: I’m doing very well, and super pumped to be talking to you. I watched your presentation at Microsoft yesterday as we were chatting just before the podcast, and my God, you have some very interesting approaches to visualization. I’m very excited to dig into these today.
Alberto Cairo: Likewise. Thanks for having me.
Kirill Eremenko: Yeah. No, pleasure’s mine. How is Miami this time of the year? I saw on your Twitter feed that you’re spending … you’re finally taking some time away from all the presentations and conferences, and I guess spend some time with family. Are you looking forward to that? How’s that going to be?
Alberto Cairo: Oh yeah, I’m so super looking forward to that. One thing that I usually joke about Miami is that I am originally from Spain, from Northwestern Spain, a region called Galicia, and Galicia is very rainy and dark and windy and cold, and Miami can be rainy sometimes particularly during the summer because clouds build up during the day and you get a downpour at the end of the day, but most of the time is warm and sunny. I got used to this weather very quickly and I love it here, and I’m looking forward to those three months of staying at home, no trouble. But I will have tons of work. I mean I’m not planning to basically rest, so will be working on tons of stuff. It’s only that I can do it in my backyard, next to the swing, which is the luxury that I have.
Kirill Eremenko: Yeah, no, that’s very exciting. But it doesn’t get too hot in … I’ve only been in Miami briefly and then I went to Florida Keys. I was wondering, it doesn’t get too hot? Because in Spain, for instance, in summer, last year, I think it was like 37 degrees Celsius or something like that.
Alberto Cairo: Oh yeah. If you go to the south of Spain, you can get to 40 degrees Celsius or even more, 40, 42. Miami doesn’t get that warm. However, what happens is that you have crazy humidity. You need to hydrate all day basically. But if you do that, you’re fine. I mean, if you always carry water with you, which is advisable, then you’re fine. But you need to like this kind of weather. I mean, if you are a cold weather person, you will suffer mightily, mightily here. But I’m a warm weather person, so I really enjoy Miami.
Kirill Eremenko: Yeah, yeah. I understand. Indeed, it’s really humid. As soon as you get out the plane, you start sweating like crazy.
Alberto Cairo: Yeah. Exactly, yeah.
Kirill Eremenko: Which part of Miami?
Alberto Cairo: I live in a neighborhood called Kendall, which is in Southwestern Miami. I am not close to the coast, to Miami Beach. I’m closer to the Everglades, which is the large natural park, the swamp. It’s over here. I usually joke that I’m closer to the alligators than I am to the dolphins.
Kirill Eremenko: Or the sharks.
Alberto Cairo: Or the sharks, yes.
Kirill Eremenko: Okay, got you. Okay. Well, very cool. Very excited for your time in Miami for the next few months to have a rest. A well deserved rest because as we were chatting before the podcast, you’ve got your third book coming out in October. Once that happens, you’re going to be on the move going to conferences pretty much every day. As you said, you can see it as a problem or as a huge opportunity.
Alberto Cairo: Yeah. It’s a problem-
Kirill Eremenko: How are you feeling?
Alberto Cairo: Yeah, it’s a problem or an opportunity. Yeah, the book that comes out in October, it’s actually my first book for the general public. The title is How Charts Lie, although perhaps a more appropriate title would be how we lie to ourselves with charts. The way that it is written, it’s very informal, very nontechnical. It’s an introduction to how to become a better reader of charts. Not a better designer, but a better reader because it’s for the general public, it’s not for designers. It’s how to correctly interpret all the line charts and bar graphs and data maps that we see every day in social media and the news media, how to extract the right meaning from them. I don’t know. Perhaps it will … I don’t know. It will sell well, it will attract lots of attention. Who knows? Yeah. I already have several speaking engagements lined up for the fall in relationship to it, just to help with the promotional efforts.
Kirill Eremenko: No, that’s very exciting, and totally agree that a book for the general public, well, especially from somebody of your level in the space of visualization, it’s necessary because there’s people who want to hear from you, but maybe they’re not technical, they don’t have the technical background to understand certain concepts or to keep up with certain concepts. A book for the general public I think is a great idea. What are some of the main things that you cover off in this book? What are some of the main themes?
Alberto Cairo: Yeah. What I did in the new book was to basically ask myself, if I had not learned anything myself, about data visualization by studying or practicing it, what are the most elementary skills or pieces of knowledge that I need to have in order to be a critical, not designer, but a critical reader of these kinds of products in news media? Right? Obviously, I cover things such as the main principles of data visualization that you can read about in any more technical books, like the ones that I wrote in the past, such as the Truthful Art, for example, right?
Alberto Cairo: Principles such as visual encoding, what is visual encoding? Right? Visual encoding basically is getting your data and then mapping your data onto objects, and then changing some properties of those objects in proportion to the data that you’re trying to represent. It could be the length of the object or the height of the object or the color of the object and so on and so forth. Those properties we call them encodings. Right?
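A minimal sketch of the encoding idea Alberto describes here, assuming made-up numbers and a text-only "chart": each value is mapped onto one property of an object, in this case the length of a bar drawn with `#` characters. The function name `encode_as_lengths` and the `sales` figures are hypothetical and are not taken from the book.

```python
def encode_as_lengths(values, max_length=20):
    """Map each value to a bar length proportional to that value.

    Length is the encoding: the largest value gets `max_length`
    characters and every other value is scaled relative to it.
    """
    largest = max(values.values())
    return {name: round(v / largest * max_length) for name, v in values.items()}


# Hypothetical data, purely for illustration.
sales = {"North": 10, "South": 20, "West": 40}
lengths = encode_as_lengths(sales)

for name, length in lengths.items():
    # Draw one bar per data point; the bar's length carries the value.
    print(f"{name:<6}{'#' * length} {sales[name]}")
```

Swapping length for another property (height, color intensity, position) would be a different encoding of the same data.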
Kirill Eremenko: Mm-hmm (affirmative)
Alberto Cairo: In the past I taught these skills to people who wanted to work in data visualization. What I do in the new book is to try to explain these very elementary principles to people who are not going to be graphic designers or visualization designers or data scientists, but who are going to be consumers of those kinds of products. So they need to be prepared to read them correctly, and in order to read them correctly, you need to understand data visualization at the symbolical level, so understanding the principle of mapping data onto objects at the grammatical level, meaning that you need to learn about encodings. In the third level, which is the core of the book actually, it’s the semantics level.
Alberto Cairo: Once you are able to understand the mechanics of a graphic, how to read it, right? Then you need to be able to interpret it, right? It’s at the semantics level. What is the information that that graphic is carrying, how to extract the right insights, or the right inferences from the chart that you are seeing. I think that these skills are of greater value for anybody. Right? The problem is that the literature about data visualization, and this includes my own previous books, they are aimed too much at people who want to specialize in the field. We don’t really share some knowledge, right? We have basically the same, similar levels of knowledge. Right? There are challenges that … Basically what is happening is that there is an increase in the sophistication that visualization designers have, but there is not the same increase in sophistication in the readers who consume these types of data visualizations, right?
Alberto Cairo: There’s a growing gap between, let’s say, the communities made of visualization designers, data scientists, statisticians, et cetera. We are developing new methods every day, we are making all these fields advance very quickly and improve very quickly and create new tools and so on and so forth, but the general public is falling behind, right? My interest in the past few years has been, how can we help the general public bring themselves up to speed with all these new techniques? Obviously, I cannot write about data science. I’m not a statistician, I’m not a data scientist, but I’m a visualization designer. I asked myself, what can I do to help my dad, for example, who’s a medical doctor, not trained in statistics, not trained in data visualization, what can I do to help my dad bring himself up to speed with data visualization?
Alberto Cairo: I wrote the book that way. If I had to explain to a nontechnical person what data visualization is about, why it is so important, why it can be so powerful, but at the same time how dangerous it can be as well, if you don’t use it correctly. How would I write that book? That’s the frame of mind that I put myself into to write this new book.
Kirill Eremenko: I totally understand. I like how you say in your talks that good data visualizations have two really powerful qualities, that they’re persuasive, and they’re memorable, right? If you see a good visualization, not only do you understand, hopefully properly, if it’s communicated properly, what the underlying insights are, but also you’re able to memorize it because it’s an image and you can see it in your head and you can maybe describe those insights later to somebody else. I think perhaps those are the two reasons why more and more publications, such as the Wall Street Journal, New York Times and so on, they’re moving to visualization.
Kirill Eremenko: The amount of info graphics and visual representations of information, whether it’s about elections or about population statistics or about crime rates and things like that. The amount of info graphics out there is crazy, and now they’re getting interactive and they’re getting more and more exciting and interesting on these publications. That’s very interesting.
Alberto Cairo: There is a reason. There is a reason for this increase, which is that if you ask people who work in data journalism departments or graphics departments in news publications such as the ones that you mentioned, Wall Street Journal or 538 or the New York Times or ProPublica or many others, the Financial Times, all of these publications are considered the gold standard in using data visualization in the news. They will all tell you the same thing, which is that if our data visualization is well designed, and it covers a topic that the public is interested in obviously, it will become extremely, extremely, extremely popular. I mean, some of the most popular pieces of content published in the past decade by some of these media publications have been data visualization.
Alberto Cairo: The most popular, and this is a factoid that I usually talk about in some of the talks in relationship to the new book, How Charts Lie, one of the things I say is that the most popular piece of content ever published by the newyorktimes.com, the New York Times is the most important, most serious newspaper in the United States, and one of the most important newspapers in the world, the most popular piece of content ever published by the New York Times online is a data visualization.
Kirill Eremenko: Oh wow.
Alberto Cairo: It’s a data visualization that is commonly called … Yeah, it’s commonly called the dialect map. You can Google it up. The dialect map, New York Times. The actual title is How You, Y’all and Youse Guys Talk, or something like that. I don’t remember exactly what the title is, but everybody knows it as the dialect map. Basically it’s a tool that asks you several questions. How do you pronounce this word in English? Or how do you refer to this particular phenomenon or this particular animal in English? What word do you use for that? Based on your responses to the questions that are posed to you, basically what you start seeing is a bunch of maps that predict where you’ll probably live or where you’re from, right? Based on some of your-
Kirill Eremenko: In the United States, right?
Alberto Cairo: In the United States, although recently they created a version for the United Kingdom.
Kirill Eremenko: Oh wow.
Alberto Cairo: Yeah. It’s a lot of fun. That project is the … The reasons why this project is so popular or viral it has to do with how interesting the topic is, but also because it’s a visually … it’s a visual tool, right? And it is so well designed and so well done, and it’s the most popular piece of content ever published by the Newyorktimes.com.
Kirill Eremenko: Why would you say people like visualization so much?
Alberto Cairo: Well, I mean it appeals to us, visualization, because first of all, it’s visual and we are visual creatures. We prefer to see things rather than to read things. We’ve basically evolved to be visual creatures. I mean, a huge part of our brain is devoted to processing visual information. Then another virtue of a data visualization is that, as I mentioned before, I mean, it’s persuasive and it’s memorable when it is very well designed. The way that I usually put this in talks and in the new book is that if I did a visualization which was well designed and it reveals certain insights coming from the data, once you see those insights, you cannot unsee them anymore. Basically they stick to your brain. It’s like they are very memorable. That’s another reason. Visualization is much more memorable if it is well designed, right? Sometimes than text alone, right?
Alberto Cairo: By the way, visualization is not just visualizing things, visualization is very often the combination of visuals with words that supplement those visuals, right? The best data visualizations are usually combinations of visual objects with words that reinforce each other. We call that the annotation layer in the world of data visualization. They are also beautiful objects, right? We human beings like beauty, beautiful, to see beautiful objects and enjoy, right? Good visualizations are highly enjoyable. There’s maybe a bunch of reasons. This may be just some of them, a few of them.
Kirill Eremenko: They trigger an emotion, right? Like that example-
Alberto Cairo: Yeah, yeah, absolutely.
Kirill Eremenko: … that you gave about the hockey stick, right?
Alberto Cairo: They can be joyful, right? As very common … we say commonly these days, they may spark joy, right?
Kirill Eremenko: Yeah.
Alberto Cairo: They-
Kirill Eremenko: Or they can terrify, right? You have that-
Alberto Cairo: They can terrify you. I mean, they can terrify you, they can surprise you, they can … I don’t know. They can be emotional. The same way that a good text can be, right? Texts can also elicit emotion sometimes, but there is something more visceral, something more direct in the use of visual objects to do that.
Kirill Eremenko: Yeah. Therefore, because a visual creates this imprint and creates … As you said, if you see it, you cannot unsee it, it’s a bit dangerous or sometimes sad when visuals are, as you put it, either misused or misinterpreted, and people see the wrong thing or are shown the wrong thing and therefore now they cannot unsee the wrong thing, and that creates a whole [crosstalk 00:20:46]
Alberto Cairo: It’s so persuasive, so powerful that they can overpower you, all right? They can basically become a means of controlling your thoughts. That’s a whole other reason why I wrote this new book, right? To basically warn people about how careful we need to be when reading visualizations, right? There are many examples of that. One example that I have in the book is … which I use by the way to explain one of the core principles of reading data visualization, which is that when you see a data visualization, one of the key things that you need to do is to come up with the right description of what you are seeing, right? I do this with a scatter plot, which I borrow from a friend of mine, Heather Krause, who is a statistician. It’s a scatter plot that shows the positive association, a positive correlation, between cigarette consumption and life expectancy, country by country. When you take a look at the country level data, the association between cigarette consumption per capita and life expectancy is positive, right?
Kirill Eremenko: Wow.
Alberto Cairo: Imagine this scatter plot. Now the way that I, that you would describe, that we commonly describe that kind of chart, and I know this because I have done this myself, is to say if you see the x-axis, cigarette consumption per capita and the y axis, the vertical axis, life expectancy per capita, and you see that one of them is positively correlated with the other, the way that we usually describe that kind of chart is, the more cigarettes we consume the longer we live, right? But that is not the right description. If you describe the chart like that, you are biasing people’s perception of that chart and you are biasing your own perception of that chart. Because what you’re only maybe considering is that you’re looking at the data aggregated at the national level, and that can be very dangerous because it could be an example of a Simpson’s paradox, right?
Alberto Cairo: The phenomenon that data that gets aggregated at certain level may display patterns that may disappear or reverse completely once you disaggregate the data at lower levels of aggregation. It’s a perfect example to explain these phenomena. I do this in the book. Because once you disaggregate the data at the regional level, at the local level, and you go down to the individual level, you will see that the relationship that was positive before, more cigarettes more life expectancy, reverses completely; more cigarettes, less life expectancy. Why the reversal? The reversal is related to wealth, right? The wealthier a country is, the more cigarettes people in that country can consume. The wealthier a country is, the more cigarettes per capita you have. But at the same time, the wealthier a country is, the higher the life expectancy is as well, because people can pay for better health care, right?
Alberto Cairo: Basically what you’re seeing there is a spurious correlation between the … Well, it’s not really spurious. The correlation really exists, but it only exists at the national level, not at the individual level, which is the level that you are interested in. If you want to know, for example, whether smoking cigarettes is good for you, you should not look at data at the national level because the correlation that you see at the national level may not reproduce at the individual level.
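A small sketch of the reversal Alberto describes, using entirely made-up numbers rather than any real smoking or life expectancy data. It assumes three hypothetical countries in which wealthier countries both smoke more and live longer on average, while inside each country heavier smokers live shorter lives. The helper `pearson` and the names `individuals` and `national_points` are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up individuals: (cigarettes per day, years lived), grouped by country.
individuals = {
    "Poor":   [(1, 62), (2, 60), (3, 58)],
    "Middle": [(4, 72), (5, 70), (6, 68)],
    "Rich":   [(7, 82), (8, 80), (9, 78)],
}

# National level: one dot per country, using the country averages.
national_points = [
    (sum(c for c, _ in people) / len(people),
     sum(y for _, y in people) / len(people))
    for people in individuals.values()
]
national_r = pearson([c for c, _ in national_points],
                     [y for _, y in national_points])

# Individual level: one correlation per country.
within_r = {
    country: pearson([c for c, _ in people], [y for _, y in people])
    for country, people in individuals.items()
}

print("national level:", round(national_r, 2))
for country, r in within_r.items():
    print(f"within {country}:", round(r, 2))
```

With these invented numbers the country-level association is positive while every within-country association is negative, which is why the country-level chart says nothing about what smoking does to any one person.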
Kirill Eremenko: Got you. At the national level, every point on the chart is a country, at the individual level-
Alberto Cairo: Exactly. Every [inaudible 00:24:10] at individual level. So the x-axis is cigarette consumption, the y-axis will be life expectancy, and just see a positive association. The more cigarette … the bigger the cigarette consumption is, the further to the right a point is, the further up the point needs to be as well.
Kirill Eremenko: Yeah, yeah. No, that’s very interesting. Or that other … There was another example in one of your talks I had in my mind just now, that had the same thing that if you … it depends on how you interpret it, right? How you … Oh, the chocolate and Nobel prize winners. That’s the example.
Alberto Cairo: Yeah, the chocolate and [crosstalk 00:24:51]
Kirill Eremenko: Can you tell us about that?
Alberto Cairo: Right.
Kirill Eremenko: I love that example.
Alberto Cairo: Yeah, that’s an example that I don’t use in How Charts Lie. I use it in the previous book, in The Truthful Art. Basically it’s like, if you take a look at a scatter plot, it’s very similar. Imagine a scatter plot at the national level, each dot is a country. Then on the x-axis, you plot chocolate consumption per capita. So the farther to the right a country is, the more chocolate per capita that country consumes, and then the further up on the y-scale, on the vertical axis, the larger the number of Nobel prizes per ten million people you have. There is a very strong positive association, a correlation. It’s linear. It’s a linear association between chocolate consumption per capita and Nobel prizes per capita … per ten million people, right? The more chocolate consumption … The bigger the chocolate consumption, the bigger, the larger the number of Nobel prizes.
Alberto Cairo: But obviously you cannot infer that there’s a relationship between those two things. That’s the first thing, right? The classic correlation is not causation, right? But we need to go beyond that, right? The correlation is not causation is a mantra that we have been repeating for decades now, and it’s basic knowledge, it’s an elementary knowledge. We need to keep repeating it because it’s very easy to infer causation based on some mere correlation, but we need to go beyond that, and that’s what I need to … I try to do in the new book; explaining concepts again such as a Simpson’s paradox or the ecological fallacy, right? That the ecological fallacy being inferring something about yourself, for example, based on data that is aggregated at the national level or the regional level, right?
Alberto Cairo: You cannot infer something about yourself, whether cigarette consumption is good for you, right? Individually, based on data that you’re seeing at the national level, right? Because there may be confounding variables that you’re not taking into consideration, for example, wealth in this particular case. I am emphasizing all of these examples so much in our conversation today and also in the new book, because this is a mistake that I have made myself, because I was careless about data, right? Describing the cigarette consumption chart or life expectancy chart as the more we smoke, the longer we live. Well, that’s not true. The way to describe a scatter plot showing the positive association or correlation between cigarette consumption and life expectancy would be to say that there is a positive association between cigarette consumption and life expectancy, but that doesn’t mean that one of the variables causes the other, and this relationship may disappear once we start disaggregating the data.
Alberto Cairo: We need to warn people about these kinds of phenomena when we present it to them. At the same time, a reader of charts needs to be prepared not to just look at the graphic and move away, but to read the graphic carefully and think about the chart because if you don’t pay attention to the chart, right, you will probably be misled by the chart, you will extract the wrong inferences from it. Charts, maps, graphs, et cetera, they are not meant to be seen, they are meant to be read like a piece of text. You need to read them and think about them carefully. Right? Otherwise, you would probably be misled by them.
Kirill Eremenko: Got you, and I really like what you say about why people misinterpret charts and how we can … what is missing in that puzzle. When you talk about the four kinds of literacy, so the normal literacy as in reading, the one we’re used to-
Alberto Cairo: Reading and writing.
Kirill Eremenko: Yeah. Articulacy, numeracy-
Alberto Cairo: Articulacy.
Kirill Eremenko: Numeracy and the graphicacy. Do you mind telling us a bit about those?
Alberto Cairo: Sure. Sure, sure.
Kirill Eremenko: What are the last two?
Alberto Cairo: Yeah. These are not terms that I have invented. They have been around for many, many years. I learned about all these in books such as Innumeracy, which is a very famous book about how to interpret numbers correctly, and also a book called Mapping It Out, by a cartographer called Mark Monmonier. In Mapping It Out, Monmonier says that, and I agree with that, that in order to consider ourselves educated citizens nowadays, we need to be able to do more than just merely read and write. That’s basic literacy, right? We need that obviously.
Alberto Cairo: We cannot abandon that obviously. But we also need articulacy, which is the ability to express ourselves correctly through spoken words. On top of that, we need numeracy. Numeracy is basically the elementary skill, being able to think critically about numbers. I usually equate it, compare it to some sort of sixth sense in the back of your brain, that it starts ringing when you see a number in news media that doesn’t sound right.
Kirill Eremenko: Like a BS meter, a bullshit meter.
Alberto Cairo: Yes. Yeah, but it’s not conscious. It’s sort of a sixth sense, that you see a number in the media and say, “There is something dubious about this number. There’s something wrong about it. I don’t know what it is, but it doesn’t sound right.” That’s a numeracy at work. Numeracy is a skill that can be developed. You can be educated in that, right? You don’t need to become a statistician or data scientist to have elementary numeracy. Right? Obviously if you want to become really, really numerate, it is better if you formally study statistics and data science. But I’ve come to believe that any regular citizen, like myself, I’m not a statistician, I’m a journalist and a graphic designer. I have come to believe that any citizen can educate themselves, herself or himself in basic numeracy.
Alberto Cairo: Then on top of that, you have graphicacy, which is graphical literacy, right? The ability to read and interpret correctly maps and charts and graphs and any sort of visual that represents the numbers, right? How to extract the right meaning from them, and it all begins with attention. You need to basically put yourself in the frame of mind that says that what you’re seeing is not an illustration, it’s a visual argument. In order to understand that visual argument, you need to pay attention to it, right? Then you need to apply some elementary principles of chart reading that I explain in the book and in talks, et cetera, such as don’t read too much into a chart. A chart shows only what it shows and nothing else, right? Because we tend to project what we want to believe onto the charts that we see every day in news media and that’s very, very dangerous, right?
Alberto Cairo: Double check the sources. Where did the data come from? Right? You need to ask yourself whether the numbers that are displayed on the chart are measuring what they say that they are measuring. This is another critical thing to do sometimes, right? So is it measuring the right thing, and what methods were used to measure these particular phenomena? Right? These things don’t take longer than five or 10 minutes, and they can take you a long way to avoid most of the cases in which you can be misled by a chart that you see in news media.
Kirill Eremenko: Yeah. I really liked a lot your principles of graphical literacy. So definitely, it’s not something that is taught at school. If you don’t mind, let’s go over them. I think they’ll get a lot of value. Maybe starting with the foundational one that you call number zero, is your data measuring what you think it is measuring?
Alberto Cairo: Yeah, [crosstalk 00:32:14] measuring what you think they’re measuring. Yes.
Kirill Eremenko: That’s a very important question, right? Have you seen examples of when charts are created-
Alberto Cairo: Oh, yeah.
Kirill Eremenko: … with the wrong data?
Alberto Cairo: Yeah, I have seen. I have seen examples of charts measuring the wrong thing and saying that they are measuring the right thing. Yeah. I don’t know. But, for example … I don’t know, not adjusting for inflation, for example. Right? How many times have we seen stories in news media saying, “The latest Marvel movie, the latest superhero movie, is the highest grossing movie of all time.” Right? Then you take a look at the data and you realize that the data is not adjusted for inflation. That statement is not true obviously, because you’re basically using the absolute values, when you should be using the adjusted values in order to make that comparison. That happens all the time, and sometimes we don’t pay enough attention, and therefore we are misled by those charts. Right? I have plenty of examples of this in the book. The one that is most popular with people in conferences is that I once saw a map displaying the number of heavy metal bands all over Europe-
Kirill Eremenko: Oh, yes. That one.
Alberto Cairo: Yeah, you saw that in the talk. That’s a good chart by the way. It’s not a bad chart. But it’s an example of how to double check the source, because I actually double checked the source in that particular case, because when I saw the map, number of heavy metal bands per million people per country, I asked myself, “Well, what is the source of this chart calling heavy metal? Are all the bands they are counting really heavy metal, or do they belong to other musical genres, et cetera?” Before tweeting the map and popularizing the map in social media, I actually went to the source and made sure that they are actually counting … that they had a more or less strict definition of what heavy metal is. Obviously it’s very hard to define, but you can set some boundaries in there and basically assess whether they are counting heavy metal bands, or they are also including … I don’t know, pop or rock bands or hard rock bands that are not really, really heavy metal.
Alberto Cairo: I took a look at the source. I use these fun examples in talks and in the books to explain to people how important it is to spend at least one minute or a couple of minutes double checking that, verifying that, before you put that chart that you have seen in social media in your own feed, for example. Because the chart may be wrong, and if the chart is wrong, then what you’re doing is spreading misinformation, right? We should … We all have a responsibility as citizens not to spread misinformation, or at least try not to spread misinformation. We all make mistakes, right? We all spread misinformation, but if we only spent one minute or two thinking about what we are seeing, it will be less likely that we will spread misinformation among our peers, or family or friends in social media.
Kirill Eremenko: Yeah. That’s a common problem these days in the world we live in, where people just catch onto something they hear and they start spreading it. It’s very evident, for instance, in the political space where something happens and people think it’s really bad, they start spreading it, and they don’t know the full story, they don’t know what actually happened. Then when the full story emerges, it’s completely different, and by now all this defamation has already happened. People are calling each other-
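The inflation point above can be made concrete with a small sketch. The film names, gross figures, and price-index values below are invented for illustration and are not real box-office or price data; the only thing taken from the conversation is the idea that nominal values have to be converted to a common year's money before they are ranked.

```python
# Hypothetical figures for illustration only (not real box-office or price data).
films = [
    {"title": "Older film", "year": 1980, "nominal_gross": 400.0},
    {"title": "Newer film", "year": 2019, "nominal_gross": 900.0},
]
price_index = {1980: 80.0, 2019: 250.0}  # assumed price index by year


def adjust_for_inflation(nominal, year, target_year, index):
    """Express a nominal amount from `year` in `target_year` money."""
    return nominal * index[target_year] / index[year]


def ranking(films, value):
    """Return film titles ordered from highest to lowest by `value(film)`."""
    return [f["title"] for f in sorted(films, key=value, reverse=True)]


nominal_ranking = ranking(films, lambda f: f["nominal_gross"])
real_ranking = ranking(
    films,
    lambda f: adjust_for_inflation(f["nominal_gross"], f["year"], 2019, price_index),
)

print("Nominal:", nominal_ranking)
print("Inflation-adjusted:", real_ranking)
```

With these made-up numbers the newer film leads in nominal terms but the older one leads once both are expressed in 2019 money, which is the kind of reversal a "highest grossing of all time" headline can hide.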
Alberto Cairo: Look, it happens to all of us. This is something that I make very clear in talks and in the book, it happens to all of us. It has happened to me, it will keep happening to me in the future. However, it is less likely that it will happen to me today than it was, say, five years ago or 10 years ago. Right? It was more likely before just because now I’m a little bit more conscious about how I consume media, how prone I am to be misled by numbers or by stories or by charts. I try to be a little bit more careful, and if we all try to be a little bit more careful, we would not be able to avoid 100% of problems or cases in which we may be misled by a number or by a chart, but if we only avoid, say half of them, that means half less misinformation around there, right?
Kirill Eremenko: Yeah. With the hard rock bands, as far as I remember from your talk, they had Bon Jovi in that …
Alberto Cairo: No, they didn’t. No, they didn’t. That’s the key thing. That’s what I explained in the talk and also in the book, that the reason why I double checked the source of that chart is that if you look into the literature about the history of heavy metal or even if you go to the Wikipedia page about heavy metal, you will see that there are some bands that are mentioned in there that is a little bit dubious that they are heavy metal. For example, I think that the Wikipedia page mentions Poison, which is a glam rock band from the ’80s and ’90s.
Alberto Cairo: I doubt that that band can be really called heavy metal. It’s like if you … I mean, heavy metal, what is … Heavy metal is Metallica, or is a … I don’t know, Slayer or Judas Priest and all these bands, or Iron Maiden, right? Poison is a fine rock band, but it’s certainly not heavy metal, I would say. They don’t mention Bon Jovi. None of these bands that I have sometimes seen being categorized as heavy metal. They don’t appear in the source. I mean, the source only counts all the sub genres of heavy metal.
Kirill Eremenko: Yeah. I guess that’s your journalistic investigative mind. It’s interesting to see you coming from a journalism background because then you can apply this curiosity, this investigative approach to digging in and being … double checking all the facts. How would you say that somebody can just develop that without being a journalist, without the background that you have?
Alberto Cairo: Through practice. It’s also practice. As I said before, I mean, I am a little bit better at doing this today than I was say 10 years ago. The way that I wrote both How Charts Lie and my previous book, The Truthful Art, was trying to remember how I was 10 years ago or 15 years ago. What didn’t I know 10 or 15 years ago that I should have known? I try to basically summarize all that into some key principles. Take a look at the source, ask yourself whether the source is counting what they said that they’re counting, make sure that the data is displayed in correct scales, that they are not distorting the scales of the chart. Ask yourself whether the chart that you’re seeing is showing sufficient or insufficient information, right?
Alberto Cairo: Is it showing the right amount of detail in order for you to figure out what’s going on, right? Try not to project your own beliefs onto the chart that you are seeing because a chart shows only what it shows and nothing else. Be really, really careful because we are all prone to doing that, right? Try to curb your own impulses a little bit to see your own views confirmed by the data that you are seeing. Take a look at whether the patterns that you are … that the chart is displaying are really there or not, right? Ask yourself, be a little bit more attentive. Only by doing that, as I said before, you will not be able to avoid all cases in which you may be misled by a chart but you will avoid many, and by doing that you will become a better chart reader.
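The source check described here has two parts that are easy to sketch: a stated definition of what counts, and a per-million normalization. Everything in the block below is hypothetical, including the genre list, the band records, and the populations; it only illustrates how a looser or stricter definition changes a "bands per million people" figure.

```python
# Assumed working definition of the genre; a real source would document its own.
METAL_GENRES = {"heavy metal", "thrash metal", "death metal", "black metal"}

# Hypothetical records as (country, genre tag). Not real data.
bands = [
    ("A", "thrash metal"), ("A", "glam rock"), ("A", "death metal"),
    ("B", "heavy metal"), ("B", "pop"), ("B", "hard rock"),
]
population = {"A": 2_000_000, "B": 4_000_000}  # hypothetical populations


def bands_per_million(bands, population, accepted_genres=None):
    """Count bands per country, optionally filtered by genre, per million people."""
    counts = {country: 0 for country in population}
    for country, genre in bands:
        if accepted_genres is None or genre in accepted_genres:
            counts[country] += 1
    return {c: counts[c] * 1_000_000 / population[c] for c in population}


loose = bands_per_million(bands, population)                 # counts every band
strict = bands_per_million(bands, population, METAL_GENRES)  # counts metal only

print("Loose definition:", loose)
print("Strict definition:", strict)
```

The gap between the two results is what the question "what is this source calling heavy metal?" is probing.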
Kirill Eremenko: Or creator, right? That’s-
Alberto Cairo: Or a creator, right.
Kirill Eremenko: … very important as well.
Alberto Cairo: Yeah. Yeah. It’s very important as well, because many of these problems or many of the mistakes that we make when reading charts, they’re very common, even among practitioners like myself, like journalists or graphic designers, et cetera, that sometimes we are a little bit careless with the data that we handle. I speak based on my own experience. I mean, I take a look back, 10, 15 years ago and I see some charts that if it were today, I would have never had [inaudible 00:40:23] such as pie charts in 3D and with shadows and shades or highlights and things like that that totally distorted the data, or the scatter plot, the one that I mentioned before, which I described as “the more cigarettes we consume, the longer we live.” But no, that’s the wrong description for that chart. That’s not how to describe that chart, because that’s not what the chart is showing and so on and so forth.
Kirill Eremenko: Got you. I’ll probably jump here to your fifth principle of graphical literacy because it fits in really well. When you build visualizations, you recommend building narratives and text utilization. Specifically, I really liked what you said about beginning with the text … rather than just starting to throw a visualization together, once you know what you want to display, think of a long sentence that will describe the visualization, and then break it down into pieces and visualize that. Could you tell us a bit more about this approach, please?
Alberto Cairo: Sure. Sure, sure. But before I do that, I need to also emphasize that visualization can be used with multiple purposes in mind. When you take a look, for example, at the classical cycle of data science diagram, right? That you can read about in books such as Hadley Wickham’s R for Data Science and many others. Visualization comes in two different steps in that cycle, because visualization can be used to either explore data and discover things from the data, and we call that exploratory data analysis, obviously. Right? It can also be used to communicate your findings, right? What I specialize in is the second use of visualization. I’m not an expert in exploratory data analysis, right? There are many people who work in these fields, people who work in scientific visualization, and in data science, and specialize in visualization for exploration.
Alberto Cairo: What I specialize in is in helping scientists and other kinds of experts in communicating the results. When you already know what you want to say, once you have come out with the conclusions of your study, and you want to communicate those conclusions, how you do it. Then when I teach these principles to specialists, I describe that technique that you have just mentioned, that this is a little trick that I learned throughout the years, to never begin with the visualization itself but always begin with a very long description of what you want to say, right? An elevator speech, or what you want to describe.
Alberto Cairo: This is not a technique that I have invented. I need to credit the sources for this technique because I shamelessly stole it from some friends of mine. I heard about this technique from Juan Velasco, who used to be the graphics director at National Geographic Magazine, he’s a friend of mine, and also Javier Zarracina, who is the graphics director at vox.com, both long time visualization designers. Very, very talented, very nice people.
Kirill Eremenko: Both from Spain, right?
Alberto Cairo: Both from Spain, yeah. There’s some sort of Spanish Mafia in the world of visualization in journalism. They are both from Spain, yes. Anyway, they both described this technique one day in a conference that I attended, a couple of conferences that I attended, and it all begins by writing a very long sentence of what you want to say. What is the story? What is the narrative that you’re trying to convey? Right? Begin always with that, begin with a very long sentence. Oh, my study focuses on this and that, I discovered this and that, the exceptions are this and that, the limitations are this and that, and you write a very long sentence about that, and my conclusions are such and such, and possible alternative explanations may be such and such.
Alberto Cairo: You’ll begin with a very long sentence, and then what you do is to split up that sentence into its natural components. You try to find the natural breaks in that sentence, and then you split it up into four, five, six different components. Each one of those components may become the headline of a different section in your visualization or in your scientific poster or in your whatever it is that you’re writing, your article, right? Those will be the main themes, the main topic in your design, and they may become the titles of the sections for your design. Then what you do is to design the visualizations that support the assertions that you’re making in those pieces of the sentence, right? You put your visualizations underneath each one of the pieces of the sentence. By doing that, you’re basically, first of all, providing the elevator speech itself.
Alberto Cairo: If people don’t want to really dig very deeply into your visualizations, they can still read the long sentence because the long sentence is after all the headlines over your sections, so they can get away … they can just read that, right? And get the gist of your story. But then, if they want to really double check whether what you are saying is right or not, they can take a look at your visualizations, at your charts or graphs, your maps, whatever visualizations that you’re designing.
Kirill Eremenko: That’s a very powerful approach. On top of that, I would like to probably talk a little bit about building narrative into visualization. With this day and age, one thing is just to create one image, which can be very useful and insightful, but sometimes and more often we see these infographics that combine multiple images and a whole story behind them. In one of your talks, I really enjoyed that whole story you built around the population of Brazil as you were doing some research or visualization on how the population of Brazil has changed from 2000 to 2010. But then once you added additional charts about the fertility rate, you were able to tell a much clearer story. If you don’t mind, could you tell us a bit about that and how that played out and the whole thing-
Alberto Cairo: Yes.
Kirill Eremenko: … behind that?
Alberto Cairo: Yeah, [inaudible 00:46:15] It’s actually quite weird to do a podcast about visualization because you need to verbally describe the chart. But this is an example that appears in my first book, The Functional Art, and it’s a story that I published when I was working for a media organization in Brazil. I lived in Brazil for a few years. We published this very large poster about population pattern changes in Brazil. It’s a story made of several graphics, and the first thing that you see is basically a map and a bunch of bar graphs that show you the population increase between 2000 and 2010, right? Basically, the population of Brazil increased everywhere, right? At the national level, at the regional level, at the local level, with some exceptions. There are several regions that lost population rather than gaining population. But in general, the population of Brazil grew between the two years.
Alberto Cairo: Well, that’s interesting per se, right? But we decided to start, in collaboration with demographers … I rarely do these kinds of projects alone because I’m not an expert on anything, right? In collaboration with demographers and some political scientists, we started digging a little bit deeper into the data provided by the Brazilian Census Bureau. One critical piece of data that appeared in the news releases that we were getting and the data that we were getting, is that Brazil’s fertility rate, which is the number of children per woman in a country, was strangely or surprisingly different from what was expected, right? When you think about fertility rates, when you think about rich nations, for example, rich nations tend to have low fertility rates, right? If you think about Germany or Spain or whatever, western nations, relatively high income in general, they tend to have fertility rates that are around 1.5 children per woman, 1.8 children per woman and so on and so forth.
Alberto Cairo: They are relatively low. If you go to very poor nations, right? For example, Afghanistan or Yemen, fertility rates are very high, five children per woman, six children per woman. Some African nations also have very high fertility rates. I think that Nigeria is around four right now. That’s the average, right? Then if you go to the middle of the spectrum, middle of the income spectrum countries such as Brazil for example, right? Fertility rates are usually between 2.5 and three point something children per woman. That’s the benchmark of these kinds of nations, right? But when you take a look at the data, that is not true. I mean, the fertility rate of Brazil, if you ask Brazilians themselves, right? I know this because I did it. If you ask Brazilian journalists, what do you think is the current fertility rate of Brazil, you will get numbers such as 2.5 or three children per woman.
Alberto Cairo: Just because we have this idea of Brazil in mind as a nation that is still in development, right? Or a nation that is still very poor, and certainly there’s a high degree of poverty in Brazil, but that is not true over the entirety of the country. Brazil is a continent, right? When you take a look at the data, you will discover that fertility rates in Brazil have dropped very dramatically in the past 50 years, and the current fertility rate of Brazil is around 1.8 children per woman. That was a second piece of content that we put in that poster that we designed. Because obviously, if you have such a low fertility rate, 1.8, that’s below the replacement rate. The replacement rate is the minimum number of children per woman, or fertility rate, that a country needs to have in order to keep the population stable.
Alberto Cairo: If your fertility rate drops below 2.1, which is this magical number, right? Your population will become older, and will start shrinking in the future, just because you do not have enough children. If your fertility rate drops below that number, your population will become older, and in the future will start shrinking. If you ask Brazilian demographers about future population patterns in Brazil, they will tell you that Brazil’s population is predicted to become older and to start shrinking around 2030 or something like that. That’s a problem. Why? Because well, Brazil has a public health care system, it has retirement, obviously public social security like the United States. These population patterns would put a lot of pressure on Brazil’s public finances. How can you face that? Well, there are several things that political scientists have recommended to face this future situation.
Alberto Cairo: If you think about it, what I have done over here is basically to use the technique that I explained before. My very long sentence would be, “Brazil’s population has grown bigger but fertility rate is way below expected. As a consequence of this, Brazil’s population will become older, and it will start shrinking in the future. This will be a problem. Here’s how to face these problems.” That’s a very long sentence. You split it up into its components, and then you pair each one of these headlines, these little titles, with the graphics that show the evidence for the assertion that you’re making. What we did was to use maps and bar graphs to show population change, a line chart to show the drop of fertility rates in Brazil in comparison to other countries all over the world.
Alberto Cairo: We used a population pyramid to compare Brazil’s population today versus Brazil’s population based on age groups in 2050. A line chart to show Brazil’s population growing but then starting to shrink in 2030, and so on and so forth. Basically, it’s a good example, I believe, to illustrate how this narrative principle works, right? It doesn’t work always, but when it does, when you can structure your information this way, it can be really, really powerful.
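The replacement-rate comparison can be written out as a tiny check. The 2.1 threshold and the 1.8 figure come from the conversation; the `generation_ratio` helper is a deliberately crude assumption added for illustration. It ignores migration, mortality changes, and the momentum of an existing age structure, which is why a population can keep growing for some years even after fertility falls below replacement.

```python
REPLACEMENT_RATE = 2.1  # children per woman, the threshold quoted in the conversation


def is_below_replacement(fertility_rate, replacement=REPLACEMENT_RATE):
    """True when the fertility rate is too low to keep the population stable."""
    return fertility_rate < replacement


def generation_ratio(fertility_rate, replacement=REPLACEMENT_RATE):
    """Rough size of one generation relative to its parents' generation.

    Simplifying assumption: no migration, constant mortality, and no
    population momentum. This is a back-of-the-envelope figure, not a
    demographic projection.
    """
    return fertility_rate / replacement


brazil_rate = 1.8  # figure cited in the conversation
print("Below replacement:", is_below_replacement(brazil_rate))
print("Generation ratio: %.2f" % generation_ratio(brazil_rate))
```

A ratio under 1 says each generation is smaller than the one before it under these assumptions, which is the mechanism behind an older and eventually shrinking population.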
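The long-sentence technique maps naturally onto a small data structure: each piece of the sentence becomes a section headline, and each headline is paired with the graphic that supports it. The sketch below uses the Brazil sentence and the chart types mentioned in the conversation; the `Section` class and helper names are hypothetical, and leaving the last two pieces without a chart is an assumption, since no graphic was named for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    headline: str          # one natural piece of the long sentence
    chart: Optional[str]   # graphic that supports the headline, if any


sections = [
    Section("Brazil's population has grown bigger", "maps and bar graphs"),
    Section("but the fertility rate is way below expected.",
            "line chart comparing Brazil with other countries"),
    Section("As a consequence, Brazil's population will become older",
            "population pyramid, today versus 2050"),
    Section("and it will start shrinking in the future.",
            "line chart of population growing, then shrinking"),
    Section("This will be a problem.", None),   # assumed text-only
    Section("Here's how to face it.", None),    # assumed text-only
]


def elevator_speech(sections):
    """Reading only the headlines, in order, gives back the long sentence."""
    return " ".join(s.headline for s in sections)


def outline(sections):
    """Render each headline followed by the evidence placed underneath it."""
    lines = []
    for s in sections:
        lines.append(s.headline)
        lines.append("    [" + (s.chart or "text only") + "]")
    return "\n".join(lines)


print(elevator_speech(sections))
print(outline(sections))
```

A reader who skims only the headlines gets the elevator speech; a reader who wants to verify a claim looks at the graphic listed under it.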
Kirill Eremenko: It also takes care of the audience, because if you just showed a chart where you’re showing how the population of Brazil grows from 2000 to 2010, people might … even though the chart’s showing the correct insights, people might misinterpret it and extrapolate that the population is going to keep growing, and by 2020, it’s going to-
Alberto Cairo: Or they may miss important features of the data, right? That’s why I emphasized before, the importance of using text in data visualization. Again, we call this the annotation layer in data visualization. Let’s say that you are doing a line chart showing progress in sales in your company, and there is a sudden spike at a particular point in time, you’d better put an annotation in there because otherwise people will wonder, why is there this spike over here? What’s going on? Because you need to try to explain it. Put an annotation in there, right? That annotation layer is really, really relevant in data visualization. Pairing, again. Pairing the visuals with the copy, with the texts that you can write to emphasize the important points in the data, to supplement the data a little bit, to reinforce the main messages that you’re trying to convey, or to avoid misinterpretations, right? Also, to avoid misinterpretations of the data that you’re presenting.
Kirill Eremenko: In that sense, I really like the grammar of graphics, how they describe the multiple layers of visualization. Multiple, starting from the axes all the way to different colors and including annotation. Once you understand, basically as they call it in the book, the grammar of graphics, it really helps-
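The sales-spike example can be sketched as a small pre-publication check: find the points a reader will wonder about and make sure each one has a note attached. The sales figures and the note text are hypothetical, and the "1.5 times the median" rule is only an assumed, simple way to flag a spike, not a recommended statistical method.

```python
from statistics import median

# Hypothetical monthly sales; not real data.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [100, 104, 101, 180, 107, 110]

# Explanations written by the chart's author (hypothetical).
notes = {"Apr": "One-off bulk order"}


def find_spikes(values, factor=1.5):
    """Return indexes whose value exceeds `factor` times the series median."""
    threshold = factor * median(values)
    return [i for i, v in enumerate(values) if v > threshold]


def build_annotations(labels, values, notes):
    """Pair every detected spike with its note, flagging spikes left unexplained."""
    annotations = []
    for i in find_spikes(values):
        text = notes.get(labels[i], "UNEXPLAINED: add a note before publishing")
        annotations.append({"x": labels[i], "y": values[i], "text": text})
    return annotations


annotations = build_annotations(months, sales, notes)
for a in annotations:
    print(a["x"], a["y"], a["text"])
```

Each record holds a position and a piece of text, which is the information a plotting library's annotation call (for example matplotlib's `Axes.annotate`) needs to draw the annotation layer on top of the line chart.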
Alberto Cairo: The layer-
Kirill Eremenko: Layers.
Alberto Cairo: [inaudible 00:53:45] grammar of graphics. Yeah. This is another one of those concepts that I try to explain to the general public in the new book, in How Charts Lie. I talk about the grammar of graphics. Obviously, I do it in a much less technical way than Leland Wilkinson did in his famous book, The Grammar of Graphics, or Hadley Wickham does when talking about ggplot2, but I still describe it. I still teach this principle in the new book.
Kirill Eremenko: Definitely. That’s very interesting. Unfortunately, we won’t have time to go into the rest of the principles of graphical literacy. For our listeners, if you’d like to learn more about them, I highly recommend picking up Alberto’s book, which is available on pre-order, right Alberto?
Alberto Cairo: Yeah, it’s already available everywhere for pre-order; Amazon, Barnes & Noble, independent bookstores. It’s basically everywhere. [crosstalk 00:54:38] Yeah. Yeah, it comes out on October the 15th, but yeah, you can order it now.
Kirill Eremenko: Guys, girls, go get that book. It’s going to be epic. I’m definitely going to pick up a copy. In the remaining five or so minutes, I wanted to just quickly touch on something I’d love to get your opinion on, and that is ethics in visualization. We already spoke a little bit about being conscious about what you reshare, how you read charts and double check the data behind that, and I think with how we’re moving more into a technological world, with more and more screens around us, with soon wearable devices and things like that, ethics is going to be super important. What is your stance on ethics in visualization? What recommendations can you give to practitioners listening to this?
Alberto Cairo: Oh wow. That would take another entire book to talk about. I may write about that in the future. I have that in the pipeline, to write a book about how to handle data, and particularly when you are going to visualize it. I don’t have very formed thoughts at the moment because again, I may use this new book to think clearly about these sorts of principles. But there’s lots of people writing about these things already, not from the point of view of visualization but more from the point of view of data science in general. I’m a bookworm, I would like to recommend books. I would recommend, for example, Cathy O’Neil’s Weapons of Math Destruction. I think that is a good introduction to thinking about the implications of the data that we handle every day, how to handle it carefully, clearly, and ethically. I think that is a good introduction to that.
Alberto Cairo: If you like something a little bit more controversial and aggressive, which I really, really enjoy and that makes you think, even if you disagree with the book sometimes, because it’s so aggressive, I would really, really, really recommend Mike Monteiro’s new book. His new book, I believe, is called Ruined By Design. It has the word design in the title, but it’s a book about data science. It’s a book about technologists, how technologists gather data, how the data is handled or mishandled, how careful we need to be with the tools that we create and think about the possible consequences of the tools that we create and that we put out for the public to use, and so on and so forth. Mike is a very passionate speaker. He’s also a very passionate writer.
Alberto Cairo: Again, you may not agree with everything that he says in the book, but it’s one of those books that even if you disagree with it sometimes, it makes you think deeply, and it makes you stop and think, “Is this guy right? Am I doing things correctly?” Ethics begins with that; with doubt. With doubting about your own decisions and making … have a dialogue with the book itself. The book makes you think clearly. Those are two of my favorite books to start thinking about how to use data ethically, and visualization as an extension of that. But there are many others. For example, Meredith Broussard, she has a book titled Artificial Unintelligence, which I really enjoyed. This is by MIT Press if I’m not wrong.
Alberto Cairo: Virginia Eubanks, she has another book titled Automating Inequality, which is about how algorithms may promote or may perpetuate societal inequality. That’s another book that made me think. Again, none of them covers visualization or graphics in general, but you cannot understand a visualization separately from the data that the visualization is representing. Any book or any thoughts about the ethics of data visualization necessarily need to begin with thinking about the data themselves.
Kirill Eremenko: Well, totally, I love it. You’re definitely a bookworm. That’s so many interesting books that I’ve just been writing down. Yeah, now I’m very curious about this one, Ruined By Design by Mike Monteiro.
Alberto Cairo: You should really read it. I mean, it will make you feel angry sometimes, I think, but for a very, very good reason. I think that he makes a very good case. I think.
Kirill Eremenko: That’s wonderful. Well, on that note, Alberto, thank you so much for coming on the show, sharing all your insights. It’s been a huge pleasure. Before I let you go, what are some of the best ways to get in touch with your work? Of course, in addition to purchasing your book, which I highly recommend to everybody: if you love this podcast, go and get Alberto’s new book, How Charts Lie. In addition to that, what are some other ways that people can follow you and get access to all these great things that you’re creating?
Alberto Cairo: Sure. The best ways, I use Twitter quite a lot. My handle is very easy to remember, is my first name and last name. So it’s Alberto Cairo, @AlbertoCairo. You can find me on Twitter. I’m also on Facebook and on LinkedIn. I’m both in LinkedIn and Facebook, but I use Twitter most of the time, as a way to promote things that other people do, graphics that other people design, articles that I read, papers that I have discovered, books that I’m reading, whatever. I use it as a platform to share, basically, things that I see and that I enjoy. I also have a web blog. The web blog is the title of my first book, The Functional Art. It’s thefunctionalart.com. That’s my web blog, and that’s the platform that I use to write a little bit more extensively about things that I see or so. Those are the best ways, I would say.
Kirill Eremenko: Got you.
Alberto Cairo: My E-mail address is very easy to find, in any of these platforms.
Kirill Eremenko: Fantastic. Also, everybody listening, Alberto, you have a huge 45 and a half thousand followers on Twitter. Yeah, it’s a great community to be part of, I guess, to follow-
Alberto Cairo: Yeah, and it’s a-
Kirill Eremenko: [crosstalk 01:00:36] his insights.
Alberto Cairo: It’s a fun community, as well. There is one virtue that the visualization community has, which is that it’s very welcoming to newcomers. If you want to get started in data visualization, you just need to basically get started. Start designing your graphics, putting it out there, asking people for advice, asking people for feedback, and most people, or 99.9% of the people who I know in the visualization community are very constructive, welcoming, friendly, and it’s a great community to work in.
Kirill Eremenko: For sure. I find that to be true across all of data science. It’s surprisingly, and inspiringly so, such a wonderful community of helpful-
Alberto Cairo: Yeah, absolutely.
Kirill Eremenko: … people.
Alberto Cairo: The [inaudible 01:01:21] community is very similar to the visualization, as far as I have seen. Yeah.
Kirill Eremenko: Fantastic. Well, once again, Alberto, thank you so much for coming on the show and sharing all these amazing insights. Super, super excited to chat, and good luck with the book launch and with all the touring that you’re going to do a couple of months from now.
Alberto Cairo: Thank you so much for having me again. It was a pleasure.
Kirill Eremenko: There you have it, ladies and gentlemen. Thank you so much for being part of today’s episode of the SuperDataScience podcast. That was Alberto Cairo. What an epic person. What an epic expert in the space of data visualization. I got a ton from this podcast, got so many takeaways, and I hope you did too. Just from this conversation, you can tell the depth of thinking that Alberto puts into his visualizations. You’re going to find all of the infographics that we talked about in the show notes for this episode at www.superdatascience.com/271. That’s www.superdatascience.com/271, and just have a look through them. Look at, for instance, the cigarettes versus life expectancy, or the Brazil visualization that we were talking about, or the Nobel Prize and chocolates visualizations.
Kirill Eremenko: Just look at all of these different visualizations that you’ll find there, and notice the depth of thinking that went into creating them, and you will recognize a lot of the things that Alberto was actually talking about on this podcast, from understanding if your data is measuring the thing that you want it to be measuring and that you think it’s measuring, to building narratives and creating a narrative structure in your visualization and conveying those insights in a certain way so that people can better understand them. Also, if you see Alberto’s visualizations on the Internet, you’ll find that they’re definitely very persuasive and very memorable. Of course if you enjoyed this podcast, make sure to pick up Alberto’s new book, which is called How Charts Lie: Getting Smarter about Visual Information. It’s coming out in October 2019, but you can already pre-order a copy now on Amazon or Barnes & Noble, on Amazon UK, or wherever you’re shopping for your books.
Kirill Eremenko: Highly recommend putting in a pre-order so that you get it fresh once it’s live. What I really like about this book, as Alberto described it, is that it’s for the general public, and that means if you’re not that deep into data visualization, you’re going to get a great headstart. But if you’re already a data scientist, and you’re already visualizing a lot of things and you’re pretty experienced in this space, it will help you see visualization through the eyes of your audience, and understand what kind of issues they’re going through, what kind of challenges they’re facing. I think it’s a very valuable skill to empathize with the people that you’re creating this for, with your audience. That can be very, very powerful.
Kirill Eremenko: Of course, as usual, if you know anybody who can benefit from this podcast, somebody who’s interested in visualization, somebody who’s a fan of Alberto Cairo, or somebody who’s on the verge of getting into visualization or not, send them this podcast, give them this gift of insights into what the world of data visualization’s all about, and you might even help them change their lives, change their careers and progress forward. Share the love, share this link, www.superdatascience.com/271, with anybody who you think could benefit from it. On that note, thank you so much for being here today, make sure to follow Alberto on Twitter and any other social media, and I look forward to seeing you back here next time. Until then, happy analyzing.