Kirill Eremenko: This is episode number 329 with Data Storytelling Coach Isaac Reyes.
Kirill Eremenko: Welcome to the SuperDataScience Podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
Kirill Eremenko: This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. We’ve got over two and a half thousand video tutorials, over 200 hours of content and 30 plus courses with new courses being added on average once per month. And you can get access to all of this today just by becoming a SuperDataScience member. There are no strings attached. You just need to go to www.superdatascience.com and sign up there, cancel at any time. In addition, with your membership you get access to any new courses that we release, plus all the bonuses associated with them.
Kirill Eremenko: And of course there are many additional features that are in place or are being put in place as we speak, such as a Slack channel for members where you can, already today, connect with other data scientists all over the world or in your location and discuss different topics such as artificial intelligence, machine learning, data science, visualization and more. Or just hang out in the pizza room and have random chats with fellow data scientists. Also, another feature of the SuperDataScience platform is the office hours where every week we invite valuable guests in the space of data science and interrogate them about their techniques, about their methodologies in the space of data science. And you actually get a presentation from the guest and you get an opportunity to ask questions in a Q&A at the end. And in some of our office hours we just present some of the most valuable techniques that our hosts think are going to be valuable to you. So all of that and more you get as part of your membership at SuperDataScience. So don’t hold off. Sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level.
Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen. Super pumped to have you back here on the show today and today’s guest is an expert data storyteller, a founder of a coaching firm in the space of data visualization and data storytelling and a TEDx speaker, Isaac Reyes. So Isaac is the founder of StoryIQ, a company which previously was known as DataSeer and recently they’ve rebranded. And what they do there is they go into companies and actually teach data science teams on how to present, how to tell stories with data so that they can deliver better insights to their executives. They’ve worked with some leading companies in the space of technology in the U.S. and across the world. And what is very exciting is that this is a space that is definitely booming right now because data science has been around for a decade now and more and more companies are seeing that there’s value in data science but the problem is the communication.
Kirill Eremenko: You’ve heard me mention a couple of times on this podcast that the most in demand data scientists, the highest paid data scientists, the most popular data scientists are the ones who can act as a bridge between the technical insights and the audience that needs to act upon those insights. Because one thing is to be able to crunch numbers and build models, but a whole different thing is to be able to explain those models, to explain those insights to the business decision makers and stakeholders. And so that’s what Isaac does. He coaches people how to do that. He coaches whole teams and in this podcast he shared some of his key takeaways, his key insights that you can already apply in your career today. So here’s a brief overview of some of the exciting things we’ve talked about.
Kirill Eremenko: First of all, of course we defined what data storytelling is and why it’s important. Then we talked about the four keys to data storytelling. So Isaac has his own methodology, which he’s developed over the years and you’ll find out what are the four key ingredients of data storytelling. We talk about the first three at the start of the podcast and we get carried away with visualization. So now you’ll hear the fourth one towards the end of the podcast, it will come up and then it will trigger another conversation there. We almost forgot to dive into it. Then you’ll find out what an impact metric is and other little hacks you can use in your visualization. And then we’ll actually dive into psychology behind visualization. Something I’ve never done before and I really appreciated that in Isaac’s approach that it’s very profound and well researched and very scientific approach to data visualization.
Kirill Eremenko: So you’ll actually understand how people think about the visualization and how you can better structure visualization to facilitate the understanding of these visualizations. So specifically we’ll talk about the Cleveland and McGill theory, also called the ranking of Elementary Perceptual Tasks Scale by Cleveland and McGill. And this will help understand which charts to use and why. A very scientific approach to selecting a chart. And then we’ll talk about the Gestalt laws and other scientific research in the space. We’ll talk about the proximity law and the law of similarity. There are many other laws, but we won’t dive deep into all of them. Those will be in the show notes if you want to look at them further. That’s another scientific approach to visualization.
Kirill Eremenko: So in a nutshell, these are just a few of the topics that we talked about in this podcast. There are many more insights and I can’t wait for you to learn from Isaac and get the valuable knowledge that he’s about to share. So without further ado, I bring to you Data Storytelling Coach, Isaac Reyes.
Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies and gentlemen, super pumped to have you back here on the show. And today’s super special guest is Isaac Reyes calling in from Manila, Philippines. Isaac, how are you going today?
Isaac Reyes: I’m going well, Kirill, it’s great to be here.
Kirill Eremenko: Well, I’m very excited to have you here, man. When I was calling you, I thought you were going to be in Sydney, but as we discussed, you’re in Manila now. But yeah, so cool to talk to an Aussie again. I love that in one of your talks you used the word mate, and it’s such a common thing in Australia.
Isaac Reyes: Yeah, that’s right. So just in terms of being in Manila, like I bounce around a lot. It’s the life of a data science trainer and in regards to the accent, I could talk to you like this, mate but I’m just going to turn that off for the blog.
Kirill Eremenko: Yeah, yeah. Gotcha. So I was wondering like as in our company, in SuperDataScience on Slack, I don’t know, maybe like inadvertently, but I’ve always been using, Hey mate, hey mate, hey mate with everybody I work with and now it’s become like a habit. And I didn’t even know, like until I watched your talk just recently, I was like, “Oh, I only realize now that in other countries they don’t say that.” Like what do people say to each other in a work environment? Like in the U.S. for example, like how do they address each other, if not by name?
Isaac Reyes: Yeah, we’ve been doing a fair bit of work in the U.S. I’ve noticed in more casual work cultures it’s more of like a, “Hey, what’s up?”
Kirill Eremenko: Yeah.
Isaac Reyes: Yeah. But the mate is, yeah, not as common.
Kirill Eremenko: Interesting. Interesting. I should check with our team because we have 16 people from all around the world. They’re all using, Hey mate!, now. I only now realize it’s not a common thing in other countries.
Isaac Reyes: No.
Kirill Eremenko: Oh yeah. Okay. Well, how’s Manila these days?
Isaac Reyes: Yeah, so we just had a typhoon pass through yesterday, Typhoon Tisoy and it caused quite a bit of havoc down in the south of the country but Manila, we were fairly untouched. I mean just strong winds and rains and flooding and company closures, but that’s pretty mild in terms of just how bad it can get here when a typhoon decides to rear its ugly head in terms of the more powerful ones. So we got off okay.
Kirill Eremenko: Wow. Wow. Well good. That’s good to hear that everything kind of passed by all right. How is Manila in general? Like I’ve been to the Philippines, I’ve been to Cebu, to the Northern part of Cebu, so that’s like one of the islands there and also to Malapascua above that. Mostly for scuba diving, but like I’ve never been to Manila. What’s it like?
Isaac Reyes: Yeah, look, Manila… I mean it’s a city with 20 something million people and it’s going through the period-
Kirill Eremenko: 20 million!
Isaac Reyes: It’s insane.
Kirill Eremenko: That’s like the size of Australia in population, almost. Like we have 25 million in the whole of Australia. This is just one city.
Isaac Reyes: That’s it. Yeah. Packed into like, I don’t know, it must be like 30 kilometers across or something like that. So it’s just an insane population density in the city, but it’s the… So the numbers just came out the other day. The country is growing faster than any other country in Asia. So it’s just a really exciting place to be with just all the growth and all of the companies are starting to get more data-driven as well. So, I’m not here permanently, I bounce around between here and Singapore, Sydney and New York. But whenever I’m here, it’s just a great energy because the people are just so friendly and everyone’s just positive about where the country’s going.
Kirill Eremenko: Mm-hmm (affirmative). And as you mentioned before the podcast, it just made sense for you to set up a shop or you set up your data science visualization training company in Manila. Why is that? Could you just clarify for our listeners please?
Isaac Reyes: Yeah, sure. Yeah. So there’s some amazing, especially data viz and data storytelling talent here. So we’re primarily a company that focuses on data viz presentations and data storytelling. And whilst the data science community is more in its infancy here, there are really amazing data viz and data storytelling professionals here. I’m half Filipino myself and a lot of Filipinos have this kind of affinity for the creative arts. And what we find is that the people we’ve hired here really bring that into their data stories. They add flair. The way they tell stories is world class. And so yeah, we made a strategic decision to have a major office here.
Kirill Eremenko: Mm-hmm (affirmative). Okay. Okay. Very cool. And how long ago was that move?
Isaac Reyes: So we first started in 2014. So yeah, we’ve had an office here for five years.
Kirill Eremenko: Hmm. Interesting. And having said that, even though like your office is in Manila, you’ve worked with some of the leading companies. I know you asked not to disclose the names on the podcast, but like some of the leading companies in technology in Silicon Valley, how does that add up?
Isaac Reyes: Yeah, so data storytelling is a… It’s kind of like a niche in data science that’s quite hot right now. And we started putting out some thought leadership pieces, going to some conferences. So we, myself and my co-founder Dominic, we have done a few talks at some of the bigger data science conferences like ODSC. We both put out TED talks. And so what we found is that some of the leading companies in the U.S. and Europe, they would find these talks online and then reach out and say, “Look, we really need these skills for our team. Just how to kind of take business data and convert it into a compelling message.” And so yeah, they reached out that way.
Kirill Eremenko: Okay. Very, very cool. And yeah, like I’m burning to find out more about your method of data storytelling and teaching data storytelling. Without further ado, let’s dive straight into it.
Isaac Reyes: Yeah. Sounds good.
Kirill Eremenko: Like what is data storytelling?
Isaac Reyes: Yeah. So, at its essence, data storytelling is about taking business data and presenting it to your audience in a way that they can consume, but also a way that gets them to actually feel something so that they can go and take action.
Kirill Eremenko: Mm-hmm (affirmative).
Isaac Reyes: And what I find is that, and I was guilty of this for many years in my career as a data scientist, I would just present charts and information. And when you just present charts and information and tables or metrics, you’re just presenting facts and people don’t feel anything when they just read facts.
Kirill Eremenko: Why is it important for people to feel something?
Isaac Reyes: Because people just don’t go and take action unless they feel something, right? They have to feel fear or they have to feel inspired. Right? Imagine, I think back to all the times I tried to lose weight, the only times I ever really got going was when I felt like, “Ah, okay, this is just really bad. I’m not in a place for that right now.” Right. But that’s when I was like, “Okay, I’m doing a meal plan. I’m doing a gym plan.” So you need to get your stakeholders to actually feel something if they’re going to take action and look, real numbers can get people to feel something. Like, if you just show someone a table that shows that net profit, you’ve lost $100 million for the quarter, that will get people to feel something.
Kirill Eremenko: For sure.
Isaac Reyes: But with numbers that aren’t so extreme but are perhaps indicative of a longer term problem, you may need to take those numbers and then layer a compelling narrative and storyboarding and staging onto your story to really influence change and get people to feel something.
Kirill Eremenko: Mm-hmm (affirmative). Okay, gotcha. Okay. And so yeah, you were saying you have been guilty yourself of just presenting facts rather than making people feel something.
Isaac Reyes: Yeah, absolutely. So, I used to work at Quantium many years ago, which is, you’ve probably heard of them, data analytics company based down in Sydney. And then after that I taught statistics for a little bit and then I headed up a data science team for a company called Altus also down in Sydney. And all through that period, while I was working for those companies, I would present pretty much just the data in a fairly dry way. And it wasn’t until I started presenting to higher level executives that some of them would mention, like let’s make this have more impact. And that kind of got me thinking around, “Okay, how do I make it have more impact?” And really it comes down to the narrative that you use. Like visually if you’re presenting in a slide deck or in a dashboard, like what verbal narrative is written there. And then if it’s in a presentation, what verbal narrative is the presenter actually delivering through the mouth in the presentation? All of that’s really key.
Kirill Eremenko: Okay. Let’s talk a bit about the keys to data storytelling. In one of your presentations, you mentioned like the four components. Would you mind walking us through that?
Isaac Reyes: Yeah, no worries. So we’ve thought a lot about the best way of teaching data storytelling skills in like one or two days. And what we’ve done is we’ve distilled it down into just four simple keys. The first key is the audience. If you’re creating a data story, you’ve got to think who is my audience and what do they need to know. Actually, that’s where a lot of data scientists really go wrong because they start to think, “Okay, I’m interested in XYZ, and ABC. And so I’m just going to present all of that to my audience.” When maybe your audience is only interested in X and C for example. So you’ve got to be really careful around like what are you presenting? And if you’re, and this is very common in the data science community, you’ve done all this analysis, you’ve worked really hard to do all this great analysis and like an excited puppy dog [crosstalk 00:17:07]-
Kirill Eremenko: Unleash all of it onto your audience.
Isaac Reyes: Exactly. Just like, I used to have a dog that would like bring me this kind of whatever he’d find on the street and like dump it on my porch. And it’s like, “Okay, good dog.” But I wasn’t really interested in that. And so it’s kind of like that. Like we are like those excited puppy dogs and we’re like, “Look at all this great analysis that I did,” when often it’s about, “Okay, what deserves to go into the main presentation and what am I going to push to the appendix here?” So the audience is number one. Now once we figured out… Oh, and I’m sorry, I will mention one more thing about the audience. That’s really key in deciding what media are you going to use to present your data story. I can’t tell you how many times I’ll work with an analytics team and it’s like, “Okay, why did we use Power BI here?” And people look around and go, “Well we bought a Power BI license last month, that’s why we use Power BI.”
Isaac Reyes: I have seen a team create this amazing interactive dashboard and then present it to their stakeholder. And the stakeholder kind of says, “Okay, look, I love the work that went into this and this is going to be a useful tool for like this other team. But I just really wanted a three minute PowerPoint deck that tells me what I need to do to move the business forward.” So yeah. So really being audience centric with every single design decision that we make through the data storytelling creation process, from media through to the metrics we choose, even down to the colors and the color palette and all of that. So audience is first.
Isaac Reyes: Then the second key is the data. So we’ve got to get the right data that our audience is interested in. And often that’s easier said than done because obviously we’ve got to link data from various sources and prepare it for analysis. But more key is metric selection. And this is an area where I went so wrong early in my career, where I would… Maybe I’d be in stat mode where I’m like, okay, average, mean, median, mode, standard deviation and all of that. And the thing is the metrics that statisticians use are often very different to the metrics that a business is most interested in. And so a business might be more interested in totals, counts, total profit, percentage churn, metrics that you didn’t learn in STAT 101.
Kirill Eremenko: Mm-hmm (affirmative), mm-hmm (affirmative).
Isaac Reyes: Okay. The data key is all about going, yeah, what metrics are we going to use? And it’s more often than not a combination of metrics that is going to best answer the business question at hand.
Kirill Eremenko: And so-
Isaac Reyes: So to give an… yeah, go ahead.
Kirill Eremenko: And so by data you mean not just like the incoming data that you need to collect, but also what the data that you’re going to present is going to look like, what kind of aspects of the data you are going to focus on, and what, as you said, metrics you are going to derive.
Isaac Reyes: Yeah, exactly. Yeah. And I probably could have defined it more clearly. Exactly. So it’s yeah, it’s about, yeah, just data ingest and then cleaning it up and getting it ready for reporting. And then, yeah, exactly as you said. What are the actual pieces of numerical information that make it into the final presentation.
Kirill Eremenko: Gotcha, gotcha. Okay, makes sense. So you’re going to give an example?
Isaac Reyes: Yeah. The example is, I was working with a telco the other day and someone was creating this amazing data story around customer churn and the initial draft of the data story just showed the percentage of customers that they lost in that quarter. And that’s all well and good. But the C level execs are also going to be interested in not just the percentage of customers we lost, but dollars lost. Like you’ve got to bring it back to how much money have we lost this quarter and on an ongoing basis, how much are we going to lose. And then also the count as well. So what the execs ended up wanting there was this nice combination of percentage of customers who left plus revenue lost plus the count.
Kirill Eremenko: Mm-hmm (affirmative). Okay, gotcha. And so that was something that you overlooked and they had to like correct the presentation afterwards?
Isaac Reyes: Exactly. Exactly.
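The combination of metrics described in the telco example (percentage of customers lost, count of customers lost, and revenue lost) can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names `churned` and `quarterly_revenue`, and the numbers are all made up, and nothing here comes from the actual project discussed in the episode.

```python
from dataclasses import dataclass


@dataclass
class ChurnSummary:
    churn_rate_pct: float  # percentage of customers who left
    customers_lost: int    # raw count of customers who left
    revenue_lost: float    # revenue attached to the customers who left


def summarize_churn(customers):
    """Combine three churn metrics into one summary.

    `customers` is a list of dicts with the hypothetical keys
    'churned' (bool) and 'quarterly_revenue' (float).
    """
    lost = [c for c in customers if c["churned"]]
    total = len(customers)
    rate = 100.0 * len(lost) / total if total else 0.0
    return ChurnSummary(
        churn_rate_pct=rate,
        customers_lost=len(lost),
        revenue_lost=sum(c["quarterly_revenue"] for c in lost),
    )


# Made-up records for illustration only, not real telco data.
example_customers = [
    {"churned": True, "quarterly_revenue": 120.0},
    {"churned": False, "quarterly_revenue": 90.0},
    {"churned": False, "quarterly_revenue": 200.0},
    {"churned": True, "quarterly_revenue": 60.0},
]

summary = summarize_churn(example_customers)
print(summary)
```

Reporting the three numbers together is the point: the percentage alone hides how much money is at stake, and the dollar figure alone hides how many customers are involved.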
Kirill Eremenko: Okay, so we’ve got audience, we’ve got data ingest and output and what are the other two components?
Isaac Reyes: Yeah. So now we’re getting to the really exciting one, which is the visuals. So you know who your audience is, you know the data you want to present, now it’s like how are we showing this? And a lot of data viz professionals will often jump the gun and go, “Well I’ve got to use a chart.” But sometimes a table will outperform a chart. Particularly when you’ve got lots of different variables that need to be shown in close proximity to each other. There is a point where you just can’t force all of those variables into a chart unless it’s going to be a chart that looks very, very busy. So it’s about deciding when do I use tables and when do I use charts. And then sometimes you actually use neither of those. Sometimes it might be what we call, at StoryIQ, an impact metric.
Isaac Reyes: So that’s just one single number that you’ve decided, this is really important. I’m going to blow this up in size and give it prominence. Just like think of a Steve Jobs keynote where he just puts like one number on the screen. That’s what an impact metric is.
Kirill Eremenko: Gotcha. Okay. That makes sense.
Isaac Reyes: Yeah. And so the visuals key to data storytelling is all about what key am I going to use? Sorry, what display mechanism am I going to use to display my data? Now if you’ve decided to go with a chart, and a chart is obviously very compelling, then it’s about, well how do I choose the right chart for the job? And what we found is, and I was guilty of this myself, people don’t often have a solid methodology for chart selection.
Kirill Eremenko: Mm-hmm (affirmative). Exactly. I totally agree with that. Yeah, often is just random. Okay, I’m going to try this. This looks good. Okay, let’s go with that.
Isaac Reyes: Yeah, you’ve hit the nail on the head. And sometimes I say that like, chart selection is mainly just driven by divine inspiration. People just trial and error until like, Oh yeah, that looks pretty. And I guess it shows the data. So now the good news is that psychologists and statisticians have collaborated on this problem and there’s been a bunch of research. William Cleveland and Robert McGill in 1984 wrote some amazing papers in this area. And they’ve summarized for us what chart types perform well for certain data types and what chart types don’t perform well. And so that third key is about choosing the right chart types, but not based on what you think looks nice, but based on the science of human vision and perception.
Kirill Eremenko: Interesting. Can you share some things from that? Is this the one from your talk, where the research was called the ranking of Elementary Perceptual Tasks Scale, right?
Isaac Reyes: Exactly. Yes. So it was a really interesting study that these gentlemen did. So Cleveland and McGill were two statisticians working at AT&T in 1984.
Kirill Eremenko: AT&T? Like the…
Isaac Reyes: Yeah.
Kirill Eremenko: What? They were around in 1984?
Isaac Reyes: Yeah. Yeah. The old school. Yeah. AT&T, yeah. I think that that might’ve even been more of the glory days than even now. I’m not sure.
Kirill Eremenko: All right. Okay. So what did they come up with?
Isaac Reyes: Okay. These guys were interested in what charts perform well. Bar charts, pie charts, line charts, dot plots, scatter plots, et cetera. So what they did was they said, “Let’s round up 51 human participants for an experiment and let’s break down charts into what they called Elementary Perceptual Tasks.” And so they said, “Okay, a bar chart in its essence, a basic bar chart, just asks you to compare lengths.”
Kirill Eremenko: Yeah.
Isaac Reyes: All right. And then a line chart at its very essence is asking you to compare points against a Y axis.
Kirill Eremenko: Makes sense.
Isaac Reyes: Right. Makes sense. All right. And then a pie chart asks you to compare areas.
Kirill Eremenko: Mm-hmm (affirmative).
Isaac Reyes: Okay. And I guess you could say angle as well, but… So they broke down every single chart type into what they called its Elementary Perceptual Task, i.e., the task that the user has to perform in their brain to perceive differences in the numbers. And then they ran a series of experiments on these 51 participants. It was a pretty good experimental design. I used to actually work as a biostatistician many years ago. So I was quite critical like when I was reading the study, but it’s very well designed. Then what they found is that certain chart types just beat the living crap out of other chart types and it would just surprise you. So you’re probably aware already, but the… So they found that comparing positions against a common axis is something humans do very, very well. So like line charts-
Kirill Eremenko: Oh like a line chart, yeah.
Isaac Reyes: Yeah, exactly. So yeah. So the line chart performs really well. Any guesses what else might perform well, Kirill?
Kirill Eremenko: Well, I’ve seen your talk so I know the answers. The next one would be if you shift the Y axis for one of the lines, the humans still do a pretty good job.
Isaac Reyes: Absolutely. Yes. So that’s exactly what they found. Let’s say you’ve got two line charts on your slide, but one of the line charts is positioned a bit lower than the other one.
Kirill Eremenko: Yeah.
Isaac Reyes: And then you want people to compare across charts, what they found is that generally like you don’t want to do that. Like you want to align everything if you can on the same axis. But they did find that if you do do that, it’s not too bad. But yeah, you do lose a little bit of accuracy there. And then the [crosstalk 00:28:11].
Kirill Eremenko: Accuracy of perception.
Isaac Reyes: Yeah, exactly. Yeah. I should probably flesh that out a bit more. So what they would do in these perceptual tasks is they would ask people, “Okay, what do you think the distance is between these two dots based on what you’ve been given on the Y-axis.” And they would find that the error in the humans’ guesses relative to the actual measurements would, for some perceptual tasks, just be so much higher than for others.
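The idea of scoring a perceptual task by how far people's guesses fall from the true values can be sketched as follows. This is a minimal illustration that assumes a plain mean absolute error; the episode does not say which error measure the study used, and the guesses below are invented numbers, not data from the experiment.

```python
def mean_absolute_error(guesses, truths):
    """Average distance between each guess and the true value."""
    if len(guesses) != len(truths):
        raise ValueError("guesses and truths must have the same length")
    return sum(abs(g - t) for g, t in zip(guesses, truths)) / len(guesses)


# Invented numbers purely for illustration.
true_values = [10.0, 25.0, 40.0]
position_guesses = [11.0, 24.0, 41.0]  # hypothetical guesses read off a common axis
area_guesses = [16.0, 18.0, 49.0]      # hypothetical guesses read off areas

position_error = mean_absolute_error(position_guesses, true_values)
area_error = mean_absolute_error(area_guesses, true_values)

print(f"position task error: {position_error:.2f}")
print(f"area task error:     {area_error:.2f}")
```

A task with a lower average error is one that people decode more accurately, which is the basis for ranking one chart type above another.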
Kirill Eremenko: Mm-hmm (affirmative). So a line chart, two line charts on the same Y-axis with the same Y-axis is the chart number one. Number two is two line charts with slightly shifted, like Y-axis doesn’t start at the same level, right?
Isaac Reyes: Yeah. Yeah.
Kirill Eremenko: Okay. What’s number three?
Isaac Reyes: And then moving down the scale, the regular bar chart on a flat baseline where you’re comparing lengths generally performs quite highly. It’s a good robust chart type.
Kirill Eremenko: I like it. The example you gave where it’s like you and your brothers standing and comparing heights.
Isaac Reyes: Yeah, that’s it.
Kirill Eremenko: And standing back to back.
Isaac Reyes: That’s it. Yeah. Humans just do a great job of looking at two people standing back to back and guessing what is the difference in their height. We’re extremely good at that. Yeah. And that carries over to the bar chart. We are good at comparing heights and lengths on a flat baseline. So, yeah. So far we’ve established the line chart and the bar chart are two great chart types. The standard bar chart that is… When we moved down again though in the next experiment they were comparing lengths on a non-flat baseline. So I want you to imagine, Kirill, that I was in Brisbane and I’m standing on a table trying to compare my height to you when you’re standing on the floor.
Kirill Eremenko: Part.
Isaac Reyes: And then… Yeah, yeah, that’s right. And we go and ask someone to guess the difference in our heights and they’re really going to struggle to put an accurate number on that.
Kirill Eremenko: Mm-hmm (affirmative). And in terms of bar charts, what’s the analogy there?
Isaac Reyes: Yeah, so this goes back to the stacked bar chart. So stacked bar charts perform reasonably well for the bars that are on the baseline at the bottom of your chart because they get the flat base. But for all of the, I call it the middle stack syndrome problem, for all of those like components in the middle, you can’t follow a trend in them if it’s time-based data, and if you’ve got a categorical X-axis, then you’re just sitting there struggling to compare lengths.
Kirill Eremenko: Yeah. I love the example you give with the, what’s it called? Business Insider charts. Like, tell us why do you like Business Insider charts?
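The "middle stack" problem can be made concrete by computing where each segment of a stacked bar starts and ends. The sketch below uses invented numbers and a hypothetical helper name, `segment_baselines`; it only illustrates why the bottom series has a flat baseline while the series above it do not.

```python
def segment_baselines(bars):
    """For each stacked bar, return (baseline, top) for every segment.

    `bars` is a list of bars; each bar lists its segment heights
    from bottom to top.
    """
    result = []
    for bar in bars:
        bottom = 0.0
        spans = []
        for height in bar:
            spans.append((bottom, bottom + height))
            bottom += height
        result.append(spans)
    return result


# Three time periods, three series each (invented numbers).
# The middle series is constant at 20 in every period.
bars = [[10, 20, 5], [15, 20, 5], [30, 20, 5]]
spans = segment_baselines(bars)

bottom_baselines = [bar_spans[0][0] for bar_spans in spans]
middle_baselines = [bar_spans[1][0] for bar_spans in spans]

print("bottom series starts at:", bottom_baselines)
print("middle series starts at:", middle_baselines)
```

Even though the middle series never changes, its segments start at different heights in every bar, so the reader has to compare lengths on a non-flat baseline to notice that it is constant.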
Isaac Reyes: Yeah, I get myself in trouble for saying this. But I actually check Business Insider every day because they’ve got a section of their website that’s called Chart of the day.
Kirill Eremenko: Yeah.
Isaac Reyes: And every day they upload a new chart and they have some commentary about it. But I do enjoy that website because it’s an excellent, reliable source of bad charts for my training courses.
Kirill Eremenko: Love it. Love it. Okay. Yeah, so we’ll share that example of the bar chart called Apple global revenue share and indeed like stacked bar charts, whenever I look at them like it’s so hard as you say, like especially towards the middle, like you can’t tell what is going on. If they’re not labeled, you cannot compare like these slight gradual differences between consecutive bars.
Isaac Reyes: Yeah, absolutely.
Kirill Eremenko: Does that mean like never to use stacked bar charts?
Isaac Reyes: Oh, that’s a great question. And hard to answer in a way, but… And the reason it’s hard to answer is that in data viz, you really never want to… What I found is that you probably don’t want to be saying never for certain things. Now I will say never in regards to exploded 3D pie charts. Never use those. But generally you can usually find a situation where you can break the rules. So I find like data storytelling is like any other artistic endeavor. It’s sort of like, yes, there are rules and we generally don’t break those rules. But sometimes you can find exceptions where it does make sense to break them. Generally, I would avoid stacked bar charts where I can, and there are often alternatives that can be used. So yeah, generally they are avoided just because of this problem of not being able to compare the lengths accurately of the middle segments.
Kirill Eremenko: Mm-hmm (affirmative). Okay, gotcha. And what comes after that? What’s even worse than a stacked bar chart?
Isaac Reyes: Yeah. So even worse than humans’ ability to compare lengths on a non-flat baseline is humans’ ability to compare areas and angles. And we can probably all think of a chart that involves the comparison of areas and angles, the lovely pie chart. And, yeah. So you know, there’s been some great commentary about pie charts over the years, and Edward Tufte, author of the book The Visual Display of Quantitative Information, says the only thing worse than a pie chart is several of them.
Kirill Eremenko: Amazing.
Isaac Reyes: A great quote.
Kirill Eremenko: Why do you think people are so bad at comparing areas?
Isaac Reyes: Yeah, that’s an interesting one. I’m not sure. It could be how we’ve evolved over time. So I’ve got my own personal theories on this that are probably a bit wacky, but one of my theories is that we compare heights really well because thousands of years ago humans were pretty uniformly like our width, it was pretty… Well, not really uniform but height was the thing that you would look at to kind of size up someone that was coming at you or to size up prey and things like that. That was one theory I had around why height was so important. But yeah, why are we so bad at areas? I mean I think area is a more complicated thing to compare between two areas, right? So with height you think about it in kind of mathematical terms. We’re only looking at one dimension whereas area is like two dimensions and I think our brain is just… It’s got like another hundred thousand years of evolution before we can actually do a better job of it.
Kirill Eremenko: Yeah, and even if you like draw two circles side by side. Like one big one and one small one, the first thing your mind does if you try to compare them is look at the height of each circle and compare that. Whereas the area is, what is it, pi r squared, right? Or pi d squared divided by four. The area is a squared version or a squared analogy of the height and your brain needs to square that. And we’re not good at… We are linear thinkers, we are not quadratic or squared thinkers. We don’t think in square terms.
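As a rough illustration of the two-circle comparison above, here is a small Python sketch. The diameters are made-up values chosen only to show the effect: doubling the height of a circle quadruples its area, so a viewer who judges by height will underestimate the difference.

```python
import math


def circle_area(diameter):
    """Area of a circle from its diameter: pi * d^2 / 4."""
    return math.pi * diameter ** 2 / 4


# Hypothetical diameters, chosen only for illustration.
small_d, big_d = 1.0, 2.0

# What the eye tends to compare: the heights (diameters).
height_ratio = big_d / small_d

# What the chart actually encodes: the areas.
area_ratio = circle_area(big_d) / circle_area(small_d)

print(f"height ratio: {height_ratio:.1f}")
print(f"area ratio:   {area_ratio:.1f}")
```

The gap between the two ratios grows with the square of the height ratio, which is one way to read the "linear thinkers" remark.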
Isaac Reyes: Yeah, that’s a really good point. Yeah. I didn’t think of it that way. And absolutely. Yeah. We are very linear. Yeah, makes sense.
Kirill Eremenko: Interesting. Okay, cool. So avoid areas at all costs predominantly. Sometimes use bar charts and the best option is your simple line chart.
Isaac Reyes: That’s right. And I’d add the dot plot as another great chart as well. So, like let’s say we were comparing the GPAs of like 10 students. What you can do is just have the X-axis is the name of each student, the Y-axis is average GPA. And then instead of having a bar, you can just have a dot. Right. And that’s another high performing chart type. And then the lowest on the scale is color.
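A minimal sketch of the dot plot idea, using only the Python standard library and plain text rather than a charting package. The student names and GPAs are invented for illustration, and the layout is rotated relative to the description above (names run down the side, GPA runs along a shared horizontal scale), which is an assumption made to keep the output printable.

```python
def text_dot_plot(values, lo, hi, width=41):
    """Return one text row per item, with a single 'o' placed by value."""
    name_width = max(len(name) for name in values)
    rows = []
    for name, value in values.items():
        # Map the value onto a column index along one common scale.
        col = round((value - lo) / (hi - lo) * (width - 1))
        track = ["-"] * width
        track[col] = "o"
        rows.append(f"{name.ljust(name_width)} |{''.join(track)}| {value:.1f}")
    return rows


# Hypothetical students and GPAs on a 0-to-4 scale, invented for illustration.
gpas = {"Ana": 3.5, "Ben": 4.0, "Cara": 2.0, "Dev": 3.0}

rows = text_dot_plot(gpas, lo=0.0, hi=4.0)
print("\n".join(rows))
```

Each value is encoded purely as a position on a common scale, with no bar length or color involved.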
Kirill Eremenko: Interesting. Why is that?
Isaac Reyes: Yeah. Yes. Let’s use the GPAs example. So we’ve got 10 students, their GPAs range from… I’ll use the Australian system here; for some universities it’s like from one to four. And then, actually I think it’s zero to four actually. If you fail everything. And so instead of encoding a 3.5 as a certain length bar and a four as another length bar, you could encode them as color. So…
Kirill Eremenko: Like a heat map.
Isaac Reyes: Yeah, like a heat map and we just suck at it, right? As in, it’s like… So you’re looking at one shade of like, I don’t know, greeny red and then you’re looking at some other color of red and then you ask someone, “Well, how much better did this person score than the other person?” just based on color and it’s just like, “I don’t know.” And so… And well it calls into question heat maps in general. It’s like, well, if we can’t convert a hue and saturation of colors back into numbers, then why do we use heat maps? And why don’t we just kind of stick bars on our maps, since we encode lengths better than colors. And I think heat maps really are still useful, but they’re more useful for outlier detection. Like if you just need to say that one color is really different to the other, that’s when they work well.
Kirill Eremenko: Or when you don’t need to see the magnitude of the effect but rather just get the trend or like overall gradients, like that the east of the U.S. has larger snowfall or Central U.S. and so on. Just kind of like see which way to look rather than measuring like the magnitude of the effect.
Isaac Reyes: Great point. Yeah, absolutely.
Kirill Eremenko: Okay. Okay. Very cool. I got another one for you. Can you walk us through… So that was the Cleveland and McGill theory, which I found very exciting and we’ll link to that in the show notes if somebody wants to read a bit more about that. Could you talk a bit about the Gestalt laws? I found that part of your training extremely insightful as well.
Isaac Reyes: Sure, sure. So, let’s say we’re producing a line chart and I’m using ggplot2 or I’m using Excel or something like that, the legend will almost always automatically be at the bottom of your chart. And the big problem with this is that your eyes have to track from each categorical component of the legend up to the line. And then you go, “Oh, okay. That word links with that line.” Doesn’t sound like much work but it definitely adds up, particularly if you’ve got like four or five lines on your line chart and it’s just extra cognitive work that your user has to do. And so some German psychologists in the 1920s discovered some laws that kind of govern the way humans see the world around us. And these laws are still taught in third year psych up to this day. And they really just discovered that humans see certain things that are arranged in certain ways or have certain design traits as being grouped together. And so, one of these is proximity. And so, humans just see things that are near each other as being close together.
Kirill Eremenko: Mm-hmm (affirmative). Can you give an example?
Isaac Reyes: And so, one of just the quick… Yeah, sure. So let’s just say, let me just think. All right, you [crosstalk 00:40:59].
Kirill Eremenko: It’s hard with no video, right? Just on a podcast.
Isaac Reyes: It is now, but I take this as an interesting challenge actually. So, let’s say that you walk into a crowded room.
Kirill Eremenko: Mm-hmm (affirmative).
Isaac Reyes: And rather than the people standing equidistant from each other, people have arranged themselves into groups of people talking, right? You naturally just form clusters based on geographic proximity.
Kirill Eremenko: And you might think those people know each other even though they might’ve just met.
Isaac Reyes: Exactly. Yeah.
Kirill Eremenko: Okay. Okay. Makes sense. Yeah. So, that’s one of the laws. Proximity. So what does that mean in terms of charts?
Isaac Reyes: Yeah. So in terms of charts it means, for things that you would like to have your audience associate together, it means you should put them near each other. So for the legend, you would put… Let’s say, I’ve got a line chart of the GDP growth of Australia, New Zealand, Fiji, Oceania countries. I’m not going to have Australia and Fiji and all those countries in my legend down the bottom, thus forcing my user to eye track up to the chart. I’m just going to put the word Australia at the end of the damn line and it just makes it so much easier to consume.
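A hedged sketch of putting the label at the end of the line instead of in a legend. It builds a small SVG by hand with the Python standard library, so no charting package is assumed. The function name `direct_label_svg`, the growth figures, and the colors are all invented for illustration, and the sketch assumes every series has the same number of points (at least two) and that the values are not all equal.

```python
def direct_label_svg(series, colors, width=480, height=240, pad=40, label_room=90):
    """Build an SVG line chart with each label drawn at the end of its line."""
    all_values = [v for values in series.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    n = len(next(iter(series.values())))
    plot_w = width - pad - label_room  # leave room on the right for labels

    def x(i):
        return pad + i * plot_w / (n - 1)

    def y(v):
        # SVG y grows downward, so flip the scale.
        return height - pad - (v - lo) / (hi - lo) * (height - 2 * pad)

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for name, values in series.items():
        color = colors[name]
        points = " ".join(f"{x(i):.1f},{y(v):.1f}" for i, v in enumerate(values))
        parts.append(
            f'<polyline fill="none" stroke="{color}" stroke-width="2" points="{points}"/>'
        )
        # Proximity: the label sits right next to the last point of its line.
        # Reusing the line color for the text also ties the two together.
        parts.append(
            f'<text x="{x(n - 1) + 6:.1f}" y="{y(values[-1]) + 4:.1f}" '
            f'fill="{color}" font-size="12">{name}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up growth figures, for illustration only (not real GDP data).
growth = {
    "Australia": [1.0, 2.0, 3.0, 4.0],
    "New Zealand": [2.0, 2.5, 2.0, 3.0],
    "Fiji": [4.0, 3.0, 2.0, 1.0],
}
colors = {"Australia": "#1f77b4", "New Zealand": "#2ca02c", "Fiji": "#d62728"}

svg = direct_label_svg(growth, colors)
print(svg)
```

Saving the printed output to a `.svg` file and opening it in a browser shows three lines, each with its country name sitting at its right-hand end, so there is no legend to track back to.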
Kirill Eremenko: Makes total sense. I love how you said, like Bill Gates and Microsoft. What would be the one thing you’d ask Bill Gates if you had him for dinner? Had him over for dinner.
Isaac Reyes: Yeah. This is one thing I’m going to get myself in trouble for one day. But yeah, if I did find myself two minutes in an elevator with Bill Gates, that’s all I’ve got, I don’t have any more time, I probably… Yeah, wouldn’t be asking about world hunger or curing world diseases. I would probably just ask him, yeah, why is there no option in Excel for me to just right click and just put the legend next to the line. That’s the one thing I’ll ask.
Kirill Eremenko: Oh my gosh. Yeah, that definitely very valid point. Okay, so that’s the proximity Gestalt law. What’s the next one? And how do they work together?
Isaac Reyes: Sure. Yeah. So the next… Well they’re not really in order, but yeah, another Gestalt law is similarity. And so humans group things together in their brains if they have some sort of similar trait. And so, if I’ve just got a grid of like, I don’t know, 10 dots and some of those dots are red and some of those dots are gray and some of those dots are green, you’ll group the ones that are red and the ones that are green just because they’re similarly colored. And it goes for other traits. Like if I’ve got like 10 circles and three triangles, you will group the triangles together. And so, that’s the Gestalt law of similarity. Humans group things together that are similar. And so for our charts-
Kirill Eremenko: Which sounds really trivial, right? Like when you think about it. Why do they have to put them into laws like, isn’t this stuff really obvious?
Isaac Reyes: Yeah, absolutely it is. But it’s almost like they… It needs to be defined because it’s like we do forget it. And so for example, like on our charts, you might put… Going back to that example with the Oceania countries, I would make the word Australia the same color as the line for Australia. Thus increasing the similarity there. Now, yeah, in regards to it being obvious, some of the other laws were a bit less obvious I think. Yeah, those first two were… Can seem quite trivial, but yeah, there are a whole bunch of other Gestalt laws. Like there’s the Gestalt law of continuity, the Gestalt law of enclosure. Some of the others just weren’t immediately, yeah, obvious and might be a bit more interesting.
Kirill Eremenko: Okay. Okay, cool. So we’ll also link those in the show notes, not to go off on too much of a tangent here, but it sounds really cool. Gestalt laws. How did you come across these things? Both the Cleveland and McGill theory and the Gestalt laws, and you’ve mentioned some other people already on the show, like Edward Tufte and so on. This is a very unconventional approach to data visualization. Very scientific approach. Why did you decide to dive deep like this into data viz?
Isaac Reyes: Yeah, so I dove deep here because about five years ago, I’d been working as a data scientist for about three years. And before that I’d worked in data analytics and biostats. And when I was working in a commercial analytics company in Sydney, what I found as I kind of got more senior in my career, I started hearing the execs that we’d be presenting to, they would always have these murmurs and comments about how they would feel that data science teams were speaking a different language to the rest of the business. And I’d hear them complaining that they weren’t getting dividends or returns on their investments in data science teams. They’d talk about how data science teams were just focusing on the wrong problems. They weren’t driving value, all of this. And I started to really think about how, if we found good results, like how do we present them to management in a way that’s just super, super consumable. And I heard a comment from a colleague who looked at a presentation from someone and said, “Oh wow, that’s so simple. Like even a C level executive could understand it.”
Kirill Eremenko: Like a five year old. Right?
Isaac Reyes: Exactly. And there is some truth to it. Like they have quite… These are the sharpest people in the room. No doubt. I’m not going to take away from that. But they sit through a lot of presentations and what comes with that is a very short attention span and low tolerance for like just a waffly presentation. And so I started to just think about, yeah, how can I make my own presentations short, snappy, concise, get to the point. And I started reading up on presentation theory and all different ways of telling stories and opening, like theories such as one called BLUF. It’s like bottom line up front. Like bring your takeaway right to the front, things like that. And I looked at what McKinsey and Bain are doing and all of this. And then I said, “Yeah, but this doesn’t fix the charts.” That’s when I started to read all the literature around data viz and charts.
Kirill Eremenko: Okay. Gotcha. Very interesting. So I can see that like your methods are really profound and thorough. What about your method of teaching this stuff? At StoryIQ, how do you teach these things when you go into a company? For some companies it’s even unfathomable to hire a coaching company to coach their data science team on data storytelling. So walk us through the process. Like once you have that initial conversation and you get the project, how do you teach the team?
Isaac Reyes: Sure. So we start with the fundamentals and what we find is that even talented data people have often not gone through the fundamentals of data visualization and data storytelling. So, we start with those four keys that I was mentioning before, we go through.
Kirill Eremenko: So just to recap: audience, data, visuals and, oh, we didn’t get to the fourth key. I think narrative is your fourth key.
Isaac Reyes: That’s right, yes, yes.
Kirill Eremenko: Maybe time to comment on that. What’s the narrative about?
Isaac Reyes: Sure. So if your audience, data and visuals are good, then you’ll have a pretty good presentation or dashboard, but you’ll just be presenting facts. And so it’s the narrative that brings your story to life and actually gets people to figure out immediately, what is the insight here and what do I need to do? And a lot of presentations will just not get to the point. They don’t have a reason for their existence. To give an example here, let’s say I’ve got a chart on… Okay, I was looking at some data the other day actually on U.S. teens’ favorite social media platform over time. And what this chart shows is that basically Snapchat is getting killed. Twitter’s getting killed, Facebook’s getting killed but Instagram’s gaining market share every year against all of the other competitors. And instead of putting the title of my chart as U.S. teens’ favorite social media platform over time, which is really… Yeah, go ahead.
Kirill Eremenko: What you would normally do, right?
Isaac Reyes: But yeah, well… And I taught statistics for first year stat for three years. That’s exactly what I taught my students. They’d be like, “Okay, so what’s the chart title?” I’d say, “Well, what is your Y-axis? Okay, what is your X-axis? And then repeat whatever your Y axis and X axis represents and that’s your title.” And that’s how the whole academic community does it.
Kirill Eremenko: Yeah. Very true.
Isaac Reyes: And it’s fine for a formal setting like that but the problem is that it’s redundant because your audience can read, they can read what your Y-axis says, they can read what your X-axis says. So why just concatenate them together into your title? It makes no sense. And so instead we encourage people to go, “Okay, my title for my social media chart is going to be: let’s pull marketing spend from Twitter and Facebook and redivert it into Instagram in Q4.”
Kirill Eremenko: Very cool. So like an action, take call to action. Nice. Yeah. That’s like a narrative.
Isaac Reyes: Yeah. And that’s the narrative. And tying back to that other query about how do we go into companies, that’s really what brings us in because the buyers of the training who are actually signing off on this, like the managers just say thank goodness, right? Like we’re getting to a point where people sit in a meeting and just get to the point.
Kirill Eremenko: Yeah. Yeah. Straight up. Okay. Okay. That’s a really cool tip about having the title build the narrative already. What other tips can you give on building a narrative into your data story or turning your presentation from just the raw facts into a data story?
Isaac Reyes: Yeah, I’d say probably the first one is… Okay, let’s say you are doing a PowerPoint presentation and let’s face it, whether we like it or not… Well, I’ve got a colleague who says power corrupts and PowerPoint corrupts absolutely.
Kirill Eremenko: I love PowerPoint. I don’t know. What is he talking about? That’s one of my favorite tools.
Isaac Reyes: Yeah. Look, Kirill, I like it too. I mean there’s a reason that most TED talks are done in PowerPoint. It is a great storytelling tool. And so, if you are working in PowerPoint, the first thing I’d say for great narrative and storytelling is to shut down PowerPoint and think about what is the structure of my story? How do I want to open here? How do I want to close? What’s my beginning, what’s my middle, what’s my end? What is my key takeaway? And maybe even get some Post-it notes out and say, “Okay, here’s all of the information I want to get across. What order do I want to get it across in?” So that’s one of the first things.
Isaac Reyes: And then just in terms of the narrative, being a little bit creative. As data scientists, we often will just fall back on the data and let the data speak for itself, all of that sort of thing. There’s really room for blowing the audience away with using more exciting narrative. So, as an example here, let’s say I was presenting data on the relative sales of the top three smartphone manufacturers over the last 10 years, let’s say. So we’ve got Apple, we’ve got Samsung, we’ve got Huawei. And I could present that in a way that just goes, “Okay, Apple led in Q1 of this year. And then Samsung gained share in this quarter.” And I could keep going that way but if I was presenting that to a less formal audience… Well let’s say it’s the business setting, but I’ve got scope for kind of being a bit creative, I might walk on stage and say, “In the beginning there was the iPhone,” or I might say, “In the beginning Jobs created the iPhone and he saw that it was good.” And so you’ve got your chart that shows the initial phases of Apple’s journey back in 2007. Next slide you might say, “And Samsung also saw that the iPhone was good. So they said let there be a Galaxy.”
Isaac Reyes: And so you’re kind of tying in like a theme from a book or story or song from popular culture into your data story to just make it more entertaining for the audience. And then the third phase, now you’re showing the growth in Huawei. Next slide and you say, “And Huawei said, let us make a phone in their image after their likeness.”
Kirill Eremenko: Nice.
Isaac Reyes: Yeah. So, yeah, that’s just some ideas around… Yeah. Tying in more engaging language… That’s a fairly extreme example. I probably wouldn’t do that for a CFO or something, but it’s just an example of you’re tying in something from popular culture into your stories.
Kirill Eremenko: And it ties back into the point that you mentioned at the very start. You’ve got to make sure people feel something, right? By making it entertaining, you’ll make sure people are having fun. Like I’ll give you an example. One time I was presenting a very kind of technical solution to a mailing campaign we did at an industry fund, like basically, like a pension fund in Australia, an industry fund and 200 people of executive managerial level… It was not a data science conference by any means. It was a conference for pension funds but we just had a really cool success there. And so the way I was like… Like I didn’t even know like the things you’re sharing here today, but kind of intuitively I felt that if I just present what we did in the way we did it, everybody’s going to fall asleep.
Kirill Eremenko: I shifted the paradigm. I love that. I love that phrase. I shifted the paradigm and I said I’m going to present it in a different way. I will start the presentation by saying, “I’m going to show you what it’s like to be a data scientist. Most of you don’t even have data science teams in your companies. And we had some massive success. We were able to reduce like a backlog from 45 days down to like three days using data science. Like this year compared to other years. And what I’m going to show is like how a data scientist thinks and what kind of work we do.” And so I started the presentation like that and already got them on the edges of their seats. Like feeling that, “Oh, I’m going to be like a data scientist.” And so then I walked them through the technological aspects in a simplified manner, but they were already hooked rather than just listening to this boring stuff.
Kirill Eremenko: They felt like they were living this story, like when you watch a movie, you associate with the character, right? So here I got them to associate with my journey, to associate with the decisions and fears and frustrations that I went through and like they felt really emotional at the end of the day. It was, I don’t know, maybe one of the only talks that actually got like an ovation at the end.
Isaac Reyes: Love that.
Kirill Eremenko: Like everybody was super pumped about it and like our company got so many questions after, like, how do you build a data science team? What is data science? And so on. Because everybody was like hooked. They loved the whole journey.
Isaac Reyes: That’s it Kirill. And it’s because you got them to feel something. When you talked about your own frustrations and successes in data science, that’s coming from the heart and they would have felt it as well as you relayed the stories. That’s great.
Kirill Eremenko: Fantastic. Fantastic. Well we’re actually almost out of time, Isaac. Thank you so much for coming on the show. I had a lot of fun and I think we could keep talking about this forever. There’s so many tips and hacks about all this stuff. So before I let you go, are there any materials you might have that are publicly available on StoryIQ that people can reference and like learn more from? And also, where are the best places to get in touch with you or find you if somebody wants to learn or ask you some questions, or maybe there’s some companies that are in need of getting this training into their organization?
Isaac Reyes: Sure, no worries. Yeah, so I guess as a first step for next steps with learning, I would say my co-founder Dominic, his TED talk just went up on YouTube literally last week. So if you go into YouTube and type in Dominic Bohan, that’s B-O-H-A-N, TED talk, you’ll see his TED talk on data stories. And then in terms of getting in touch, I’m nowhere near my LinkedIn limit, so feel free to just add me on LinkedIn, happy to connect with any listeners, and then you can reach us through StoryIQ.com.
Kirill Eremenko: Fantastic. Fantastic. Thank you once again so much and one final question for you. What’s a book that’s changed the course of your career or the trajectory of your life that you can recommend to all listeners?
Isaac Reyes: Yeah, great. I would recommend Stephen Few’s book, Show Me the Numbers. I normally don’t recommend it when people ask me for a book, but since this is a data science audience and more technical, we’re used to getting through really dry heavy books and so you will be able to get through this one, and it’s just packed with so much applicable data viz and data storytelling knowledge. So yeah, Stephen Few, Show Me the Numbers.
Kirill Eremenko: Fantastic. Had the book mentioned a couple of times in the podcast already, so…
Isaac Reyes: Oh, awesome.
Kirill Eremenko: Yeah, make sure to check it out guys and girls, Stephen Few, Show Me the Numbers. On that note, Isaac, once again, thanks so much for coming on the show. Really enjoyed our chat and I look forward to connecting in person one day.
Isaac Reyes: Thank you so much, Kirill. I really appreciate the research you put into this interview as well. It was just a pleasure talking to someone who’s really done their pre-research, so, thanks so much.
Kirill Eremenko: Cheers, thank you.
Kirill Eremenko: Thank you ladies and gentlemen for being a part of our conversation today with Isaac, I definitely enjoyed myself and got to learn a lot of new things about data visualization and data storytelling. It’s always a great pleasure to talk to somebody who’s dedicated their whole career to focusing on one space and developing that and not only developing themselves, but also teaching other people and coaching others. My favorite takeaway from this podcast was the Cleveland and McGill ranking of elementary perceptual tasks. You never actually think about it, some of these things might feel intuitive sometimes, but until you have scientific backing, until you have this evidence that people perceive line charts better than bar charts, bar charts better than stacked bar charts, stacked bar charts better than area related charts, area related charts better than color related charts.
Kirill Eremenko: Until you have that scientific explanation of it, you’re still prone to making mistakes. I think now that we’ve covered this and especially if you go and research this a bit further, whether it’s the Cleveland and McGill system or the Gestalt laws, I think that can give you a really strong boost in your visualizations. And I think that’s what this podcast was: an invitation to explore this space further. Of course we couldn’t have covered everything in an hour, but already you can see the value and if you do then I highly encourage you to explore this space further. We’ll link to all the materials necessary for you to proceed in the show notes as usual. On that note, make sure to connect with Isaac. He’s on LinkedIn and also on storyiq.com, that’s where you can get in touch with him. Whether you want to ask him some questions about visualization or your company is looking to get expert storytellers or develop expert storytellers in house, Isaac and his team can definitely help with that.
Kirill Eremenko: As usual, all of the materials mentioned on this podcast as well as the transcript are available in the show notes, which are located at www.superdatascience.com/329. That’s www.superdatascience.com/329. We spoke with Isaac after the podcast and there’s a possibility that he or somebody else from his team will be joining us at DataScienceGO 2020 in November. That’s not 100% yet, but it gives you a feel for the caliber of people that we’re inviting to speak at DataScienceGO, so if you’re able to make it, would love to have you there. And as usual, if you know anybody in the space of data science who is interested in storytelling, who needs to get better or is trying and striving to get better at storytelling, then send them this episode. It’s very easy to share. Just send the link www.superdatascience.com/329 and maybe you will change the trajectory of somebody’s career and help them become better at data visualization.
Kirill Eremenko: And on that note, thank you so much for being here, my friends. I can’t wait to see you back here next time. And until then, happy analyzing.