SDS 231: Data Visualizers: The Storytellers of Data Science

Podcast Guest: Mollie Pettit

January 31, 2019

Data is used for many things in the private sector. But when can we use data for good?

Mollie Pettit is a data visualizer who is passionate not only about telling significant stories with data, but also about using it to make a difference. Tune in to hear how!
ABOUT MOLLIE PETTIT
Mollie Pettit is a freelance data visualizer based in Chicago. She focuses mainly on interactive data visualization projects. She worked for Datascope Analytics for several years before starting her own LLC for data science and data visualization. She speaks at several data science conferences a year.
OVERVIEW
The topic this week is an exciting and unique one. When you get data…how can you convey what it’s saying? And how can you show people that it matters?
Data visualization is the storytelling of data science. You can mine data, scrape it, have it all available, but how do you know what the data actually means? That’s where data visualizers come in. Mollie Pettit is a prolific data visualizer who works specifically in interactive data visualizations as a freelancer, working with clients such as the Illinois state police to gather and visualize data about the possibility of racial profiling in traffic stops.
We dive into Mollie’s work history, her time with Datascope Analytics in Chicago before moving into freelance work, and her many speaking engagements at conferences throughout the year, before moving on to some project specifics and her case study.
Mollie’s case study on the possibility of racial bias in Illinois traffic stops is a fascinating one. She looked at the stop rate between populations of races. However, that doesn’t tell the whole story. To better visualize the data gathered, she pointed out places where the accuracy can falter when various variables are taken into account (the percentage of a population that drives, the accuracy of a census response, and other factors). To combat this, Mollie went deeper, looking at further data, such as the percentage of stops of black drivers that resulted in a search, to get an idea of whether true racial disparities exist within the data. Ultimately she concluded that black and Hispanic drivers are searched 3x more than white drivers, while black drivers have a lower “hit rate” (possession of contraband) than white and Hispanic drivers.
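To make the two measures concrete, here is a minimal Python sketch of how a search rate (searches per stop) and a hit rate (contraband found per search) could be computed and compared across groups. The group labels and every count below are invented purely for illustration. They are not the Illinois figures, and this is not necessarily how Mollie ran her analysis.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    stops: int      # traffic stops recorded for the group
    searches: int   # stops that led to a search
    hits: int       # searches that found contraband


def search_rate(c: GroupCounts) -> float:
    """Share of stops that resulted in a search."""
    return c.searches / c.stops


def hit_rate(c: GroupCounts) -> float:
    """Share of searches that found contraband."""
    return c.hits / c.searches


# Hypothetical counts for two placeholder groups; NOT the Illinois data.
counts = {
    "group_a": GroupCounts(stops=10_000, searches=300, hits=60),
    "group_b": GroupCounts(stops=10_000, searches=100, hits=30),
}

for name, c in counts.items():
    print(f"{name}: search rate {search_rate(c):.1%}, hit rate {hit_rate(c):.1%}")

# How many times more often group_a is searched per stop than group_b.
ratio = search_rate(counts["group_a"]) / search_rate(counts["group_b"])
print(f"group_a is searched {ratio:.1f}x as often per stop as group_b")
```

Comparing searches per stop, rather than stops per resident, sidesteps the question of how many people in each population actually drive. The hit rate then shows how often those searches turned anything up.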
Mollie is a big believer in learning project-to-project. She believes in picking a field in data science based on the projects you want to do and not the other way around. Likewise, she’s big on finding the tools that fit your project needs and experimenting with how you tell the story of your data.
IN THIS EPISODE YOU WILL LEARN:
  • The benefits of attending data science conferences. [07:14] 
  • The differences between data scientist and data visualizer. [08:32] 
  • What tools are best for your project? [11:40] 
  • What is D3.js? [17:52] 
  • D3.js’s place in the future of visualization tools. [26:11] 
  • Mollie’s case study on racial bias in Illinois traffic stops. [33:07] 
  • Finding significance in statistics. [50:00] 
  • The ethical application of data. [53:00] 
  • “It’s hard to fix problems when you don’t know what the problems are. And it’s hard to know what the problem is when you don’t have data.” [54:50]
EPISODE TRANSCRIPT

Podcast Transcript

Kirill Eremenko: This is episode number 231 with Data Visualizer, Mollie Pettit.

Kirill Eremenko: Welcome to the SuperDataScience Podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. Each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
Kirill Eremenko: Welcome back to SuperDataScience Podcast, ladies and gentlemen. Super excited to have you on the show today. Today we got a very exciting, lively, and energetic guest joining us for the episode, Mollie Pettit. Mollie was one of our speakers at DataScienceGo 2018. She did a fantastic job. The audience totally loved her presentation and what you need to know about Mollie is that she is a data visualizer. A professional data visualizer. And right now you might be wondering why am I stressing that she’s a data visualizer and how is that different to a data scientist. Well, in this episode you will find out exactly why and how those two terms are slightly different.
Kirill Eremenko: Also in this podcast we will talk a lot about D3.js a JavaScript Library for creating outstanding, phenomenal, mind blowing visualizations for data science projects. You’ll find out exactly when to use D3, exactly when not to use D3, and what are the advantages and disadvantages of this tool. Mollie uses D3 quite a lot.
Kirill Eremenko: Also in this podcast you’ll get a case study. We’ll discuss one of Mollie’s case studies which is about Illinois traffic stops and police officers pull over people and what kind of biases may exist there, may not, and how she went about exploring it. I will provide a link where you can actually look at this project as you listen to this podcast or after you listen to this podcast.
Kirill Eremenko: And finally, we’ll talk about using data science for good and how Mollie participates in those projects and how you can get involved as well. So, a podcast saturated with lots of topics, lots of interesting things that we’re going to discuss. Can’t wait for you to check it out. Without further ado, I bring to you Mollie Pettit, a professional data visualizer.
Kirill Eremenko: Welcome to the SuperDataScience Podcast, ladies and gentlemen. Today I’ve got a super exciting guest on the show with us, Mollie Pettit. Mollie, how are you doing today?
Mollie Pettit: I’m doing great. How about you?
Kirill Eremenko: Doing well as well and how’s Chicago these days? That’s where you are today, right?
Mollie Pettit: Yeah, that’s right. Chicago is great. A little bit chilly right now opposed to where you’re at. It just started to snow.
Kirill Eremenko: Wonderful. Wonderful. So, you haven’t always been in Chicago, right? You moved there a few years ago.
Mollie Pettit: Yeah, that’s right. I moved here three years ago. Before that I was actually living in Abu Dhabi for a few years and before that California for grad school.
Kirill Eremenko: Wow. Such a crazy story. ‘Cause we met at DataScienceGo and as much as I wasn’t able to attend your talk, but I just watched it on the DataScienceGo recordings and you really have a crazy story, like how you went into geology and then visualization and things like that. So, I’m really excited to dive into this and learn about and share with all our listeners.
Mollie Pettit: Sure.
Kirill Eremenko: Before we get started on that, tell us a bit about who is Mollie Pettit. Like, what would you say to somebody you meet for the first time? How would you describe what you do professionally right now?
Mollie Pettit: Who is Mollie Pettit? Yeah. So, I am a freelancer. I do data science and data visualization. I do a variety of projects. Nowadays my focus is mostly on doing a lot of interactive data visualization projects, but I still do … but a lot of it sometimes involves analysis before the visualization. Then others are just straight up data science analysis projects. So, that’s a lot of what I do.
Kirill Eremenko: Okay. Okay. Wonderful. So, you moved to Chicago for your job, is that correct?
Mollie Pettit: I did, yeah.
Kirill Eremenko: Okay. So, do you mind sharing with us what company do you work for right now?
Mollie Pettit: Sure. So, right now I actually just work for myself. I started a one-person LLC, which is not creatively named: Mollie Pettit, LLC, for the moment. I actually originally moved to Chicago to work with Datascope Analytics, who I worked with for a couple years, a data science consultancy, which has actually since then been bought by, well, acquired by IDEO. So, Datascope Analytics became the data science team at IDEO and it’s growing.
Kirill Eremenko: Okay. Okay. Cool. I didn’t know that you started your own business. How’s that going? How’s the Mollie Pettit, LLC going?
Mollie Pettit: Yeah, it’s going well. There’s a few things that are about to be live which are exciting and kind of more projects coming up, about to be started. It’s going well. It’s nice. It’s enjoyable to have the freedom to kind of have the hours that you want and work from where you like. I like that.
Kirill Eremenko: In the meantime you’ve also been very busy attending and speaking at conferences, right? How many did you attend this year? It’s crazy. Like, you were at DataScienceGo, Data Science Salon, the Tapestry, is it what tens, hundreds? How many did you attend?
Mollie Pettit: No. Not quite that high. I think I’ve been to maybe three or four this year. And I’m about to go to another one. So, I spoke at Data Science Salon in New York, Data Science Salon in Miami, DataScienceGo. I feel like maybe I’m missing something, but I can’t remember. Like, I spoke at Data Science Salon in LA, but that was the end of last year. Then I’ll be going to Tapestry in a couple weeks. So, I’m looking forward to that.
Kirill Eremenko: Okay. Nice. Nice, nice. What inspires you to speak at conferences? It obviously takes a lot of your time? Why do you do it?
Mollie Pettit: I think there’s a lot of really nice things about speaking at a conference. I think one, it gives you the ability to tell people about something that you’re working on, something that you’re really excited about. It gives you really good opportunity to meet a lot of other people who are also in data science or in data visualization or are looking to get into it. A lot of really interesting conversations. You get to learn about what other people are doing. Yeah, I think those are a lot of the reasons that I enjoy. Also, there’s that added benefit of just getting to travel. Travel around.
Kirill Eremenko: Yeah, that’s true. That’s true. And kind of like what you said, it broadens your horizons, helps you not just think outside the box, but sometimes in order to think outside the box we need some external input which is already outside the box in order to like start thinking like that.
Mollie Pettit: Sure. Yeah.
Kirill Eremenko: Cool. Well, I’m very excited to talk about, probably to start our conversation with what we were debating about just before the podcast: data science and visualization. Are these the same thing? Or two adjacent fields? I’d totally love and appreciate your opinion on that. Can you share with more of us why do you think or why is your position that data science and visualization are actually quite different areas? As far as I understand.
Mollie Pettit: I think there’s overlap and I think that I would say that data visualization is an important part of data science. But I think that when you start getting into interactive kind of front end of data visualization, which is a lot of what I do now, the reason that it is a bit different is because it requires the use of different tools and languages. For example, when I do data science I tend to use Python, whereas when I’m doing front end data visualization I use D3.js, which is a JavaScript Library.
Mollie Pettit: So, there is different languages being used and I think that there’s a lot of overlap. If there was a Venn diagram, they would definitely cross in the center, right?
Kirill Eremenko: Uh-huh. (affirmative).
Mollie Pettit: Because something that’s really great about data visualization is once you’ve done data science and you have the interesting insights and you have these things that you want to then get across to an audience, which could be a massive public audience or perhaps it’s just an internal audience, data visualization is something that can then be used to tell that story really well. I think that having a data science background is very helpful in doing data visualization. But when you’re doing data visualization versus data science, you have just different focuses. With data science you’re trying to really uncover these interesting insights and if you’re doing EDA, for example. Whereas with data visualization, you are trying to display those insights in a way that it’s very easy to understand.
Kirill Eremenko: Gotcha. What’s EDA?
Mollie Pettit: Oh, exploratory data analysis.
Kirill Eremenko: Uh-huh. (affirmative). Okay. Cool.
Mollie Pettit: All right.
Kirill Eremenko: That’s okay. I actually also identify that visualization can be used for two things. That you can use it for, I call it visual data mining, VDM.
Mollie Pettit: Oh, for sure.
Kirill Eremenko: And the other thing is obviously presenting your insights and creating these beautiful visualizations.
Mollie Pettit: Yeah.
Kirill Eremenko: And I like how in your talk you mention what D3 is good at and before you describe what it’s good at you actually said what it’s not good for. One of the things it’s not ideal for is when you want to do that exploratory data analysis. When you want to quickly put something together, identify what are the insights, what are the trends. It doesn’t have to be attractive. It doesn’t have to be super presentable. Just get some quick insights from the data.
Mollie Pettit: Yeah. Yes, exactly. Like you said, there are multiple reasons to do data viz and some of them are much more tied into data science like using data visualization for this exploratory aspect. Were you wanting me to get into what D3 is and what it’s not good for and what it is good for?
Kirill Eremenko: That’s a good point. Yeah. Let’s do that, because I think we’ve heard D3 on the podcast before by some speakers especially with Nadieh coming to the podcast. We talk about Nadieh Bremer here. Yeah, give us a guide. It looks like D3’s a tool that’s used most often. Is that about right?
Mollie Pettit: Yeah, it’s a really popular tool and there’s a few reasons for that. D3’s a little bit more complicated. It has a more steep learning curve than some other tools that someone might use. For instance, people sometimes might use a wrapper that will allow them to still use Python to create some of these visualizations, but the benefit of using D3 itself is that it is really flexible and customizable and you can make these visualizations do exactly what you want with a lot of different interactions. Hover and click and various things like that. So, it’s extremely customizable. It lets you tell the story that you want to tell.
Kirill Eremenko: I love D3 myself. I tried it when I was back in Deloitte we had an option of picking a tool for a project and we didn’t end up using D3, because it was too complex, but nevertheless, my director and I we decided to have a challenge who can learn D3 the best in like two or three weeks it was. And we had to come up with a visualization. It was really fun. D3 is kind of like working with the webpage. 
Mollie Pettit: Yeah.
Kirill Eremenko: On a webpage you right click and we’ve all probably done this back then. You right click and click “view page source” and you look into the HTML and see the sets and so on. So, D3 actually manipulates all of that dynamically to place different objects on the screen and so it’s really cool because it’s so structured. Even though it’s a programming language, it’s so structured in the way that HTML is so structured. I found it fascinating. You’re right, it has steep learning curve, but it’s so fun to try to do that because instantly you get feedback, right? You see a rectangle on your screen and then all of a sudden it turns into a circle. It all happens dynamically that whole library.
Mollie Pettit: Sure.
Kirill Eremenko: So, smooth. I like the smoothness of it.
Mollie Pettit: Yeah, it is. It’s a steeper learning curve, but once you’ve gotten over that hump, you’re able to do so much.
Kirill Eremenko: Yeah. That’s true. True. When did you first encounter D3?
Mollie Pettit: Actually, I have a question for you. Did you enter that challenge and how did you do?
Kirill Eremenko: Oh, yeah. It was just my director and I and nobody else wanted to join because it was too complex apparently or something. He was visualizing some client data about trains or something like that and I was visualizing … see what I did was I took our team, it was like we had 15 people on the team or 12 or something, and I got the data internally about the billable hours, how much hours they’re billing and how much hours they’re spending on training and how much hours they’re spending on something else, like admin work. And I put those into like … and I called it the Pie Factory, because I created a pie chart for every person. And you could like click on it and all this information would pop up. You know, what clients they’ve been working on, how much money they’ve billed. You had to really put into perspective how much money everybody’s bringing into the business.
Kirill Eremenko: Personally, I think I won, because I finished mine on time even though it was simpler than his. I finished on time, but his was more complex and it was very … also had some cool dynamic visualizations in there. It was great fun in there. This was something I found in your talk very interesting. At the end actually, you got some questions and one of the questions was: how do you learn the tools? How do you choose what to learn? And what you said was that you don’t actually pick the tools you want to learn; you pick the project you want to do. Like a pet project or a work project and then you find along the way you just decide or you see what you need, what tools you need to accomplish the task at hand and you actually go and learn those tools as you’re doing a project. I thought that was amazing advice.
Mollie Pettit: Yeah. I think really often people when they get a new project or task that they’re going to try to tackle they think about, “Okay. Well, what do I know that can help me tackle this?” But I think it’s nice and better to go at it in terms of what’s the best way that this can be tackled? Do I know how to do that yet? If I don’t, maybe is this a good opportunity for me to learn that thing to tackle this problem?
Kirill Eremenko: Yeah. And you also mention in your talk that … what was that company, Datascope that you worked for?
Mollie Pettit: Datascope, yep.
Kirill Eremenko: Yeah, in Datascope that they had their philosophy. It’s if you have a project, you need to use the best tool for that project as opposed to a tool that might be good enough that you know really well. So, even if you know five tools that might be good enough, maybe you should use the one that’s the best. If you don’t know it, doesn’t matter. Go learn it. I love that.
Mollie Pettit: Yeah. It’s a good opportunity to learn it. That was something I really enjoyed about working at that company. I think that it’s easy to have the other mentality of I’m gonna do what I know and I think working there really kind of got that out of me and got me to a point where I felt way more comfortable being like, “Oh, yeah. I don’t know this thing. Let’s figure it out.”
Kirill Eremenko: Yeah and that should be the mentality of a data scientist, right?
Mollie Pettit: Uh-huh. (affirmative).
Kirill Eremenko: Like, constant curiosity. Anyway, let’s jump back to D3. So, what is D3? What does the abbreviation stand for? What is triple D?
Mollie Pettit: Yeah, D3 stands for data driven documents.
Kirill Eremenko: Okay and what does that mean?
Mollie Pettit: Data driven documents. Data is what you’re going, you know, the data that you’re going to be putting into some sort of visualization. Documents is your web document. So, your website. Driven would just be the act of I guess putting that into the website. So, using data to make stuff on the web.
Kirill Eremenko: Nice. Nice.
Mollie Pettit: Is basically what that means, yeah.
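[Editor’s illustration] The “data in, document out” idea can be sketched in a few lines. This is not D3 itself, which is a JavaScript library working against the live page. It is only a rough Python analogy that turns a list of numbers into SVG markup, one rectangle per value. The function name `bars_to_svg`, its parameters, and the sample values are assumptions made up for this sketch, and it assumes a non-empty list of positive numbers.

```python
def bars_to_svg(values: list[float], bar_width: int = 20, height: int = 100) -> str:
    """Build an SVG string containing one <rect> per data value."""
    top = max(values)  # the tallest bar fills the full height
    rects = []
    for i, v in enumerate(values):
        bar_h = round(height * v / top)
        # SVG's y axis points down, so a bar starts at (height - bar_h).
        rects.append(
            f'<rect x="{i * bar_width}" y="{height - bar_h}" '
            f'width="{bar_width - 2}" height="{bar_h}"/>'
        )
    width = bar_width * len(values)
    body = "".join(rects)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{body}</svg>'
    )


# Hypothetical data; change the numbers and the document changes with them.
svg = bars_to_svg([4, 8, 15, 16])
print(svg)
```

Saving the printed string to a file with an .svg extension and opening it in a browser should show a small bar chart. D3 does the same kind of mapping dynamically, which is what makes hover, click, and transitions possible.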
Kirill Eremenko: So, when was the first time you encountered D3?
Mollie Pettit: I think the first time I encountered D3 was early on at Datascope, actually. So, when I first [crosstalk 00:18:41].
Kirill Eremenko: Was it a project?
Mollie Pettit: No. So, when I first started at Datascope, they used to have this set up where when somebody was new at the company rather than going right onto a client project, they would have an opportunity to do a pet project. They would dabble, they would kind of slowly get involved in client projects, but this kind of gave them an opportunity to get settled then to learn something new that they wanted to learn. So, when I first started I decided to do a pet project that was a network app, this web app that would be a network diagram of Star Trek characters, because I am a Trekkie. So, I scraped every single Star Trek episode transcript and movie transcript and put together this app where people could select any combination of episodes and movies and hit “engage” and a network diagram would appear using D3 that would show the connections between the various characters in that selection of episodes and movies.
Kirill Eremenko: Wow. Wow. Very nerdy.
Mollie Pettit: Very nerdy, yeah. And then once that diagram appeared people could click on a node to focus on it and have it highlighted and its connections and choose particular characters they were interested in. So, it was fun.
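[Editor’s illustration] The data preparation behind a network diagram like this can be sketched roughly as follows. The transcript does not say how a “connection” was defined, so this sketch assumes two characters are connected whenever they both speak in the same scene. The episode names and scene contents are toy data invented for the example, and `build_edges` is a hypothetical helper, not code from the actual project.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: episode title -> list of scenes,
# each scene being the set of characters who speak in it.
episodes = {
    "episode_1": [{"Kirk", "Spock"}, {"Kirk", "Spock", "McCoy"}],
    "episode_2": [{"Spock", "McCoy"}, {"Kirk", "Uhura"}],
}


def build_edges(selected: list[str]) -> Counter:
    """Count shared scenes for every character pair in the selected episodes."""
    edges: Counter = Counter()
    for title in selected:
        for scene in episodes[title]:
            # Sort names so each pair is always stored in the same order.
            for pair in combinations(sorted(scene), 2):
                edges[pair] += 1
    return edges


edges = build_edges(["episode_1", "episode_2"])
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Each key in `edges` is a link between two nodes and the count is its weight, which is roughly the shape of input a network layout needs. Changing the list passed to `build_edges` mirrors selecting a different combination of episodes in the app.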
Kirill Eremenko: So, how long did that take you?
Mollie Pettit: The actual visualization part I’m not sure. The whole project took a couple of months, but that was … I mean, I was not just doing that. There were other things happening at the same time. That also though involved a lot of things in preparing the data to be visualized. Like the scraping of all the transcripts and getting everything set up in such a way that it would be usable in a visualization. So, there was a lot of different steps for that project.
Kirill Eremenko: Gotcha. What I love about approaching that is by the end of those, it sounds like quite a lot, a few months, by the end of those few months, you have a super brand new skill. You might not be the expert at D3, but you know that there’s certain things that you know how to do. Like, in three months you might be 70% up to speed or 80% up to speed with what D3 is all about and how to use it. So, you build up so much confidence in that time, wouldn’t you say?
Mollie Pettit: Yeah. Yeah, for sure. It was definitely a great introduction to D3 and also I mean, I hadn’t even actually done a huge amount of web scraping at that point, so that also was a very good crash course in that, because these were not straightforward set up sites. They were very inconsistent. So, there was a lot of exceptions to account for.
Kirill Eremenko: Okay. Gotcha.
Mollie Pettit: So, that was good to do. There’s a lot of different things that I had to do for this project, so I learned a lot along the way.
Kirill Eremenko: You were kind of like in both fields. You are both in data science and you’ve done data science work and you’re in visualization. As I understand you’re doing more and more visualization work now.
Mollie Pettit: Yes.
Kirill Eremenko: Why the shift? Why did you decide to move away from the data science, I guess the web scraping, the algorithms and so on and move more into the space of visualization?
Mollie Pettit: It’s not because I don’t enjoy data science, I do. And I still enjoy that I still get to do it when I’m doing data visualization projects sometimes and I like having the occasional straight up data science project, but I think the reason I like to focus on data visualization is honestly I just find it really fun. I really enjoy creating this ability to tell stories really well. An ability to highlight things that are really interesting and also coding when you’re creating something in D3 for instance, you know, you write a few more lines of code and you hit “refresh” and you get to see this new thing that you added. So, that’s really nice too.
Kirill Eremenko: Yeah. More room for … it’s kind of like quicker feedback. You get the results faster.
Mollie Pettit: Yeah. Yeah.
Kirill Eremenko: Rather than waiting a few months. Okay. All right. Would you recommend this path to data scientists? Maybe listeners who are tuning into this podcast who are not yet sure if they want to do data science, visualization, how would somebody make up their mind of which way they want to go?
Mollie Pettit: Pick a project and do it. That’s the best way I can ever think of to figure out if you like something. I think that if people really enjoy kind of the visual and design aspect but still want to use some data science I think in order to understand which way you want to go, you really just have to pick some projects and do them. I think that’s how I learned what direction I wanted to go every kind of step of the way is I just kept doing things. I kept learning new things and once I started kind of getting into D3 and visualization I realized I really loved it. I started … well, I, while still at Datascope, started asking to be on more visualization projects and by doing more and more of them I realized I just really liked that and I kind of started focusing more on that direction. I think the way to know if you like something is to do it.
Kirill Eremenko: Gotcha. I can see that D3 and from my experience with it and from the visualizations I’ve seen … there’s, by the way, there’s a really cool library by Michael Bostock. It’s called, what is it called? Blocks. Bl.ocks. Or something like that?
Mollie Pettit: Oh, yeah. The website. Yeah.
Kirill Eremenko: Blocks.org. Like that, but it’s like bl.ocks.org or something like that.
Mollie Pettit: Yeah.
Kirill Eremenko: We’ll put it in the show notes. There are some really amazing D3 visualizations and templates that you can use and copy and adjust and just explore, all open source. So, I can see that D3 is way ahead in terms of capabilities compared to other tools. Like, even Tableau, which I love dearly, great tool, but it’s more agile. It’s more drag and drop. It allows you to create visualizations fast, but at the same time even though it has a lot of flexibility, it’s nowhere near what D3 offers. The price you pay in D3 is you have to code. You have to design your visualization –
Mollie Pettit: Right, yeah.
Kirill Eremenko: – very carefully. So, what I want to ask you is, what do you see in the future? Do you see that D3 has a future? It’s been around for a couple of years and it’s had a really interesting path, but do you see other tools edging it out and more people moving to tools like Tableau and more drag and drop, self-serve analytics type of tools? Or do you see that there is a market, there’s a place for more sophisticated tool like D3 in the space of data visualization?
Mollie Pettit: Yeah. I think that there’s room for both and I think they have different applications and different reasons to be used. Like you said Tableau is really great and something that’s nice about it is you don’t have to learn a whole language. Yeah, you don’t have to code. You can very quickly make some really beautiful things. Because you’re not actually coding though, you have less control. So, if you’re trying to do something very complex, you may eventually kind of hit a roadblock and hit the end of the capabilities of being able to customize the way you want. D3 is more complicated to learn and is harder to learn, but it is much more customizable and flexible and you are able to customize things in the way that you want. You don’t really hit these roadblocks that you might hit with Tableau.
Mollie Pettit: So, I think that they both are very great and they have different strengths and different weaknesses. So, I think they’re both going to stick around.
Kirill Eremenko: That’s good, because in one of the previous podcasts I had one of the guests made a good comment that it’s important to understand also what is the future of a tool before you go and learn it. You know, like is this tool going to be around?
Mollie Pettit: Sure.
Kirill Eremenko: And by the sound of it, D3 is going to be around. But by the way –
Mollie Pettit: Yep. That’s how I –
Kirill Eremenko: – how is the community of D3?
Mollie Pettit: Sorry. How’s the community?
Kirill Eremenko: Yeah, is there a community in D3? People, like when you have a question or somebody has questions, do they post it online and is it easy to get answers and help and guidance?
Mollie Pettit: Oh, yeah. That’s a good question. So, one thing that’s really nice is Bl.ocks, which you’ve mentioned. Which is a lot of times if you have something that you’re trying to make, especially when you’re first starting, you can often find an example for it in Bl.ocks. So, what Bl.ocks is really nice for is you not only get to see this interactive visualization right in front of you, but the whole code is right below it. There’s also, let me make sure I have this right, blockbuilder.org. And something that’s nice about blockbuilder.org is you can access any of the posts that are posted on Bl.ocks, but it allows you to, right there, edit them, and what that’s good for is –
Kirill Eremenko: Nice.
Mollie Pettit: Yeah. What that’s good for is, let’s say you’re looking at some code and you’re like, “Hm. I’m not sure exactly what this line does.” And you can edit it and see if you break it or see if the color does change. You know, you can do things straight in there to very quickly get an understanding of what things are doing. So, that’s really nice and then also I don’t know if you’ve actually, have you heard of Observable?
Kirill Eremenko: Nope. No, I haven’t heard of it. What is that?
Mollie Pettit: So, Observable. It’s kind of like a Jupyter Notebook, but for D3.
Kirill Eremenko: Oh, nice.
Mollie Pettit: So, Observable is a website and it was also started by Mike Bostock. But yeah, it has that kind of set up where you can easily kind of like tell a story, but then within that story have code and have a working, interactive visualization in the middle of it. Very much like a Jupyter Notebook, but specific for kind of front end interactive stuff.
Kirill Eremenko: Wow.
Mollie Pettit: Yeah, so I think that there are some, like you can definitely find some D3 answers on Stack Overflow, but I think something that’s really nice about D3 is you can also just find a lot of examples. So, even if you can’t necessarily find someone who’s asked the same question, you can probably find someone who’s done the thing you’re trying to do.
Kirill Eremenko: Gotcha. Gotcha.
Mollie Pettit: Yeah.
Kirill Eremenko: There’s even a conference in San Francisco about D3, right?
Mollie Pettit: There is, yeah. There’s D3.unconf. The last one was last September. It didn’t happen this year. But it will … I’m pretty sure it’s gonna be happening next year. I’m not involved in planning that, so I don’t have specific details. As far as a community, there’s also a D3 Slack that I’m a part of that has upwards of, let’s see, I’m looking at it now, about four thousand members. There’s a help section in there. So, sometimes people will post in there and say, “I’m trying to do this thing, but it’s not working. How do I do it?” And people will respond there.
Kirill Eremenko: Gotcha. Gotcha. What’s an unconference, by the way, while we touch on this?
Mollie Pettit: That’s a good question. I can tell you a little bit about what it was. So, it’s a lot less, at least this particular unconf, it wasn’t full of talks. So, there was only one or two talks. I believe they were done by Nadieh, Nadieh Bremer, as well as Sarah Drasner. Those were at the very beginning of the unconf. The rest of it were these discussion sessions where there would be maybe four different discussions going on at the same time and you would choose a room to go to and you would discuss that topic. Sometimes that would involve someone being at a computer and kind of pulling up things that people were talking about that were either D3 related or just visualization related. It was just kind of these guided conversations and a way for people to kind of meet other people who were doing a similar thing. So, it was less talks and more discussion.
Kirill Eremenko: Wow, interesting. And how big was this discussion? Was it like hundreds of people?
Mollie Pettit: Not in each discussion, no. I’m not even … I’m trying to think how many people were there total. Probably within a couple hundred total and each discussion probably had upwards of 50 or so people in it.
Kirill Eremenko: Interesting. Interesting. I first heard about unconferences from Pablos Holman who was at DataScienceGo as well. I have never been to one, but I find it’s quite an interesting concept. I gotta check it out.
Mollie Pettit: Yeah. I really enjoyed it. It was my first unconf, but it was great.
Kirill Eremenko: Okay. All right. Cool. Well, thanks for that overview of D3 and the future. I hope all the listeners are pretty excited and I can personally vouch for it. It’s a really fun experience. I don’t use it anymore, but what I learned in the process of learning it really was fascinating and helped me even improve the way I understand websites. The way I understand interactivity and what’s possible with visualization.
Mollie Pettit: Yes. It definitely improves that knowledge, for sure.
Kirill Eremenko: And next I wanted to talk a bit about the case study that you shared with us at DataScienceGo.
Mollie Pettit: Oh, sure.
Kirill Eremenko: The case study of Illinois traffic. I found that very interesting how like policemen pull over people and you were actually investigating whether there’s bias, specifically racial bias in how police officers pick the cars that they pull over, the cars that they search, and then the citations that they hand out. That was a really cool project. How did that all start?
Mollie Pettit: The way that started was I went to a meeting and I don’t remember what the meeting was called. This was I think a little over a year ago. This meeting was for people in tech who wanted to use their knowledge and use what they could do to help in some way. So, the people that were at this meeting were people in tech who wanted to find some way to volunteer and help out and then also organizations that wanted that help.
Mollie Pettit: So, at that meeting I ended up meeting Karen Sheley or Shelley. I would like to check on that. So, at that meeting I met Karen Sheley who works for the ACLU. She had mentioned that she really needed or at least really wanted to have some sort of a data contact, because they were trying to put together a traffic stops report that would just go through the analysis of this traffic stops data. Who police are pulling over and searching and citing, et cetera, in different law enforcement agencies across Illinois. What they were really just looking for is somebody that they could call on for help. Like, if they had questions about the analysis or the data. I was there with a colleague and we were like, “We can do more than that. We can help with the analysis.” The people who were doing the analysis at the time, it was mostly some simple Excel stuff that was being done. We wanted to kind of help them do something more complicated with this so that they could have an even more in depth report.
Mollie Pettit: So, we worked with them to do this analysis and look at the search rates, et cetera across different agencies. Then it eventually evolved and I started working with them on a website that would walk people through this analysis that had been done and they could look at these data visualizations that would be interactive. They could choose different agencies. They could click on things and get more information and it would really tell the story of what these racial disparities in traffic stops look like in different agencies.
Kirill Eremenko: Gotcha. You mentioned that the website at the time when we were recording this is not yet live, but it’s about to go up. So, by the time this recording is live, it’s definitely out there already. What’s the website? Where can people go maybe right now while they’re listening to this podcast?
Mollie Pettit: Yeah. If you go to illinoistrafficstops.com, you’ll be able to find it.
Kirill Eremenko: Nice. Nice. So, it’s similar to the visualizations that you shared at DataScienceGo, right?
Mollie Pettit: Yes. I shared some of the visualizations at DataScienceGo. Yeah, I think I had a bit of a Chicago focus, but on this particular website you can look at any of the agencies.
Kirill Eremenko: Okay. Fantastic. All right. So, that’s how you guys met and that’s what you helped them or decided to help them out with. So, how did the project go? So, like you got this idea, then what happened? Like, was this part of … you obviously had a job at the same time. So, this was like a free time project that you were doing?
Mollie Pettit: Yeah. So, when I first started it was. It was a free time project. It was something I was doing when there was time. But actually something that was really nice is I was able to incorporate it into Datascope at the time. So, as a consultancy, sometimes there is downtime, right?
Kirill Eremenko: Uh-huh. (affirmative). Yep.
Mollie Pettit: Sometimes you just finished a project and you’re gonna start another project in a week and you’re waiting for that to start. So, I convinced everyone there that we could kind of bring this in internally and when people had a down week if they wanted to work on this, they could. So, for a little bit it was kind of an internal project at Datascope. That was really great, because then we were able to utilize this time that would have been downtime anyway to do something that we thought was really exciting to work on and important. After the acquisition, one of my former colleagues at Datascope and I kind of kept up with it. Chris Kucharczyk. So, him and I have been the main people kind of working on it this past year. Then more recently a good friend of mine, Alex Alleavitch came on as a front end engineer.
Kirill Eremenko: Okay. Gotcha. All right. So, now we’ve got the picture painted and this is our… Super excited and impatient to find out what is this project all about, how did it go. So, tell us the starting point of the project. What kind of data do you have? Where does it come from? And then we’ll go from there.
Mollie Pettit: Sure. Yeah. So, the data that we have is whenever a person is pulled over in Illinois, the law enforcement officer is required to fill out a form and that form details information about who was pulled over, what was their gender and race, information from their driver’s license, why was that person pulled over. Once they were pulled over, did the officer search that person? If that person was searched, was contraband found or not? Then what was the result of that stop? Was that person cited? Or given a verbal or written warning? So, that’s the data that we’re working with. What the data looked like raw was one, you know, line of data for each stop that occurred.
Kirill Eremenko: Uh-huh. (affirmative). Uh-huh. (affirmative). Gotcha. And just to clarify, the officer had to guess the gender, the race of the person.
Mollie Pettit: Gender would be on the driver’s license, but the race they needed to guess, yeah.
Kirill Eremenko: Uh-huh. (affirmative). Okay. Gotcha. So, then you visualize that. Unfortunately, we can’t share the visualization on the podcast, but we’ll include a link to the website, the illinoistrafficstops.com, is that right? Is that the URL?
Mollie Pettit: Yes, illinoistrafficstops.com. Uh-huh. (affirmative).
Kirill Eremenko: Yeah, we’ll include a link in the show notes and people can check it out there. But basically you have this visualization of what different races the police officers would stop, and where do you go from there?
Mollie Pettit: The first thing that we looked at was who was stopped. We didn’t end up focusing on a stop rate metric though, because there’s a few reasons and I kind of talked about this in the talk, but some of the reasons why we decided not to do that was because it’s not a metric that’s very accurate, because if you were going to do a stop –
Kirill Eremenko: Tell us first of all, what is a stop rate? Like, I found that part of your talk very interesting, ’cause that’s the first thing I would jump at, right? You’re thinking through all these reasons that you mention just now, the stop rate is indeed the first thing that comes to mind. So, what is a stop rate? And then why did you decide not to go with that part?
Mollie Pettit: Sure, yeah. So, the stop rate would refer to the metric calculated by dividing a race’s stopped population by its driving population. So, of the drivers of a particular race, how often are they stopped is the stop rate. And … oh, sorry. Go ahead.
Kirill Eremenko: So, for instance, if you have let’s say, I don’t know, let’s say you have a hundred thousand white people in a city and over that period of time, over a year, or whatever period of time you’re looking at, if ten thousand white people are pulled over by police, then the stop rate would be ten percent. Ten thousand divided by a hundred thousand. But if you have let’s say 50 thousand African American people in the city and they were also stopped ten thousand times, then the stop rate there would be greater, it would be 20%. Ten thousand over 50 thousand. Is that right?
Mollie Pettit: Sure. Uh-huh. (affirmative).
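The stop-rate arithmetic in this exchange can be sketched in a few lines of Python. This is only an illustration using the hypothetical figures from the conversation; the function name `stop_rate` and its parameters are made up for this example and do not come from the project.

```python
def stop_rate(stops: int, driving_population: int) -> float:
    """Stops divided by the (assumed) driving population of a group."""
    if driving_population <= 0:
        raise ValueError("driving_population must be positive")
    return stops / driving_population


# Hypothetical figures from the conversation, not real data.
white_rate = stop_rate(stops=10_000, driving_population=100_000)
black_rate = stop_rate(stops=10_000, driving_population=50_000)

print(f"{white_rate:.0%} vs {black_rate:.0%}")  # prints "10% vs 20%"
```

The denominator is the weak point: the calculation only means something if the driving population is actually known, which is the assumption discussed next.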
Kirill Eremenko: Okay. So, that’s your stop rate. But this is the part I found really interesting. It’s not the best metric, because we unknowingly actually make some assumptions about these two data sets by calculating the stop rate. Can you tell us about these assumptions we make? Once you uncover them in the video I was like, wow, indeed this is true. That does make sense why it wouldn’t be so accurate. So, what would you suggest are the assumptions?
Mollie Pettit: In the talk that I gave, one of the first things I did was I kind of show the stops demographics of Chicago and then compare that to the stops demographics, or sorry, the population demographics of Chicago and show the differences there. So, what people often want to do is they want to take the population of a city and they want to assume that that’s the driving population and then create a stop rate from that, but there’s a few issues with that. One is that you don’t actually know what the driving population is of a city. You don’t know who drives to work. Maybe some people drive much further to work or take the train or walk or maybe people are driving through other cities in order to get to work. So, the driving population through a city might be very different or like a town, I think that’s a lot more relevant for small towns that the people who are actually driving through that town, that population might be different than the town itself. So, comparing those two things isn’t all that accurate, because you don’t really know what the driving population was. So, that’s one.
Mollie Pettit: And then another thing that was kind of an issue is that on the traffic stops form, the traffic stops form and the census are a bit different. So, on the traffic stops form, Hispanic/Latino is listed as a race along with Black, Asian, White, et cetera. Whereas on the census form Hispanic/Latino is listed as an ethnicity and then races are separate. So, you choose one, are you Hispanic, Latino, or not and then also what’s your race. So, that makes comparing these two forms tricky.
Mollie Pettit: Then another thing is that when someone’s filling out the census they are self reporting, whereas an officer who has pulled somebody over is making an educated guess of the race of that person. So, there’s a lot of things that makes it hard to compare this data for an actually accurate metric.
Kirill Eremenko: Gotcha. Makes sense. That’s very, very insightful. And so what did you do instead?
Mollie Pettit: Yeah, so instead what we decided to do was to focus on after a person was already stopped, what happened? So, a big focus is looking at the search rates. So, of all of the stops that involved Black drivers, what’s the percentage of those stops that resulted in a search? So, looking at that, you can compare what are the search rates for each race, how do the search rates of Black and Hispanic drivers compare to that of White drivers in that particular agency. That’s where you can I think get a much more accurate read on various disparities, racial disparities within the data.
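As a rough sketch of that calculation, assuming one record per stop as described earlier in the episode: the field names `race` and `searched` and the six sample rows below are invented for illustration and are not the real form fields or data.

```python
from collections import defaultdict

# Hypothetical per-stop records: one row per stop.
stops = [
    {"race": "Black", "searched": True},
    {"race": "Black", "searched": False},
    {"race": "White", "searched": False},
    {"race": "White", "searched": False},
    {"race": "White", "searched": True},
    {"race": "White", "searched": False},
]


def search_rates(records):
    """Share of each group's stops that resulted in a search."""
    totals = defaultdict(int)
    searched = defaultdict(int)
    for record in records:
        totals[record["race"]] += 1
        searched[record["race"]] += int(record["searched"])
    return {race: searched[race] / totals[race] for race in totals}


rates = search_rates(stops)
# Ratio of the Black search rate to the White search rate in this toy data.
ratio_vs_white = rates["Black"] / rates["White"]
print(rates, ratio_vs_white)
```

Unlike the stop rate, the denominator here is the number of recorded stops, so no estimate of the driving population is needed.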
Kirill Eremenko: Uh-huh. (affirmative). Uh-huh. (affirmative). Okay. Gotcha. And then you actually developed another metric which is to do a benchmarking, right?
Mollie Pettit: Yeah, that’s right. Uh-huh. (affirmative).
Kirill Eremenko: Tell me about it.
Mollie Pettit: Yeah, so … oh, go ahead.
Kirill Eremenko: Pardon. No, just tell us how that works, if you don’t mind.
Mollie Pettit: Yeah. So, a common critique of the application of this test concerns the rate at which drivers are searched. Some people think, “Well, maybe that’s not a good indicator of bias.” Perhaps an officer in his line of work has noticed particular trends that cause him to search a particular group of people more. So, then he would just be doing appropriate police work, because he’s using his experience to inform his decisions. So what we also did is we looked at what are the search hit rates for various races. And what I mean by a hit rate is, was contraband found or not. And what we’ve found by looking at hit rates, is that in general across agencies there were very few agencies where there was a significant difference, like a statistically significant difference between the hit rate of White drivers and minority drivers. In the cases where there were significant differences, it was often that the minorities had a lower hit rate than the White drivers.
Mollie Pettit: So, in Chicago, if you’re looking at consent search rates, Black and Hispanic drivers are searched about three times more than White drivers. But if you then look at the hit rates, Black drivers actually have a lower hit rate and the Hispanic hit rate is about equal, but neither of them are actually that significantly different than the White hit rate. So, their search rates are much higher, but the hit rates are not.
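The contrast between search rate and hit rate comes down to which denominator is used. A minimal sketch, assuming aggregated counts per group; the class name `GroupCounts` and every number below are made up to mirror the pattern described, not taken from the Chicago data.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    stops: int      # all recorded stops for the group
    searches: int   # stops that led to a search
    hits: int       # searches in which contraband was found

    @property
    def search_rate(self) -> float:
        return self.searches / self.stops

    @property
    def hit_rate(self) -> float:
        return self.hits / self.searches


# Invented counts: the search rate is about 3x higher, the hit rate is lower.
white = GroupCounts(stops=1000, searches=20, hits=6)
black = GroupCounts(stops=1000, searches=60, hits=15)

search_ratio = black.search_rate / white.search_rate
print(f"search rate ratio: {search_ratio:.1f}")
print(f"hit rates: {black.hit_rate:.0%} vs {white.hit_rate:.0%}")
```

If higher search rates reflected better-targeted police work, the hit rate for the more-searched group would be expected to be at least as high; in this toy data it is not.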
Kirill Eremenko: Uh-huh. (affirmative). Gotcha. And I was very impressed and I think this is something that we need to all do more of that in your visualizations you actually presented statistical significance. I think you came up with a very eloquent way to do it. You just make something more transparent, like a data point or a part of the visualization more transparent, less opaque if it’s not statistically significant or if it’s less statistically significant than the other dots.
Mollie Pettit: Yeah.
Kirill Eremenko: That seems really clear. How did you come up with that idea?
Mollie Pettit: You know, that was something we were racking our brains with for a while. We were realizing that being able to show the statistical significance would be really important in this, because if you’re not showing what’s significant and what isn’t, you’re only telling a part of the story and it can lead also to making conclusions that aren’t quite right, because you’re assuming that all of these are equally important. So, over time we kind of came up with this idea of just trying out using opacity. So, yeah, as you said, things that are statistically significantly different. So, if you’re looking at a plot, if a rate for that particular race is statistically significantly different than the White rate for that particular agency, it’ll be fully opaque and otherwise it’s gonna be a lot lighter, a lot more transparent.
Kirill Eremenko: So, what’s –
Mollie Pettit: As soon as we implemented it and we could see what it looked like, we’re like, “Ah, this is it.”
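The opacity idea can be expressed as a simple mapping from a significance result to a drawing opacity. The sketch below is an assumption-laden illustration: the 0.05 threshold, the 0.3 faded opacity, and the names `point_opacity`, `alpha` and `faded` are all hypothetical, since the episode does not give the values used on the site.

```python
def point_opacity(p_value: float, alpha: float = 0.05, faded: float = 0.3) -> float:
    """Fully opaque if the difference from the baseline rate is significant,
    otherwise a lighter, more transparent mark."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return 1.0 if p_value < alpha else faded


# Hypothetical p-values for three plotted rates compared with the White rate.
p_values = {"Black": 0.001, "Hispanic": 0.03, "Asian": 0.41}
opacities = {group: point_opacity(p) for group, p in p_values.items()}
print(opacities)
```

The resulting value could be passed to whatever draws the marks, for example as the opacity of a dot in a chart, so that non-significant comparisons stay visible but visually recede.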
Kirill Eremenko: Yeah. It’s a great technique I think. I think it’s a good tip as well for our listeners to take away. Once they see your visualizations they will be convinced that that’s one of the best ways. What would you say –
Mollie Pettit: Thank you.
Kirill Eremenko: Thank you. What is the test that you use for statistical significance? Let’s talk a bit about that, because a lot of data scientists, especially if you’re starting out don’t even consider the importance of doing statistical significance tests.
Mollie Pettit: Sure. So, we used the Z test for two population proportions. That’s what it’s called.
Kirill Eremenko: Okay. And so in a nutshell, what does it allow you to do?
Mollie Pettit: It allows, oh gosh, let’s see. I haven’t had to talk about this.
Kirill Eremenko: Just in short, why do you need to do statistical significance test? What is the risk if you don’t do one?
Mollie Pettit: Oh, sure. So, one of the things that we’re showing in this visualization is we’re comparing the rates of two races. We’re comparing the search rates of Black drivers by this agency versus White drivers. But let’s say you’re looking at a town and only two people were pulled over or only two Black drivers were pulled over. Those rates are going to be less significant, because there’s not enough data. There’s not enough information. If you pull over two Asian drivers and you search one of them, that means 50% of the Asian drivers in that city were searched. That’s high.
Kirill Eremenko: Yeah. Yeah.
Mollie Pettit: But when you realize, only two people were pulled over, like that’s not a statistically significant comparison.
Kirill Eremenko: Gotcha. Gotcha. That’s a great example. So, basically it shows you need more data. Like, there’s not enough data to make conclusive or any statistically significant conclusions from that [crosstalk 00:52:02] to derive any conclusive results.
Mollie Pettit: Sure, because that number is still a valid number, right? It’s still exactly the rate that is existent. It’s just not enough to say that there is a difference when you’re comparing it to the other rates.
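To make the small-sample point concrete, here is a minimal sketch of a two-sided z-test for two population proportions using the pooled standard error. The function name and all counts are hypothetical, and the episode does not describe the project’s exact implementation, so treat this as one common textbook form of the test. The normal approximation behind it is itself rough when a group has only a handful of stops.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Every stop in both groups had the same outcome: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Same observed rates (50% vs 25%), very different sample sizes.
z_small, p_small = two_proportion_z_test(1, 2, 250, 1000)       # only two stops
z_large, p_large = two_proportion_z_test(500, 1000, 250, 1000)  # a thousand stops

print(f"two stops:      z={z_small:.2f}, p={p_small:.3f}")
print(f"thousand stops: z={z_large:.2f}, p={p_large:.3g}")
```

With only two stops, the 50% rate cannot be distinguished from 25%; with a thousand stops at the same rate, the difference is unmistakable.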
Kirill Eremenko: Uh-huh. (affirmative). Totally, totally agree. It’s cool to see somebody in the space of visualization doing that, because sometimes even practitioners in the space of like machine learning don’t do that. I’ve seen models being deployed that haven’t been checked for statistical significance. Whereas in visualization it’s even easier to forget about that. So, it’s a great … you’re leading by example so other people… Even when you’re doing visualization it’s important to test these things.
Mollie Pettit: Yeah. Thanks.
Kirill Eremenko: Okay. So, another thing. So, with this bias, right? I liked what you said in your presentation that you’re not doing this to point fingers at people and say, “You’re biased.” Or “You’re biased.”
Mollie Pettit: Right.
Kirill Eremenko: Sometimes this bias happens unconsciously or subconsciously and by looking at the data, because this is like an important ethical consideration, right?
Mollie Pettit: Uh-huh. (affirmative).
Kirill Eremenko: While looking at that data, we can at least shed light on this bias and people become more aware of things they might be doing unconsciously. I think that was a very nice way of putting it. That data science isn’t here to shame people or here to cause, provoke people to more conflict. It’s here to point out what is the state of things. Let’s shed some light on –
Mollie Pettit: Yeah, exactly. Exactly. Like, what does the data actually say? What is actually happening? Yeah, exactly. Exactly what you said. The whole purpose is not to point fingers. The purpose of doing the analysis and doing the website, we’re just really hoping it’s going to act as an informational tool both for the public, but also I’m hoping that officers at agencies across Illinois might look their own agency up and if there are disparities in the data then they might think about why that is and how they can fix it. I think it’s a really helpful tool just to bring these disparities to light so that the law enforcement agencies of Illinois can make informed improvements in their agency.
Kirill Eremenko: Yeah. Totally, totally agree. We’re getting close to the end of the podcast. I want to kind of leave this thought with our listeners, a quote that you mentioned in your talk. I don’t know if you actually had this thought written down, but it came out really well and you said, “It’s hard to fix problems when you don’t know what the problems are and it’s hard to know what the problems are if you don’t have the data.”
Mollie Pettit: Uh-huh. (affirmative).
Kirill Eremenko: I think that was really cool. So, in general racial bias is something we want to fix and it’s a problem, right? But you can’t really know what’s … sometimes these things happen, sometimes we don’t know the details of these things. You can’t know the problem in full unless you actually go and analyze the data, which I think you’ve done quite successfully with this project of yours.
Mollie Pettit: Thank you.
Kirill Eremenko: The Illinois traffic stops. Do you have any plans on doing any more similar projects where, you know, like pet projects where you help organizations that need to use data to do good in the world?
Mollie Pettit: Yeah. Honestly, I would love that. I would love if I could spend the majority of my time on projects like this. I don’t know that I will ever be able to be spending all of my time on projects like this, because they don’t always pay. This was mostly a volunteer project. It’s something that I just really wanted to do. It started out kind of on the side and at some point I decided to take a break from work and just focus on it for a month and finish it.
Kirill Eremenko: So, you obviously did a very successful project with this Illinois traffic stops initiative and I’m sure it will help lots of people. Do you have any plans on doing more projects like that where you help organizations that use data and data science for good?
Mollie Pettit: Yes. Ideally, that’s something I would really love to do. So, there’s first of all this project could be expanded. There’s a lot more things that could be added to the site and more things that could be dug into. But additionally outside of that, really wanting to do as much of this kind of work as possible. In fact, the people that I worked on this particular project with, if you do go to illinoistrafficstops.com and go to the bottom, you’ll see that there’s a little section, a little support section basically detailing how a lot of volunteer hours have gone into creating this. Despite wanting to do it full time, you know, wallets don’t always allow that. So, there’s a place where if people want to contribute to the continuation of this project as well as other social good projects, they can donate with that link. Anything donated will only go to basically pay for the creation of more projects either this one or similar that are all social good focused.
Kirill Eremenko: Fantastic. I commend you guys on that. That’s an amazing idea. In fact, I’ll be one of the first people to donate. I, honestly, this is one of the first things I’m going to do after this podcast. I often, like, I want to help in the world, but oftentimes I kind of like stop, because I hear stories that with a lot of organizations that you donate to, you don’t know where the money’s going. You don’t know if it’s going towards the admin or is it going somewhere else. You know, in certain countries it might be going in exactly the opposite direction than what you think. But if these little initiatives, little projects that are run by people that I personally know, I know that this is going to be used for good that is going to actually help contribute to the world. So, thank you so much for doing that.
Mollie Pettit: Yeah, exactly.
Kirill Eremenko: You have me on board with that already.
Mollie Pettit: Fabulous.
Kirill Eremenko: Awesome. Okay. Well, Mollie, thank you so much for coming today on the show. Being fantastic. I loved your talk. I loved our conversation today. Before I let you go, where would you say our listeners can best find you, get in touch, follow you and your amazing visualizations and projects?
Mollie Pettit: Yeah. So, the best place to find and follow and interact with me would probably be Twitter. My handle is Mollzmp, which is M-O-L-L-Z-M-P.
Kirill Eremenko: Uh-huh. (affirmative). Mollzmp. Okay. Gotcha.
Mollie Pettit: Yep.
Kirill Eremenko: Okay. We will include that in the show notes and yeah. We have one more final question for you today. What’s your favorite book that you can recommend to our listeners to help them become better at their careers?
Mollie Pettit: I have a book in mind. It’s D3 specific. So, if you’re out there listening and D3 is something that you are interested in learning about and trying your hand at, my favorite book to recommend people for getting into learning is called Interactive Data Visualization for the Web and that is by Scott Murray.
Kirill Eremenko: So there you go, ladies and gentlemen. Interactive Data Visualization for the Web is your book recommended by Mollie. Mollie, thanks again for coming on the show today. Had a fantastic time with you and I’m sure lots of listeners will get amazing insights from our chat today. Thanks so much.
Mollie Pettit: Well, thank you.
Kirill Eremenko: So, there you have it. That was Mollie Pettit and I hope you enjoyed this episode as much as I did. Lots of great energy, lots of laughs, and lots of interesting topics that we talked about such as D3, the case study about Illinois traffic stops, and using data science for good. So, make sure to check out the illinoistrafficstops.com website where you can play around with this case study and actually see the interactivity of D3 in action on the website. Also, if you can afford it, then at the bottom there’s a link where you can support Mollie’s effort of doing data science for good. I think that’s a great way to give back to the community. These projects often are very helpful, but there’s no funding for them. We can all help like that. Or on the other hand you can use your own data science skills to create your own data science for good project or participate in one and look out for those. I think it’s a wonderful, fantastic thing. A fantastic way of giving back to the world through your data science skills or if you don’t have the time, through supporting others.
Kirill Eremenko: Also, Mollie asked me to mention that she has a Meet Up in Chicago. So, if you’re in Chicago and you want to go to Mollie’s Meet Up, then you can find the link to this Meet Up in the show notes or you can go to meetup.com and look for Chicago Data Viz Community. Otherwise all of the links for this episode will be in the show notes at www.superdatascience.com/231. That’s www.superdatascience.com/231. You can get the link to the Meet Up in Chicago, the Illinois traffic stop URL, the Twitter handle for Mollie’s Twitter. Make sure to follow her there. And all the other items that we mentioned in this podcast.
Kirill Eremenko: On that note, thanks so much for being here. I look forward to seeing you back here next time and until then, happy analyzing.