SDS 151: Women in Data Science & How to Help

Welcome to episode #151 of the Super Data Science Podcast. Here we go!

Girl power! Be amazed at how Lucy D’Agostino McGowan excels in the male-dominated fields of Data Science and Biostatistics. Listen as she talks about her career journey, R-Ladies Nashville, and information overload.

Subscribe on iTunes, Stitcher Radio or TuneIn

About Lucy D’Agostino McGowan

Lucy D’Agostino McGowan recently finished her Ph.D. in Biostatistics at Vanderbilt University. She co-founded R-Ladies Nashville, an R community committed to promoting gender diversity. Her recent projects focus on propensity scores, Shiny dashboards, and Google Drive in R.


The truth is that women and other minority genders are often underrepresented in data science. Now it's time to level the playing field!

Listen to today’s Super Data Science Podcast episode as Lucy D’Agostino McGowan talks about her experience and commitment to bridging the gender gap.

Lucy first shares how she made the jump from Religious Studies as an undergraduate to Biostatistics for her postgraduate studies. She co-founded R-Ladies Nashville and is very proud of it. She gives an overview of how it started, how to join (even for men), and how the usual meetups go.

Because data science is changing the future of many industries, diversity in the field should be a priority. She reiterates that efforts must also come from men as women push to break the stereotype. Lucy suggests ways to be sensitive and to help colleagues belonging to minority genders succeed in data science: raise awareness, be supportive of the movement, and foster a gender-neutral working environment.

Lucy then talks about her career journey and future plans. She will be a postdoctoral fellow in the Biostatistics Department at Johns Hopkins University. Her work as a biostatistician focuses mainly on observational data. She developed methods for analyzing, and drawing inferences from, non-randomized data such as medical records. According to her, propensity scores help balance exposed and unexposed groups when randomization is not possible.

Which is the better tool, SAS or R? She gives insights on the pros and cons of both. She also emphasizes that data scientists should first find out which tools are used by the organizations they plan to work with.

Moreover, as a data scientist, you’ll face challenges like explaining complex methods so that someone in a different field can understand them. You should learn to communicate well and use that skill to help other people.

Lastly, Lucy offers solutions to information overload, a growing problem in data science. She recommends using best practices when reproducing information, learning to distinguish what’s useful from what is noise, documenting and condensing information so that the receiver can digest and use it, and prioritizing information.

This episode will be very insightful for all of us so start listening!

In this episode you will learn:

  • R-Ladies Group continues to advocate gender diversity in the field of Data Science. (08:40)
  • How can guys be more supportive of their female colleagues? (16:30)
  • Ways men can appropriately approach colleagues belonging to minority genders in the workplace are discussed. (20:10)
  • Lucy D’Agostino McGowan talks about her career journey and future plans. (22:45)
  • Propensity Score Matching (PSM) reduces bias in observational studies. (25:05)
  • What’s the better software – SAS or R? (34:00)
  • Learn how to build dashboards with Shiny through her online course on DataCamp. (41:17)
  • The best practice to prevent and solve information overload is presented. (58:19)

Episode Transcript


Kirill Eremenko: This is episode number 151 with R enthusiast, Lucy D'Agostino McGowan.

Kirill Eremenko: Welcome to the Super Data Science Podcast. My name is Kirill Eremenko, data science coach and lifestyle entrepreneur. Each week, we bring inspiring people and ideas to help you build your successful career in data science. Thanks for being here today, and now let's make the complex simple.

Kirill Eremenko: Welcome back to the Super Data Science Podcast everybody, super excited to have you on board today. I've got a very energetic guest on the show today, Lucy D'Agostino McGowan. Lucy is a person of many, many different talents, many different skills. What you need to know about Lucy is that she started off by studying a Bachelor's in Religious Studies and Italian. After that, she successfully switched her career into data science. She's just completed a PhD in Biostatistics. She also runs the R-Ladies group in Nashville, Tennessee. We talk about that quite a lot.

Kirill Eremenko: This episode will definitely be very interesting to women in data science, or maybe you know somebody who's a woman in data science, and you can forward it to them, but also be useful for men as well, because I ask specific questions on how we as men can be more supportive of women who are joining the space of data science, or who are already in the space of data science, but maybe are feeling a bit excluded or feeling as if they are the minority, and we want to make them feel just the same. Everybody wants to just be successful in this space, and build a career.

Kirill Eremenko: We talk quite a bit about that, so I think this will be both useful to men and women for your future career, and other people's future careers, for those around you. Also, Lucy has just launched her first course. It's on DataCamp. It's about building dashboards with Shiny. You can learn a bit about that as well. You can check it out there as well.

Kirill Eremenko: There's lots of different skills and topics that we're going to cover off, and we're even going to talk about SAS along the way. So really, there's so much. I can't even list all the things here. Let's just jump straight into it. Without further ado, I bring to you Lucy D'Agostino McGowan, a PhD in Biostatistics [inaudible 00:02:55].

Kirill Eremenko: Welcome ladies and gentlemen to the Super Data Science Podcast. Today I've got a very exciting guest here on the show, Lucy D'Agostino McGowan from Nashville, Tennessee. Lucy, how are you today? Welcome to the show.

Lucy McGowan: Hi. Thank you so much. I'm doing well. I'm excited to be here.

Kirill Eremenko: Yeah, I'm so excited as well. You mentioned it was snowing. It's April, and it's snowing in Nashville, what's that all about?

Lucy McGowan: Yes, it's terrible. We've been having really lovely weather, and then all of a sudden today we had a big cold snap. So we had some flurries this morning. I don't know what it's all about. I'm hoping it goes back to being warm soon, I'm ready for Spring and Summer to kick in.

Kirill Eremenko: Yeah. So true. It's been crazy all around the place. I've actually never been to Nashville, but as we discussed before the podcast, there's lots of great musicians in Nashville. Can you tell us a bit about that? What kind of music comes from Nashville?

Lucy McGowan: Yeah. We're most well known for country music. That's what people usually think of when they think of Nashville, but we have all different kinds of music here. We have a lot of recording studios, and so people come for not just country music, but for all different genres. But it's great because just about any venue, bars, and pizza places, and restaurants all have live music pretty much all the time, which is really nice. It's really cool, lots of opportunities to get to hear really great musicians.

Lucy McGowan: Another thing that's cool about Nashville is a lot of people are here trying to make it, so we have a city of a lot of dreamers, which I think is a really neat place to get to live. We have a lot of people with big aspirations, yeah.

Kirill Eremenko: That's awesome. Is it the same type of country music as in Texas, or other parts of the US? Or is it its own specific thing?

Lucy McGowan: I think of it as a little bit different, but I'm not sure exactly. I don't know how to identify what makes it different. We do have lots of country music, but there's also a general Nashville feel that's a little bit different from something like Austin, or other places in Texas that have also great music, but a little bit of a different feel.

Kirill Eremenko: Interesting. Okay. I'm going to put it on my list. I'm definitely going to come to Nashville.

Lucy McGowan: Yeah, please do.

Kirill Eremenko: Yeah. I heard a similar story about, it was about New Orleans.

Lucy McGowan: Yeah.

Kirill Eremenko: Not about country music, but just about jazz, right? About that style of music. Blues and Jazz.

Lucy McGowan: That's right.

Kirill Eremenko: When I went, it totally lived up to its promise, though. I went to, what is it? Frenchmen Street there, and it's so lovely. Have you been?

Lucy McGowan: I have been, yeah. I went to New Orleans when I was in college, I went to help clean up after Katrina. But it's beautiful. We went. Yeah, the French Quarter is great, and Bourbon Street is really cool. New Orleans is awesome. It's got a really cool vibe.

Kirill Eremenko: Is Bourbon Street that long one with all the bars and the clubs?

Lucy McGowan: Yes. Yeah.

Kirill Eremenko: All right. Well, I personally prefer the more traditional Frenchmen Street.

Lucy McGowan: The French Quarter, yeah. Yeah, that's the nice where you get the look, the real [crosstalk 00:06:09].

Kirill Eremenko: Bourbon Street just never ends. It just goes on for kilometers. Okay, how about this. I'll recommend my favorite jazz bar from New Orleans, and you recommend your favorite country bar from Nashville. Sounds good?

Lucy McGowan: Okay. That sounds great.

Kirill Eremenko: If anybody is ever in New Orleans, my favorite jazz bar there is called The Spotted Cat. Fantastic music. They have live music all the time, like many places. But somehow it just feels great in there. So if you're ever there, check out The Spotted Cat. All right, your turn.

Lucy McGowan: I'm going to recommend The Listening Room. The Listening Room is basically where local artists who often are songwriters, so they're not as well known, but they write for really well-known artists. They come and play. They play every day of the week, and then they often play shows on the weekend as well. As part of it, you get dinner, and they have drinks. It's a really fun show because they explain the background of how they wrote the song, and what inspired them to write it. Then you hear it.

Lucy McGowan: So hearing the people who actually did the writing singing their own songs, I think, is really neat. And, they're all very, very talented. That's my big recommendation. I think that's great.

Kirill Eremenko: Nice. The Listening Room. Thank you. That's very detailed for sure. Detailed and lively. We can see that you're into your music.

Lucy McGowan: Yeah.

Kirill Eremenko: Okay. All right. Well, let's dive, for the first time today, into the topic of discussion: data science. Surprise, surprise, data science podcast, not music podcast.

Kirill Eremenko: Yeah, so you are the organizer for the R-Ladies group in Nashville. Of course, among other things. I don't think I've ever met a person who's an organizer of an R-Ladies group, and we'll definitely be talking about that. You're doing your PhD at the moment, is that correct?

Lucy McGowan: Yes, I actually just ... I just completed my PhD about a month ago. I defended my dissertation, so I'm kind of ...

Kirill Eremenko: Congrats.

Lucy McGowan: Thank you. Yeah.

Kirill Eremenko: Fantastic. That's in Biostatistics.

Lucy McGowan: That's correct, yeah.

Kirill Eremenko: Awesome. Okay. Well, there's lots of things to cover, and how all that is related to data science, women in data science using R, Hadley Wickham's cocktail recipe book, and all these other things. So where would you like to get started?

Lucy McGowan: Yeah, we can start with R-Ladies, and then I can talk a little bit about my PhD work, and what I'm going to be doing now that I'm finished if that sounds good.

Kirill Eremenko: Sounds good. Let's kick it off.

Lucy McGowan: Yeah.

Kirill Eremenko: Tell us a bit about R-Ladies.

Lucy McGowan: Okay. R-Ladies is a really neat organization that was founded a couple of years ago. It's now an international, worldwide organization. The main goal is to promote gender diversity in the R community.

Lucy McGowan: Basically, the thing that got us started was there was an analysis on looking at gender diversity in the R community, minority genders in general, and seeing that we just don't have great diversity across the board. In leadership, package development, conference speakers, conference participants, educators, and users, in all areas we could use some improvement.

Lucy McGowan: It started in San Francisco. A woman, Gabby there, she started out a program. Then she was wildly successful getting people involved. Then London started out after that, and then since then, it's exploded. So now we're all over the place. But it's really cool.

Lucy McGowan: We do a lot of different things. The main thing is offer local meet ups. In different cities, we have these R-Ladies chapters that offer meet ups for women to get together and learn how to use R, and communicate results that were obtained from R, and all different things like that.

Lucy McGowan: Actually, we just recently started a remote group as well. For people who don't live close to a big enough city to attend these meet ups, there is an R-Ladies remote chapter that offers remote meetings.

Kirill Eremenko: Wow.

Lucy McGowan: Then we also do things ... Yeah, it's really cool. Then we also do things like helping women to get more involved in the community at large. We have a couple of packages that we maintain that people who haven't worked on R packages before can contribute to, to ease into that process. Then we do things like abstract revisions, and so people who are interested in submitting an abstract to speak at a conference maybe for the first time can submit their abstract, and we can all review it, and help get it to a place that we think would be more likely to get accepted. It's been a really cool, it's been a great opportunity.

Kirill Eremenko: That's really cool. Sounds very exciting for attendees. I'm looking at the website. So guys and girls who ... is this only for girls? Or guys can check this out as well?

Lucy McGowan: Yeah. Generally, different individual chapters have different policies. But the over-arching general policy is that we are primarily focused on minority genders. But if you are in the majority, if you're a man that was interested in coming, you can come as a guest. It's usually how it works. But different people have it different ways. Generally, we try to prevent having things like the majority of the room being men.

Kirill Eremenko: Yeah, it defeats the purpose, right?

Lucy McGowan: Right, yeah. But coming as a guest ensures that there's at least a 1:1 ratio there.

Kirill Eremenko: Okay. Gotcha.

Lucy McGowan: Definitely, excited about allies joining up.

Kirill Eremenko: Gotcha. The website is, all one word. You have so many locations. It's like San Francisco, London, Istanbul, Paris, Boston, Los Angeles, Melbourne, Madrid, Nashville, New York, Barcelona, Columbus. It's not just US, it's like all over the world. That's so cool.

Lucy McGowan: Yeah. It's really neat. It's really exploded. We have a great directory too where you can see just women that are involved in R-Ladies, but you also can get a list of speakers. So I think gender diversity in terms of being invited to speak at conferences is something that, just in general, I think the data science community has been doing a good job of working towards.

Lucy McGowan: We offer a list of speakers so if you're trying to find women to come to your conference, you can find some good women who are confident in R that way.

Kirill Eremenko: Yeah. No, that's really good. I must say, this is a really great movement. I highly support this, because unfortunately I've also encountered that in STEM, Science, Technology, Engineering, Mathematics, there's not that many women. Therefore, girls and ladies who do want to get into it, they feel sometimes, I guess, that they won't be ... like they'll be left out when they do go in, that it's a male dominated place. Organizations like this one, like R-Ladies, just encourages those. Like not everybody has to go into R, or data science, or analytics. But those who want to, they know that there's places like R-Ladies where they can find like-minded people, and connect, and not feel like a minority all the time. So this is really cool.

Lucy McGowan: Yeah. No, it's doing great. There is really great mentorship that comes out of it, which I think is really cool.

Kirill Eremenko: Awesome. Tell us a bit about the Nashville chapter. How often do you guys have meet ups, and how many people attend usually?

Lucy McGowan: We have meet-ups typically around between once a month, and every other month depending on speaker availability. Typically, the way it works is we invite in an outside speaker from the Nashville community to come and talk about something.

Lucy McGowan: The talks vary from things like maybe the speaker has built a package that they want to tell us about. That's one type of talk. Or maybe they are exploring using a package, and they basically have gone through, and really learned the nitty gritty details of the package. So they'll come and describe it to the group. Or maybe they're trying to solve a specific problem, like data cleaning, for example, so they've looked at a bunch of different packages in R, and basically done a synthesis of the different ones that are good for different things. Or sometimes we, like if there's a big conference, and we have a couple of women that have attended the conference, we'll have them come back and do a synthesis, and overview of what they learned.

Lucy McGowan: That's typically the type of thing that we have. We tend to have between 25 and 30 attendees per meet-up, with that varying depending on we offer both lunch time, and evening meet-ups, because we found that some women, it was easier for them to come during lunch if they have child care responsibilities in the evening, and for other women it was easier to come in the evening. So we offer both options there as well. But some of our meet-ups tend to have ... will pull in a little bit more than 30, and some will be less, but that's around 30.

Kirill Eremenko: Okay. That's just very interesting. What are the talks usually about? You mentioned, R packages, and so on. Do you have talks about the community? Talks about women in the R-space, or in data science?

Lucy McGowan: Yeah. Most of the talks have been more on the technical side that we've hosted, although I know other chapters have had general mentorship sessions and things like this that have gone really well. We haven't hosted anything like that yet in Nashville. Most of ours are more technical talks.

Lucy McGowan: But we have had, so there is a Women in Statistics and Data Science conference that offers both sides, the technical and also some of the more soft skills that you can learn along the way. So sometimes women will come back, and we'll talk about what we learned at those conferences that will serve to fill the other void. But we haven't had as many just on gender inclusion and things like that.

Kirill Eremenko: Okay. Gotcha. I actually have an interesting question I just thought of.

Lucy McGowan: Yeah.

Kirill Eremenko: How can guys in the space of data science be more supportive of their female colleagues, or even aspiring women who want to get into the space, how can we make our contribution to helping people get into the space, or thrive in the space?

Lucy McGowan: Yeah, that's a great question. I think it's something that I notice a lot of people are really interested in, which is really exciting, because I think that it's awesome that people are looking for ways to be able to be helpful. I know, actually fellow R-Lady, Katelyn Haddon, she recently had like a whole tweet thread that maybe if you do show notes that I could send you the link afterwards to link to.

Kirill Eremenko: Of course. Of course, yes.

Lucy McGowan: Yeah, where she offers a whole bunch of different ways that she's experienced people who helped her along the way, both men and women who have helped her become the best data scientist that she can be.

Lucy McGowan: But I think, just a general thing, I think just general awareness, and listening to what women around you are saying, are just one way that can be really helpful because I found that sometimes if I haven't experienced something directly, it's hard for me to understand what it might be like. But there's lots of women in the field who have directly experienced different things along the way that maybe have been a little bit of hardships when they're trying to figure out the best way to navigate their data science careers.

Lucy McGowan: As men, I think that men can just by listening to their experiences, and honoring them and also if there is a way that they feel like they could be helpful in that way, and taking that extra step, I think just that alone can be really useful because sometimes the worst pushback can come from someone who just hasn't experienced the thing that you're experiencing, and tries to claim that it doesn't exist because they haven't experienced it.

Lucy McGowan: I found that one of the more helpful things is even just validating that this type of thing is some kind of gender discrimination, and things like that actually really is something that exists, and needs to be addressed. If men can be just a little bit more aware of that, I think that's a great step in the right direction.

Lucy McGowan: The other thing is doing things like amplifying voices of women. I think something that a lot of people experience just in general is maybe in a meeting setting, sometimes women will talk about that they might say something, and then another person might repeat the same thing that they said, but for some reason, they have a little more clout, or maybe because there's this unwritten bias that they might get heard a little bit better than the woman who initially said it.

Lucy McGowan: So if you're a man in that situation, and you see that happening, being able to amplify that like, "I actually heard Lucy say exactly what you just said." Just nudging in the right direction. I think that kind of stuff is really helpful in having a male ally, especially when there is that bias, I think that can be really, really can do a lot of good. Those are two little ways.

Kirill Eremenko: Yeah, gotcha.

Lucy McGowan: Katelyn had a great write up that I think is really ... I'll link to it, because she lists a bunch of other ways that are really quite useful.

Kirill Eremenko: Okay. No, that's totally ... I totally agree with that. On that, I had a question, sometimes what would you suggest in situations where, as a guy you don't really know if you can do that or not, because sometimes you will try to be helpful, but then the person that you're trying to help out, she might feel that you think she's not empowered to stand her own ground.

Lucy McGowan: On her own.

Kirill Eremenko: Yeah. Sometimes, even though you want to help, and I've been in a situation like that, you want to help, you don't really know how the person will react, and you end up not doing anything. What would your advice be there?

Lucy McGowan: Yeah. Part of a lot of these, and it's hard sometimes if you're in the moment in a meeting where you can't turn to the person and be like, "What would you like me to do right now?" That's obviously a tough situation. But when it's someone who's your colleague that you're going to be working with a lot of the time, I think just the act of communication can be a really good way to navigate something like that.

Lucy McGowan: Maybe you're in a meeting, and you see something like that happen, and you don't know how that person would like you to react. So then maybe just after the meeting, you can say, "Hey, I noticed this happening. Did you notice that? Is there something that you think that would have been helpful for me to say, or would you rather that I just let it go?"

Lucy McGowan: Even just recognizing, like calling out that you saw something happening, I think can be really empowering, even if you don't call it out in front of the group. Although, if she is willing to let you call it out in front of the group, that can also sometimes be helpful to just have a second voice that says they're noticing it. But, yeah, I think communicating with your colleague is a great way to be able to try to navigate that, because I agree, some people maybe wouldn't want that kind of interaction.

Lucy McGowan: The other thing I think that's tricky, it's a hard line to find, but being able to figure out how to be communicative, and be there, and supportive. But also not putting the burden. I think a lot of times, like a lot of us are really excited about trying to uplift women in the field. But then at the same time, we end up taking on a bit of an extra burden by we're trying to do our own jobs, but then we're also trying to make sure that an entire group of people isn't getting discriminated against.

Lucy McGowan: That can take up some energy. So I think being really cognizant of when you're asking someone how you can help, to not put the burden on them to define exactly what you should be doing. This is trying to figure out, it seems like a tricky balance of figuring out the best way to say, "I notice that's happening, and I really want to be helpful. Would this be useful?" Rather than saying ... making them do the extra work to tell you exactly what they think is the best thing to do.

Kirill Eremenko: Gotcha. Okay. Well, thank you very much for that overview.

Lucy McGowan: Yeah.

Kirill Eremenko: Let's shift gears a little bit. Let's talk about your career, and how you've gotten into R. You've just finished your PhD. Tell us a bit about your journey. How did you get there, and what's coming up ahead for you?

Lucy McGowan: Yeah. So my undergraduate degrees are actually really different from what I did in my Master's, and my PhD. But I did my undergraduate in Religious Studies and Italian. So I would say humanities major.

Kirill Eremenko: Wow. So different.

Lucy McGowan: Yeah. Yeah, very, very different. I loved that work. I really liked the humanities, and I think I learned a lot in terms of my communication, and my written skills, which I use on a daily basis now as a data scientist and biostatistician. I gleaned a lot from that training.

Lucy McGowan: But then just after graduating from undergrad, I entered a Master's program in Biostatistics at Washington University in St. Louis. Then I worked a little bit as a biostatistician in the department of surgery there, and then started my PhD in Biostatistics at Vanderbilt University, which I just finished up this past month. Now I'm about to start a postdoc at Johns Hopkins in their Biostatistics department.

Kirill Eremenko: Wow. I heard a lot of good things about Johns Hopkins.

Lucy McGowan: Yeah, it's going to be great. I'm working with Jeff Leek. He does a lot of these Coursera courses through their data science group. So I'm going to be working, part of my time is going to be spending.

Lucy McGowan: My dissertation work, and my work during my PhD is largely, they're kind of observational studies based, so generally it means that I've developed methods for analyzing data, and making inferences based on studies that weren't randomized. Kind of these non-randomized type studies. So most of my work is on using data that was collected using electronic medical records to draw conclusions.

Lucy McGowan: So at Johns Hopkins I'm hoping to extend that work a bit. But then also I'm going to be moving into the data science pedagogy space. I'm going to be working with Jeff Leek. He's got a bunch of data science courses on Coursera, so I'll be helping develop those as well as evaluate their efficacy, which I think will be really exciting.

Kirill Eremenko: That's very interesting. There are just so many things I've already gotten out of this. Tell us a bit more about the non-randomized control samples. If you can share some of the method in a nutshell, because I think that's a very interesting topic.

Lucy McGowan: Yeah. Basically, the work that I do, we use what's known as propensity scores. Propensity scores, they're essentially like a balancing score. In a randomized setting, the reason we're randomizing is because we want to balance between our exposed and our unexposed group before we make any kind of inference about whether that exposure causes an outcome.

Lucy McGowan: But we have so much data now just generally in the world. I think electronic medical records are a great example because that's just data that's collected as you go to the doctor. It's not like you are enrolling in a study, or intentionally trying to see if some exposure does something. You just happened to be exposed to a lot of things just in your daily life, and your doctor happens to be recording that as time goes on. Maybe we want to be able to use that data to answer some medical questions that maybe we couldn't answer in a randomized setting because randomized trials are expensive, or maybe in certain cases, they're not ethical because you can't always be randomizing people to certain different treatments and things like that.

Lucy McGowan: That's the set up. So a propensity score is a way to try to achieve that same kind of balance, but without having to do randomization. So it's essentially the actual quantity, the mathematical quantity is your probability of being exposed, conditional on your observed features. So on things that we can observe about you.

Lucy McGowan: We try to calculate that probability, and then we use that probability to be able to balance an exposed and an unexposed group. Then once they're balanced, we can then draw inferences between them.

Lucy McGowan: My work uses weighting methods to balance the two groups, but there's a lot of other ways. You can use matching algorithms, or you can use some different advanced modeling techniques to allow that to happen. But I think it's really with this movement towards having all this data that's just generally available that maybe wasn't intended to be part of a study, but we still want to be able to use it to answer questions. These type of tools are becoming, I think, really important.
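To make the weighting idea described above concrete, here is a minimal sketch in Python. It is an illustration only, not Lucy's method or any package's API: the toy cohort, the single binary confounder `x`, the outcome formula, and all function names are assumptions made up for the example. The propensity score is estimated as the share of exposed people within each level of `x`, and each person is then weighted by the inverse of the probability of the exposure they actually received.

```python
from collections import defaultdict


def make_cohort():
    """Toy cohort of (x, exposed, y) rows; all numbers are made up.

    People with x=1 are exposed far more often AND have higher outcomes,
    so x confounds the comparison. The built-in exposure effect is 1.0.
    """
    rows = []
    for x, n_exposed, n_unexposed in [(0, 20, 80), (1, 80, 20)]:
        for exposed, n in [(1, n_exposed), (0, n_unexposed)]:
            y = 1.0 * exposed + 5.0 * x
            rows.extend([(x, exposed, y)] * n)
    return rows


def propensity_by_stratum(rows):
    """Estimate P(exposed | x) as the exposed share within each level of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_exposed, n_total]
    for x, exposed, _ in rows:
        counts[x][0] += exposed
        counts[x][1] += 1
    return {x: n_exp / n_tot for x, (n_exp, n_tot) in counts.items()}


def mean_difference(rows, weights):
    """Weighted mean outcome of the exposed minus that of the unexposed."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # exposed -> [sum of w*y, sum of w]
    for (_, exposed, y), w in zip(rows, weights):
        sums[exposed][0] += w * y
        sums[exposed][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


rows = make_cohort()
ps = propensity_by_stratum(rows)

# Inverse probability weights: 1/ps for the exposed, 1/(1 - ps) for the unexposed.
ipw = [1 / ps[x] if exposed else 1 / (1 - ps[x]) for x, exposed, _ in rows]

naive = mean_difference(rows, [1.0] * len(rows))
weighted = mean_difference(rows, ipw)
print(f"naive difference:    {naive:.2f}")
print(f"weighted difference: {weighted:.2f}")
```

In this constructed cohort the unweighted comparison (about 4.0) mixes the exposure effect with the effect of `x`, while the weighted comparison recovers the effect of 1.0 that was built into the data. Real analyses usually estimate the score with a model such as logistic regression over many observed features rather than simple stratum shares.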

Kirill Eremenko: Gotcha. Are you going to write an R Package around that?

Lucy McGowan: Yeah. There are some great R packages that do some of this propensity score weighting and balancing, and things like that. But I have, as part of my dissertation, another piece of this that is really important: after you've done your propensity score balancing, a thing that people are often really worried about, especially in an observational setting where you weren't randomized to an exposure, is what if we've missed something.

Lucy McGowan: Your propensity score is based on your observed features, so it's conditional on the things that we've collected about these patients, or different people that are involved in your cohort. Maybe you missed something that was really important, like you didn't collect data on it, and because it isn't randomized, we don't know if the effect that you're seeing is due to the exposure, or it's due to this thing that we missed.

Lucy McGowan: I have worked on developing methods for doing sensitivity analysis, which basically allow you to see how bad it would have to be to change our results. If something is missing, how big would its effect have to be in order to make our exposure-outcome result null?

Lucy McGowan: I do have an R package out there that was part of my dissertation work called tipr, and it allows you to do these tipping point analyses, where you can see a sensitivity analysis of: if I was missing something, how big would that something have to be in order to tip my analysis to make it no longer conclusive in the direction that I thought it was?
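
The tipping point question can be sketched with a simple textbook bias formula. This is an illustration in Python under stated assumptions, not the tipr package's API. It assumes a mean-difference effect and a single unmeasured binary confounder that shifts the outcome by a fixed amount. The function names and the numbers (a confidence bound of 0.4, prevalences of 0.75 and 0.25) are hypothetical.

```python
def adjusted_effect(observed, confounder_effect, prev_exposed, prev_unexposed):
    """Effect left after removing the bias from an unmeasured binary confounder.

    Assumes the confounder shifts the outcome by `confounder_effect` and is
    present in `prev_exposed` / `prev_unexposed` of the two groups.
    """
    return observed - confounder_effect * (prev_exposed - prev_unexposed)

def tipping_confounder_effect(bound, prev_exposed, prev_unexposed):
    """Confounder effect size that would move `bound` exactly to the null (0)."""
    gap = prev_exposed - prev_unexposed
    if gap == 0:
        raise ValueError("A confounder balanced across groups cannot tip the result.")
    return bound / gap

# Hypothetical result: the confidence bound closest to the null is 0.4.
lower_bound = 0.4
tip = tipping_confounder_effect(lower_bound, prev_exposed=0.75, prev_unexposed=0.25)
print(round(tip, 3))  # 0.8
print(round(adjusted_effect(lower_bound, tip, 0.75, 0.25), 3))  # 0.0
```

Read this way, a missing confounder present in 75% of the exposed group and 25% of the unexposed group would need to shift the outcome by 0.8 before the result stopped being conclusive. Whether a confounder of that size is plausible is then a subject-matter judgment.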

Kirill Eremenko: Wow. Congrats on releasing the R package. That's a big step in anyone's career.

Lucy McGowan: Thank you. Yeah.

Kirill Eremenko: Yeah, maybe that will be helpful for someone: tipr, for analyzing when something can make it tip over the edge.

Lucy McGowan: Exactly.

Kirill Eremenko: Okay. That's really interesting. Was it hard to make the move from Religious Studies and Italian into Bioscience, and statistics, and algorithms?

Lucy McGowan: I don't know, it wasn't as hard I think as it might have been. I really enjoy mathematics, and logic, and programming. I had taken a lot of courses throughout my undergraduate career in the math and statistics department. I had, going into my Master's degree, I had all the normal prerequisites, so like math, linear algebra, calculus, and calcs, and everything like that.

Lucy McGowan: From the mathematics perspective, it wasn't too hard because I pretty much was at a similar level as my classmates from that perspective. I had done a thesis, an undergraduate thesis in the Religious Studies department. It was this small qualitative analysis. But one thing that was interesting from it was that I really wanted to turn it into a quantitative analysis the whole time.

Lucy McGowan: I was looking at social justice movements in the Catholic Church; that was my specific area. But it easily translated into public health, and the kind of work that I do now. My Master's research focused on racial disparities in public health. It was a natural transition from my undergrad research, although it moved more into the quantitative space.

Lucy McGowan: It naturally progressed. But I also, I think a piece that made this transition also a little bit smoother, my father is a statistician, and my grandfather is also a statistician.

Kirill Eremenko: There we go.

Lucy McGowan: Yeah, so I grew up around the area, and I think I have ... I think that definitely plays a-

Kirill Eremenko: Runs in the family.

Lucy McGowan: Yeah. It plays a role, because I was very familiar with the work that they do, and how even though I was working in the humanities, I could easily think about how that translates into the quantitative side.

Kirill Eremenko: That's really cool.

Lucy McGowan: Yeah.

Kirill Eremenko: That's not to say, that's just a very interesting coincidence, but it's not to say for our listeners that if you don't have statisticians in the family, you can't be a data scientist if you studied arts. Not at all. Anybody can make that move.

Lucy McGowan: Right. Definitely.

Kirill Eremenko: Your example is a good testament to that.

Lucy McGowan: Yeah. No, I think the transition ... I know, it's actually really interesting in my department. My advisor, the woman that he advised just before me, this is totally coincidental, but she also studied Religious Studies as an undergraduate degree.

Kirill Eremenko: Wow.

Lucy McGowan: I feel like we were the only two that have made this exact transition, and we happen to have the exact same advisor, which is a very strange coincidence. I do think that lots of people make the transition from all different backgrounds into data science and statistics. Yeah.

Kirill Eremenko: So for the Johns Hopkins postdoc, are you going to have to move to California?

Lucy McGowan: No. Johns Hopkins is actually in Baltimore. But I'm actually not moving. I'm married. My husband is in Nashville. He is working here, and so my advisor there has worked out a great way for me to be able to ... I'm going to be in person when I first start up to get to know everybody, and feel like I'm part of the team. So I'll be there for about a week when I'm first starting, and then I'll be going in about once a month to re-acclimate and get some face to face time with everyone. But otherwise, I'm going to be working remotely, which is another great thing I think in data science in general, that we have the opportunity to work remotely so often, because for so much of what we do you essentially just need a computer, and maybe a quiet place to work.

Lucy McGowan: Then, of course, the team that I'll be working on has meetings three times a week that I'll be calling into, so I'll get a lot of time to be able to collaborate and things like that. But yeah, I will be staying in Nashville. So I'll still be here when you visit.

Kirill Eremenko: Great. Great. I'll make sure to hit you up when I come over and check out that Listening Room bar you mentioned.

Lucy McGowan: Yeah, the Listening Room. That's right.

Kirill Eremenko: Gotcha. Okay. There's a couple of other topics that I'm really excited to talk about. On your LinkedIn, I noticed that you've got quite a bit of experience with SAS, and you were even a SAS ambassador back in your student days. I was wondering, we haven't had many guests talk about SAS. Could you give us a quick rundown of what SAS is all about, and how it compares to R?

Lucy McGowan: Yeah. That's a great question. This is really interesting ... and this is not anything to do with whether SAS or R is better, but I use R much more frequently now than I use SAS. Partly just because SAS is expensive and I don't have it on my local machine. But during my Master's program, I almost exclusively used SAS. That was when I was a SAS student ambassador. I actually was a contract instructor for a while for SAS. So I was a contractor for them for a little bit. I taught a little bit with their education group.

Lucy McGowan: SAS essentially offers a product that allows you to do a lot of the analysis parts of data science. I think of their product as split into two main functions, although there is a lot else that you can do in SAS besides this. There is the data step, where you're doing a lot of data cleaning and data manipulation. Then there's what people call the proc step, or the procedure, so that's where they're doing a lot of model fitting, or fitting different correlations and things like that.

Lucy McGowan: It's very much a procedural language. Now you can do a lot of things via a GUI where you can point and click, but you can also still script your procedures. They're a little bit more constrained in comparison to R, just in that the procedures are set, and you can set certain options. But you're generally not writing your own functions, and things like that. Although, you can create what they call macros that are akin to functions. I think of it as a little bit of a stricter language.

Lucy McGowan: Some things that SAS offers, they're very much validated, so they have a lot of employees that work for their company that spend a lot of time validating their software to make sure you're always going to get the same answer, and that answer ought to be correct, and things like that.

Lucy McGowan: They have extensive documentation that explains both the statistics and the programming side of their product. Yeah, it's really interesting. I think SAS offers something a little bit different from R because it is a monetized product. So it's relatively expensive, but a lot of pharmaceutical companies are still using SAS a lot. I think in the financial industry, people use SAS a good amount. At least over here in America, that's been something.

Lucy McGowan: I actually grew up in North Carolina, and SAS is headquartered in Cary, North Carolina. I was very familiar with the SAS ecosystem. It's known as a really good place to work. So that was something that I grew up knowing a good amount about, just because I knew a lot of people who had parents or someone that worked for SAS.

Kirill Eremenko: Yeah, makes sense.

Lucy McGowan: Yeah.

Kirill Eremenko: I guess it's what I also see, not necessarily just in SAS, but in other software as well. Like, for instance, Oracle and others, where it's software that has been around for a while. It's good; it is expensive, though. But because it's been around for a while, a lot of corporations have already adopted it. Therefore, it's entrenched in them. There is this movement to open source software, but there is no way in the world that huge organizations, which have lots and lots of money, are going to try to save on cost, or are going to very quickly adopt R programming, or very quickly adopt open source database systems even though they might have their advantages, simply because it's going to be very risky, and very costly to make that transition just due to the sheer volume of those organizations.

Kirill Eremenko: That's why, as data scientists, it's good to be aware of the tool that the organization you're going into is using. If it's SAS, or if it's Oracle, or if it's SQL Server, then it's a good idea to learn that tool because even though R programming is getting more and more adopted, and it's free, and it's open source, it's not guaranteed that every organization is going to adopt it tomorrow. It's going to take years, 10, 20, 30 years maybe.

Kirill Eremenko: Some organizations might choose not to go with that change because they like SAS, and because as you say, SAS offers a bit of a different spectrum of products, or a different way the tool works, compared to R.

Lucy McGowan: Yeah. Totally. I'm a big fan of open source products, and now I do typically use R rather than SAS. But I think that that's really good advice in general, knowing what your company is using, and trying to make sure that that's something that you're able to use as well.

Lucy McGowan: Something that I've seen a lot, I don't know if you've seen this too. My husband works for a bank. He works for a Swiss bank. They use SAS, but he also has recently been able to introduce them to R. It's been interesting because he is very proficient in SAS, because that's their main bread and butter. As you said, all their data is basically in these SAS databases. So it's costly for them to think about moving it out of that framework, so he does do a lot in SAS.

Lucy McGowan: But then one thing with a lot of these closed source options: because they do such extensive validation, and things like this, they're a little bit slower to adopt newer methodology. I think, as a data scientist, being able to integrate both your knowledge of maybe some of these closed source resources that we have to be using and open source, something like R or Python, where they're going to have more cutting edge methodology, being able to do both, I think, is really valuable.

Lucy McGowan: I've seen people be able to do that, so a lot of the data still remains in SAS, and a lot of the activities that happen all the time will be done in SAS. But then when there's a new question being posed, and maybe a new machine learning technique, one that's newly tried and hasn't yet been implemented in some closed source language, being able to also figure out how to try to answer the question using open source software, I think, is a cool value add that you can give to your company.

Kirill Eremenko: Gotcha. Totally agree. Very, very valid point there. Okay, I now wanted to jump to our next topic for discussion, and that is that you have a new course which you just released on DataCamp. Congratulations.

Lucy McGowan: Thank you.

Kirill Eremenko: Tell us a bit about that.

Lucy McGowan: Yeah. So just a couple of weeks ago, I released a course on building dashboards with shinydashboard. I'm really excited about it, because it's kind of neat. It's different from what I do in my PhD work. It's all more visualization, and being able to present data in an easy to digest way.

Lucy McGowan: But I think it's really an important skill because the collaborators I work with, especially in biostatistics, are typically clinicians. They're content matter experts, which I think in data science in general is something that's really common, where we're working with these content matter experts that maybe are not statistics, data science, or computational experts, but they know their field really well.

Lucy McGowan: So being able to visually show them what we're working on as we go, in something like a dashboard format, I've found to be really invaluable, because they can see things in the data that maybe we wouldn't necessarily recognize. I'm not a doctor in the specific area that I might be helping with, so they might see things that I wouldn't know about. Then at the same time, I might recognize some statistical anomalies that they wouldn't necessarily understand, so being able to present that visually is really important.

Lucy McGowan: Those are kind of some reasons why I'm really passionate about building things like dashboards to be able to display your work as you go, and to be able to communicate your results.

Lucy McGowan: There's a package in R called shinydashboard that allows you to build these interactive dashboards in a really visually appealing way. It's actually pretty straightforward to do. We've got a short four-hour course that's got some videos interspersed, as well as a lot of exercises to get you on board with it.

Lucy McGowan: We use some fun data sets. I've got some NASA data sets that I pulled from ... NASA has an API where they provide different data. Then we also use some Star Wars data. It's cool. By the end, as a user, you build your own dashboard. It's got an interactive leaflet map, and all different things. It's a pretty neat course.

Kirill Eremenko: That's so cool. You can tell that it's a fun course by just watching the intro like when I watched it. At the end, I loved the fire.

Lucy McGowan: I know. Yeah, that was fireball.

Kirill Eremenko: Well, I was like, "Where did that come from?" It ties into the mood. You're very excited when you're talking about it there. That's fun to see that.

Lucy McGowan: Yeah. Good. I was excited about that. They added that in post production, so I hadn't seen it till it launched. But I was pretty excited. That was cool.

Kirill Eremenko: That's so cool. I just wanted to ask you, have you used Tableau before?

Lucy McGowan: I actually have not used Tableau. I'm familiar with what Tableau is, but I've never used it. No, I do most of my graphics in R. Even actually when I was using SAS, I did a lot of my graphics in R. But I know that lots of people love Tableau. I think I haven't worked ... though I've worked primarily in academia.

Kirill Eremenko: Tableau is more of business.

Lucy McGowan: It's just not as prevalent in [crosstalk 00:44:39].

Kirill Eremenko: Yeah.

Lucy McGowan: Right, yeah.

Kirill Eremenko: I just wanted to say, for the benefit of our listeners, that Shiny is like Tableau. I wouldn't say it's drag and drop, but it's similar in terms of the dashboard functionality. I haven't used it myself. I've seen dashboards in Shiny, and I would say that the purpose is similar. If you're an avid R user, then it can serve you as well as Tableau serves the business community.

Kirill Eremenko: Yeah, definitely worth checking out. We at Super Data Science don't have a course in Shiny at the moment. We might in the future. So if anybody is interested, this sounds like a great opportunity.

Kirill Eremenko: You already have 3,787 students enrolled. That's really cool.

Lucy McGowan: Yeah. We got pretty good numbers right away. I think it's a testament to the package itself. I'm not an author on that package. I just teach the class, but I think that it's a great package. The RStudio folks have done a really great job with Shiny in general. But shinydashboard is an example of something they've built out in a really great way. Very user friendly.

Kirill Eremenko: Yeah. Speaking of RStudio, it takes us ... By the way, we'll link to the course in the show notes. So if anybody wants to check it out, definitely have a look.

Kirill Eremenko: Speaking of RStudio, let's talk Hadley Wickham. Hadley Wickham is the creator of ggplot2. You mentioned he actually recommended that you come on this podcast, for which I'm very grateful. Hadley, if you're listening to this, thank you very much.

Kirill Eremenko: You mentioned you worked with RStudio for six months. Could you tell us a bit more about that experience?

Lucy McGowan: Yeah, I'd love to. Now it's actually become a formal summer internship program that they offer. This year is the first year that they've formalized it with an application process and everything. It's really exciting. I think they're going to have a lot more people doing it.

Lucy McGowan: But I worked with them last year. I worked on the tidyverse team. On this team, I was directly under Jenny Bryan. She's awesome. She's really great for the R community. She's the best teacher I've ever met, but she's also just ... She's taught me a lot just about workflow, and code style, and things like that. She's really good at utilizing version control with R in a way that's really great and easy to use, and that also takes advantage of all the things it's good for.

Lucy McGowan: When I was working with Jenny, the thing that we did is we built the googledrive package, which is basically an R package that interfaces with the Google Drive API. So it lets you essentially do anything you could do from Drive, but from your R console. So things like uploading files, downloading files, copying files, sharing files in different ways, and being able to visualize what's on your Drive, navigating folders, and things like that.

Lucy McGowan: That was our primary project when I was working there. But it was really an awesome experience. I think aside from being able to work from beginning to end on a package, and being able to get it on CRAN, and get some real users, and some real bugs and things like that. That process was wonderful, but then also being able to work directly with another coder.

Lucy McGowan: In academia, a lot of times we work by ourselves. We don't do a lot of pair programming and things like that. At least not at the PhD level, because we're working on our own research. A lot of it's independent. It was really nice for me to be able to develop these skills, to be able to work on a team, and also be able to work directly with one other person, coding alongside Jenny. That was really awesome.

Lucy McGowan: Then the tidyverse in general has just a great ... they have a coding style that's really very readable, and very easy to implement. When you're coding by yourself, sometimes I think it's hard to think about implementing a style. But then once you start working with a team, you need to be able to adapt to what they are used to. I found that having adopted that particular style, I use it now even when I'm coding by myself, and I think it's been really an exciting thing.

Lucy McGowan: Yeah, it was awesome. It was a great experience. I would highly recommend it. I think, for this year, I think their internships have ... I think the application process is closed, but they'll have them again next year I'm sure.

Kirill Eremenko: Where are they located by the way?

Lucy McGowan: RStudio, they're actually ... most people are remote. The tidyverse team is 100% remote. Everybody works ... Hadley lives in Texas, Jenny lives in Canada, people live all over the place. The hub where the most RStudio employees live is Boston. There are a good number of people that come into a Boston office. But for the internship, it's definitely not required. In fact, I'm not sure that any of the interns have ever come into a single location, but they do have things like work weeks where we'll all come out and work together for an extended period of time to get to see each other face to face, and work out any things that are easier to work out in person than remotely.

Lucy McGowan: I did attend one of these work weeks. But other than that, I just met about once a week with the team. Then we communicate largely via Slack.

Kirill Eremenko: Yeah. Okay. Cool. I'm surprised Hadley Wickham is in Texas. I thought he was from New Zealand originally.

Lucy McGowan: Yeah, he is. He was from New Zealand. He lived in New Zealand. Then I'm not sure what brought him here initially. I know he worked at Rice for a little while, Rice University. That was over here, stateside. Then I guess he just decided to stay. But he works remotely from Texas. That's where he ... yeah.

Kirill Eremenko: Good. Good, okay. Well, thank you. Thank you for that overview. Maybe some of our listeners will join on to that internship program. If not this year, then next.

Lucy McGowan: Yeah.

Kirill Eremenko: I got a few rapid fire questions for you as we're nearing the end of this podcast. Are you ready for this?

Lucy McGowan: I think so.

Kirill Eremenko: Okay. What would you say is your biggest challenge you've ever had as a data scientist?

Lucy McGowan: That's a great question. The biggest challenge I've ever had as a data scientist, I think ... Well, I was pondering this question before. I came up with a couple of answers, but I don't love any of them. I think what I'm going to say, I think-

Kirill Eremenko: The biggest challenge is that you don't have any challenges.

Lucy McGowan: I know. No, I think it's lots of small challenges. I did recently defend my dissertation, which I think was challenging. I don't know that it was the biggest challenge. The thing that's challenging, I think, in general is trying to distill things down. This probably generalizes to data science, but for the work that I was doing, I spent four years working on a dissertation, and then you have to distill it down into a really quick presentation. We have about an hour to allow everyone to see the things that you've worked on. I think that being able to pull really complex ideas into this digestible small token that you basically can hand to someone and say, this is the thing I'd want you to take away. I think that's a really challenging problem that we face.

Lucy McGowan: But it's also an exciting thing to get to do as data scientists. When I think about where the field of data science is moving, I think one thing that's going to become increasingly important is being able to take these really complex models, or these complex methods that we're using, and be able to explain them in a way that the general public can understand.

Lucy McGowan: I've faced that challenge over and over again, so I don't know that it's the single biggest challenge, but it's something that I've faced a lot. I faced it certainly in trying to distill all my dissertation work for my defense. I hope that's a good answer.

Kirill Eremenko: Yeah. That's a great answer. That's a great answer, and I agree with that, that in data science I find it's interesting that you can't just make it coherent and cohesive, and make sense, and it's like compact. It also has to be entertaining, or it has to be at least engaging at the same time, right?

Lucy McGowan: Right.

Kirill Eremenko: People are not going to want to even know things if you make it really dry, people are going to get bored even with the most exciting insight. So we have to convey in an entertaining manner. That's also part of the challenge I guess.

Lucy McGowan: Yeah.

Kirill Eremenko: Okay. Next question. Apart from your PhD career, because we've already talked about that, but apart from that, what is a recent win you can share with us that you've had in your role as a data scientist? Something that you're proud of.

Lucy McGowan: Like as I said, I guess defending my PhD [crosstalk 00:53:55].

Kirill Eremenko: That's why I [crosstalk 00:53:55].

Lucy McGowan: Yeah, also we've already discussed my DataCamp course, so that was exciting, doing that, getting it launched. Let's see. A recent win that I have had in data science. This is ...

Kirill Eremenko: Yeah, you've done so many things already. We've talked about the R-Ladies, we've talked about the PhD, we've talked about the course. I guess I'm stretching here a lot by asking you for another one.

Lucy McGowan: Yeah, for an additional one, I know. I'm sure, I mean, there's lots of small wins along the way, but it's hard to think of ... it's always hard to think in superlatives for the best one.

Kirill Eremenko: Yeah, that's right. Well, I'm sure there's probably lots of ones where you've helped somebody. I consider those, even though they're small sometimes, I consider them quite important at the same time.

Lucy McGowan: Yeah, I do have one. Recently, and this is not necessarily a testament to my help, there was an R-Lady who ... rOpenSci is basically a group of R users that are dedicated to open science. They do really cool work. One of the things that they do is, every year, they offer what's known as an unconference.

Lucy McGowan: Essentially, they invite between 50 and 70 or so researchers. They pay for your travel and accommodations. They bring you out to a location, and you work all together for about a week, or a little bit less than a week, mostly on building R packages, but also doing different things to help the R community.

Lucy McGowan: There's brainstorming, and there's community building, and networking. It's just a really, really cool conference. I went to it last year. But it's tough because it's a small group that they can invite. There's an R-Lady that was really interested in going. She was working on her application, and she had a couple of us review it. So I helped review her application.

Lucy McGowan: It's totally a testament to her that she was one of the people selected. Not at all a testament to me, but I was really excited that she just recently found out that she was selected to go. Actually, we had a lot of R-Ladies that ended up getting selected to attend. I think that felt like a bit of a win. I don't know that I can really claim the win myself, but it was something that I was excited about, because I think growing that community, being able to give back after I got such a good experience from it, and then being able to encourage someone else to get that same great experience, was really exciting.

Kirill Eremenko: Yeah. It's important, helping each other out. That's a great one. Yeah.

Lucy McGowan: Yeah.

Kirill Eremenko: Thanks for sharing, and congratulations to your colleague. It sounds like an exciting opportunity.

Lucy McGowan: Yeah.

Kirill Eremenko: Okay. What is your one most favorite thing about being a data scientist?

Lucy McGowan: I think generally, the reason data science appeals to me is that I just really like being able to apply my quantitative skills to a wide variety of problems. My degrees are in biostatistics, which generally applies to the medical field. But throughout my tenure, even just during my PhD, I worked as a data scientist for a small startup, then I worked for RStudio for a little bit. I do some work for the Department of Veterans Affairs.

Lucy McGowan: I've been able to work on a really wide variety of topics, and answer all different types of questions, which I think is just so fun. I just love that. I also really love the community. I think because it's such a broad field, and people are working on such varied complex topics. Once you have your tool set, you have your computational abilities, and your communication skills, and your statistical acumen, you can learn and contribute to so many different areas. I think that's something for me that I really love about data science in general.

Kirill Eremenko: Okay. That's a great answer. I can totally agree with that. Yeah, it's so exciting to be in this field, and be able to apply your skill set all over the place, and the community is great. What's there not to like, right? Everything is great in data science.

Lucy McGowan: Yeah. True.

Kirill Eremenko: Okay. Next one is; from your perspective, from all the things you've seen in ... we've covered so many topics just even in just this podcast. I'm sure you have a much broader exposure to the field in general. From everything you've seen, where do you think the field of data science is going, and what should our listeners prepare for to be ready for the future that's coming ahead in the next three or five years?

Lucy McGowan: Yeah, that's a great question. I think in general we're moving to this age of information overload, which has two consequences for data scientists. So the first I think is largely really a good thing. When I see a study now, I want to know exactly where the data came from, and how it was analyzed. Often, if it's something that I actually think is important to me, I want to be able to reproduce the result. I see this more and more, this craving for even more information even though we're in this information overload space, because we have the ability to share information so easily.

Lucy McGowan: I think it's really cool, and it's leading to really neat innovations. But what it means as data scientists is that we're really pushed to use best practices in terms of reproducibility so that this can be possible. Things like version control, and working in a scripting language instead of a GUI, and documenting and justifying data changes, and all that kind of stuff I think is becoming increasingly important because we have this ability to transfer information so easily.

Lucy McGowan: Then the second consequence of a data or an information overload would be that it can be hard, I mean this reflects on what I was saying earlier, but I think it can be hard for the general public to be able to sift through what's important, and what's noise.

Lucy McGowan: I think that means that as data scientists, we need to brush up on our communication skills, and being able to distill really complex things. This really complex model that I just fit; while I've fully documented it for those people that want to be able to reproduce it, I also need to be able to condense it into a digestible sound bite that doesn't water it down, but also doesn't overload the public with unnecessary information.

Lucy McGowan: That's what you were saying before about being able to make it exciting and interesting, but somehow still hold on to the truth that comes from it, being able to make sure we're communicating things that are both true and important, and being able to sift through all of the nitty gritty to get it condensed down to the single point that we're trying to get across. I think that is also going to be really important in the coming years for data scientists to be able to gain that skill.

Lucy McGowan: Aside from being able to fit that really fancy model, being able to explain it in a really easily digestible way so that people can actually utilize the information that you've gleaned from it.

Kirill Eremenko: Love that answer. From the whole concept of information overload, I can totally feel it even just in my own skin. As time passes, as the years go by, it becomes harder and harder just to keep up with everything, and almost feels like that you get bombarded. You go on Facebook, or you go on Instagram, you get bombarded by so many photos, by so much information that unless you look at something right now when it just came up, you will never ever see it again because it will get bulked down, and it will get drowned in all the information that will come afterwards.

Kirill Eremenko: My question to you on that is this: I totally agree that we need to become better at how we convey information, and make sure we condense it and distill it. But it almost feels to me like, given the volume at which this information is coming and the speed at which it is growing, we won't be able to keep up. What do you think? Do you think that as a race, we'll be able to keep up with this information? That five years down the track, we'll still be able to deal with it in a manageable way, or will it completely over-flood us?

Lucy McGowan: Yeah, that's a great question. I think part of that, and I don't know who would do it or which stakeholder is set to determine this, but I think a big piece of it is prioritization of information and figuring out who determines what information is getting prioritized, and what's floating to the top of our news sources, or whatever it might be, and what isn't. I think that's going to be the thing that's going to really define whether or not we can stay afloat among this information overload, because if things are adequately prioritized in such a way that the really important information is floating to the top, and the things that are being discovered or maybe aren't so important are sifting to the bottom, then I think we'll be just fine.

Lucy McGowan: But if the policy makers, or whoever is defining what's getting the top prioritization, can't get that worked out, I think that you're right. It will be really hard to get through. I tend to be an optimist, so I think it's all going to be okay.

Kirill Eremenko: Yeah. Well, let's hope so. I'm sure it will be okay, but maybe in a different way. Like we'll be a different world that maybe we're just not used to yet. So that will be ...

Lucy McGowan: That's right. Yeah. I mean, maybe we have to self prioritize better what information we take in.

Kirill Eremenko: Yeah. True. Or maybe, what's his name? Elon Musk, and his Neuralink, right?

Lucy McGowan: Yeah.

Kirill Eremenko: That will be a thing. We won't have this capacity threshold that we're about to hit, or we've already hit. We'll be able to process like ten hundred thousand times more information than we're processing now, and the problem will be gone.

Lucy McGowan: That's great. That's right. That's it. I think I like that solution. We'll just get better brains.

Kirill Eremenko: Yeah, true. All right. Well, thank you very much Lucy for coming to the show. Great conversation. Where can our listeners get in touch with you, find you, follow your work, and just hear more about all the things that you're doing?

Lucy McGowan: Yeah. Thank you so much. I really enjoyed this conversation as well. I'm on Twitter. Lucystats is my Twitter handle. I have a website. Then one of my fellow students at Vanderbilt, Nick Strayer, and I run a blog called Live Free or Dichotomize. It's mostly R there. I write about different things: little solo analyses that I've done in R, little fun things that you can try in R, little tips and tricks.

Lucy McGowan: Then Nick is a really great data visualization guru, so he writes a lot. He uses R and JavaScript primarily. He has a lot of really cool visualizations that he puts on our blog. Yeah. That's where you can find me.

Kirill Eremenko: Awesome. Of course, your new course, and R-Ladies as well.

Lucy McGowan: That's right.

Kirill Eremenko: Yeah. Okay. Cool. We'll put all those links in the show notes. I have one last question for you today, what is a book that you would like to recommend to our listeners?

Lucy McGowan: Okay. I have a book that was recently recommended to me by two people: a colleague at Vanderbilt, as well as Stephanie Hicks, a biostatistician at Johns Hopkins. I actually have not yet read this book. This is a strange recommendation, but I really trust both of their instincts. I'm currently on the wait list at the library to get a copy of this book. I'm really excited to read it. If your listeners read it, we can read it along together, and maybe we can communicate via Twitter something about what we think of it.

Lucy McGowan: The title of the book is "I Know How She Does It: How Successful Women Make the Most of Their Time" by Laura Vanderkam. Essentially, my understanding is that this book is an analysis and discussion of time logs from 143 different women who make at least $100,000 a year and have at least one child living at home.

Lucy McGowan: It talks about how these women define success, and also how they're able to find a successful work-life balance. Generally, it seems that a big take-home is that a lot of them don't confine their work to the traditional Monday to Friday, 9:00 to 5:00 schedule. About 75% of these women, it seems, work outside of work hours, and do personal stuff during the core work time.

Lucy McGowan: I'm really excited about this book because I think it's an evidence-based and data-driven way to think about work-life balance. It also provides a little bit of insight into how different people work in different ways.

Lucy McGowan: I think for men and women, this could be an interesting book because it can help you relate to your colleagues, even if it's not something that you ... I think it can also be helpful for you individually. But to be able to understand better why your colleagues might work the way they do for various reasons, I think that this will be a neat book for data scientists in general, because I think work-life balance is really important for any field, especially for a field like data science where, in theory, you can work from anywhere, anytime. So being able to figure out the best way to make that work with your particular constraints, I think, is really important.

Kirill Eremenko: Awesome. Well, thank you so much. The book is "I Know How She Does It: How Successful Women Make the Most of Their Time." Great recommendation. Guys, check it out. Then chat to Lucy on Twitter about what you thought of the book.

Kirill Eremenko: Once again, thank you so much Lucy for coming on the show. Fantastic having you here, and thank you for sharing all the amazing insights with our audience.

Lucy McGowan: Thank you. It was great.

Kirill Eremenko: All right. There you have it. That was Lucy D'Agostino McGowan. I hope you enjoyed this podcast. Thank you so much Lucy for coming on this show. There were so many valuable things that she shared with us. Of course, some great chat. I love how we started off by talking about something completely unrelated about music. I liked a lot of things we talked about.

Kirill Eremenko: Of course, I'm going to say that the most valuable here is women in data science. I'm not saying that just because it's common to say that these days and things like that. I really believe that those ladies, girls, and women who do choose to get into the space of data science, or for that matter any kind of science, technology, engineering, mathematics space, which is just inherently and historically dominated by male professionals, I truly believe they deserve the recognition, support, encouragement, and deserve to be treated equally with everybody else because it's a passion. It's something that you like, and it's something that you want to be doing. You don't want to be always ...

Kirill Eremenko: If I put myself, as a man, in their shoes, I don't want to always be worrying: Am I being included in things? Am I being looked down upon? Am I being mistreated in certain ways? I want that not to be a concern. I want just to be worrying about my career, my profession, and progressing.

Kirill Eremenko: Whether you're male or female, I'm sure you've picked up some valuable insights from what Lucy shared today. Try to apply that in your career. Especially if you're a man, try to help women feel treated equally, and included. When you see acts of misjudgment, call them out, and just make it a safe and fun place for everyone. We want everyone to have an enjoyable career in data science.

Kirill Eremenko: On that note, you can get the show notes, and the episode transcript at Super Data Science. Just head on over to, and there we've got a ton of extra materials for you, a ton of links that Lucy shared with us after this episode from that interesting blog post that she was talking about to the different communities that were mentioned, to her course, to her own LinkedIn, to her own Twitter, and even Hadley Wickham's book of recipes for cocktails. Hadley, if you're listening to this, I can't wait to have you on the show, and we can talk more about your cocktail recipes then.

Kirill Eremenko: On that note, thank you so much for being here today. If you did enjoy this podcast, leave us a rating on iTunes. It means a lot to us and will help spread the word, and get more people involved in this data science movement.

Kirill Eremenko: I look forward to seeing you back here next time. Until then, happy analyzing.

Kirill Eremenko

I’m a Data Scientist and Entrepreneur. I also teach Data Science Online and host the SDS podcast where I interview some of the most inspiring Data Scientists from all around the world. I am passionate about bringing Data Science and Analytics to the world!
