SDS 315: Making Data Accessible

Podcast Guest: Gabriela de Queiroz

November 19, 2019

After a ton of positive feedback, we’ve got Gabriela back with more insights such as Gabriela’s whirlwind of talks this year and how she manages her time, her talk at DataScienceGO, the MAX & DAX projects at IBM, and Gabriela’s projects R-Ladies and AI Inclusive.

About Gabriela de Queiroz
Gabriela de Queiroz is a Sr. Developer Advocate/Sr. Engineering & Data Science Manager at IBM where she leads the CODAIT Machine Learning Team. She works in different open source projects and is actively involved with several organizations to foster an inclusive community. She is the founder of R-Ladies, a worldwide organization for promoting diversity in the R community with more than 150 chapters in 45+ countries. She has worked in several startups where she built teams, developed statistical models and employed a variety of techniques to derive insights and drive data-centric decisions. She likes to mentor and shares her knowledge through mentorship programs, tutorials and talks.
Overview
When Gabriela was first on the show, it was her first ever podcast appearance, and it was incredibly well received. The promo video for her episode has featured in her talks over the past year, and it all led to her appearing on three more podcasts, including one on the importance of female-first spaces. Listeners were most excited about Gabriela’s path, her work at IBM, and her R-Ladies community. With all this, how does Gabriela find the time for 20 talks in one year? She has become more selective about which talks she gives, and she stays mostly on the West Coast, where she is based. For Gabriela, giving talks is also a way to refill her energy.
As for her talk at DataScienceGO, Kirill had great feedback from both newcomers to data science and senior data scientists. The secret is that Gabriela mixes her topics: she leaves heavy math out and aims for the middle of the field. Her talk was titled Deep Learning for Everyone. Her motivation, as she mentioned in the talk, is that more than a hundred new machine learning papers are published daily, so how do you keep up? Rather than leaving you to browse the internet for a path, Gabriela’s talk offered one: how to start and continue your deep learning journey. As to what works especially well at DSGO, Gabriela thinks it’s the mixture of hands-on workshops, panels, and both specialized and general talks that can speak to virtually every group that comes through the doors at the conference.
MAX and DAX are, respectively, the Model Asset Exchange and the Data Asset Exchange at IBM, both of which were referenced in Gabriela’s talk. Gabriela’s team ran into a problem common across our industry: data licensing. Well-understood licenses exist for software, but not yet for data, despite how much they’re needed. People needed a place to get data sets without worrying about licensing, and that’s what DAX is for. CDLA, the Community Data License Agreement, is one license that DAX uses to ensure its data is open and usable by data scientists. Together, the two exchanges give data scientists, both seasoned and brand new to the field, a complete ecosystem of models and data.
Gabriela’s other project, R-Ladies, now has 170 chapters in 46 countries with 55,000 members worldwide. If you want to start a chapter, first send an email to info@rladies.org to begin the process. You’ll get information on infrastructure, and R-Ladies provides everything needed to start your own chapter. Recently, Gabriela has been less involved, allowing the chapters to grow on their own. She’s shifted her focus to the problems of discrimination and bias in AI. She’s working on a new group, AI Inclusive, to bring more diversity into the AI field and raise awareness of the lack of involvement of certain groups in AI. She makes the point that the communities least involved in AI are going to suffer the most from the mistakes the AI industry makes.
In this episode you will learn:
  • What’s new with Gabriela [7:14]
  • How does Gabriela find the time? [10:50]
  • Gabriela at DSGO [16:45]
  • IBM’s Data Asset Exchange [27:48]
  • What is Docker? [44:32]
  • Gabriela’s code background & team [46:34]
  • R-Ladies [54:50]

Podcast Transcript

Kirill: This is episode number 315 with Founder of rladies.org, Gabriela de Queiroz.

Kirill: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex, simple.
Kirill: This episode is brought to you by DataScienceGO 2020, our very own Data Science Conference. We’ve already done three events in the past three years and we’re moving into our fourth year in 2020. And to give you a feel for what to expect, here are some stats from DSGO 2019. We had 620 attendees fly in from 25 different countries. 38 speakers gave talks. 150 plus business decision makers attended the sessions as well and get this, 2,400 cups of coffee were drank during the networking sessions.
Kirill: So DataScienceGO is not just a place where you will get all the top data science skills that you need for your career, that’s definitely a huge component of the conference. But also it’s a great place where the community comes together to network. At DataScienceGO you will meet data scientists and professionals from companies like Accenture, AIG, Wells Fargo, MasterCard, Facebook, Google, IBM, Microsoft, Salesforce, Teradata, Amazon, eBay, Shopify, and many many more. So this is a great opportunity to meet a network of your colleagues, to meet and start catching up with your mentor or maybe to even meet the manager at the next company that you’ll be working for.
Kirill: At DataScienceGO 2020, we’ve been almost doubling every single year. So we’re expecting about 1,000 attendees at this next event. DataScienceGO is happening on the weekend of the 6th, 7th and 8th of November 2020. And you can already secure your tickets today at datasciencego.com. And one more thing is that we actually have different tracks. So we found that this is a very important component for attendees. And we have tracks tailored to your experience. So if you’re a beginner, there’s a beginner track, which will help you get the skills to break into data science. If you’re an intermediate practitioner, there’s an intermediate track for you to progress to advanced and if you’re really advanced, there’s an exclusive advanced track just for you.
Kirill: So whatever your level, you can find the right track, the right talks, the right workshops, the right sessions, and case studies and panels at DataScienceGO. So on that note, this is the best conference for you to attend to skyrocket your data science career so make sure to secure your ticket at datasciencego.com today, and I can’t wait to meet you in person in California in November 2020.
Kirill: Welcome back to the SuperDataScience podcast ladies and gentlemen, today we got a super special episode, Gabriela de Queiroz is returning to the podcast for the second time. After a barrage of positive feedback for her very first appearance on a podcast just over a year ago at the SuperDataScience show, Gabriela is back with lots of exciting insights.
Kirill: In this podcast, you will find out how Gabriela gives dozens, literally dozens of talks per year. Just this year alone, she’s given already 20 talks plus the 26 or so talks that she gave last year, and you’ll find out why she does it, how she feels about it and how she in general finds time to do that, manage a whole team, also appear on podcasts. Then we’ll talk a little bit about DataScienceGO and Gabriela’s talk at DataScienceGO.
Kirill: So Gabriela joined us for DataScienceGO 2018 and for DataScienceGO 2019. So she will not only share some insights from her talk, but also you’ll find out the good and the bad about DataScienceGO. So there is no censorship here, you will hear what she found valuable in DataScienceGO and what we need to improve at the conference. So I think that will be interesting for you to hear as well. Then we’ll talk about the MAX and DAX projects that Gabriela is working on at IBM. So MAX is a model asset exchange and DAX is a data asset exchange. So Gabriela already spoke about the model asset exchange last time she was on the show. It was a new project for IBM. This year, the project has grown and now they’ve added the data asset exchange, which everybody should be excited about because there you can get free data sets with absolutely free copyright so you can use them for whatever you want.
Kirill: In fact, we actually go into the topic of software versus data licenses. And you’ll find out what the CDLA license type is, and what it means. So we very often come across situations where we can find the data, but can we use it or can’t we use it, and can we use it for our personal use, can we use it for commercial use and things like that. So you’ll find out quite a lot about these licenses. And also in the DAX project, you can get the data and just use it and that is fantastic that IBM is doing that.
Kirill: And also we’ll talk about, of course, Gabriela’s project, which she has been working on for seven years, which is R-Ladies. They’ve grown 50% since we last spoke, to over 150 or even 170 chapters in over 40 plus locations around the world. So we’re very, very excited about that. I’m very excited to hear about R-Ladies, you’ll find out how you can start your own chapter with R-Ladies and also Gabriela shares her new project, brand new project called AI Inclusive, which you can find at ai-inclusive.org, another huge passion project of hers to help the AI community be even more inclusive, and even more welcoming and accepting, and you’ll find out about that and you’ll find out how you can take part and participate in that amazing undertaking that Gabriela has just started recently.
Kirill: So without further ado, I can’t wait for you to check out this podcast. Please welcome for the second time around Gabriela de Queiroz, Founder of rladies.org and ai-inclusive.org.
Kirill: Welcome back to SuperDataScience podcast ladies and gentlemen, super pumped about this episode because for the second time around, I have Gabriela de Queiroz on the show. Gabriela, welcome. How are you today?
Gabriela: I’m great. Wonderful. Yeah, I’m so glad to be back. It’s been an amazing journey for the past year so I’m glad to be back here.
Kirill: I know it’s been like over a year since you were here last time, it’s crazy how time flies so fast.
Gabriela: Yeah, yeah.
Kirill: What’s been going on for you this whole year? What’s new?
Gabriela: Oh my god, so many things. So when I participated, was my first podcast, was my first ever podcast. I got so many good feedback and people writing to me. So the video that you guys made it, it created a such impact in the community and it’s something that I’ve been showing in several talks when I say you have to watch this.
Kirill: The video of your podcast? The pre-video for your podcast, the promotional video?
Gabriela: Yeah, like a promo. Yeah. Oh my god. I love that. I can watch over and over and over again. And it’s so amazing. Yeah. So we did the first podcast ever. And then after that, I did three other podcasts.
Kirill: Wow, congrats. That’s so cool. Well, that’s amazing. It’s like a chain reaction it started.
Gabriela: Totally. Yeah.
Kirill: That’s so cool. So, which podcast did you do?
Gabriela: So I did two. So two in English and one in Portuguese. The one in English was, one it’s called Making Data Simple where I talked about my work with open source and especially inside IBM and so on. The other one was, we were discussing, it was a group of three people. We were discussing the importance of female first spaces. And the third one, which was the Portuguese, which is my first language. So it was my focus in the Portuguese speaking countries, it was the participation of women in technology.
Kirill: Very, very interesting. And yeah, all of those are topics which you are very passionate about. So that’s exciting. So how did these … First of all, how did the students or listeners from our podcast give you the feedback and contact you? Was it through LinkedIn? What was the most common way of people getting in touch?
Gabriela: Yeah, pretty much LinkedIn. I would say 99% of the people that got in touch was through LinkedIn, maybe another 1% through email, and majority was through LinkedIn.
Kirill: What was a common theme? What were people most excited about from your first appearance on this podcast? Because you talked a lot of things.
Gabriela: Yeah, so they were excited about my path and the work that I’m doing at IBM, and then the R-Ladies, the whole community. So they kind of they relate, there are several aspects they could see themselves in my journey. So they wrote back, great to see your journey and seeing a woman from Latin America being successful and so on. So it was great.
Kirill: Fantastic. So exciting to hear. Okay, all right, what else has happened? So it’s three podcasts now. Now you attend like 26 conferences per year. And now you have the podcasts on top of that, how do you even find the time?
Gabriela: Yeah. You have to be focused and it’s impossible to do everything. So, I have to prioritize what are the talks or the places that I would make a bigger impact, or the conferences where I think it’s going to be important to that audience to hear what I have to say. So I’ve been trying to be more selective in terms of talks. And I don’t do very much cross country. So I do more on the west coast where I’m based. So there are a lot of tricks that I learned through the way of doing all these talks. Yeah, this year, I slowed down the first semester. But even though I slowed down in the first semester, I think I’m hitting the 20th talk that I gave this year.
Kirill: That is so profound. Even hearing it, I can’t imagine that. You’ve done 20 talks in one year, given that you slowed down the first semester, so probably 20 talks in the past, I don’t know, eight months? That’s crazy. Wow.
Gabriela: It’s something, do you know one thing that I realize is giving talks is a way for me to be resilient, to refill my energy level. So it’s a way for me to connect with people, it’s something that I love the most. So I go and then I connect and then I talk and then I’m giving back. Some people can see this is a stressful journey. For me, it’s like it’s a fun journey. So it’s a way for me to refill my tank.
Kirill: Yeah, that’s amazing. And maybe that’s also why your talks are so good. As in, you mentioned you’ve been selective. So I’m very honored that you chose to come to DataScienceGO, our conference in San Diego, both years. So, well, the two past years. So you came in 2018 after our podcast and then you also came this year just a few weeks ago. So first of all, thank you so much and I’ve had some great feedback about your talks, both times amazing. Everybody loved them. How was your experience in DataScienceGO?
Gabriela: Always amazing. It’s always amazing. It’s like seeing old good friends, from the people behind the counter, checking you in, making you welcome. So from that place, actually before that, so let’s go even further. Once I get an email with the invitation, everything was very well set up. So I didn’t have to worry about anything, everything was taken care of. And then getting there, the check in process and then the team is always … They try to make you feel comfortable and to make sure that you have everything that you need. So it’s a very smooth conference in terms of being a speaker and the whole community, people are engaged. This talk to be honest, I was freaking out.
Kirill: Why?
Gabriela: Because we had what, 500 this year?
Kirill: 640 this year.
Gabriela: What, even more than that. Yeah. So I got to the main room and I’m like, I don’t know how am I going to go in this stage with so many people in front of me. So I remember getting there in the morning before the talks, when the room was empty. And I was looking the size of the room. And I’m like, wow. You’re like in a few hours, I’ll be there. All right, I got this. So it was amazing. And again, the whole feedback and I think people were engaged with the topics that I was talking about. So it’s always an amazing experience. And I hope to be there again next year, because it’s on my top priority of [crosstalk 00:15:30].
Kirill: That’s so awesome. I would love to personally have you there. I’ve had great feedback about your talk, and I’d love to dive a bit more on this podcast into what you spoke about at the event. But what’s interesting is that I’ve had great feedback from beginners and also from advanced practitioners. So you managed to somehow create a talk that benefits both. For those starting out, even in the description of your talk, it’s like, even if you have zero experience in deep learning, I’ll show you how to get started.
Kirill: But also I specifically spoke to a practitioner, and he said, “My top three favorite talks were Sarah Aerni from Salesforce, somebody else, and Gabriela de Queiroz from IBM,” or from R-Ladies. And I was very impressed by that, that a very advanced practitioner at this highly senior level, who actually came to the conference to hire another senior practitioner to join his company, was like, “This was one of my favorite talks”. I was like, wow. How do you do that? How do you manage to combine something of value for beginners and advanced practitioners? What’s your secret? I’ve been trying to figure it out for the past couple of weeks.
Gabriela: I don’t think there is a secret but when I’m putting together a talk, I try to be conscious about I don’t know much about the audience. So I need to make sure that I’m going to cover basic stuff as well as advanced stuff. But I try to be also conscious about I don’t want to do math, a talk with formulas and math, I want to be doing more applied work. So I try to mix everything. So then if it’s a beginner, if it’s more like you have more beginners in the audience, it’s going to fit. If you have more advanced audience, your talk is going to fit as well. So that’s kind of how I work when I’m preparing my talks is I try to be in the middle and go back in both directions.
Kirill: Okay, fantastic. Fantastic. Before we move on to the conference, I would really like to know you’ve been to 20 events, at least just this year, what would you say is something that we do well at DataScienceGO, something that helps us stand out, that you see people really getting value out of?
Gabriela: Yeah, so I think that it’s interesting, it’s like you have the workshops, like one day it’s all workshops. So, if people are interested in doing hands on they can do it and then they have two other days that are more talk-focused, and then you have panels. So, you have a mix of everything. So, we have workshops, panels, talks. And then you have talks that are more for aspiring data scientists and you have talks that are more general. So I like this mix. So you are not only focusing on one group, you are trying to accommodate everybody, which I think it’s awesome for a conference. So we are not excluding anyone. We’re trying to include everybody.
Kirill: Okay, fantastic. Thank you. That’s great feedback. And to be fair, so that people also see the developing side, what’s something that you can give some constructive feedback on, we can do better next time? I’m really curious so that we will do better, but I’d love to hear your insights on the one thing that we could improve?
Gabriela: One thing? Okay, so the only thing is this year, we had some delays on the talks. So the schedule was not exactly what was showing in the app, or in the website. So I remember talking to people when they were not sure, which talk was going on so they were a little bit lost. So maybe next year, if we can have I don’t know maybe a monitor something like when you go to the airport, and you have the departures and arrivals. So something like that, where you show the talks and then you say, this talk is delayed, is going to start at X hours. So people know what to expect.
Kirill: Got it.
Gabriela: Yeah, that’s the only thing. And for me in particular, as a speaker, it didn’t matter much, but I think for the audience, I remember being asked, which talk is going on right now? So that’s the feedback. And I’m making this up. I have never seen this panel with the status of the talks anywhere, but this is DataScienceGO, it’s a conference that they try to innovate every year, maybe next year, we have something like this.
Kirill: Yeah. No, that’s a very valid point. And thank you for the idea. We’ve had that feedback and I also noticed, of course, we were running a bit behind on schedule. Something we’re definitely going to look into next year and it will be very precise and also in case there are certain delays then we’ll make sure there are like signs or as you said, maybe even a screen, that can be very helpful. And that’s definitely something that we will improve.
Kirill: Okay, so that’s some great feedback. Thank you on both sides, the good and the room for growth. And I’m always glad to hear that kind of comments, especially the ones where we could do better because it’s only our third year and we are planning to do this forever, hopefully. So, as long as there’s room to grow, that’s always exciting. Some things we can improve on. Okay, and so on that note, let’s move on to your talk. So unfortunately, I didn’t have the chance to attend your talk in person and the recordings, which will be available soon in DSGO website. They’re not ready yet. So I haven’t had a chance to watch it before our talk today. But that’s exciting because then I’m like one of the listeners who hasn’t seen the talk either so you can run us through from the beginning. What is the title of your talk? And what is it all about?
Gabriela: Yeah, so it was Deep Learning for Everyone, or Ready-to-Use Deep Learning Models. I use the two titles, because they both fit in this talk. And the idea was, when you are a data scientist, it doesn’t matter if you are an aspiring data scientist or more like a senior data scientist, in the world of machine learning, in particular deep learning, it’s so hard for you to keep up with everything that is going on in the field. So I mentioned in the talk, like, my motivation behind this talk was there is more than 100 new machine learning papers being published every day on arXiv. So how do you keep up with that? There is no way that you’re going to read 100 papers every day. And then if you search for deep learning you have, I don’t know, more than 4 million results.
Gabriela: If you search for deep learning courses, you have almost 200 million results. So how do you start in this journey? Or how do you keep up with all this information? And then, with the open source project that we created inside IBM in my team, that I mentioned before, is the model asset exchange, which is a free and open source place for you to find state-of-the-art deep learning models. So it’s a good way for you to get started in this field. And also if you want to go even deeper, so instead of browsing around the internet and trying to find a path, I’m giving this path for you. And then, as of today, let’s say, we have 30 models that are ready for you to use across a variety of domains, you have text, audio, image, video et cetera. We have different deep learning frameworks like TensorFlow, PyTorch, Keras. And then we have like two versions. We have the deployable ones, we have the trainable ones.
Gabriela: And the only thing that you need to get started is Docker. So that’s it. Simpler than that, it cannot be. And then I guide the audience through the ways that you can access the models and I show some examples. If you are interested, again, you are going to watch the recording, and the audience that is listening to this podcast, they should go and watch the recordings as well, because I guide them through ways to access this API. And one main thing is everything is standardized. So if you go and look at one model and then you go to another model, they have the same standard way. So you are not going to get lost between how things are being done.
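As a rough illustration of the standardized access Gabriela describes, here is a minimal Python sketch for querying a MAX model that is already running in a local Docker container. The port, the `/model/predict` path, the JSON input shape, and the `status` and `predictions` response keys are assumptions based on how MAX models have typically been documented, not something stated in the episode, so check the model's own API page before relying on them. The helper names are hypothetical.

```python
import json
from urllib import request

# Assumption: a MAX model container is already running locally, started with
# something like `docker run -it -p 5000:5000 <max-model-image>`.
BASE_URL = "http://localhost:5000"


def build_predict_request(payload, base_url=BASE_URL):
    """Build a JSON POST request for the model's prediction endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + "/model/predict",  # assumed shared path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_predictions(response_body):
    """Return the predictions from a response, or raise if status is not ok."""
    parsed = json.loads(response_body)
    if parsed.get("status") != "ok":  # assumed response key
        raise RuntimeError(f"model returned status {parsed.get('status')!r}")
    return parsed["predictions"]  # assumed response key


if __name__ == "__main__":
    # Assumed input shape for a text model; other models may expect a file upload.
    req = build_predict_request({"text": ["Deep learning for everyone."]})
    with request.urlopen(req) as resp:
        print(extract_predictions(resp.read()))
```

Because the request builder and the response parser are plain functions, the same two helpers could in principle be pointed at a different model by changing only the payload, which is the practical benefit of a shared API shape.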
Gabriela: And then I show that you can also train using your own data. And then I also show that there are some tutorials like a learning path. So, we are like, if you don’t know how to start there is this learning path and then you can go through everything. And then the biggest, the highest point of my talk was I showed a demo of something called the Veremin. So the Veremin, it’s a video theremin, and the theremin is an electronic music instrument invented in 1920, where you use your hands to make sounds, you use your hands-
Kirill: In the air?
Gabriela: Yeah, yes.
Kirill: Yeah, I’ve seen that one. You’re hovering your hands over in the air and it’s registering where your hands are and is making music from that.
Gabriela: Exactly. So Va Barbosa, from our team, he created the Veremin, which is a video theremin. And it uses a model from the model asset exchange, the human pose estimator, and it was built with TensorFlow.js. And then it’s me doing some music. So I make a joke that in this video I was trying to be a DJ. So it’s me there trying to make some music. So that was, I would say the best moment of the talk where people were laughing and they were curious, how I did that.
Kirill: That’s so cool. Next year you should bring that instrument and play it on your talk.
Gabriela: Yeah, I know.
Kirill: That would be cool.
Gabriela: Yeah, it was fun. So that was the talk. The talk was to guide you through, how can you start and continue your deep learning journey?
Kirill: Okay, okay, fantastic. So just to get people up to speed because you mentioned DAX on the previous episode of the podcast, and I’m sure you walked the audience through it at DataScienceGO. But just to get those up to speed who haven’t seen one or the other, DAX is IBM’s data asset exchange where you are able to download as you mentioned, these templates or these ready to use models, and just start applying it right away. Is that a good summary?
Gabriela: Yeah. So there are two things actually, there is MAX and DAX.
Kirill: Okay, what is MAX?
Gabriela: So MAX is the models, so it’s the model asset exchange, which you can go … that’s the model part that I was talking about where you go and find those ready to use deep learning models. DAX, on the other hand, is the data asset exchange. One thing that we noticed when we were trying to train some of the models, we had issues in finding data sets that we could use without any issues with the license. So in the data world, we don’t have a good license framework. For example, for software, you have so many different licenses and it’s well understood in the industry, whereas with data you don’t. So how do you know that you can use this data in your enterprise application? Maybe you can’t, because the license is not good.
Kirill: Yeah. We’re just not there yet. Yeah. The world hasn’t developed this framework for licenses around data. It’s needed, but we’re getting there.
Gabriela: Exactly. So it’s something that has been the conversation, especially within some companies and so on. So we found that there was a need for this, to have a place where we are going through the license ourselves. So you don’t have to worry about the license. You can use this data set on this website without worry. We have a standard format as well. So just to give an idea, there is one license that is an open data license called CDLA, which stands for the community data license agreement. And if people are interested in learning more, they can go to cdla.io. And then when we are trying to release new data sets, we try as much as we can to use the CDLA license. So if you go to the data asset exchange, you’re going to see we have several of them that are under CDLA.
Kirill: And that means that anybody can use it for any purposes?
Gabriela: Yes, exactly.
Kirill: So it’s pretty much like Open Season, you can download the data set, you probably just need to provide a reference to where it originated from, but you can use it for personal use, for commercial use. Kind of I think I’m reading on your website that it’s somewhere like Linux, it’s from the Linux Foundation.
Gabriela: Exactly. Yeah. It’s from the Linux Foundation. CDLA was created because of the needs, the same thing, the needs of a standard license in the industry where we didn’t have before. So if you look for other places and other companies, they are doing this as well, they are trying to come up with a way for them to share the data in a standard and in an open way, where you don’t have to worry about the CC BY, all the Creative Commons kind of license. So yeah, so that’s the data asset exchange. So the data asset exchange and the model asset exchange they come together now. So now you have the data. Now you have the model, so the whole thing works together. So that’s the goal.
Kirill: I see. So last time we spoke you had, as I understand you had just launched the MAX, the model asset exchange in 2018. And then, I see on the website now that you launched a database asset exchange just recently in July 2019.
Gabriela: July.
Kirill: That is so cool. So congratulations. That’s a big evolution, how’s the whole project going?
Gabriela: It’s going well. We have, I think, maybe 20 or plus, actually, there is a new batch coming up this week. So it’s going to be more than 20 data sets that you are going to be able to use on the data asset exchange. Some of them they came from IBM Research. So in some ways, they are exclusive, you are not going to find these data sets anywhere else because they came from IBM Research. So we work with the research side of the house to get all this data that they are creating, and we try to make it available through the data asset exchange. So you go to the data asset exchange and know that you can use that data without worrying.
Kirill: Aha. Okay. Well, I actually, I interviewed someone from IBM Research on the podcast. There you go. Guillermo Cecchi. Do you know Guillermo? He’s from the East Coast though?
Gabriela: No. I don’t. I don’t probably know, it’s such a huge organization and even the IBM Research, it’s huge, but I don’t know his name from the top of my head.
Kirill: Okay, that’s okay. So 20 data sets and growing. Does that mean once somebody downloads one, they can only use it with the model asset exchange assets, or can I go and download one of these data sets and use it with my own tools just for my own exploration?
Gabriela: Anywhere.
Kirill: Anywhere.
Gabriela: It doesn’t need to be through the model asset exchange, it can be anywhere. You can do whatever you want. We also try to have a notebook. Every data set has an accompanying notebook where you can see how we are exploring the data, how we are manipulating this data and then doing some kind of modeling on top of that. So, again, I think the whole idea is we are trying to cover the very beginner, someone that doesn’t know how to handle data, to someone that is more interested in the modeling or in the deployment side of the house. So we try to cover both.
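To make the "use it anywhere" point concrete, here is a small sketch of exploring a downloaded data set with nothing but the Python standard library. The sample rows and column names are invented for illustration and do not come from any DAX data set; with a real download you would open the extracted file instead of the in-memory sample.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a downloaded data set file; the column
# names are invented for illustration and are not taken from any DAX data set.
SAMPLE_CSV = """date,temperature_f,wind_speed_mph
2019-07-01,78,10
2019-07-02,81,7
2019-07-03,,12
2019-07-04,84,5
"""


def summarize_column(csv_file, column):
    """Return count, mean, min and max of a numeric column, skipping blanks."""
    values = [
        float(row[column])
        for row in csv.DictReader(csv_file)
        if row[column].strip()
    ]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # With a real download you would pass open("path/to/file.csv", newline="").
    print(summarize_column(io.StringIO(SAMPLE_CSV), "temperature_f"))
```

Skipping blank cells rather than treating them as zero is a deliberate choice here, since missing readings would otherwise drag the mean down.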
Kirill: Fantastic. And I love that large companies like IBM are embracing this whole notion of giving things away for free. Like one way would be to say, “No, okay, in order to get this data set, you have to sign up to IBM and pay a certain fee, or you can only use it with our tools” and so on. But no, here’s a great data set that we’ve created and curated, which takes time to create and curate. It’s totally understandable. There’s a lot of effort. But I’m really glad that IBM is opening these things up to the community to utilize because at the end of the day, it’s just going to help people train faster, educate faster, and build better things and make the world a better place. So I’m really excited about that.
Gabriela: Yeah, it’s just amazing. And as a data scientist, I’m amazed to be part of this journey. So we are creating as a team, all this to give to the open source community.
Kirill: Got it. So what are some examples of data sets? Do you know some off the top of your head? Can you name some exciting ones, what industries they are from or what problems they address?
Gabriela: Yeah. Right now we have a bunch of them that are more text focused. But there is an interesting one that’s a video one, and we are just creating the notebook that is going to go with this data set. There is a bunch of them coming up from a different IBM project, called the Debater datasets. I don’t know all the use cases off the top of my head, but as you go through the website that we are going to share in the show notes, you can go in and check.
Kirill: Okay. Yeah. I’m on here.
Gabriela: Yeah.
Kirill: I’m on here right now. I can see, for instance, you have Contracts Proposition Bank, text from approximately 1000 English compliance sentences obtained from IBM’s publicly available contracts, annotated with a layer of “universal” semantic role labels. Fashion-MNIST, a dataset of standardized images of fashion items from 10 classes. What else do you have? I have Weather Data, JFK Airport, local climatology data originally collected by JFK Airport. Another one is Forum Classify Dialog Act Classification for Online Discussions. Very cool. Very cool. This is exciting, great variety, and you’re planning on adding more data sets with time?
Gabriela: Yeah, this week we are going to release a batch of 13.
Kirill: Okay, fantastic. Well, that’ll take it up. That’s very cool. And so tell us how this works. Now that you have DAX, it’s really cool that you have the whole ecosystem: you have the data and then you can download the models and practice. So help me understand. If I’m somebody who doesn’t have deep learning experience, or I don’t have experience in a specific part of deep learning, let’s say image classification or text analysis, I can go on the model asset exchange, on MAX, and download a model. What is adjustable? What is pre-coded? Because every neural network is different.
Kirill: You’ve got to do hyperparameter tuning, got to understand how many layers you’re going to have, how many neurons you’re going to have in each layer, what activation functions to use and other things like that. As somebody who is a beginner, what controls do I have inside these models to adjust these parameters and the way that I want this deep learning architecture to look?
Gabriela: Right. So as a beginner, my suggestion would be to go with the deployable ones, the ones that are already trained, that we trained once. So you don’t have to worry about tuning the hyperparameters and doing any tweaks. So you don’t have to worry about that. So it could be more about like, “Here’s the model, let me understand how this works and let me see the application.” So as a beginner, I think it’s interesting for you to see how things are going to be used. What can I do with this model? I can create, let’s say, an object detector. So I can create a web app, or there was one cool one that I have to mention that we released last week. It’s a yogi.
Gabriela: So it’s based on the human pose estimator. So it’s a system that uses the human pose estimator to guess which yoga pose the user has used before.
Kirill: Nice.
Gabriela: Yeah, so it’s more fun for a beginner to see the application, and then once you understand at a high level how things work, then you can go, let’s say, to the GitHub page and then you look at how things are being glued together, what are the scripts that you are using? And then you could go even further. I think when you go a little bit further and you want to try things out, then you’re going to the trainable piece. And for the trainable ones, what we did to make your life easier is create a standardized way to train your model.
Gabriela: So if you follow the steps one by one, you’re going to be able to train without major issues. So even with the hyperparameters, something that we are working on now is how you can tune them and all that. So again, for beginners, I would totally start with the deployable. Once you feel more confident, then you go to the trainable piece, and then you see how things work.
Kirill: Fantastic. That’s really cool. I’m looking at this human pose estimator right now. And it’s very cool. You have that photo of the three astronauts standing, and it’s estimating what pose they’re sitting in. It’s so funny.
Gabriela: Yeah, the yoga one, it’s even more awesome because there is a video and then you make a pose, and it is going to say, “Now make, I don’t know, salute”. And then you have to do the salute. And then it’s going to say, “Yeah, you are doing it right”. So it’s a fun demo to show. And people get interested in how things work. So it’s cool to show what are the things that you can do with these models.
Kirill: Okay, okay. Very interesting. Yeah, I’m looking you have like, what is it? Over 40 models right now?
Gabriela: Yeah, almost 40.
Kirill: Okay, that’s really cool.
Gabriela: 30 plus.
Kirill: Almost 30. Yeah, very cool. So yeah, no, I understand now. So you download a model, start applying it, that’s a really cool way to learn. You apply it and then you’re like, okay, how does it work? It’s kind of like driving a car: you get in, you drive, you learn how to drive, but then if you want to you can open the hood of the car, understand where to put the oil. I still don’t know where to put the oil.
Gabriela: Yeah, all that mechanics. Yeah, that’s a good analogy. It’s exactly like that. You don’t need to know the whole thing if you don’t want to, but if you want to go deeper, then yeah, you have to open the thing and then you can start playing around and changing things, and you can also customize your car.
Kirill: Yeah, that’s right. And what I like about that is even if you are not able to learn right away how it works inside, you can still continue deploying. So you deploy a few models, or you want to look under the hood, you look and then you’re like, “Okay, I learned a little bit, but it’s still a bit too complex”. You don’t have to get discouraged. You don’t have to stop your progress. You can go and continue deploying more models, playing around with them, and come back and learn a bit more about how it works under the hood later on.
Gabriela: Exactly. And also, we give you different flavors. Let’s say, you can create IoT applications with the model asset exchange. So you can use a Node-RED flow, for example, if you are someone that is more familiar with this side of things, or you can use CodePen, where you have the JavaScript, the CSS, how things work. So it depends on where you are coming from, what your background is, if you have any background in this area, or if you don’t have any background at all. We try to create something that is going to be suitable for everybody.
Kirill: Yeah, no, that’s really cool. And these are free to use as well just like the data?
Gabriela: Everything, everything is free.
Kirill: Fantastic. So tell us a bit about Docker. So we had one of the talks at DataScienceGO was actually about Docker. One of the workshops was about using Docker. So these models require Docker in order to run? What is Docker and how is it used?
Gabriela: Yeah, so now you’re getting to the point, and I’ll probably not be the best person to explain Docker. But Docker, it’s like a container. So the way that I see Docker is you don’t have to worry about anything else, the Docker will … It’s kind of the analogy, I always like to make analogies because it’s easier to understand. It’s like you get a present. So someone who’s going to give you a present is usually not going to give you a present that only comes with pieces that you cannot put together. You can assemble everything, you don’t have to worry, and there is a manual with instructions.
Gabriela: So the Docker is kind of like this. So I don’t have to worry about whether things are going to work on my computer because I know that the Docker will have everything that I need to make this work. So I don’t have to worry about installation. I don’t have to worry about, let’s say, whether my Python version is working. Everything comes inside this Docker container, this container. And in the model asset exchange, we wrap everything. So everything that we do, it’s wrapped into a Docker container. So there is no installation outside.
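As a rough illustration of the point above, the sketch below assembles the kind of `docker run` command that would start one containerized model on a local machine. The image name `example/max-object-detector` and port 5000 are placeholders assumed for illustration; they are not taken from the conversation, and the real values should come from the model's own documentation.

```python
import shlex


def docker_run_command(image, host_port=5000, container_port=5000):
    """Build a `docker run` command that starts a model server in a container.

    Everything the model needs (Python, the deep learning framework, the
    weights) is assumed to be packaged inside the image, so nothing is
    installed on the host besides Docker itself.
    """
    return [
        "docker", "run",
        "--rm",                                 # remove the container when it stops
        "-it",                                  # keep it attached to the terminal
        "-p", f"{host_port}:{container_port}",  # publish the container port on the host
        image,
    ]


# Hypothetical image name, used only for illustration.
cmd = docker_run_command("example/max-object-detector")
print(shlex.join(cmd))
```

The only thing that changes from one model to the next in this sketch is the image name, which is the practical meaning of "there is no installation outside".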
Kirill: Okay, so you don’t have to worry with what version of Spyder or Jupyter and how-
Gabriela: TensorFlow. Yeah.
Kirill: Okay. Okay, that’s pretty cool. All right. That’s how this whole thing works. All right, and what kind of feedback did you get? So when you presented this at the DataScienceGO or in your other talks, what kind of feedback do you get from people in the audience? What are they most excited about?
Gabriela: Yeah. They are excited about how easy it is to use. And the whole learning path that they can follow. It’s another thing that people get excited. The application sides of the models, that you can create a web app, that you can create different applications. That’s something that gets people excited.
Kirill: Mm-hmm (affirmative). Okay. So you can create like a web application, and actually, it’s going to run online so somebody can visit it and use it?
Gabriela: Yeah, you can run locally, for example, or in our case, we have a long running instance, running with this web app, so you can try it out first before deploying yourself.
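For readers who want to picture what "run locally" can look like from code, here is a minimal sketch of a client for a model container that serves predictions over HTTP. The `/model/predict` path, the `image` form field, and the `predictions`, `label` and `probability` response fields are all assumptions made for illustration; the actual interface of any given model should be checked in its documentation.

```python
import json
import uuid
from urllib import request


def predict_url(host="localhost", port=5000, path="/model/predict"):
    """URL of the (assumed) prediction endpoint of a locally running model."""
    return f"http://{host}:{port}{path}"


def build_request(url, image_bytes, field="image", filename="photo.jpg"):
    """Wrap raw image bytes in a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return request.Request(
        url,
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def top_label(response_text):
    """Return the most probable label from an assumed JSON response shape:
    {"predictions": [{"label": ..., "probability": ...}, ...]}."""
    predictions = json.loads(response_text).get("predictions", [])
    if not predictions:
        return None
    return max(predictions, key=lambda p: p["probability"])["label"]


# Sending the request needs a running container, so it is left commented out:
# with open("photo.jpg", "rb") as f:
#     req = build_request(predict_url(), f.read())
# with request.urlopen(req) as resp:
#     print(top_label(resp.read().decode()))
```

Pointing `predict_url` at a hosted instance instead of `localhost` is the only change needed to try a model before deploying it yourself, under the same assumptions.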
Kirill: Okay. And this question is, I’m curious about this. By the sounds of it, these models are written in Python. And at the same time, as I understand, your background is originally with R and hence you started R-Ladies. How do those two add up together?
Gabriela: Exactly, so everything is Python. So one thing that I didn’t mention about the whole year, which for me looks like it’s been three years: I became a manager in December last year.
Kirill: Congratulations. That’s so awesome.
Gabriela: Yeah. So everything shifted. And I have less and less time to code, to write code. I’m more thinking about the usability, thinking about how easy it is to use the things that we are developing, thinking strategically, how can we position our project? Not only inside IBM, but also outside IBM, make connections, talk to different teams, thinking about the roadmap. And of course, taking care of my team, which has 10 people, data scientists and software engineers. But going back to your question, I’m an R person for sure. And even though everything is in Python, one thing that I try to do is, every time, I keep thinking, those are all cool things, can I use R to access this model?
Gabriela: So myself and another colleague, we got together and we wrote an R script. And you can ingest, you can use the model asset exchange endpoint inside R. So you can do pretty much the same thing that you are doing in Python, you can do in R.
Kirill: Wow.
Gabriela: Yeah, so it’s not part of the web page yet on our website, but maybe later on. Once I have more time, we are going to offer it as an alternative. So yeah, so R is still in my veins, I’m still doing things as much as I can. Even if it’s, I need to create some metrics around my team, cool. I’m going to use R and then I’ll create a dashboard with R Shiny. So I do more of those things for myself, but yeah.
Kirill: Nice, very nice. And how big is your team right now?
Gabriela: So 10 people.
Kirill: 10 people. Wow, that’s very cool. That’s a huge jump. Last time we spoke were you managing anybody at all?
Gabriela: No, I wasn’t.
Kirill: So you went from zero to 10 people in less than a year?
Gabriela: Yeah, I started with maybe six, and then throughout this year, we hired another four.
Kirill: Wow. And they’re all data scientists.
Gabriela: Data scientists, software engineers, they are a mix, with different backgrounds as well. So like not everybody has a background in computer science. And again, we have a mix, so we have like PhDs, masters, bachelors, from computer science, electrical engineering, earth sciences, math. Yeah.
Kirill: Okay. Is the goal of this team to create and maintain DAX and MAX or is there a broader scope?
Gabriela: More than that actually. So MAX and DAX are in-house projects that we developed, but then part of my team, they work on contributing to open source projects such as TensorFlow, PyTorch, Keras, Apache Arrow, ONNX and other projects. So, I always like to mention this because we have people working full-time on contributing to external open source projects.
Kirill: That is so fascinating. It’s like IBM is paying full-time salaries for people who are working on these open source projects.
Gabriela: Yes.
Kirill: The question is why? That is very amazing, but why?
Gabriela: Yeah, that’s the question that I get asked all the time. And actually, with open source in general, the question that people ask is how do you make money? Why are you doing this? Because it’s all free, what is the revenue? So first of all, for the open source projects, luckily, we have companies, and IBM has been contributing to open source for so many years. We need companies to be investing, because all the companies use open source projects in several ways. So if we don’t keep investing, those projects are going to die. So we are lucky that companies are investing, and again, they should invest. In terms of the why, the reason is it’s mind sharing, so you become part of the conversation.
Gabriela: So if you are contributing to TensorFlow, those contributors, they are working with different companies, let’s say, in the case of TensorFlow, they’re working with people from Google. So they are creating the new features, they have the whole roadmap together. So they participate in this discussion, and then bring back to the company. So it’s a cycle. They are giving, they are bringing back.
Kirill: Okay, I see. And also IBM in the end uses these products. So on one hand, you want to participate because maybe there’s a certain feature that IBM needs and it can be added by the IBM developers. Or on the other hand, you’re using this product for free because it’s open source, so you might as well pay somebody to work on it, as you said, so that it doesn’t die, and also as a way of giving back to the community of people who are developing that product as well.
Gabriela: Yeah, exactly. And we know that with open source projects, the maintainers, the co-maintainers, they work on this as a side project. And then after a few years, they get burned out. And then they have to give up on their projects. So companies need to be fast, need to have employees working on these projects, so the projects don’t die, again, and the developers don’t burn out.
Kirill: Yeah, yeah. I understand what you mean. Okay. Well, that’s very, very cool. Let’s talk a bit about R-Ladies. How’s that been going since we last spoke?
Gabriela: Yeah, R-Ladies has been going very well.
Kirill: How many chapters do you have now?
Gabriela: So almost 170 in 46-
Kirill: Really? Last time we talked you had 100 chapters in 30 countries, now you have what, 170 chapters in how many countries?
Gabriela: 46.
Kirill: 46, you grew like 50% in one year.
Gabriela: Yeah.
Kirill: That is insane.
Gabriela: Yeah. It’s just amazing. And the community itself, it’s doing amazing. We have a good team of people working 24/7 on making sure that the community has everything that it needs. I’ve been less and less involved with R-Ladies because this month actually, it’s going to be the seventh anniversary. And I’ve been doing this for seven years. So from the very beginning. Of course, through all the years we had people helping and we have a team, the leadership team behind, but I got to a point where I’m like, “Okay, I think this is going well, I don’t need to be as much involved as before, I need to do something bigger right now”.
Gabriela: And I needed to do something else with the whole AI and especially around discrimination, bias and the whole discussion that we are having with AI ethics, and how this is going to affect the whole population. How it is affecting them, with the facial recognition in China and other places. How this is affecting the population, especially the population that is underrepresented or from a minority group. So I’m now creating a new group called AI Inclusive where the goal is to bring more diversity in the AI field, encouraging more people to get into AI, bringing awareness of everything that is going on because again, those communities are the ones that are going to suffer the most and they are the ones that know nothing about it.
Kirill: Okay. Well, you are just unstoppable Gabriela, from one thing to the next.
Gabriela: I’m the kind of person that can’t be passive, I need to be doing stuff. I still have energy to do those things and it’s sad to see people being affected when they are not aware. We are aware in the US in particular, we have all the resources and all the knowledge going on over here, but you can go to places like Latin America and Africa, and they are not aware. So we need to bring awareness, we need to teach them how to help or how to create things that will bring change.
Kirill: Okay. Yeah, no, totally I agree. Definitely. It’s very admirable that you’re working on this project. So what goes into one of these chapters, let’s say if somebody who’s listening to this podcast wants to start a chapter of R-Ladies or chapter of AI Inclusive, what is the process and what do these chapters do? Do you provide certain resources or exercises and how does the organization of R-Ladies or AI Inclusive participate in the life of each individual chapter?
Gabriela: Yeah, that’s a good question. So R-Ladies, it’s chapter based. So it’s usually based on cities. So when someone is interested in starting a chapter in their city, they send us an email at info@rladies.org. And they say “I’m interested in creating a chapter in my city, what should I do?” And then we have Laura, she’ll send you an email with everything that you need and we are going to provide you the basic infrastructure and then you can start your own chapter. We provide everything that you need to launch your chapter. And the way it works is it can be like tutorials, and we have a bunch of materials but you can also create your own material based on your audience.
Gabriela: So you can create tutorials, you can give talks, you can invite speakers to give talks. Some chapters, they host book clubs, where they usually do a survey with their chapter and see which book people are interested in reading. So they go and then they read the book and they discuss. So there are several formats that each chapter has.
Gabriela: And for the AI Inclusive, it’s going to be a similar path, because one thing that it’s interesting and it worked is chapters are, I think, as a community based organization is the way to go because it’s hard for you to get inserted in a specific community, unless you have someone local. So when you have someone local, they are going to make an impact in their community. So that’s something that we learned with R-Ladies, is the chapter base is the best way to reach those communities that are even hard to reach.
Kirill: Okay. Yeah. I see. And do you have to be female to start an R-Ladies chapter or can males also start these chapters?
Gabriela: Yeah. So the mission is to bring diversity to the R community. So to be the organizer you have to be a woman or identify as female. That’s the one requirement that we make, but it doesn’t mean that you as a male cannot attend. So, each chapter has a different policy. So like for example, some chapters, they say if you are a male you can come but you should bring someone, a woman or gender minority, as your plus one. Some chapters, they say, you can come but we are going to like, the front rows, if it’s a talk, the front rows are going to be reserved for women, gender minorities, and then the back for everybody.
Gabriela: So one question that we get asked all the time is, yeah, so are you excluding? And we’re like, “No, we are not excluding, we include everyone but our goal is to bring diversity”. So that’s our main mission. But we are not excluding anyone. We are just making some kind of prioritization in some ways.
Kirill: Okay. All right. I understand. Okay, very exciting. So over 170 chapters, 40 something countries. Very cool. Do you know how many members you have?
Gabriela: 55,000.
Kirill: 55,000 members worldwide. That is a very massive community.
Gabriela: That’s amazing. Yeah.
Kirill: You should try bringing them all together in one city one day.
Gabriela: Yeah, well that’s our goal, but we need a huge budget to be able to do this.
Kirill: Yeah, that’s true. What has been your favorite, one of your favorite talks at one of the R-Ladies summits or I mean, one of the R-Ladies meetups? Is there something that jumps to mind, a very impactful talk where you learned the most?
Gabriela: I don’t know. All the talks, they are so … you learn something. It’s hard to pick one, but I want to mention one chapter. Something that I saw in one chapter that caught my attention, and I was like, wow. So there was a chapter, I think it was Indonesia, where someone posted the picture of their event. And all the women, they were in the front learning, someone was giving a tutorial and everybody was learning in the front rows. And then at the very, very back, the partners, and husbands, they were in the back taking care of the children. So that was so amazing to see because, in some countries, the woman is the provider. So they have to be with the children all the time. They don’t have this opportunity to be away to learn something. And that photo was just amazing, to see that their partners were there with them, taking care of their children, so they could go and learn something.
Kirill: Wow, that’s awesome. That’s really supportive.
Gabriela: Yeah, I’ve never seen anything more impactful than that picture.
Kirill: Fantastic. Wow. Well, congratulations. Sounds like you’re doing some amazing work there and really helping the community. Yeah, it was very exciting. I think it was last year that R-Ladies was a community partner for DataScienceGO. We also support as much diversity as we can. And it’s very exciting to see you at the conference. I think that’s a big inspiration to all the ladies that come and who already know you. That’s cool.
Gabriela: Yeah, that was very good. And I was glad to see at DataScienceGO the code of conduct, it’s so important to make sure that we have a code of conduct for the speakers, for the audience, so they are aware of things that they should do or they should not do. So thank you for that.
Kirill: Yes. What’s your impression? You think we did quite well at DataScienceGO?
Gabriela: Yeah, I think it’s been a balanced audience. I think it is still majority male, but it’s more balanced than at other conferences for sure. The diversity of speakers, it was amazing.
Kirill: Yeah. We have 40% female speakers, usually 30% and 40% female speakers.
Gabriela: Yeah, I would go to the website every now and then to see the new people being added. And I’m like, I was cheering every new female underrepresented person. I’m like, yeah.
Kirill: That’s awesome. That’s very good. Did you feel that vibe within the event? I heard that from Pablos Holman last year, that he was very impressed that we have people from … This year, we had people from 25 different countries flying in to attend the conference, coming from 25 different countries. How crazy is that? I’m always very inspired to see how diverse the audience there is and how people are so freely and openly connecting and networking with each other and learning from each other’s backgrounds and home countries.
Gabriela: Yeah, me too. And I love the map that we have at DataScienceGO, where people connect, where they’re coming from, so you can see the big picture and you’re like, wow, there is a person from that very little country, and they came all the way over to DataScienceGO. It’s just amazing to see the variety. Yeah.
Kirill: Totally agree. Well, Gabriela this brings us to the end of our conversation today I wanted to say a huge thank you for first of all coming to DataScienceGO and for the second time and hopefully I’ll see you there next year and coming for the podcast for the second time as well. Where can our audience get in touch, connect with you if they haven’t yet and follow your journey?
Gabriela: Yeah, absolutely. So I’m a huge Twitter person, so you can find me on Twitter, LinkedIn. On my website, you have all my information. So my website is k-roz.com. So there is a whole… My email, Twitter, LinkedIn and so on. And you should check out AI Inclusive. So the website is ai-inclusive.org and if anyone is interested in knowing more or joining us, please send us an email. We are looking for partners and people to help. So feel free to reach out to us there.
Kirill: Fantastic and we’ll include all those links in the show notes, so for sure make sure to follow Gabriela, @gdequeiroz join thousands of other people that are following and getting the valuable insights that Gabriela shares. Yeah. On that note, we’re going to wrap up. Anything you’d like to wish to our audience before we finish off today?
Gabriela: Yeah. Go and check out the model asset exchange. If you have any feedback, any question, we have a Slack channel; you can find the URL on the website. So if you have any question, go there and ask, and again, please provide us feedback. We are doing this for the community, and I hope you enjoy.
Kirill: Fantastic. Well, thank you so much Gabriela for coming again on the show. And I’m sure I’ll see if not this year, then next year.
Gabriela: Absolutely. Thank you again for all the work and the community is awesome.
Kirill: Thank you very much, ladies and gentlemen for being part of today’s episode. Super excited that we had this conversation with Gabriela and that you were part of it. It is so inspiring to see somebody putting so much effort, so much of her time, so much of her resources and just energy into creating a safe space, a safe community, safe environment, which includes everybody, which includes data scientists, which helps data scientists thrive regardless of their age or gender or ethnicity or background.
Kirill: It’s very exciting to see. The community of data science is something that we try to, and we do, put as much effort into as we can with our speakers at DataScienceGO, with how we promote DataScienceGO, how inclusive DataScienceGO and other [inaudible 01:11:10] such as SuperDataScience and our courses are, but without any shadow of doubt, Gabriela is by far one of the most impactful people in that space and I highly admire what she’s doing. So if you can help out in any way with either rladies.org or ai-inclusive.org, then don’t be a stranger, reach out to Gabriela whether it’s through LinkedIn, through Twitter, through the email that she mentioned, which you can also find in our show notes, and get involved, start helping build this community. We want data science to be an all inclusive and amazing community where everybody can take part.
Kirill: So that is the episode. As usual, you can get all the Show Notes for this episode at www.superdatascience.com/315. That’s www.superdatascience.com/315. There you’ll get the transcript for this episode, all the URLs for resources mentioned in this podcast, including Gabriela’s LinkedIn and Twitter where you can connect with her, and any other materials that we talked about. And the final thought is if you enjoyed this podcast and if you know somebody who might be interested in collaborating with R-Ladies or AI Inclusive, then forward this episode to them. Forward them the link for this episode, and they can also enjoy the insights that Gabriela shared, and maybe participate and change other people’s lives. And by the way, if you want to meet Gabriela in person, very likely you’ll see her at DataScienceGO 2020. So if you haven’t gotten your tickets yet, then you can get them at www.datasciencego.com.
Kirill: Thank you for being here today. I look forward to seeing you back here next time and until then, happy analyzing.