SDS 501: Statistical Programming with Friends

Podcast Guest: Jared Lander

August 31, 2021

In today’s episode we look at Jared’s educational venues, he provides overviews on data science consulting, discusses R vs Python, and more. He also talks about his upcoming NY R Conference, now completely virtual, on September 9-10. SuperDataScience podcast listeners can get 20% off using promo code SDS20. You will be able to watch the SuperDataScience podcast episode with Drew Conway live on Friday 9/10 at the NY R conference.

About Jared Lander
Jared P. Lander is Chief Data Scientist of Lander Analytics, the Organizer of the New York Open Statistical Programming Meetup and the New York and Government R Conferences, an Adjunct Professor at Columbia Business School, and a Visiting Lecturer at Princeton University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. Jared oversees the long-term direction of the company and acts as Lead Data Scientist, researching the best strategy, models, and algorithms for modern data needs. He is the author of R for Everyone (now in its second edition), a book about R Programming geared toward Data Scientists and Non-Statisticians alike.
Overview
Jared and I have known each other for over 5 years. We met at Jared’s R meetup, which attracts some of the biggest names in R in the world. I’ve met so many community members through these meetups and networked on topics like deep learning and other curricula. The seeds of my deep learning study group actually began at one of these meetups, growing from a few dozen people at the meetup to over 200 on our newsletter.
Coming up is Jared’s NY R Conference, now completely virtual, on September 9 and 10th. SuperDataScience podcast listeners can get 20% off with SDS20. Speakers include Andrew Gelman, Asmae Toumi, Max Kuhn, Caitlin Hudon, Alexa Fredston, Jonathan Bratt, and plenty more. A week before the event, there are multiple meetups and workshops starting on September 1st. The whole goal is to create a community where open-source developers teach you their tools, in person.
Apart from these events, Jared also has a book, now in its second edition with a third in the works, called “R for Everyone”. I highly recommend it for anyone starting in R or aiming to be up to date with the latest updates. It goes from installing R to building Shiny apps and more. Jared calls it the full gamut of R capabilities for all skill levels and experience levels. Jared also speaks to something we’ve discussed in the past on putting R into production through apps, server jobs, and more. This is a great segue into Jared’s company Lander Analytics, which utilizes R for statistical purposes. They have a broad range of customers from metal manufacturers to the Minnesota Vikings. They create and deploy models in virtually every industry and that’s something I love about consulting firms, whether your company is collecting and using data already or needs to optimize it. You can find yourself working with so many different forms of business and ways data can boost industries. Lander is hiring and if you have client-facing communication skills in addition to experience in R, you can check out their openings. One joking way Jared puts it is an example he gave of explaining his work to a former linebacker on the Vikings’ staff.
We couldn’t get through without discussing the infamous R vs. Python debate. Jared finds that both languages are almost entirely converged. Both can handle either job. So it comes down to the way you think—Python is for computer engineers, R is for mathematicians and statisticians. I find that to be one of the better answers to that question. From there we moved into a quick Q&A section.
The biggest and most well-formed question that came out of the post on LinkedIn was this: What is Jared’s secret for staying so thin while eating so much pizza? Jared always orders pizza at the conferences and actually keeps data on people’s pizza preferences. Unfortunately, Jared doesn’t feel qualified to quantitatively affirm which pizza is the best considering all the variables involved. He even did his thesis on pizza and the process of judging and rating pizza. One interesting thing he found was that coal-burning pizza ovens are most popular, but that may be because that’s where the tourists go.
In this episode you will learn: 
  • Jared’s R meetups and our professional history [3:27]
  • NYHackR [6:42]
  • The NY R Conference [13:25]
  • R for Everyone [18:55]
  • Lander Analytics [22:10]
  • Job openings at Lander Analytics [25:04]
  • R vs. Python [29:15]
  • The importance of pizza in Jared’s life [32:19]  

Podcast Transcript

Jon Krohn: 00:00
This is episode number 501 with Jared Lander, Columbia University stats lecturer, organizer of the world’s largest R meetup and CEO of Lander Analytics.
Jon Krohn: 00:12
Welcome to the SuperDataScience Podcast. My name is Jon Krohn, chief data scientist and best-selling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple. 
Jon Krohn: 00:42
Welcome back to the SuperDataScience Podcast. We’ve got a juicy one for you today with Jared Lander. Jared leads the New York Open Statistical Programming Meetup, which is the world’s largest R meetup, but it also features other open source programming languages like Python, for talks from global leaders in data science and machine learning. Jared also runs the R conference, which is approaching its seventh annual iteration. He wrote the popular book R for Everyone. He’s an adjunct professor of stats at Columbia University and visiting lecturer at Princeton University. And none of the massive responsibilities that I’ve just mentioned are Jared’s day job. Nope, for that he’s the CEO and Chief Data Scientist of Lander Analytics, a data science consulting firm.
Jon Krohn: 01:29
Clearly, Jared has a ton of expertise on stats and machine learning, particularly in the R programming language. In today’s episode, we start off by introducing you to his various venues, meetup sessions, conferences, and book for learning about statistical programming. In the second half of the episode, Jared provides an overview of the intellectually satisfying world of data science consulting, and he details both the soft and hard skills that make an outstanding data science consultant. We wrap the episode up by venturing into the age-old R versus Python debate. Today’s episode will hold special appeal for those of you interested in R or data science consulting but should be readily digestible to anyone, whether you’re a technical data scientist or more oriented toward commercial applications. You ready? Let’s go. 
Jon Krohn: 02:23
Jared Lander, this is awesome. I’m so excited that you’re on the podcast. I’ve wanted to have you on the podcast since the very beginning. I just couldn’t get the courage up to ask. Jared, where in the world are you calling in from? 
Jared Lander: 02:37
I’m very excited to be on this podcast with you. And I’ve known you for a long time. It’s really exciting to be here. And I am right now in this small semi-rural town outside of Princeton, New Jersey. We’re spending the summer here and it’s a nice little place to be. 
Jon Krohn: 02:53
That sounds nice. And I didn’t know this about you, but in reviewing your profile for this episode, it looks like you went to Princeton High School. So maybe you even grew up in the area that you’re in right now. 
Jared Lander: 03:05
So I actually went to Princeton Day School, which is about two miles from Princeton High School, they’re right near each other. And while I went to high school in this area, I actually grew up across the river in Bucks County and we just came across the river each day, like George Washington on a pretty Christmas Eve. 
Jon Krohn: 03:21
That’s exactly what you probably look like every single day. 
Jared Lander: 03:24
Every day. 
Jon Krohn: 03:26
So, yeah, as you mentioned, we’ve known each other for a long time, not all the way back to Bucks County, but many years, something like six years now. And I know you because I was attending your meetup, which we’re going to talk about in detail in a second. But it is the world’s most popular R meetup and it takes place in New York. About pre-COVID, it was once a month and I loved going to it. You get some of the biggest name data scientists in the world. You get to talk to them in person, ask them questions and the community around it is the best. So I’ve met so many friends and professional contacts through the meetup over the years. So, yeah, so that’s how we initially got to know you. 
Jon Krohn: 04:14
And from that seed has sprung many wonderful things in my life. So there was a meeting around 2016, so at the beginning of every single meetup you ask, if anybody has any job openings that they’re hiring for. And I took that as an opportunity to just stand up and speak to the two or 300 people there and say, “I’m interested in learning about Deep Learning and so I think it would be great to get together as a group and go through material, a particular curriculum that we designed together, meet regularly and go over that curriculum.” And as I was speaking, it didn’t seem like anyone was even looking at me. And I felt really embarrassed when I sat down. So the entire talk, I just felt like I embarrassed myself, but at the end of the talk, a dozen people came up to me. 
Jon Krohn: 05:03
I got their email addresses, they had some ideas for material to cover. And that formed the seed of my Deep learning study group, which we haven’t been doing anything in the pandemic, but we’ve had something like two dozen of these sessions. It has grown to about 200 people on the email list. And that was the basis for a series of talks that I was giving on deep learning. So through running this deep learning study group, I started to feel like the public might be interested in hearing an intro to deep learning. So I put one together, and then you sent an email in something like December, 2016, saying that you didn’t have a guest for the January, 2017 meetup. And so I was like, “I mean, I have this deep learning talk that I’ve given a couple of times.” So I reached out to you about it. And after a bit of back and forth, you made sure that it was a solid talk that I had great things to cover. And then I delivered this talk and it really changed my life, Jared. So being able to speak at your open statistical programming meetup, it was through that, that I met Debra Williams at Pearson who invited me to write my book, Deep Learning Illustrated, a bunch of video tutorials on deep learning, and now the mathematical foundations of machine learning. And so Jared, I am hugely indebted to you and as a reward, I’ll now let you speak. 
Jared Lander: 06:25
Well, I’m so glad that the meetup has changed your life in this way. And that’s really the purpose of this group, for people to learn, but also to help people and make everyone better. And that’s really amazing to hear that. That’s exactly what I like to have happen to people at this group. 
Jon Krohn: 06:41
Yeah, it’s extraordinary. And so while I’ve been going there for many years and you’ve been running it that entire time, I understand that the group has a longer history than that. So it’s now called the New York Open Statistical Programming Meetup, but you can even see in the meetup.com URL that it used to be called something called nyhackr with no E like hacker, but just hackr. Right, so tell us a bit about that. So it had a name change, some origin story, fill us up. 
Jared Lander: 07:15
So the data science community loves a good pun, hence nyhackr. And the meetup group was founded in April, 2009 by Josh Reich. Some people might know him as the founder of Simple, the online bank, but I always know him as the guy who founded the R Meetup. And within the first few months of the meetup, another person, Drew Conway, got involved and took over as the organizer. And it was Drew Conway, which a lot of people in data science know that name. He built it up from enough people to fit in a small room at NYU to 1800 people over the course of two years. And he really built it, this group, bringing consistent meetings and bringing in good speakers. And I joined around the third or fourth meetup. It was actually a [inaudible 00:08:07] and meetup and Andrew Gelman posted about it on his blog. And so that day I read about it. I went that night to the meetup group. And at that point they were up at Columbia. Andrew Gelman let us use what he called the playroom in the SIPA building. And it was just this great room full of people talking about R, stats and being really cool with each other. 
Jon Krohn: 08:29
That’s a really great sentence. Yeah. Playing with R and stats and being really cool. 
Jared Lander: 08:34
It was such a good crew. I remember one time we were all standing around, this is back in Stack Overflow’s earlier days and we’re all saying, “Oh yeah, it’s such a helpful group on Stack Overflow. That’s so nice and helpful. And it’s so collaborative.” And we looked around the room. We’re like, “Oh, it’s all of us talking about the questions about R to each other on Stack Overflow when we’re not at the meetup.” 
Jon Krohn: 08:56
That’s so funny that both the R community and the Stack Overflow community at that time, was so much smaller. Yeah. That you would be yeah, collaborating online and not even really be aware of each other. And then all of a sudden, boom, you’re in the same room. A couple of names there. So Drew Conway, he is an iconic data scientist, absolutely. And viewers will be happy to know that I recently booked him as a guest. So about a month from now, we’re recording his episode. So you can expect about a month after this episode airs that to be live. And so we’re very excited for that. And then you also mentioned Andrew Gelman. So I think many people in the stats world know who he is. He is perhaps the best known statistician on the planet. He’s a Columbia University faculty member. And you were a statistician in his lab, right? 
Jared Lander: 09:49
Yes. I mean, I’ll say I was in his lab and maybe I wasn’t quite. I did research with him. If I get to call myself part of his lab, I’ll take it as a great honor. And my research started off as data entry, then I worked my way up to actually coding and building plots for him. And I learned actually a lot about the process of writing up your code to do an analysis from being in his lab because he needed something done. And I’m like, “Oh crap, I’ve got to figure this out. Andrew Gelman needs this.” And I learned a lot about how to code in R but also just smart ways to do things, smart ways to plot things, that’s really helpful. 
Jon Krohn: 10:25
Yeah. He’s a brilliant statistician. I’ve seen him speak at a number of events that you’ve run Jared, and I’ve known his name for a really long time because my introduction to R was his book with Jennifer Hill on hierarchical multilevel models. I can’t remember the exact title right now. 
Jared Lander: 10:45
It’s Data Analysis Using Regression and Multilevel/Hierarchical Models. [crosstalk 00:10:49]. The slash is important. It makes it easier to search for it on Amazon. 
Jon Krohn: 10:56
Great. 
Jon Krohn: 11:03
Eliminating unnecessary distractions is one of the central principles of my lifestyle. As such, I only subscribe to a handful of email newsletters, those that provide a massive signal-to-noise ratio. One of the very few that meet my strict criteria is the Data Science Insider. If you weren’t aware of it already, the Data Science Insider is a 100% free newsletter that the SuperDataScience team creates and sends out every Friday. We pore over all of the news and identify the most important breakthroughs in the field of data science, machine learning and artificial intelligence. The top five, simply five news items, are handpicked: the items that we’re confident will be most relevant to your personal and professional growth. Each of the five articles is summarized into a standardized, easy to read format and then packed gently into a single email. This means that you don’t have to go and read the whole article. 
Jon Krohn: 12:01
You can read our summary and be up to speed on the latest and greatest data innovations in no time at all. That said, if any items do particularly tickle your fancy then you can click through and read the full article. This is what I do. I skim the Data Science Insider newsletter every week. Those items that are relevant to me, I read the summary in full. And if that signals to me that I should be digging into the full original piece, for example to pore over figures, equations, code, or experimental methodology, I click through and dig deep. So if you’d like to get the best signal-to-noise ratio out there in data science, machine learning and AI news, subscribe to the Data Science Insider, which is completely free, no strings attached, at www.superdatascience.com/dsi. And now let’s return to our amazing episode. 
Jon Krohn: 12:59
So yeah, that was my intro to R. I was a MATLAB person before that. And during my PhD, a post-doc who I was working with said, “This is the most amazing book.” At that time, it was brand new and I did learn an absolute ton from it. So, yeah, it’s amazing for me to be able to see him indirectly now through you. So speaking at your conference, actually, I mean, we should talk about that. So you run the R conference. Its seventh iteration in New York is coming up on September 9th and 10th, and it’s going to be in person, but also have a virtual component. So that’s exciting for every possible reason, because one, we get to have a conference in person again for the first time in a long time, two years. 
Jared Lander: 13:47
I’m very excited about that. 
Jon Krohn: 13:49
But on top of that, anybody, anywhere in the world, who’s listening to the podcast right now, will be able to sign up and attend virtually as well. 
Jared Lander: 13:57
Yes. And we’ve always wanted this meetup group to be accessible. And the conference is an extension of the meetup, and we want to be able to reach as many people as possible. So we’re very excited that it’s both in-person because the conference is a lot of fun to be there. There’s great food, there’s music, there’s great people. Let alone, anyone in the world who can’t get on a plane to New York, for whatever reason, because of the pandemic, they don’t want to travel, whatever, they can reach it from anywhere. We’re really excited for this big reach this year. 
Jon Krohn: 14:25
Nice. Yeah, that is going to be great. And so last year it was entirely virtual and Andrew Gelman was the speaker right after me. And I constantly have these… I wake up in the middle of the night, but it’s a nightmare because I remember that talk, I made it too long, which I always do. I never learn, I always make my talks too long. So I raced to the end of, to race through and get through the last final slides. And meanwhile, Andrew Gelman is waiting in the virtual wing watching me. And so now his only exposure to me is just me speaking at double the normal rate and frequency. 
Jared Lander: 15:02
Consider yourself lucky because the very first year I purposely put someone right after Andrew Gelman, so that it’d be a good lead-in, like you put Friends right after Seinfeld. I thought this was going to be a great lead-in, but that poor Dan Shen, one of my students, I purposely wanted to give him a good lead-in. And he’s like, “You made me go right after Andrew Gelman. That’s not fair.” So now I put people in front of Andrew Gelman, at least I won’t be in the room waiting for him watching you. 
Jon Krohn: 15:27
Yeah. I mean, it is truly an honor. So it’s amazing, so it’s again, another example of how the community that you have fostered in New York around first R and now open statistical programming in general, regardless of what language is being used for that. Yeah, really, it’s amazing what it’s been able to do. Now, so this conference coming up September 9th and 10th, who do you have lined up for speakers and also you have workshops right as well? 
Jared Lander: 15:54
Yes. We have workshops and a meetup the week before the conference. We like to jam-pack a week or two full of events and much like how the Meetup is the Open Statistical Programming Meetup, the conference is R and friends. So anything involving open source, computing, statistics, data, we have talks about. And we’re really excited to get all this in there. We have a meetup and workshops on September 1st, and those workshops are taught by people like Max Kuhn, David Robertson, Lucy D’agostino. And we have [Yuron Yensens 00:16:29]. We have so many people, [inaudible 00:16:31] teaching at these workshops. So it’s a really great way to learn a lot of stuff. And that’s, again, part of the community. We’re having open source developers come and teach you how to use their tool. 
Jon Krohn: 16:39
Nice. So that Jared is an amazing who’s who of workshop leaders and you haven’t even gotten to the speakers yet, right? 
Jared Lander: 16:48
Right. A lot of those people running the workshops are also speaking at the conference, but in addition to them, we have Andrew Gelman doing his usual double life talk with no slides. We have Asmae Toumi, we have Max Kuhn, Caitlin Hudon. We have Wes McKinney. Again, it’s R and friends, Wes McKinney can be there. It’s for R and friends. 
Jon Krohn: 17:09
The creator of the Pandas library. 
Jared Lander: 17:11
Exactly. And actually, beyond me, Wes and Dan Shen are the only two people who’ve spoken at all seven of the New York R conferences. 
Jon Krohn: 17:21
Oh wow. 
Jared Lander: 17:22
Yeah. And Wes is always a good sport about it. I made him autograph our books. I made him wear our t-shirts and he’s a great sport about it. In fact, one time he saw my notes and it just said troll Wes on it. That’s all it said on my notes to introduce him. And he’s like, “I expect you to know that by now.” Because he’s a really good sport and he’s just a great guy to have there. 
Jon Krohn: 17:44
Nice. Yeah. Very exciting, amazing speakers. So anybody listening in the world can view a recording or can they even live stream. So obviously these meetups are perfect for people who live in the New York area. They can come in and join, but like the R conference, is it possible for people to view speakers at your meetup from anywhere? 
Jared Lander: 18:08
Yeah. So for the past, over a year we’ve been virtual and anyone could join at any time. And then afterwards we post the videos up on YouTube. It’s at the Lander Analytics YouTube page, and it can be accessed from nyhackr.org. And we have 11 years of presentations up there and about seven years of videos of anything we’ve had at the meetup. So anyone could attend any time. Then after the pandemic ends, we are planning to still put all the recordings online and make them live when possible. 
Jon Krohn: 18:40
Nice. All right. That’s awesome, Jared. Really appreciate you making that available to everyone. So I think we’ve now covered a lot of the live events that people can participate in, either in-person or virtually, that you run. You also have a book, R for Everyone. So this is an introduction, a hands-on introduction to R from creating data structures, manipulating them, creating visualizations, running statistical tests. Really a great book published by Addison-Wesley. And I think you’re on the second edition, ain’t that right? 
Jared Lander: 19:20
The second edition has been published and I am working on the third edition. 
Jon Krohn: 19:23
Oh wow. The Tidyverse just changes so quickly. 
Jared Lander: 19:29
That’s part of the problem. Actually, one of my chapters about how to get data out of Google Sheets, I’ve had to rewrite it twice because this… That means I wrote it the first time then two more times. Because there’s been so many changes, that’s while I’ve been writing the third edition. 
Jon Krohn: 19:44
Oh, I see. Yeah. I was like, it makes sense that you’ve rewritten it twice, given the couple editions. So you’ve rewritten it twice just for the third edition? 
Jared Lander: 19:52
Yes. That’s three times now. And this one edition, I had to write this section. 
Jon Krohn: 19:56
Nice. Well I’m sure it will be worth it. Yeah. An iconic book in the space. I can highly recommend it to people who are getting started in R or that want to be up to date on all of the various packages, ways of, as I said, visualizations, statistical programming and so on. Are there any big sections that I’ve missed? 
Jared Lander: 20:19
Yeah. So the book does start at the beginning, how to install R, because for some people getting their environment set up is tough, but it accelerates. And by the end there’s doing statistical modeling, time series, machine learning. There’s how to build Shiny apps and R Markdown. So it really takes you out. You can start anywhere; if you’re intermediate, start somewhere in the middle and work your way up. It’s the full gamut of just about anything you’d want to do in R. 
Jon Krohn: 20:43
And in case people missed it in episode 491 with Veerle van Leemput we covered how R can be deployed into production systems. So Jared talking about R Shiny, this is a way of creating production quality applications that have a data model running in the backend. And that episode 491 goes into a lot of detail on various ways that you can productionize R code and Veerle makes the case that there’s no reason why you would need to go to another programming language other than R to have production systems running. So everything from your model training to model deployment, even in production systems can be done in R. I don’t know if you have any thoughts that you’d like to add on that particular topic, Jared? 
Jared Lander: 21:31
Yeah. Through my company, we’ve put R into production at many organizations, whether that means a Shiny app or a plumber API, where we can expose your work as a REST endpoint you can curl, or whether that means just R running as a server job on the machine. There are so many different ways to put R into production. And this is a full, complete language that’s hardened now. And you can do just about anything you want with it. So we don’t just prototype in R, we deploy in R. 
Jon Krohn: 22:02
Wow. All right. Well, you just made an amazing transition to what I wanted to cover as the next topic anyway, which is your company. So you run Lander Analytics, which you just introduced. So what other kinds of, other than, or maybe not other than, but in addition to deploying R apps for people, what other kinds of work do you do with your consultancy? Do you have any big use cases that you’d like to share with the audience? 
Jared Lander: 22:31
Yeah. As much as I love R, the real thing here is the statistics knowledge and machine learning knowledge, the data knowledge, and that’s something that is incredibly valuable because at the end of the day, I have employees that could write this in multiple languages, right? I obviously prefer R but we really want to be able to handle the statistics. And since we’re a consulting company, we have a broad range of customers that we work with. We did an optimization routine for a metal manufacturer to see where they should fabricate the metal and that didn’t technically have statistics. That was convex optimization along [inaudible 00:23:10]. We have a pharmaceutical company, we helped them figure out variation in their manufacturing process, using topological data analysis. We have another customer, we do a lot of spatial analysis and we really push the machines to the limits with some of the spatial analysis we do. And perhaps most notably, we did work with the Minnesota Vikings to help them figure out which players to draft in the entry draft. 
Jon Krohn: 23:36
Very cool. The Minnesota Vikings hey? 
Jared Lander: 23:38
Yes. 
Jon Krohn: 23:41
That is an incredible breadth of applications. So from manufacturers across metal to pharmaceuticals to sports teams, it sounds like you guys could be creating models and deploying them into production in basically any industry, which is, I think, one of the things that’s most exciting about a data science career in general, and maybe being a consultant in particular, is that you get to work on such a wide variety of projects. Almost any business today generates and collects data. And if they aren’t, a consulting firm could even be helping them get started with that. 
Jared Lander: 24:19
Absolutely. We can help a company figure out their data strategy, figure out what data they need to collect in order to improve their business somehow. And every industry has data uses somewhere in their pipeline. And it’s so fascinating being a consulting company that we have to learn about this client’s business. We need to help figure out what’s going on, how we can improve it with data. And so we go into a company that makes metal and we need to learn about metallurgy. We go to an insurance company, we need to learn about actuarial tables. And it’s amazing all these different industries we get to touch and how cool it is. They all have this data. They’re all doing interesting stuff. They just need to figure out how to make use of it. 
Jon Krohn: 25:03
Amazing. And as it happens, if a listener is listening and they think that this sounds appealing that working as a consultant at a statistical consulting firm like yours is maybe something that they would like to do, you have some open positions? 
Jared Lander: 25:20
Yes. We have a few roles. So it could really appeal to different types of people, depending on the role. We always need data scientists. So we’re looking for all those data scientists at different specialties, whether that’s spatial analysis, whether that’s visualization, machine learning. We’re looking for a Linux system admin, someone who’s really good at Bash and knowing how to work in a [inaudible 00:25:43]. And we’re looking for business development, someone who would help us, help customers get the most out of their services. So a lot of roles that we’re looking to fill here and you can find more at landeranalytics.com. 
Jon Krohn: 25:57
Nice. Very exciting. I hope that you find some amazing people through that. I know that there are some amazing listeners out there. So when you are hiring, let’s say the data scientists in particular, I think that that’s probably going to be most likely of interest to an audience member on the show. So when you are hiring data scientists, what do you look for? Both in terms of soft traits, as well as particular technical skills. So you mentioned, at a high level, you talked about you’re looking for, it could be, people could have specializations at data visualization or a spatial analysis, but are there particular hard skills and soft skills that you’re looking for in general? 
Jared Lander: 26:39
So on the human side, they have to be able to communicate with the customer. We are a client-facing business and we do have some people who work for me who do the work while other people present to the customers, but the more client-facing the better. So that means verbal communication skills, written, just the ability to explain things to the customer, because remember a lot of people who hire us, they don’t have training in statistics or math, right, so being able to explain the concept to the customer in a way that clicks for them makes a big difference. So that’s a very, very important skill to have. One of my favorite stories with a customer is I was explaining the models I was fitting to the Minnesota Vikings staff. And there’s a former linebacker who was very upset with me. He did not like what I was doing. 
Jared Lander: 27:27
It looked like he was going to leap across the table and beat me up, right? Which he could have easily, obviously. But then I showed this one plot, a variable importance plot, and I saw on his face that it clicked, and oh, he lit up, and then he came over after the meeting, shook my hand with a big smile, very excited to thank me. And it’s all about communicating to different people in different ways. Some people want to see an equation. Some people want a plot, some people want an explanation. So that skill is incredibly important in anyone who works at my company. 
Jon Krohn: 27:56
Nice. So on the job description it says, “Must be able to communicate stats concepts to linebackers.” 
Jared Lander: 28:02
Yes. That’s important. That’s key for every role we have now. 
Jon Krohn: 28:06
Perfect. So that’s the soft skills. What about hard skills? Is there anything that’s general? I mean, does somebody need to be an expert in R specifically to be a data scientist at your firm? Or could they be an expert in another programming language? 
Jared Lander: 28:21
So we employ people who know multiple languages. Now R is obviously first and foremost because of me, because that’s my favorite language to use. But we do have people who use Python. We have people who’ve written [inaudible 00:28:32]. Obviously SQL is very important, even C++, we still use C++ quite often. And the real thing is I need someone who’s an expert in a language and then maybe familiar with another language, but then, importantly, good programming practices. Using Git, defensive programming, being smart about the way you structure things, having a mind for both speed and reliability. So it’s those things that are very important. Regardless of the language, we need someone who can go and program their statistical thoughts in a robust way.
Jon Krohn: 29:10
Nicely said, that sounds really great. So I don’t know if you want to weigh in on this topic or not, but a question that gets bandied around a lot that I see in conferences or on the internet is this R versus Python debate. And so do you have any thoughts on that? I have a lot of opinions on that and I’ve expressed them on the show before, but I’d love to hear yours. 
Jared Lander: 29:36
Yeah. So you hear about this all the time and it always seems like there’s a big fight over it, but I find that fight’s usually one direction. 
Jon Krohn: 29:44
I’m not picking a fight with him. He’s picking a fight with me. 
Jared Lander: 29:51
Yeah. But I do often see people shouting from the rooftops, “You have to use Python. Nothing else is real.” Then you have people saying “I’m an R user.” Use whatever you’d like, I don’t care what you use, right? And that’s my take. The languages at this point are 95% converged. There’s very little one language can do that the other one can’t. R used to be much stronger in statistics and machine learning, but Python has mostly caught up. Python used to be much better as a general purpose language, but R has mostly caught up. So now it comes down to the way you think. 
Jared Lander: 30:24
Python is designed for people who think like computer engineers, R is designed for people who think like mathematicians and statisticians. So for me, a lot of it comes down to the way you want to use it. And I think it sums up in two big things. First, starting counting at zero versus one. I’ve never met a human who started counting at zero in their day-to-day life. But then also even the way you call functions or methods: Python is object dot method, R is function of object. And as a math person, I have F of X, Y, I don’t have X.F in math, and I’m a math person. So it really depends on how you want to think about your programming and how it relates to your math. 
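The two differences Jared describes can be seen in a few lines of Python. This is a minimal sketch for illustration only; the list `scores` and the other variable names are made up here and do not come from the conversation.

```python
import math

# Hypothetical example data, not from the episode.
scores = [4, 5, 3]

# Python starts counting at zero: index 0 is the first element.
# (In R, the first element of a vector is scores[1].)
first_score = scores[0]

# "Object dot method": the method is attached to the object.
position_of_five = scores.index(5)

# "Function of object": Python also has plain functions, which is the
# style a math person writes as f(x).
largest = max(scores)
root = math.sqrt(16)

print(first_score, position_of_five, largest, root)
```

Python supports both call styles, but much of its standard library leans on the object dot method form, which is the habit of thought Jared is pointing to.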
Jon Krohn: 31:07
That was a beautiful explanation. I’m actually really glad I asked because that was one of the best answers to the question that I’ve heard yet. So thank you very much. I’m not surprised that you had a great answer to it. 
Jared Lander: 31:19
Thank you. 
Jon Krohn: 31:20
So I asked my social media followings on LinkedIn and Twitter if they had any questions for you. And I got a lot of reactions, so it was definitely a popular post, but for some reason only one question actually came out of it. And this was from Colin Fay. So Colin Fay is in France. And I understand from you that he actually was a speaker in the past at the Open Statistical Programming Meetup. And so he is a data scientist and engineer at a company called ThinkR, which you might know more about than me. 
Jared Lander: 32:01
Actually, a company not too dissimilar from mine. It’s a data science consulting shop out of France. 
Jon Krohn: 32:06
All right. So the main difference being the language they can speak. Consultants who can speak French, they work over at ThinkR. Cool. So, well, he has a really serious question for you, Jared, and I’m sure this is what everyone who knows you has been dying to ask you and find out the answer to, so it’s, what is your secret for staying so thin while eating so many pizza slices? 
Jared Lander: 32:35
Oh yes. Pizza is a big part of my life. One of the first reasons people… Go ahead. 
Jon Krohn: 32:42
At the conferences or at your meetup, and maybe even at the conferences too, but at the meetup, you always order pizza and they always come from a different place. And you actually have a lot of data going back many years on people’s pizza preferences, right? 
Jared Lander: 32:55
Yes. Every month at the meetup when it’s in person, we have pizza from different places. Sometimes we’ll do repeats, so it’s longitudinal, and people rate each place each time they do it. So we have the data being collected. It’s actually at bit.ly/pizzapoll. You can find the data, and as the people vote, it gets updated. 
Jon Krohn: 33:14
That’s amazing. Do you know offhand where the best pizza in New York is based on your polls? 
Jared Lander: 33:20
So best is very difficult to answer because it’s such a subjective thing. Do you like Neapolitan? Do you like a classic New York slice? Do you want a square slice? So best is something that I don’t weigh in on because that’s too much to your personal taste. 
Jon Krohn: 33:36
Oh man. I mean, quantitatively, we don’t have some metric that is higher for one pizza place over all the others in the data? 
Jared Lander: 33:43
It’s funny. One of the first things people hear of me is my master’s thesis from over a decade ago about pizza. 
Jon Krohn: 33:51
I didn’t know that. That is not the first thing I’ve heard. It’s evidently the last thing. 
Jared Lander: 33:55
Oh, look at that, things have changed. And it got picked up by famous food pods and people heard about this. And at the time I was analyzing MenuPages data. If you remember that website from back in the day, MenuPages, where you found menus from your favorite restaurants and they had ratings, and this is before Yelp was a big thing, before Google Maps had ratings, those were the ratings I could get. And so there, I used the number of ratings as a proxy for the popularity of a place. And I found some interesting findings: that old coal-burning places like Lombardi’s, Totonno’s, John’s, those are the most popular. 
Jared Lander: 34:32
Now speaking of causal inference, that could be because those are the historic places that the tourists get taken to, or it could be that they’re the best. It could be the most… There are many different reasons, but that’s a proxy we could use for the most popular. And as far as my pizza poll data, we could go through it, and it’s a rating on essentially a one-through-five scale. So we could go there and count up the number of fives, the number of fours. Then you’ve got to figure out how you weigh the distribution, though.
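One way to think about the weighting problem Jared raises is sketched below in Python. The place names and votes are invented for illustration and are not the real pizza poll data. Shrinking each place toward the overall mean is just one reasonable option, not necessarily what Jared would use.

```python
from collections import Counter

def mean_rating(votes):
    """Plain average of a list of 1-5 ratings."""
    return sum(votes) / len(votes)

def shrunken_mean(votes, prior_mean, prior_weight):
    """Average pulled toward prior_mean, as if prior_weight extra votes
    had been cast at that value. Places with few votes move the most."""
    return (prior_weight * prior_mean + sum(votes)) / (prior_weight + len(votes))

# Hypothetical votes, NOT the real pizza poll data.
polls = {
    "Place A": [5, 5, 4, 5, 3, 4, 5, 4],
    "Place B": [5, 5],
}

all_votes = [v for votes in polls.values() for v in votes]
overall = mean_rating(all_votes)

for place, votes in polls.items():
    counts = Counter(votes)  # how many fives, fours, and so on
    print(
        place,
        dict(sorted(counts.items(), reverse=True)),
        round(mean_rating(votes), 3),
        round(shrunken_mean(votes, overall, prior_weight=5), 3),
    )
```

With only two votes, Place B has a perfect raw average, but the shrunken average moves it much closer to Place A. That is the kind of judgment call involved in deciding how to weigh the distribution.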
 
Jon Krohn: 34:58
All right, well, so I guess people can go to bit.ly/pizzapoll and look at the data and maybe draw some conclusions for themselves. You can feel free to reach out to Jared or me by our social media profiles and let us know what you find. That’d be really interesting. So Jared, on that note, how can people stay up to date on what you’re doing on your thoughts? Are you active on social media? 
Jared Lander: 35:27
I am semi-active on Twitter. I tweet mostly about statistics or data, sometimes about pizza. That’s the majority of what makes up my life right about now. So I’m on Twitter, I’m a member of other social media sites like LinkedIn, but I don’t really post much. Probably one of the best ways to find out about things I’m talking about is at jaredlander.com. I try to post, I need to post more often than I do, but I do post things up there. Recently, I posted a three-part series about how to collect and analyze temperature data from all the different rooms in the house. I did another post, which actually a lot of people liked, about how to calculate the cost of insurance plans, whether they should go for a lower plan or higher plan. That’s actually one of the more complicated mathematical analyses I’ve done, which speaks to how hard it is to figure out the insurance plans. I had to do a thorough analysis using simulations to figure this thing out. 
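A simulation-based comparison of insurance plans could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the plan parameters, the lognormal distribution for annual medical costs, and the function names. It is not the analysis from Jared’s post.

```python
import random

def out_of_pocket(medical_cost, deductible, coinsurance, oop_max):
    """What the member pays for one year of medical costs:
    everything up to the deductible, a share above it, capped at oop_max."""
    below = min(medical_cost, deductible)
    above = max(medical_cost - deductible, 0.0) * coinsurance
    return min(below + above, oop_max)

def simulate_annual_cost(plan, n_sims=10_000, seed=0):
    """Average yearly cost (premium plus out-of-pocket) over simulated years."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        # Assumed cost distribution: mostly small years, occasionally very large.
        medical = rng.lognormvariate(7.0, 1.5)
        total += plan["premium"] + out_of_pocket(
            medical, plan["deductible"], plan["coinsurance"], plan["oop_max"]
        )
    return total / n_sims

# Hypothetical plans with made-up numbers.
low_plan = {"premium": 2400, "deductible": 3000, "coinsurance": 0.3, "oop_max": 8000}
high_plan = {"premium": 4800, "deductible": 500, "coinsurance": 0.1, "oop_max": 3000}

print("low plan: ", round(simulate_annual_cost(low_plan), 2))
print("high plan:", round(simulate_annual_cost(high_plan), 2))
```

Using the same seed for both plans means they are compared on the same simulated years, so the difference reflects the plan design rather than random noise.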
Jon Krohn: 36:16
Interesting. That does sound like a good reason to visit your website. Brilliant. So final question for you is a book recommendation. 
Jared Lander: 36:27
I’ve had so many great books in my hands lately. Two really come to mind: Generalized Additive Models, because GAMs are almost magical types of models, and Convex Optimization. This book Convex Optimization is explained so well, the mathematical concepts, and they’re explained so nicely, and there’s actually a whole suite of software to go with it. There’s CVX, there’s CVXR, CVXPY, so there are multiple languages you could use for convex optimization, which let you do anything they talk about in the book. So it’s those two books, Convex Optimization and Generalized Additive Models, that have been really great for me. 
Jon Krohn: 37:02
Nice. That’s really cool. Can you let us know in 30 seconds what a GAM is? Why is it so special and so magical? 
Jared Lander: 37:09
A GAM allows you to fit a curvy linear model. It’s essentially a flexible linear model which comes up with a flexible, and technically “wiggly” is a technical term, it’s a wiggly curve that captures the nuances in the data, but still gives you confidence intervals. So it gives you the power of a lot of these machine learning models like XGBoost, but with the interpretability of a linear model. 
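In the usual textbook notation, which is not spelled out in the conversation, a generalized additive model replaces each straight-line term of a linear model with a smooth function of the predictor. Here g is a link function, the x_j are the predictors, and the f_j are the smooth "wiggly" curves estimated from the data:

```latex
g\big(\mathbb{E}[y]\big) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p)
```

Because the effects still add up term by term, each f_j can be plotted and read on its own, which is where the interpretability Jared mentions comes from.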
Jon Krohn: 37:31
Nice. That is a great explanation. I don’t know enough about GAMs. It sounds like I need to be checking out this GAMs book, and Convex Optimization sounds cool too. All right, Jared, it’s been awesome having you on the show. We got to learn a lot about various ways that we can be staying up to date on the latest in open statistical programming, especially R, through your meetup, through the R conference, through your book R for Everyone. And yeah, if people are looking for help with stats, math projects, or data projects, they know that they can come to you at Lander Analytics, and if they want to be solving those problems, they know that they can apply for positions on your website. So really an amazing episode covering tons of different ways of staying up to date on the latest and greatest in the field, as well as some wonderful practical advice for us on programming languages and the skills that are looked for in data science consultants. Really appreciate it, Jared. Thank you so much for being on the show. 
Jared Lander: 38:37
Thank you so much for having me. It’s always great chatting with you. 
Jon Krohn: 38:39
Now Jared is an absolute linchpin of the data science community in New York, but also more broadly in the R community around the world. I greatly appreciate the work he’s doing for these communities and really enjoyed hearing about how they evolved to where they are today. I’m personally grateful for how the communities he’s fostered have facilitated my own professional growth and the breadth of the impact I can have with my lectures and the content I publish. It’s hard to imagine where I’d be without Jared and all of the community that he’s put together. In today’s episode, Jared detailed the Open Statistical Programming Meetup, the New York R Conference and his book R for Everyone. He talked about the diversity of industries and applications of data science consulting work. In his case, across metallurgy, pharmaceutical manufacturing, insurance and Minnesota Vikings draft picks. He talked about how data science consultants need to be able to communicate stats effectively to clients with backgrounds of any ilk, including Minnesota Vikings linebackers. They also need to be an expert in at least one programming language like R, Python, Go, SQL or C++, and then demonstrate solid professional best practices like software version control, defensive programming, and reliable code. 
Jon Krohn: 40:04
And at the end, Jared highlighted that R and Python today are interchangeable for almost all use cases. But R may appeal more to you if you think like a statistician, while Python may appeal more if you think like a software engineer. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show and the URLs for Jared’s Twitter profile and website, as well as my own social media profiles at www.superdatascience.com/501. That’s www.superdatascience.com/501. 
Jon Krohn: 40:38
If you enjoyed this episode, I would of course greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel, where we have the video version of this episode. To let me know your thoughts on this episode directly, please do feel welcome to add me on LinkedIn or Twitter, and then tag me in a post about it. Your feedback is huge for deciding what topics we should be covering. All right, thanks to Ivana, Jaime, Mario and JP on the SuperDataScience team for managing and producing another fun episode for us today. Keep on rocking it out there folks. And I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 